International Journal of Computational Intelligence Research
  • Year: 2007
  • Volume: 3
  • Issue: 1

RVM ensemble for text classification

  • Authors: Catarina Silva1,2,*, Bernardete Ribeiro2,**
  • Total Page Count: 5
  • Page Number: 31 to 35

1School of Technology and Management, Polytechnic Institute of Leiria, Morro do Lena - Alto do Vieiro, P-2411-901 Leiria, Portugal.

2Department of Informatics Engineering, Center for Informatics and Systems, University of Coimbra, Polo II, P-3030-290 Coimbra, Portugal.

* E-mail: catarina@dei.uc.pt

** E-mail: bribeiro@dei.uc.pt

Abstract

Automated classification of texts by their likeness or affinity has greatly eased the management and processing of the massive volumes of information we face every day. Although Support Vector Machines (SVM) provide a state-of-the-art technique to tackle this problem, Relevance Vector Machines (RVM), which rely on Bayesian inference learning, offer advantages such as their capacity to find sparser, probabilistic solutions. A known problem with Bayesian approaches, however, is their relative inability to scale to larger problems, where millions of documents are involved and user requests must be answered in real time.

We propose an ensemble strategy to circumvent the RVM scalability problem by applying a divide-and-conquer technique to handle the overload of available data: the training documents are divided amongst small RVM classifiers, and the ensemble then combines their individual contributions. The resulting solution keeps a sparse decision function and is computationally efficient. Results on Reuters-21578 clearly demonstrate that the proposed strategy can surpass other techniques in terms of both classification performance and response time.
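
The divide-and-conquer idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the individual contributions are combined, so averaging the members' probabilities is an assumption here, and because no RVM is available in the standard library a plain logistic regression (`train_stand_in`) takes the place of each small RVM. All function names and the toy two-feature data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # maps a document vector to P(label == 1)


def train_stand_in(X: List[Vector], y: List[int],
                   epochs: int = 200, lr: float = 0.5) -> Model:
    """Stand-in base learner (NOT an RVM): logistic regression by gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0

    def prob(x: Vector) -> float:
        z = b + sum(wj * xj for wj, xj in zip(w, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays in range
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        errs = [prob(xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return prob


def train_ensemble(X: List[Vector], y: List[int], n_chunks: int,
                   seed: int = 0,
                   base_trainer: Callable[..., Model] = train_stand_in) -> List[Model]:
    """Divide the training documents into n_chunks and fit one small model per chunk."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # shuffle so each chunk mixes both classes
    models = []
    for k in range(n_chunks):
        part = idx[k::n_chunks]  # every n_chunks-th shuffled document
        models.append(base_trainer([X[i] for i in part], [y[i] for i in part]))
    return models


def ensemble_proba(models: List[Model], x: Vector) -> float:
    """Combine members by averaging their probabilities (an assumed rule)."""
    return sum(m(x) for m in models) / len(models)


def ensemble_predict(models: List[Model], x: Vector) -> int:
    return int(ensemble_proba(models, x) >= 0.5)


# Toy data: two hypothetical term weights per document, two classes.
rng = random.Random(1)
X, y = [], []
for i in range(40):
    label = i % 2
    centre = (1.0, 0.0) if label == 1 else (0.0, 1.0)
    X.append([c + rng.gauss(0.0, 0.1) for c in centre])
    y.append(label)

models = train_ensemble(X, y, n_chunks=4)
```

Each member sees only a quarter of the training documents, so its training cost depends on the chunk size rather than on the full collection; this is the property the ensemble relies on to sidestep the scaling problem. A real experiment would replace `train_stand_in` with an actual RVM trainer.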

Keywords

Text classification, Relevance Vector Machines, Ensembles, Scaling Machine Learning Algorithms