RVM ensemble for text classification

Catarina Silva; Bernardete Ribeiro

Year: 2007
Volume: 3
Issue: 1

Author:
Catarina Silva¹,²,*, Bernardete Ribeiro²,**
Total Page Count: 5
Page Number: 31 to 35

¹School of Technology and Management of the Polytechnic Institute of Leiria, Morro do Lena - Alto do Vieiro, P-2411-901 Leiria, Portugal.

²Department of Informatics Engineering, Center for Informatics and Systems, University of Coimbra, Polo II, P-3030-290 Coimbra, Portugal.

* E-mail: catarina@dei.uc.pt

** E-mail: bribeiro@dei.uc.pt

Abstract

Automated classification of texts by their likeness or affinity has greatly eased the management and processing of the massive volumes of information we face every day. Although Support Vector Machines (SVM) provide a state-of-the-art technique to tackle this problem, Relevance Vector Machines (RVM), which rely on Bayesian inference learning, offer advantages such as their capacity to find sparser and probabilistic solutions. A known problem with Bayesian approaches, however, is their relative inability to scale to larger problems that involve millions of documents and real-time user requests.

We propose an ensemble strategy to circumvent the RVM scalability problem by applying a divide-and-conquer technique to handle the overload of available data: the training documents are divided among small RVM classifiers, and the ensemble then combines their individual contributions. The resulting solution keeps a sparse decision function and is computationally efficient. Results on Reuters-21578 clearly demonstrate that the proposed strategy can surpass other techniques in terms of both classification performance and response time.
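
The divide-and-conquer idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the base learner `CentroidClassifier` is a simple nearest-centroid stand-in rather than an RVM, the training set is partitioned round-robin, and the ensemble combines its members by majority vote (the abstract does not specify the combination rule). All function names, class names, and the toy term-count vectors are hypothetical.

```python
from collections import Counter


def split_into_chunks(samples, n_chunks):
    """Deal the training samples round-robin into n_chunks disjoint subsets."""
    return [samples[i::n_chunks] for i in range(n_chunks)]


class CentroidClassifier:
    """Stand-in base learner (NOT an RVM): predicts the nearest class centroid."""

    def fit(self, samples):
        sums, counts = {}, Counter()
        for x, y in samples:
            acc = sums.setdefault(y, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[y] += 1
        # Mean feature vector per class, computed from this chunk only.
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


def train_ensemble(samples, n_chunks):
    """Train one small classifier per chunk of the training data."""
    return [CentroidClassifier().fit(chunk) for chunk in split_into_chunks(samples, n_chunks)]


def ensemble_predict(models, x):
    """Combine the individual predictions by majority vote (assumed rule)."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): 2-dimensional term-count vectors,
# alternating between two hypothetical categories.
samples = []
for i in range(12):
    if i % 2 == 0:
        samples.append(([5.0 + 0.1 * i, 1.0], "earn"))
    else:
        samples.append(([1.0, 5.0 + 0.1 * i], "grain"))

models = train_ensemble(samples, n_chunks=3)
```

Because each member sees only a fraction of the training documents, training cost per member stays small. Swapping in an actual RVM would only require a class with the same `fit` and `predict` interface.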

Keywords

Text classification, Relevance Vector Machines, Ensembles, Scaling Machine Learning Algorithms
