CISUC

RVM Ensemble for Text Classification

Authors

Abstract

Automated classification of texts by their likeness or affinity has greatly eased the management and processing of the massive volumes of information we face every day. Although Support Vector Machines (SVM) provide a state-of-the-art technique to tackle this problem, Relevance Vector Machines (RVM), which rely on Bayesian inference learning, offer advantages such as their capacity to find sparser and probabilistic solutions. A known problem with Bayesian approaches, however, is their relative inability to scale to larger problems, where millions of documents are involved and user requests must be served in real time.

We propose an ensemble strategy to circumvent the RVM scalability problem by applying a divide-and-conquer technique to handle the overload of available data: the training documents are divided amongst small RVM classifiers, and the ensemble then combines their individual contributions. The resulting solution keeps a sparse decision function and is computationally efficient. Results on Reuters-21578 demonstrate that the proposed strategy can surpass other techniques in terms of both classification performance and response time.
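The divide-and-conquer idea described in the abstract can be illustrated with a minimal sketch. Several points here are assumptions and not taken from the paper: the base learner is a simple nearest-centroid scorer standing in for an RVM, the chunks are formed by round-robin assignment, and the member outputs are combined by plain averaging (the abstract does not state the combination rule). All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]          # (feature vector, label in {0, 1})
Scorer = Callable[[Vector], float]    # returns a score in (0, 1) for label 1


def split_into_chunks(items: list, n_chunks: int) -> List[list]:
    """Divide the training set into n_chunks parts by round-robin assignment."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def _distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def train_centroid_scorer(chunk: List[Example]) -> Scorer:
    """Stand-in base learner (NOT an RVM): nearest-centroid with a sigmoid score.

    Assumes the chunk contains at least one example of each class.
    """
    dim = len(chunk[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, y in chunk:
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    centroids = {c: [s / counts[c] for s in sums[c]] for c in (0, 1)}

    def score(x: Vector) -> float:
        d0 = _distance(x, centroids[0])
        d1 = _distance(x, centroids[1])
        # Closer to the class-1 centroid means a score nearer to 1.
        return 1.0 / (1.0 + math.exp(d1 - d0))

    return score


def train_ensemble(data: List[Example], n_chunks: int) -> List[Scorer]:
    """Train one small classifier per chunk of the training data."""
    return [train_centroid_scorer(chunk) for chunk in split_into_chunks(data, n_chunks)]


def ensemble_predict(members: List[Scorer], x: Vector,
                     threshold: float = 0.5) -> Tuple[int, float]:
    """Combine member scores by averaging (an assumed combination rule)."""
    average = sum(member(x) for member in members) / len(members)
    return (1 if average >= threshold else 0), average
```

Because each member sees only a fraction of the documents, training cost per member stays small, and the members can in principle be trained independently; swapping the stand-in scorer for an actual RVM would not change the surrounding structure.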

Subject

Relevance Vector Machines, Text Classifi

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

Journal

International Journal of Computational Intelligence Research, Vol. 3, #1, pp. 31-35, January 2007

Cited by

Year 2011 : 2 citations

Fokoué, E., Goel, P.
An optimal experimental design perspective on radial basis function regression
(2011) Communications in Statistics - Theory and Methods, 40 (7), pp. 1184-1195.

Fokoué, E., Sun, D., Goel, P.
Fully Bayesian analysis of the relevance vector machine with an extended hierarchical prior structure
(2011) Statistical Methodology, 8 (1), pp. 83-96. Cited 1 time.