CISUC

RVM Ensemble for Text Classification

Authors

Abstract

Automated classification of texts by their likeness or affinity has greatly eased the management and processing of the massive volumes of information we face every day. Although Support Vector Machines (SVM) are a state-of-the-art technique for this problem, Relevance Vector Machines (RVM), which rely on Bayesian inference learning, offer advantages such as sparser and probabilistic solutions. A known problem with Bayesian approaches, however, is their relative inability to scale to larger problems, where millions of documents are involved and user requests must be served in real time.

We propose an ensemble strategy to circumvent the RVM scalability problem by applying a divide-and-conquer technique to handle the overload of available data: the training documents are divided amongst small RVM classifiers, and the ensemble then combines their individual contributions. The resulting solution keeps a sparse decision function and is computationally efficient. Results on Reuters-21578 clearly demonstrate that the proposed strategy can surpass other techniques in terms of both classification performance and response time.
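The divide-and-conquer idea can be illustrated with a minimal sketch. Several details here are assumptions that the abstract does not specify: the training set is split at random into disjoint chunks, the individual contributions are combined by majority vote, and a simple nearest-centroid learner (the hypothetical `CentroidClassifier`) stands in for an RVM so that the example runs on the standard library alone. All function names are illustrative.

```python
import random
from collections import Counter


class CentroidClassifier:
    """Stand-in base learner (NOT an RVM): predicts the nearest class centroid."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        # Mean feature vector per class seen in this chunk.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def split_into_chunks(X, y, n_chunks, seed=0):
    """Shuffle the training set and deal it into n_chunks disjoint subsets."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    chunks = [indices[k::n_chunks] for k in range(n_chunks)]
    return [([X[i] for i in chunk], [y[i] for i in chunk]) for chunk in chunks]


def train_ensemble(X, y, n_chunks, make_model=CentroidClassifier):
    """Train one small classifier per chunk (assumed splitting scheme)."""
    return [make_model().fit(Xc, yc) for Xc, yc in split_into_chunks(X, y, n_chunks)]


def ensemble_predict(models, features):
    """Combine the individual predictions by majority vote (assumed rule)."""
    votes = Counter(model.predict(features) for model in models)
    return votes.most_common(1)[0][0]
```

Each base learner only ever sees its own chunk, so the training cost per model depends on the chunk size rather than on the full collection; any classifier exposing `fit` and `predict` in this shape could be passed as `make_model`.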

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

Conference

ICONIP 2006 - 13th International Conference on Neural Information Processing (Poster presentation), October 2006


Cited by

Year 2010 : 1 citation

 Fokoue, Ernest; Goel, Prem, "An optimal experimental design perspective on radial basis function regression", John D. Hromi Center for Quality and Applied Statistics (KGCOE), 2010

Year 2008 : 1 citation

 Bo Yu and Zong-ben Xu, "A comparative study for content-based dynamic spam classification using four machine learning algorithms", Knowledge-Based Systems, Volume 21, Issue 4, May 2008, Pages 355-362