Scaling Text Classification with Relevance Vector Machines

Authors

Abstract

Text classification (TC) is a complex, ubiquitous task that involves huge amounts of data. Recent research has shown that kernel-based learning methods are quite effective for this problem. Unlike Support Vector Machines (SVMs), the Relevance Vector Machine (RVM) yields a probabilistic output while preserving accuracy. However, few research efforts have addressed the scalability issue that arises when applying RVMs to large-scale problems such as TC. We propose a new model consisting of a two-step RVM classifier that is able to (i) be competitive in processing time, (ii) use all available training elements, and (iii) improve RVM classification performance. The paper also shows that a convenient similarity measure among documents can be defined over the whole collection, which makes the process not only faster but also parallelizable.
Using REUTERS-21578, we show that the proposed model reduces computational complexity and improves overall performance, making the deployment of successful real-time applications possible.

Subject

RVM; Text Classification

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

Conference

IEEE SMC06, October 2006

Cited by

Year 2010: 1 citation

Automatic Chinese Text Classification Using N-Gram Model
SJ Yen, YS Lee, YC Wu, JC Ying… - Computational Science and its applications, 2010 - Springer

Year 2009: 1 citation

Image Classification Using No-balance Binary Tree Relevance Vector Machine
K Wang… - 2009 International Asia Symposium on Intelligent Interaction and Affective Computing, 2009 - computer.org