Scaling Text Classification with Relevance Vector Machines
Authors
Abstract
Text classification (TC) is a complex, ubiquitous task that handles huge amounts of data. Recent research has shown that kernel-based learning methods are quite effective for this problem. Unlike Support Vector Machines (SVMs), the Relevance Vector Machine (RVM) yields a probabilistic output while preserving accuracy. However, few research efforts have addressed the scalability issue that arises when applying RVMs to large-scale problems such as TC. We propose a new model consisting of a two-step RVM classifier that is able to (i) be competitive in processing time, (ii) use all available training elements, and (iii) improve RVM classification performance. The paper also shows that a convenient similarity measure among documents can be defined over the whole collection, which makes the process not only faster but also parallelizable. Using REUTERS-21578, we show that the proposed model reduces computational complexity and improves overall performance, making the deployment of successful real-time applications possible.
Subject
RVM; Text Classification
Related Project
CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization
Conference
IEEE SMC06, October 2006
Cited by
Year 2010: 1 citation
Automatic Chinese Text Classification Using N-Gram Model
SJ Yen, YS Lee, YC Wu, JC Ying… - Computational Science and its applications, 2010 - Springer
Year 2009: 1 citation
Image Classification Using No-balance Binary Tree Relevance Vector Machine
K Wang… - 2009 International Asia Symposium on Intelligent Interaction and Affective Computing, 2009 - computer.org