CISUC

Towards Expanding Relevance Vector Machines to Large Scale Datasets

Authors

Abstract

In this paper we develop and analyze methods for extending the automated learning of Relevance Vector Machines (RVMs) to large-scale text datasets. RVMs rely on Bayesian inference and, while maintaining state-of-the-art performance, offer sparse and probabilistic solutions. However, past efforts to apply RVMs to large-scale datasets have met with limited success because of computational constraints.

We propose a diversified set of divide-and-conquer approaches in which decomposition techniques define smaller working sets that, taken together, allow all training examples to be used. The rationale is that, by exploring incremental, ensemble and boosting strategies, it is possible to improve classification performance by taking advantage of the large training set available. Results on Reuters-21578 and RCV1 are presented, showing performance gains while maintaining sparse solutions that can be deployed in distributed environments.
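To make the decomposition idea concrete, the following is a minimal sketch of the ensemble strategy only, not the authors' implementation. It assumes the training set is split into disjoint working sets by a round-robin partition, that one classifier is trained per working set, and that predictions are combined by unweighted majority vote. A full RVM trainer is beyond the scope of a short example, so `train_nearest_centroid` is a hypothetical stand-in for the RVM base learner; all function names are illustrative. The incremental and boosting strategies are not shown.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Classifier = Callable[[Vector], int]


def train_nearest_centroid(xs: Sequence[Vector], ys: Sequence[int]) -> Classifier:
    """Hypothetical stand-in for an RVM base learner: one centroid per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: Vector) -> int:
        # Label of the closest centroid (squared Euclidean distance).
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def split_into_working_sets(n: int, k: int) -> List[List[int]]:
    """Round-robin partition of indices 0..n-1 into k disjoint working sets."""
    return [list(range(i, n, k)) for i in range(k)]


def train_ensemble(
    xs: Sequence[Vector],
    ys: Sequence[int],
    k: int,
    base_learner: Callable[[Sequence[Vector], Sequence[int]], Classifier] = train_nearest_centroid,
) -> List[Classifier]:
    """Train one base classifier per working set; together they see every example."""
    models: List[Classifier] = []
    for idx in split_into_working_sets(len(xs), k):
        if idx:  # skip empty working sets when k > n
            models.append(base_learner([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_majority(models: Sequence[Classifier], x: Vector) -> int:
    """Combine the working-set classifiers by unweighted majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-class data: even indices near the origin (class 0), odd near (5, 5) (class 1).
    xs = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.2), (5.2, 4.9), (0.1, 0.4), (4.8, 5.1),
          (0.3, 0.3), (5.1, 5.3), (0.2, 0.1), (4.9, 4.7), (0.4, 0.0), (5.3, 5.0)]
    ys = [0, 1] * 6
    models = train_ensemble(xs, ys, k=3)
    print(predict_majority(models, (0.2, 0.2)))  # 0
    print(predict_majority(models, (5.0, 4.8)))  # 1
```

Because each base learner only ever sees one working set, the per-model training cost stays bounded by the working-set size, and the models could be trained on separate machines; this mirrors the motivation stated above, under the simplifying assumptions listed.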

Keywords

Large scale learning; text classification; relevance vector machines

Subject

Relevance Vector Machines

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

Journal

International Journal of Neural Systems, Vol. 18, No. 1, pp. 45-58, World Scientific Publishing Company, February 2008

Cited by

Year 2011: 3 citations

Fokoué, E., Sun, D.…
Fully Bayesian analysis of the relevance vector machine with an extended hierarchical prior structure
(2011) Statistical Methodology, Elsevier.

Acharya, U.R., Sree, S.V., Suri, J.S.
Automatic detection of epileptic EEG signals using higher order cumulant features
(2011) International Journal of Neural Systems, 21 (5), pp. 403-414.
/S0129065708001361, PII S0129065708001361

Fokoué, E., Goel, P.
An optimal experimental design perspective on radial basis function regression
(2011) Communications in Statistics - Theory and Methods, 40 (7), pp. 1184-1195.

Year 2010: 3 citations

Fokoué, E., Goel, P.
An optimal experimental design perspective on radial basis function regression
(2010) John D. Hromi Center for Quality and Applied Statistics (KGCOE).

Patel, P.B., Marwala, T.
Caller behaviour classification using computational intelligence methods
(2010) International Journal of Neural Systems, 20 (1), pp. 87-93.

Yang, Y., Lu, B.-L.
Protein subcellular multi-localization prediction using a min-max modular support vector machine
(2010) International Journal of Neural Systems, 20 (1), pp. 13-28.