Combining Active Learning and Relevance Vector Machines for Text Classification
Authors
Abstract
Relevance Vector Machines (RVM) have proven successful in many learning tasks. However, they scale poorly in large applications. In many settings there is a large amount of unlabeled data that could be actively chosen by a learner and integrated into the learning procedure. The idea is to improve performance while reducing the costs of data categorization.
In this paper we propose an Active Learning RVM method based on the kernel trick. The underpinning idea is to define a working space between the Relevance Vectors (RV) initially obtained from a small labeled data set and the new unlabeled examples, from which the most informative instances are chosen. Using kernel distance metrics, such a space can be defined and more informative examples can be added to the training set, increasing performance without significantly affecting the problem dimension. We detail the proposed method with illustrative examples on the Reuters-21578 benchmark. Results show improved performance and scalability.
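The kernel-distance selection idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the RBF kernel, the feature-space distance formula, and the "closest to a relevance vector" selection criterion are assumptions made for the sketch.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_distance(x, y, kernel):
    # Distance in the feature space induced by the kernel:
    # d(x, y)^2 = k(x, x) - 2 k(x, y) + k(y, y)
    return np.sqrt(kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y))

def select_informative(unlabeled, relevance_vectors, kernel, n_select=2):
    # Score each unlabeled point by its kernel-space distance to the
    # nearest relevance vector, then pick the closest candidates as
    # the examples to label and add to the training set.
    scores = [min(kernel_distance(x, rv, kernel) for rv in relevance_vectors)
              for x in unlabeled]
    order = np.argsort(scores)
    return [unlabeled[i] for i in order[:n_select]]

if __name__ == "__main__":
    # Hypothetical relevance vectors from a small labeled set,
    # plus a pool of unlabeled candidates.
    rvs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
    pool = [np.array([0.1, 0.1]), np.array([5.0, 5.0]), np.array([0.9, 1.1])]
    chosen = select_informative(pool, rvs, rbf_kernel)
    print(len(chosen))  # 2
```

Because selection happens in the kernel-induced space rather than by retraining the model on every candidate, only the chosen examples enlarge the training set, which is consistent with the scalability claim in the abstract.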
Subject
Active learning; Text classification; RVM
Related Project
CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization
Conference
IEEE ICMLA 2007, December 2007
Cited by
Year 2011 : 2 citations
Fokoué, E., Sun, D… , "Fully Bayesian analysis of the relevance vector machine with an extended hierarchical prior structure", Statistical Methodology, Elsevier, 2011.
Fokoué, E., Goel, P., "An optimal experimental design perspective on radial basis function regression", Communications in Statistics - Theory and Methods, 40 (7), pp. 1184-1195, 2011.
Year 2010 : 1 citation
Fokoué, E., "An optimal experimental design perspective on radial basis function regression", John D. Hromi Center for Quality and Applied Statistics (KGCOE), 2010, ritdml.rit.edu.