Combining Active Learning and Relevance Vector Machines for Text Classification
Authors
Abstract
Relevance Vector Machines (RVM) have proven successful in many learning tasks. However, they scale poorly in large applications. In many settings there is a large amount of unlabeled data that could be actively chosen by a learner and integrated into the learning procedure. The idea is to improve performance while reducing the costs of data categorization.
In this paper we propose an Active Learning RVM method based on the kernel trick. The underpinning idea is to define a working space between the Relevance Vectors (RV) initially obtained from a small labeled data set and the new unlabeled examples, from which the most informative instances are chosen. Using kernel distance metrics, such a space can be defined and more informative examples can be added to the training set, increasing performance without significantly affecting the problem dimension. We detail the proposed method with illustrative examples on the Reuters-21578 benchmark. Results show improved performance and scalability.
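The kernel-distance selection idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the RBF kernel, the feature-space distance formula, and the "closest to a relevance vector" selection criterion are assumptions made for the sketch.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_distance(x, y, kernel):
    # Distance in the feature space induced by the kernel:
    # d(x, y)^2 = k(x, x) - 2 k(x, y) + k(y, y)
    return np.sqrt(kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y))

def select_informative(unlabeled, relevance_vectors, kernel, n_select=2):
    # Score each unlabeled point by its kernel-space distance to the
    # nearest relevance vector, then pick the closest candidates as
    # the examples to label and add to the training set.
    scores = [min(kernel_distance(x, rv, kernel) for rv in relevance_vectors)
              for x in unlabeled]
    order = np.argsort(scores)
    return [unlabeled[i] for i in order[:n_select]]

if __name__ == "__main__":
    # Hypothetical relevance vectors from a small labeled set,
    # plus a pool of unlabeled candidates.
    rvs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
    pool = [np.array([0.1, 0.1]), np.array([5.0, 5.0]), np.array([0.9, 1.1])]
    chosen = select_informative(pool, rvs, rbf_kernel)
    print(len(chosen))  # 2
```

Because selection happens in the kernel-induced space rather than by retraining the model on every candidate, only the chosen examples enlarge the training set, which is consistent with the scalability claim in the abstract.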
Subject
Active learning; Text classification; RVM
Related Project
CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization
Conference
IEEE ICMLA 2007, December 2007
Cited by
Year 2011 : 2 citations
Fokoué, E., Sun, D… , "Fully Bayesian analysis of the relevance vector machine with an extended hierarchical prior structure", Statistical Methodology, Elsevier, 2011.
Fokoué, E., Goel, P., "An optimal experimental design perspective on radial basis function regression", Communications in Statistics - Theory and Methods, 40 (7), pp. 1184-1195, 2011.
Year 2010 : 1 citation
Fokoué, E., "An optimal experimental design perspective on radial basis function regression", John D. Hromi Center for Quality and Applied Statistics (KGCOE), 2010, ritdml.rit.edu.