Selecting Examples in Manifold Reduced Feature Space for Active Learning

Authors

Abstract

Machine learning is now faced with an overload of data, in terms of both examples and features. Although recent algorithms, such as support vector machines, can handle high dimensionality, it remains valuable to find smaller, better-suited spaces in which to perform learning tasks.

We propose a twofold approach to tackle these high-dimensionality issues in a text classification setting. First, we use manifold learning as a pre-processing step to nonlinearly reduce the feature space. Second, we use support vector machines to implement an active learning strategy, where the kernel trick is used to define the active examples. This approach deals with high dimensionality by reducing both the number of features and the number of examples needed to reach a desired performance.
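
The second step can be illustrated with a minimal sketch. The abstract does not state the exact selection criterion, so the code below assumes a margin-based rule, a common choice in SVM active learning: the unlabeled examples whose kernel decision value is closest to zero are queried first. The function names, the RBF kernel, and the toy support vectors and coefficients are all hypothetical, and the input points stand in for documents already mapped to a reduced feature space.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """SVM decision function f(x) = sum_i coef_i * K(sv_i, x) + bias."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def select_active_examples(pool, support_vectors, dual_coefs, bias, n_queries=1):
    """Return indices of the pool examples closest to the decision boundary.

    Assumption: margin-based selection (smallest |f(x)| first); the source
    does not confirm that this is the criterion used.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(decision_value(pool[i], support_vectors, dual_coefs, bias)),
    )
    return ranked[:n_queries]


# Hypothetical toy model in a 2-D reduced space: one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
dual_coefs = [-1.0, 1.0]  # signed coefficients (alpha_i * y_i), made up for illustration
bias = 0.0

# Hypothetical unlabeled pool; the middle point is equidistant from both support vectors.
pool = [(0.1, 0.0), (1.0, 1.0), (2.0, 1.9)]

queried = select_active_examples(pool, support_vectors, dual_coefs, bias, n_queries=1)
print(queried)  # [1]
```

In this toy setting the point (1.0, 1.0) lies exactly on the decision boundary, so it is the one an oracle would be asked to label. In a full loop the model would be retrained after each query and the selection repeated.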

Results on a real-world benchmark corpus from Reuters, and on a reduced realistic version of that corpus, show both the visualization capabilities of manifold learning and the performance improvement achieved with the active learning strategy.

Subject

Text classification; Manifold learning

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

Conference

IEEE ICMLA 2008, December 2008
