Selecting Examples in Manifold Reduced Feature Space for Active Learning
Authors
Abstract
Nowadays, machine learning is faced with an overload of data, in terms of both examples and features. Although recent algorithms, such as support vector machines, can handle high dimensionality, it remains valuable to find smaller and better-suited spaces in which to perform learning tasks. We propose a twofold approach to tackle these high-dimensionality issues in a text classification setting. First, we use manifold learning as a pre-processing step to nonlinearly reduce the feature space. Second, we use support vector machines to implement an active learning strategy, where the kernel trick is used to define the active examples. This approach deals with high dimensionality by reducing both the number of features and the number of examples needed to reach a desired performance.
Results on a real-world benchmark corpus from Reuters, and on a reduced realistic version of that corpus, show the visualization capabilities of manifold learning and the performance improvement achieved with the active learning strategy.
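
To make the second step of the approach more concrete, the following is a minimal sketch of one common way to pick active examples with a kernel SVM: query the unlabeled examples whose decision value is closest to zero, that is, those nearest the separating hyperplane in the kernel-induced space. The selection rule, the RBF kernel, the value of gamma, the function names, and the toy support vectors and coefficients are all illustrative assumptions, not details taken from this work. The sketch also assumes the feature vectors have already been reduced by the manifold learning step.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """SVM decision function f(x) = sum_i coef_i * K(sv_i, x) + bias.

    dual_coefs[i] stands for alpha_i * y_i of the i-th support vector.
    """
    total = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return total + bias


def select_active_examples(pool, support_vectors, dual_coefs, bias,
                           n_queries=1, gamma=0.5):
    """Return indices of the n_queries pool examples with the smallest |f(x)|."""
    margins = [
        abs(decision_value(x, support_vectors, dual_coefs, bias, gamma))
        for x in pool
    ]
    ranked = sorted(range(len(pool)), key=lambda i: margins[i])
    return ranked[:n_queries]


# Hypothetical toy model: one negative and one positive support vector
# in a 2-D (already reduced) feature space.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
dual_coefs = [-1.0, 1.0]
bias = 0.0

# Unlabeled pool; the second and fourth points lie near the decision boundary.
pool = [(0.1, 0.0), (1.0, 1.0), (2.0, 1.9), (0.9, 1.2)]

queried = select_active_examples(pool, support_vectors, dual_coefs, bias,
                                 n_queries=2)
print(queried)
```

In an active learning loop, the queried examples would be labeled, added to the training set, and the classifier retrained before the next round of selection.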