Margin-based Active Learning and Background Knowledge in Text Mining

Authors

Abstract

Text mining, also known as intelligent text analysis, text data mining, or knowledge discovery in text, generally refers to the process of extracting interesting and non-trivial information and knowledge from text.

One of the main problems with text mining and classification systems is the lack of labeled data, together with the cost of labeling new data. There is therefore growing interest in using unlabeled data to improve text classification performance. Because unlabeled text is readily available in most applications, it is an appealing source of information.

In this work we evaluate the benefits of incorporating unlabeled data into an automatic text classifier based on support vector machines (SVMs). We further evaluate the possibility of learning actively and propose a method for choosing which samples should be labeled and learned.
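The abstract does not spell out the selection criterion. As a minimal sketch, assuming the common reading of "margin-based" selection for a linear SVM (query the unlabeled documents whose decision value |w·x + b| is smallest, i.e. those lying closest to the separating hyperplane), choosing the samples could look like the Python below. Documents are represented as sparse term-weight dictionaries, the weights `w` and bias `b` are assumed to come from an already trained linear SVM, and the helper names `decision_value` and `select_by_margin` are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a document is a sparse map term -> weight (e.g. TF-IDF).
SparseVec = Dict[str, float]


def decision_value(w: SparseVec, b: float, x: SparseVec) -> float:
    """Linear SVM decision function f(x) = w . x + b over sparse vectors."""
    return sum(weight * w.get(term, 0.0) for term, weight in x.items()) + b


def select_by_margin(w: SparseVec, b: float,
                     pool: List[SparseVec], k: int) -> List[int]:
    """Return the indices of the k pool documents with the smallest |f(x)|.

    Assumption: "margin-based" means picking the documents closest to the
    hyperplane, i.e. those the current classifier is least certain about.
    """
    scored: List[Tuple[float, int]] = [
        (abs(decision_value(w, b, x)), i) for i, x in enumerate(pool)
    ]
    scored.sort()  # ascending distance to the hyperplane; ties broken by index
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy hyperplane: "goal" pushes towards class +1, "vote" towards class -1.
    w = {"goal": 1.5, "vote": -1.5}
    b = 0.0
    pool = [
        {"goal": 1.0},               # f = 1.5   (confident)
        {"goal": 0.5, "vote": 0.4},  # f ~ 0.15  (uncertain)
        {"vote": 1.0},               # f = -1.5  (confident)
        {"goal": 0.2, "vote": 0.4},  # f ~ -0.3  (uncertain)
    ]
    print(select_by_margin(w, b, pool, k=2))  # prints [1, 3]
```

In an active learning loop, the selected documents would then be labeled, added to the training set, and the SVM retrained before the next round. Favoring the smallest |f(x)| targets the documents whose labels are most likely to move the hyperplane, rather than those the classifier already handles confidently.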

Keywords

Text Mining, Support Vector Machines

Subject

Text Mining, Support Vector Machines

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

Conference

HIS 2004, December 2004

Cited by

Year 2010 : 1 citation

Zhang Xiang, Zhou Ming-quan, Geng Guo-hua, "Bagging the improvement in Chinese text categorization method", Mini-Micro Systems, no. 2, 2010.

Year 2009 : 2 citations

D Tuia, F Ratle, F Pacifici, MF Kanevski, WJ ?, "Active Learning Methods for Remote Sensing Image Classification", IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 7, pp. 2218-2232, 2009.

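Jair Cervantes Canales, "Clasificación de grandes conjuntos de datos vía Máquinas de Vectores Soporte y aplicaciones en sistemas biológicos" (Classification of Large Data Sets via Support Vector Machines and Applications in Biological Systems), Ph.D. thesis, Computer Science, Mexico.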

Year 2006 : 1 citation

Li Rongyan, Jin Xin, Wang Chunhui, Zheng Ning, Bie Rongfang, "A New Algorithm of Chinese Text Classification", Journal of Beijing Normal University (Natural Science), vol. 42, no. 5, pp. 501-505, 2006.