Margin-based Active Learning and Background Knowledge in Text Mining
Authors
Abstract
Text mining, also known as intelligent text analysis, text data mining or knowledge discovery in text, refers generally to the process of extracting interesting and non-trivial information and knowledge from text. One of the main problems with text mining and classification systems is the scarcity of labeled data and the cost of labeling unlabeled data. There is therefore growing interest in using unlabeled data to improve text classification performance. The ready availability of this kind of data in most applications makes it an appealing source of information.
In this work we evaluate the benefits of introducing unlabeled data in a support vector machine automatic text classifier. We further evaluate the possibility of active learning and propose a method for choosing the samples to be learned.
Keywords
Text Mining, Support Vector Machines
Subject
Text Mining, Support Vector Machines
Related Project
CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization
Conference
HIS 2004, December 2004
Cited by
Year 2010 : 1 citation
Zhang Xiang, Zhou Ming-quan, Geng Guo-hua, "Bagging the improvement in Chinese text categorization method", Mini-Micro Systems, No. 2, 2010.
Year 2009 : 2 citations
D Tuia, F Ratle, F Pacifici, MF Kanevski, WJ ?, "Active Learning Methods for Remote Sensing Image Classification", IEEE Transactions on Geoscience and Remote Sensing, vol. 47 (2), no. 7, pp. 2218-2232, 2009.
Jair Cervantes Canales, "Clasificación de grandes conjuntos de datos vía Máquinas de Vectores Soporte y aplicaciones en sistemas biológicos", PhD Thesis, Computer Science, Mexico.
Year 2006 : 1 citation
Li Rongyan, Jin Xin, Wang Chunhui, Zheng Ning, Bie Rongfang, "A New Algorithm of Chinese Text Classification", Journal of Beijing Normal University (Natural Science), 2006, Vol. 42, No. 5, pp. 501-505.