Labeled and Unlabeled Data in Text Categorization

Authors

Abstract

There is growing interest in using unlabeled data to improve classification performance in text categorization. The ready availability of this kind of data in most applications makes it an appealing source of information. This work reports a study carried out on the Reuters-21578 corpus to evaluate the performance of Support Vector Machines when unlabeled examples are introduced in the learning process. The improvement achieved, especially the reduction in false negatives and the resulting gain in recall, demonstrates that the use of unlabeled examples can be very important for small data sets.
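
The link the abstract draws between false negatives and recall follows from the definition recall = TP / (TP + FN): with the number of positive documents fixed, every false negative that is recovered becomes a true positive. A minimal sketch of that arithmetic is given below; the function name and the counts are hypothetical, chosen only for illustration, and are not results from the study.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the share of actual positives that were found."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return true_positives / actual_positives


# Hypothetical counts for one category with 100 positive test documents.
# These numbers are illustrative only and are NOT taken from the study.
baseline = recall(true_positives=60, false_negatives=40)
with_unlabeled = recall(true_positives=75, false_negatives=25)

print(f"baseline recall:       {baseline:.2f}")
print(f"with unlabeled recall: {with_unlabeled:.2f}")
```

In this made-up case, recovering 15 of the 40 missed documents raises recall from 0.60 to 0.75, while precision depends separately on the false positive count.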

Keywords

Text mining; Support Vector Machines

Subject

Text mining; Support Vector Machines

Related Project

CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization

Conference

IJCNN 2004, July 2004

Cited by

Year 2011 : 1 citation

Show-Jane Yen, Yue-Shi Lee, Jia-Ching Ying, Yu-Chieh Wu, "A
logistic regression-based smoothing method for Chinese text
categorization", Expert Systems with Applications, Volume 38, Issue 9,
pp. 11581–11590, September 2011

Year 2007 : 1 citation

Pan-Jun Kim and Jae-Yun Lee, "Utilizing Unlabeled Documents in Automatic Classification with Inter-document Similarities", Journal of the Korean Society for Information Management, pp. 251–271 (21 pages), 2007