Given a data set and a learning task such as classification, there are two main motives for reducing the data set. On the one hand, reduction may improve the performance of the learning algorithm. On the other hand, a smaller data set requires less storage space and less computing time. Our purpose is to determine the importance of several basic reduction techniques for Support Vector Machines by comparing their relative performance improvements on the standard REUTERS-21578 benchmark.
Related Project
CATCH - Inductive Inference for Large Scale Data Bases Text CATegorization
Conference
IJCNN 2003, July 2003
Cited by
Year 2011 : 4 citations
Ayral, H., Yavuz, S.
An automated domain specific stop word generation method for natural language text classification
(2011) INISTA 2011 - 2011 International Symposium on INnovations in Intelligent SysTems and Applications, art. no. 5946149, pp. 500-503.
Yao, Z., Ze-Wen, C.
Research on the construction and filter method of stop-word list in text preprocessing
(2011) Proceedings - 4th International Conference on Intelligent Computation Technology and Automation, ICICTA 2011, 1, art. no. 5750595, pp. 217-221.
Abdelmoneim, Dareen, "Semantic deontic modeling and text classification for supporting automated environmental compliance checking in construction", MSc Thesis, University of Illinois at Urbana-Champaign, USA, 2011
Agrawal, N., "Auto complete using graph mining: A different approach", Proceedings of IEEE Southeastcon Conference, 2011, pp. 268-271, 17-20 March, 2011, doi:10.1109/SECON.2011.5752947
Year 2010 : 3 citations
Sentiment text classification of customers reviews on the Web based on SVM, Huosong Xia; Min Tao; Yi Wang, Sixth International Conference on Natural Computation (ICNC), 2010, pp. 3633 - 3637.
Characteristic pattern discovery in videos, Mihir Jain, C. V. Jawahar, ICVGIP '10 Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing, 2010.
Ditch the Smileys: Customizing a Stopword List for Email-based Data, Dinesh Rathi, Michael B. Twidale, Canadian Association for Information Science Conference, 2010.
Year 2009 : 2 citations
Evaluation of stop word list in Mongolian, Bao, Y., Yang, G., Jin, W., Journal of Information and Computational Science 6 (3), pp. 1139-1145, 2009.
Prevention of Spyware by Runtime Classification of End User License Agreements, Muhammad Usman Rashid, MSc. Thesis, Blekinge Institute of Technology, Sweden, 2009.
Year 2008 : 1 citation
Automatic classifications of Malay proverbs using Naïve Bayesian Algorithm, Noah, S.A., Ismail, F., Information Technology Journal 7 (7), pp. 1016-1022, 2008.
Year 2007 : 2 citations
CINDI robot: An intelligent web crawler based on multi-level inspection, Chen, R., Desai, B.C., Zhou, C., Proceedings of the International Database Engineering and Applications Symposium, IDEAS, art. no. 4318093, pp. 93-101, 2007.
Dimensionality Reduction Techniques for Enhancing Automatic Text Categorization, Dina Adel Said, MSc. Thesis, Faculty of Engineering, Cairo University, 2007.
Year 2005 : 1 citation
Automatic selection of Chinese stoplist, Gu, Y.-J., Fan, X.-Z., Wang, J.-H., Wang, T., Huang, W.-J., Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology 25 (4), pp. 337-340, 2005.