CISUC

OHDOCLUS – Online and Hierarchical Document Clustering

Authors

Abstract

Clustering algorithms usually assume that a document collection is static and process it as a whole. However, in contexts where data is constantly being produced (e.g. the Web), systems that receive and process documents incrementally are becoming increasingly important. We propose OHDOCLUS, an online and hierarchical algorithm for document clustering. OHDOCLUS builds a tree of clusters in which each document is classified as soon as it is received. It is based on COBWEB and CLASSIT, two well-known data clustering algorithms that create hierarchies of probabilistic concepts and have seldom been applied to text data. An experimental evaluation was conducted with categorized corpora, and the preliminary results confirm the validity of the proposed method.

Keywords

Clustering, document clustering, online clustering, hierarchical clustering, dimensionality reduction

Subject

Document Clustering

Conference

Proceedings of the Eighth European Starting AI Researcher Symposium (STAIRS 2016), September 2016
