OHDOCLUS – Online and Hierarchical Document Clustering
Authors
Abstract
Clustering algorithms usually assume that document collections are static and process them as a whole. However, in contexts where data is constantly being produced (e.g. the Web), systems that receive and process documents incrementally are becoming increasingly important. We propose OHDOCLUS, an online and hierarchical algorithm for document clustering. OHDOCLUS builds a tree of clusters in which each document is classified as soon as it is received. It is based on COBWEB and CLASSIT, two well-known data clustering algorithms that create hierarchies of probabilistic concepts and have seldom been applied to text data. An experimental evaluation was conducted with categorized corpora, and the preliminary results support the validity of the proposed method.
Keywords
Clustering, document clustering, online clustering, hierarchical clustering, dimensionality reduction
Subject
Document Clustering
Conference
Proceedings of the Eighth European Starting AI Researcher Symposium (STAIRS 2016), September 2016