Towards the automatic enrichment of a thesaurus with information in dictionaries

Authors

Abstract

Since the information in broad-coverage knowledge bases, such as thesauri, is usually incomplete, merging information from different sources is one way to increase coverage. We propose a method for enriching a thesaurus with information acquired automatically from dictionaries. First, synonymy pairs are extracted. Then, these pairs are assigned to the most similar candidate synsets. Finally, the remaining pairs are clustered to identify new synsets. After selecting suitable experimental settings, we applied this method to enrich a Portuguese thesaurus with synonyms extracted from three dictionaries, which resulted in TRIP, a larger and broader thesaurus with new words and concepts. The steps towards the creation of this new thesaurus and its evaluation are described here.
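The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm the paper uses, so everything below is an assumption made for illustration: the function names `assign_pairs` and `cluster_remaining` are hypothetical, Jaccard overlap stands in for the similarity measure, the `threshold` value is arbitrary, and connected components stand in for the clustering step. Pair extraction from dictionaries is taken as already done.

```python
from collections import defaultdict


def assign_pairs(synsets, pairs, threshold=0.5):
    """Attach each synonymy pair to its most similar candidate synset.

    Assumption: similarity is the Jaccard overlap between the synset and
    the pair's words plus their known synonyms. The paper may differ.
    """
    # Synonymy neighbourhood of each word, built from the extracted pairs.
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    originals = [set(s) for s in synsets]
    enriched = [set(s) for s in synsets]
    remaining = []
    for a, b in pairs:
        context = neighbours[a] | neighbours[b] | {a, b}
        best_idx, best_sim = None, 0.0
        for idx, synset in enumerate(originals):
            # Candidate synsets already contain one word of the pair.
            if a not in synset and b not in synset:
                continue
            sim = len(context & synset) / len(context | synset)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx is not None and best_sim >= threshold:
            enriched[best_idx].update((a, b))
        else:
            remaining.append((a, b))
    return enriched, remaining


def cluster_remaining(pairs):
    """Group unassigned pairs into new synsets.

    Assumption: connected components of the synonymy graph, a deliberately
    simple stand-in for whatever clustering the paper applies.
    """
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen, clusters = set(), []
    for word in graph:
        if word in seen:
            continue
        component, stack = set(), [word]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

For instance, calling `assign_pairs` on a thesaurus holding the single synset {car, automobile}, the pairs (car, auto), (glad, happy) and (happy, joyful), and a threshold of 0.3 adds "auto" to the existing synset and leaves the other two pairs unassigned. `cluster_remaining` then merges those two pairs into one new synset.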

Keywords

thesaurus, enrichment, synonymy, words, ontologies

Subject

Natural Language Processing

Related Project

Onto.PT

Journal

Expert Systems, Vol. 30, #4, pp. 320-332, Jon G. Hall, May 2013

DOI

Cited by

Year 2017: 1 citation

Hetsevich, Y. and Reentovich, I. (2016). Linguistic analysis for the Belarusian corpus with the application of natural language processing and machine learning techniques. (Informatics), 56(4):64–69.