A wordnet is an important tool for developing natural language processing applications for a language. However, most wordnets are handcrafted by experts, which limits their growth. In this article, we propose ECO, an automatic approach for creating wordnets by exploiting textual resources. After extracting semantic relation instances, identified by discriminating textual patterns, ECO discovers synonymy clusters, which are used as synsets, and attaches the remaining relations to suitable synsets. Besides introducing each step of ECO, we report on how it was implemented to create Onto.PT, a public lexical ontology for Portuguese. Onto.PT results from the automatic exploitation of Portuguese dictionaries and thesauri, and it aims to minimise the main limitations of existing Portuguese lexical knowledge bases.
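The three steps named in the abstract (extract relation instances with textual patterns, cluster synonyms into synsets, attach the remaining relations to synsets) can be illustrated with a toy sketch. This is not the ECO implementation: the English regular expressions, the relation names, and the use of connected components as the clustering method are all assumptions made for illustration, and the sketch places each word in exactly one synset, so it ignores word-sense ambiguity.

```python
import re
from collections import defaultdict

# Hypothetical definition patterns (assumption: the abstract does not list the real ones).
PATTERNS = [
    (re.compile(r"^(?:the )?same as (\w+)$"), "synonym_of"),
    (re.compile(r"^a (?:kind|type) of (\w+)$"), "hyponym_of"),
    (re.compile(r"^part of (?:a |an |the )?(\w+)$"), "part_of"),
]


def extract_relations(definitions):
    """Step 1: turn (headword, definition) pairs into (word, relation, word) triples."""
    triples = []
    for headword, text in definitions:
        normalised = text.lower().strip()
        for pattern, relation in PATTERNS:
            match = pattern.match(normalised)
            if match:
                triples.append((headword, relation, match.group(1)))
    return triples


def discover_synsets(triples):
    """Step 2: group words linked by synonymy.

    Assumption: connected components (union-find) stand in for a real
    clustering algorithm. Words with no synonyms become singleton synsets.
    """
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for left, relation, right in triples:
        find(left)
        find(right)
        if relation == "synonym_of":
            parent[find(left)] = find(right)

    clusters = defaultdict(set)
    for word in list(parent):
        clusters[find(word)].add(word)
    return [frozenset(words) for words in clusters.values()]


def attach_relations(triples, synsets):
    """Step 3: lift the non-synonymy word triples to synset-level triples."""
    synset_of = {word: synset for synset in synsets for word in synset}
    return {
        (synset_of[left], relation, synset_of[right])
        for left, relation, right in triples
        if relation != "synonym_of"
    }


definitions = [
    ("car", "same as automobile"),
    ("auto", "same as car"),
    ("automobile", "a kind of vehicle"),
    ("wheel", "part of a car"),
]
triples = extract_relations(definitions)
synsets = discover_synsets(triples)
wordnet = attach_relations(triples, synsets)
```

In this toy input, "car", "auto" and "automobile" end up in one synset, and the hyponymy and part-of relations are re-expressed between synsets rather than between individual words.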
Keywords
wordnet, automatic, information extraction, Portuguese
Subject
Natural Language Processing
Related Project
Onto.PT
Journal
Language Resources and Evaluation Journal, Vol. 48, #2, pp. 373-393, June 2014
DOI
Cited by
Year 2020 : 4 citations
Fonseca, E. and Alvarenga, J. P. R. (2020). Wide and deep transformers applied to semantic relatedness and textual entailment. In Proceedings of the ASSIN 2 Shared Task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese, volume 2583 of CEUR Workshop Proceedings. CEUR-WS.org.
Branco, A., Grilo, S., Bolrinha, M., Saedi, C., Branco, R., Silva, J., Querido, A., de Carvalho, R., Gaudio, R., Avelãs, M., and Pinto, C. (2020). The mwn.pt wordnet for Portuguese: Projection, validation, cross-lingual alignment and distribution. In Proceedings of The 12th Language Resources and Evaluation Conference, pages 4861–4868, Marseille, France. ELRA.
de Sousa, A. G., Ribeiro, D. D. S., de Sousa, R. C. C., Rodrigues, A. M. B., Furtado, P. H. T., Barbosa, S. D. J., and Lopes, H. (2020). Using a domain ontology to bridge the gap between user intention and expression in natural language queries. In Proceedings of the 22nd International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 751–758. INSTICC, SciTePress.
Salgado, A., Ahmadi, S., Simões, A., McCrae, J. P., and Costa, R. (2020). Challenges of word sense alignment: Portuguese language resources. In the 7th Workshop on Linked Data in Linguistics: Building tools and infrastructure at the 12th International Conference on Language Resources and Evaluation (LREC). National University of Ireland Galway.
Year 2019 : 5 citations
Ercan, G. and Haziyev, F. (2019). Synset expansion on translation graph for automatic wordnet construction. Information Processing & Management, 56(1):130–150.
Ustalov, D., Panchenko, A., Biemann, C., and Ponzetto, S. P. (online since June 2019). Watset: Local-Global graph clustering with applications in sense and frame induction. Computational Linguistics.
Shen, J., Lyu, R., Ren, X., Vanni, M., Sadler, B., and Han, J. (2019). Mining entity synonyms with efficient neural set generation. In Proceedings of 33rd AAAI Conference on Artificial Intelligence. AAAI Press.
da Costa, J. A. F. (2019). AutoSpeech: Automatic speech analysis of verbal fluency for older adults. Master’s thesis, Universidade do Porto.
Rai, S., Jain, A., and Pandey, P. (2019). Inclusion of Wikipedia, a language specific knowledge resource to generate and update a synset in wordnet. International Journal of Technology, Policy and Management, 19(4).
Year 2018 : 7 citations
Ustalov, D., Panchenko, A., Biemann, C., and Ponzetto, S. P. (2018). Local-Global Graph Clustering with Applications in Sense and Frame Induction. ArXiv e-prints.
de Paiva, V., Rademaker, A., Real, L., Chalub, F., and de Melo, G. (2018). Openwordnet-pt: Taking stock. In Proceedings of the 5th Natural Language in Computer Science (NLCS), Oxford, UK.
Lima, T., Collovini, S., Leal, A. L., Fonseca, E., Han, X., Huang, S., and Vieira, R. (2018). Analysing semantic resources for coreference resolution. In Computational Processing of the Portuguese Language - 13th International Conference, PROPOR 2018, Canela, Brazil, September 24-26, 2018, Proceedings, volume 11122 of LNCS, pages 284–293. Springer.
Alexeyevsky, D. (2018). Word sense disambiguation features for taxonomy extraction. Computación y Sistemas, 22(3).
Antonio, N., de Almeida, A., Nunes, L., Batista, F., and Ribeiro, R. (2018). Hotel online reviews: different languages, different opinions. Information Technology & Tourism, online since March 2018.
Simões, A. and Guinovart, X. G. (2018). Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation. In 7th Symposium on Languages, Applications and Technologies (SLATE 2018), volume 62 of OASIcs, pages 14:1–14:13, Dagstuhl, Germany. Schloss Dagstuhl.
Fonseca, E. B. (2018). Resolução de Correferência Nominal Usando Semântica em Língua Portuguesa. PhD thesis, Pontifícia Universidade Católica do Rio Grande do Sul.
Year 2017 : 6 citations
Ustalov, D. (2017). Concept discovery from synonymy graphs. Journal of Computational Technologies, 22:99–112.
Ustalov, D. and Sozykin, A. (2017). A software system for automatic construction of a semantic word network. Computational Mathematics and Informatics, 6(2).
Chernobay, Y. (2017). Building wordnet for Russian language from RU.Wiktionary. In Proceedings of Conference on Artificial Intelligence and Natural Language (AINL 2017), volume 789 of CCIS, pages 113–120. Springer, Cham.
Rodrigues, A. X. C. (2017). Validação de termos de domínio por meio de uma base lexical-semântica difusa. TradTerm – Revista do Centro Interdepartamental de Tradução e Terminologia, 30:71–86.
de Almeida, W. R. (2017). PortService-BR: Uma plataforma para processamento de linguagem natural para língua portuguesa. Master’s thesis, Universidade Federal do Espírito Santo, Vitória.
Ustalov, D., Chernoskutov, M., Biemann, C., and Panchenko, A. (2017). Fighting with the sparsity of synonymy dictionaries for automatic synset induction. In Proceedings of the 6th Conference on Analysis of Images, Social Networks, and Texts (AIST’2017). Springer.
Year 2016 : 7 citations
Ustalov, D. A. (2016). Joining dictionaries and word embeddings for ontology induction. In Proceedings of the Institute for System Programming (ISP RAS), volume 28, pages 197–206.
Fonseca, E., Vieira, R., and Vanin, A. (2016). CORP: Coreference resolution for Portuguese. In Demo session of PROPOR 2016 – 12th International Conference on the Processing of the Portuguese Language.
Simões, A., Guinovart, X. G., and Almeida, J. J. (2016). Enriching a Portuguese wordnet using synonyms from a monolingual dictionary. In Proceedings of 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. ELRA.
Fonseca, E., Vieira, R., and Vanin, A. (2016). Adapting an entity centric model for Portuguese coreference resolution. In Proceedings of 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. ELRA.
Fonseca, E., Vieira, R., and Vanin, A. (2016). Improving coreference resolution with semantic knowledge. In Proceedings of 12th International Conference on Computational Processing of the Portuguese Language (PROPOR 2016), volume 9727 of LNAI, pages 213–224, Tomar, Portugal. Springer.
Reis, S. and Baptista, J. (2016). Let's play with proverbs? NLP tools and resources for ICALL applications around proverbs for pf. In Proceedings of the International Congress on Interdisciplinarity in Social and Human Sciences, pages 435–454. Research Centre for Spatial and Organizational Dynamics, University of Algarve.
Vieira, R., do Amaral, D., Collovini, S., Fonseca, E., Freitas, A., Freitas, L., Granada, R., Hilgert, L., Lopes, L., Schmidt, D., Severo, B., Souza, M., and Trojahn, C. (2016). Language resources for information extraction and semantic computing - NLP at PUCRS. In Proceedings Corpora and Tools for Processing Corpora at PROPOR 2016, pages 17–25.
Year 2015 : 8 citations
Wilkens, R., Zilio, L., Ferreira, E., Gonçalves, G., and Villavicencio, A. (2015). Tesauros distribucionais para o português: avaliação de metodologias. In Proceedings of Symposium in Information and Human Language Technology, STIL 2015, pages 131–140, Natal, RN, Brazil.
Mendonça, V., Coheur, L., and Sardinha, A. (2015). Vithea-kids: A platform for improving language skills of children with autism spectrum disorder. In Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, ASSETS ’15, pages 345–346, New York, NY, USA. ACM Press.
Gonçalves, G. C. (2015). Construção e avaliação de modelos semânticos distribucionais. BSc thesis, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brasil.
Simões, A. and Almeida, J. J. (2015). Experiments on enlarging a lexical ontology. In Languages, Applications and Technologies – Revised Selected Papers of 4th International Symposium SLATE 2015, Madrid, Spain, June, CCIS, pages 49–56. Springer.
Fonseca, E. B., Vieira, R., and Vanin, A. A. (2015). Dealing with imbalanced datasets for coreference resolution. In Proceedings of the 28th International Florida Artificial Intelligence Research Society Conference (FLAIRS), pages 169–174. AAAI.
Lagutina, N., Paramonov, I., Vorontsova, I., and Kasatkina, N. (2015). An approach to automated thesaurus construction using clusterization-based dictionary analysis. In Proceedings of the 17th Conference of FRUCT Association, pages 104–109, Yaroslavl, Russia. ITMO University.
Mendonça, V. (2015). Project: Extending VITHEA in order to improve children’s linguistic skills. Master’s thesis, Instituto Superior Técnico.
Jain, A., Tayal, D. K., and Rai, S. (May 2015). Shrinking digital gap through automatic generation of WordNet for Indian languages. AI & SOCIETY, 30(2):215–222. Springer.
Year 2014 : 2 citations
Brett Drury, Paula C. F. Cardoso, Janie M. Thomas, Alneu de Andrade Lopes (2014). Lexical Resources for the Identification of Causative Relations in Portuguese Texts. In Proceedings of the 1st Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish (ToRPorEsp), pages 56–63, São Carlos, SP, Brasil.
Xavier Gómez Guinovart and Miguel Anxo Solla Portela (2014). O dicionario de sinónimos como recurso para a expansión de wordnet. Linguamática, 6(2):69–74.