ASAPP 2.0: Advancing the State-of-the-Art of Semantic Textual Similarity for Portuguese

Authors

Ana Cristina da Costa Oliveira Alves
Hugo Gonçalo Oliveira
Ricardo Rodrigues
Rui Alberto Cardoso da Encarnação

Abstract

Semantic Textual Similarity (STS) is a natural language processing task that aims to compute how similar the meanings conveyed by two sentences are.
For English, this topic has received considerable research attention, especially since STS was included in the SemEval evaluations. For other languages, however, it has been much less studied, mostly due to the lack of benchmarks. In 2016, the ASSIN shared task targeted STS in Portuguese and released training and test collections.

This paper describes the development of ASAPP, a system that participated in ASSIN and has since been improved to the point where it now achieves the best results reported for this task.
ASAPP learns an STS function from a broad range of lexical, syntactic, semantic and, now, also distributional features. Here, we describe the features used in the current version of ASAPP, the performance of some of them when used alone, and how they are exploited in a regression algorithm to achieve the best results for ASSIN to date, in both the European and Brazilian Portuguese variants.
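To make "learning an STS function" from pairwise features a little more concrete, the following is a minimal sketch under loudly stated assumptions. It uses only three toy lexical features (token Jaccard overlap, character-trigram Jaccard overlap and a length ratio) and plain ridge regression. The feature set, the helper names (`features`, `fit_ridge`, `predict`, etc.) and the training pairs with their scores are all hypothetical and invented for illustration; they are not ASAPP's actual features, regression algorithm or the ASSIN data.

```python
"""Toy sketch: learn an STS scoring function by regression over pairwise features.

ASSUMPTION: features, helper names and training data are illustrative only;
they are not ASAPP's feature set, its regressor, or ASSIN data.
"""
from typing import List, Sequence, Tuple


def tokens(sentence: str) -> List[str]:
    # Naive lowercase whitespace tokenisation (a real system would use a
    # proper Portuguese tokeniser, lemmatiser, parser, etc.).
    return sentence.lower().split()


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    # Set overlap divided by set union; 0.0 when both are empty.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def char_ngrams(sentence: str, n: int = 3) -> List[str]:
    s = sentence.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def features(s1: str, s2: str) -> List[float]:
    t1, t2 = tokens(s1), tokens(s2)
    length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2), 1)
    # The leading 1.0 acts as the bias (intercept) term.
    return [1.0, jaccard(t1, t2), jaccard(char_ngrams(s1), char_ngrams(s2)), length_ratio]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gauss-Jordan elimination with partial pivoting for a small square system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(x: List[List[float]], y: List[float], lam: float = 0.1) -> List[float]:
    # Solve (X^T X + lam * I) w = X^T y; for simplicity the bias is penalised too.
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(w: Sequence[float], s1: str, s2: str) -> float:
    # Linear model: weighted sum of the pair's feature vector.
    return sum(wi * fi for wi, fi in zip(w, features(s1, s2)))


# Invented toy pairs and similarity scores (NOT ASSIN data), only to exercise the code.
TRAIN: List[Tuple[str, str, float]] = [
    ("o gato dorme no sofá", "o gato dorme no sofá", 5.0),
    ("o gato dorme no sofá", "um gato está a dormir no sofá", 4.0),
    ("o governo aprovou a lei", "a lei foi aprovada pelo governo", 3.5),
    ("o gato dorme no sofá", "o governo aprovou a lei", 1.0),
    ("choveu muito em lisboa", "o benfica ganhou o jogo", 1.0),
]

weights = fit_ridge([features(a, b) for a, b, _ in TRAIN], [y for _, _, y in TRAIN])
print(round(predict(weights, "o gato dorme no sofá", "um gato dorme no sofá"), 2))
```

The design choice illustrated here is the general supervised STS recipe: map each sentence pair to a fixed-length feature vector, then fit a regressor against gold similarity scores. Richer feature families (syntactic, semantic-relation and distributional ones, as the abstract mentions) would simply extend the vector returned by `features`.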

Keywords

natural language processing, semantic textual similarity, semantic relation, supervised machine learning

Conference

7th Symposium on Languages, Applications and Technologies (SLATE'18), June 2018


Cited by

Year 2020: 1 citation

Rodrigues, R. C., Rodrigues, J., de Castro, P. V. Q., da Silva, N. F. F., and da Silva Soares, A. (2020). Portuguese language models and word embeddings: Evaluating on semantic similarity tasks. In Computational Processing of the Portuguese Language - 14th International Conference, PROPOR 2020, Évora, Portugal, March 2-4, 2020, Proceedings, volume 12037 of LNCS, pages 239–248. Springer.

Year 2018: 1 citation

Souza, M. and Sanches, L. M. P. (2018). Detecção de Paráfrases na Língua Portuguesa usando Sentence Embeddings [Paraphrase Detection in Portuguese Using Sentence Embeddings]. Linguamática, 10(2):31–44.