CISUC

DEI/CISUC Seminars

Publication Date: 2020-03-05 10:40:05



March 05, Thursday,
13h (sharp),
Room A.5.4 DEI-FCTUC

Invited Speaker: Alexandre Rademaker

Title: Challenges for Information Extraction from Text in the Industry

Abstract: Increasingly, governments, corporations, and scientific organizations need to extract complex information from highly technical documents expressed in natural languages with a specialized lexicon, non-standard syntax, and domain-specific semantic interpretations. While linguistic resources exist in some specialized domains, they are mostly unavailable in technical fields such as legal or Oil & Gas. Furthermore, developing sufficient corpora for these domains can be expensive. In this presentation, I will describe our experiments in information extraction in technical domains. Our pipeline includes deep parsing, word sense disambiguation using an expanded wordnet, and ontologies; it combines statistical and rule-based methods. I will also comment on our earlier use of dependency parsers based on Universal Dependencies (UD) and on human annotation of entities and relations. I will conclude with future work and possible ideas for improving the results.

Short-bio: Alexandre is a Research Staff Member in the Natural Resources Solutions group at IBM Research and adjunct professor at the Applied Mathematics School of Getulio Vargas Foundation (EMAp/FGV). Alexandre has taught many graduate and undergraduate courses: logic, data structures, programming, discrete mathematics, type theory, formal languages, and automata theory. Alexandre holds a Ph.D. (2010) in Computer Science from Pontifical Catholic University of Rio de Janeiro (PUC-Rio). During his Ph.D., Alexandre was an international fellow at Microsoft Research and SRI International. At MSR, in 2008, he worked with the Z3 SMT Solver team (Leonardo de Moura and Nikolaj Bjørner), developing a distributed environment for testing and optimizing Z3. At SRI International, in 2009, he worked under the supervision of Natarajan Shankar on several research projects, including the preliminary formalization of ALC deduction systems in PVS. Alexandre has participated in several research projects, such as MIST (using natural language processing and description logics for knowledge modeling), ANUBIS (database consistency checking), and Ontology and Context (investigating ontology alignment). In his thesis, he proposed new deduction systems for description logics; it was published by Springer in 2012 in the Springer Briefs series under the title A Proof Theory for Description Logics. Alexandre is the author or co-author of more than 90 papers published in peer-reviewed journals and international conferences. His areas of expertise and interest are logic, proof theory, knowledge representation and reasoning, type theory, lexical resources, and computational linguistics. Alexandre has served on the program committees of many conferences, such as ACL, COLING, and LREC. He is also a board member of the Global WordNet Association and coordinated the CE-PLN from 2017 to 2019.

See http://arademaker.github.io/about.html