Bases de Dados de Texto: a Gestão de Documentos Armazenados numa Base de Dados Relacional (Text Databases: Managing Documents Stored in a Relational Database)
Authors
Abstract
This dissertation focuses on text databases, which have grown in interest and relevance in recent years. Text databases are now particularly relevant in several contexts and application areas, such as the administration of organizational documents, digital libraries, the creation of automated dictionaries and encyclopedias, and any problem involving the storage and retrieval of textual information. These databases manage textual information efficiently by combining relational database technology with the indexing and searching techniques of information retrieval. However, integrating information retrieval techniques into database systems has not been easy, and several approaches to this problem have emerged. This work presents a survey of text databases and discusses the fundamental theoretical concepts of databases and information retrieval, as well as the main approaches for integrating database systems and information retrieval systems.
The ExpSRI system, developed specifically for the study carried out in this dissertation, allows the manipulation of textual information stored in a large-capacity text database. This database is managed by a Database Management System (DBMS) that also integrates textual information retrieval techniques. The experimental setup built around the ExpSRI system holds a large test collection (approximately 473,000 documents) and has been used as a test bench. Since the aim of an information retrieval system is to find the textual information required by its users, it is important to assess the effectiveness of these systems.
The effectiveness of ExpSRI was evaluated experimentally using measures such as precision, recall, precision versus recall, and R-Precision. Analysis of the results obtained during the ExpSRI evaluation made it possible to identify the main factors that can influence the effectiveness of information retrieval systems.
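As an illustration of the measures named above, the following is a minimal sketch using their standard textbook definitions. It is not the ExpSRI evaluation code: the function names, the document identifiers, and the toy ranking are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: standard definitions of precision, recall,
# precision-versus-recall points and R-Precision for a single query.
# All names and data here are hypothetical, not taken from ExpSRI.

def precision_recall(ranked, relevant):
    """Precision and recall of the whole retrieved list."""
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def r_precision(ranked, relevant):
    """Precision after R documents are retrieved, R = number of relevant docs."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def precision_vs_recall(ranked, relevant):
    """(recall, precision) pairs at each rank where a relevant doc appears."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


# Toy query: five retrieved documents, three relevant ones in the collection.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

precision, recall = precision_recall(ranked, relevant)
rprec = r_precision(ranked, relevant)
curve = precision_vs_recall(ranked, relevant)
```

In this toy case two of the five retrieved documents are relevant, so precision is 2/5 and recall is 2/3, while R-Precision looks only at the top three ranks and finds a single relevant document.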