CISUC

Bases de Dados de Texto: a Gestão de Documentos Armazenados numa Base de Dados Relacional (Text Databases: the Management of Documents Stored in a Relational Database)

Authors

Abstract

This dissertation focuses on text databases, a topic that has grown in interest and relevance in recent years. Text databases are now relevant in many contexts and application areas, such as the management of organizational documents, digital libraries, the creation of automated dictionaries and encyclopedias, and, more generally, any problem involving the storage and retrieval of textual information. These databases manage textual information efficiently by combining relational database technology with the indexing and search techniques of information retrieval. However, integrating information retrieval techniques into database systems has not been straightforward, and several approaches to this problem have emerged.
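To make the idea of combining relational storage with information retrieval indexing more concrete, the following is a minimal sketch, assuming a toy schema with two hypothetical tables, docs and postings. It is not the approach used by ExpSRI or by Oracle ConText: it simply stores an inverted index (term, document, term frequency) as ordinary relational rows, so that a keyword query becomes a SQL GROUP BY over the postings table. The functions tokenize, build_index and search_all_terms are illustrative names, and Python's built-in sqlite3 module stands in for a full DBMS.

```python
import re
import sqlite3

# Minimal sketch (hypothetical schema): an inverted index stored as relational
# rows, so that keyword search becomes an ordinary SQL query. Illustration
# only, not the indexing scheme of ExpSRI or Oracle ConText.


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"\w+", text.lower())


def build_index(conn: sqlite3.Connection, docs: dict[int, str]) -> None:
    """Store each document and one postings row per (term, document) pair."""
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "CREATE TABLE postings ("
        "term TEXT, doc_id INTEGER REFERENCES docs(id), tf INTEGER, "
        "PRIMARY KEY (term, doc_id))"
    )
    for doc_id, body in docs.items():
        conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, body))
        counts: dict[str, int] = {}
        for term in tokenize(body):
            counts[term] = counts.get(term, 0) + 1  # term frequency in this doc
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in counts.items()],
        )
    conn.commit()


def search_all_terms(conn: sqlite3.Connection, query: str) -> list[int]:
    """Boolean AND search: ids of documents containing every query term,
    ordered by the summed frequency of the query terms (ties broken by id)."""
    terms = sorted(set(tokenize(query)))
    if not terms:
        return []
    placeholders = ", ".join("?" for _ in terms)
    rows = conn.execute(
        f"SELECT doc_id FROM postings WHERE term IN ({placeholders}) "
        "GROUP BY doc_id HAVING COUNT(*) = ? "
        "ORDER BY SUM(tf) DESC, doc_id",
        (*terms, len(terms)),
    ).fetchall()
    return [doc_id for (doc_id,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, {
        1: "Relational databases store structured data.",
        2: "Text databases combine relational storage with retrieval of text.",
        3: "Information retrieval ranks text documents.",
    })
    print(search_all_terms(conn, "text retrieval"))  # [2, 3]
```

Here document 2 ranks first because "text" occurs twice in it. Text-enabled DBMSs typically add stemming, stop-word lists and relevance scoring such as tf-idf on top of this basic structure.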
This work presents a survey of text databases, discussing the fundamental theoretical concepts of databases and information retrieval as well as the main approaches to integrating database systems with information retrieval systems.
The ExpSRI system, developed specifically for the study carried out in this dissertation, supports the manipulation of textual information stored in a large-capacity text database. This database is managed by a Database Management System (DBMS) that also integrates text retrieval techniques. The experimental setup built around ExpSRI holds a large test collection (approximately 473,000 documents) and has been used as a test bench. Since the aim of an information retrieval system is to find the textual information its users require, it is important to assess the effectiveness of such systems.
The effectiveness of ExpSRI was evaluated experimentally using measures such as precision, recall, precision versus recall, and R-Precision. Analysis of the results of this evaluation made it possible to identify the main factors that can influence the effectiveness of information retrieval systems.
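The measures named above have standard textbook definitions, sketched below for a single query. The sketch assumes that a query's result is a ranked list of document identifiers and that relevance judgements are available as a set; the function names are hypothetical, and the 11-point interpolated precision used for the precision-versus-recall curve is a common convention assumed here, since the abstract does not state the exact procedure used to evaluate ExpSRI.

```python
# Minimal sketch of standard effectiveness measures for one query, assuming a
# ranked result list and a set of documents judged relevant. The 11-point
# interpolation for precision versus recall is an assumed convention.


def precision_recall(ranked: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def r_precision(ranked: list[str], relevant: set[str]) -> float:
    """Precision after the first R results, where R = number of relevant docs."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def interpolated_precision(ranked: list[str], relevant: set[str],
                           levels: int = 11) -> list[float]:
    """Interpolated precision at evenly spaced recall levels 0.0 .. 1.0:
    P_interp(r) = max precision observed at any recall >= r."""
    if not relevant:
        return [0.0] * levels
    points = []  # (recall, precision) after each relevant document retrieved
    hits = 0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / k))
    curve = []
    for i in range(levels):
        level = i / (levels - 1)
        candidates = [p for rec, p in points if rec >= level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


if __name__ == "__main__":
    ranked = ["d3", "d7", "d1", "d9", "d4"]  # system output, best first
    relevant = {"d1", "d3", "d5"}            # relevance judgements
    print(precision_recall(ranked, relevant))  # precision 0.4, recall 2/3
    print(r_precision(ranked, relevant))       # 2 of the top 3 -> 2/3
    print(interpolated_precision(ranked, relevant))
```

In an evaluation over a test collection, these per-query values are typically averaged over all queries.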

Keywords

Text Databases, Information Retrieval, Oracle ConText

Subject

Information Retrieval and Text Databases

MSc Thesis

Bases de Dados de Texto: a Gestão de Documentos Armazenados numa Base de Dados Relacional, November 2002