Context-Based Retrieval in Software Development
Authors
Abstract
Although software development may include all the activities that result in a
software product, from its conception to its realization, here we focus on the
process of writing and maintaining the source code. Software development
projects have been increasing in size and complexity, requiring developers to
cope with a large amount of contextual information during their work. With
workspaces frequently comprising hundreds, or even thousands, of artifacts, they
spend a considerable amount of time navigating the source code or searching for
a specific source code artifact they need to work with. With the aim of helping
developers understand the source code structure and find what they need, modern
Integrated Development Environments (IDEs) provide several features for
searching and navigating the source code. However, according to some studies,
developers still spend a considerable amount of time searching and navigating
the source code structure.
With regard to search, the most commonly used approach is the matching of
specific patterns in the lines of code that comprise a software system,
requiring a direct correspondence between the pattern and the text in the source
code. Research in the field of Information Retrieval (IR) has addressed the
limitations of this approach, encouraging researchers to apply IR techniques
to help developers find relevant source code for their current task.
However, although context is argued to improve the effectiveness
of IR systems, as far as we know, none of the previous approaches have used
the contextual information of the developer to improve the retrieval or ranking
of relevant source code in the IDE.
Another way of delivering relevant source code artifacts to a developer is
through a recommender system. These systems have been used in a wide variety
of domains to help users find relevant information and deal with information
overload, and to provide personalized recommendations of very different kinds
of items. In software development, researchers have studied ways of using
contextual information to recommend source code artifacts that are potentially
relevant for the current task of the developer. However, these approaches usually
use a limited context model, or require contextual information to be explicitly
provided.
The research described in this thesis is focused on the development of a
context-based approach to search and recommendation of source code in the
IDE. The source code structure stored in the workspace of the developer is
represented in a knowledge base. A context model represents the source code
elements that are most relevant to the developer at a specific moment. These
structures are then used to improve the retrieval and ranking of source code
elements, such as classes, interfaces and methods, taking into account their
relevance to the current context of the developer. The relevance of the source
code elements retrieved is computed based on the structural and lexical
relations that exist between these elements and the elements in the context
model.
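
The relevance computation described above can be illustrated with a minimal
sketch. It is not the formulation used in this thesis: the element names, the
interest weights, the inverse-distance structural score, the Jaccard lexical
score and the alpha mixing parameter are all assumptions introduced only for
illustration. The sketch treats the knowledge base as a graph of structural
relations and the context model as a map from elements to interest values.

```python
import re
from collections import deque


def identifier_terms(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return {term.lower() for term in re.findall(pattern, name)}


def lexical_similarity(a, b):
    """Jaccard overlap of identifier terms (an assumed lexical measure)."""
    terms_a, terms_b = identifier_terms(a), identifier_terms(b)
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def build_graph(relations):
    """Build an undirected adjacency map from pairs of related elements."""
    graph = {}
    for a, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def structural_distance(graph, source, target):
    """Breadth-first shortest path length, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def relevance(candidate, context, graph, alpha=0.5):
    """Interest-weighted mix of structural and lexical proximity to the context.

    `context` maps element names to hypothetical interest values; `alpha`
    is a hypothetical weight balancing the two kinds of relations.
    """
    total_interest = sum(context.values())
    if total_interest == 0:
        return 0.0
    score = 0.0
    for element, interest in context.items():
        dist = structural_distance(graph, element, candidate)
        structural = 0.0 if dist is None else 1.0 / (1.0 + dist)
        lexical = lexical_similarity(element, candidate)
        score += interest * (alpha * structural + (1 - alpha) * lexical)
    return score / total_interest


def rank(candidates, context, graph):
    """Order candidate elements by decreasing relevance to the context."""
    return sorted(candidates, key=lambda c: relevance(c, context, graph),
                  reverse=True)


# Hypothetical workspace: structural relations between source code elements.
graph = build_graph([
    ("OrderService", "OrderRepository"),
    ("OrderService", "Order"),
    ("OrderRepository", "Order"),
    ("InvoicePrinter", "Order"),
])
# Hypothetical context model: elements the developer is working with.
context = {"OrderService": 1.0, "Order": 0.5}
ranking = rank(["LoggerUtil", "InvoicePrinter", "OrderRepository"],
               context, graph)
print(ranking)
```

In this sketch, an element that is both structurally close to the context
elements and shares identifier terms with them is ranked above an element
that is only structurally close, and an unrelated element is ranked last.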
We have implemented a prototype that integrates our approach in
the Eclipse IDE. This prototype was tested with a group of developers in
order to validate our approach. The statistical information collected shows that
the source code elements manipulated by the developer are highly related.
This supports our claim that the relations that exist between source code
artifacts can be used to measure the proximity between these artifacts, and to
compute their relevance in the current context of the developer. We have also
verified that the context components clearly contribute to improving the
ranking of search results: the search results selected by the developers were
better ranked with our approach in more than half of the cases. With
respect to recommendations, although the results are less conclusive, we have
shown that our context model can be used to retrieve relevant source code
elements for the developer, predicting the needed source code element
in more than half of the cases.