Saturday, May 3, 2008

Applying a Semantic Layer in a Source Code Retrieval Tool

Today, we publish on this blog the master's thesis of a RiSE member.

Here is the abstract of the work: Achieving a more efficient software engineering practice means overcoming a vast number of obstacles related to the intrinsic complexity of software systems and their surrounding contexts. Consequently, software systems often fail to meet the real needs they were developed to address, consume more resources (and therefore cost more), and take longer to complete than anticipated. Software reuse is often regarded as the most promising discipline for closing these gaps; however, its models and tools are still too immature for systematic adoption.

Developing practices, models, and tools is therefore a welcome way to boost reuse activity in most software development organizations. A lack of knowledge about reusable assets and the use of inappropriate tools are examples of reasons for low reuse activity. In this context, this work presents a semantic layer applied to a source code search tool, with the objective of returning results that are more relevant to the user's needs and, consequently, increasing the chance of reuse. Two new components are proposed to carry out the semantic activities, and the resulting semantic search engine is evaluated in a realistic environment configured to resemble projects from software organizations.

Outline: The main contribution of this work is the semantic layer applied to a keyword-based search engine to increase the precision of search results. The proposed solution uses a domain ontology to enrich queries with related terms and a machine learning technique to classify source code. The implementation constitutes a viable, practical solution for supporting source code reuse in an industrial setting. A real-world experiment showed increased precision in code searches, a finding that helps narrow the semantic gap between user needs and machine understanding.
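To make the query-enrichment idea more concrete, here is a minimal Python sketch, assuming a toy domain ontology represented as a dictionary that maps each concept to its related terms. The names `ONTOLOGY` and `expand_query`, the sample concepts, and the `depth` parameter are hypothetical illustrations, not the thesis's actual ontology or implementation, and the machine learning classification component is not covered here.

```python
from typing import Dict, List, Set

# Hypothetical toy domain ontology: each concept maps to related terms
# (synonyms or narrower concepts). The real ontology used in the thesis,
# its format, and its relation types are not described in this post.
ONTOLOGY: Dict[str, Set[str]] = {
    "persistence": {"database", "jdbc", "hibernate"},
    "database": {"sql", "jdbc"},
    "network": {"socket", "http"},
}


def expand_query(keywords: List[str],
                 ontology: Dict[str, Set[str]],
                 depth: int = 1) -> List[str]:
    """Return the original keywords followed by ontology-related terms.

    Relations are followed breadth-first for at most `depth` hops
    (a hypothetical knob). Terms are lowercased and deduplicated,
    keeping the order in which they are first reached.
    """
    result: List[str] = []
    seen: Set[str] = set()
    frontier = [k.lower() for k in keywords]
    for _ in range(depth + 1):
        next_frontier: List[str] = []
        for term in frontier:
            if term in seen:
                continue
            seen.add(term)
            result.append(term)
            # Sort related terms so the expansion order is deterministic.
            next_frontier.extend(sorted(ontology.get(term, set())))
        frontier = next_frontier
    return result


if __name__ == "__main__":
    print(expand_query(["Persistence"], ONTOLOGY))
    # ['persistence', 'database', 'hibernate', 'jdbc']
    print(expand_query(["Persistence"], ONTOLOGY, depth=2))
    # ['persistence', 'database', 'hibernate', 'jdbc', 'sql']
```

The expanded terms could then be handed to a keyword-based engine as alternative search terms, so that code using related vocabulary (for example, `jdbc` for a query about persistence) can still be found. Limiting the number of hops is one simple way to keep the enriched query from drifting too far from the user's original intent.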
You can get the full text here.
