Thursday, November 1, 2007

Enhancing Components Search in a Reuse Environment Using Discovered Knowledge Techniques

In this post I will present the main topics related to my master's thesis. This work is part of the B.A.R.T (Basic Asset Retrieval Tool) Project, whose main goal is to develop a robust tool to search for and retrieve software components. In this project we have experimented with new trends in search and retrieval, such as active search, folksonomy, and context. These efforts have yielded initial findings that encourage further research in this direction.
The main objective of my work is to optimize component search and retrieval by exploiting its usage history. This usage is often recorded by a logging mechanism, such as a file or a database, so it is possible to use this information to extract knowledge that improves the search engine. The optimization is achieved by recommending downloads to users. These recommendations come from the knowledge extracted from the aforementioned logs and are represented as association rules (A -> B). In this way, the work proposes a means of improving search engines by avoiding unnecessary searches.
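To make the idea concrete, here is a minimal sketch of mining association rules (A -> B) from download logs. The session data, component names, and thresholds are hypothetical, and only pairwise rules are mined; a full miner such as Apriori would also handle larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical download sessions reconstructed from the usage log:
# each set lists the components one user downloaded together.
sessions = [
    {"logging", "xml-parser"},
    {"logging", "xml-parser", "http-client"},
    {"logging", "http-client"},
    {"xml-parser", "http-client"},
    {"logging", "xml-parser"},
]

MIN_SUPPORT = 0.4      # fraction of sessions containing the pair
MIN_CONFIDENCE = 0.6   # estimate of P(B | A)

item_counts = Counter()
pair_counts = Counter()
for s in sessions:
    item_counts.update(s)
    pair_counts.update(combinations(sorted(s), 2))

n = len(sessions)
rules = []  # (antecedent, consequent, support, confidence)
for (a, b), count in pair_counts.items():
    support = count / n
    if support < MIN_SUPPORT:
        continue
    # A pair yields up to two rules: a -> b and b -> a.
    for ante, cons in ((a, b), (b, a)):
        confidence = count / item_counts[ante]
        if confidence >= MIN_CONFIDENCE:
            rules.append((ante, cons, support, confidence))

for ante, cons, sup, conf in sorted(rules, key=lambda r: -r[3]):
    print(f"{ante} -> {cons}  (support={sup:.2f}, confidence={conf:.2f})")
```

Each surviving rule can then be used at search time: when a user downloads the antecedent, the consequent is recommended directly, sparing an extra search.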
The presentation was done for the I.N. - Advanced Seminars in Software Reuse course at Cin/UFPE and several questions were pointed, like:

Why use association rules?
There are several techniques for extracting knowledge, but in this work the objective is to suggest or predict downloads. For this purpose, neither the sequence of events nor the classification of results is important; the key knowledge is the relation mapped by rules (A -> B).
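The answer above can be illustrated with a small sketch of the recommendation step: given a table of mined rules, suggest the consequents whose antecedents the user has already downloaded. The rule table, component names, and the `recommend` helper are hypothetical.

```python
# Hypothetical rule table (antecedent, consequent, confidence),
# as produced by an association-rule miner such as Apriori.
rules = [
    ("logging", "xml-parser", 0.75),
    ("xml-parser", "logging", 0.75),
    ("http-client", "logging", 0.67),
]

def recommend(downloaded, rules, top_n=3):
    """Suggest components whose rules fire on what the user already has."""
    scores = {}
    for ante, cons, conf in rules:
        if ante in downloaded and cons not in downloaded:
            # Keep the strongest confidence seen for each candidate.
            scores[cons] = max(scores.get(cons, 0.0), conf)
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [c for c, _ in ranked[:top_n]]

print(recommend({"http-client"}, rules))  # -> ['logging']
```

Note that the rules carry no ordering or class labels: only the co-occurrence relation A -> B is needed, which is why association rules fit this problem better than sequence mining or classification.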

How to conduct the experimentation?
This is the main challenge, because the data must be available in sufficient quantity. To address this, the current version of BART is already storing this data, which makes the experimentation possible.

The next step is to improve the prototype and begin the experimentation with real data to validate the extracted knowledge.


Ricardo Cavalcanti said...

It is a great work, surely.
IMO, you should hurry with your experimentation; then you could try different algorithms or tune the one you've already chosen. When you start tuning your data miner, you will be making a real contribution to the reuse field. Before that, your contributions are still only to search/retrieval and data mining.

Eduardo Almeida said...

Ricardo raised an important issue: the algorithm used in the process. Your analysis is based on some surveys that reported results about the "best" algorithm to use. However, I believe - though I am not sure - that we should experiment with different algorithms in the initial solution. In addition, I think that after the approach is tested in an experiment, others could try different algorithms in the tool. What do you think? Or, in your view, is APRIORI the recommended choice?