Monday, October 29, 2007

BTT - Towards a Bug Triage Tool

Today we had a seminar for the I.N.1.0.3.8 - Advanced Seminars in Software Reuse course at CIn/UFPE on current work and research in mining software repositories, especially bug and source repositories. For anyone not familiar with the challenges raised by these kinds of repositories, this text may be interesting.

The presentation covered current work and research on several challenges related to bug and source repositories: Impact Analysis, Automated Bug Report Assignment, Automated Bug Report Duplication Detection, Understanding and Predicting Software Evolution, Team Expertise Determination, Bug Report Time to Fix and Effort Estimation, and Social Networks.

For each of these challenges, the presenter showed how researchers have addressed the problem, e.g. which techniques have been used, and the results achieved by each approach. Issues with the techniques and test frameworks of some works were also mentioned. However, despite the number of challenges, the presenter focused on the Duplication Detection problem, which has few works about it and is the more crucial one for the bug triage process and its tools.

Only three works were presented for the Duplication Detection problem: the paper from Runeson, which attacked the problem using NLP techniques; the MSc dissertation from Hiew, which used a cluster-based approach with a cosine similarity measure; and the paper from Anvik, which presented some results using a statistical model and text similarity.
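To give an idea of what a cosine similarity measure between bug reports looks like, here is a minimal sketch in Python. It is not the method of any of the three works above: the plain term-frequency vectors, the function names, the sample reports and the 0.5 threshold are all assumptions made only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    # Dot product over the terms of the first text (missing terms count as 0).
    dot = sum(count * freq_b[term] for term, count in freq_a.items())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty report is similar to nothing
    return dot / (norm_a * norm_b)


def rank_duplicate_candidates(new_report, existing_reports, threshold=0.5):
    """Return (report_id, score) pairs scoring at least threshold, best first.

    The threshold value is an arbitrary assumption, not a recommended setting.
    """
    scored = [
        (report_id, cosine_similarity(new_report, text))
        for report_id, text in existing_reports.items()
    ]
    candidates = [pair for pair in scored if pair[1] >= threshold]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Hypothetical bug database: report id -> summary text.
existing = {
    101: "Editor crashes when opening large file",
    102: "Toolbar icons are misaligned",
}
candidates = rank_duplicate_candidates("Crash when opening a large file", existing)
print(candidates)
```

In this toy database only report 101 shares enough words with the new report to pass the threshold. A real approach would probably add stemming, stop-word removal and some term weighting before comparing the reports.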

Some considerations were also made: among the reviewed works, only one had been tested at industrial scale (Runeson's paper), while the others are academic prototypes only; more real test cases with a greater number of different software projects and bug databases are needed; most of the works depend on the project's process, which prevents generalizing the approach to other software projects; and the efficiency of the approaches varies from only 20% to 40%, which is a very low range.

Furthermore, the presenter left some open questions about developing a tool in the future: would it be better to develop a bug triage tool from scratch, to develop plug-ins for the most used tools, or to adapt some existing tool to create a new one? Which technique should be used: NLP, TDT, Text Mining techniques, or a mix of them?

Participants of the seminar also discussed how to handle meta-bugs (bugs that describe other bugs) in the approach, and what the reuse motivation is for the Duplication Detection problem. For the reuse motivation, we can argue for quality improvement, cost reduction and time saving, which is, in general, the idea of software reuse. As for meta-bugs, which are very common in open source development, we probably must treat them in the pre-processing of bug reports.

Download the presentation PDF here

5 comments:

Anonymous said...

I see your proposal as very interesting and useful, given the data shown in today's presentation about Eclipse and Firefox duplicates, which accounted for around 20-30% of the total amount, not to mention the amount of time and effort spent to discover and discard duplicates.
I'd say adapting the existing tools is the best choice, firstly to maintain the focus on the Duplication Detection problem/solution. Besides that, there are mainly 3 or 4 widely used bug tracking tools (Bugzilla, Trac, JIRA, MantisBT). Maybe one approach would be to define a common set of requirements, which probably all of them would fulfill, and build a generic plugin, adaptable to any bug tracking tool.

Eduardo Almeida said...

Your problem really is very nice. It combines experimentation, tools, an industrial problem and a road to develop useful solutions. Ricardo's comments are interesting and important. I think that this analysis will be very important. Moreover, it could be made available as a service working, for example, with the BART system indexing all the open repositories.

Vinicius Garcia said...

Yeah Yguaratã... you have a very nice problem to solve in your M.Sc. As we talked about after your presentation, I think that you can use some of the proposals/solutions discovered, researched, prototyped and developed by the RiSE's "search team".
Some algorithms were studied by Fred in his M.Sc.; talk to him about it.
And good luck!
Congratulations on your work.

Yguaratã C. Cavalcanti said...

Yeah. I'm beginning to see BART as a good option and opportunity too; it could add value to both tools/solutions.
I had a little lunch talk with Fred about some issues... we need to discuss more about it.

dimitri Malheiros said...

Besides duplicate detection, does the tool also deal with cases where there are no real bugs (misunderstanding of a business rule, wrong tests, etc.)?