About the presentation, among challenges related to bug and source repositories were presented works and current researches about Impact Analysis, Automated Bug Report Assignment, Automated Bug Report Duplication Detect, Understanding and Predicting Software Evolution, Team Expertise Determination, Bug Report Time to Fix and Effort Estimation, and Social networks.
For each of these challenges were presented how researches has been addressed the problem, e.g. what techniques has been used, and the achieved results for each approach. In addition, issues about techniques and test framework for some works were mentioned. However, despite the number of challenges, the presenter focuses on Duplication Detect problem, which has few works about and is more crucial in bug triage process and tools.
Among the presented works for Duplication Detect problem, that were only 3, we can cite the paper from Runeson which attacked the problem using NLP techniques, the MSc dissertation from Hiew using a cluster based approach with cosine similarity measurement, and the paper from Anvik that presented some resulted using statistical model and text similarity.
Some consideration also were made: among revised works, only one had been tested in an industrial scale (Runeson´s paper), the others are academic prototype only; more real test cases with a greater number of different software projects and bug databases are needed; the major part of the works are project process dependent, which blocks the approach generalization for other software projects; and the efficiency of the approach vary from 20% to 40% only, that is a very low range.
Furthermore, the presenter left some open questions about a tool development in the future: would be better to develop a bug triage tool from scratch, or to develop plug-ins for most used tools or adapt some existing tool to create a new one? What technique to use - NLP, TDT, Text Mining techniques or a mix of them?
Participants of the seminar also discussed about things like how to handle meta-bugs (bugs that describe others bugs) in the approach and what is the reuse motivation for the Duplication Detect problem. For reuse motivation we can argue for quality improvement, cost reduction and time saving, that, in general, is the idea of software reuse. And for meta-bugs, that are very common in open source development, probably we must threat this at pre-processing operations of bug reports.
Download the presentation PDF here