Sunday, July 27, 2008

Bug Report Scenarios

A big problem in software development today, in both the practical and the academic sense, is the Bug Duplication Problem [see the first discussion about it here]. RiSE Labs is working in this direction too, and I have been wondering about possible solutions to it. I would like to discuss it with you here. The discussion is based on scenarios and some possibilities.

General Case: Don’t mess up my Repository
In this general case, the idea is that a tool can help you avoid the mess. Maybe we [user and tool?] can do it together.

1st – Scenario – I cannot avoid it but I can help you with it (No intelligence on either side)

The idea here is that the user sends a bug report to the environment without checking previous bug reports, and the tool cannot do anything to prevent it. The tool just confirms the submission (Bug sent with success!). You can run some searches and look at the reports afterwards, and remove the new one [if it is a duplicate].

What can we do now? Maybe I can hire a person to check the new submissions every day, or the tool can check them in order to try to identify similar bugs and send a message to someone (the bug report author and the manager) in a configurable way (here we have some intelligence on the tool side). Yes, we have to define how to identify similar bugs with some confidence (otherwise we run into the false-positive problem) and automate it.
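As a rough illustration of "identifying similar bugs with some confidence", here is a minimal sketch that compares a new report against the existing ones using cosine similarity over word counts. The function names, the sample data and the 0.6 threshold are assumptions made for this example; a real tool would need a tuned threshold and probably a better text model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_possible_duplicates(new_report, existing_reports, threshold=0.6):
    """Return (report_id, score) pairs at or above the threshold, best first.

    The threshold is a hypothetical value: raising it reduces false
    positives but lets more real duplicates slip through.
    """
    scored = [
        (report_id, cosine_similarity(new_report, text))
        for report_id, text in existing_reports.items()
    ]
    matches = [(rid, score) for rid, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical repository contents, for illustration only.
existing = {
    101: "Application crashes when saving a file",
    102: "Login page shows wrong logo",
}
possible = find_possible_duplicates("Crash when saving a file", existing)
```

The list in `possible` could then be sent to the report author and the manager, which is the "configurable notification" step described above.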

2nd – Scenario – I can try to avoid it and maybe I can help you (Some intelligence on the tool side)

In this case the user sends a bug report, again without checking previous ones, and the tool shows the user a message that the report may be a duplicate. Yes, the tool can do some analysis, compute some data and say that. The user has the opportunity to see the possible duplicate and confirm it; in any case, the manager also receives a notification about this possibility.

3rd – Scenario – Yes, we can have some intelligence in the process.

In this case the user has some templates and a well-defined vocabulary for writing the bug report. With a restricted vocabulary we can perhaps be more precise, and the tool can be more accurate in its prediction. Thus, the user sends reports based on the templates, and the tool can do some analysis, better predict the possibility of duplication and send the notifications.
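To make the template idea concrete, here is a small sketch of a report template with a restricted vocabulary. The field names and the allowed values are invented for this example; the point is only that when the fields come from a fixed vocabulary, the tool can compare them exactly instead of guessing from free text.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies for two template fields.
COMPONENTS = {"login", "checkout", "search"}
SYMPTOMS = {"crash", "wrong-output", "slow", "ui-glitch"}


@dataclass(frozen=True)
class TemplatedReport:
    component: str
    symptom: str
    description: str  # free text is still allowed, but is not used for matching here

    def __post_init__(self):
        # Reject values outside the restricted vocabulary.
        if self.component not in COMPONENTS:
            raise ValueError(f"unknown component: {self.component!r}")
        if self.symptom not in SYMPTOMS:
            raise ValueError(f"unknown symptom: {self.symptom!r}")


def candidate_duplicates(new_report, existing_reports):
    """Return existing reports with the same component and symptom."""
    key = (new_report.component, new_report.symptom)
    return [r for r in existing_reports if (r.component, r.symptom) == key]


existing = [
    TemplatedReport("login", "crash", "App dies after typing the password"),
    TemplatedReport("search", "slow", "Results take 30 seconds"),
]
new = TemplatedReport("login", "crash", "Crash right after I log in")
candidates = candidate_duplicates(new, existing)
```

Matching on the structured fields only narrows the candidates; the free-text descriptions would still need to be compared before notifying anyone.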

4th – Scenario – Intelligent tool

Using a template or not, while the user is writing the bug report, the tool can perform some analysis and notify the user, with some confidence, that the report may be a duplicate, and show why. Yes, the tool can combine semantic and textual information and present it to the user.
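One simple way to "show why" is to return the terms that the draft shares with an existing report along with the score. The sketch below uses keyword overlap (Jaccard) as the confidence value; the stopword list and the function names are assumptions for this example, and it covers only the textual side, not the semantic one.

```python
import re

# A tiny, hypothetical stopword list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "the", "when", "i", "is", "it", "to", "on", "in", "of", "and"}


def keywords(text):
    """Return the set of lowercase word tokens that are not stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def explain_match(draft, existing_text):
    """Return (confidence, shared_terms) for a draft and an existing report.

    Confidence is the Jaccard overlap of the two keyword sets; the shared
    terms are the explanation that can be shown to the user.
    """
    a, b = keywords(draft), keywords(existing_text)
    shared = a & b
    union = a | b
    confidence = len(shared) / len(union) if union else 0.0
    return confidence, sorted(shared)


# Could be called each time the user pauses while typing the draft.
confidence, shared_terms = explain_match(
    "Crash when saving file",
    "Application crash on saving a large file",
)
```

Showing the shared terms lets the user judge quickly whether the suggested report really describes the same problem.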

What do you think about it? Can you present more scenarios? Another possible scenario is: my repository already has duplicate reports [I cannot say that it is a mess, or I could have problems with some companies]; what can a tool do to help me be more productive? That is, I am starting a bug report and my tool cannot support the previous scenarios.


Yguaratã C. Cavalcanti said...

Good post, Eduardo! I think we can identify another scenario: the user searches for past bug-reports before sending a new one. In this case, the user is aware of the duplication problem and tries to avoid it. The tool can also help with such searches.

Currently, searching and visualizing bug-reports in tools such as Bugzilla, Mantis, Trac etc. requires extra effort (mainly for end users).

Thiago said...

Comments and questions:
1) Yguarata, why do you say that searching for bugs in tools like Mantis, Trac, etc. requires extra effort? What extra effort?

2) Eduardo, I just had an idea for another scenario :P (I don't know if it is too crazy).
Maybe we could have a scenario where the application being developed is completely instrumented by the BTTool, and every time an exception occurs, the bug is automatically raised and its breaking point properly stored. So the next time the application breaks in the same place, we would automatically be able to see that this bug was already opened.

Does it already exist? What do you guys think?

It is more like a proactive approach of not letting people enter duplicated bugs than searching for already existing duplicated bugs (of course, this search for existing duplicates is necessary and helpful).

Eduardo Almeida said...

Thiago, can you explain your suggestion in more detail?
I do not know, but for me, the scenarios first presented can be summarized as:
- how to identify similar bugs. Perhaps we should analyze some repositories, look for patterns [if available], and work out how to identify them automatically and with which techniques or tools.

Thiago said...

Ok, let me try to detail this scenario better.

1) We are developing an application. Our BTT should be connected with our application server (just an example). Every time an exception is raised in this server, the BTT saves the time, the exception and the source code line, and automatically opens a bug for it. The next time this exception occurs, it will detect that a bug is already open for that exception in that file at that specific source code line.

So it is like automatic bug reporting without duplication.

2) But what if I try to manually open a bug request that is a duplicate? The BTT should be able to identify it (now following the ideas in your scenarios ... based on patterns, dates, files, exceptions, etc...)

I think number 2 goes in the same direction as your thoughts, but number 1 would come as an earlier, automatic scenario.
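Number 1 can be sketched as a small tracker that keys each bug on the exception type, the file and the source line. Everything here is hypothetical (the class name, the in-memory dictionary, the use of a SHA-1 hash as the key); it only shows how the same exception at the same line maps back to the bug that is already open.

```python
import hashlib
from datetime import datetime, timezone


class ExceptionBugTracker:
    """In-memory stand-in for a BTT that opens bugs from raised exceptions."""

    def __init__(self):
        self.bugs = {}  # fingerprint -> bug record

    @staticmethod
    def fingerprint(exception_type, file_name, line_number):
        """Build a stable key from the exception, the file and the line."""
        key = f"{exception_type}|{file_name}|{line_number}"
        return hashlib.sha1(key.encode("utf-8")).hexdigest()

    def report(self, exception_type, file_name, line_number):
        """Record an exception; return (bug_id, is_new)."""
        fp = self.fingerprint(exception_type, file_name, line_number)
        now = datetime.now(timezone.utc)
        bug = self.bugs.get(fp)
        is_new = bug is None
        if is_new:
            # First time this exception is seen at this location: open a bug.
            bug = {
                "id": len(self.bugs) + 1,
                "exception": exception_type,
                "file": file_name,
                "line": line_number,
                "first_seen": now,
                "occurrences": 0,
            }
            self.bugs[fp] = bug
        # Known or new, count the occurrence and remember when it happened.
        bug["occurrences"] += 1
        bug["last_seen"] = now
        return bug["id"], is_new


tracker = ExceptionBugTracker()
first = tracker.report("NullPointerException", "Order.java", 42)
second = tracker.report("NullPointerException", "Order.java", 42)
other = tracker.report("NullPointerException", "Order.java", 43)
```

One limitation of keying on the exact line number is that any edit that shifts the code produces a new fingerprint, so a real tool would probably need a more tolerant key.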

Yguaratã C. Cavalcanti said...

I understand your idea, Thiago. It is something like what Windows does when something goes wrong: it asks the user if he/she wants to submit a report with some execution information. There are other tools that do that, but I can only remember Windows right now. :D

This type of information is more systematic for detecting duplicates than reports with natural language descriptions. At the beginning of my research I identified the work of Podgurski, which addressed this type of report with execution information. And recently the work of Wang tried to improve the detection of duplicate bug-reports in natural language by mixing execution information with natural language processing.

Our focus is on bug-reports written in natural language. Currently we do not look for execution information in bug-reports. We must also consider that not every report carries execution information, but it is information we must use if provided. First we will need to extract such information automatically from bug-reports, which involves knowing the programming language in which the target software is developed.
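As an illustration of why the extraction depends on the programming language, here is a minimal sketch that pulls a Java-style stack trace out of a report's text with regular expressions. The patterns and the function name are assumptions for this example and cover only the common `at package.Class.method(File.java:line)` layout; other languages would need their own patterns.

```python
import re

# Assumed pattern for Java-style frames, e.g. "at com.example.Order.save(Order.java:42)".
JAVA_FRAME = re.compile(
    r"^\s*at\s+([\w.$<>]+)\(([\w$]+\.java):(\d+)\)\s*$", re.MULTILINE
)
# Assumed pattern for the exception line, e.g. "java.lang.NullPointerException: ...".
JAVA_EXCEPTION = re.compile(
    r"^\s*((?:[\w$]+\.)*[\w$]+(?:Exception|Error))\b", re.MULTILINE
)


def extract_execution_info(report_text):
    """Return the first exception name and all stack frames found in the text."""
    exception = JAVA_EXCEPTION.search(report_text)
    frames = [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in JAVA_FRAME.finditer(report_text)
    ]
    return {
        "exception": exception.group(1) if exception else None,
        "frames": frames,
    }


# Hypothetical report mixing natural language and a pasted stack trace.
report = """The app crashed when I clicked Save.
java.lang.NullPointerException: name is null
    at com.example.Order.save(Order.java:42)
    at com.example.Main.main(Main.java:10)
Please fix.
"""
info = extract_execution_info(report)
```

A report with no pasted trace simply yields `None` and an empty frame list, so the natural language analysis remains the fallback.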

And about your question: in Mantis and Bugzilla, for example, to perform specific searches you need to create filters, specifying the required fields. Our tool mixes such fields, enabling them to be searched without building filters. It is just a single point of improvement. ;) The tool also provides visualization facilities and extracts extra information from the text of bug-reports, such as bug ids, links, attachments etc.
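To give a feel for the two ideas, here is a small sketch, not the actual tool: a search that matches the query against all fields of a report at once, and a helper that pulls bug ids and links out of free text. The field names, the "bug #123" convention and the regular expressions are assumptions made for this example.

```python
import re

# Assumed conventions: ids written as "bug 123" or "bug #123", links as http(s) URLs.
BUG_ID = re.compile(r"\bbug\s*#?(\d+)\b", re.IGNORECASE)
LINK = re.compile(r"https?://[^\s<>\"')]+")


def search_reports(reports, query):
    """Return reports in which every query term appears in at least one field."""
    terms = query.lower().split()
    results = []
    for report in reports:
        # Mix all fields into one searchable string, so no filter is needed.
        haystack = " ".join(str(value) for value in report.values()).lower()
        if all(term in haystack for term in terms):
            results.append(report)
    return results


def extract_references(text):
    """Return the bug ids and links mentioned in a piece of report text."""
    return {
        "bug_ids": [int(n) for n in BUG_ID.findall(text)],
        "links": LINK.findall(text),
    }


# Hypothetical reports, for illustration only.
reports = [
    {"id": 1, "summary": "Crash on save", "component": "editor", "reporter": "ana"},
    {"id": 2, "summary": "Slow search", "component": "index", "reporter": "joao"},
]
found = search_reports(reports, "editor crash")
refs = extract_references("Same as bug #123, see http://example.org/issue?id=5 and Bug 77.")
```

Here the query "editor crash" matches the first report even though its two terms live in different fields, which is the kind of search that would otherwise need a filter per field.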