
Sunday, July 27, 2008

Bug Report Scenarios

A big problem in software development nowadays, both in practice and in academia, is the duplicate bug report problem [see the first discussion about it here]. RiSE Labs is also working in this direction, and I have been wondering about possible solutions. I would like to discuss them with you here. The discussion is based on scenarios and some possibilities.

General Case: Don’t mess up my Repository
In this general case, the idea is that a tool can help you avoid the mess. Maybe we [user and tool?] can do it together.

1st Scenario – I cannot avoid it, but I can help you with it (no intelligence on either side)

The idea here is that the user sends a bug report to the environment without checking previous reports, and the tool cannot do anything to prevent it. The tool just confirms the submission ("Bug sent with success!"). Afterwards, you can run some searches, review the reports, and remove a report [if it is a duplicate].


What can we do now? Maybe I can hire a person to check the new submissions every day, or the tool can check them, trying to identify similar bugs and sending a message to someone (the bug report author and the manager) in a configurable way (here we have some intelligence on the tool side). Yes, we have to define how to identify similar bugs with some confidence (otherwise we run into the false-positive problem) and automate it.
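The "daily check" idea above can be sketched in a few lines. This is a minimal illustration, not any real tool's behavior: it compares each new report against the repository using simple word overlap (Jaccard similarity) and flags likely duplicates for a person to review. The function names and the 0.5 threshold are assumptions for the sake of the example.

```python
def words(text):
    """Lowercase bag of words for a report summary."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word-overlap similarity between two summaries (0.0 to 1.0)."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def flag_possible_duplicates(new_report, repository, threshold=0.5):
    """Return stored reports similar enough to notify the author and manager."""
    return [old for old in repository if jaccard(new_report, old) >= threshold]

repository = [
    "crash when saving file to disk",
    "menu icons render incorrectly on linux",
]
suspects = flag_possible_duplicates("application crash when saving a file", repository)
# Only the first stored report shares enough words to be flagged.
```

A real tool would then e-mail the report author and the manager about each flagged pair; how aggressive the threshold should be is exactly the false-positive trade-off mentioned above.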


2nd Scenario – I can try to avoid it and maybe I can help you (some intelligence on the tool side)

In this case the user sends a bug report – in the same way, without checking previous ones – and the tool shows the user a message saying that it may be a duplicate bug report. Yes, the tool can run some analyses, compute some data, and say that. The user has the opportunity to inspect the possible duplicate and confirm the submission anyway; in any case, the manager also receives a notification about this possibility.


3rd Scenario – Yes, we can have some intelligence in the process

In this case the user has some templates and a well-defined vocabulary for writing the bug report. Maybe with a restricted vocabulary we can be more precise, and the tool can achieve more accuracy in its prediction. Thus, the user sends the report based on a template, and the tool can run some analyses, give a better estimate of the duplication possibility, and send the notifications.
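A minimal sketch of the template-plus-restricted-vocabulary idea: the template defines which fields a report must fill, and a controlled vocabulary constrains a key field so reports become easier to compare. The field names and the vocabulary here are invented purely for illustration.

```python
# Template fields every report must provide (illustrative assumption).
TEMPLATE_FIELDS = ("component", "action", "observed", "expected")
# Restricted vocabulary for the "component" field (illustrative assumption).
COMPONENT_VOCABULARY = {"editor", "compiler", "installer", "ui"}

def validate_report(report):
    """Return a list of template/vocabulary problems; an empty list means valid."""
    problems = []
    for field in TEMPLATE_FIELDS:
        if not report.get(field):
            problems.append("missing field: " + field)
    component = report.get("component")
    if component and component not in COMPONENT_VOCABULARY:
        problems.append("unknown component: " + component)
    return problems

good = {"component": "editor", "action": "save file",
        "observed": "crash", "expected": "file saved"}
bad = {"component": "kernel", "action": "save file"}
```

With reports constrained this way, the duplicate-detection step can compare field-by-field instead of over free text, which is where the extra accuracy in this scenario would come from.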


4th Scenario – Intelligent tool

Using a template or not, while the user is writing the bug report, the tool can perform some analysis, notify him with some confidence that the report may be a duplicate, and show why. Yes, the tool can combine semantic and textual information and present it to the user.
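The "while the user is writing" idea can be sketched as re-ranking stored reports after each keystroke, returning the closest matches with a confidence score and the shared words that explain *why* they match. Everything here is an illustrative assumption, not a description of any real bug tracker.

```python
def suggest_while_typing(draft, repository, top_k=3):
    """Rank stored reports by word overlap with the partial draft."""
    draft_words = set(draft.lower().split())
    scored = []
    for report in repository:
        report_words = set(report.lower().split())
        shared = draft_words & report_words
        if shared:
            confidence = len(shared) / len(draft_words | report_words)
            # Keep the shared words so the tool can show "why" it matched.
            scored.append((confidence, report, sorted(shared)))
    scored.sort(reverse=True)
    return scored[:top_k]

repository = [
    "editor crashes on save",
    "toolbar icons missing after update",
]
hits = suggest_while_typing("editor crashes", repository)
```

Calling this on every keystroke is cheap for a toy repository; a production tool would need an index to stay responsive, which is one reason real implementations lean on search-engine techniques.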

What do you think about it? Can you suggest more scenarios? Another possibility is a scenario like: my repository already has duplicate reports [I cannot say that it is a mess, or I could have problems with some companies] – what can a tool do to help me be more productive? That is, I am starting from an existing repository, so my tool cannot simply apply the previous scenarios.

Sunday, April 6, 2008

Proactive Bug Report Discovery


The developers behind PyGame have introduced a proactive way to capture bug discussions that have not been reported to the PyGame mailing list [yes, they use a mailing list to report bugs]. The motivation is the fact that users sometimes attribute PyGame's problems to the operating system it is running on, which leads them to report the problem to the OS's bug tracker, or simply to discuss PyGame's problems in personal blogs or elsewhere.

The method proposed in PyGame's blog is interesting and consists of searching specific places [sites, mailing lists, etc.] for bug reports related to PyGame. However, some questions about viability and scalability arise, such as: what is the cost of maintaining crawlers searching for bug discussions on the web? How do we decide whether a bug discussion is really relevant, or whether it really describes a bug?

For PyGame's developers, this solution is reasonable because they have specific web sites, mailing lists, and CR tracking systems where the searches must be performed. But for widely used software, such as the Firefox web browser, the technique might be very costly, or even impracticable.

Thursday, February 28, 2008

More on Change Request Duplication Problem

Are these two CRs equal?
In a previous post we had a little discussion about the Change Request (CR) duplication problem, showing how it can impact software development, and some works that have approached it. However, our curiosity led us to a much deeper investigation of the problem, to see its real dimension.

Thus, with this objective in mind, we performed a formal characterization study of the CR duplication problem to see how it impacts software development productivity. Furthermore, we selected several different projects -- including private and open source projects -- with different characteristics, in order to broaden our study as much as possible. Among these characteristics, we can cite software domain, team size and experience, software size and lifetime, the CR tracking system used, and so on. In addition, we performed some interviews with developers and people who deal with CR tracking systems.

The values and answers obtained for the metrics and questions we had defined confirmed the initial expectations, in most cases exceeding them, showing that the CR duplication problem is very critical for project productivity, evolution, and maintenance. In other words, many hours are lost in the task of identifying duplicate CRs that could be spent on other tasks. In addition, not only are hours lost, but the problem also makes it difficult to engage new people in CR-related tasks, because people need good knowledge of the past CRs in a repository in order to avoid the problem.

Moreover, we have not only performed this characterization study: given that the problem occurs at an industrial scale, some studies and works are being conducted to solve it. For example, an approach based on keyword-based search engines was applied to the problem. However, although the results were satisfactory, we think they can be improved much more by combining other techniques.
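The keyword-based search engine idea mentioned above can be sketched with an inverted index: each word points at the CRs containing it, and a query returns CRs ranked by how many query words they share. This is an illustrative sketch under that general idea, not the actual system used in the study.

```python
from collections import defaultdict

def build_index(change_requests):
    """Map each word to the set of CR ids whose text contains it."""
    index = defaultdict(set)
    for cr_id, text in change_requests.items():
        for word in text.lower().split():
            index[word].add(cr_id)
    return index

def search(index, query):
    """Rank CR ids by how many query words they contain (descending)."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for cr_id in index[word]:
            hits[cr_id] += 1
    return sorted(hits, key=lambda cr_id: -hits[cr_id])

crs = {
    1: "application freezes when opening large project",
    2: "typo in the installer welcome screen",
    3: "application freezes on startup",
}
index = build_index(crs)
ranked = search(index, "application freezes when opening")
```

When a new CR arrives, its own text is used as the query; the top-ranked hits are the duplicate candidates a triager would inspect first.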

Monday, October 29, 2007

BTT - Towards a Bug Triage Tool

Today we had a seminar for the I.N.1.0.3.8 - Advanced Seminars in Software Reuse course at CIn/UFPE about current work and research on mining software repositories, especially bug and source repositories. For those not familiar with the challenges raised by these kinds of repositories, this text can be interesting.

Regarding the presentation: among the challenges related to bug and source repositories, works and current research were presented on Impact Analysis, Automated Bug Report Assignment, Automated Duplicate Bug Report Detection, Understanding and Predicting Software Evolution, Team Expertise Determination, Bug Report Time-to-Fix and Effort Estimation, and Social Networks.

For each of these challenges, it was presented how research has addressed the problem, e.g. which techniques have been used and the results achieved by each approach. In addition, issues about the techniques and test frameworks of some works were mentioned. However, despite the number of challenges, the presenter focused on the duplicate detection problem, which has few works about it and is more crucial in the bug triage process and tools.

Among the works presented for the duplicate detection problem -- there were only three -- we can cite the paper by Runeson, which attacked the problem using NLP techniques; the MSc dissertation by Hiew, using a cluster-based approach with cosine similarity measurement; and the paper by Anvik, which presented some results using a statistical model and text similarity.
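To make the cosine similarity measurement mentioned above concrete, here is a simplified sketch in its spirit: represent each report as a term-frequency vector and compare vectors by the cosine of the angle between them. This is an illustration of the general technique, not a reimplementation of Runeson's, Hiew's, or Anvik's work.

```python
import math
from collections import Counter

def term_vector(text):
    """Term-frequency vector for a bug report."""
    return Counter(text.lower().split())

def cosine(v1, v2):
    """Cosine similarity between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(v1[t] * v2[t] for t in v1)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

a = term_vector("browser crashes loading page")
b = term_vector("browser crashes while loading a page")
c = term_vector("spelling mistake in menu")
sim_dup = cosine(a, b)    # high: likely duplicates
sim_other = cosine(a, c)  # zero: unrelated reports
```

The cited works go further (stemming, stop-word removal, TF-IDF weighting, clustering), but this pairwise score is the core building block they share.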

Some considerations were also made: among the reviewed works, only one had been tested at an industrial scale (Runeson's paper); the others are academic prototypes only. More realistic test cases, with a greater number of different software projects and bug databases, are needed. Most of the works depend on the project's process, which blocks generalization of the approach to other software projects. And the efficiency of the approaches varies from only 20% to 40%, which is a very low range.

Furthermore, the presenter left some open questions about future tool development: would it be better to develop a bug triage tool from scratch, to develop plug-ins for the most used tools, or to adapt an existing tool into a new one? Which technique should be used -- NLP, TDT, text mining, or a mix of them?

Participants of the seminar also discussed things like how to handle meta-bugs (bugs that describe other bugs) in the approach, and what the reuse motivation for the duplicate detection problem is. As the reuse motivation, we can argue for quality improvement, cost reduction, and time saving, which, in general, is the idea of software reuse. As for meta-bugs, which are very common in open source development, we probably must treat them in the pre-processing operations on bug reports.
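The pre-processing step suggested above might look like the sketch below: normalize report text (lowercase, strip punctuation, drop stop words) and set aside meta-bugs, here naively recognized by references to other bug ids. The "bug #N" pattern and the tiny stop-word list are assumptions for illustration only.

```python
import re

# Illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"the", "a", "an", "is", "on", "in", "when", "it"}

def preprocess(text):
    """Lowercase, remove punctuation and stop words; return a token list."""
    tokens = re.findall(r"[a-z0-9#]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def is_meta_bug(text):
    """Naive meta-bug check: the report talks about other bug reports by id."""
    return re.search(r"bug\s*#?\d+", text.lower()) is not None

tokens = preprocess("The editor crashes when saving a file.")
meta = is_meta_bug("Tracking bug for issues Bug #123 and bug 456")
```

Filtering meta-bugs out before similarity measurement keeps them from matching every report they mention, which is the problem the seminar discussion raised.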

Download the presentation PDF here

Tuesday, September 18, 2007

The bad side of Bug Repositories

In approximately the last eight years, bug repositories, especially in Open Source Software, have gained much more attention from researchers, considerably increasing the literature about them. These repositories are being analyzed from an information retrieval perspective for Software Engineering (see 1 and 2), in an attempt to improve and automate some processes related to them. Bug repositories are systems to collect bugs found by users and developers during software usage.

As some people have noticed, the majority of open source software, and proprietary software too, has organized its development processes around a bug repository system. This means that bug resolution, new features, and even improvements in the process are being dictated by bug reports. Here, by bug we mean a software defect, change request, feature request, or issue in general.

The task of analyzing reported bugs is called bug tracking or bug triage, where the word "bug" could reasonably be replaced by issue, ticket, change request, defect, problem, and many others. More interesting is to know that bug tracking tasks are done, in general, by developers, and precious time is taken by this. Among the many sub-tasks in bug triage, we can cite: analyzing whether a bug is valid; trying to reproduce it; dependency checking -- that is, verifying whether other bugs block this bug and vice-versa; verifying whether a similar bug has been reported -- duplicate detection; and assigning a reported bug to a developer.

Many other sub-tasks can be identified; however, to show the problem that bug triage can represent for the final quality of the software, we will concentrate our efforts on the duplicate bug detection task, which is currently done manually, like many others.

In a paper by Gail Murphy, entitled Coping with an open bug repository, we can see that almost 50% of the bugs reported during the development and improvement phases are invalid. That is, they are bugs that could not be reproduced (here we include the well-known "works for me" bugs), bugs that won't be resolved, duplicate bugs, bugs with low priority, and so on. And 20% of the reported bugs are duplicates, that is, bugs that were reported earlier.

Putting it in numbers, let's suppose that a project receives about 120 bug reports per day (in some projects this average is much bigger), and that a developer spends about 5 minutes analyzing one bug. Doing simple arithmetic, we see that 10 hours per day, or 10 person-hours, are spent only on this task (bug tracking), and about 5 hours are wasted only on bugs that do not improve the software quality. For duplicate bugs alone, we have 2 wasted hours. Now calculate it for a month, for a year! That is, automated invalid bug detection, especially duplicate bug detection, is a field that deserves to keep being explored; many techniques have been tested. A good technique can save these wasted hours and redirect them to a healthier task.
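The back-of-the-envelope arithmetic above, spelled out. The inputs (120 reports per day, 5 minutes each, 50% invalid, 20% duplicates) come directly from the text.

```python
reports_per_day = 120
minutes_per_report = 5

# Total triage time: 120 reports x 5 min = 600 min = 10 hours per day.
triage_hours = reports_per_day * minutes_per_report / 60

# ~50% of reports are invalid -> ~5 hours spent on reports that
# do not improve the software.
invalid_hours = triage_hours * 0.50

# ~20% of reports are duplicates -> ~2 hours on duplicates alone.
duplicate_hours = triage_hours * 0.20
```

Over a 22-working-day month, the duplicates alone account for roughly 44 person-hours, which is the scale the post is pointing at.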

Another thing we can mention is that, if a software product line approach is used, the problem of duplicate bug reports can increase significantly, since products share a common platform and many components are reused. That is, as the same component is used in many products, the probability of the same bug being reported by different people is higher. Moreover, the right component must be correctly identified in order to solve the bug; otherwise, the problem will keep occurring across the product line.

One might not see it at first glance, but the analysis of bug repositories, especially the detection of duplicate bugs, has much to do with software reuse. Software reuse tries to reduce costs, make the software development process faster, increase software quality, and bring other benefits. Improvements in bug triage processes aim to do exactly this!

Bug repositories came as a new challenge for the emerging Data Mining for Software Engineering field. Many techniques from intelligent information retrieval, data mining, machine learning, and even data clustering could be applied to solve these problems. Current research results have achieved only 40% effectiveness (at a maximum) in trying to automate these tasks, which characterizes a semi-automated solution.

Post by Yguaratã C. Cavalcanti, M.Sc. candidate at CIn-UFPE and RiSE member.