
Thursday, July 10, 2008

Do you care about source code as much as you should?

Diomidis Spinellis has conducted an interesting study of communication through source code in Open Source (OS) projects. We can summarize it by paraphrasing him: "... it confirms my belief that source code is the most important artifact of the software development process".

In this study, Spinellis measured how much of the source code in 30 OS projects is of no interest to the compiler. The results showed that, for most of the projects, more than 60% of the source code exists only to make the code easier to understand. The figure below summarizes the results.
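As a rough illustration of this kind of measurement, here is a minimal sketch that computes the share of characters in a C-like source file that the compiler ignores. It assumes that only comments and whitespace outside string and character literals count as "not of compiler's interest". The post does not describe the exact metric Spinellis used, and his may differ (for example, by also counting descriptive identifier names). The function name `non_compiler_fraction` is hypothetical, and the sketch ignores preprocessor subtleties and raw string literals.

```python
def non_compiler_fraction(source: str) -> float:
    """Fraction of characters the compiler ignores (assumed metric):
    comments plus whitespace outside string/char literals."""
    if not source:
        return 0.0
    ignored = 0
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if source.startswith("//", i):
            # Line comment: runs up to (not including) the newline.
            end = source.find("\n", i)
            end = n if end == -1 else end
            ignored += end - i
            i = end
        elif source.startswith("/*", i):
            # Block comment: runs through the closing "*/".
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            ignored += end - i
            i = end
        elif c in "\"'":
            # String or char literal: its contents matter to the compiler,
            # so skip it without counting; backslash escapes skip two chars.
            j = i + 1
            while j < n and source[j] != c:
                j += 2 if source[j] == "\\" else 1
            i = min(j + 1, n)
        else:
            if c.isspace():
                ignored += 1
            i += 1
    return ignored / n
```

For example, `non_compiler_fraction("int x; // hi\n")` counts the two spaces, the comment and the newline, giving 8/13; a `//` inside a string literal is correctly not treated as a comment.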


I absolutely agree with him that source code is the most important asset. Currently, in the OS world, reuse is all about reusing source code, so understanding what a piece of source code does, and how it does it, is essential to reusing it.

Much effort has been concentrated on defining architecture and design processes (among other aspects of software) to facilitate software reuse; however, we still look at the source code before reusing it. Such efforts seem to forget that, in the end, it is source code written in some programming language that makes the project run.

Furthermore, most of the projects analyzed by Spinellis are successful open source projects, which is another reason to believe that source code is the most important artifact of software development.

So remember: if your project is going to evolve in the future, the source code is probably the first place developers will look.

Sunday, April 6, 2008

Proactive Bug Report Discovery


The developers behind PyGame have introduced a proactive way to capture bug discussions that have not been reported to the PyGame mailing list [yes, they use a mailing list to report bugs]. The motivation is that users sometimes attribute PyGame's problems to the operating system they are running, which leads them to report the problem to the operating system's bug tracker, or simply to discuss PyGame's problems in personal blogs or elsewhere.

The method proposed on PyGame's blog is interesting and consists of searching specific places [sites, mailing lists, etc.] for bug reports related to PyGame. However, some questions about viability and scalability arise, such as: what does it cost to maintain crawlers that search the web for bug discussions? How do we decide whether a bug discussion is really relevant, or whether it really describes a bug?
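The post does not say how PyGame decides which search hits matter. One naive answer to the relevance question is a keyword filter that keeps only texts mentioning the project together with some bug vocabulary, leaving the final call to a human. The sketch below assumes the crawling has already produced a list of post texts; the keyword lists and the names `looks_like_bug_report` and `filter_candidates` are hypothetical, not PyGame's actual method.

```python
import re

# Hypothetical keyword lists; PyGame's real search terms are not described in the post.
PROJECT_TERMS = re.compile(r"\bpygame\b", re.IGNORECASE)
BUG_TERMS = re.compile(
    r"\b(bugs?|crash(es|ed)?|segfault|errors?|exceptions?|traceback|broken)\b",
    re.IGNORECASE,
)


def looks_like_bug_report(text: str) -> bool:
    """Naive relevance test: the text names the project and uses bug vocabulary."""
    return bool(PROJECT_TERMS.search(text)) and bool(BUG_TERMS.search(text))


def filter_candidates(posts: list[str]) -> list[str]:
    """Keep only posts that pass the keyword test, for a human to triage."""
    return [p for p in posts if looks_like_bug_report(p)]


posts = [
    "pygame.mixer crashes on startup under Ubuntu",
    "Released my pygame platformer today!",
    "Firefox error on login",
]
print(filter_candidates(posts))
```

A filter this simple will still let through false positives, so it only reduces, rather than removes, the manual effort behind the relevance question raised above.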

For PyGame's developers, this solution is reasonable because there is a specific set of web sites, mailing lists, and change request (CR) tracking systems where the searches must be performed. But for widely used software, such as the Firefox web browser, the technique might be very costly, or even impractical.

Tuesday, September 18, 2007

The bad side of Bug Repositories

Over roughly the last eight years, bug repositories, especially those of open source software, have received much more attention from researchers, considerably increasing the literature about them. These repositories are being analyzed from an information retrieval perspective for software engineering (see 1 and 2), in an attempt to improve and automate some of the processes related to them. Bug repositories are systems that collect bugs found by users and developers while using the software.

As some people have noticed, the majority of open source projects, and of proprietary ones too, organize their development processes around a bug repository. This means that bug fixes, new features, and even process improvements are dictated by bug reports. Here, by "bug" we mean software defects, change requests, feature requests, and issues in general.

The task of analyzing reported bugs is called bug tracking or bug triage, where the word "bug" could reasonably be replaced by issue, ticket, change request, defect, problem, and many others. What is most interesting is that bug tracking tasks are generally done by developers, and they take up precious time. Among the many sub-tasks of bug triage, we can cite: analyzing whether a bug is valid; trying to reproduce it; dependency checking, that is, verifying whether other bugs block this bug and vice versa; checking whether a similar bug has already been reported (duplicate detection); and assigning the reported bug to a developer.
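Dependency checking boils down to walking a graph of "blocked by" relations, in the spirit of Bugzilla's "Depends on" and "Blocks" fields. The sketch below assumes a hypothetical in-memory map from each bug id to the ids that directly block it; `transitive_blockers` and `invert` are illustrative names, not part of any tracker's API. Inverting the map answers the "vice versa" question, and a bug that appears among its own blockers signals a circular dependency.

```python
def transitive_blockers(blocked_by: dict[int, set[int]], bug: int) -> set[int]:
    """All bugs that directly or indirectly block `bug` (iterative DFS)."""
    seen: set[int] = set()
    stack = list(blocked_by.get(bug, ()))
    while stack:
        b = stack.pop()
        if b not in seen:
            seen.add(b)
            stack.extend(blocked_by.get(b, ()))
    return seen


def invert(blocked_by: dict[int, set[int]]) -> dict[int, set[int]]:
    """Turn 'X is blocked by Y' into 'Y blocks X'."""
    blocks: dict[int, set[int]] = {}
    for bug, blockers in blocked_by.items():
        for b in blockers:
            blocks.setdefault(b, set()).add(bug)
    return blocks


deps = {1: {2}, 2: {3}, 3: set(), 4: {1}}
print(transitive_blockers(deps, 4))          # bugs that must be fixed before bug 4
print(transitive_blockers(invert(deps), 3))  # bugs that bug 3 is holding back
```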

Many other sub-tasks can be identified; however, to show how much bug triage can affect final software quality, we'll concentrate on the duplicate detection task, which, like many others, is currently done manually.

In a paper co-authored by Gail Murphy, "Coping with an open bug repository", we can see that almost 50% of the bugs reported during the development and improvement phases are invalid. That is, they are bugs that could not be reproduced (including the well-known "works for me" bugs), bugs that won't be fixed, duplicate bugs, low-priority bugs, and so on. And 20% of these invalid bugs are duplicates, that is, bugs that had already been reported.

To put it in numbers, suppose a project receives about 120 bug reports per day (some projects receive many more), and that a developer spends about 5 minutes analyzing each one. Simple arithmetic shows that 10 hours per day, or 10 person-hours, go into this task (bug tracking) alone, and about 5 of those hours are spent on reports that do not improve the software's quality. Since 20% of those invalid reports are duplicates, duplicates alone waste about 1 hour per day. Now calculate that for a month, or for a year! Automated detection of invalid reports, and of duplicates in particular, is therefore a field worth continuing to explore; many techniques have already been tested. A good technique could save these wasted hours and put them into healthier work.
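The back-of-the-envelope estimate can be reproduced in a few lines, taking the 50% invalid rate and the 20%-of-invalid duplicate share from the paper cited above as inputs. The default values and the name `triage_hours` are illustrative assumptions; the yearly figure further assumes triage happens every calendar day.

```python
def triage_hours(reports_per_day=120, minutes_per_report=5.0,
                 invalid_fraction=0.5, duplicate_share_of_invalid=0.2,
                 days=1):
    """Return (total, invalid, duplicate) triage hours over `days` days."""
    total = reports_per_day * minutes_per_report / 60 * days
    invalid = total * invalid_fraction            # reports that add no quality
    duplicates = invalid * duplicate_share_of_invalid  # subset of the invalid ones
    return total, invalid, duplicates


print(triage_hours())          # (10.0, 5.0, 1.0) for a single day
print(triage_hours(days=365))  # one calendar year, assuming daily triage
```

Changing any parameter, such as a higher daily report volume, shows how quickly the wasted hours scale.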

It is also worth mentioning that if a software product line approach is used, the problem of duplicate bug reports can grow significantly. Since the products share a common platform, many components are reused; because the same component appears in many products, the probability that different people report the same bug is higher. Moreover, the right component must be correctly identified to solve the bug; otherwise the problem keeps occurring across the product line.
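One way to exploit this in triage is to group reports filed against different products but attributed to the same shared component, since those are natural candidates to review together for duplicates. The sketch below assumes a hypothetical data model of `(report_id, product, component)` tuples and assumes each report already names the right component, which, as noted above, is itself the hard part. The function name and sample data are illustrative only.

```python
from collections import defaultdict


def cross_product_components(reports):
    """Group report ids by component, keeping components reported from 2+ products.

    `reports` is an iterable of (report_id, product, component) tuples,
    a hypothetical data model chosen for illustration.
    """
    by_component = defaultdict(list)
    products = defaultdict(set)
    for report_id, product, component in reports:
        by_component[component].append(report_id)
        products[component].add(product)
    return {c: ids for c, ids in by_component.items() if len(products[c]) > 1}


reports = [
    (1, "PhoneA", "camera-driver"),
    (2, "PhoneB", "camera-driver"),
    (3, "PhoneA", "ui-theme"),
    (4, "PhoneA", "ui-theme"),
]
print(cross_product_components(reports))
```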

It may not be obvious at first glance, but the analysis of bug repositories, especially the detection of duplicate bugs, has much to do with software reuse. Software reuse tries to reduce costs, speed up the development process, and increase software quality, among other benefits. Improvements in bug triage aim at exactly the same goals!

Bug repositories have become a new challenge for the emerging field of data mining for software engineering. Many techniques from intelligent information retrieval, data mining, machine learning, and even data clustering could be applied to these problems. Current research results have reached at most 40% effectiveness in automating these tasks, which characterizes a semi-automated solution.
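To make the information retrieval angle concrete, here is a minimal sketch of a classic baseline for duplicate detection: represent each report as a TF-IDF vector and rank existing reports by cosine similarity to a new one. This is a generic textbook technique, not the method behind the results mentioned above. The tokenizer, the smoothed idf formula, and all function names are assumptions. In keeping with a semi-automated setting, it only proposes candidates for a triager to confirm.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with a smoothed idf: log((1 + n) / (1 + df)) + 1."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {t: count * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_duplicate_candidates(new_report: str, existing: list[str], top_k: int = 3):
    """Return (index, similarity) pairs for the most similar existing reports."""
    vectors = tfidf_vectors(existing + [new_report])
    query = vectors[-1]
    scores = [(i, cosine(query, v)) for i, v in enumerate(vectors[:-1])]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


existing = [
    "Crash when opening PDF attachment",
    "Toolbar icons missing after update",
    "Spell checker ignores German dictionary",
]
ranked = rank_duplicate_candidates("Browser crash opening a PDF attachment", existing)
print(ranked)
```

Here the first existing report shares several terms with the new one and comes out on top, while the unrelated reports score zero.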

Post by Yguaratã C. Cavalcanti, M.Sc. candidate at CIn-UFPE and RiSE member.