Friday, December 21, 2007
RiSE’s Interviews: Episode 1 – Software Reuse with Dr. Ruben Prieto-Diaz
Saturday, December 15, 2007
Architecture vs. Design
This post discusses a philosophical question: are there differences between architecture and design activities? Before answering this question, I would like to present some assertions found in the literature. Eden and Kazman discuss the differences among the Architecture, Design, and Implementation fields. According to them: "Architecture is concerned with the selection of architectural elements, their interaction, and the constraints on those elements and their interactions... Design is concerned with the modularization and detailed interfaces of the design elements, their algorithms and procedures, and the data types needed to support the architecture and to satisfy the requirements". Following this idea, Bass et al. define software architecture as "the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them". The SEI (Software Engineering Institute) considers two distinct activities in the development life cycle: architecture design and detailed design.
We can find several methods, methodologies, approaches, and so on that can be seen as architecture or design activities, for example OOAD (Object-Oriented Analysis and Design) methods, CBD (Component-Based Development) methods, and SOA (Service-Oriented Architecture) methods. Can these methods be considered design methods? From the architecture point of view, we have other methods, such as the SEI's ADD method and RUP's 4+1 View Model. These methods present steps to design a high-level architecture using several views.
In this context, I have some questions:
- Does design comprise architecture activities, or is it the opposite? Does the analysis and design discipline in the development life cycle encompass an architecture method?
- Is OOAD a design method, an architecture method, or a design technique?
- Are UML Components and Catalysis (CBD methods) design methods?
In the end, my final question is: what do we call an approach that defines the components of a system in a high-level architecture with different views (an architecture concept) and also defines the details of those components, such as their operations (a design concept)?
Monday, December 10, 2007
More on Software Product Line Design
Software product line engineering is a proactive reuse approach. It introduces software reuse on a large scale, throughout the organization. Its adoption lowers costs and time-to-market while raising product quality.
The five best-known architecting methods for software product lines, which were surveyed in this presentation, were: the FAST method, by David Weiss; FORM, by Kyo Kang, which focuses on feature analysis and develops its architectural model based on its feature model; COPA, developed inside Philips, which is probably the most extensive method, also covering organizational issues; the QADA method, by Matinlassi, which focuses on architectural quality attributes and proposes both designing and assessing the architecture; and, last but not least, the KobrA method, developed by Atkinson inside the Fraunhofer Institute as a concrete, component-based, object-oriented instantiation of PuLSE-DSSA.
Besides the well-known methods, Hendrickson proposes a new way of modeling product line architectures, basing his work on change sets and the relationships between those change sets. It is an interesting alternative for tackling complexity.
In agreement with Matinlassi's comparison of the well-known methods, the KobrA approach is the simplest one. It is very pragmatic and focuses directly on defining a conceptual architecture and realizing it through component-based development. The key activities in the KobrA approach are those related to context realization (Application Context, Framework Context...). Therefore, a worthy contribution should come from closing the gap between the conceptual approach and the Digital TV application domain.
The slides are available here for download, already including some extensions suggested this morning in the classroom.
Thursday, December 6, 2007
Software Component Certification: A Component Quality Model
Monday, December 3, 2007
RiSE Summer School (RiSS) - Final Remarks
On the last day, Prieto discussed his ideas about libraries, facets, and their evolution into ontologies. After that, Krueger started his presentation. Krueger was impressive: he showed how a CEO can give a fine talk and present his product in a hands-on way. I think he could have kept the discussion going all night.
After the talks, the awards were announced. The first one was the Reuse Guy. Ricardo Cavalcanti, a software engineer at C.E.S.A.R, asked several questions throughout all the talks and was recognized by the audience and organizers. The second award, for the best course in the summer school, went to Wayne Lim, whose course was extremely well received by the attendees.
If you did not have the opportunity to be there, next week we will publish the videos and interviews recorded during the conference. If you were there, it is time to remember it once more.
Friday, November 30, 2007
1st Day RiSE Summer School - RISS 2007
All the material will be published on the internet (here), together with the videos of the presentations.
Celebration - Latin American Fellowship Program:: Microsoft Research
His mentor there will be Dr. Ethan Jackson from the Foundations of Software Engineering group. Currently, Daniel is doing his sandwich Ph.D. at George Mason University. In the meantime, the entire RiSE staff in Brazil congratulates him.
Congratulations and let’s have more champagne.
The First Historical RiSE Day
If your answer was yes, that is it. It was the RiSE Day, which happened on November 29, 2007. It was incredible, with different points of view and exciting talks by the B.Sc., M.Sc., and Ph.D. students in RiSE.
Thanks to all the students and keynote speakers for their valuable feedback and patience. See all the pictures and presentations here.
Wednesday, November 28, 2007
The Historical "Software Reuse Adoption" Meeting
On November 27th, 2007, I had a historical meeting with Ruben Prieto-Díaz from James Madison University and Dirk Muthig from the Fraunhofer Institute. We discussed important points about reuse adoption, especially about maturity models, the main obstacles to be overcome, and the best paths toward an effective reuse adoption program. A summary of the meeting follows:
I presented to Dr. Prieto-Díaz the current version of the RiSE Maturity Model, its principles, fundamentals, and scientific background, citing the most important works in the area. Prieto-Díaz worked at the SPC, so he knows the RMM and helped me see some obstacles that I had not considered yet.
An interesting issue is related to people and personal knowledge. Here in Brazil, organizations commonly rotate their staff (software engineers). In this context, we need to specify a set of policies, procedures, or rules (?) to aid in knowledge transfer and storage, making it available for new people who join the organization in the future. Reuse is important not only for software life cycle assets; knowledge reuse is also important for the organization. Methods, techniques, and environments/tools must be defined and introduced to support these policies/procedures and to make this content available for the next projects.
Regarding my thesis, Prieto-Díaz highlighted the importance of making the reuse concept explicit in the context of the thesis, as well as its boundaries and scientific contribution.
After that, I met Dr. Dirk Muthig, and he gave me another view of the work. Muthig works at the Fraunhofer Institute, which is "similar" to C.E.S.A.R, and has a vision focused on practice (industry).
The meeting with Muthig was amazing. He explained to me how they design a Ph.D. thesis at the Fraunhofer Institute and advised me to follow the same path. I will try to summarize what we defined.
A Ph.D. thesis is composed of four elements: a Problem, an Idea, the Benefits, and an Action Plan. In my thesis context, they are:
- P: Brazilian organizations see the benefits of reuse but don't know how to introduce it. The main problem can be seen in this simple expression:
- reuse -> a lot of investments + risks => don't do it!
- a lot of investment, because sometimes you need to reorganize, rewrite, rebuild, and restructure everything!
- I: Provide an incremental path towards reuse adoption/introduction in practice.
- B: The main benefits are:
- lower risks;
- quicker benefits;
- smaller investment increments.
- A: The action plan can be something like this:
- Survey of Single Systems practice [related to reuse]
- in this scenario, we will have some companies with no reuse
- Define a first step that applies to all of them [measure reuse, make reusable assets explicit...]
- Develop a plan to aid in this question: how to maximize reuse from here? (this can be a link to product line approaches).
Prieto and Dirk agree that it is fundamental to make the reuse concept explicit. The boundaries should also be described. The next steps are to specify the assessment and the reuse introduction process, based on the RiSE Maturity Model.
Tuesday, November 27, 2007
The Historical "Search And Retrieval" Meeting
On November 27th, 2007, the RiSE members had a historical meeting with Ruben Prieto-Díaz from James Madison University. They discussed important points about search mechanisms and the obstacles to be overcome. We summarize the meeting below:
Fred Durão started the meeting with an introduction to the RiSE group's first actions in terms of search engines. He introduced MARACATU, the group's first breath of search engine development. After that, he presented his M.Sc. proposal, showing how the use of ontologies can improve search precision.
Eduardo Cruz presented the B.A.R.T search engine, the commercial version of MARACATU, and also explained his M.Sc. proposal, which envisages a proactive search mechanism that exploits user context information to enhance the search.
Alexandre Martins talked about his ongoing work, which is based on data mining techniques to avoid unnecessary queries. In his work, Martins analyzes the logs generated by searches and extracts patterns of relationships between queries and retrievals. This makes it possible to create association rules used to suggest related assets alongside regular search results. Martins also emphasized that the key point for creating relevant rules is choosing a good time window for evaluating the search engine logs.
Yguaratã Cavalcanti also presented his M.Sc. proposal, whose goal is to develop a tool for avoiding duplicate CRs (change requests). In general, CRs are generated by bugs in the systems and reported in a CR repository. Yguaratã pointed out that users rarely look for a duplicate CR before reporting a new one, and tools for avoiding such inconvenience are strongly advisable for software organizations that have hundreds of collaborators.
Cássio Melo presented his object of study, which aims to develop a tool for automatic component extraction. Eduardo Cruz complemented Cássio's speech by presenting the CORE system, a component repository system that motivates Cássio's research. According to Cássio, software companies that intend to adopt a component repository have difficulty storing all of their assets in the repository.
Rodrigo Mendes had the most anticipated meeting, with his (virtual) advisor; the hotspot of the discussion was how to automatically produce relevant facets. Ironically, both researchers shared the same doubt and concluded that more research is needed. However, they agreed on one point: a semi-automatic facet generation tool is the most viable way to apply a facet-based mechanism in a search tool.
Thursday, November 22, 2007
Software Engineering for Automotive Systems - Safety Vehicles
Tuesday, November 20, 2007
Towards a Query Reformulation Approach for Component Retrieval
Software construction is done more quickly when a reuse process is adopted. But this is not enough if there is no market to absorb these software components. The component market still faces a wide range of difficulties, such as the lack of efficient search engines.
One of the biggest problems in component search and retrieval is increasing the relevance of the results, since the user normally does not formulate the query in the most appropriate way. The searcher has a vision of the problem that does not necessarily match the reality of the component repository. Several approaches try to solve this problem. The seminar focused on the query reformulation technique, which reduces the conceptual gap between problem and solution through query refinement based on previously stored queries.
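To make the idea concrete, here is a minimal sketch of refinement based on previously stored queries (my own illustration, not the implementation of any cited work; the query log and the number of expansion terms are hypothetical): the user's query is expanded with the terms that most often co-occur with its terms in past queries.

    from collections import Counter

    # Hypothetical log of previously stored queries.
    past_queries = [
        "xml parser java",
        "xml parser validation",
        "java xml validation schema",
        "string utility java",
    ]

    def reformulate(query, past_queries, max_new_terms=2):
        """Expand a query with terms that co-occur with it in past queries."""
        terms = set(query.lower().split())
        co_occurring = Counter()
        for past in past_queries:
            past_terms = set(past.lower().split())
            if terms & past_terms:              # the past query shares a term
                co_occurring.update(past_terms - terms)
        extra = [t for t, _ in co_occurring.most_common(max_new_terms)]
        return query + " " + " ".join(extra) if extra else query

    print(reformulate("xml parser", past_queries))
    # -> "xml parser java validation"

A real engine would weight the expansion terms (e.g., by whether the past query actually led to a download) rather than using raw co-occurrence counts.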
The Code Finder was one of the first attempts to implement component search and retrieval with query reformulation. The paper "Interactive Internet search: keyword, directory and query reformulation mechanisms compared" finds that query reformulation improves the relevance of the retrieved documents but increases search time. The work "Using Ontologies for Database Query Reformulation" performs query reformulation using ontology rules, for query optimization and data integration. Another very interesting study is "Lexical analysis for modeling web query reformulation", which lexically analyzes searcher behavior through Query Clarity and Part-of-Speech.
My initial proposal is to develop a query reformulation engine for BART, using techniques that will be evaluated, such as ontologies, keyword ordering, and others. A comparison matrix will be prepared to compare the several existing techniques and help in making the correct choice.
by Dimitri Malheiros
Tuesday, November 13, 2007
RiSE’s Podcasts: Episode 1 – Software Product Lines with Dr. David Weiss
Thursday, November 8, 2007
Celebration - CRUISE book - Part II
Tuesday, November 6, 2007
Celebration - CRUISE book
Tonight we celebrate: the CRUISE book will be released, in printed copy, at Livraria Cultura in Recife, Pernambuco, Brazil, after more than 2000 downloads on the web. Everyone is invited to enjoy the night with the authors, who will sign the copies.
Monday, November 5, 2007
Software Product Line Design
Software Product Line Scoping
In the survey, nine approaches were analyzed. For each approach, the scoping activities, strengths, and weaknesses were identified. Then the most relevant activities found in the survey were grouped in a matrix and the approaches were compared.
The conclusions of the analysis are: the most complete approach is the PuLSE Scoping Process; PuLSE-Eco is the most referenced approach in the literature identified in this work; few approaches have activities to address the social problems of scoping; one approach has an activity to identify available assets; only one approach defines relations between domains; and only one approach has guidelines for different contexts.
Given this scenario, my initial proposal for software product line scoping is to adapt the PuLSE Scoping Process to the software reuse tools domain, adding activities to: help the marketing team with product portfolio scoping (e.g., identify customer segments, prioritize each segment, identify essential stakeholders); identify sub-domains and their relations; consider the viewpoints of both the whole optimum and the individual optimum; and identify available assets.
The presentation (see) of the survey was given in the I.N.1.0.3.8 - Advanced Seminars in Software Reuse course at Cin/UFPE. Participants of the seminar discussed the criteria for comparing the approaches and the risk of reusing available assets. Reusing available assets can reduce effort, but it is necessary to evaluate their impact on the product line. The approach comparison matrix can be improved (e.g., by defining the level of completeness of each activity in each approach).
Thursday, November 1, 2007
Enhancing Components Search in a Reuse Environment Using Discovered Knowledge Techniques
The main objective of my work is to optimize component search and retrieval through its usage history. Often, this usage is monitored by a logging mechanism such as a file or database. Thus, it is possible to use this information to extract knowledge in order to improve the search engine. The optimization is done by recommending downloads to users. These recommendations come from the knowledge extracted from the aforementioned logs and are represented as rules (A -> B). In this way, this work proposes a means of improving search engines by avoiding unnecessary searches.
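As an illustration of the idea, here is a minimal sketch of mining such rules from download sessions (my own sketch under assumed data, not the actual prototype): pairs of components downloaded together often enough become rules A -> B, which can then drive download recommendations.

    from itertools import combinations
    from collections import Counter

    # Hypothetical download sessions extracted from the search engine log.
    sessions = [
        {"logging", "xml-parser"},
        {"logging", "xml-parser", "ftp-client"},
        {"xml-parser", "ftp-client"},
        {"logging", "xml-parser"},
    ]

    def mine_rules(sessions, min_support=0.5, min_confidence=0.6):
        """Return rules A -> B with their support and confidence."""
        n = len(sessions)
        item_count = Counter()
        pair_count = Counter()
        for s in sessions:
            item_count.update(s)
            pair_count.update(combinations(sorted(s), 2))
        rules = []
        for (a, b), c in pair_count.items():
            support = c / n
            if support < min_support:
                continue
            for x, y in ((a, b), (b, a)):
                confidence = c / item_count[x]
                if confidence >= min_confidence:
                    rules.append((x, y, support, confidence))
        return rules

    for a, b, sup, conf in mine_rules(sessions):
        print(f"{a} -> {b}  (support={sup:.2f}, confidence={conf:.2f})")
    # A user who downloads A can then be recommended B.

In practice, an Apriori-style algorithm would handle itemsets larger than pairs; the sketch counts only pairs to keep the rule A -> B explicit.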
The presentation was given in the I.N.1.0.3.8 - Advanced Seminars in Software Reuse course at Cin/UFPE, and several questions were raised, such as:
Why use association rules?
There are several techniques for extracting knowledge, but in this work the objective is to suggest or predict downloads. For this, neither the sequence nor the classification of the results is important; the key knowledge is the relation mapped by the rules (A -> B).
How to conduct the experimentation?
This is the main problem, because a good quantity of data is needed. To that end, the current version of BART is already storing this data, which will allow the experimentation.
The next step is to improve the prototype and begin the experimentation, using real data to validate the extracted knowledge.
Monday, October 29, 2007
BTT - Towards a Bug Triage Tool
The presentation covered works and current research on the challenges related to bug and source repositories: impact analysis, automated bug report assignment, automated detection of duplicate bug reports, understanding and predicting software evolution, team expertise determination, bug report time-to-fix and effort estimation, and social networks.
For each of these challenges, it showed how research has addressed the problem, e.g., which techniques have been used and the results achieved by each approach. In addition, issues regarding the techniques and test frameworks of some works were mentioned. Despite the number of challenges, however, the presenter focused on the duplicate detection problem, which has few works about it and is the most crucial one in the bug triage process and tools.
Among the presented works on the duplicate detection problem (there were only three), we can cite the paper by Runeson, which attacked the problem using NLP techniques; the M.Sc. dissertation by Hiew, which used a cluster-based approach with cosine similarity measurement; and the paper by Anvik, which presented some results using a statistical model and text similarity.
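To give an idea of how the similarity-based approaches work, here is a minimal sketch (my own, assuming a bag-of-words TF-IDF representation; it is not the code of any of the cited works) that ranks existing reports by cosine similarity to a newly reported one:

    import math
    from collections import Counter

    # Hypothetical bug report summaries already in the repository.
    reports = [
        "application crashes when opening large file",
        "crash on opening a very large file",
        "login button does not respond after timeout",
    ]
    new_report = "program crashes while opening large files"

    def tfidf(doc, docs):
        """Build a TF-IDF vector (term -> weight) for one document."""
        tf = Counter(doc.split())
        n = len(docs)
        vec = {}
        for term, freq in tf.items():
            df = sum(1 for d in docs if term in d.split())
            vec[term] = freq * math.log((n + 1) / (df + 1))  # smoothed IDF
        return vec

    def cosine(u, v):
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        norm = math.sqrt(sum(x * x for x in u.values())) * \
               math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    docs = reports + [new_report]
    query_vec = tfidf(new_report, docs)
    for report in reports:
        print(f"{cosine(query_vec, tfidf(report, docs)):.2f}  {report}")
    # The highest-scoring existing reports are candidate duplicates.

The real approaches add stemming, stop-word removal, and tuned thresholds on top of this basic ranking.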
Some considerations were also made: among the reviewed works, only one had been tested at an industrial scale (Runeson's paper); the others are academic prototypes only. More real test cases, with a greater number of different software projects and bug databases, are needed. Most of the works depend on the project's process, which blocks generalization of the approach to other software projects. And the effectiveness of the approaches varies from only 20% to 40%, which is a very low range.
Furthermore, the presenter left some open questions about future tool development: would it be better to develop a bug triage tool from scratch, to develop plug-ins for the most used tools, or to adapt an existing tool into a new one? Which techniques should be used: NLP, TDT, text mining, or a mix of them?
Participants of the seminar also discussed issues such as how to handle meta-bugs (bugs that describe other bugs) in the approach, and what the reuse motivation is for the duplicate detection problem. For the reuse motivation, we can argue for quality improvement, cost reduction, and time savings, which, in general, is the idea of software reuse. As for meta-bugs, which are very common in open source development, we probably must treat them in the pre-processing of bug reports.
Download the presentation PDF here
Sunday, October 28, 2007
Using Requirements Management Tools in Software Product Line Engineering: The State of the Practice
The majority of the authors work in industry, and because of that they did not define a systematic approach for the analysis. They said it was all based on practical experience, but should that be enough? Doesn't it increase the chance of bias in the research?
Besides, the requirements defined for requirements management tools were described too superficially; some lacked reasons for their inclusion, while others did not explain how they could aid a software product line process.
On the other hand, the paper was derived from a report, which may explain what was not very clear in the paper.
Tuesday, October 23, 2007
Changing the focus on Search and Retrieval: From Software Assets to Interactive Multimedia Diary for Home
Their motivation is that the automated capture and retrieval of experiences taking place at home is interesting for several reasons. First, the home offers an environment where a variety of memorable events and experiences take place [imagine your first soup, first steps, etc.]. Thus, the work on multimedia capture and retrieval focuses on the development of algorithms for person tracking, key frame extraction, media handover, and lighting change detection, and on the design of strategies that help to navigate huge amounts of multimedia data. The studies were conducted at the National Institute of Information and Communications Technology's Ubiquitous Home in Kyoto, Japan, in an environment simulating a two-bedroom house equipped with 17 cameras and 25 microphones for continuous video and audio acquisition, in conjunction with pressure-based floor sensors. Some challenges are associated with floor sensor data retrieval, audio retrieval, and lighting changes, besides user interaction. In their prototype, the user retrieves video, audio, and key frames through a graphical interface based on queries. But, regarding the queries, you can think about the gap between the user's queries and the semantic level. For example, compare queries such as "retrieve video showing the regions of the house people were in at 20:00" and "What was I doing after dinner?".
Monday, October 22, 2007
XXI Brazilian Symposium on Software Engineering (SBES)
Among other activities, SBES had a panel about Academic and Industrial Cooperation in Software Engineering with RiSE group coordinator Silvio Meira (C.E.S.A.R and Federal University of Pernambuco), Don Batory (University of Texas), David Rosenblum (University College London), Claudia Werner (COPPE/Federal University of Rio de Janeiro), and Itana Gimenes (State University of Maringá). The panel raised questions about why there is so little cooperation between academia and industry and how it can be improved. In this panel, the LIFT tool was cited several times as a successful example of cooperation between academia (Federal University of Pernambuco) and industry (C.E.S.A.R and Pitang Software Factory).
Thursday, October 18, 2007
Extracting and Evolving Mobile Games Product Lines
The biggest problem involving SPL and the mobile domain is possibly the chaos surrounding the platform. We commonly see different manufacturers implementing the same platform in two different ways. Or there are processing restrictions that make the platform implementation work differently from the specification. It seems that the mobile application domain is still in chaos because market share rivalry and the race for lower prices are more important to manufacturers than establishing a standard application platform.
My opinion is: within a short period of time, handset computing power will be enough to establish an SPL with the same sophistication as an SPL for desktop applications. And then the major problems we now face in the mobile domain will vanish, and the remaining problems will be the ones common to any chosen domain.
Monday, October 15, 2007
The Semantic Web and the Knowledge Reuse
As seen, the meaning of the content is the key point of Semantic Web applications; this happens because Semantic Web markup languages such as OWL and RDF associate the content to be exhibited with its source: the domain ontologies. In computer science, ontologies correspond to documents that formally define the relations among real-world entities. Moreover, ontologies play an important role in knowledge sharing and reuse, since the new Semantic Web markup formats can be easily published on the Internet. Today, many different domain ontologies are populating the web and making that knowledge reusable by anyone. From a positive perspective, we believe this scenario tends to grow with the advent of Semantic Web applications. An example of this can be seen in "Ontology library systems: The key to successful ontology reuse" [Y. Ding and D. Fensel, 2001].
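As a small illustration, here is a sketch of how such a domain ontology fragment can be created and published, assuming the Python rdflib library (the namespace, classes, and properties are made up for the example):

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, RDFS

    # Hypothetical namespace for a software-reuse domain ontology.
    EX = Namespace("http://example.org/reuse#")

    g = Graph()
    g.bind("ex", EX)

    # Formally define relations among real-world entities of the domain.
    g.add((EX.Component, RDF.type, RDFS.Class))
    g.add((EX.Repository, RDF.type, RDFS.Class))
    g.add((EX.storedIn, RDF.type, RDF.Property))
    g.add((EX.storedIn, RDFS.domain, EX.Component))
    g.add((EX.storedIn, RDFS.range, EX.Repository))

    # An instance: a concrete component stored in a concrete repository.
    g.add((EX.XmlParser, RDF.type, EX.Component))
    g.add((EX.XmlParser, RDFS.label, Literal("XML Parser")))
    g.add((EX.XmlParser, EX.storedIn, EX.Core))

    # Serializing to Turtle yields a document anyone can publish and reuse
    # (rdflib 6+ returns a string here).
    print(g.serialize(format="turtle"))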
Saturday, October 13, 2007
Software Engineering for Automotive Systems - Starting with Car Talk
Friday, October 5, 2007
Quality Certification of Reusable Components
One interesting aspect of this paper is that we at RiSE are developing a robust software reuse framework that considers the same ideas as the activities highlighted in the paper (among others). Besides, we are applying this framework in real environments, and a set of tools has been developed to support it. The main goal is to increase the productivity of software companies. Interested? Contact RiSE.
Tuesday, October 2, 2007
Changing the Research Focus: Towards Industrial Projects and New Startups
Saturday, September 29, 2007
Using data mining to improve search engines
I'll present some ideas related to the use of data mining to improve search engines. The first question is: where is the data from which you will extract the knowledge? The focus of this discussion is on using historical data, such as log files, as a record of the real use of a search engine. On the other hand, we need to select the techniques we will use to extract the knowledge hidden in the raw data. In the literature, there are several techniques, such as classification, clustering, sequence analysis, and association rules, among others [see 1].
The direction selected in this discussion is using association rules [see 2] to analyze the relations between the data stored in the log file. These relations are used to aid users through suggestions, such as queries or download options.
The paper selected for the RiSE's Discussion was "Using Association Rules to Discover Search Engines Related Queries", which shows the use of association rules to extract related queries from a log generated by a web site.
The first question was related to the transformation of log files into so-called "user sessions": why do it? It is important because the algorithms used to extract association rules need the records to be grouped into a set of transactions. However, when log files are used, these groups are not perfectly separated. The classic application of association rules is Market Basket Analysis [1], which deals with the products of a supermarket. In that case, the sessions are defined by the consumer's receipt, on which the products bought are perfectly described and separated from those of other consumers. In a log file, however, the records are not sorted, and it is necessary to separate these lines into transaction sets; each transaction will contain several log lines that represent one user's activity. In the paper, the IP address and a time window were used. My work uses the session id to identify users during a time window. A minimal sketch of this sessionization step follows below.
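Here is the promised sketch (my own, with hypothetical log records and an assumed 30-minute window): records from the same user or session id are grouped into one transaction until a gap larger than the time window occurs.

    from collections import defaultdict

    WINDOW = 30 * 60  # 30-minute time window, in seconds (assumed)

    # Hypothetical log records: (user_or_session_id, unix_timestamp, item)
    log = [
        ("u1", 1000, "query:xml parser"),
        ("u1", 1300, "download:xml-parser"),
        ("u2", 1400, "query:logging"),
        ("u1", 9000, "query:ftp"),        # gap > WINDOW: new session for u1
    ]

    def sessionize(log, window=WINDOW):
        """Group log lines into per-user transactions split by time gaps."""
        by_user = defaultdict(list)
        for user, ts, item in sorted(log, key=lambda r: (r[0], r[1])):
            by_user[user].append((ts, item))
        transactions = []
        for user, records in by_user.items():
            session = [records[0][1]]
            for (prev_ts, _), (ts, item) in zip(records, records[1:]):
                if ts - prev_ts > window:
                    transactions.append(session)
                    session = []
                session.append(item)
            transactions.append(session)
        return transactions

    print(sessionize(log))
    # [['query:xml parser', 'download:xml-parser'], ['query:ftp'],
    #  ['query:logging']]

The resulting transactions are exactly the input that association rule algorithms expect.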
The quality of the recommendations was also discussed. This quality is measured using metrics such as support and confidence; however, the parameters used are specific to each situation.
This approach is commonly used on the web, but the idea can also improve component search engines, using recommendations for downloads, queries, and any other information stored in log files.
I have some critiques of this paper, such as the lack of detail about the data mining process: several algorithms could be used, so which one was used? Another question relates to the experiment: I think that choosing a specific domain from which to extract the rules could help validate the suggestions with a specialist in that domain.
Wednesday, September 26, 2007
Open Call for M.Sc. and Ph.D. students in RiSE
Monday, September 24, 2007
What views are necessary to represent a SOA?
Service-Oriented Architecture (SOA) is a system architecture in which a collection of loosely coupled services communicate with each other using standard interfaces and message-exchanging protocols. As an emerging technology in software development, SOA presents a new paradigm, and some authors affirm that it affects the entire software development cycle, including analysis, specification, design, implementation, verification, validation, maintenance, and evolution [see 1, 2 and 3].
In this context, we discussed the paper "SOA Views: A Coherent View Model of the SOA in the Enterprise", published at the IEEE International Conference on Services Computing in 2006. The authors, Ibrahim and Misic, propose a set of nine views to represent an SOA-based software architecture: Business view, Interface view, Discovery view, Transformation view, Invocation view, Component view, Data view, Infrastructure view, and Test view.
In our discussion, the first question was: do current approaches, such as RUP's 4+1 View Model and the SEI's ADD method, address the particularities of SOA design?
We agree with some of the views and consider them interesting within an SOA approach, such as the Interface view and the Discovery view. The first describes the service contract, and the second provides the information necessary to discover, bind, and invoke the service.
Additionally, I agree with the paper about having several views for SOA, because they can guide architects in constructing a solution that addresses the particularities of SOA and the quality attributes of this kind of enterprise system.
Finally, I think the paper misses the relation between the stakeholders and the quality attributes that each view can address. Besides, the paper does not show how each view can be represented. For architects, it is important to have models that help them design the solution for each view. One example would be using a UML sequence diagram for the Discovery view, showing how the consumer can find services in the service registry.
Wednesday, September 19, 2007
No Evolution on SE?
This conference attracts a very interesting audience from a set of software companies, such as Philips, Nokia, Sony/Ericsson, and HP, among others, and a set of renowned institutes like the Fraunhofer Institute, Finland Research, and C.E.S.A.R., among others. In this way, interesting discussions and partnerships (with industry and academia) usually take place.
I presented two papers there: (1) a paper about a software component maturity model, in which I described the component quality model and the evaluation techniques proposed by our group in order to achieve a quality degree for software components; and (2) a paper about an experimental study on domain engineering, an interesting work accomplished by our group together with the university in order to evaluate a domain engineering process in a post-graduate course. Some researchers who watched those presentations believe that component certification is the future of software components and liked the work we have been developing, because this area is sometimes vague. Researchers also liked the experimental study report and commented that this is an interesting area that could be improved in order to increase the number of proven and validated works (in academia or industry) in software engineering. The experimental area has received special attention from the software engineering community in recent years, due to the lack of such works and the difficulty of evaluating software research.
Tuesday, September 18, 2007
The bad side of Bug Repositories
As some people have noticed, the majority of open source software projects, and proprietary ones too, have organized their development processes around a bug repository system. This means that bug resolution, new features, and even process improvements are being dictated by bug reports. Here, by bug we mean a software defect, change request, feature request, or issue in general.
The task of analyzing reported bugs is called bug tracking or bug triage, where the word "bug" could reasonably be replaced by issue, ticket, change request, defect, problem, and many others. More interesting is to know that bug triage tasks are done, in general, by developers, and precious time is taken by them. Among the many sub-tasks in bug triage, we can cite: analyzing whether a bug is valid; trying to reproduce it; dependency checking, that is, verifying whether other bugs block this bug and vice-versa; verifying whether a similar bug has already been reported (duplicate detection); and assigning a reported bug to a developer.
Many other sub-tasks can be identified; however, to show how big a problem bug triage can be for final software quality, we'll concentrate our efforts on the duplicate detection task, which is currently done manually, like many others.
In a paper by Gail Murphy, entitled "Coping with an open bug repository", we can see that almost 50% of the bugs reported during the development and improvement phases are invalid. That is, they are bugs that could not be reproduced (including the well-known "works for me" bugs), bugs that won't be resolved, duplicated bugs, bugs with low priority, and so on. And 20% of these invalid bugs are just duplicates, that is, bugs that had already been reported.
Putting it in numbers, let's suppose that a project receives about 120 bug reports per day (in some projects this average is much bigger) and that a developer spends about 5 minutes analyzing one bug. Doing simple arithmetic (120 reports x 5 minutes = 600 minutes), we see that 10 hours per day, or 10 person-hours, are spent on this task alone (bug triage); about 5 hours are wasted on bugs that do not improve the software quality, and around 2 of those hours go to duplicated bugs alone. Now calculate it for a month, or a year! In short, automated detection of invalid bugs, especially duplicate detection, is a field worth continuing to explore, and many techniques have been tested. A good technique can save these wasted hours and put them into a healthier task.
Another thing worth mentioning is that if a software product line approach is used, the problem of duplicated bug reports can increase significantly. Since the products share a common platform, many components are reused. As the same component is used in many products, the probability of the same bug being reported by different people is higher. Moreover, the right component must be correctly identified in order to solve the bug; otherwise, the problem will keep occurring across the product line.
One might not see it at first glance, but bug repository analysis, especially the detection of duplicated bugs, has much to do with software reuse. Software reuse tries to reduce costs, make the software development process faster, and increase software quality, among other benefits. Improvements in bug triage processes aim to do exactly this!
Bug repositories come as a new challenge for the emerging Data Mining for Software Engineering field. Many techniques from intelligent information retrieval, data mining, machine learning, and even data clustering could be applied to solve these problems. Current research results have achieved at most 40% effectiveness in trying to automate these tasks, which characterizes a semi-automated solution.
Monday, September 17, 2007
RiSE members visit Virginia Tech in Falls Church
We also presented RiSE's works, like the Reuse Maturity Model, the Model-Driven Reuse approach, component certification and testing, and the RiSE tools: B.A.R.T., CORE, ToolDAy, and LIFT. They were particularly interested in LIFT, a tool for retrieving information from legacy systems and aiding system documentation, because of its results in a real project and also because they are currently working on reengineering themselves.
Frakes was also interested in B.A.R.T.'s query reformulation work. Regarding ToolDAy, even though the adopted process is different from DARE's, he was glad to see that the tool is well developed and assembled, and said that DARE could use some improvement in this respect.
Frakes also gave us a more detailed presentation of the DARE environment. He also presented the main concepts and current trends in software reuse, and we were pleased to see that RiSE has relevant works in most of them.
Besides getting to know each other's works, another goal of this meeting was to find options for possible cooperation between RiSE and their research group at Virginia Tech. One suggestion is to pursue co-funded projects between us; another option is to send Ph.D. and M.Sc. students to Virginia Tech, to exchange ideas and experience, and vice-versa; we also discussed the possibility of joint development and tool integration. Since one of RiSE's goals is to develop practical tools for reuse, we could benefit from the experience of both groups to deliver good solutions to the industry.
The meeting ended with many possibilities, and the next step is to start defining concrete options and suggestions to make this collaboration happen.