Wednesday, July 30, 2008

Empirical Studies: On software and sharks

Software has become part of our society and is found in products ranging from microwaves to space shuttles. This implies that a vast amount of software has been and is being developed. At the same time, organizations are continuously trying to improve their software processes in order to achieve their goals.

Nevertheless, once improvement proposals have been identified, it is necessary to determine which of them to introduce, if any. Moreover, it is often not possible simply to change the existing software process without more information about the actual effect of a proposal; that is, proposals must be evaluated before any change is made, in order to reduce risk. In this context, empirical studies are crucial, since progress in any discipline depends on our ability to understand the basic units necessary to solve a problem. Additionally, experimentation provides a systematic, disciplined, quantifiable, and controlled way to evaluate new theories. It has been used in many fields, e.g., physics, medicine, and manufacturing; in software engineering, however, the idea started to be explored in the 1970s with the work of Victor Basili from the University of Maryland. Currently, we can see conferences, books, and other efforts in this direction.

At RiSE Labs, we are working hard in this direction. Before any piece of research is introduced in an industrial scenario, it should go through an empirical study that shows its benefits and drawbacks. Sometimes students and researchers argue a lot before seeing and understanding the benefits of doing so. However, if we look at other examples in the real world, we can see that it is a standard procedure and is taken seriously.

Last Sunday, Discovery Channel started Shark Week, a special program with a lot of information about sharks. I did not have the opportunity to watch the first day, but I did on Monday. In the Day of the Shark program, researchers from several places around the world performed important experiments involving shark attacks. I will describe some of them here; maybe they can be useful:
  • What is the best approach in a shark attack? Stay together in a group (like in cartoons, with everybody praying together) or separate, with each person trying to save themselves? The experiments indicated that it is better to stay together. The researchers used fake humans (toys) to perform the experiments. As in a software engineering experiment, there are many threats to validity to deal with (the clothes on the toys, the lack of signals that humans emit and sharks’ sensors can recognize, the types of sharks, the sharks’ habits, ...).
  • Do sharks attack in the same way during the day and at night? Yes, the experiment showed that sharks make no distinction between day and night.
  • Another important experiment was performed in order to design devices to prevent shark attacks. The researchers analyzed the use of electrical devices and a type of gas, and these were able to scare the sharks away for a while. This way, you have a few minutes to find a boat, a reef, etc.
  • In another experiment, they designed a material meant to resist a shark bite. The material was very resistant; however, it was destroyed by the sharks, and they are planning a replication with some adjustments. Can you imagine yourself trying this solution before it had been tested experimentally?
The program was very impressive and interesting, and the researchers presented important findings, for example, on the surfers’ dilemma: stay still or try to get away on the surfboard. Nevertheless, the experiment that impressed me most involved two researchers surrounded by sharks [it can be disturbing to watch].

The goal was to try to understand the sharks’ behavior. However, one of the researchers had part of his calf bitten by a shark. That was terrible. Even so, after the accident, the same researcher replicated the experiment, trying to control the variables better (only he was in the water, and the cameras were kept away from him, with the crew on a rock) and to obtain more findings. In this replication, he had no problems.

As you can see, empirical studies are critical to advancing science and our understanding of the world and its elements. Fortunately, in some situations in software engineering, we do not have to deal with humans in critical conditions to run experiments.

In addition, I agree with David Parnas when he said that we cannot run experiments for everything we define (otherwise we would do nothing else for the next several years and forget our current activities); however, in some situations they are very important for presenting evidence about something.

Tuesday, July 29, 2008

3rd Workshop for Introducing Reuse in Enterprises (WIRE)

As posted before on this blog, the 3rd Workshop for Introducing Reuse in Enterprises (WIRE) took place on June 27-28 in Recife, Brazil. The workshop was composed of tutorials, panels, keynotes, and a lot of discussion (see the program here). Key researchers working on the topic were there, such as Jan Bosch (VP, Engineering Process, Intuit Inc., USA) and Paulo Merson (Software Engineering Institute, USA).
During these two days, the main topic of discussion was Software Product Lines. Jan Bosch (here and here) talked a lot about his experience with Software Product Line adoption in industry: the main problems, the benefits, the big challenges, and his maturity model for SPL. At the end of the first day, Professor Paulo Borba (UFPE) presented an industrial report on a big project, involving several institutions and a mobile game company, aimed at implementing a product line in the mobile game domain. The results were very interesting, and the opportunity to see something closer to us was very important for the attendees. Moreover, on the first day we had two other industrial reports regarding reuse in industry. In the first one, Eduardo Cruz talked about RiSE's solutions to improve productivity in software development, covering processes, methods, and tools. In the second one, Eliseu Santos talked about the CPM Braxis experience in adopting reuse in its software development process, the main problems, the business decisions related to reuse practices, and the solutions adopted to face these challenges.

On the second day, the Industrial Panel was held, where the audience asked the guests about everything related to reuse adoption, software product line challenges, legal aspects of software reuse, non-technical decisions, etc. The panel was formed by Eduardo Almeida (RiSE), Eliseu Santos (CPM Braxis), Jan Bosch (Intuit Inc.), and Paulo Adeodato (Neurotech), and was moderated by Eduardo Cruz (RiSE). After the panel, Paulo Merson gave his talk on Service-Oriented Architecture for Quality-Oriented Architects, which discussed practical information for creating and evaluating the architecture of an SOA system. Next, Paulo Merson discussed important questions about how to document product line architectures using UML 2.0 and other solutions.
If you did not have the opportunity to be there, the workshop pictures and presentations are available on the WIRE web site. If you were there, it is a chance to relive it.

Sunday, July 27, 2008

An Enduring Legacy - Randy Pausch

Randy Pausch, the professor at Carnegie Mellon University who inspired countless students in the classroom and others worldwide through his highly acclaimed last lecture, has died of complications from pancreatic cancer. He was 47.

See all the information at the Carnegie Mellon University website.

His legacy was incredible. Last year I had the opportunity to learn about one of his projects, Alice, and it was brilliant. Take some time to watch his last lecture. It is amazing. In it, he said:
"We cannot change the cards we are dealt, just how we play the hand."--Randy Pausch


Bug Report Scenarios

A big problem in software development nowadays, in both the practical and the academic sense, is the Bug Duplication Problem [see the first discussion about it here]. RiSE Labs is working in this direction too, and I have been wondering about possible solutions to it. I would like to discuss it with you here. The discussion is based on scenarios and some possibilities.

General Case: Don’t mess up my Repository
In this general case, the idea is that a tool can help you to avoid the mess. Maybe, we [user and tool?] can do it.

1st – Scenario – I cannot avoid it but I can help you with it (No intelligence on either side)

The idea here is that the user sends a bug report to the environment without checking previous bug reports, and the tool can do nothing to prevent it. The tool just confirms it (Bug sent with success!). You can run some searches and look through the reports afterwards, and remove the report [if it is a duplicate].


What can we do now? Maybe I can hire a person to check the new submissions every day, or the tool can check them in order to identify similar bugs and send a message to someone (the bug report author and the manager) in a configurable way (here we have some intelligence on the tool side). Yes, we have to define how to identify similar bugs with some confidence (otherwise we can have the false-positive problem) and automate it.


2nd – Scenario – I can try to avoid it and maybe I can help you (Some intelligence on the tool side)

In this case the user sends a bug report, again without checking previous ones, and the tool shows the user a message saying that it may be a duplicate bug report. Yes, the tool can do some analyses, compute some data, and say that. The user has the opportunity to look at the possible duplicate and confirm it; in any case, the manager also receives a notification about this possibility.
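
To make this scenario a bit more concrete, here is a minimal sketch of what "compute some data and say that" could look like. It is only an illustration under my own assumptions: the similarity measure (Jaccard overlap between word sets), the 0.5 threshold, the function names, and the sample reports are all hypothetical, and a real tool would probably need something more robust.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def possible_duplicates(new_report, existing_reports, threshold=0.5):
    """Return (report_id, score) pairs at or above the threshold, best first.

    The threshold is an arbitrary assumption: lower values flag more
    reports (more false positives), higher values miss more duplicates.
    """
    scored = [(rid, jaccard(new_report, text))
              for rid, text in existing_reports.items()]
    hits = [(rid, score) for rid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical repository content, invented for the example.
repository = {
    101: "Application crashes when saving a file with a long name",
    102: "Login button does not respond on the settings page",
}
new_report = "Crashes when saving file with long name"

candidates = possible_duplicates(new_report, repository)
for report_id, score in candidates:
    print(f"Possible duplicate of #{report_id} (similarity {score:.2f})")
```

The list of candidates is what the tool would show to the user and send to the manager; the decision about whether it really is a duplicate stays with them.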


3rd – Scenario – Yes, we can have some intelligence in the process.

In this case the user has some templates and a well-defined vocabulary for sending the bug report. Maybe with a restricted vocabulary we can be more precise, and the tool can be more accurate in its prediction. Thus, the user sends reports based on some templates, and the tool can do some analyses, better predict the possibility of duplication, and send the notifications.
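
As a rough sketch of the template idea, the restricted vocabulary could be modeled as a few enumerated fields that every report must fill in. The field names and values below (Component, Symptom) are assumptions made up for the illustration, not a proposal for the actual template.

```python
from dataclasses import dataclass
from enum import Enum


class Component(Enum):
    """Hypothetical controlled vocabulary for where the bug occurs."""
    UI = "ui"
    STORAGE = "storage"
    NETWORK = "network"


class Symptom(Enum):
    """Hypothetical controlled vocabulary for what the user observed."""
    CRASH = "crash"
    WRONG_OUTPUT = "wrong output"
    SLOW = "slow"


@dataclass(frozen=True)
class BugReport:
    component: Component
    symptom: Symptom
    summary: str  # free text is still allowed, but is not used for grouping

    def key(self):
        """Structured part of the report, used to group similar reports."""
        return (self.component, self.symptom)


def group_by_key(reports):
    """Group reports that share the same template values."""
    groups = {}
    for report in reports:
        groups.setdefault(report.key(), []).append(report)
    return groups


reports = [
    BugReport(Component.STORAGE, Symptom.CRASH, "Crash when saving a big file"),
    BugReport(Component.UI, Symptom.SLOW, "Settings page takes long to open"),
    BugReport(Component.STORAGE, Symptom.CRASH, "App dies on save"),
]

# Groups with more than one report are candidates for a closer look.
suspects = {key: group for key, group in group_by_key(reports).items()
            if len(group) > 1}
```

Sharing the same template values does not prove duplication; it only narrows down which reports the tool (or a person) should compare in more detail.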


4th – Scenario – Intelligent tool

Whether or not a template is used, while the user is writing the bug report, the tool can perform some analysis and notify the user, with some confidence, that the report may be a duplicate, and show why. Yes, the tool can combine semantic and textual information and present it to the user.

What do you think about it? Can you present more scenarios? Another possibility is a scenario such as: my repository already has duplicate reports [I cannot say that it is a mess, or I may have problems with some companies]; what can a tool do to help me be more productive? This is the case where I am starting a bug report and my tool cannot support the previous scenarios.

Sunday, July 13, 2008

RiSE’s Interviews: Episode 5 – Software Reuse with Dr. Jan Bosch

During the 3rd Workshop for Introducing Reuse in Enterprises (WIRE), I interviewed Jan Bosch, an outstanding researcher working in the software architecture and software product lines area. Nowadays, Jan is a VP at Intuit. You can listen to it here.

I would like to thank Jan for the interview and for his patience with the noise at the hotel during this podcast. Because of that noise, I am publishing the questions here to make the interview easier to follow.

1 - You started your career as a software engineering professor in the Netherlands, after that you moved to Nokia Research Center, and nowadays you are working at Intuit in the U.S. I would like to know about these experiences and your work at the university and in industry, your challenges as a professor and later in industry as a VP, and how the road there was, because it is an incredible career.

2 - You worked a lot with software architecture and software product lines. For you, what is the importance of the industry in the field?

3 - In the software product line area, we can see the community growing. Last year, SPLC had more than 200 participants, many of them from industry. In your opinion, what are the ingredients for the success of this conference?

4 - For you, is there a difference between domain engineering and software product lines?

5 - You had/have many projects with industry. What are the main problems in introducing software product lines in companies?
What are the risks and how can they be avoided? Finally, how can a company define a roadmap to get started?

6 - Some companies believe that software product lines can be a good approach to obtain benefits related to time-to-market, cost reduction, etc. However, we do not have many specific models to show the risks, the benefits, the economics, etc. So, how can we show companies that a software product line approach can be good?

7 - You had many industrial projects in the software product lines area. For you, what were the strong points, the weak points, and the main lessons learned?

8 - How to introduce software product lines in a software factory working with different domains?

9 - In the reuse field, we can see ideas like modules, objects, components, and software product lines, and other ones such as models, services, and DSLs. For you, what can be the next one?

10 - For you, what is the state of the practice in the area, and what are the directions for future research?

Thursday, July 10, 2008

Do you care about source code as you should?

Diomidis Spinellis has carried out interesting research on communication through source code in Open Source (OS) projects, and we can summarize it by paraphrasing him: "... it confirms my belief that source code is the most important artifact of the software development process".

In this research, Spinellis computed the amount of source code that is of no interest to the compiler in 30 OS projects. The results showed that, for most projects, more than 60% of the source code consists of content used to make the source code easier to understand. The figure below summarizes the results.
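
To give a rough feeling for this kind of measurement, here is a small sketch that counts how many lines of a file exist only for human readers. It is not the method used in the research; it is a much cruder approximation that I am assuming for illustration: it only looks at blank lines and lines starting with a `//` comment, and the function name and sample code are made up.

```python
def human_oriented_share(source, comment_prefix="//"):
    """Fraction of lines that are blank or comment-only.

    A crude proxy: it ignores block comments, trailing comments,
    indentation, and identifier names, all of which also serve readers.
    """
    lines = source.splitlines()
    if not lines:
        return 0.0
    human = sum(
        1 for line in lines
        if not line.strip() or line.strip().startswith(comment_prefix)
    )
    return human / len(lines)


# Hypothetical C-like snippet used only as input data.
sample = """// Compute the sum of two integers.
int add(int a, int b) {

    // Plain addition, no overflow check.
    return a + b;
}
"""

share = human_oriented_share(sample)
print(f"{share:.0%} of the lines are there only for human readers")
```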


I absolutely agree with him when he says that source code is the most important asset. Currently, in the OS world, reuse is all about reusing source code. Thus, understanding what some source code does and how it does it is very important in order to reuse it.

Efforts have concentrated on defining architecture and design processes (among other software aspects) to facilitate software reuse; however, we still look at the source code before reusing it. Thus, such efforts seem to forget that some programming language (that is, source code) will be used to make the project run.

Furthermore, most of the projects analyzed by Spinellis are successful open source projects, which is another reason to believe that source code is the most important artifact of software development.

So remember, if your project is going to evolve in the future, the source code is probably the first place developers will look.

Monday, July 7, 2008

Product Line Stakeholders - Do you know them?


Managers, Architects, Analysts, Testers, Software Engineers, CEOs...

This question came up in a discussion with RiSE members about who the stakeholders are in a product line approach.

The idea is to define who they are, what they do, and in which phase of the life cycle they participate in the process. We would like to know your opinion, based on your research, practical experience, or just ideas.

20th International Conference on Software Engineering and Knowledge Engineering (SEKE)

Last week, in San Francisco, California, I participated in the 20th International Conference on Software Engineering and Knowledge Engineering (SEKE). The conference this year had roughly 120 attendees from more than 30 countries. During these three days, the conference discussed several aspects related to software engineering and knowledge engineering. In particular, important topics were related to neural networks in software development (prediction systems), clone detection, component selection approaches, and product lines.

In this conference, the main drawback was the number of parallel tracks (3-4). On the other hand, the good point was the number of accepted papers from Brazil. There, RiSE Labs presented a paper entitled "A Systematic Process for Domain Engineering", discussing how to create a family of reusable assets in a specific domain.