Thursday, June 10, 2010

Towards a Reuse Reference Collection

The idea of software reuse has been around for more than four decades, and the technology for searching and retrieving reusable software artifacts has certainly grown out of its infancy (cf. e.g. CodeConjurer). After about 30 years of basic research in which scientists often struggled to get their hands on meaningful numbers of reusable artifacts to evaluate their prototypes, the "open source revolution" has made software reuse a serious practical possibility. Millions of reusable files have become freely available, and more sophisticated retrieval tools have emerged that provide better ways of searching among them.

However, while the development of such systems has made considerable progress, their evaluation is still largely driven by proprietary approaches that are all too often neither comprehensive nor comparable to one another. Consequently, it is hard, if not impossible, to assess whether existing tools are really beneficial in a practical context.

Driven by these shortcomings, I submitted a paper ("Facilitating the Comparison of Software Retrieval Systems through a Reference Reuse Collection") to the SUITE workshop at ICSE in Cape Town, where we discussed this challenge and agreed to start creating a reference reuse collection. Meanwhile, the University of California, Irvine and the University of Mannheim have started a first initiative and shared reusable material from their Sourcerer and merobase repositories (which together comprise far more than 50,000 open source projects) with the scientific community.

Clearly, we would appreciate it if other researchers joined this initiative and shared their data, so that future comparisons of reuse tools rest on a broad basis. The next steps required for this undertaking are briefly outlined in the paper mentioned above, but as always, the devil is in the details, and hence there are plenty of opportunities to contribute to this project.

1 comment:

Eduardo Almeida said...

Hi Oliver, good to hear from you. Yes, since Doug McIlroy's paper this has been the challenge. Search and retrieval, one of the main problems, has several solutions. An old but very nice review by Ali Mili et al. describes more than 50 solutions to the problem. Nowadays, we have even more.

I agree with you on this direction. We need to define a test bed for research and development in this area. BUT, I think that, as important as it is for researchers to make their prototypes available so that others can perform experiments evaluating recall, precision, and so on, it is just as important that software engineers try these tools out so that usability issues are evaluated as well.

In this sense, I think that RiSE can contribute, and we will try. In addition, I would like to investigate the qualitative side of this issue further. Why don't we have good code? What is missing? I have a draft paper about it, and maybe we can discuss it here.