World of Reuse: data mining

Wednesday, September 2, 2009

35th Euromicro Conference on Software Engineering and Advanced Applications

Last week, on 27-29 August, the 12th Euromicro Conference on Digital System Design (DSD) and the 35th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2009 took place.

Both conferences were held at the Cultural and Conference Center of the University of Patras. The event brought together researchers from around the world, all interested in discussing new ideas, work in progress, and completed work. The RiSE group was represented by Yguaratã Cerqueira Cavalcanti in the SEAA 2009 sessions, where he presented three papers from the group:

1 - Martins, A. C.; Garcia, V. C.; Almeida, E. S.; Meira, S. R. L. Suggesting Software Components for Reuse in Search Engines Using Discovered Knowledge Techniques, 35th IEEE EUROMICRO Conference on Software Engineering and Advanced Applications (SEAA), Service and Component Based Software Engineering (SCBSE) Track, Patras, Greece, 2009.

2 - Neiva, D. F. S.; Almeida, E. S.; Meira, S. R. L. An Experimental Study on Requirements Engineering for Software Product Lines, 35th IEEE EUROMICRO Conference on Software Engineering and Advanced Applications (SEAA), Service and Component Based Software Engineering (SCBSE) Track, Short Paper, Patras, Greece, 2009.

3 - Silva, F. R. C.; Almeida, E. S.; Meira, S. R. L. A Component Testing Approach Supported by a CASE Tool, 35th IEEE EUROMICRO Conference on Software Engineering and Advanced Applications (SEAA), Service and Component Based Software Engineering (SCBSE) Track, Short Paper, Patras, Greece, 2009.

The paper "A Component Testing Approach Supported by a CASE Tool" was presented in the SCBSE: Component-based Systems Correctness and Test session. Several other articles were presented in the same session, showing really interesting approaches.

The paper "Suggesting Software Components for Reuse in Search Engines Using Discovered Knowledge Techniques" was presented in the SCBSE: Experiences and Applications session, and the paper "An Experimental Study on Requirements Engineering for Software Product Lines" was presented in the SPPI: Empirical Approaches session.

All the work presented was very interesting. People showed a lot of new ideas for solving the best-known problems in SCBSE. The importance of the empirical approaches session should be emphasized, since most CS work lacks well-made empirical validation.

Oh, we also had an amazing gala dinner organized by the Euromicro committee, in front of a very beautiful beach. There we could taste really nice Greek food, watch some Greek dancing, and listen to Greek music. Really nice!!!

The next Euromicro will take place in Lille, France. I hope to see you there.

Saturday, September 29, 2007

Using data mining to improve search engines

I'll present some ideas related to the use of data mining to improve search engines. The first question is: where is the data from which you will extract knowledge? The focus of this discussion is on using historical data, such as log files, as a record of the real use of a search engine. On the other hand, we need to select the techniques used to extract the knowledge hidden in the raw data. The literature offers several techniques, such as classification, clustering, sequence analysis, and association rules, among others [see 1].
The direction selected in this discussion is to use association rules [see 2] to analyze the relations between the data stored in the log file. These relations are used to help users through suggestions, such as related queries or download options.
The paper selected for the RiSE discussion was “Using Association Rules to Discover Search Engines Related Queries”, which shows the use of association rules to extract related queries from a log generated by a web site.
The first question was about the transformation of log files into so-called “user sessions”: why do it? It is important because the algorithms used to extract association rules need the records to be grouped into a set of transactions. However, when log files are used, these groups are not cleanly separated. The classic application of association rules is Market Basket Analysis [1], which is associated with the organization of products in a supermarket. In that case, the sessions are defined by the customer's receipt: the products bought are clearly listed and separated from those of other customers. In a log file, however, the records are not sorted, and it is necessary to split the lines into a set of transactions. Each transaction contains several log lines that represent the activity of one user. In the paper, the IP address and a time window were used. My work uses the session ID to identify users during a time window.
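
To make the idea of user sessions concrete, here is a minimal sketch in Python. It is not the procedure from the paper: the record layout (user key, timestamp, query), the function name build_sessions, and the 30-minute gap are all assumptions chosen for illustration. The user key could be an IP address, as in the paper, or a session ID.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_sessions(log_records, window=timedelta(minutes=30)):
    """Group (user_key, timestamp, query) records into transactions.

    Assumption: a new transaction starts whenever the gap between two
    consecutive records of the same user is longer than `window`.
    """
    by_user = defaultdict(list)
    for user_key, timestamp, query in log_records:
        by_user[user_key].append((timestamp, query))

    sessions = []
    for events in by_user.values():
        events.sort(key=lambda e: e[0])  # log lines are not assumed to be sorted
        current, last_seen = set(), None
        for timestamp, query in events:
            if last_seen is not None and timestamp - last_seen > window:
                sessions.append(current)  # gap too long: close the transaction
                current = set()
            current.add(query)
            last_seen = timestamp
        if current:
            sessions.append(current)
    return sessions


# Hypothetical log lines, invented for this example.
log = [
    ("10.0.0.1", datetime(2007, 9, 29, 10, 0), "xml parser"),
    ("10.0.0.2", datetime(2007, 9, 29, 10, 1), "logging"),
    ("10.0.0.1", datetime(2007, 9, 29, 10, 5), "sax parser"),
    ("10.0.0.1", datetime(2007, 9, 29, 15, 0), "jdbc pool"),
]
sessions = build_sessions(log)
```

Each resulting set plays the role of one customer receipt in the market basket case. A fixed time window instead of a gap between records would be an equally reasonable choice.
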
The quality of the recommendations was also discussed. This quality is measured using metrics such as support and confidence. However, the threshold values used are specific to each situation.
This approach is commonly used on the web, but the same idea can be used to improve component search engines, by recommending downloads, queries, and any other information stored in log files.
I have some criticisms of this paper, such as the lack of detail about the data mining process: several algorithms can be used, so which one was used? Another question concerns the experiment: I think that choosing a specific domain from which to extract the rules would allow a specialist in that domain to validate the suggestions.
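
As an illustration of these two metrics, the sketch below computes support and confidence for rules of the form "query A suggests query B" over a list of transactions. It is a brute-force pairwise count, not Apriori or whatever algorithm the paper used, and the function name related_query_rules, the thresholds, and the sample transactions are assumptions made up for this example.

```python
from itertools import permutations


def related_query_rules(sessions, min_support=0.2, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) tuples.

    support    = fraction of sessions that contain both queries
    confidence = sessions with both queries / sessions with the antecedent
    """
    total = len(sessions)
    if total == 0:
        return []
    queries = sorted({q for s in sessions for q in s})
    count = {q: sum(1 for s in sessions if q in s) for q in queries}

    rules = []
    for a, b in permutations(queries, 2):
        both = sum(1 for s in sessions if a in s and b in s)
        support = both / total
        confidence = both / count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical transactions, invented for this example.
example_sessions = [
    {"xml parser", "sax parser"},
    {"xml parser", "sax parser"},
    {"xml parser"},
    {"jdbc pool"},
]
for a, b, support, confidence in related_query_rules(example_sessions):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

With this sample, the rule from "sax parser" to "xml parser" has a higher confidence than the reverse rule, because every session that contains "sax parser" also contains "xml parser". Raising or lowering the two thresholds changes which suggestions survive, which is why they have to be tuned for each situation.
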
