Software development is still considered a bottleneck for Small and Medium Enterprises (SMEs) in the advancement of the Information Society. SMEs usually collect and store a large amount of textual software documentation; these documents could be profitably used to help them use (and reuse) Software Engineering methods for systematically designing their applications, thus reducing software development costs. Specific, semantics-based textual filtering and search mechanisms that support the identification of processes and practices suited to the enterprise's needs are fundamental in this context. To this aim, we present an automatic document retrieval method based on semantic similarity and Word Sense Disambiguation techniques. The proposal leverages the strengths of both classic information retrieval and knowledge-based techniques, exploiting syntactic and semantic information provided by general and domain-specific knowledge sources. For any SME, it is as easy and as generally applicable as the search techniques offered by common enterprise Content Management Systems. Our method was developed within the FACIT-SME European FP7 project, whose aim is to facilitate the diffusion of Software Engineering methods and best practices among SMEs. As shown by a detailed experimental evaluation, the achieved effectiveness goes well beyond that of typical retrieval solutions.
Roughly speaking, these techniques look for documents containing the same terms specified by the user query.
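The term-matching idea can be illustrated with a minimal sketch. It assumes a naive tokenizer (lower-casing, alphanumeric runs) and ranks documents by the number of query terms they share; the function names `tokenize` and `keyword_search` and the sample documents are hypothetical and not taken from any specific system.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query: str, documents: dict[str, str]) -> list[str]:
    """Return ids of documents sharing at least one term with the query,
    ranked by the number of shared terms (ties broken by document id)."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & tokenize(text))
        if overlap > 0:
            # Negate the overlap so that ascending sort puts best matches first.
            scored.append((-overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


docs = {
    "d1": "Requirements elicitation practices for small enterprises",
    "d2": "Unit testing practices and test automation",
    "d3": "Project budget report",
}
print(keyword_search("testing practices", docs))  # ['d2', 'd1']
```

Note that a document about "verification" would not match the query "testing" at all, which is exactly the kind of limitation that motivates semantic techniques.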
IDF is obtained by dividing the total number of documents \(N\) by the number \(n_t\) of documents containing the term \(t\) and then taking the logarithm of that ratio, i.e., \(\mathrm{IDF}(t)=\log(N/n_t)\).
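As a worked example of this definition: with 4 documents, a term occurring in all of them gets \(\log(4/4)=0\), while a term occurring in only one gets \(\ln 4 \approx 1.386\). The sketch below computes this, assuming the natural logarithm (the base only rescales all values) and representing each document as a set of terms; the function name `idf` is illustrative.

```python
import math


def idf(term: str, documents: list[set[str]]) -> float:
    """Inverse document frequency: log(N / n_t), where N is the number of
    documents and n_t the number of documents containing the term.
    Natural logarithm is assumed here."""
    n_t = sum(1 for doc in documents if term in doc)
    if n_t == 0:
        # The ratio is undefined for a term that appears in no document.
        raise ValueError(f"term {term!r} occurs in no document")
    return math.log(len(documents) / n_t)


docs = [
    {"software", "testing", "practice"},
    {"software", "requirements"},
    {"software", "design"},
    {"software", "maintenance"},
]
print(idf("software", docs))  # 0.0
print(idf("testing", docs))   # about 1.386
```

Terms that appear everywhere thus receive zero weight, while rare terms are considered more discriminative.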
We recall that \(t_i\) is said to be a hypernym of \(t_j\) if there exists a meaning of \(t_i\) that includes (i.e., is a hypernym of) a meaning of \(t_j\): for instance, “electronic device” is a hypernym of “computer”.
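The term-level definition ("there exists a meaning of \(t_i\) that includes a meaning of \(t_j\)") can be sketched over a toy sense inventory. Everything here is hypothetical: `TERM_SENSES` and `HYPERNYMS` stand in for a real lexical resource such as WordNet, and hypernymy is assumed to be transitive (following is-a links upward).

```python
# Hypothetical toy sense inventory: each term maps to its senses, and each
# sense maps to its direct hypernym senses (an "is-a" graph).
TERM_SENSES = {
    "computer": {"computer#1"},
    "electronic device": {"electronic_device#1"},
    "device": {"device#1"},
    "mouse": {"mouse#animal", "mouse#pointing_device"},
    "animal": {"animal#1"},
}
HYPERNYMS = {
    "computer#1": {"electronic_device#1"},
    "mouse#pointing_device": {"electronic_device#1"},
    "electronic_device#1": {"device#1"},
    "mouse#animal": {"animal#1"},
}


def ancestors(sense: str) -> set[str]:
    """All senses reachable by following hypernym links (transitive closure)."""
    seen: set[str] = set()
    stack = [sense]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_hypernym(t_i: str, t_j: str) -> bool:
    """True if some sense of t_i is an ancestor of some sense of t_j."""
    senses_i = TERM_SENSES.get(t_i, set())
    return any(senses_i & ancestors(s) for s in TERM_SENSES.get(t_j, set()))


print(is_hypernym("electronic device", "computer"))  # True
print(is_hypernym("computer", "electronic device"))  # False
```

Because the test is existential over senses, "electronic device" is also a hypernym of the ambiguous term "mouse" (via its pointing-device sense), even in a text where "mouse" refers to the animal.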
In the following, we will denote the new sense-discerning techniques as “sense-aware”, while the original ones described in Sect. 3 will be denoted as “all-senses”.
Other approaches to dealing with composite terms, such as the one described in [41], could also be employed.
In this example, we set default thresholds of 10 for \(GSim\) and 0.25 for \(HSim\).
http://www.alfresco.com/. Other commercial tools offer analogous functionality.
Precision is defined as the fraction of retrieved documents that are known to be relevant; recall is the fraction of known relevant documents that were actually retrieved.
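Both measures follow directly from the overlap between the retrieved set and the known relevant set. In the sketch below, returning 0.0 when either set is empty is an assumed convention, since the definitions leave that case open; the name `precision_recall` is illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    Empty sets yield 0.0 (an assumed convention).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents retrieved, 3 known relevant, 2 of them retrieved.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)  # 0.5 and 2/3
```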
The research leading to these results has received funding from the European Community’s Seventh Framework Programme managed by the Research Executive Agency (REA, http://ec.europa.eu/research/rea) ([FP7/2007–2013] [FP7/2007–2011]) under grant agreement no. 243695. Our sincere thanks go to Domenico Beneventano (UniMoRe), Gorka Benguria (ESI), Frank-Walter Jaekel (Fraunhofer IPK) and the other project partners for their support of this research.
Bergamaschi, S., Martoglia, R. & Sorrentino, S. Exploiting semantics for filtering and searching knowledge in a software development context. Knowl Inf Syst 45, 295–318 (2015). https://doi.org/10.1007/s10115-014-0796-1
