Abstract
Software development is still considered a bottleneck for Small and Medium Enterprises (SMEs) in the advance of the Information Society. SMEs usually collect and store a large amount of textual software documentation; these documents could be profitably used to help them use (and re-use) Software Engineering methods for systematically designing their applications, thus reducing software development cost. Specific, semantic textual filtering and search mechanisms that support the identification of processes and practices adequate for the enterprise's needs are fundamental in this context. To this aim, we present an automatic document retrieval method based on semantic similarity and Word Sense Disambiguation techniques. The proposal leverages the strengths of both classic information retrieval and knowledge-based techniques, exploiting syntactic and semantic information provided by general and domain-specific knowledge sources. For any SME, it is as easily and generally applicable as the search techniques offered by common enterprise Content Management Systems. Our method was developed within the FACIT-SME European FP-7 project, whose aim is to facilitate the diffusion of Software Engineering methods and best practices among SMEs. As shown by a detailed experimental evaluation, the achieved effectiveness goes well beyond that of typical retrieval solutions.
Notes
Roughly speaking, these techniques look for documents containing the same terms specified by the user query.
IDF is obtained by dividing the total number of documents by the number of documents containing the term and then taking the logarithm of that ratio.
We recall that \(t_i\) is said to be a hypernym of \(t_j\) if some meaning of \(t_i\) includes (i.e., is a hypernym of) a meaning of \(t_j\); for instance, “electronic device” is a hypernym of “computer”.
In the following, we will denote the new sense-discerning techniques as “sense-aware”, while the original ones described in Sect. 3 will be denoted as “all-senses”.
Other approaches to dealing with composite terms, such as the one described in [41], could be employed.
In this example, we set for \(GSim\) a default threshold of 10 and for \(HSim\) a default threshold of 0.25.
http://www.alfresco.com/. Other commercial tools offer analogous functionalities.
Precision is defined as the fraction of retrieved documents that are known to be relevant; recall is the fraction of known relevant documents that were actually retrieved.
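As an illustration of the IDF, precision, and recall definitions given in the notes above, the following is a minimal sketch and not the implementation used in the paper. The function names `idf` and `precision_recall`, the choice of the natural logarithm (the note does not fix a base), the representation of documents as sets of terms, and the sample data are all assumptions made for this example.

```python
import math


def idf(term, documents):
    """Inverse document frequency: log(N / df), as described in the note.

    `documents` is a list of term collections (hypothetical representation);
    the natural logarithm is an assumption, since no base is specified.
    """
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        # The ratio is undefined when no document contains the term.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(len(documents) / containing)


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set and a relevant set.

    Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved.
    Empty inputs yield 0.0 by convention (an assumption of this sketch).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy collection of four documents, each reduced to its set of terms.
    docs = [
        {"software", "process"},
        {"software", "metric"},
        {"quality"},
        {"process", "quality"},
    ]
    print("idf(software):", idf("software", docs))

    # Four documents retrieved, three known to be relevant, two in common.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print("precision:", p, "recall:", r)
```

A term that occurs in fewer documents receives a higher IDF, which is why rare terms weigh more in term-based retrieval; precision and recall are then computed per query against a manually established set of relevant documents.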
References
Aetic (Spain), Agoria (Belgium), AssInform (Italy) et al (2008) In: Position paper towards a European software strategy presented to commissioner Viviane Reding
Baeza-Yates RA, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley Longman Publishing, Boston
Banerjee S, Pedersen T (2003) Extended gloss overlaps as a measure of semantic relatedness. In: Gottlob G, Walsh T (eds) IJCAI. Morgan Kaufmann, San Francisco, USA, pp 805–810
Beneventano D, Bergamaschi S, Guerra F, Vincini M (2001) The MOMIS approach to information integration. In: 3rd international conference on enterprise information systems (ICEIS), pp 194–198
Beneventano D, Bergamaschi S, Guerra F, Vincini M (2005) Querying a super-peer in a schema-based super-peer network. In: DBISP2P, vol 4125 of Lecture Notes in Computer Science. Springer, Berlin, pp 13–25
Bergamaschi S, Martoglia R, Sorrentino S (2012) A semantic method for searching knowledge in a software development context. In: SEBD, pp 115–122
Binkley D, Lawrie D (2010) Maintenance and evolution: information retrieval applications. In: Laplante PA (ed) Encyclopedia of software engineering. Taylor & Francis, London, UK, pp 454–463
Carnegie Mellon University Software Engineering Institute (2006) In: CMMI for development, version 1.2 (pdf)
Constantopoulos P, Jarke M, Mylopoulos J, Vassiliou Y (1995) The software information base: a server for reuse. VLDB Journal 4:1–43
DG INFSO Internal Reflection Group on Software Technologies, ITEA (April 2002)
Diaconis P, Graham RL (1977) Spearman’s footrule as a measure of disarray. R Stat Soc Ser B 32(24):262–268
DIS 9001:2000 Quality Management Systems: requirement (pdf) (1999) In: ISO TC176
FACIT-SME Project (2010–2012) Facilitate IT-providing SMEs by operation-related models and methods. http://www.facit-sme.eu/
Gall CS, Lukins SK, Etzkorn LH, Gholston S, Farrington P, Utley DR, Fortune J, Virani S (2008) Semantic software metrics computed from natural language design specifications. IET Software 2(1), 17–26. http://dblp.uni-trier.de/db/journals/iee/iet-s2.html
Garg A, Goyal DP, Lather AS (2010) The influence of the best practices of information system development on software SMEs: a research scope. IJBIS 5(3), 268–290. http://dblp.uni-trier.de/db/journals/ijbis/ijbis5.html
Girardi MR, Ibrahim B (1994) A similarity measure for retrieving software artifacts. In: SEKE. Knowledge Systems Institute, pp 478–485
Grandi F, Mandreoli F, Martoglia R, Ronchetti E, Scalas MR, Tiberio P (2008) Ontology-based personalization of e-government services. In: Mourlas Constantinos, Germanakos Panagiotis (eds) Intelligent user interfaces: adaptation and personalization systems and technologies. IGI Global, Hershey, pp 167–187
Gyimothy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31(10):897–910. doi:10.1109/TSE.2005.112
Happel HJ, Korthaus A, Seedorf S, Tomczyk P (2006) Kontor: an ontology-enabled approach to software reuse. In: Zhang K, Spanoudakis G, Visaggio G (eds) SEKE, pp 349–354
Happel HJ, Seedorf S (2006) Applications of ontologies in software engineering. Engineering, pp 1–14
Happel H, Maalej W, Stojanovic L (2008) Team: towards a software engineering semantic web. In: Proceedings of the 2008 international workshop on cooperative and human aspects of software engineering, CHASE 2008. Leipzig, Germany, Tuesday, May 13, pp 57–60
InnoSME Project (2008) InnoSME Project, a support action of the ICT program. http://cordis.europa.eu/news/rcn/28963_en.html
Kiefer C, Bernstein A, Tappolet J (2007) Mining software repositories with iSPARQL and a software evolution ontology. In: MSR. IEEE Computer Society, p 10
Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification, chapter 11. The MIT Press, Cambridge
Lethbridge TC, Singer J, Forward A (2003) How software engineers use documentation: the state of the practice. IEEE Softw 20(6):35–39. doi:10.1109/MS.2003.1241364
Leung H, Liao L, Qu Y (2005) A software process ontology and its application. In: The 4th international semantic web conference
Mandreoli F, Martoglia R (2011) Knowledge-based sense disambiguation (almost) for all structures. Inf Syst 36(2):406–430
Mandreoli F, Martoglia R, Penzo W, Sassatelli S (2009) Data-sharing p2p networks with semantic approximation capabilities. IEEE Internet Comput 13(5):60–70
Mandreoli F, Martoglia R, Ronchetti E (2005) Versatile structural disambiguation for semantic-aware applications. In: Herzog O, Schek HJ, Fuhr N, Chowdhury A, Teiken W (eds) CIKM. ACM, pp 209–216
Martoglia R (2011) Facilitate IT-providing SMEs in software development: a semantic helper for filtering and searching knowledge. In: SEKE. pp 130–136
Miller GA (1994) WordNet: a lexical database for English. In: HLT. Morgan Kaufmann, Burlington
Mylopoulos J, Borgida A, Jarke M, Koubarakis M (1990) Telos: representing knowledge about information systems. ACM Trans Inf Syst 8(4):325–362. doi:10.1145/102675.102676
Navigli R (2009) Word sense disambiguation: a survey. ACM Comput Surv 41:1–69
ORM Architecture and Engineering Models (2010) In: Jaekel FW (eds) FP7-SME FACIT-SME (FP7-243695), deliverable
OSES Architecture and Component Specification (2010) In: Benguria G (eds) FP7-SME FACIT-SME (FP7-243695), deliverable
Palmer M, Dang HT, Fellbaum C (2007) Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Nat Lang Eng 13(2):137–163
Po L, Sorrentino S (2011) Automatic generation of probabilistic relationships for improving schema matching. Inf Syst 36(2):192–208
Poshyvanyk D, Marcus A (2006) The conceptual coupling metrics for object-oriented systems. In: Proceedings of the 22nd IEEE international conference on software maintenance, ICSM ’06. IEEE Computer Society, Washington, DC, USA, pp 469–478. doi:10.1109/ICSM.2006.67
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5):513–523
Shepherd D, Fry ZP, Hill E, Pollock LL, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Barry BM, de Moor O (eds) AOSD, vol 208 of ACM international conference proceeding series. ACM, pp 212–224
Sorrentino S, Bergamaschi S, Gawinecki M (2011) NORMS: an automatic tool to perform schema label normalization. In: Abiteboul S, Böhm K, Koch C, Tan K-L (eds) ICDE. IEEE Computer Society, Washington, pp 1344–1347
Soydan K (2006) An OWL ontology for representing the CMMI-SW model. In: 2nd international workshop on semantic web enabled software engineering (SWESE 2006) at ISWC 06. http://km.aifb.uni-karlsruhe.de/ws/swese2006/final/soydan-full.pdf
Sridhara G, Hill E, Pollock L, Vijay-Shanker K (2008) Identifying word relations in software: a comparative study of semantic similarity tools. In: Proceedings of the 2008 the 16th IEEE international conference on program comprehension. ICPC ’08, IEEE Computer Society, Washington, DC, USA, pp 123–132. doi:10.1109/ICPC.2008.18
Standish Group (2006) In: Chaos report 2006
Trakarnviroj A, Prompoon N (2012) A storage and retrieval of requirement model and analysis model for software product line. In: Proceedings of the international multi conference of engineers
Udomchaiporn A, Prompoon N, Kanongchaiyos P (2006) Software requirements retrieval using use case terms and structure similarity computation. In: Proceedings of the XIII Asia Pacific software engineering conference, APSEC ’06. IEEE Computer Society, Washington, DC, USA, pp 113–120. doi:10.1109/APSEC.2006.53
Varghese M, Systems C (2012) Content strategy for small and medium enterprises (SMEs). Best Practices 14(5)
Witte R, Zhang Y, Rilling J (2007) Empowering software maintainers with semantic web technologies. In: Proceedings of the 4th European conference on the semantic web: research and applications, ESWC ’07. Springer, Berlin, pp 37–52. doi:10.1007/978-3-540-72667-8_5
Acknowledgments
The research leading to these results has received funding from the European Community’s Seventh Framework Programme managed by REA Research Executive Agency (http://ec.europa.eu/research/rea) ([FP7/2007-2013] [FP7/2007–2011]) under Grant agreement n. 243695. Our sincere thanks to Domenico Beneventano (UniMoRe), Gorka Benguria (ESI), Frank-Walter Jaekel (Fraunhofer IPK) and to the other project partners for their support to this research.
Cite this article
Bergamaschi, S., Martoglia, R. & Sorrentino, S. Exploiting semantics for filtering and searching knowledge in a software development context. Knowl Inf Syst 45, 295–318 (2015). https://doi.org/10.1007/s10115-014-0796-1
DOI: https://doi.org/10.1007/s10115-014-0796-1