Exploiting semantics for filtering and searching knowledge in a software development context

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Software development is still considered a bottleneck for Small and Medium Enterprises (SMEs) in the advance of the Information Society. SMEs usually collect and store a large amount of textual software documentation; these documents could be profitably used to help them use (and re-use) Software Engineering methods for systematically designing their applications, thus reducing software development costs. Specific, semantics-aware textual filtering/search mechanisms that support the identification of processes and practices adequate to the enterprise's needs are fundamental in this context. To this end, we present an automatic document retrieval method based on semantic similarity and Word Sense Disambiguation techniques. The proposal leverages the strengths of both classic information retrieval and knowledge-based techniques, exploiting syntactic and semantic information provided by general and domain-specific knowledge sources. For any SME, it is as easily and generally applicable as the search techniques offered by common enterprise Content Management Systems. Our method was developed within the FACIT-SME European FP-7 project, whose aim is to facilitate the diffusion of Software Engineering methods and best practices among SMEs. As shown by a detailed experimental evaluation, the achieved effectiveness goes well beyond that of typical retrieval solutions.

Notes

  1. Roughly speaking, these techniques look for documents containing the same terms specified by the user query.

  2. http://wordnet.princeton.edu/.

  3. http://www.computer.org/sevocab.

  4. IDF is obtained by dividing the total number of documents by the number of documents containing the term and then by computing the logarithm of that ratio.

  5. Note that Eq. (1) is not meant to be symmetric; instead, it is conceived to facilitate the ranking of documents \(D^y\) w.r.t. document \(D^x\). If symmetry is needed, the summation in (1) can be extended to the terms of both documents.

  6. We recall that \(t_i\) is said to be a hypernym of \(t_j\) if there exists a meaning of \(t_i\) that includes (i.e., is a hypernym of) a meaning of \(t_j\): for instance, “electronic device” is a hypernym of “computer”.

  7. In the following, we will denote the new sense-discerning techniques as “sense-aware”, while the original ones described in Sect. 3 will be denoted as “all-senses”.

  8. Other approaches to dealing with composite terms, such as the one described in [41], could be employed.

  9. In this example, we set for \(GSim\) a default threshold of 10 and for \(HSim\) a default threshold of 0.25.

  10. http://www.alfresco.com/. Other commercial tools offer analogous functionalities.

  11. Precision is defined as the fraction of retrieved documents that are known to be relevant; recall is the fraction of known relevant objects that were actually retrieved.
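
The definitions in notes 4 and 11 can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' implementation: the function names `idf` and `precision_recall`, the representation of each document as a set of terms, and the choice of the natural logarithm (note 4 does not state a base) are all assumptions made for this example.

```python
import math


def idf(term, documents):
    """Inverse document frequency as worded in note 4.

    `documents` is assumed to be a list of sets of terms.
    The natural logarithm is an assumption; the note gives no base.
    """
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        # The ratio is undefined when no document contains the term.
        raise ValueError(f"term {term!r} does not occur in any document")
    return math.log(len(documents) / containing)


def precision_recall(retrieved, relevant):
    """Precision and recall as worded in note 11.

    `retrieved` and `relevant` are assumed to be collections of document ids.
    Returns 0.0 for a measure whose denominator would be zero.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy collection: four documents as sets of terms.
    docs = [{"design", "test"}, {"test"}, {"design"}, {"review"}]
    print(idf("design", docs))

    # Hypothetical result list and relevance judgments.
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"]))
```

A term that occurs in every document gets an IDF of zero under this definition, which is why very common terms contribute little to a ranking.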

References

  1. Aetic (Spain), Agoria (Belgium), AssInform (Italy) et al (2008) In: Position paper towards a European software strategy presented to commissioner Viviane Reding

  2. Baeza-Yates RA, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley Longman Publishing, Boston

  3. Banerjee S, Pedersen T (2003) Extended gloss overlaps as a measure of semantic relatedness. In: Gottlob G, Walsh T (eds) IJCAI. Morgan Kaufmann, San Francisco, USA, pp 805–810

  4. Beneventano D, Bergamaschi S, Guerra F, Vincini M (2001) The MOMIS approach to information integration. In: 3rd international conference on enterprise information systems (ICEIS), pp 194–198

  5. Beneventano D, Bergamaschi S, Guerra F, Vincini M (2005) Querying a super-peer in a schema-based super-peer network. In: DBISP2P, vol 4125 of Lecture Notes in Computer Science. Springer, Berlin, pp 13–25

  6. Bergamaschi S, Martoglia R, Sorrentino S (2012) A semantic method for searching knowledge in a software development context. In: SEBD, pp 115–122

  7. Binkley D, Lawrie D (2010) Maintenance and evolution: information retrieval applications. In: Laplante PA (ed) Encyclopedia of software engineering. Taylor & Francis, London, UK, pp 454–463

  8. Carnegie Mellon University Software Engineering Institute (2006) In: CMMI for development, version 1.2 (pdf)

  9. Constantopoulos P, Jarke M, Mylopoulos J, Vassiliou Y (1995) The software information base: a server for reuse. VLDB Journal 4:1–43

  10. DG INFSO Internal Reflection Group on Software Technologies, ITEA (April 2002)

  11. Diaconis P, Graham RL (1977) Spearman’s footrule as a measure of disarray. R Stat Soc Ser B 32(24):262–268

  12. DIS 9001:2000 Quality Management Systems: requirement (pdf) (1999) In: ISO TC176

  13. FACIT-SME Project (2010–2012) Facilitate IT-providing SMEs by operation-related models and methods. http://www.facit-sme.eu/

  14. Gall CS, Lukins SK, Etzkorn LH, Gholston S, Farrington P, Utley DR, Fortune J, Virani S (2008) Semantic software metrics computed from natural language design specifications. IET Softw 2(1):17–26. http://dblp.uni-trier.de/db/journals/iee/iet-s2.html

  15. Garg A, Goyal DP, Lather AS (2010) The influence of the best practices of information system development on software SMEs: a research scope. IJBIS 5(3):268–290. http://dblp.uni-trier.de/db/journals/ijbis/ijbis5.html

  16. Girardi MR, Ibrahim B (1994) A similarity measure for retrieving software artifacts. In: SEKE. Knowledge Systems Institute, pp 478–485

  17. Grandi F, Mandreoli F, Martoglia R, Ronchetti E, Scalas MR, Tiberio P (2008) Ontology-based personalization of e-government services. In: Mourlas Constantinos, Germanakos Panagiotis (eds) Intelligent user interfaces: adaptation and personalization systems and technologies. IGI Global, Hershey, pp 167–187

  18. Gyimothy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31(10):897–910. doi:10.1109/TSE.2005.112

  19. Happel HJ, Korthaus A, Seedorf S, Tomczyk P (2006) Kontor: an ontology-enabled approach to software reuse. In: Zhang K, Spanoudakis G, Visaggio G (eds) SEKE, pp 349–354

  20. Happel HJ, Seedorf S (2006) Applications of ontologies in software engineering. Engineering, pp 1–14

  21. Happel H, Maalej W, Stojanovic L (2008) Team: towards a software engineering semantic web. In: Proceedings of the 2008 international workshop on cooperative and human aspects of software engineering, CHASE 2008. Leipzig, Germany, Tuesday, May 13, pp 57–60

  22. InnoSME Project (2008) InnoSME Project, a support action of the ICT program. http://cordis.europa.eu/news/rcn/28963_en.html

  23. Kiefer C, Bernstein A, Tappolet J (2007) Mining software repositories with iSPARQL and a software evolution ontology. In: MSR. IEEE Computer Society, p 10

  24. Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification, chapter 11. The MIT Press, Cambridge

  25. Lethbridge TC, Singer J, Forward A (2003) How software engineers use documentation: the state of the practice. IEEE Softw 20(6):35–39. doi:10.1109/MS.2003.1241364

  26. Leung H, Liao L, Qu Y (2005) A software process ontology and its application. In: The 4th international semantic web conference

  27. Mandreoli F, Martoglia R (2011) Knowledge-based sense disambiguation (almost) for all structures. Inf Syst 36(2):406–430

  28. Mandreoli F, Martoglia R, Penzo W, Sassatelli S (2009) Data-sharing p2p networks with semantic approximation capabilities. IEEE Internet Comput 13(5):60–70

  29. Mandreoli F, Martoglia R, Ronchetti E (2005) Versatile structural disambiguation for semantic-aware applications. In: Herzog O, Schek HJ, Fuhr N, Chowdhury A, Teiken W (eds) CIKM. ACM, pp 209–216

  30. Martoglia R (2011) Facilitate IT-providing SMEs in software development: a semantic helper for filtering and searching knowledge. In: SEKE. pp 130–136

  31. Miller GA (1994) WordNet: a lexical database for English. In: HLT. Morgan Kaufmann, Burlington

  32. Mylopoulos J, Borgida A, Jarke M, Koubarakis M (1990) Telos: representing knowledge about information systems. ACM Trans Inf Syst 8(4):325–362. doi:10.1145/102675.102676

  33. Navigli R (2009) Word sense disambiguation: a survey. ACM Comput Surv 41:1–69

  34. ORM Architecture and Engineering Models (2010) In: Jaekel FW (eds) FP7-SME FACIT-SME (FP7-243695), deliverable

  35. OSES Architecture and Component Specification (2010) In: Benguria G (eds) FP7-SME FACIT-SME (FP7-243695), deliverable

  36. Palmer M, Dang HT, Fellbaum C (2007) Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Nat Lang Eng 13(2):137–163

  37. Po L, Sorrentino S (2011) Automatic generation of probabilistic relationships for improving schema matching. Inf Syst 36(2):192–208

  38. Poshyvanyk D, Marcus A (2006) The conceptual coupling metrics for object-oriented systems. In: Proceedings of the 22nd IEEE international conference on software maintenance, ICSM ’06. IEEE Computer Society, Washington, DC, USA, pp 469–478. doi:10.1109/ICSM.2006.67

  39. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5):513–523

  40. Shepherd D, Fry ZP, Hill E, Pollock LL, Vijay-Shanker K (2007) Using natural language program analysis to locate and understand action-oriented concerns. In: Barry BM, de Moor O (eds) AOSD, vol 208 of ACM international conference proceeding series. ACM, pp 212–224

  41. Sorrentino S, Bergamaschi S, Gawinecki M (2011) NORMS: an automatic tool to perform schema label normalization. In: Abiteboul S, Böhm K, Koch C, Tan K-L (eds) ICDE. IEEE Computer Society, Washington, pp 1344–1347

  42. Soydan K (2006) An OWL ontology for representing the CMMI-SW model. In: 2nd international workshop on semantic web enabled software engineering (SWESE 2006) at ISWC 06. http://km.aifb.uni-karlsruhe.de/ws/swese2006/final/soydan-full.pdf

  43. Sridhara G, Hill E, Pollock L, Vijay-Shanker K (2008) Identifying word relations in software: a comparative study of semantic similarity tools. In: Proceedings of the 2008 the 16th IEEE international conference on program comprehension. ICPC ’08, IEEE Computer Society, Washington, DC, USA, pp 123–132. doi:10.1109/ICPC.2008.18

  44. Standish Group (2006) In: Chaos report 2006

  45. Trakarnviroj A, Prompoon N (2012) A storage and retrieval of requirement model and analysis model for software product line. In: Proceedings of the international multi conference of engineers

  46. Udomchaiporn A, Prompoon N, Kanongchaiyos P (2006) Software requirements retrieval using use case terms and structure similarity computation. In: Proceedings of the XIII Asia Pacific software engineering conference, APSEC ’06. IEEE Computer Society, Washington, DC, USA, pp 113–120. doi:10.1109/APSEC.2006.53

  47. Varghese M, Systems C (2012) Content strategy for small and medium enterprises (SMEs). Best Practices 14(5)

  48. Witte R, Zhang Y, Rilling J (2007) Empowering software maintainers with semantic web technologies. In: Proceedings of the 4th European conference on the semantic web: research and applications, ESWC ’07. Springer, Berlin, pp 37–52. doi:10.1007/978-3-540-72667-8_5

Acknowledgments

The research leading to these results has received funding from the European Community’s Seventh Framework Programme managed by REA Research Executive Agency (http://ec.europa.eu/research/rea) ([FP7/2007-2013] [FP7/2007–2011]) under Grant agreement n. 243695. Our sincere thanks go to Domenico Beneventano (UniMoRe), Gorka Benguria (ESI), Frank-Walter Jaekel (Fraunhofer IPK) and the other project partners for their support of this research.

Author information

Corresponding author

Correspondence to Riccardo Martoglia.

About this article

Cite this article

Bergamaschi, S., Martoglia, R. & Sorrentino, S. Exploiting semantics for filtering and searching knowledge in a software development context. Knowl Inf Syst 45, 295–318 (2015). https://doi.org/10.1007/s10115-014-0796-1

  • DOI: https://doi.org/10.1007/s10115-014-0796-1
