On the role of semantics in automated requirements tracing

Published: 01 September 2015

Abstract

In this paper, we investigate the potential benefits of utilizing natural language semantics in automated traceability link retrieval. In particular, we evaluate the performance of a wide spectrum of semantically enabled information retrieval methods in capturing and presenting requirements traceability links in software systems. Our objectives are to gain more operational insights into these methods and to provide practical guidelines for the design and development of effective requirements tracing and management tools. To achieve these objectives, we conduct an experimental analysis using three datasets from various application domains. Results show that considering more semantic relations in traceability link retrieval does not necessarily lead to higher quality results. Instead, more focused semantic support that targets specific semantic relations is expected to have a greater impact on the overall performance of tracing tools. In addition, our analysis shows that explicit semantic methods that exploit local or domain-specific sources of knowledge often achieve more satisfactory performance than latent methods or methods that derive semantics from external or general-purpose knowledge sources.

    Published In

    Requirements Engineering  Volume 20, Issue 3
    September 2015
    130 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. Information retrieval
    2. Semantics
    3. Traceability
