research-article

Exploiting Code Knowledge Graph for Bug Localization via Bi-directional Attention

Authors:

Shikun Zhang

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension

Pages 219 - 229

https://doi.org/10.1145/3387904.3389281

Published: 12 September 2020

Abstract

Bug localization automatically locates the source files relevant to a natural-language bug description within a software project. In a large project containing hundreds or thousands of source files, developers spend considerable time understanding the bug reports generated by quality assurance and locating the buggy source files. Traditional methods depend heavily on information retrieval techniques, which rank source files by their lexical similarity to bug reports. More recently, deep-learning-based models have been used to extract semantic information from code, yielding significant improvements in bug localization. However, programming languages are highly structural and logical, and code contains various relations both within and across source files. We therefore propose KGBugLocator, which uses knowledge graph embeddings to capture these interrelations in code, together with a keyword-supervised bi-directional attention mechanism that regularizes the model with interactive information between source files and bug reports. Extensive experiments on four different projects show that our model achieves new state-of-the-art (SOTA) performance for bug localization.
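The abstract says KGBugLocator uses knowledge graph embeddings to capture relations within and across source files, but it does not name the embedding model. The sketch below assumes a TransE-style translational model (Bordes et al., 2013), in which a triple (head, relation, tail) is considered plausible when the vector head + relation lies close to tail, and training uses a margin loss against triples with a corrupted tail. The triples, entity and relation names, dimension, margin, and learning rate are hypothetical illustrations, not the paper's configuration.

```python
import math
import random

# Hypothetical code knowledge graph triples (head, relation, tail).
# Entity and relation names are illustrative, not taken from the paper.
TRIPLES = [
    ("Parser", "calls", "Lexer.next"),
    ("Parser", "extends", "BaseParser"),
    ("Lexer", "declares", "Lexer.next"),
]

def unit(vec):
    """Rescale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def distance(h, r, t):
    """TransE energy ||h + r - t||_2; lower means a more plausible triple."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))

def train_transe(triples, dim=16, margin=1.0, lr=0.05, epochs=200, seed=0):
    """Plain SGD on the margin loss max(0, margin + d_pos - d_neg)."""
    rng = random.Random(seed)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    E = {e: unit([rng.uniform(-1, 1) for _ in range(dim)]) for e in entities}
    R = {r: unit([rng.uniform(-1, 1) for _ in range(dim)]) for r in relations}
    for _ in range(epochs):
        for h, r, t in triples:
            # Corrupt the tail with a different random entity (negative sample).
            t_neg = rng.choice([e for e in entities if e != t])
            d_pos = distance(E[h], R[r], E[t])
            d_neg = distance(E[h], R[r], E[t_neg])
            if margin + d_pos - d_neg <= 0:
                continue  # hinge loss already satisfied for this pair
            # Gradients of the two L2 distances with respect to (h + r).
            g_pos = [(a + b - c) / (d_pos or 1e-9) for a, b, c in zip(E[h], R[r], E[t])]
            g_neg = [(a + b - c) / (d_neg or 1e-9) for a, b, c in zip(E[h], R[r], E[t_neg])]
            for i in range(dim):
                E[h][i] -= lr * (g_pos[i] - g_neg[i])
                R[r][i] -= lr * (g_pos[i] - g_neg[i])
                E[t][i] += lr * g_pos[i]
                E[t_neg][i] -= lr * g_neg[i]
            # TransE keeps entity vectors on the unit sphere.
            for e in (h, t, t_neg):
                E[e] = unit(E[e])
    return E, R
```

After training, candidate tails for a given (head, relation) pair can be ranked by `distance`, smallest first, and the entity vectors could be attached to the code elements of a source file as structural features. How KGBugLocator actually feeds such embeddings into its model is not stated in this abstract.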

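The abstract also mentions a keyword-supervised bi-directional attention mechanism between bug reports and source files, without giving its equations. As a minimal sketch loosely modeled on bi-directional attention flow (BiDAF), assume a dot-product similarity matrix between bug-report token embeddings and code token embeddings, normalized row-wise (report attends to code) and column-wise (code attends to report). The mean pooling and cosine relevance score are hypothetical choices, and the keyword supervision signal is not modeled here.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bi_attention(report, code):
    """Bi-directional attention between two token-embedding sequences.

    report: list of m vectors (bug-report tokens)
    code:   list of n vectors (source-file tokens)
    Returns (report_to_code, code_to_report) attention weight matrices.
    """
    sim = [[dot(r, c) for c in code] for r in report]  # m x n similarity matrix
    # Each report token gets a distribution over code tokens.
    report_to_code = [softmax(row) for row in sim]
    # Each code token gets a distribution over report tokens (column-wise).
    cols = [[sim[i][j] for i in range(len(report))] for j in range(len(code))]
    code_to_report = [softmax(col) for col in cols]
    return report_to_code, code_to_report

def attend(weights, values):
    """For each row of attention weights, return the weighted sum of value vectors."""
    dim = len(values[0])
    return [[sum(w * v[k] for w, v in zip(row, values)) for k in range(dim)]
            for row in weights]

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def relevance(report, code):
    """Hypothetical report/file relevance score built on bi-directional attention."""
    r2c, c2r = bi_attention(report, code)
    report_view = mean_pool(attend(r2c, code))    # code content as seen from the report
    code_view = mean_pool(attend(c2r, report))    # report content as seen from the code
    return cosine(report_view, code_view)
```

For example, `relevance(report, code)` takes two lists of token embeddings and returns a score in [-1, 1] that could be used to rank the source files of a project for one bug report. In a trained model the embeddings would come from learned encoders rather than being fixed by hand.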
Published In

ICPC '20: Proceedings of the 28th International Conference on Program Comprehension

July 2020

481 pages

ISBN:9781450379588

DOI:10.1145/3387904

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPC '20

Sponsor:

SIGSOFT

ICPC '20: 28th International Conference on Program Comprehension

July 13 - 15, 2020

Seoul, Republic of Korea

Bibliometrics

Total citations: 30
Total downloads: 799
Downloads (last 12 months): 125
Downloads (last 6 weeks): 24

Reflects downloads up to 25 Nov 2024.

Cited By

Yang B, Li S. 2024. UIGuider: Detecting Implicit Design Guidelines Using a Domain Knowledge Graph Approach. Electronics 13, 7 (1210). Online publication date: 26-Mar-2024. https://doi.org/10.3390/electronics13071210

Chakraborty P, Alfadel M, Nagappan M. 2024. RLocator: Reinforcement Learning for Bug Localization. IEEE Transactions on Software Engineering 50, 10 (2695-2708). Online publication date: 1-Oct-2024. https://dl.acm.org/doi/10.1109/TSE.2024.3452595

Sharma T, Kechagia M, Georgiou S, Tiwari R, Vats I, Moazen H, Sarro F. 2024. A survey on machine learning techniques applied to source code. Journal of Systems and Software 209, C. Online publication date: 14-Mar-2024. https://dl.acm.org/doi/10.1016/j.jss.2023.111934

Wang D, Galster M, Morales-Trujillo M. 2024. A systematic mapping study of bug reproduction and localization. Information and Software Technology 165, C. Online publication date: 1-Jan-2024. https://dl.acm.org/doi/10.1016/j.infsof.2023.107338

Han J, Huang C, Liu J. 2024. bjEnet: a fast and accurate software bug localization method in natural language semantic space. Software Quality Journal 32, 4 (1515-1538). Online publication date: 22-Jul-2024. https://doi.org/10.1007/s11219-024-09693-1

Ma Y, Du Y, Li M, Elkind E. 2023. Capturing the long-distance dependency in the control flow graph via structural-guided attention for bug localization. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (2242-2250). Online publication date: 19-Aug-2023. https://dl.acm.org/doi/10.24963/ijcai.2023/249

Du Y, Yu Z, Chandra S, Blincoe K, Tonella P. 2023. Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (579-591). Online publication date: 30-Nov-2023. https://dl.acm.org/doi/10.1145/3611643.3616338

Casamayor R, Cetina C, Pastor Ó, Pérez F. 2023. Studying the Influence and Distribution of the Human Effort in a Hybrid Fitness Function for Search-Based Model-Driven Engineering. IEEE Transactions on Software Engineering 49, 12 (5189-5202). Online publication date: 1-Dec-2023. https://dl.acm.org/doi/10.1109/TSE.2023.3329730

Zhu Z, Tong H, Wang Y, Li Y. 2023. BL-GAN: Semi-Supervised Bug Localization via Generative Adversarial Network. IEEE Transactions on Knowledge and Data Engineering 35, 11 (11112-11125). Online publication date: 1-Nov-2023. https://doi.org/10.1109/TKDE.2022.3225329

Wei H, Su X, Zheng W, Tao W. 2023. Documentation-Guided API Sequence Search without Worrying about the Text-API Semantic Gap. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) (343-354). Online publication date: Mar-2023. https://doi.org/10.1109/SANER56733.2023.00040
