
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training

Published: 09 April 2022

Abstract

Source code representation learning is the foundation for applying artificial intelligence to many software engineering tasks, such as code clone detection, algorithm classification, and code summarization. Recently, many works have tried to improve source code representations from various perspectives, e.g., by introducing the structural information of programs into the latent representation. However, when dealing with the rapidly expanding unlabeled, cross-language source code datasets available on the Internet, two issues remain. First, deep learning models for many code-specific tasks still suffer from a lack of high-quality labels. Second, the structural differences among programming languages make it more difficult to process multiple languages in a single neural architecture.
To address these issues, in this article we propose XCode, a novel method for cross-language code representation with large-scale pre-training. Concretely, we use abstract syntax trees and ELMo-enhanced variational autoencoders to obtain multiple pre-trained source code language models, trained on about 1.5 million code snippets. To fully exploit the knowledge shared across programming languages, we further propose a Shared Encoder-Decoder (SED) architecture that uses a multi-teacher single-student method to distill knowledge from the aforementioned pre-trained models into the SED. The pre-trained models and the SED then cooperate to better represent the source code. For evaluation, we examine our approach on three typical downstream cross-language tasks, i.e., source code translation, code clone detection, and code-to-code search, using a real-world dataset composed of programming exercises with multiple solutions. Experimental results demonstrate the effectiveness of our approach for cross-language code representation, and it significantly outperforms several code representation baselines across downstream tasks on multiple automatic evaluation metrics.
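The distillation step just described, in which several per-language pre-trained teachers transfer knowledge into one shared student, can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the authors' implementation: the `SharedEncoderDecoder` module, the temperature `T`, the mixing weight `alpha`, and all other names are hypothetical, introduced only for exposition.

```python
# Hypothetical sketch of multi-teacher single-student distillation into a
# shared encoder-decoder (SED) student; all names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoderDecoder(nn.Module):
    """Toy stand-in for the distilled SED student."""
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.encoder(self.embed(tokens))
        return self.out(h)  # per-token logits, shape (batch, length, vocab)

def distill_step(student, teachers, batches, optimizer,
                 T: float = 2.0, alpha: float = 0.5) -> float:
    """One multi-teacher -> single-student update.

    `teachers` maps a language name to its frozen pre-trained model;
    `batches` maps the same language to a (tokens, labels) pair drawn from
    that language's corpus. The student matches each teacher's softened
    output distribution (KL term) while still fitting the hard labels.
    """
    optimizer.zero_grad()
    total = 0.0
    for lang, teacher in teachers.items():
        tokens, labels = batches[lang]
        with torch.no_grad():                 # teachers stay frozen
            t_logits = teacher(tokens)
        s_logits = student(tokens)
        # Soft-target distillation loss, scaled by T^2 as in Hinton et al.
        kd = F.kl_div(
            F.log_softmax(s_logits / T, dim=-1),
            F.softmax(t_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label loss on the same batch.
        ce = F.cross_entropy(s_logits.flatten(0, 1), labels.flatten())
        total = total + alpha * kd + (1 - alpha) * ce
    total.backward()                          # one step over all languages
    optimizer.step()
    return float(total)
```

In this sketch each frozen teacher supplies softened token distributions on its own language's batch, and the student accumulates the per-language losses before a single optimizer step; this mirrors the multi-teacher single-student idea at a high level, not the paper's exact objective.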




      Published In

      ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 3
      July 2022
      912 pages
      ISSN:1049-331X
      EISSN:1557-7392
      DOI:10.1145/3514181
      • Editor: Mauro Pezzè

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 April 2022
      Accepted: 01 December 2021
      Revised: 01 November 2021
      Received: 01 December 2020
      Published in TOSEM Volume 31, Issue 3


      Author Tags

      1. Deep learning
      2. neural networks
      3. code representation
      4. cross-language
      5. pre-training

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China
      • MoE Engineering Research Center of Digital Library
      • Chinese Knowledge Center for Engineering Sciences and Technology
      • National Engineering Research Center for Big Data Technology and System
      • Fundamental Research Funds for the Central Universities


      Article Metrics

      • Downloads (last 12 months): 442
      • Downloads (last 6 weeks): 41
      Reflects downloads up to 23 Nov 2024

      Cited By
      • (2024) A Two-stage Coarsening Method for a Streaming Graph with Preserving Key Features. Proceedings of the 2024 International Conference on Generative Artificial Intelligence and Information Security, 253-260. DOI: 10.1145/3665348.3665392. Online publication date: 3-Jul-2024.
      • (2024) PTM-APIRec: Leveraging Pre-trained Models of Source Code in API Recommendation. ACM Transactions on Software Engineering and Methodology 33, 3, 1-30. DOI: 10.1145/3632745. Online publication date: 15-Mar-2024.
      • (2024) A study of common bug fix patterns in Rust. Empirical Software Engineering 29, 2. DOI: 10.1007/s10664-023-10437-1. Online publication date: 12-Feb-2024.
      • (2023) A Survey of Learning-based Automated Program Repair. ACM Transactions on Software Engineering and Methodology 33, 2, 1-69. DOI: 10.1145/3631974. Online publication date: 23-Dec-2023.
      • (2023) Rise of the Planet of Serverless Computing: A Systematic Review. ACM Transactions on Software Engineering and Methodology 32, 5, 1-61. DOI: 10.1145/3579643. Online publication date: 21-Jul-2023.
      • (2023) Seeing the Whole Elephant: Systematically Understanding and Uncovering Evaluation Biases in Automated Program Repair. ACM Transactions on Software Engineering and Methodology 32, 3, 1-37. DOI: 10.1145/3561382. Online publication date: 27-Apr-2023.
      • (2023) Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks. IEEE Transactions on Software Engineering 49, 11, 4846-4868. DOI: 10.1109/TSE.2023.3311796. Online publication date: 6-Sep-2023.
      • (2023) Multi-Granularity Detector for Vulnerability Fixes. IEEE Transactions on Software Engineering 49, 8, 4035-4057. DOI: 10.1109/TSE.2023.3281275. Online publication date: 1-Aug-2023.
      • (2023) Invalidator: Automated Patch Correctness Assessment Via Semantic and Syntactic Reasoning. IEEE Transactions on Software Engineering 49, 6, 3411-3429. DOI: 10.1109/TSE.2023.3255177. Online publication date: 1-Jun-2023.
      • (2023) Effective Isolation of Fault-Correlated Variables via Statistical and Mutation Analysis. IEEE Transactions on Software Engineering 49, 4, 2053-2068. DOI: 10.1109/TSE.2022.3209590. Online publication date: 1-Apr-2023.
