LongCoder: a long-range pre-trained language model for code completion
Article No. 486, Pages 12098-12107
Abstract
In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens, bridge tokens and memory tokens, to improve performance and efficiency. Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction, while memory tokens are included to highlight important statements that may be invoked later and need to be memorized, such as package imports and definitions of classes, functions, or structures. We conduct experiments on a newly constructed dataset that contains longer code context, as well as on the publicly available CodeXGLUE benchmark. Experimental results demonstrate that LongCoder achieves superior performance on code completion tasks compared to previous models while maintaining comparable computational efficiency during inference.
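The abstract describes the attention pattern only in prose. The sketch below (Python/NumPy, not taken from the paper's released code) shows one way such a sparse mask could be assembled: a causal sliding window plus columns kept open for bridge and memory tokens. The window size and the bridge/memory positions are illustrative assumptions.

```python
# A minimal sketch, assuming a LongCoder-style sparse attention pattern:
# causal sliding-window attention plus globally readable bridge and memory
# tokens. Illustrative only; window size and token positions are made up.
import numpy as np

def sparse_attention_mask(seq_len, window, bridge_positions, memory_positions):
    """Boolean (seq_len, seq_len) mask: [i, j] is True if query i may attend to key j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)

    # Causal sliding window: each token sees the `window` most recent tokens.
    dist = idx[:, None] - idx[None, :]
    mask |= (dist >= 0) & (dist < window)

    # Bridge tokens: aggregate preceding context and stay readable by all
    # later tokens, giving distant positions an indirect path to earlier code.
    for b in bridge_positions:
        mask[b, : b + 1] = True   # bridge token summarizes everything up to itself
        mask[b:, b] = True        # every later token can read the bridge token

    # Memory tokens: mark statements likely to be invoked later (imports,
    # class/function definitions) so they remain globally accessible.
    for m in memory_positions:
        mask[m:, m] = True        # every later token can read the memory token

    return mask

# Example: 16 tokens, local window of 4, bridge tokens at positions 7 and 15,
# and a memory token marking a (hypothetical) import statement at position 1.
print(sparse_attention_mask(16, window=4,
                            bridge_positions=[7, 15],
                            memory_positions=[1]).astype(int))
```

Because each row keeps only a fixed-size local window plus a handful of global columns, the number of attended positions grows roughly linearly with sequence length rather than quadratically, which is what allows longer code context at comparable inference cost.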
Information
Published in: July 2023 (43479 pages); Copyright © 2023
Publisher: JMLR.org
Publication history: Published 23 July 2023
Qualifiers: Research article; Research; Refereed limited