
BERT- and TF-IDF-based feature extraction for long-lived bug prediction in FLOSS: A comparative study

Published: 01 August 2023

Abstract

Context:

Correctly predicting long-lived bugs could help maintenance teams plan their work and fix more of the bugs that degrade software quality and disturb the user experience across versions in Free/Libre Open-Source Software (FLOSS). Machine Learning and Text Mining methods have been applied to many real-world prediction problems, including bug report handling.

Objective:

Our research aims to compare the accuracy of Machine Learning classifiers for long-lived bug prediction in FLOSS when using Bidirectional Encoder Representations from Transformers (BERT)-based versus Term Frequency–Inverse Document Frequency (TF-IDF)-based feature extraction. In addition, we investigate BERT variants on the same task.

Method:

We collected bug reports from six popular FLOSS projects and applied Machine Learning classifiers to predict long-lived bugs. We then compared different feature extractors, based on BERT and TF-IDF, on the long-lived bug prediction task.
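The extraction-and-classification pipeline described above can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' actual code; the bug-report summaries and long-lived labels are invented for demonstration, and a BERT-based extractor would replace the TF-IDF step with contextual sentence embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Toy bug-report summaries and labels (1 = long-lived, 0 = not);
# invented for illustration only.
reports = [
    "crash on startup after update",
    "memory leak in long running session",
    "typo in settings dialog label",
    "deadlock when saving large project",
]
labels = [1, 1, 0, 1]

# TF-IDF turns each report into a sparse vector of term weights.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reports)

# Any scikit-learn classifier can consume these features;
# the paper evaluates several, including SVM.
clf = SVC(kernel="linear").fit(X, labels)
pred = clf.predict(vectorizer.transform(["crash when saving session"]))
print(pred)
```

With a BERT extractor, `X` would instead be a dense matrix of embeddings (one vector per report), but the classifier interface stays the same.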

Results:

We found that long-lived bug prediction using BERT-based feature extraction systematically outperformed TF-IDF-based extraction. With BERT features, SVM and Random Forest outperformed the other classifiers on almost all datasets. Furthermore, smaller BERT architectures proved competitive.
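A classifier comparison like the one reported here is typically done with cross-validation over the extracted feature matrix. The sketch below uses a synthetic matrix standing in for BERT sentence embeddings (e.g. 768-dimensional [CLS] vectors); the data, dimensions, and resulting scores are illustrative, not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic features standing in for BERT embeddings of bug reports;
# generated here purely for illustration.
X, y = make_classification(n_samples=200, n_features=64, random_state=0)

# Mean 5-fold cross-validated accuracy for each classifier.
scores = {}
for name, clf in [
    ("SVM", SVC()),
    ("RandomForest", RandomForestClassifier(random_state=0)),
]:
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

The same loop extends naturally to the other classifiers and to per-project feature matrices.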

Conclusion:

Our results demonstrate a promising avenue for predicting long-lived bugs based on BERT contextual embedding features and fine-tuning procedures.



Published In

Information and Software Technology, Volume 160, Issue C, August 2023, 251 pages

Publisher

Butterworth-Heinemann

United States


Author Tags

  1. Software maintenance
  2. Bug Tracking System
  3. Long-lived bugs
  4. Machine learning
  5. Text mining
  6. Natural Language Processing
  7. BERT

Qualifiers

  • Research-article


Cited By

  • (2024) Large Language Models for Software Engineering: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology. doi:10.1145/3695988. Online publication date: 20-Sep-2024.
  • (2024) Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be? ACM Transactions on Software Engineering and Methodology, 33(6), 1–41. doi:10.1145/3654443. Online publication date: 27-Jun-2024.
  • (2024) How Does Pre-trained Language Model Perform on Deep Learning Framework Bug Prediction? In: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, pp. 346–347. doi:10.1145/3639478.3643113. Online publication date: 14-Apr-2024.
  • (2024) Content-based quality evaluation of scientific papers using coarse feature and knowledge entity network. Journal of King Saud University - Computer and Information Sciences, 36(6). doi:10.1016/j.jksuci.2024.102119. Online publication date: 1-Jul-2024.
  • (2024) Modelling customer requirement for mobile games based on online reviews using BW-CNN and S-Kano models. Expert Systems with Applications, 258(C). doi:10.1016/j.eswa.2024.125142. Online publication date: 15-Dec-2024.
  • (2024) Enhancing Accessibility in Online Shopping: A Dataset and Summarization Method for Visually Impaired Individuals. SN Computer Science, 5(8). doi:10.1007/s42979-024-03351-w. Online publication date: 2-Nov-2024.
  • (2024) A three-stage quality evaluation method for experience products: taking animation as an example. Multimedia Systems, 30(4). doi:10.1007/s00530-024-01401-0. Online publication date: 8-Jul-2024.
  • (2024) User Story Classification with Machine Learning and LLMs. In: Knowledge Science, Engineering and Management, pp. 161–175. doi:10.1007/978-981-97-5492-2_13. Online publication date: 16-Aug-2024.
  • (2023) Effective Recommendation of Cross-Project Correlated Issues based on Issue Metrics. In: Proceedings of the 14th Asia-Pacific Symposium on Internetware. doi:10.1145/3609437.3609462. Online publication date: 4-Aug-2023.
