DOI: 10.1145/1287624.1287683
Article

Training on errors experiment to detect fault-prone software modules by spam filter

Published: 07 September 2007

Abstract

Detecting fault-prone modules in source code is important for assuring software quality. Most previous fault-prone detection approaches are based on software metrics. Such approaches, however, have difficulty collecting the metrics and constructing mathematical models from them. To mitigate these difficulties, we propose a novel approach for detecting fault-prone modules with a spam filtering technique, named Fault-Prone Filtering. Driven by the growing demand for spam e-mail detection, spam filtering has matured into a convenient and effective text-mining technique. In our approach, source code modules are treated as text files and fed to the spam filter directly, which classifies them as fault-prone or not. This paper describes the training-on-errors procedure for applying fault-prone filtering in practice. Since no pre-training is required, the procedure can be applied immediately in an actual development setting. To show the usefulness of our approach, we conducted an experiment using the large source code repository of a Java-based open source project. The results show that our approach classifies about 85% of software modules correctly. They also indicate that fault-prone modules can be detected at relatively low cost and at an early stage.
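To make the procedure concrete, the following is a minimal sketch of a training-on-errors loop built around a toy Bayesian text filter. It is an illustration under stated assumptions: the approach above applies an existing spam filter to module source text, whereas the ToyBayesianFilter class, its tokenizer, the classify/train interface, and the label names below are hypothetical stand-ins rather than the paper's implementation.

import math
import re
from collections import defaultdict


class ToyBayesianFilter:
    """Toy Bayesian text filter standing in for a real spam filter.

    Source code modules are treated as plain text; the tokenization and
    Laplace smoothing here are illustrative choices, not the paper's setup.
    """

    def __init__(self):
        self.token_counts = {"fault-prone": defaultdict(int),
                             "not-fault-prone": defaultdict(int)}
        self.totals = {"fault-prone": 0, "not-fault-prone": 0}

    @staticmethod
    def tokenize(source_text):
        # Split the module text on non-word characters.
        return [t for t in re.split(r"\W+", source_text) if t]

    def train(self, source_text, label):
        # Update per-class token frequencies with the known label.
        for token in self.tokenize(source_text):
            self.token_counts[label][token] += 1
            self.totals[label] += 1

    def classify(self, source_text):
        # Naive Bayes log-score with Laplace smoothing; ties favor "fault-prone".
        vocab_size = max(len(set(self.token_counts["fault-prone"])
                             | set(self.token_counts["not-fault-prone"])), 1)
        scores = {}
        for label in ("fault-prone", "not-fault-prone"):
            denom = self.totals[label] + vocab_size
            scores[label] = sum(
                math.log((self.token_counts[label].get(token, 0) + 1) / denom)
                for token in self.tokenize(source_text)
            )
        return max(scores, key=scores.get)


def training_on_errors(revisions, spam_filter):
    """Classify module revisions in chronological order with no pre-training,
    and train the filter only on the revisions it misclassifies."""
    correct = 0
    for source_text, actual_label in revisions:
        predicted = spam_filter.classify(source_text)
        if predicted == actual_label:
            correct += 1
        else:
            spam_filter.train(source_text, actual_label)  # learn from the error
    return correct / len(revisions) if revisions else 0.0

In a real setting the fault-prone labels would come from the version history (for example, a revision could be labeled fault-prone if a later fix-inducing change touches it), so the loop can run online as modules are committed; that labeling step is assumed here and not shown.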





Information

Published In

ESEC-FSE '07: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering
September 2007
638 pages
ISBN:9781595938114
DOI:10.1145/1287624
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 September 2007


Author Tags

  1. fault-prone modules
  2. spam filter
  3. text mining

Qualifiers

  • Article

Conference

ESEC/FSE07

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%


Cited By

  • (2024) TraceJIT: Evaluating the Impact of Behavioral Code Change on Just-In-Time Defect Prediction. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 580-591. DOI: 10.1109/SANER60148.2024.00065. Online publication date: 12-Mar-2024.
  • (2023) Security-based code smell definition, detection, and impact quantification in Android. Software: Practice and Experience, 53(11), 2296-2321. DOI: 10.1002/spe.3257. Online publication date: 9-Sep-2023.
  • (2020) Effort-aware just-in-time defect identification in practice: a case study at Alibaba. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1308-1319. DOI: 10.1145/3368089.3417048. Online publication date: 8-Nov-2020.
  • (2019) The impact of context metrics on just-in-time defect prediction. Empirical Software Engineering. DOI: 10.1007/s10664-019-09736-3. Online publication date: 8-Aug-2019.
  • (2018) Software defect prediction. Software Quality Journal, 26(2), 525-552. DOI: 10.1007/s11219-016-9353-3. Online publication date: 1-Jun-2018.
  • (2017) Analyzing and predicting concurrency bugs in open source systems. 2017 International Joint Conference on Neural Networks (IJCNN), 721-728. DOI: 10.1109/IJCNN.2017.7965923. Online publication date: May-2017.
  • (2017) Differential analysis of token metric and object oriented metrics for fault prediction. International Journal of Information Technology, 9(1), 93-100. DOI: 10.1007/s41870-017-0004-0. Online publication date: 23-Feb-2017.
  • (2016) An Empirical Study on Fault Prediction using Token-Based Approach. Proceedings of the International Conference on Advances in Information Communication Technology & Computing, 1-7. DOI: 10.1145/2979779.2979811. Online publication date: 12-Aug-2016.
  • (2016) Analyzing the Decision Criteria of Software Developers Based on Prospect Theory. 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 644-648. DOI: 10.1109/SANER.2016.115. Online publication date: Mar-2016.
  • (2016) Token based approach for cross project prediction of fault prone modules. 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), 215-221. DOI: 10.1109/ICCTICT.2016.7514581. Online publication date: Mar-2016.
