DOI: 10.1145/1287624.1287683
Article

Training on errors experiment to detect fault-prone software modules by spam filter

Published: 07 September 2007

Abstract

Detecting fault-prone modules in source code is important for assuring software quality. Most previous fault-prone detection approaches are based on software metrics. Such approaches, however, have difficulty collecting the metrics and constructing mathematical models from them. To mitigate these difficulties, we propose a novel approach for detecting fault-prone modules with a spam filtering technique, named Fault-Prone Filtering. Driven by the growing demand for spam e-mail detection, spam filtering has matured into a convenient and effective text-mining technique. In our approach, source code modules are treated as text files and fed to the spam filter directly, which classifies them as fault-prone or not. This paper describes the training-on-errors procedure for applying fault-prone filtering in practice. Since no pre-training is required, the procedure can be applied immediately in an actual development setting. To show the usefulness of our approach, we conducted an experiment using the large source code repository of a Java-based open source project. The results show that our approach classifies about 85% of software modules correctly. They also indicate that fault-prone modules can be detected at relatively low cost and at an early stage.
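To make the procedure concrete, the following is a minimal sketch of a training-on-errors loop built around a toy Bayesian text filter. It is an illustration under stated assumptions: the approach above applies an existing spam filter to module source text, whereas the ToyBayesianFilter class, its tokenizer, the classify/train interface, and the label names below are hypothetical stand-ins rather than the paper's implementation.

import math
import re
from collections import defaultdict


class ToyBayesianFilter:
    """Toy Bayesian text filter standing in for a real spam filter.

    Source code modules are treated as plain text; the tokenization and
    Laplace smoothing here are illustrative choices, not the paper's setup.
    """

    def __init__(self):
        self.token_counts = {"fault-prone": defaultdict(int),
                             "not-fault-prone": defaultdict(int)}
        self.totals = {"fault-prone": 0, "not-fault-prone": 0}

    @staticmethod
    def tokenize(source_text):
        # Split the module text on non-word characters.
        return [t for t in re.split(r"\W+", source_text) if t]

    def train(self, source_text, label):
        # Update per-class token frequencies with the known label.
        for token in self.tokenize(source_text):
            self.token_counts[label][token] += 1
            self.totals[label] += 1

    def classify(self, source_text):
        # Naive Bayes log-score with Laplace smoothing; ties favor "fault-prone".
        vocab_size = max(len(set(self.token_counts["fault-prone"])
                             | set(self.token_counts["not-fault-prone"])), 1)
        scores = {}
        for label in ("fault-prone", "not-fault-prone"):
            denom = self.totals[label] + vocab_size
            scores[label] = sum(
                math.log((self.token_counts[label].get(token, 0) + 1) / denom)
                for token in self.tokenize(source_text)
            )
        return max(scores, key=scores.get)


def training_on_errors(revisions, spam_filter):
    """Classify module revisions in chronological order with no pre-training,
    and train the filter only on the revisions it misclassifies."""
    correct = 0
    for source_text, actual_label in revisions:
        predicted = spam_filter.classify(source_text)
        if predicted == actual_label:
            correct += 1
        else:
            spam_filter.train(source_text, actual_label)  # learn from the error
    return correct / len(revisions) if revisions else 0.0

In a real setting the fault-prone labels would come from the version history (for example, a revision could be labeled fault-prone if a later fix-inducing change touches it), so the loop can run online as modules are committed; that labeling step is assumed here and not shown.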





Information

Published In

ESEC-FSE '07: Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering
September 2007
638 pages
ISBN:9781595938114
DOI:10.1145/1287624
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 September 2007


Author Tags

  1. fault-prone modules
  2. spam filter
  3. text mining

Qualifiers

  • Article

Conference

ESEC/FSE07

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%


Cited By

  • (2024) TraceJIT: Evaluating the Impact of Behavioral Code Change on Just-In-Time Defect Prediction. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 580-591. DOI: 10.1109/SANER60148.2024.00065. Online publication date: 12-Mar-2024.
  • (2023) Security-based code smell definition, detection, and impact quantification in Android. Software: Practice and Experience, 53(11), 2296-2321. DOI: 10.1002/spe.3257. Online publication date: 9-Sep-2023.
  • (2020) Effort-aware just-in-time defect identification in practice: a case study at Alibaba. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1308-1319. DOI: 10.1145/3368089.3417048. Online publication date: 8-Nov-2020.
  • (2019) The impact of context metrics on just-in-time defect prediction. Empirical Software Engineering. DOI: 10.1007/s10664-019-09736-3. Online publication date: 8-Aug-2019.
  • (2018) Software defect prediction. Software Quality Journal, 26(2), 525-552. DOI: 10.1007/s11219-016-9353-3. Online publication date: 1-Jun-2018.
  • (2017) Analyzing and predicting concurrency bugs in open source systems. 2017 International Joint Conference on Neural Networks (IJCNN), 721-728. DOI: 10.1109/IJCNN.2017.7965923. Online publication date: May-2017.
  • (2017) Differential analysis of token metric and object oriented metrics for fault prediction. International Journal of Information Technology, 9(1), 93-100. DOI: 10.1007/s41870-017-0004-0. Online publication date: 23-Feb-2017.
  • (2016) An Empirical Study on Fault Prediction using Token-Based Approach. Proceedings of the International Conference on Advances in Information Communication Technology & Computing, 1-7. DOI: 10.1145/2979779.2979811. Online publication date: 12-Aug-2016.
  • (2016) Analyzing the Decision Criteria of Software Developers Based on Prospect Theory. 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 644-648. DOI: 10.1109/SANER.2016.115. Online publication date: Mar-2016.
  • (2016) Token based approach for cross project prediction of fault prone modules. 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), 215-221. DOI: 10.1109/ICCTICT.2016.7514581. Online publication date: Mar-2016.
