Using Machine Learning Technique for Effort Estimation in Software Development

Authors:

Weldson Amaral,

Geraldo Braz Junior,

Davi Viana

SBQS '19: Proceedings of the XVIII Brazilian Symposium on Software Quality

Pages 240 - 245

https://doi.org/10.1145/3364641.3364670

Published: 28 October 2019

Abstract

Estimates in software projects aim to help practitioners predict more realistic values for software development, which affects the quality of software process activities such as planning and execution. However, software companies have difficulty producing estimates that adequately represent the real effort needed to execute software project activities. Although the literature presents techniques for estimating effort, this activity remains complex. Recently, Machine Learning (ML) techniques have been applied to this problem. With ML techniques, databases of finished projects (datasets) can be used to obtain more precise estimates. This research proposes a methodology for estimating effort using XGBoost, an ML technique based on decision trees. To evaluate our methodology, we conducted tests with four datasets using two metrics: Mean Magnitude of Relative Error (MMRE) and Prediction(25). The preliminary results are consistent across the employed metrics, which indicates that our methodology is promising for software effort estimation. As further work, new datasets must be analyzed using our methodology, along with an approach that uses synthetic data to improve the ML training.
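The abstract names the two evaluation metrics without defining them. As a minimal sketch, assuming the definitions commonly used in the effort estimation literature (MMRE as the mean of |actual - predicted| / actual over all projects, and Prediction(25) as the fraction of projects whose relative error is at most 25%), they could be computed as follows. The function names and the sample effort values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def relative_errors(actuals: Sequence[float], predictions: Sequence[float]) -> List[float]:
    """Magnitude of relative error (MRE) for each project."""
    if len(actuals) != len(predictions) or len(actuals) == 0:
        raise ValueError("actuals and predictions must be non-empty and of equal length")
    errors = []
    for actual, predicted in zip(actuals, predictions):
        if actual <= 0:
            raise ValueError("actual effort must be positive")
        errors.append(abs(actual - predicted) / actual)
    return errors


def mmre(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean Magnitude of Relative Error: lower is better."""
    errors = relative_errors(actuals, predictions)
    return sum(errors) / len(errors)


def pred(actuals: Sequence[float], predictions: Sequence[float], level: float = 0.25) -> float:
    """Fraction of projects with MRE <= level; level=0.25 gives Prediction(25)."""
    errors = relative_errors(actuals, predictions)
    return sum(1 for e in errors if e <= level) / len(errors)


if __name__ == "__main__":
    # Hypothetical effort values (e.g. person-hours) for four finished projects.
    actual_effort = [100, 200, 400, 50]
    estimated_effort = [110, 160, 400, 80]
    print(f"MMRE:     {mmre(actual_effort, estimated_effort):.3f}")
    print(f"Pred(25): {pred(actual_effort, estimated_effort):.2f}")
```

Under these definitions a good estimator has a low MMRE and a high Prediction(25). The two can disagree, since one badly missed project inflates MMRE while barely changing Prediction(25), which is one reason to report both.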

Cited By

Terlapu P, Raju K, Kiran Kumar G, Jagadeeswara Rao G, Kavitha K, Samreen S. (2024). Improved Software Effort Estimation Through Machine Learning: Challenges, Applications, and Feature Importance Analysis. IEEE Access 12, 138663-138701. Online publication date: 2024. https://doi.org/10.1109/ACCESS.2024.3457771
Srivastava D, Sharma A, Choudhary D. (2021). Software Development Effort Estimation Using Machine Learning Techniques: Multi-linear Regression versus Random Forest. 2021 International Conference on Computing, Communication and Green Engineering (CCGE), 1-5. Online publication date: 23-Sep-2021. https://doi.org/10.1109/CCGE50943.2021.9776394

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
      1. Ensemble methods
        Boosting
2. General and reference
  1. Cross-computing tools and techniques
    1. Estimation

Information

Published In

SBQS '19: Proceedings of the XVIII Brazilian Symposium on Software Quality

October 2019

330 pages

ISBN:9781450372824

DOI:10.1145/3364641

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SBC: Brazilian Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Conference

SBQS'19

SBQS'19: XVIII Brazilian Symposium on Software Quality

October 28 - November 1, 2019

Fortaleza, Brazil

Acceptance Rates

SBQS '19 Paper Acceptance Rate: 35 of 99 submissions, 35%

Overall Acceptance Rate: 35 of 99 submissions, 35%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 263
Downloads (Last 12 months): 29
Downloads (Last 6 weeks): 1

Reflects downloads up to 18 Nov 2024
