DOI: 10.1145/3364641.3364670

Using Machine Learning Technique for Effort Estimation in Software Development

Published: 28 October 2019

Abstract

Estimates in software projects help practitioners predict realistic values for software development, which affects the quality of planning and execution activities in the software process. However, software companies have difficulty producing estimates that adequately represent the real effort needed to execute project activities. Although the literature presents techniques for estimating effort, the activity remains complex. Recently, Machine Learning (ML) techniques have been applied to this problem. ML techniques make it possible to use databases of finished projects (datasets) to obtain more precise estimates. This research proposes a methodology for estimating effort using an ML technique based on decision trees: XGBoost. To evaluate the methodology, we conducted tests with four datasets using two metrics: Mean Magnitude of Relative Error (MMRE) and Prediction(25), or PRED(25). The preliminary results are consistent across the employed metrics, which indicates that the methodology is promising for software effort estimation. As future work, new datasets must be analyzed with the methodology, along with an approach that uses synthetic data to improve ML training.
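
The abstract names the two evaluation metrics but does not spell out their formulas. The sketch below assumes the definitions commonly used in the effort-estimation literature: the magnitude of relative error for one project is |actual - predicted| / actual, MMRE is its mean over all projects, and PRED(25) is the fraction of projects whose relative error is at most 25%. The function names `mmre` and `pred` and the sample effort values are hypothetical, chosen only for illustration; the paper's own implementation may differ.

```python
def magnitude_of_relative_error(actual, predicted):
    """MRE for one project: |actual - predicted| / actual (actual must be > 0)."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean Magnitude of Relative Error over all projects (lower is better)."""
    errors = [magnitude_of_relative_error(a, p)
              for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=25):
    """PRED(level): share of projects with MRE <= level percent (higher is better)."""
    threshold = level / 100
    hits = sum(1 for a, p in zip(actuals, predictions)
               if magnitude_of_relative_error(a, p) <= threshold)
    return hits / len(actuals)


if __name__ == "__main__":
    # Hypothetical effort values (e.g. person-hours), not taken from the paper.
    actual_effort = [100, 200, 400, 800]
    estimated_effort = [110, 150, 400, 1200]
    print("MMRE:", mmre(actual_effort, estimated_effort))
    print("PRED(25):", pred(actual_effort, estimated_effort))
```

The two metrics pull in different directions, which is presumably why they are reported together: MMRE summarizes the average size of the error, while PRED(25) counts how many individual estimates land close to the true effort.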

Cited By

  • (2024) Improved Software Effort Estimation Through Machine Learning: Challenges, Applications, and Feature Importance Analysis. IEEE Access 12, 138663-138701. DOI: 10.1109/ACCESS.2024.3457771. Online publication date: 2024
  • (2021) Software Development Effort Estimation Using Machine Learning Techniques: Multi-linear Regression versus Random Forest. 2021 International Conference on Computing, Communication and Green Engineering (CCGE), 1-5. DOI: 10.1109/CCGE50943.2021.9776394. Online publication date: 23-Sep-2021

Published In

SBQS '19: Proceedings of the XVIII Brazilian Symposium on Software Quality
October 2019
330 pages
ISBN: 9781450372824
DOI: 10.1145/3364641

In-Cooperation

  • SBC: Brazilian Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Boosting
  2. Effort estimation
  3. Machine Learning
  4. Software Projects

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Conference

SBQS'19: XVIII Brazilian Symposium on Software Quality
October 28 - November 1, 2019
Fortaleza, Brazil

Acceptance Rates

SBQS '19 Paper Acceptance Rate 35 of 99 submissions, 35%
Overall Acceptance Rate 35 of 99 submissions, 35%
