Development effort estimation in free/open source software from activity in version control systems

Published: 01 November 2022

Abstract

Effort estimation models are a fundamental tool in software management and are used to forecast the resources, constraints and costs associated with software development. For Free/Open Source Software (FOSS) projects, effort estimation is especially complex: professional developers work alongside occasional, volunteer developers, so the overall effort (in person-months) is non-trivial to determine. The objective of this work is to develop a simple effort estimation model for FOSS projects, based on historical data on developers’ effort. The model is calibrated with direct developer feedback to ensure its accuracy. After extracting the personal development profiles of several thousand developers from 6 large FOSS projects, we asked them to fill in a questionnaire to determine whether they should be considered full-time developers in the project they work on. Their feedback was used to fine-tune the value of an effort threshold, above which developers can be considered full-time. With the help of the more than 1,000 questionnaires received, we were able to determine, for every project in our sample, the threshold of commits that separates full-time from non-full-time developers. Finally, we offer guidelines and a tool for applying our model to FOSS projects that use a version control system.
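
To make the threshold idea concrete, the following is a minimal sketch in Python, not the authors' tool. It assumes that commits are counted per developer over a fixed period, that developers at or above the threshold contribute one person-month per month, and that developers below it contribute pro rata (commits divided by the threshold). The pro-rata rule, the accuracy-based calibration, the function names and the sample data are all assumptions made for illustration; the abstract only states that a commit threshold separates full-time from non-full-time developers and that questionnaire answers were used to tune it.

```python
def estimate_effort(commits_per_dev, threshold, period_months=6):
    """Estimate effort in person-months for one observation period.

    Assumption (not stated in the abstract): developers below the
    threshold are credited pro rata, i.e. commits / threshold of a
    full-time developer's effort.
    """
    effort = 0.0
    for commits in commits_per_dev.values():
        if commits >= threshold:
            # Treated as full-time: one person-month per month.
            effort += period_months
        else:
            # Assumed fractional effort for occasional contributors.
            effort += period_months * commits / threshold
    return effort


def best_threshold(commits_per_dev, self_reported_full_time, candidates):
    """Pick the candidate threshold that best matches survey answers.

    `self_reported_full_time` maps a developer to True/False as
    answered in a (hypothetical) questionnaire. The score is plain
    classification accuracy, chosen here for simplicity; ties are
    resolved in favour of the first candidate encountered.
    """
    def accuracy(t):
        hits = sum(
            (commits_per_dev[dev] >= t) == is_full_time
            for dev, is_full_time in self_reported_full_time.items()
        )
        return hits / len(self_reported_full_time)

    return max(candidates, key=accuracy)


if __name__ == "__main__":
    # Hypothetical commit counts for one six-month period.
    commits = {"ana": 120, "bo": 40, "cy": 6, "di": 60}
    # Hypothetical questionnaire answers: is this developer full-time?
    survey = {"ana": True, "bo": False, "cy": False, "di": True}

    t = best_threshold(commits, survey, range(1, 151))
    print("calibrated threshold:", t)
    print("estimated person-months:", estimate_effort(commits, t))
```

In this sketch the threshold is calibrated per project, mirroring the per-project thresholds the abstract describes. A real calibration would also need to merge the multiple identities a developer may use in the version control system before counting commits.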

Cited By

  • (2024) How Are Paid and Volunteer Open Source Developers Different? A Study of the Rust Project. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. doi:10.1145/3597503.3639197. Online publication date: 20-May-2024

Published In

Empirical Software Engineering  Volume 27, Issue 6
Nov 2022
1651 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 November 2022
Accepted: 18 March 2022

Author Tags

  1. Effort estimation
  2. Open source
  3. Free software
  4. Mining software repositories
  5. Versioning system
  6. Commits

Qualifiers

  • Research-article
