
On the variation and specialisation of workload--A case study of the Gnome ecosystem community

Published: 01 August 2014

Abstract

Most empirical studies of open source software repositories focus on the analysis of isolated projects, or restrict themselves to the study of the relationships between technical artifacts. In contrast, we have carried out a case study that focuses on the actual contributors to software ecosystems, i.e., collections of software projects maintained by the same community. To this end, we defined a new series of workload and involvement metrics, as well as a novel approach, $\widetilde{\mathbf{T}}$-graphs, for reporting the results of comparing multiple distributions. We used these techniques to study statistically how the workload and involvement of ecosystem contributors vary across projects and across activity types, and we explored to what extent projects and contributors specialise in particular activity types. Using Gnome as a case study, we observed that, in addition to coding, the activities of localization, development documentation and building are prevalent throughout the ecosystem. We also observed notable differences between frequent and occasional contributors in terms of the activity types they are involved in and the number of projects they contribute to. Occasional contributors, and contributors who are involved in many different projects, tend to be more involved in localization, while frequent contributors tend to be more involved in coding in a limited number of projects.
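The abstract names $\widetilde{\mathbf{T}}$-graphs only as a way of reporting the results of comparing multiple distributions. The Python sketch below illustrates one plausible construction under stated assumptions, not the authors' implementation: each vertex is one compared distribution (for example, the workload of contributors in one activity type), an edge a -> b records that a pairwise multiple-comparison test, assumed to have been run elsewhere, judged a to be significantly larger than b, and edges already implied by a longer path are dropped via transitive reduction (Aho, Garey and Ullman, 1972). The function names `t_graph` and `transitive_reduction` and the pairwise outcomes in the example are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def transitive_reduction(nodes: Iterable[Hashable], edges: Set[Edge]) -> Set[Edge]:
    """Remove every edge (u, v) that is implied by a longer path u -> ... -> v.

    All edge endpoints must appear in `nodes`. Assumes the graph is acyclic
    and raises ValueError otherwise, because pairwise comparisons are not
    guaranteed to be transitive and could in principle produce a cycle.
    """
    succ: Dict[Hashable, Set[Hashable]] = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)

    def reachable(src: Hashable) -> Set[Hashable]:
        # Iterative depth-first search collecting every node reachable from src.
        seen: Set[Hashable] = set()
        stack = list(succ[src])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(succ[n])
        return seen

    reach = {n: reachable(n) for n in succ}
    if any(n in reach[n] for n in succ):
        raise ValueError("comparison graph contains a cycle")

    # Keep (u, v) only if no other direct successor w of u can already reach v.
    return {
        (u, v)
        for (u, v) in edges
        if not any(v in reach[w] for w in succ[u] if w != v)
    }


def t_graph(groups: Iterable[Hashable], significantly_larger: Set[Edge]) -> Set[Edge]:
    """Hypothetical T-graph builder: an edge a -> b means a was found larger than b."""
    group_set = set(groups)
    edges = {
        (a, b)
        for (a, b) in significantly_larger
        if a in group_set and b in group_set and a != b
    }
    return transitive_reduction(group_set, edges)


if __name__ == "__main__":
    # Made-up pairwise outcomes for four anonymous distributions A-D.
    outcomes = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")}
    print(sorted(t_graph(["A", "B", "C", "D"], outcomes)))
    # -> [('A', 'B'), ('A', 'D'), ('B', 'C')]
```

In the example, A -> C is dropped because it is already implied by A -> B -> C. Transitive reduction is a natural choice here because, for an acyclic graph, it is unique, so the drawing does not depend on the order in which the comparisons were listed.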

Cited By

  • (2023) Matching Skills, Past Collaboration, and Limited Competition: Modeling When Open-Source Projects Attract Contributors. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 42-54. DOI 10.1145/3611643.3616282. Online publication date: 30-Nov-2023.
  • (2023) Gender Representation Among Contributors to Open-Source Infrastructure: An Analysis of 20 Package Manager Ecosystems. Proceedings of the 45th International Conference on Software Engineering: Software Engineering in Society, pp. 180-187. DOI 10.1109/ICSE-SEIS58686.2023.00025. Online publication date: 17-May-2023.
  • (2023) The Distribution and Disengagement of Women Contributors in Open-Source: 2008--2021. Proceedings of the 45th International Conference on Software Engineering: Companion Proceedings, pp. 305-307. DOI 10.1109/ICSE-Companion58688.2023.00082. Online publication date: 14-May-2023.

Information

Published In

Empirical Software Engineering, Volume 19, Issue 4
August 2014
420 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Case study
  2. Developer community
  3. Metrics
  4. Open source
  5. Software ecosystem

Qualifiers

  • Article
