Abstract
Social coding enables collaborative software development in virtual and distributed communities. Social coding platforms (e.g., GitHub) provide the pull request feature, which allows developers to clone a project, make code changes, and ask the project owners to review the changes and integrate them into the mainline of the project. The pull request feature has been widely adopted by a large number of GitHub projects, as it minimizes the risk of exposing the projects to the open communities. The efficiency of the pull request review process depends on both technical factors (e.g., the code quality) and social factors (e.g., the connection of a contributor to the project maintainer). However, it is still unclear which social factors have the most impact on the efficiency of the review process. To identify these social factors, we study the team structures formed by the developers within projects that adopt the pull-based development model. We build pull-based networks, where two developers are linked if one has integrated a pull request submitted by the other. We investigate the 7,850 most popular projects on GitHub that are developed in ten programming languages. We identify the network metrics that have a significant association with the speed of processing pull requests. Specifically, maintaining a strong core of contributors and denser interactions among the developers is associated with faster response to and processing of pull requests. We further find that more than 90% of the studied projects follow 8 dominant team structures out of 18 possible team structures. In the larger projects, only a subset of developers is granted review and integration privileges for pull requests, reflecting a strict decision-making process. The small to medium projects are characterized by a small number of core contributors who maintain repeated interactions and are able to process incoming pull requests more efficiently.
The evolution of the team structures over time reveals that only a small percentage of the projects shift towards team structures associated with faster pull request processing (e.g., stronger centralization).
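To make the network construction concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each merged pull request is available as a hypothetical `(submitter, integrator)` pair, links the two developers as the abstract describes, and computes two standard graph-level metrics (density and Freeman's degree centralization). The function names, the input format, and the choice to skip self-merged pull requests are assumptions made for illustration.

```python
from collections import defaultdict


def build_pull_network(pull_requests):
    """Map each directed link (submitter, integrator) to its number of merged PRs.

    `pull_requests` is a hypothetical iterable of (submitter, integrator) pairs.
    """
    edges = defaultdict(int)
    for submitter, integrator in pull_requests:
        if submitter != integrator:  # assumption: self-merged PRs create no link
            edges[(submitter, integrator)] += 1
    return dict(edges)


def density(edges):
    """Share of the possible directed links that are actually present."""
    nodes = {dev for link in edges for dev in link}
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


def degree_centralization(edges):
    """Freeman's degree centralization, ignoring link direction and weight."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 3:
        return 0.0
    degrees = [len(peers) for peers in neighbors.values()]
    max_degree = max(degrees)
    # A star network (one hub linked to everyone else) scores 1.0.
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Toy project: carol integrates every pull request.
merged = [("alice", "carol"), ("bob", "carol"), ("dave", "carol"), ("alice", "carol")]
network = build_pull_network(merged)
print(density(network))
print(degree_centralization(network))
```

In this toy project a single integrator handles every pull request, so the network is fully centralized while its density stays low. The metrics actually used in the study may be defined differently.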
Additional information
Communicated by: Jeffrey C. Carver
About this article
Cite this article
El Mezouar, M., Zhang, F. & Zou, Y. An empirical study on the teams structures in social coding using GitHub projects. Empir Software Eng 24, 3790–3823 (2019). https://doi.org/10.1007/s10664-019-09700-1
DOI: https://doi.org/10.1007/s10664-019-09700-1