In this section, we summarize and discuss the most important results and observations of our study, and we derive a number of hypotheses to guide further work. As we generated a large amount of data, we present only selected cases and representative figures here (e.g., we use only results from the 6-month time ranges since they are most illustrative). The remaining data are available on the article’s supplementary website (SW). In what follows, we include links to the plots on the supplementary website in parentheses after the project’s name. Each link has the form [abbreviation for project, (number of developer), abbreviation for analysis] and leads directly to the specific plot. For example, (GiF ) links to the plot of the programming-activity analysis for project git.
Table 1 shows the maximum and minimum number of developers, as well as the number of developers at a project’s start and end times. In all projects, the number of developers increases, reaches a peak, and then decreases. An exception is project
Wine (
WiA ). There, we observe only a decrease. The typical behavior happens in four patterns: (1) a slow and steady increase is followed by a short decrease (
Electron (
EA ),
LLVM (
LA ),
QEMU (
QeA ), and
React (
RA )); (2) a steeper increase is followed by a slower and smooth decrease (with possible bumps), for
Angular (
AnA ),
Atom (
AtA ),
Django (
DA ),
GCC (
GcA ),
Moby (
MA ),
Node.js (
NA ),
ownCloud (
OA ),
Qt (
QtA ), and
webpack (
WeA ); (3) increase, decrease, increase:
Bootstrap (
BA ),
FFmpeg (
FfA ), and
git (
GiA ); (4) only increase:
Flutter (
FlA ),
TypeScript (
TA ), and
U-Boot (
UA ). An example for the most frequent pattern (2) is
GCC (Figure
8 (left)).
5.2.1 Typical Structure and Evolution (RQ1).
Results. With an increasing number of developers, the proportion of developers in the hierarchical part decreases in most projects; and, with a decreasing number of developers, the proportion of developers in the hierarchical part increases (
Angular (
AnA ),
Atom (
AtA ),
Bootstrap (
BA ),
Electron (
EA ),
Flutter (
FlA ),
GCC (
GcA ),
git (
GiA ),
LLVM (
LA ),
Moby (
MA ),
Node.js (
NA ),
ownCloud (
OA ),
Qt (
QtA ),
QEMU (
QeA ),
React (
RA ),
TypeScript (
TA ),
U-Boot (
UA ), and
webpack (
WeA )). We illustrate this in Figure
8 (right) for
GCC: The percentage of developers in the hierarchical part grows from 7% to 25%, while the number of developers falls from around 1,300 to around 260. For the projects
FFmpeg (
FfA ) and
Django (
DA ), the portion of the hierarchy and the number of developers are independent of each other. The portion of hierarchy of
Wine (
WiA ) is mostly stable, but it increases toward the end.
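As a rough worked example, assume that the 7% share coincides with the peak of around 1,300 developers and the 25% share with the later count of around 260 developers (the text gives only these rounded values, so the result is merely indicative). The absolute size of the hierarchical part of GCC then changes as follows:

\[
0.07 \times 1300 \approx 91 \quad\longrightarrow\quad 0.25 \times 260 = 65 .
\]

Under this assumption, the hierarchical part shrinks by roughly 29% in absolute terms, whereas the total number of developers shrinks by 80%, which is why its relative share rises.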
In only a small number of the analyzed time ranges do we not find a hierarchical structure: Angular (AnH ) (range 1), Atom (AtH ) (range 1), Flutter (FlH ) (ranges 16 and 17), GCC (GcH ) (ranges 1, 2, and 3), git (GiH ) (ranges 1 and 2), ownCloud (OH ) (range 31), Qt (QtH ) (ranges 1–6), U-Boot (UH ) (range 1), webpack (WeH ) (range 1), and Wine (WiH ) (range 61). This mostly happens in the very first time ranges of a project, when only very few developers are communicating, resulting in a loosely connected network in which most developers have fewer than three connections to others.
Nonetheless, for almost all ranges of the above stated projects and all ranges for the projects
Bootstrap (
BH ),
Django (
DH ),
Electron (
EH ),
FFmpeg (
FfH ),
LLVM (
LH ),
Moby (
MH ),
Node.js (
NH ),
QEMU (
QeH ),
React (
RH ), and
TypeScript (
TH ), we are able to identify a hierarchical structure. To provide an example, we show the hierarchical part of project
Angular in Figure
9 (to the right of the dashed line). Over time, the residual variance of the regression fit and the angle between regression line and
\(x\)-axis (see Section
4.2.2) stay relatively stable for each project.
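The two stability indicators mentioned above can be illustrated with a minimal sketch. It is not the implementation used in the study: Section 4.2.2 is not reproduced here, so the log-log scaling of node degree and clustering coefficient, the ordinary least-squares fit, the \(n - 2\) denominator, and the function name `fit_hierarchy_line` are all assumptions made for illustration.

```python
import math


def fit_hierarchy_line(points):
    """Fit a least-squares line through (log10 degree, log10 clustering coefficient).

    points: iterable of (degree, clustering_coefficient) pairs, both > 0.
    Returns (slope, intercept, angle_deg, residual_variance).
    Assumption: log-log axes and an unbiased residual variance (n - 2).
    """
    points = list(points)
    if len(points) < 3:
        raise ValueError("need at least three points")
    xs = [math.log10(degree) for degree, _ in points]
    ys = [math.log10(cc) for _, cc in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all degrees are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals are measured vertically from the fitted line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    residual_variance = sum(r * r for r in residuals) / (n - 2)
    # Angle between the regression line and the x-axis, in degrees.
    angle_deg = math.degrees(math.atan(slope))
    return slope, intercept, angle_deg, residual_variance
```

For made-up points that lie exactly on \(c = 1/k\), such as `[(2, 0.5), (4, 0.25), (8, 0.125), (16, 0.0625)]`, the sketch yields a slope of about \(-1\), an angle of about \(-45\) degrees, and a residual variance of about zero (up to floating-point error). Comparing the angle and the residual variance across consecutive time ranges gives a simple way to see whether the fitted structure stays stable.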
Discussion. As we were able to identify a hierarchical structure in almost all time ranges for all subject projects, independent of the number of developers and independent of the communication channel (issue tracker or mailing list), our method for decomposing developer networks into a hierarchical and a non-hierarchical part is generalizable to projects of different sizes, different ages, different domains, and different communication channels. Our findings support previous indications that successful OSS projects develop a hybrid organizational structure composed of a hierarchical and a non-hierarchical part, with most of the developers being part of the non-hierarchical part.
The presence of a hierarchical part tends to be unaffected by variations in the numbers of developers of a project; variation tends to be limited to fluctuations in its relative size as the project matures. Developers who enter or leave the hierarchical part change its composition. Despite this developer turnover, the slow change in residual variance and in the angle between the regression line (red solid line in Figure
9) and the
\(x\)-axis over time, as indicators for stable hierarchical structure, suggests that the subject projects have a stable organizational structure. This finding is in line with the hypothesis that the hierarchical part is principally responsible for coordination supporting information exchange [
25]. In this case, one would indeed expect that successful projects achieve a stable hierarchy, since large structural shifts disrupt coordination.
5.2.2 Change of Position in Hierarchy (RQ2).
Results. In Figure
10, we show how the position of an exemplary developer changes during the evolution of project
LLVM. The developer starts in the upper left of the plot, that is, at the bottom of the hierarchy or already in the non-hierarchical part. Then, the clustering coefficient decreases as the developer moves down in the plot towards the hierarchical part, where the developer stays for around 12 time ranges. Then, the developer moves back to the non-hierarchical part and potentially leaves the project (if this is not yet the end of our analyzed time period).
We find notable patterns of positional change, which we summarize in Table
2 for mail networks and Table
3 for issue networks. In total, we analyze 200 developers per data source (i.e., 200 developers for issue networks and 200 developers for mail networks)—the 10 developers with maximum node degree and 10 random developers per project as described in Section
4.3. We provide descriptive statistics for both the most active and the randomly selected developers in Tables
A.1 and
A.2 in the appendix. In general, the majority of the randomly selected developers contributed no commit to the project and only a few e-mails or issue comments, whereas the most active developers were not only highly active in communicating, but were also highly active code contributors. (Project
GCC is an exception: none of even its most active communicators contributed any commit to the source code; we discuss this phenomenon further below in Section
5.2.4.) As expected, the most active developers appear to be active in more time ranges than the randomly selected developers do. In line with that, the most active developers are mostly part of the hierarchical part, whereas the randomly selected developers are only rarely part of the hierarchical part.
The mail and issue networks exhibit largely similar movement patterns for the selected developers, so we focus on the details of mail networks to summarize. The movement patterns describe different starting points and directions of position changes in the organizational structure. The most frequent pattern occurs in all projects: 40 out of 100 most active developers start at the non-hierarchical part’s upper left region in the hierarchy plot, then move down to the upper levels of the hierarchical part (lower right), to finally return to the non-hierarchical part again (pattern “down
\(\rightarrow\) up”). An example of this pattern is developer 1,610 of the project
LLVM (Figure
10). The two second most frequent patterns (18 out of 100) describe developers who move from the non-hierarchical part to the upper levels of the hierarchical part (that is, down to the right in the hierarchy plot, the pattern “down”) and developers moving in the opposite direction (18 out of 100), that is, they start in the hierarchical part and then move to the non-hierarchical part (pattern “up”). The five remaining patterns play only a secondary role and do not occur often. Finally, we find 11 developers over all projects who have other (individual) movement patterns.
For the randomly selected developers, we find the same patterns, but with different frequencies. Often, developers remain relatively constant in one area of the hierarchy and move only slightly (35 out of 99, pattern “constant”). We also find developers who move horizontally (i.e., they have more neighbors, but the connectivity between the neighbors stays constant; 18 out of 99, pattern “horizontal”) and remain active only for a few time ranges.
For the issue networks, the pattern occurrences are largely similar, but pattern “constant” occurs far more often among the randomly selected developers (68 out of 100) than among the randomly selected developers from the mail networks (35 out of 99).
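To make the main pattern names concrete, the following sketch labels a developer's trajectory from one membership flag per time range. This is a deliberate simplification and an assumption on our part: the patterns above are described in terms of positions in the hierarchy plot, whereas the hypothetical helper `movement_pattern` only looks at whether a developer is inside the hierarchical part. It therefore cannot distinguish “horizontal” movement, and it treats any trajectory that never changes parts as “constant”.

```python
from itertools import groupby


def movement_pattern(in_hierarchy):
    """Label a trajectory given per-time-range membership flags.

    in_hierarchy: list of booleans, one per time range in which the
    developer is active (True = inside the hierarchical part).
    The labels use an ASCII arrow; the mapping is an illustrative assumption.
    """
    # Collapse consecutive repeats, e.g. [F, F, T, T, F] -> [F, T, F].
    phases = [flag for flag, _ in groupby(in_hierarchy)]
    if phases == [False, True, False]:
        return "down -> up"  # enters the hierarchical part, then leaves it
    if phases == [False, True]:
        return "down"        # enters the hierarchical part and stays
    if phases == [True, False]:
        return "up"          # starts in the hierarchical part, then leaves
    if len(phases) == 1:
        return "constant"    # never changes parts
    return "other"           # anything else, including an empty trajectory


# Made-up trajectory: two ranges outside, three inside, one outside.
print(movement_pattern([False, False, True, True, True, False]))  # down -> up
```

Counting the returned labels over all selected developers (for example with `collections.Counter`) would give frequency tables in the spirit of Tables 2 and 3, although the counts need not match those tables because of the simplification described above.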
Discussion. The high number of occurrences for pattern “constant” for the randomly selected developers in the issue networks might be caused by the much higher number of project “users” participating in issue discussions only for a short time period, compared to the mail networks (see Table
1). When neglecting the pattern “constant” for the randomly chosen developers in issue networks, the most frequent pattern among all developers (most active and randomly chosen developers) is that developers enter the project with only a few contacts (pattern “down
\(\rightarrow\) up”). Over time, the number of interaction partners rises and the developer climbs the project’s hierarchy. This may be indicative of their role changing and gaining coordination responsibilities. Then, the number of interactions decreases again and the developer returns to a small number of contacts. The second most frequent pattern is similar to the first in that non-hierarchical developers move to the hierarchical structure’s upper regions—however, either they stay or we run out of data before we see them leaving (pattern “down”). This might be caused by the much higher number of project “users” participating in issue discussions only for a short time period (see Table
1).
Developers starting in the hierarchical part and moving to the non-hierarchical part are often founders or leaders of the project, who then stopped contributing. We confirmed this for each project that had mailing lists by consulting its website (e.g., for git, Linus Torvalds is listed as founder; over time, he moved to the non-hierarchical part).
5.2.3 Developers’ Neighborhood (RQ3).
Results. We illustrate an example in Figure
11 and summarize the results for all projects in Table
4 for mail networks and Table
5 for issue networks. First, we explore the most active developers in a project who have a static neighborhood. These developers interact during their entire life cycle with developers of the hierarchical part (2 out of 100 developers in mail networks, pattern “Hierarchical part”) or with developers of both parts (61 out of 100 in mail networks, pattern “Both”). Second, the most active developers’ neighborhood may change, too, which happens in two ways: Either a developer starts their career with contacts mainly from the hierarchical part, and then they interact with developers from both parts (17 out of 100 in mail networks, pattern “Hierarchical part
\(\rightarrow\) both”), or they start with contacts from both parts and then restrict their interaction to developers of the hierarchical part (19 out of 100 in mail networks, pattern “Both
\(\rightarrow\) hierarchical part”). Issue networks exhibit similar patterns as described for mail networks.
We also evaluate the neighborhood of randomly selected developers. Their neighborhoods are more stable. Those developers interact during their entire life cycle with developers of the hierarchical part (47 out of 99 in mail networks, pattern “Hierarchical part”) or both parts (34 out of 99 in mail networks, pattern “Both”). Only 18 out of 99 randomly selected developers in mail networks have a dynamic neighborhood. For issue networks, similarly, only 10 out of 100 randomly selected developers have a dynamic neighborhood.
Discussion. If a project is assumed to have proficient leadership (which cannot be guaranteed for every project), then it is not unexpected that randomly selected developers of the non-hierarchical part mostly interact with developers of the hierarchical part or both groups, but not solely with developers of the non-hierarchical part: In any discussion, a developer of the hierarchical part can join to add clarifications or to make a decision, which is not unlikely given the role of the developers of the hierarchical part. Consequently, developers of the non-hierarchical part are expected to interact with developers of the hierarchical part in projects that have a well-functioning leadership. Also, it is worth mentioning that the developers of the non-hierarchical part do not deliberately choose their interaction partners, as they cannot influence who is replying to their messages. Developers from the hierarchical part, however, take the role of maintainers and, most likely, decide which discussions they reply to.
The dynamic patterns of the most active developers’ neighborhoods are especially interesting, as these shed light on on- and off-boarding processes. During on-boarding, developers start with interactions from the hierarchical part, and later extend their interaction to developers of both parts. This dynamic might suggest that, when developers enter a project, they start accumulating knowledge from developers of the hierarchical part and only later transfer knowledge to the non-hierarchical part. During off-boarding, we observe that developers focus their interaction on developers of the hierarchical part. Interaction with the non-hierarchical part or, more specifically, newcomers seems to be present, though. This finding suggests that, when central developers leave, they focus on bringing their ongoing topics to an end, but avoid opening new ones.
5.2.4 Tenure and Programming Activity (RQ4).
Results. In Figure
12, we show the developers’ tenure and their position in the hierarchy for
LLVM: The left plot encodes tenure in terms of size and color (larger and lighter dots denote shorter tenure values); the middle plot compares the distributions of tenure values of developers in the hierarchy with developers outside the hierarchy; the right plot shows the progression of average tenure values over time. Overall, the developers in
LLVM’s hierarchy have, on average, higher tenure values than the developers outside the hierarchy (
\(p \ll 0.001\);
\(d = 0.39\)). This difference in tenure between developers inside and outside the hierarchy is consistent across all projects that use a mailing list (
\(p \ll 0.001\);
\(0.25 \le d \le 0.52\)). Interestingly, the difference between tenure values of hierarchical and non-hierarchical developers increases over time. Remarkably, this is also consistent across all projects that use a mailing list, except for
Qt, where the difference stays constant over time.
For issue-based projects, we get slightly different results: Developers in the hierarchy have, on average, higher tenure values than the developers outside the hierarchy. This holds for all projects. However, only for the projects Flutter, Node.js, and TypeScript does this difference (\(p \ll 0.001\); \(0.23 \le d \le 0.42\)) have an effect size similar to that in the projects that use mailing lists. For these three projects, the difference between the tenure values also increases over time, as we have already identified for the projects that use mailing lists. For the remaining seven issue-based projects, the difference in tenure between developers inside and outside the hierarchy is still significant, but with a smaller effect size (\(p \ll 0.001\); \(0.10\le d \le 0.18\)) and without notable patterns over time.
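To help interpret the reported effect sizes, the following sketch computes a standardized mean difference between two groups. We assume here that \(d\) denotes Cohen's \(d\) with a pooled standard deviation; this section does not state which variant was used, and the function name `cohens_d` and the tenure values in the example are made up for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with pooled standard deviation.

    Positive if group_a has the larger mean. Each group needs at least
    two values, and the pooled variance must be non-zero.
    Assumption: this is the variant of d meant in the text.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up tenure values (in time ranges), not data from the study.
tenure_hierarchical = [4, 6, 8, 10, 12]
tenure_non_hierarchical = [2, 4, 6, 8, 10]
print(round(cohens_d(tenure_hierarchical, tenure_non_hierarchical), 2))  # 0.63
```

Under Cohen's conventional thresholds (0.2 small, 0.5 medium, 0.8 large), the values reported for the mailing-list projects would fall in the small-to-medium range, and the values for the remaining seven issue-based projects would fall below the conventional threshold for a small effect.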
Much like for tenure, we show the results for programming activity for developers of
LLVM in Figure
13. Developers in the hierarchical part edit most files (
\(p \ll 0.001\);
\(d = 0.40\)). This difference in programming activity persists over time, although its magnitude fluctuates. For most projects, we find that, overall, the number of edited files of the non-hierarchical developers is significantly lower than the number of edited files of the hierarchical developers. The only exception is
GCC, for which we cannot find any significant difference between developers inside and outside the hierarchy. As already seen for tenure, the difference between the number of edited files of developers inside and outside the hierarchy is more pronounced in projects that use mailing lists (and project
Node.js) (
\(p \ll 0.001\);
\(0.12\le d \le 0.47\)) than in projects that use the GitHub issue tracker (
\(p \ll 0.001\);
\(0.03\le d \le 0.10\)).
The dynamics of the individual projects show different patterns, though, which we group into four categories: In the first category, the upper hierarchical part contains both developers who edit many files and developers who edit no files or only a few files (Angular (AnF ), Atom (AtF ), Django (DF ), FFmpeg (FfF ), Flutter (FlF ), git (GiF ), LLVM (LF ), ownCloud (OF ), React (RF ), TypeScript (TF ), U-Boot (UF ), and Wine (WiF )). In the second category, the distribution of the number of files edited is split between the hierarchical and the non-hierarchical part: the hierarchical part contains the developers who edit many files, whereas the non-hierarchical part contains the developers who edit only a few files (Bootstrap (BF ), Electron (EF ), Moby (MF ), Node.js (NF ), and webpack (WeF )). In the third category, the pattern is dynamic (QEMU (QeF ) and Qt (QtF )). For example, in early phases of QEMU (QeF ), mainly developers of the non-hierarchical part edited files. In later phases, most files were edited by developers of the hierarchical part. The fourth category contains only GCC (GcF ), which has mainly editing developers in the non-hierarchical part of the network.
Interestingly, for several projects (e.g., GCC (GcF ), git (GiF ), ownCloud (OF ), U-Boot (UF ), and Wine (WiF )), the number of developers who program a lot is very low: A maximum number of five developers are responsible for most of the changes. These developers often have a relatively low node degree. Furthermore, for some projects and time ranges, we found that developers of the hierarchical part have very few edited files and mainly communicate (GCC, git, and Qt).
Discussion. Our data suggest that developers in the hierarchy stay longer in the project. The patterns are consistent with a system where gaining experience through consistent involvement is important for the advancement of responsibilities and influence. This finding is interesting in the light of the conjecture that hierarchy reflects role stratification, since developers with the behavior of a core developer consistently appear within the hierarchical part and not in the non-hierarchical part.
The number of edited files seems to affect the position in the hierarchy more than the developer’s tenure: the more files a developer edits, the more embedded they appear to be in the hierarchy, probably because a higher number of edited files increases the probability that their activity affects many other developers. The interesting cases are when the number of edited files and the position in the hierarchy are unrelated. This could be an indicator for a modular project structure, in which developers of the non-hierarchical part edit files of a certain part of the project, whereas the files that developers of the hierarchical part edit are scattered across many parts of the project. In project
GCC, which is an outlier w.r.t. programming activity, there could also be an additional explanation for mainly having editing developers of the non-hierarchical part: As
GCC is a rather low-level, technical project, which is dependent on technical features that rely on a certain hardware support [
22], developers from different hardware manufacturers may add their specific hardware support to the code base, being in the non-hierarchical part of the project communication although accounting for many file edits. Table
A.1 in the appendix shows that none of the 10 most active developers on GCC’s mailing list, who often are also in the hierarchical part, contributed any commit to the source code. This indicates that these most active developers on the mailing list take on organizational coordination tasks rather than programming tasks in
GCC, which is why this project has mainly editing developers in the non-hierarchical part of the mail network.
The relationship between developers’ positions in the hierarchy and programming activity or tenure appears to be less pronounced in issue-based projects than in projects that use mailing lists. One possible reason could be that there are many “users” of issue-based projects who participate in discussions for a large amount of time (e.g., reporting bugs), which is also reflected in the sheer number of participants in the discussions (see Table
1). By contrast, there are many developers in the hierarchical part who perform merely project maintenance and pull-request reviews and therefore edit only a smaller number of files. Nevertheless, even if the effect is lower for issue-based projects, both kinds of projects have in common that the number of edited files and tenure are higher for developers inside the hierarchical part than outside the hierarchical part.
The fact that the relationship of the number of edited files with the hierarchical part is subject to change speaks in favor of a strong relationship between temporal focus and social contacts. At times when active developers are in the non-hierarchical part, a rather discussion-based group structure seems to establish. At times when the most active developers are at the top of the project’s organizational structure, operational activity seems to be the main focus.