Boost C++ Libraries

An overview of Boost participation in Google Summer of Code™ 2006

For the second consecutive year, Google has conducted its Summer of Code™ initiative, a program by which student developers are sponsored for their contributions within open source organizations willing to mentor the participants. The 2006 campaign ran from April to September, with active development work taking place between May 23 and August 21.

Around mid-April, when the program had just started, some Boost members began considering the possibility of entering Summer of Code as a mentoring organization. Despite the lack of time and the fact that most of us were completely new to this initiative, Boost managed to successfully apply for the program. As a result, ten projects were selected and mentored, most of which are expected to become full contributions to Boost shortly.

We give here a summary report of this experience, along with a short analysis of the main problems we found, so that we can work at solving them and do better next year.

How the program works

There are three types of participants in Google Summer of Code:

  • Google itself acts as the funding partner and conducts the overall program.
  • The open-source organizations accepted into the program must designate people inside the organization who will act as project mentors.
  • Students submit their project ideas and, if selected, work in collaboration with one of the mentoring organizations; upon successful completion of the project, students receive the full stipend for the program.

The program goes through the following stages:

  • Organization selection: those open source organizations willing to enter Summer of Code submit an expression of interest to Google, along with information Google uses for qualifying purposes. Selected organizations are publicly announced and each organization is expected to provide a pool of project ideas.
  • Student selection: students willing to participate submit one or more project proposals, typically expanding on some of the ideas previously provided by the mentoring organizations. A student can apply several times and to different organizations, but can ultimately be chosen for only one project. These proposals are routed by Google to the appropriate organizations, which must analyze them, rank them, and assign mentors to the most promising applications. Based on the information provided by mentoring organizations, Google issues the final list of accepted projects.
  • Development: Students, guided by their assigned mentors, are expected to complete the projects during a period of three months. Google asks mentors for a mid-program review upon which continuation of the project depends.
  • Final review: Once the development period is over, mentors are requested to inform Google of the results of the project, and determine whether students qualify to receive the full stipend.

2006 figures

The 2006 campaign of Google Summer of Code took place between April 14 and September 25. A total of 102 mentoring organizations participated. Of the 6,338 applications submitted by 3,044 students around the globe, 630 were finally selected and funded. Google has spent more than US$3 million on student stipends and compensation for the mentoring organizations.

Boost participation

Application and selection process

On April 14, the same day Google Summer of Code started, Julio M. Merino Vidal (later to become one of the selected students) sent a message encouraging Boost members to participate in this program as a mentoring organization. This call sparked the interest of the community; although time was already short for all the preparation work, Boost moderators quickly set to work and conducted the preliminary registration steps. In the meantime, a Wiki page was set up and filled with project ideas provided by Boost members, totaling more than twenty proposals.

By the beginning of May, Boost was officially accepted into the program, and Boost moderators set out to form a group of mentors, selected on an invitation basis. As student selection is a delicate process, involving the assessment of individuals on their technical skills, all subsequent discussions were conducted by the selected mentors on a private mailing list established for their collaboration.

We were not prepared for the avalanche of student applications that followed. On the second day after the application period opened, we had received three proposals; the next day it was 14, and within a week the count exceeded 50. By the end of the application period the total number of proposals received was 174, which forced us to go through a very intensive ranking process and recruit additional mentors. Two rules were followed so as to rationalize the process of selection among dozens of different proposals:

  • Where there were competing applications for the same project idea, only one was to be ultimately selected; so, no two projects with the same or very similar goals were accepted.
  • Some of the applications built on a given Boost library (for instance, the Boost Graph Library is a frequent target for the addition of algorithms). We limited the applications to a maximum of two per Boost library.

These rules have the combined effect of greatly reducing the number of eligible applications while at the same time distributing the accepted projects evenly across the space of ideas. Moreover, students with unique proposals, i.e. project ideas not coming from the pool originally presented by Boost, are at a competitive advantage.

The different proposals were classified according to their related technological area so that each cluster could be handled by an appointed mentor with the required expertise on the subject. Mentors then submitted "focus reports" summarizing the applications under their responsibility; these reports served as a first filter to help reduce the number of final applications to be evaluated jointly. Throughout the process, students with the most promising proposals were asked to refine their ideas and provide further information.

Although not enforced by the official rules, we agreed upon a one-to-one ratio of mentors to students, which ultimately marked a hard limit on the maximum number of eligible projects.
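The two selection rules and the mentor cap described above can be read as a simple greedy filter over a ranked list of applications. The following Python sketch is only an illustration under stated assumptions: the function name `select_projects`, the record fields (`student`, `idea`, `library`) and the sample data are all hypothetical, and the actual selection was carried out through discussion among mentors, not by a program.

```python
from collections import Counter


def select_projects(ranked, max_per_library=2, max_projects=10):
    """Pick applications from a best-first ranked list.

    Hypothetical illustration of the rules in the text:
      * at most one accepted application per project idea,
      * at most `max_per_library` accepted applications per Boost library,
      * at most `max_projects` in total (one mentor per student).
    """
    chosen = []
    ideas_taken = set()
    per_library = Counter()
    for app in ranked:
        if len(chosen) == max_projects:
            break  # no mentors left
        if app["idea"] in ideas_taken:
            continue  # a better-ranked application already covers this idea
        library = app.get("library")
        if library is not None and per_library[library] >= max_per_library:
            continue  # this library already has its maximum of projects
        chosen.append(app)
        ideas_taken.add(app["idea"])
        if library is not None:
            per_library[library] += 1
    return chosen


# Made-up sample data, ranked best first.
ranked = [
    {"student": "A", "idea": "maxflow", "library": "graph"},
    {"student": "B", "idea": "maxflow", "library": "graph"},
    {"student": "C", "idea": "out-of-core graphs", "library": "graph"},
    {"student": "D", "idea": "graph layout", "library": "graph"},
    {"student": "E", "idea": "coroutines", "library": None},
]

selected = select_projects(ranked, max_projects=3)
for app in selected:
    print(app["student"], app["idea"])
```

In this made-up example, B is skipped because A already covers the same idea, and D is skipped because two graph-related applications have already been accepted.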

Accepted projects

Google accepted and funded the ten top-ranked projects endorsed by Boost. Of these, eight projects are libraries or library components targeted for future inclusion into Boost, while the remaining two consist of utility programs heavily relying on Boost.

C++ Coroutine Library
Giovanni Piero Deretta, mentored by Eric Niebler.
Library for managing OS-provided coroutine facilities through a modern C++ interface.

Concurrency Library
Matthew Calabrese, mentored by David Abrahams.
STL-inspired generic framework for high-level specification and execution of parallelizable algorithms.

TR1 Math Special Functions
Xiaogang Zhang, mentored by John Maddock.
Implementation of the 23 special mathematical functions specified in C++ standard library extension proposal TR1.

The Boost.Process library
Julio M. Merino Vidal, mentored by Jeff Garland.
Portable library for process launching and basic management.

Out-of-Core Graphs and Graph Algorithms
Stéphane Zampelli, mentored by Jeremy Siek.
Extension of the Boost Graph Library to deal with out-of-core structures, i.e. data sets too large to be kept in main memory at once.

MISC (M)ulti (I)ndex (S)pecialized (C)ontainers
Matías Capeletto, mentored by Joaquín M López Muñoz.
Families of specialized containers internally based on Boost.MultiIndex.

Generic Tree Container
Bernhard Reiter, mentored by René Rivera.
Design and implementation of a family of STL-compatible tree containers.

Viewer utility for FSMs
Ioana Tibuleac, mentored by Andreas Huber Dönni.
Utility program for the visualization of finite state machines (FSMs) specified with Boost.Statechart.

Modular C++ preprocessor, using Boost.Spirit
Hermanpreet 'Lally' Singh, mentored by Joel de Guzman.
Implementation with Boost.Spirit and Boost.Wave of a front-end translator from Modular C++ (as specified in a proposal to add modules to C++ by Daveed Vandevoorde) to standard C++.

Implementing a state of the art Mincut/Maxflow algorithm.
Stephan Diederich, mentored by Douglas Gregor.
Implementation of a fast mincut/maxflow routine for the Boost Graph Library based on a new algorithm devised by Vladimir Kolmogorov.

Development

Two main facilities were set up to assist students and mentors during the development phase: a mailing list and a Trac/SVN project management system with separate directories for each project. One of the students, Matías Capeletto, registered a Google Group on his own initiative, aimed at giving students working with Boost a place for informal interaction and discussion of common problems.

After the initial warm-up period, each student-mentor pair performed development work mostly privately. The Boost mailing lists were scarcely used, and only by the end of the program did some students publicly announce their results.

Results

By the date the development period was officially closed, the status of the different projects was as follows:

  • Seven projects were completed or nearly completed and the students are expected to ask for a formal review within 2006 or early 2007. Four of these projects necessitated a goal reorientation during development, basically because the original plan was too ambitious for three months. Most of the projects are still in active development during the months following the Summer of Code program.
  • Two projects did not reach the planned goals, but nevertheless produced useful material that could be expanded outside of the Summer of Code program.
  • One project was abandoned shortly after the midterm review. The reasons for the abandonment are unknown.

The results of all the projects can be consulted online at the dedicated Trac site.

Analysis

We examine the various stages of Boost participation in Summer of Code, with an emphasis on discovering opportunities for improvement.

Boost appeal

In a mid-project presentation at OSCON 2006, Chris DiBona from Google provided some data about the organizations which received the most applications:

Organization                  No. of applications
KDE                           244
Ubuntu & Bazaar               236
Python Software Foundation    212
GNOME                         199
Apache Software Foundation    190
Boost                         174
Gaim                          152
The GNU Project               148
Drupal                        146

The numbers shown here have been estimated from a chart included in the presentation slides. This chart contains an additional column labeled "Google" which actually accounts for the applications dismissed because of their low quality.

The fact that Boost is ranked the sixth most attractive organization out of a total of 102 was entirely unexpected, especially considering the wide popularity of the other top-rated organizations. There is a more or less implicit consensus among Boost members that ours is a relatively niche project, known for its quality standards by seasoned C++ practitioners, but with a limited penetration among entry-level programmers: maybe the figures above should make us reconsider this assumption. A cursory examination of the applications submitted to Boost reveals that most applicants were regular users of Boost: many cite Boost's standing in the C++ community as a reason to apply.

Opportunities lost?

If we look at the number of funded projects relative to the applications received, the figures are not so favorable to Boost.

Organization                  No. of projects    Project/app ratio
KDE                           24                 9.8 %
Ubuntu & Bazaar               22                 9.3 %
Python Software Foundation    23                 10.8 %
GNOME                         19                 9.5 %
Apache Software Foundation    27                 14.2 %
Boost                         10                 5.7 %
Gaim                          8                  5.3 %
The GNU Project               10                 6.8 %
Drupal                        14                 9.6 %

It turns out that the project/application ratio for almost any other organization among the top nine is considerably higher than that of Boost. As it happens, Google initially requested that organizations submit the maximum number of projects they felt they could cope with, and we got funding for exactly what we aimed for, so the limiting factor lies entirely on Boost's side.
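The ratios in the table above are simply the number of funded projects divided by the number of applications received. The short Python sketch below reproduces them from the two tables; the variable and function names are our own, and the application counts are the chart-based estimates quoted earlier, so the percentages are approximate as well.

```python
# Applications received (estimated from the OSCON chart) and projects funded.
applications = {
    "KDE": 244,
    "Ubuntu & Bazaar": 236,
    "Python Software Foundation": 212,
    "GNOME": 199,
    "Apache Software Foundation": 190,
    "Boost": 174,
    "Gaim": 152,
    "The GNU Project": 148,
    "Drupal": 146,
}
projects = {
    "KDE": 24,
    "Ubuntu & Bazaar": 22,
    "Python Software Foundation": 23,
    "GNOME": 19,
    "Apache Software Foundation": 27,
    "Boost": 10,
    "Gaim": 8,
    "The GNU Project": 10,
    "Drupal": 14,
}


def ratio_percent(funded, received):
    """Funded projects as a percentage of applications, to one decimal place."""
    return round(100 * funded / received, 1)


ratios = {org: ratio_percent(projects[org], applications[org])
          for org in applications}

# Print the organizations from highest to lowest project/application ratio.
for org, pct in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{org:<28} {pct:5.1f} %")
```

Sorting by ratio places Boost second to last among these nine organizations, ahead of Gaim only.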

Projects startup

Contributing to Boost relies on a fair number of guidelines and protocols for coding, documentation, testing and maintenance. Many of the required tools are exclusively used within Boost, and some of them, such as Boost.Build, are not trivial. Although the Boost web site contains information about all these tools and procedures, this information is scattered through unrelated pages and is sometimes very hard to come by.

So, there is a good deal of expertise required to begin working on Boost. Some students have reported startup difficulties in getting to know these details and familiarizing themselves with the tools, most notably bjam and Quickbook. Each student overcame the startup difficulties on their own or by resorting to their mentors (see the section on public communication issues).

Ongoing development

Once students got past the startup stage, most projects advanced without serious complications. In the majority of cases, it was realized at some point during development that there was not enough time to complete the project. Some participants had to redefine the goals to keep the project within schedule, while others simply decided that they would continue working after the official deadline of Summer of Code.

The information flow between each student and their mentor was usually reported by both parties to be satisfactory. The projects suffering from lack of communication have been precisely those yielding the poorest results. In general, mentors have not felt overwhelmed by requests from their students, and even in a couple of cases, the projects were run practically unattended. This fact is witness to the high competence of the students recruited into the program.

The degree of usage of the Trac/SVN system has varied. Some students did frequent updates, while others have just used the repository to dump the final results for the official submission to Google.

Public communication issues

Students and mentors had at their disposal three different forums for the public interchange of information and support:

  • Boost public lists, especially the developers and users lists.
  • A dedicated mailing list reaching all students and mentors working at Summer of Code in Boost.
  • A more casual Google Group, set up by one of the students, aimed at providing the participants with a place for socializing and resolution of common problems.

Despite this abundance of resources, there was an almost complete lack of group communication among all the parties involved and between these and the larger Boost community. Seemingly, students were satisfied to pursue their activities by relying on support from their mentors alone. This circumstance has prevented Boost members from enriching the initiative by offering their experience and insight, and has possibly led students to the false impression that contributing to Boost proceeds in a predictable linear path from requirements to completion of the work. When asked why they did not engage in public communication, the students gave vague justifications that can be classified as follows:

  • Doubts were deemed too technical or specific to be worth raising in public.
  • Perfectionism deterred students from asking questions or submitting work in progress until they felt their material looked good enough.
  • Shyness: some students probably lacked previous experience communicating in public, and most are not native English speakers, which could also be a limiting factor.

Although students did not identify the following as a reason not to go public, it is likely that many of them did not feel the need, given the ready access to their mentors they enjoyed. It is easy to grow used to such a dedicated source of support and neglect to resort to other resources. Mentors should have encouraged their students to pursue the public discussion of projects, which constitutes one of the pillars of Boost's renowned quality.

Scope of projects

In hindsight, it has become apparent that most projects were too ambitious to be completed within the three-month duration of the program, and even those that were considered a success will need weeks or months of polishing before the material is ready for a formal review. In contrast with other organizations participating in the Summer of Code program, Boost has as of this writing included no results into its codebase. No formal review for any project has been requested yet, either.

These scope issues are very dependent on the particular type of project. We can classify the Boost projects for Summer of Code as follows:

  • Full-fledged libraries,
  • additions to existing Boost libraries,
  • utilities and tool projects using Boost.

Of these, additions (for instance, the max-flow min-cut algorithm for BGL by Stephan Diederich) are the most suitable for completion in a short period of time: most of the preparation work is already done, and the student has clear guides as to what coding and documentation standards to follow. Also, these projects need not undergo a formal review, since it is the responsibility of the hosting library author to review the code and include it at her discretion. Utility projects also seem suitable for small timeframes, though most project proposals and requests are naturally oriented to contributions of actual code to the Boost project.

As for those projects involving the design and realization of full-fledged libraries, there is little hope that the goals and scope can be kept modest enough for a three-month schedule. Boost candidate libraries developed by professional authors usually take much longer than three months to be accepted; some libraries evolved over several years before being included in Boost. So, the best we can hope for if we are to support the realization of library projects for Boost inside Summer of Code is that the results by the end of the program can be evaluated to constitute a viable potential contribution to Boost. When this is the case, it is crucial that the student commits to further working on the project up to completion and formal review. Perhaps more important than getting libraries coded is to engage new authors in a long-term relationship with the Boost project.

Suggestions for improvement

The following proposals aim to alleviate some of the problems we have identified during the development of Summer of Code within Boost. These action points are related only to the issues found in connection with Boost: we are not addressing other areas of improvement associated with the Summer of Code program itself.

Preparation

Much work can be done before the actual program begins. The following preparation activities can already be launched:

Create a pool of ideas for projects. This action will provide valuable extra time for evaluation and refining of ideas before the Summer of Code begins. The experience has shown that those projects with more preparation work, especially in the area of design, were ultimately more successful. The pool can also be used to retain interesting ideas that arise on the mailing lists and very often are not given proper attention and become abandoned.

Create a student pool. Prior involvement with Boost is an advantage both in the selection phase and later during project development. Those students with a serious interest in participating in Summer of Code with Boost can enter the pool and begin exploring ideas and interacting with the community well in advance of the summer, to put themselves in a favorable position for the selection. Advertisement for the student pool can be initiated in the beginning of 2007 through the usual channels (web site and mailing lists): additionally, Boost members involved with the University can spread this information locally and help raise the interest of students in their environment.

Create a mentor pool. Given the rush with which Boost entered the 2006 Summer of Code campaign, the invitation of mentors had to be done on an on-demand basis as it became all too evident that the task was growing bigger and bigger. The organization must be better prepared next year so that several people with the ability and will to participate as Boost mentors are identified in advance.

Prepare a startup package. In order to facilitate the initial period of getting familiarized with the various Boost guidelines, protocols and tools, it would be extremely useful to prepare a compilation of startup material for students. This package can consist of a single document gathering the currently dispersed information, or go beyond this and provide some bundle of documentation and pre-built tools, an approach that one of the students is currently working on.

Public communication

Students must get involved with the community as soon as possible and grow to appreciate the advantages of public development over solitary coding.

Mandate (bi)weekly reports. These reports should be directed to the public mailing lists to allow all Boost members to follow the work in progress and contribute. Reporting has the extra benefit for students of forcing them to reflect on their work periodically and struggle with the often difficult task of presenting their ideas to others.

Conduct student-mentor communication exclusively through public channels. This might be too drastic a policy, as some matters need privacy and, depending on the amount of information exchanged, flooding problems may arise. Less severe variations involve allowing for some private interchange at the mentors' discretion and moving this kind of communication to a dedicated public mailing list different from the general ones.

Project management

The two most important issues to improve upon with respect to the management are:

  • Project scope must be kept under control,
  • The progress has to be publicly visible, so that problems of scope, design and/or schedule can be more easily detected.

Some of the proposals in this section are not to be regarded as strict rules, but rather as general guidelines to be kept in mind by students and encouraged by mentors.

Create a best practices document. This document can serve as a guideline for project management, an area in which Boost traditionally imposes no requirements. Students might lack the expertise in this area that is usually taken for granted in the traditional model where contributions to Boost are made by professional programmers.

Mandate a design phase. Having a concrete design set up and clearly described early in the project will help estimate the necessary effort for completion of the work. This is also an opportunity for public discussion.

Maintain code, docs and tests in parallel. All too often, novice programmers do the coding in one fell swoop and only then move to testing and documenting their work. This is unacceptable by all current methodology standards, and can result in serious underestimations of the time to completion.

Encourage the KISS principle. It is much better to finish a simpler library and then iteratively evolve it, once it has been exposed to public scrutiny and usage.

More Trac updates. The repository should be viewed as an everyday work tool, not only as the place in which to dump the final results. Updating often leads to more visibility of the work by the mentor and the public in general.

Informal reviews. The typical Summer of Code Boost project will not be completed by the official deadline, as has been discussed earlier. To give some formal recognition to the work done within the Summer of Code proper, and also to allow the students to reach some sort of psychological milestone, informal reviews can be instituted where Boost members evaluate the work done at the end of Summer of Code.

Engage students. This experience has shown that it is possible to guide willing and bright students to the competence levels required for contributing to Boost. The best possible outcome of Summer of Code campaigns is the incorporation of new people into the circle of Boost active contributors. Strive to make the students commit to Boost.

Conclusions

Despite Boost's lack of previous experience, our participation in Google Summer of Code has been extremely fruitful: much useful material has been produced, and, perhaps more importantly, some of the students are likely to commit on a long-term basis and grow to be regular Boost contributors. Traditionally, becoming a productive Boost author has a very high entry barrier due to the extreme quality standards, lack of public support and the very specific culture of the project. The appeal of Summer of Code itself and the possibility of being gently mentored into the world of Boost have most likely been key factors in lowering this entry barrier.

The process has not been without some difficulties, as was to be expected of a newcomer organization such as Boost. We have tried to identify in this paper the areas of improvement and suggest specific actions so that the upcoming Google Summer of Code 2007 can be an even more rewarding experience.

Acknowledgements

This paper couldn't have been written without the numerous reports and contributions kindly provided by Boost students and mentors: Many thanks to all the participants for sharing their experiences with me. Thank you also to the people at Google who have promoted and conducted the Summer of Code initiative.