In his classic 1986 essay, No Silver Bullet, Fred Brooks argued that there is, in some sense, not that much that can be done to improve programmer productivity. His line of reasoning is that programming tasks contain a core of essential/conceptual complexity that's fundamentally not amenable to attack by any potential advances in technology (such as languages or tooling). He then uses an Amdahl's law argument, saying that because 1/X of complexity is essential, it's impossible to ever get more than a factor of X improvement from technological advances.
Towards the end of the essay, Brooks claims that at least 1/2 (most) of complexity in programming is essential, bounding the potential improvement remaining for all technological programming innovations combined to, at most, a factor of 2:
All of the technological attacks on the accidents of the software process are fundamentally limited by the productivity equation:
Time of task = Σᵢ Frequencyᵢ × Timeᵢ
If, as I believe, the conceptual components of the task are now taking most of the time, then no amount of activity on the task components that are merely the expression of the concepts can give large productivity gains.
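Spelled out as Amdahl's law (this restatement and the notation are mine, not Brooks's): if a fraction e of the work is essential and technology speeds up the remaining accidental work by a factor of s, the overall speedup is bounded by 1/e:

```latex
\text{speedup} \;=\; \frac{1}{\,e + \frac{1-e}{s}\,} \;<\; \frac{1}{e} \quad\text{for any finite } s,
\qquad e = \tfrac{1}{2} \;\Rightarrow\; \text{speedup} < 2
```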
Brooks states a bound on how much programmer productivity can improve. But, in practice, to state this bound correctly, one would have to be able to conceive of problems that no one would reasonably attempt to solve due to the amount of friction involved in solving the problem with current technologies.
Without being able to predict the future, this is impossible to estimate. If we knew the future, it might turn out that there's some practical limit on how much computational power or storage programmers can productively use, bounding the resources available to a programmer, but getting a bound on the amount of accidental complexity would still require one to correctly reason about how programmers are going to be able to use zillions of times more resources than are available today, which is so difficult we might as well call it impossible.
Moreover, for each class of tool that could exist, one would have to effectively anticipate all possible innovations. Brooks' strategy for this was to look at existing categories of tools and state, for each, that they would be ineffective or that they were effective but played out. This was wrong not only because it underestimated gains from classes of tools that didn't exist yet, weren't yet effective, or that he wasn't familiar with (e.g., he writes off formal methods, but it doesn't even occur to him to mention fuzzers, static analysis tools that don't fully formally verify code, tools like valgrind, etc.), but also because Brooks thought that every class of tool where there had been major improvement was played out, and it turns out that none of them were. For example, Brooks wrote off programming languages as basically done, just before the rise of "scripting languages" and just before GC languages took over the vast majority of programming. Although you will occasionally hear statements like this today, not many people will volunteer to write a webapp in C on the theory that a modern language can't offer more than a 2x gain over C.
Another one Brooks writes off is AI, saying "The techniques used for speech recognition seem to have little in common with those used for image recognition, and both are different from those used in expert systems". But, of course, this is no longer true — neural nets are highly effective for both image recognition and speech recognition. Whether or not they'll be highly effective as a programming tool is to be determined, but a lynchpin of Brooks's argument against AI has been invalidated and it's not a stretch to think that a greatly improved GPT-2 could give significant productivity gains to programmers. Of course, it's not reasonable to expect that Brooks could've foreseen neural nets becoming effective for both speech and image recognition, but that's exactly what makes it unreasonable for Brooks to write off all future advances in AI as well as every other field of computer science.
Brooks also underestimates gains from practices and from tooling that enables practices. Just for example, looking at what old school programming gurus advocated, we have Ken Thompson arguing that language safety is useless and that bugs happen because people write fragile code, which they should not do if they don't want to have bugs, and Jamie Zawinski arguing that, when on a tight deadline, automated testing is a waste of time and "there’s a lot to be said for just getting it right the first time" without testing. Brooks acknowledges the importance of testing, but the only possible improvement to testing that he mentions is expert systems that could make testing easier for beginners. If you look at the complexity of moderately large scale modern software projects, they're well beyond any software project that had been seen in the 80s. If you really think about what it would mean to approach these projects using old school correctness practices, I think the speedup from those sorts of practices to modern practices is infinite for a typical team, since most teams using those practices would fail to produce a working product at all if presented with a problem that many big companies have independently solved, e.g., produce a distributed database with some stated SLO. Someone could dispute the infinite speedup claim, but anyone who's worked on a complex project that's serious about correctness will have used tools and techniques that result in massive development speedups, easily more than 2x compared to 80s practices, a possibility that didn't seem to occur to Brooks, who appears to have thought that serious testing improvements were not possible due to the essential complexity involved in testing.
Another basic tooling/practice example would be version control. A version control system that supports multi-file commits, branches, automatic merging that generally works as long as devs don't touch the same lines, etc., is a fairly modern invention. During the 90s, Microsoft was at the cutting edge of software development and they didn't manage to get a version control system that supported the repo size they needed (30M LOC for Win2k development) and supported branches until after Win2k. Branches were simulated by simply copying the entire source tree and then manually attempting to merge copies of the source tree. Special approval was required to change the source tree and, due to the pain of manual merging, the entire Win2k team (5000 people, including 1400 devs and 1700 testers) could only merge 100 changes per day on a good day (0 on a bad day, when the build team got stalled due to time spent fixing build breaks). This was a decade after Brooks was writing and there was still easily an order of magnitude speedup available from better version control tooling, test tooling and practices, machine speedups allowing faster testing, etc. Note that, in addition to not realizing that version control and test tooling would later result in massive productivity gains, Brooks claimed that hardware speedups wouldn't make developers significantly more productive even though hardware speed was noted to be a major limiting factor in Win2k development velocity. Brooks couldn't conceive of anyone building a project as complex as Win2k, which could really utilize faster hardware. Of course, using the tools and practices of Brooks's time, it was practically impossible to build a project as complex as Win2k, but tools and practices advanced so quickly that it was possible only a decade later, even if development velocity moved in slow motion compared to what we're used to today due to "stone age" tools and practices.
To pick another sub-part of the above, Brooks didn't list CI/CD as a potential productivity improvement because Brooks couldn't even imagine ever having tools that could possibly enable modern build practices. Writing in 1995, Brooks mentions that someone from Microsoft told him that they build nightly. To that, Brooks says that it may be too much work to enable building (at least) once a day, noting that Bell Northern Research, quite reasonably, builds weekly. Shortly after Brooks wrote that, Google was founded and engineers at Google couldn't even imagine settling for a setup like Microsoft had, let alone building once a week. They had to build a lot of custom software to get a monorepo of Google's scale onto what would be considered modern practices today, but they were able to do it. A startup that I worked for that was founded in 1995 also built out its own CI infra that allowed for constant merging and building from HEAD, because that's what anyone who was looking at what could be done, instead of assuming that everything that could be done had already been done, would do. For large projects, just having CI/CD and maintaining a clean build instead of building weekly should easily be a 2x productivity improvement, larger than Brooks's claim that half of complexity is essential would allow. It's good that engineers at Google, at the startup I worked for, and at many other places didn't believe that a 2x improvement was impossible and actually built tools that enabled massive productivity improvements.
In some sense, looking at No Silver Bullet is quite similar to when we looked at Unix and found the Unix mavens saying that we should write software like they did in the 70s and that the languages they invented are as safe as any language can be. Long before computers were invented, elders have been telling the next generation that they've done everything that there is to be done and that the next generation won't be able to achieve more. In the computer age, we've seen countless similar predictions outside of programming as well, such as Cliff Stoll's now-infamous prediction that the internet wouldn't change anything:
Visionaries see a future of telecommuting workers, interactive libraries and multimedia classrooms. They speak of electronic town meetings and virtual communities. Commerce and business will shift from offices and malls to networks and modems. And the freedom of digital networks will make government more democratic.
Baloney. Do our computer pundits lack all common sense? The truth is no online database will replace your daily newspaper ... How about electronic publishing? Try reading a book on disc. At best, it's an unpleasant chore: the myopic glow of a clunky computer replaces the friendly pages of a book. And you can't tote that laptop to the beach. Yet Nicholas Negroponte, director of the MIT Media Lab, predicts that we'll soon buy books and newspapers straight over the Internet. Uh, sure. ... Then there's cyberbusiness. We're promised instant catalog shopping—just point and click for great deals. We'll order airline tickets over the network, make restaurant reservations and negotiate sales contracts. Stores will become obsolete. So how come my local mall does more business in an afternoon than the entire Internet handles in a month?
If you do a little search and replace, Stoll is saying the same thing Brooks did. Sure, technologies changed things in the past, but I can't imagine how new technologies would change things, so they simply won't.
Even without knowing any specifics about programming, we would be able to see that these kinds of arguments have not historically held up, and we could have decent confidence that the elders are not, in fact, correct this time.
Brooks kept writing about software for quite a while after he was a practitioner, but didn't bother to keep up with what was happening in industry after moving into academia in 1964. This is already obvious from the 1986 essay we looked at, but it's even more obvious if you look at his 2010 book, Design of Design, where he relies on the same examples he relied on in earlier essays and books and the bulk of his new material comes from a house that he built. We've seen that programmers who try to generalize their knowledge to civil engineering generally make silly statements that any 2nd year civil engineering student can observe are false, and it turns out that trying to glean deep insights about software engineering design techniques from house building techniques doesn't work any better, but since Brooks didn't keep up with the industry, that's what he had to offer. While there are timeless insights that transcend era and industry, Brooks has very specific suggestions, e.g., running software teams like surgical teams, which come from thinking about how one could improve on the development practices Brooks saw at IBM in the 50s. But it turns out the industry has moved well beyond IBM's 1950s software practices, and ideas that are improvements over what IBM did in the 1950s aren't particularly useful 70 years later.
Going back to the main topic of this post and looking at the specifics of what he talks about with respect to accidental complexity with the benefit of hindsight, we can see that Brooks' 1986 claim that we've basically captured all the productivity gains high-level languages can provide isn't too different from an assembly language programmer saying the same thing in 1955, thinking that assembly is as good as any language can be; his claims about the other categories are similar. The main thing these claims demonstrate is a lack of imagination. When Brooks referred to conceptual complexity, he was referring to the complexity of using the conceptual building blocks that Brooks was familiar with in 1986 (on problems that Brooks would've thought of as programming problems). There's no reason anyone should think that Brooks' 1986 conception of programming is fundamental any more than they should think that how an assembly programmer from 1955 thought was fundamental. People often make fun of the apocryphal "640k should be enough for anybody" quote, but Brooks saying that, across all categories of potential productivity improvement, we've done most of what's possible to do, is analogous and not apocryphal!
If we look at the future, the fraction of complexity that might be accidental is effectively unbounded. One might argue that, if we look at the present, these terms wouldn't be meaningless. But, while this will vary by domain, I've personally never worked on a non-trivial problem that isn't completely dominated by accidental complexity, making the concept of essential complexity meaningless on any problem I've worked on that's worth discussing.
Appendix: concrete problems
Let's see how this essential complexity claim holds for a couple of things I did recently at work:
- scp from a bunch of hosts to read and download logs, and then parse the logs to understand the scope of a problem
- Query two years of metrics data from every instance of every piece of software my employer has, for some classes of software and then generate a variety of plots that let me understand some questions I have about what our software is doing and how it's using computer resources
Logs
If we break this task down, we have
- scp logs from a few hundred thousand machines to a local box
- used a Python script for this to get parallelism with more robust error handling than you'd get out of pssh/parallel-scp (a sketch of this kind of script appears after this list)
- ~1 minute to write the script
- do other work while logs download
- parse downloaded logs (a few TB)
- used a Rust script for this, a few minutes to write (used Rust instead of Python for performance reasons here — just opening the logs and scanning each line with idiomatic Python was already slower than I'd want if I didn't want to farm the task out to multiple machines)
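As a rough illustration of how little code the first step takes today, here's a minimal sketch of the kind of parallel-scp driver described in the first item above. The hosts file, remote log path, retry count, and worker count are all hypothetical; the actual script isn't shown in this post.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS_FILE = "hosts.txt"             # hypothetical: one hostname per line
REMOTE_LOG = "/var/log/service.log"  # hypothetical remote log path
RETRIES = 3

def fetch(host):
    """scp one host's log into ./logs/, retrying a few times before giving up."""
    for _ in range(RETRIES):
        result = subprocess.run(
            ["scp", "-o", "ConnectTimeout=10",
             f"{host}:{REMOTE_LOG}", f"logs/{host}.log"],
            capture_output=True,
        )
        if result.returncode == 0:
            return host, None
    return host, result.stderr.decode(errors="replace")

if __name__ == "__main__":
    os.makedirs("logs", exist_ok=True)
    with open(HOSTS_FILE) as f:
        hosts = [line.strip() for line in f if line.strip()]

    failures = []
    with ThreadPoolExecutor(max_workers=64) as pool:
        for host, err in pool.map(fetch, hosts):
            if err is not None:
                failures.append((host, err))

    print(f"{len(hosts) - len(failures)} hosts ok, {len(failures)} failed")
```

The retry loop and the collected failure list are the kind of "more robust error handling" the list item alludes to; a thread pool is plenty here since the work is network-bound.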
In 1986, perhaps I would have used telnet or ftp instead of scp. Modern scripting languages didn't exist yet (perl was created in 1987 and perl5, the first version that some argue is modern, was released in 1994), so writing code that would do this with parallelism and "good enough" error handling would have taken more than an order of magnitude more time than it takes today. In fact, I think just getting semi-decent error handling while managing a connection pool could have easily taken an order of magnitude longer than this entire task took me (not including time spent downloading logs in the background).
Next up would be parsing the logs. It's not fair to compare an absolute number like "1 TB", so let's just call this "enough that we care about performance" (we'll talk about scale in more detail in the metrics example). Today, we have our choice of high-performance languages where it's easy to write, fast, safe code and harness the power of libraries (e.g., a regexp library) that make it easy to write a quick and dirty script to parse and classify logs, farming out the work to all of the cores on my computer (I think Zig would've also made this easy, but I used Rust because my team has a critical mass of Rust programmers).
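For illustration, here's a minimal sketch of a parse-and-classify pass like the one described above. It's in Python to keep these sketches in a single language, even though, as noted, the real script was written in Rust for performance; the log location, the log format, and the error-classification regex are all hypothetical.

```python
import glob
import multiprocessing
import re
from collections import Counter

# Hypothetical error pattern; the real classification logic isn't described in the post.
ERROR_RE = re.compile(r"ERROR\s+(\w+):")

def classify(path):
    """Count error categories found in one log file."""
    counts = Counter()
    with open(path, errors="replace") as f:
        for line in f:
            m = ERROR_RE.search(line)
            if m:
                counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    total = Counter()
    with multiprocessing.Pool() as pool:
        for counts in pool.imap_unordered(classify, glob.glob("logs/*.log")):
            total.update(counts)
    for category, n in total.most_common(20):
        print(category, n)
```

The multiprocessing pool is what farms the per-file work out across cores, which is the part of the task the paragraph above highlights.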
In 1986, there would have been no comparable language, but more importantly, I wouldn't have been able to trivially find, download, and compile the appropriate libraries and would've had to write all of the parsing code by hand, turning a task that took a few minutes into a task that I'd be lucky to get done in an hour. Also, if I didn't know how to use the library or that I could use a library, I could easily find out how I should solve the problem on StackOverflow, which would massively reduce accidental complexity. Needless to say, there was no real equivalent to Googling for StackOverflow solutions in 1986.
Moreover, even today, this task, a pretty standard programmer devops/SRE task, after at least an order of magnitude speedup over the analogous task in 1986, is still nearly entirely accidental complexity.
If the data were exported into our metrics stack or if our centralized logging worked a bit differently, the entire task would be trivial. And if neither of those were true, but the log format were more uniform, I wouldn't have had to write any code after getting the logs; rg or ag would have been sufficient. If I look for how much time I spent on the essential conceptual core of the task, it's so small that it's hard to estimate.
Query metrics
We really only need one counter-example, but I think it's illustrative to look at a more complex task to see how Brooks' argument scales for a more involved task. If you'd like to skip this lengthy example, click here to skip to the next section.
We can view my metrics querying task as being made up of the following sub-tasks:
- Write a set of Presto SQL queries that effectively scan on the order of 100 TB of data each, from a data set that would be on the order of 100 PB of data if I didn't maintain tables that only contain a subset of data that's relevant
- Maybe 30 seconds to write the first query and a few minutes for queries to finish, using on the order of 1 CPU-year of CPU time
- Write some ggplot code to plot the various properties that I'm curious about
- Not sure how long this took; less time than the queries took to complete, so this didn't add to the total time of this task
The first of these tasks is so many orders of magnitude quicker to accomplish today that I'm not even able to hazard a guess as to how much quicker it is today within one or two orders of magnitude, but let's break down the first task into component parts to get some idea about the ways in which the task has gotten easier.
It's not fair to port absolute numbers like 100 PB into 1986, but just the idea of having a pipeline that collects and persists comprehensive data analogous to the data I was looking at for a consumer software company (various data on the resource usage and efficiency of our software) would have been considered absurd in 1986. Here we see one fatal flaw in the idea that essential complexity provides an upper bound on productivity improvements: tasks with too much accidental complexity wouldn't have even been considered possible. The limit on how much accidental complexity Brooks sees is really a limit of his imagination, not something fundamental.
Brooks explicitly dismisses increased computational power as something that will not improve productivity ("Well, how many MIPS can one use fruitfully?", more on this later), but both storage and CPU power (not to mention network speed and RAM) were sources of accidental complexity so large that they bounded the space of problems Brooks was able to conceive of.
In this example, let's say that we somehow had enough storage to keep the data we want to query in 1986. The next part would be to marshal on the order of 1 CPU-year worth of resources and have the query complete in minutes. As with the storage problem, this would have also been absurd in 1986, so we've run into a second piece of non-essential complexity so large that it would stop a person from 1986 from thinking of this problem at all.
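To make that scale concrete (taking "minutes" to mean roughly five minutes, which is my assumption rather than a number from the post):

```latex
1\ \text{CPU-year} \approx 365 \times 24 \times 60 \approx 5.3 \times 10^{5}\ \text{CPU-minutes},
\qquad \frac{5.3 \times 10^{5}\ \text{CPU-minutes}}{5\ \text{minutes}} \approx 10^{5}\ \text{cores running concurrently}
```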
Next up would be writing the query. If I were writing for the Cray-2 and wanted to be productive, I probably would have written the queries in Cray's dialect of Fortran 77. Could I do that in less than 300 seconds per query? Not a chance; I couldn't even come close with Scala/Scalding and I think it would be a near thing even with Python/PySpark. This is the aspect where I think we see the smallest gain and we're still well above one order of magnitude here.
After we have the data processed, we have to generate the plots. Even with today's technology, I think not using ggplot would cost me at least 2x in terms of productivity. I've tried every major plotting library that's supposedly equivalent (in any language) and every library I've tried either has multiple show-stopping bugs rendering plots that I consider to be basic in ggplot or is so low-level that I lose more than 2x productivity by being forced to do stuff manually that would be trivial in ggplot. In 2020, the existence of a single library already saves me 2x on this one step. If we go back to 1986, before the concept of the grammar of graphics and any reasonable implementation, there's no way that I wouldn't lose at least two orders of magnitude of time on plotting even assuming some magical workstation hardware that was capable of doing the plotting operations I do in a reasonable amount of time (my machine is painfully slow at rendering the plots; a Cray-2 would not be able to do the rendering in anything resembling a reasonable timeframe).
The number of orders of magnitude of accidental complexity reduction for this problem from 1986 to today is so large I can't even estimate it, and yet this problem still contains such a large fraction of accidental complexity that it's once again difficult to even guess at what fraction of complexity is essential. To write down all of the accidental complexity I can think of would require at least 20k words, but just to provide a bit of the flavor of the complexity, let me write down a few things.
- SQL; this is one of those things that's superficially simple but actually extremely complex
- Arbitrary Presto limits, some of which are from Presto and some of which are from the specific ways we operate Presto and the version we're using
- There's an internal Presto data structure assert fail that gets triggered when I use both `numeric_histogram` and `cross join unnest` in a particular way. Because it's a waste of time to write the bug-exposing query, wait for it to fail, and then re-write it, I have a mental heuristic I use to guess, for any query that uses both constructs, whether or not I'll hit the bug, and I apply it to avoid having to write two queries. If the heuristic applies, I'll write a more verbose query that's slower to execute instead of the more straightforward query
- We partition data by date, but Presto throws this away when I join tables, resulting in very large and therefore expensive joins when I join data across a long period of time even though, in principle, this could be a series of cheap joins; if the join is large enough to cause my query to blow up, I'll write what's essentially a little query compiler to execute day-by-day queries and then post-process the data as necessary instead of writing the naive query (see the sketch after this list)
- There are a bunch of cases where some kind of optimization in the query will make the query feasible without having to break the query across days (e.g., if I want to join host-level metrics data with the table that contains what cluster a host is in, that's a very slow join across years of data, but I also know what kinds of hosts are in which clusters, which, in some cases, lets me filter hosts out of the host-level metrics data that's in there, like core count and total memory, which can make the larger input to this join small enough that the query can succeed without manually partitioning the query)
- We have a Presto cluster that's "fast" but has "low" memory limits and a cluster that's "slow" but has "high" memory limits, so I mentally estimate how much per-node memory a query will need so that I can schedule it to the right cluster
- etc.
- When, for performance reasons, I should compute the CDF or histogram in Presto vs. leaving it to the end for ggplot to compute
- How much I need to downsample the data, if at all, for ggplot to be able to handle it, and how that may impact analyses
- Arbitrary ggplot stuff
- roughly how many points I need to put in a scatterplot before I should stop using `size = [number]` and should switch to single-pixel plotting because plotting points as circles is too slow
- what the minimum allowable opacity for points is
- If I exceed the maximum density where you can see a gradient in a scatterplot due to this limit, how large I need to make the image to reduce the density appropriately (when I would do this instead of using a heatmap deserves its own post)
- etc.
- All of the above is about tools that I use to write and examine queries, but there's also the mental model of all of the data issues that must be taken into account when writing the query in order to generate a valid result, which includes things like clock skew, Linux accounting bugs, issues with our metrics pipeline, issues with data due to problems in the underlying data sources, etc.
- etc.
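To make one of the workarounds above concrete, here's a minimal sketch of the day-by-day approach mentioned in the Presto partitioning item (see the forward reference there). The table names, the `ds` date-partition column, and the `run_query` callable are all hypothetical; the real queries and tooling aren't shown in this post.

```python
from datetime import date, timedelta

def daily_queries(start, end):
    """Emit one cheap, partition-aligned query per day instead of one
    expensive join over the whole date range."""
    day = start
    while day <= end:
        # Hypothetical table and column names; filtering both sides on the
        # `ds` partition keeps each day's join small.
        yield f"""
            SELECT h.cluster, COUNT(*) AS n
            FROM host_metrics m
            JOIN host_to_cluster h ON m.host = h.host
            WHERE m.ds = DATE '{day.isoformat()}'
              AND h.ds = DATE '{day.isoformat()}'
            GROUP BY h.cluster
        """
        day += timedelta(days=1)

def run_range(run_query, start, end):
    """run_query is whatever client executes SQL and returns (cluster, n)
    rows; the post-processing step just merges the per-day aggregates."""
    totals = {}
    for sql in daily_queries(start, end):
        for cluster, n in run_query(sql):
            totals[cluster] = totals.get(cluster, 0) + n
    return totals

# e.g., run_range(my_presto_client, date(2018, 1, 1), date(2019, 12, 31))
```

The point isn't the specific query; it's that working around the lost partition information means writing and maintaining this kind of scaffolding, which is pure accidental complexity.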
For each of Presto and ggplot I implicitly hold over a hundred things in my head to be able to get my queries and plots to work and I choose to use these because these are the lowest overhead tools that I know of that are available to me. If someone asked me to name the percentage of complexity I had to deal with that was essential, I'd say that it was so low that there's no way to even estimate it. For some queries, it's arguably zero — my work was necessary only because of some arbitrary quirk and there would be no work to do without the quirk. But even in cases where some kind of query seems necessary, I think it's unbelievable that essential complexity could have been more than 1% of the complexity I had to deal with.
Revisiting Brooks on computer performance, even though I deal with complexity due to the limitations of hardware performance in 2020 and would love to have faster computers today, Brooks wrote off faster hardware as pretty much not improving developer productivity in 1986:
What gains are to be expected for the software art from the certain and rapid increase in the power and memory capacity of the individual workstation? Well, how many MIPS can one use fruitfully? The composition and editing of programs and documents is fully supported by today’s speeds. Compiling could stand a boost, but a factor of 10 in machine speed would surely . . .
But this is wrong on at least two levels. First, if I had access to faster computers, a huge amount of my accidental complexity would go away (if computers were powerful enough, I wouldn't need complex tools like Presto; I could just run a query on my local computer). We have much faster computers now, but it's still true that having faster computers would make many involved engineering tasks trivial. As James Hague notes, in the mid-80s, writing a spellchecker was a serious engineering problem due to performance constraints.
Second, (just for example) ggplot only exists because computers are so fast. A common complaint from people who work on performance is that tool X has somewhere between two and ten orders of magnitude of inefficiency when you look at the fundamental operations it does vs. the speed of hardware today. But what fraction of programmers can realize even one half of the potential performance of a modern multi-socket machine? I would guess fewer than one in a thousand and I would say certainly fewer than one in a hundred. And performance knowledge isn't independent of other knowledge — controlling for age and experience, it's negatively correlated with knowledge of non-"systems" domains since time spent learning about the esoteric accidental complexity necessary to realize half of the potential of a computer is time spent not learning about "directly" applicable domain knowledge. When we look at software that requires a significant amount of domain knowledge (e.g., ggplot) or that's large enough that it requires a large team to implement (e.g., IntelliJ), the vast majority of it wouldn't exist if machines were orders of magnitude slower and writing usable software required wringing most of the performance out of the machine. Luckily for us, hardware has gotten much faster, allowing the vast majority of developers to ignore performance-related accidental complexity and instead focus on all of the other accidental complexity necessary to be productive today.
Faster computers both reduce the amount of accidental complexity tool users run into as well as the amount of accidental complexity that tool creators need to deal with, allowing more productive tools to come into existence.
2022 Update
A lot of people have said that this post is wrong because Brooks was obviously saying X and Brooks did not mean the things I quoted in this post. But people state all sorts of different Xs for what Brooks really meant, so, in aggregate, these counterarguments are self-refuting: each assumes that Brooks "obviously" meant one specific thing, but if it were so obvious, people wouldn't have so many different ideas of what Brooks meant.
This is, of course, inevitable when it comes to a Rorschach test essay like Brooks's essay, which states a wide variety of different and contradictory things.
Thanks to Peter Bhat Harkins, Ben Kuhn, Yuri Vishnevsky, Chris Granger, Wesley Aptekar-Cassels, Sophia Wisdom, Lifan Zeng, Scott Wolchok, Martin Horenovsky, @realcmb, Kevin Burke, Aaron Brown, @up_lurk, and Saul Pwanson for comments/corrections/discussion.