There is arguably more talk today than ever about diversity, equity, and inclusion within the open source world. But is that talk translating to meaningful change? Are underrepresented groups actually participating in open source software development to a greater extent than they did in the past?
The answer seems to be a tentative "yes" — although it's hard to say for sure, given the limited scope and inconsistent nature of the data available about diversity, equity, and inclusion in open source as of 2022.
Here's a look at what we know at present about the participation of underrepresented groups in open source projects, how it compares with historical trends, and which forces are likely behind the apparent trend toward greater diversity within the open source world.
Diversity in Open Source Communities: A Brief History
One of the puzzling aspects of open source software communities is that, although the gender, racial, and other identities of open source contributors are typically not obvious to their collaborators, members of underrepresented groups have traditionally been even more underrepresented in open source than in the tech world as a whole.
For example, GitHub found in a 2017 survey that a mere 3% of contributors to open source projects were women, and 1% identified as non-binary. Compare that with data showing that about 25% of all software developers are women, and the extent to which women are underrepresented in open source becomes striking.
What's perhaps even more interesting is that the trend seemed to have gotten worse during most of the past decade. Seven years ago, Breanden Beneschott, now co-founder and CEO at Mechanism Ventures, found that 5.4% of GitHub contributors were women — a considerably higher figure than GitHub's 2017 finding. Daniel Peacock, in a separate analysis, also found that the presence of women in the Debian project (a major Linux distribution) declined over the period 2004-2020.
All of these studies used different methodologies and different sample populations for quantifying the number of women contributors to open source. It's impossible, based on them, to draw apples-to-apples, year-by-year comparisons of gender diversity trends within open source. Still, in the absence of more comprehensive and consistent data, it seems reasonable to conclude that open source communities may well have grown less diverse from a gender perspective over the course of the 2010s.
It's likely that other forms of diversity — such as participation by racial minority groups — followed a similar pattern, although there is less data available for those categories of participants. GitHub's 2017 survey — which found that 16% of open source programmers identified with minority racial groups, as compared with about 34% of U.S. programmers overall — is the only data source I'm aware of that examines racial diversity in open source.
And again, this is all surprising because you'd think that if underrepresented groups were to achieve more inclusion in a certain corner of the tech industry, the open source ecosystem would be that corner. Unlike proprietary software developers, who have to go through hiring processes that are possibly laden with bias and then work in environments where they may face all manner of identity-based discrimination, open source contributors have the luxury of working in relative anonymity. Their real-life identities are obscured by GitHub usernames. Theoretically, the only thing that matters is the quality of their code.
And yet, that has not seemed to be the reality, at least for much of the past decade.
Open Source Demographics Today
That trend, however, may be changing, albeit at a slow pace.
The most recent detailed data on diversity in open source comes from a December 2021 report by the Linux Foundation, which found (among many other data points) that:
- 14% of open source contributors are women, a marked increase from the single-digit levels of representation reported in studies from the 2010s.
- 74% of open source contributors identified as heterosexual. The rest identified as lesbian, gay, bisexual, pansexual, asexual or queer, or chose not to answer. By comparison, about 5.6% of U.S. adults identify as LGBT, according to Gallup.
- 17% of open source contributors say they have a long-term physical, mental, intellectual, or sensory impairment, as compared with 12% across the U.S. workforce in general.
(As for racial diversity in open source, the Linux Foundation report offered surprisingly little hard data. It noted that "Latinx, Black, and Indigenous groups are less likely to agree that people from different backgrounds have equal opportunities to participate and make decisions in open source," but it didn't provide specific numbers to contextualize this finding.)
It's important not to read too much into the Linux Foundation's 2021 findings. Again, you can't draw apples-to-apples comparisons against previous reports because the Linux Foundation's methodology and sample population for its 2021 report differed from the approach used by GitHub in 2017. Still, the most recent data from the Linux Foundation suggests that, on the whole, open source is becoming a more diverse space, including in the representation of women.
Why Open Source May Be Becoming More Diverse and Inclusive
Beyond the limited data about open source diversity today, there's also reason to believe that cultural changes within the open source world, and new initiatives designed to encourage greater diversity, are contributing to more inclusive open source communities.
In general, there has been more discussion of diversity within open source communities over the past several years. And while it may be tempting to chalk that up to the broader social justice conversation that has taken place around the world over the past couple of years, the focus on diversity and inclusion in open source seems to predate that broader shift (although it may have been amplified by it).
For example, Linux creator Linus Torvalds famously took time off in 2018 to educate himself about empathy — a decision perhaps motivated by a sense within open source by that point in time that harsh and non-inclusive behavior was becoming increasingly intolerable.
Consider, too, efforts in 2017 to push Kubernetes to abandon its "master" terminology — a change that hasn't been fully implemented but that nonetheless points to efforts within the open source world starting around 2017 to take diversity and inclusion more seriously, even from a symbolic perspective.
And then there is the simple fact that the Linux Foundation in 2021 funded a detailed report to survey the state of diversity in open source. That's not something that the Linux Foundation, one of the most influential institutions within the open source world, had thought to do previously, despite regularly releasing studies focused on other aspects of the open source ecosystem.
In short, there are positive, if limited, signs that the open source world has actively sought to become more inclusive in recent years. Those efforts seem to be borne out by data that suggests that open source communities have, in fact, grown at least somewhat more diverse than they were until about five years ago.
The insights we have about the state of diversity, equity, and inclusion in open source as of 2022 are anecdotal and incomplete. But overall, the evidence seems to suggest that open source is becoming at least a bit more diverse, and not by accident. The changes are the result of concerted, albeit limited, efforts to make open source as diverse in practice as it should be in theory, given that the only thing that should really matter about open source contributors is the quality of their code.
About the author
Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, "For Fun and Profit: A History of the Free and Open Source Software Revolution," was published by MIT Press.