You’ve heard of big data, but what about big code? Big code is among the new code challenges playing out across the IT industry as the sheer volume of source code that organizations have to manage--as well as the complexity of that code--constantly increases.
Let’s take a look at what big code means, the code challenges it creates and how developers and IT teams can respond.
What Is Big Code?
The term “big code” has been tossed around in a few different contexts over the past decade. Some folks have used it to refer to software associated with big data projects. For others, it involves taking a big data analytics approach to studying source code.
In this article on code challenges, however, I want to discuss another definition of big code that has begun to gain traction lately: the ever-increasing size and complexity of the codebases that the typical organization has to maintain.
From this perspective, we are living in the era of big code in the sense that everyone now needs to manage massive volumes of code. Even small companies with basic software needs may have to maintain tens of thousands of lines of code that is written in multiple languages, uses widely varying data structures, incorporates dependencies from dozens of upstream projects, and so on.
The Growth of Code--and Code Challenges
To illustrate what big code looks like and how the complexities of code management today are different from those of the past, consider the following data points:
- As of 2020, the Linux kernel contains about 28 million lines of code. That’s roughly 159 times as much code as the 176,250 lines that Linux 1.0 included when it debuted in 1994.
- Windows 95 had about 10 million lines of code, Windows XP had 40 million, and Windows 10 reportedly contains about 50 million.
- Kubernetes, despite being only about five years old as of 2020 and targeting a relatively limited use case, already involves about 2 million lines of code.
- As of 2012, a typical car reportedly relied on around 100 million lines of code. And that was just an ordinary, run-of-the-mill car, not a self-driving wonder.
- Depending on how you count, there are anywhere between 250 and 25,000 different programming languages in existence today. What’s more, there are multiple frameworks or programming tools associated with each of these languages.
These are only scattered data points, and they don’t prove that every organization everywhere is now dealing with bigger code challenges than it did in the past. It’s also true, of course, that lines of code is a crude and imperfect metric for assessing code complexity.
Still, in the absence of more comprehensive information, data like this offers a sense of how, by and large, the sheer size of codebases--and code challenges--has grown over time. It’s a safe bet that this is true not just for major, enterprise-grade platforms like Linux and Windows, but also for internal line-of-business apps that power the everyday operations of most companies.
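To make the lines-of-code metric a bit more concrete, here is a minimal Python sketch of how you might measure your own codebase. The function name `count_lines_by_extension` is hypothetical, and the approach rests on simplifying assumptions: it counts non-blank lines, groups them by file extension as a rough stand-in for language, and skips any file that can't be read as UTF-8 text. It makes no attempt to exclude comments, generated files or vendored dependencies, which is part of why the metric is so crude.

```python
from collections import Counter
from pathlib import Path


def count_lines_by_extension(root):
    """Return a Counter mapping file extension -> number of non-blank lines."""
    totals = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Skip binary or unreadable files rather than guessing at their contents.
            continue
        non_blank = sum(1 for line in text.splitlines() if line.strip())
        totals[path.suffix or "(no extension)"] += non_blank
    return totals
```

Calling `count_lines_by_extension(".").most_common()` from the root of a repository returns the extensions ranked by line count, which gives a quick, rough picture of how much code you have and how many languages it spans.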
Big Code Challenges, Big Code Solutions
The reason this matters from the perspective of developers and IT engineers is that the effort required to maintain, deploy and update applications tends to increase along with the size and complexity of a codebase.
The more lines of code you have in an application, the harder it is to pinpoint the source of a bug or ensure that a change in one part of your codebase doesn’t create a problem somewhere else. Likewise, the more modules your application includes, the harder it is to build and deploy a new release. And the more languages, types of databases, APIs and so on that you have to contend with within a single codebase, the more challenging it is to keep everything in sync, mitigate potential security issues and keep track of how it all fits together.
Making matters more complicated is the fact that, despite the manyfold increase in the size and complexity of the typical codebase over the past decade or two, the tools available to manage these challenges haven’t changed much in the same period. IDEs haven’t evolved fundamentally. Neither have build or deployment tools, or even project management software.
There have been some incremental improvements in tooling. It might be easier to automate some aspects of software delivery today than it was a decade ago, thanks to integrated CI/CD platforms. Tools like Git (and GitHub and GitLab) have also made it easier to manage codebases in a collaborative, centralized way. Yet, by and large, the effectiveness of management tools hasn’t grown by orders of magnitude in the same way that the complexity of source code has.
Dealing with Big Code Challenges
If you’re a developer or IT engineer, then, big code means that your job is just harder than it used to be. Unfortunately, there’s no simple solution for this challenge.
There are, however, some small steps that teams can take to help ensure that they don’t drown in big code:
- Eliminate unused code, meaning code that lurks within your codebase and is included in builds but does not actually power any feature of your application that is in active use.
- Embrace microservices, which help tame the complexity of sprawling codebases. Although microservices and big code are rarely discussed in tandem, I have a suspicion that part of the reason microservices have become so popular over the past decade is that they help make big code more manageable.
- Establish policies within your team regarding which development frameworks, tools and other resources you will use. To mitigate complexity, avoid situations where every developer does something different based on personal preferences or which programming framework happens to be trendy at a given moment.
- Make effective use of comments in code. Comments won’t tame all of the complexity on their own, but they’ll make it easier to troubleshoot issues and understand how one part of your codebase interacts with other parts.
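As an illustration of the first step, eliminating unused code, here is a rough Python sketch that flags functions that are defined in a piece of source code but never referenced in it. The function name `find_unreferenced_functions` and the `EXAMPLE` snippet are hypothetical, and the check is only a heuristic under stated assumptions: it looks at a single file, matches by name only, and will miss code that is reached dynamically or called from other modules. Treat its output as a list of candidates to review, not a list of code that is safe to delete.

```python
import ast


def find_unreferenced_functions(source):
    """Return sorted names of functions defined in `source` but never referenced in it."""
    tree = ast.parse(source)
    # Every function (sync or async) defined anywhere in the source.
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    # Every bare name or attribute name that the source mentions.
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)
    return sorted(defined - referenced)


# Hypothetical sample input: one function is called, the other never is.
EXAMPLE = '''
def used():
    return 1

def unused():
    return 2

print(used())
'''

print(find_unreferenced_functions(EXAMPLE))  # prints ['unused']
```

On a real codebase you would need to combine results across files and account for public APIs, tests and entry points before removing anything.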
The bottom line: Code is constantly increasing in size and complexity, and that probably won’t change anytime soon. The best that developers and IT teams can do is adopt strategies to help mitigate the challenges posed by big code, rather than trying to eliminate them entirely.