Technical debt arises when IT or development teams do not improve inefficient, outdated processes. It often accumulates when teams consciously choose a “quick fix” over a comprehensive long-term solution. Technical debt can apply to outdated equipment, hardware, or software.
A Technical Definition of Tech Debt
Technical debt accrues when data centers rely on outdated technology. Much like monetary debt, technical debt must be “repaid” some way or another down the line, and it can accrue “interest” if systemic problems compound over time. When an organization neglects to modernize a system, which is a potentially costly endeavor, it instead applies fixes to the legacy system in place. These quick fixes turn into bugs, which require more fixes and result in high maintenance costs. In fact, a recent study found that a typical company wastes 23% to 42% of development time on technical debt.
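To put the 23% to 42% range in concrete terms, the short sketch below converts a percentage of development time into hours per week. The team size (10 people) and the 40-hour week are hypothetical figures chosen for illustration, not numbers from the study.

```python
def hours_lost_per_week(team_size: int, hours_per_week: int, debt_percent: int) -> float:
    """Estimate weekly hours a team spends servicing technical debt.

    debt_percent is a whole-number percentage (e.g. 23 for 23%).
    """
    if not 0 <= debt_percent <= 100:
        raise ValueError("debt_percent must be between 0 and 100")
    return team_size * hours_per_week * debt_percent / 100


# Hypothetical team: 10 people working 40 hours a week (assumed values).
low = hours_lost_per_week(10, 40, 23)
high = hours_lost_per_week(10, 40, 42)
print(f"Between {low:.0f} and {high:.0f} of 400 weekly hours go to technical debt")
```

Under these assumptions, roughly two to four full-time workers' worth of effort goes to maintaining debt rather than new work.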
A Simple Definition
Technical debt is the cost of maintaining outdated systems as a short-term cost-saving measure instead of investing in better solutions. In data centers, technical debt refers to the use of outdated infrastructure and hardware that reduce data center efficiency. This type of debt accumulates when data centers keep legacy systems in place rather than investing in new and improved technologies. Data centers with high technical debt typically have higher energy costs that harm sustainability initiatives.
Origins of Technical Debt
First coined in 1992 by Ward Cunningham, who later co-authored the Agile Manifesto, technical debt originally referred to software coding. “Shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with refactoring,” Cunningham said in 1992. “The danger occurs when the debt is not repaid. Every minute spent on code that is not quite right for the programming task of the moment counts as interest on that debt.”
Since then, the concept of technical debt has been applied to a variety of operations, including data centers. Every organization has some degree of technical debt, but it’s best to keep debt low so personnel can focus on new initiatives as opposed to applying endless maintenance to outdated systems.
Why Should You Care About Tech Debt?
While it saves money in the short term, technical debt costs more in the long term. High technical debt can lead to malfunctions and outages. A survey indicated that 45% of data center managers said that their most recent outage cost between $100,000 and $1 million in direct and indirect costs. The same survey indicated that most outages are preventable with better management and configuration.
How Do You Identify Tech Debt?
It can be challenging to identify technical debt because the causes are myriad. In order to ascertain what type of technical debt there is and how to fix it, you should:
- Talk to on-campus workers about possible inefficiencies. Individuals involved in the day-to-day operations will be aware of issues and can advise on how to fix them.
- Take an inventory of equipment, assess the lifespan, and determine if it is operating efficiently.
- Analyze data to identify potential power waste and opportunities to automate inefficient manual processes.
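The inventory step above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed tool: the `Asset` record, the equipment names, the install dates, and the expected lifespans are all hypothetical, and real lifespans should come from vendor guidance or maintenance records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    installed: date
    expected_life_years: float  # assumed figure, e.g. from vendor guidance


def age_in_years(asset: Asset, today: date) -> float:
    """Approximate age of an asset in years as of `today`."""
    return (today - asset.installed).days / 365.25


def past_lifespan(assets: list[Asset], today: date) -> list[str]:
    """Return the names of assets older than their expected lifespan."""
    return [a.name for a in assets if age_in_years(a, today) > a.expected_life_years]


# Hypothetical inventory; names, dates, and lifespans are illustrative only.
inventory = [
    Asset("ups-a", date(2010, 6, 1), 10),
    Asset("crac-2", date(2021, 3, 15), 15),
    Asset("rack-pdu-7", date(2014, 1, 10), 8),
]

flagged = past_lifespan(inventory, today=date(2024, 1, 1))
print(flagged)  # ['ups-a', 'rack-pdu-7']
```

A list like `flagged` is only a starting point for review; whether a unit is actually operating efficiently still depends on the power and maintenance data gathered in the other steps.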
Types of Technical Debt
Many factors can increase the amount of technical debt an organization faces in its day-to-day operations. Some technical debt is deliberate, such as cutting corners to make a deadline with the idea that the problem will be fixed later on, while other technical debt is accidental, such as systems aging and failing over time. Successful data centers refurbish old equipment regularly and invest in new technology to ensure sustainability and efficiency.
Server racks, power systems, cooling equipment, and other physical infrastructure become outdated and inefficient, driving up power and maintenance costs. Outdated equipment fails more often and requires repairs, increasing downtime. Systems that require complicated configurations, which are often applied as a quick fix as opposed to a system-wide overhaul, are more prone to manual error.
Older software and applications often have security vulnerabilities that leave data centers exposed to breaches and hacks. Logging is typically inadequate in legacy systems, so attackers can exploit flaws without being noticed. These systems are often kept in place because operators do not want to touch older (but workable) code, fearing they will break it.
Legacy software may not run with newer equipment. Software built in 2008 may not run on newer infrastructure, or the operating system may sunset support for the program altogether. Legacy software also may not automate some functions, requiring more manual labor, which can result in a higher rate of human error.
Documentation on systems may be outdated or poorly communicated to workers. If workers do not know how systems work or how errors were solved in the past, they will have trouble fixing future problems.
Real World Technical Debt Examples
Seventy-eight percent of data center managers believe downtime is preventable with better management, processes, or configuration. In 2019, Wells Fargo famously experienced an outage that left customers unable to access their bank accounts due to a preventable fire in a data center. Data center outages have become less frequent in recent years, but the cost of failure has gone up.
Uptime Institute recently studied data center outages and discovered that many failures are due to operators cutting short-term costs, which made systems more susceptible to catastrophic failure. For example, automatic transfer switch (ATS) components were replaced with customized switchgear. These custom parts have a greater number of controls, and maintenance technicians must ensure everything is synchronized. This high level of customization opens up systems to failure, documentation technical debt, and software technical debt.