Technical Debt and Modular Software Architecture

Some technologists argue that it should be possible to accommodate change by “innovation surfing,” by which they mean something like anticipating or coopting the disruption wrought by innovation.[i]

The Achilles heel of this idea is the presumption that an organization’s existing technology investments -- its technical debt -- do not (in specific cases or in the aggregate) constitute impediments to its freedom of movement. Think about it: A surfer minutely adjusts her movements as she intuits and responds to dynamically changing conditions (e.g., a wave’s height, peel angle and breaking intensity). An organization that is constrained by the inertia of its accrued technical debt does not have the ability to respond as deftly. Like a surfer with plantar fasciitis, the organization’s freedom of movement is restricted: It cannot make the necessary adjustments to negotiate the collapse of the fast-breaking wave.

Putting aside the ideal of “surfing the wave” of innovation, technical debt also limits an organization’s ability to navigate the effects of innovation -- or change of any kind, for that matter. For example, an organization may be unable to take full advantage of the transformative effects of a revolutionary technology paradigm -- say, observable software architecture -- because it is constrained by its investments in legacy software architectures that (by definition) are not observable. In this respect, then, the accrued cost of its technical debt determines what is practicable for the organization.

Technical Debt Explained

The acquisition of technical debt is inextricably bound with the use of technology. So, for example, an organization incurs technical debt whenever it invests in a technology. It increases its technical debt load whenever it customizes this technology to address a specific use case, solve a certain problem, or extend features and functions, as when it backports functions from a new version of a popular open-source package to an older version of that software. Or when it prioritizes certain tactical advantages (such as convenience) over strategic benefits (such as potential portability between and among cloud platforms). The cost of technical debt is not always a simple (or obvious) calculus, however.

This is no less true of the transition from on-premises to cloud infrastructure.

On the one hand, an organization incurs technical debt whenever it opts to host its own data in its own IT infrastructure. The cost of this technical debt is nominally higher than that of keeping -- or putting -- this data in(to) a cloud context. Servers, storage, networking kit, and, of course, software cost money to purchase, license, lease and/or maintain. The organization must build, rent or lease data center facilities to house, power and cool its IT infrastructure. Not least, it must employ (hire, retain, and, if necessary, recruit) skilled IT technicians to support, maintain and enhance this infrastructure. Most organizations are very well versed in the vagaries of this cost model.

On the other hand, an organization incurs technical debt of a different kind whenever it cedes effective ownership of its data -- i.e., its own intellectual property -- to a cloud services provider. Over time, software and data architectures (along with the software engineering and data management practices tasked with supporting them) tend to evolve to conform to this paradigm. In the same way, the organization’s value-creating practices -- e.g., software engineering that focuses on developing new products and services; business analysis, data science, machine learning (ML) and artificial intelligence (AI) engineering; etc. -- will evolve to conform to this paradigm, too.

What is the cost of designing (or evolving) an architecture to depend on a provider-centric data access model? Of building specialized practices that are (to some extent) focused on or clustered around this model? Of accommodating the constraints and vicissitudes associated with this model, such as the access restrictions (or costs) imposed by API rate limits, per-API charges and/or outbound data-transfer charges? Especially when, in most cases, organizations must also maintain “legacy” on-premises infrastructure and assets because some workloads cannot be moved to the cloud?

How Technical Debt Accrues: A SaaS-specific Case Study

An organization incurs technical debt whenever it cedes its rights and perquisites as a customer to a cloud service provider. To get a feel for how this works in practice, consider the case of a hypothetical SaaS cloud subscriber. The subscriber incurs technical debt when it customizes the software or redesigns its core IT and business processes to take advantage of features or functions that are specific to the cloud provider’s platform (for example, Salesforce’s, Marketo’s, Oracle’s, etc.).

This is fine for people who work in sales and marketing, as well as for analysts who focus on sales and marketing. But what about everybody else? Can the organization make its SaaS data available to high-level decision-makers and to the other interested consumers dispersed across its core business function areas? Can it contextualize this SaaS data with data generated across its core function areas? Is the organization taking steps to preserve historical SaaS data (e.g., capturing online transaction processing data before it gets overwritten by new data)?

In short: What is the opportunity cost of the SaaS model and its convenience? What steps must the organization take to offset this opportunity cost? What new technical debt must it take on?

This is a surprisingly complex calculus. In such cases, SaaS subscribers tend to incur a kind of second-order technical debt. It is not unprecedented, after all, for an interested consumer to resort to heroic measures to get the data she (thinks she) needs. Some get data directly from SaaS apps -- e.g., by using self-service tools to extract data via SaaS API endpoints and then sharing it with other interested consumers in different formats, such as CSV extracts or Tableau workbooks. Exceptionally enterprising users might even create on- or off-premises data marts for SaaS data. Like the barnacles that attach themselves to the hull of a newly coppered ship, different kinds of technical debt -- in this case, tools and processes that must be supported -- attach themselves to an organization over time.

What can an organization do to mitigate the cost of this second-order technical debt?

The SaaS use case I described above illustrates that an organization’s overriding priority should be to own and control its cloud data. To own and control one’s cloud data is to own and control one’s destiny.

This priority extends to any proprietary SaaS and PaaS cloud offering. At a high level, then, an organization must develop a plan to own and control its cloud data independent of the services from which this data originates. For some organizations, this might entail replicating and synchronizing data from the SaaS and PaaS cloud to the on-premises environment. For many organizations, it will likely involve moving data from the SaaS and PaaS clouds to a cloud object storage service, such as AWS S3.
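
One way to picture what "owning" this data might look like in practice is a write-once snapshot store. The sketch below is only an illustration under stated assumptions: a local directory stands in for an object storage bucket, and the function names (snapshot_records, load_history) and the JSON Lines layout are hypothetical choices, not any provider's API. The point it makes is narrow: each extract is written to a new timestamped object, so earlier states of the SaaS data survive even after the source application overwrites them.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def snapshot_records(records, root, source, taken_at=None):
    """Write one immutable, timestamped snapshot of `records` under `root`.

    `root` is a local directory standing in for an object store bucket
    (an assumption made for this sketch). Each call creates a new file,
    so earlier snapshots are never overwritten.
    """
    taken_at = taken_at or datetime.now(timezone.utc)
    stamp = taken_at.strftime("%Y%m%dT%H%M%S%fZ")
    target = Path(root) / source / f"{stamp}.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError if the file exists: write-once.
    with target.open("x", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, sort_keys=True) + "\n")
    return target


def load_history(root, source):
    """Return every snapshot for `source` as a list of record lists, oldest first."""
    history = []
    # Timestamped file names sort lexicographically in chronological order.
    for path in sorted((Path(root) / source).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            history.append([json.loads(line) for line in handle])
    return history
```

Because the layout depends only on files and timestamps, the same idea should carry over to whichever storage service the organization actually chooses.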

It is important to stress, first, that any such plan will itself produce new technical debt, and, second, that the use of cloud technologies entails not-so-obvious cost, complexity, and/or data access issues.

The Challenge of Data Access in the Cloud

The problem with the cloud is that a phalanx of technological, practical and economic barriers functions to discourage customers from directly accessing data via SaaS and PaaS apps. The primary reason for this has to do with logistics: Cloud providers must support hundreds, thousands, even tens of thousands of subscribers, which (in practice) means they must support hundreds of millions or even billions of API call requests per second. To ensure the availability and reliability of their cloud services, then, providers limit the rate at which subscribers can send requests to their API endpoints.

These API rate limits constitute technical and practical barriers to data access and movement. For one thing, providers usually limit the number of API requests a subscribing organization (or its individual users, IP addresses, projects, etc.) can make during a fixed period. Some providers permit a fixed number of API calls (say, 1,200) during a finite duration (say, 120 seconds). If or when a subscriber exceeds this limit, the provider stops servicing requests until the period elapses. Alternatively, the provider may continue to accept API calls but will charge the subscriber a fixed amount for overages.
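
To make the 1,200-calls-per-120-seconds example concrete, here is a minimal client-side sketch of a fixed-window limiter. It assumes the provider counts calls in fixed, non-overlapping windows; real providers may use sliding windows, token buckets or other schemes, so the class below (FixedWindowLimiter, a hypothetical name) is an illustration rather than a description of any particular service.

```python
import time


class FixedWindowLimiter:
    """Client-side guard for a quota of `max_calls` per `window_seconds`."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.window_start = clock()
        self.calls = 0

    def _roll_window(self):
        """Start a fresh window if the current one has expired."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.calls = 0
        return now

    def try_acquire(self):
        """Count the call and return True if quota remains, else return False."""
        self._roll_window()
        if self.calls < self.max_calls:
            self.calls += 1
            return True
        return False

    def seconds_until_reset(self):
        """How long a caller should wait before the quota is restored."""
        now = self._roll_window()
        return max(0.0, self.window_start + self.window_seconds - now)
```

A caller would check try_acquire() before each request and, on False, sleep for seconds_until_reset() rather than sending a request the provider would reject or bill as an overage.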

To cite another example, some providers support a maximum number of API requests each month -- for example, no more than 25 million. One problem is that API rate-limit policies vary from cloud provider to cloud provider. As a result, organizations cannot build software -- or define and enforce policies -- in accordance with least-common-denominator criteria. Since APIs are the sole means of accessing cloud services -- let alone requesting data from cloud services -- it is critical that organizations understand the behavior of their human- and machine-controlled apps and services, as well as the requirements of the business processes, use cases or workloads that this software supports.
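
Because policies are expressed in different units, it can help to normalize them before comparing them with what a workload actually needs. The sketch below uses the two illustrative figures from the text: 1,200 calls per 120 seconds works out to 10 calls per second, while 25 million calls spread over a 30-day month (the 30-day month is my assumption) works out to roughly 9.65 calls per second. The names RateLimitPolicy and policies_below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitPolicy:
    """One provider-imposed quota: `max_calls` per `window_seconds`."""

    name: str
    max_calls: int
    window_seconds: float

    @property
    def sustained_rate(self):
        """Average calls per second the policy allows over its window."""
        return self.max_calls / self.window_seconds


def policies_below(policies, required_rate):
    """Return the policies whose sustained rate cannot cover `required_rate`."""
    return [p for p in policies if p.sustained_rate < required_rate]


# Illustrative figures taken from the surrounding text.
burst = RateLimitPolicy("burst", 1_200, 120)
monthly = RateLimitPolicy("monthly", 25_000_000, 30 * 24 * 3600)
```

A workload that needs a steady 9.8 calls per second would fit inside the short-window limit yet exhaust the monthly allowance before the month ends, which is the kind of mismatch that only shows up once the organization knows how its own software behaves.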

Understanding is just the beginning, however. More basically, organizations must develop a sustainable model for preserving and managing historical cloud data. Like it or not, this entails rethinking and, in effect, reimagining, data access at an architectural level: The organization must design a sustainable architecture -- that is, a manageable, governable and, above all, scalable architecture -- for accessing data in the cloud.

The Diffusion of Novel Technologies Is Usually an Insurgent Phenomenon

Another problem is that the pattern of new technology uptake and adoption is neither rational nor sustainable. An organization’s uptake of disruptive technology can usually be traced as a movement from the particular to the universal. In the beginning, the technology serves one or more specific purposes and addresses the specific needs of one or more internal constituencies. Over time, the technology not only gets taken up by new constituencies inside the organization, but it gets used for different (sometimes novel) purposes. As it diffuses up, down and across an organization, the technology becomes critical to the maintenance and growth of the organization’s business operations.

Broadly speaking, then, technology uptake entails a movement from the tactical to the strategic: As a technology becomes integral to the organization and its operations, it eo ipso becomes strategic. As a result, the organization starts to think about how the technology fits, gels and can be reconciled with its existing IT systems, applications, services and processes. It starts looking at how the technology gets used in specific IT and business processes. In the case of radically disruptive technologies, the organization eventually looks at how best to adapt itself -- i.e., its IT assets and IT/business processes -- to maximize the benefits of the technology. Lastly, to ensure the continuity of its business operations, the organization starts thinking about how to manage and govern its use of the technology.

The result is that the once-disruptive technology gets assimilated into, and rationalized as part of, a larger whole. Every once-disruptive technology model -- be it open-source software, mobile computing, hardware/software virtualization, or cloud itself -- has conformed to this process of assimilation.[ii]

Assimilating and Making Sense of Novel Technologies

Believe it or not, software architecture can help simplify this process of assimilation and rationalization.

For example, recent trends in software architecture emphasize the design of loosely coupled software programs that perform a definite set of functions (e.g., SQL query functions, data profiling functions, data validation functions, functions to convert a data payload encoded in one character set [ASCII] to another [Unicode]) and which are instantiated as services. These services expose different types of endpoint interfaces that can be invoked by human and machine consumers alike.

Architects design workflows and dataflows that knit together and orchestrate operations between distributed services, constructing larger composite applications. Taken to its extreme, this is the essential logic of microservices architecture. However, the same logic underpins the use case- or function-specific SaaS products that Amazon (AWS), Microsoft (Azure), and Google (Google Cloud Platform) expose via their cloud stacks. If these services are generic enough -- that is to say, abstract enough -- an architect could replace an AWS service with an equivalent service in Google Cloud Platform.

The services that comprise these stacks are designed as modular components. In other words, each service addresses a specific use case or provides a set of specific -- and, moreover, related -- functions.

Cloud Services, Abstraction and Modularity

To cite a concrete example, a SaaS extract, transform, and load (ETL) service such as AWS Glue provides a (more or less) comprehensive set of the functions associated with accessing, engineering and moving data. These functions include data profiling, data validation and data transformation capabilities. In the same way, a custom data quality service built on top of Deequ could provide data consistency, matching, deduplication and verification functions. More important, both services could be used and, optionally, orchestrated in tandem with one another. Better still, use of the one service would in a way “lead to” the other. The services complement and, in an essential sense, belong to one another: As an organization (or, rather, its expert users and consumers) matures in its use of one service, it identifies a need for one or more other complementary services. Lacking a SaaS data quality service in AWS, Azure or Google Cloud Platform, it uses an open-source technology (Deequ) to design its own. Think of this as a natural dependency, an essential complementarity, rather than a forced or manipulated dependency.
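
The complementarity described above can be sketched in a few lines. To be clear about assumptions: the code below does not use AWS Glue or Deequ, and none of its function names come from either product. It uses plain Python stand-ins for one transformation step and two data quality functions (deduplication and a completeness metric), with a small orchestrator that chains independent steps.

```python
def normalize(rows):
    """Transformation step: trim and lowercase the email field."""
    return [
        {**row, "email": (row.get("email") or "").strip().lower()}
        for row in rows
    ]


def deduplicate(rows, key):
    """Data quality step: keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def completeness(rows, field):
    """Data quality metric: share of rows with a non-empty `field`."""
    if not rows:
        return 1.0
    return sum(1 for row in rows if row.get(field)) / len(rows)


def run_pipeline(rows, steps):
    """Orchestrate independent steps by feeding each output to the next step."""
    for step in steps:
        rows = step(rows)
    return rows
```

Each step knows nothing about the others, which is why a managed service could replace any one of them without the rest of the pipeline changing.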

In theory, modular abstraction of this type could also make it possible for subscribers to switch out one SaaS provider for another with minimal disruption to their IT and business processes. This logic has affinities with Clayton Christensen’s modularity theory, albeit stripped down to its core idea -- namely, that modularity is a special case of abstraction. In the first place, modular design is a means of abstracting a larger design by decomposing it into complementary (i.e., modular) units. So, for example, a conventional car is a modular design in that it consists of a load-bearing structure (a chassis); a machine that converts the energy released by combustion into mechanical work (an engine); a machine that keeps the engine running in its most efficient RPM range (a transmission); a mechanism that transfers the controlled power output of the transmission to two or more driven wheels (a differential or transaxle); and so on.

In the second place, replaceability is a special property of modularity. To change analogies, a web server is a modular abstraction in that it performs numerous generalized tasks and provides a common set of features and functions. It makes use of modular design in that most of its features and functions are implemented via discrete modules. So, for example, the Apache web server ships with modules for redirection (mod_rewrite), compression (mod_deflate) and caching (mod_cache), among others, and can load third-party modules such as mod_security for security. Modular design makes it easier for Apache’s developers to maintain and enhance its sizable web server codebase. The modularity of the web server itself -- the fact that it provides a standard set of features and functions and is designed to support certain standard use cases -- makes it possible for an organization to replace one web server (say, Apache) with another web server (Nginx), assuming its other requirements are met.

Modularity Reconsidered

The useful idea is that both kinds of modularity are beneficial -- in this case, to cloud providers and cloud subscribers alike.

From the cloud provider’s perspective, modular SaaS services are easier to design, scale and maintain. A SaaS stack that provides discrete data profiling, data cleansing, data quality and data transformation capabilities is ipso facto easier to scale and maintain than a monolithic stack that incorporates data profiling, data cleansing, data quality and data transformation capabilities into a single SaaS application. The empirical tendency is for monolithic applications to accrue extra features and functions, some of which are essential but many of which are not. Like free-floating barnacles that attach themselves to the hull of a ship, these incidental features and functions degrade the application’s performance, compromise its reliability and complicate its maintenance.

From a subscriber’s perspective, modular SaaS products have at least two benefits. First, they permit a degree of abstraction, such that a subscriber’s IT infrastructure (not only its IT systems and processes but also its data and software architectures) is not tightly coupled to a specific SaaS provider’s stack. Put differently, it is easier -- or more practicable -- to replace modular SaaS services.

Second, modularity enforces a kind of design discipline with respect to an organization’s own software development efforts. Architects and developers focus on designing, maintaining and enhancing value-added bits -- i.e., use cases and sets of functions that are not easily generalizable or which are specific to the organization itself, its business processes and workflows, or the verticals in which it competes. If a provider changes the terms of its services or if something about the market changes (e.g., a competitive SaaS provider offers services that better suit the organization’s needs), architects and developers can refactor apps/services to use equivalent services from other providers if necessary.
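
A small sketch may help show why this kind of refactoring stays manageable. The idea, under the assumptions stated here, is that value-added code depends on a generic capability (an interface) rather than on a specific provider's client. Everything named below is hypothetical: ObjectStore is an interface I made up for illustration, and the two store classes are in-memory stand-ins for provider-specific adapters, not real cloud SDK calls.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """The generic capability the application depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for an adapter that wraps one provider's storage service."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class PrefixedStore:
    """Stand-in for a second provider with different internal conventions."""

    def __init__(self, bucket):
        self._bucket = bucket
        self._blobs = {}

    def put(self, key, data):
        self._blobs[f"{self._bucket}/{key}"] = data

    def get(self, key):
        return self._blobs[f"{self._bucket}/{key}"]


def archive_report(store: ObjectStore, report_id: str, body: str) -> str:
    """Value-added logic: it knows only the ObjectStore interface."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

Swapping providers then means writing one new adapter; archive_report and the rest of the organization-specific code stay untouched.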

Conclusion: The Benefits of Modular Software Architecture

The upshot is that the organization that architects its software stack to accord with this special sense of modularity (i.e., abstraction at the level of generalized use cases or sets of functions) enjoys at least a degree of insulation against the effects of change. If AWS alters the licensing terms for one or more of its services, this organization could refactor its application workflows, data engineering pipelines and other software assets to exploit equivalent services from one or more other cloud providers. Alternatively, the organization could opt to build, deploy and maintain its own services. This would not necessarily be easy. It would, however, be manageable (i.e., doable, cost-effective). Today, the opposite is all too often the rule rather than the exception.

To sum up, deploying available SaaS and PaaS products to support generic use cases or to provide a useful set of generalizable functions provides a hedge against not just avoidable technical debt but also dependency on a specific cloud services provider. (Dependency is also, in its way, a kind of technical debt.)

It likewise permits an organization to focus on value-creating (i.e., unique, use case-specific) software engineering. Outside of internet companies, few organizations can afford to service a large amount of technical debt; rather, they try to take on as little technical debt as possible. In this respect, a modular application and data architecture comprises a sustainable foundation for designing, maintaining and scaling essential services; introducing new ones; exploiting the availability of new generic SaaS/PaaS services; and otherwise accommodating different kinds of economic, technological, etc., change.

A modular software architecture that makes use of generic SaaS/PaaS cloud products when and where available does not necessarily sacrifice performance or functionality. Instead, it affords an organization the freedom to focus on designing, maintaining and enhancing value-added software that improves its day-to-day business operations, or which is integral to the development and delivery of new products and services. At the risk of belaboring the point, this architecture should also prove to be more adaptive than one that is tightly coupled to a specific cloud stack or software implementation.

[i] The exemplar discussion of innovation surfing is Joseph Bower and Clayton Christensen’s seminal 1995 article in the Harvard Business Review, “Disruptive Technologies: Catching the Wave.”

[ii] This should not be a controversial observation. In the first place, it is consonant with Simon Wardley’s well-known Pioneers, Settlers and Town Planners model. In the second place, it is likewise consonant with a model that economist Eric von Hippel has explored in his own work. As von Hippel argued in The Sources of Innovation (Oxford University Press, 1988), human analysis tends to invert causality in assessing the forces that spur and drive innovation. The upshot, von Hippel argued, is that expert users, not producers, are the ur-engines of innovation (p. 13). He cited the case of U.S. semiconductor manufacturing, noting that “U.S. [semiconductor manufacturing] equipment builders are falling behind because the U.S. user community they deal with is falling behind. If this is so, the policy prescription should change: Perhaps U.S. equipment builders can best be helped by helping U.S. equipment users to innovate at the leading edge once more” (p. 10).

The main thrust of von Hippel’s work is orthogonal to this discussion, but his insight that innovation is an insurgent phenomenon -- that is, something that arises out of use -- has special salience. It is not just that consumers drive innovation; it is, rather, that the people at the forefront of use are, in a sense, cartographers of innovation. They’re the expert users, close to the problem domain, who explore and map needs, gaps, impasses, pain points, etc. In this respect, they apply extant technologies in quite novel ways, acting as trailblazers for producers, who take their discoveries and generalize them, make them manufacturable, etc., engineering new products and services that address emergent needs. (A textbook case of this is the early data visualization tools market, which -- in the space of about five years -- morphed into a full-fledged market for self-service business intelligence discovery products.)

It sometimes happens -- as with technology start-ups -- that an expert user and producer are one and the same. It also happens -- as with the open-source model -- that an expert user becomes a producer, or at least has a role in production. But the salient point is that rather than being dictated by top-down policy prescriptions, “strategic appraisals,” etc., the growth and uptake of disruptive technology is usually driven by the needs of expert users who are close to a problem domain. That is, uptake occurs in spite of (and sometimes in contravention of) top-down policies.
