It's Not Just Infrastructure; It's Data Infrastructure

Choosing the right data infrastructure can be a daunting task, so more organizations are turning to cloud partners.

At some point, we started referring to data in the broader context of "big data." But now we just call it "data" again. We're generating data at an explosive rate. In fact, some predict that the 44 zettabytes (yes, zetta) that we had worldwide in 2020 will grow to 175 zettabytes by 2025. For context, that's 175 trillion gigabytes. That's pretty close to infinity to the infinity power.
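
If you want to check that conversion yourself, here is a minimal sketch. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function name is made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary units differ).
GIGABYTE = 10**9
ZETTABYTE = 10**21

def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert zettabytes to gigabytes using powers-of-ten units."""
    return zb * ZETTABYTE // GIGABYTE

gb_2025 = zettabytes_to_gigabytes(175)
growth = 175 / 44  # predicted 2025 volume relative to 2020

print(f"{gb_2025:,} GB")                       # 175 followed by 12 zeros: 175 trillion
print(f"growth 2020 -> 2025: about {growth:.1f}x")
```

In other words, the prediction amounts to roughly a fourfold increase in five years.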

This is a clear indicator that capturing, processing, and storing data are more critical than ever before. As businesses pivot towards being data-driven and as we increase the presence of BI, AI, and ML in the world, that criticality is only amplified. When you consider the plethora of data storage options today, there's really no reason to stay off the bandwagon. Hop on!

But with a dramatically increased need for data coupled with that plethora of options, how do you decide the best combination to store your data? There are three superpowers that form like Voltron (OK, three-fifths of Voltron) to define your overall data landscape:

  • Data infrastructure defines the plumbing for how it will all work
  • Data pipelines are used to move data around (e.g., ingestion, ETL, sharing)
  • And data management governs how you create, store, secure, and access data

What Is Data Infrastructure?

Your standard IT infrastructure consists of things such as computers, networks, and attached devices, which can be physical, virtual, or both. Data infrastructure is the part of that landscape dedicated to storing and serving data, and it comes with its own problems. If data is cumbersome to access or consume, use will surely wane. If data is expensive to store or retrieve without providing equal or greater value, it's no longer economical. These are the problems that a modern data infrastructure addresses. Storing #allthedata is one thing, but storing it optimally is where the real cheese is.

You can quickly become overwhelmed with the options available. MySQL, PostgreSQL, CouchDB, MariaDB, CockroachDB... are you serious? YES! Fortunately for everyone (yes, I'm looking at you, DBA team), there's a healthy mix of hosting models to support this cornucopia of databases.

You can manage your own database, for example by installing Microsoft SQL Server on virtual instances. You can use a managed database, where the cloud provider handles the maintenance while you provision and use it. And the cherry on the cake here is the serverless database. In that case, you are only concerned with putting data in and getting data out. You don't have to right-size instances or worry about how or when to scale. The system manages that for you in the background.

On a final infra-related note, as we see more event-driven applications and service-oriented architectures, we should celebrate how free we are to use one or many databases to support our applications. We now have the luxury of combining the power of relational databases with NoSQL databases right alongside object data stores. Together, these purpose-built database solutions can create a scalable, resilient, and economical solution that sets our applications up for success.
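
To make the "one application, several purpose-built stores" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: an in-memory SQLite database plays the relational role, and plain Python dictionaries stand in for a NoSQL document store and an object store. A real deployment would use dedicated services, and the function and key names are hypothetical.

```python
import json
import sqlite3

# Relational store: structured orders that benefit from SQL queries.
relational = sqlite3.connect(":memory:")
relational.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)

# Stand-ins (assumptions): dicts play the part of a document store and an object store.
document_store = {}
object_store = {}

def place_order(order_id, customer, total, prefs, receipt):
    """Write each piece of data to the store that suits its shape."""
    with relational:  # commits on success, rolls back on error
        relational.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, total)
        )
    document_store[customer] = prefs                      # flexible, schema-free profile
    object_store[f"receipts/{order_id}.pdf"] = receipt    # opaque binary blob

place_order(1, "ada", 42.50, {"theme": "dark"}, b"%PDF-placeholder")
place_order(2, "ada", 7.25, {"theme": "dark", "newsletter": True}, b"%PDF-placeholder")

total_spent = relational.execute(
    "SELECT SUM(total) FROM orders WHERE customer = ?", ("ada",)
).fetchone()[0]

print(total_spent)
print(json.dumps(document_store["ada"]))
print(sorted(object_store))
```

The point is not the specific tools but the split: aggregate queries go to the relational store, loosely structured profiles go to the document store, and large binaries go to the object store.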

What Are Data Pipelines?

Data pipelines are defined workflows that help massage and ship data around. Pipelines can be batch-driven, micro-batch-driven, or streaming. Those pipelines can also help transform your data and add value to it as it flows along the overall process. Finally, data pipelines can enable you to share your data with downstream consumers at various stages in the overall data lifecycle.

For example, real-time use cases can pull data from far upstream in a pipeline, but at the expense of working with raw or uncurated data. Likewise, batch use cases can grab data further down the pipeline, where more of its sharp edges have been filed off but it also arrives with more latency.
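
Here is a rough sketch of that trade-off, assuming a toy two-stage pipeline built from Python generators. The stage names (`ingest`, `curate`, `micro_batches`) and the sample records are invented for illustration and do not come from any particular pipeline product.

```python
from typing import Iterable, Iterator

def ingest(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: parse raw text into records, keeping values exactly as received."""
    for line in lines:
        name, amount = line.split(",")
        yield {"name": name, "amount": amount}

def curate(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: clean values and drop records that cannot be repaired."""
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip records with an unparseable amount
        yield {"name": record["name"].strip().lower(), "amount": amount}

def micro_batches(records: Iterable[dict], size: int) -> Iterator[list]:
    """Group a stream into small batches for consumers that prefer them."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller batch
        yield batch

raw_lines = [" Ada ,10.5", "Bob,oops", "CY,3"]

raw_records = list(ingest(raw_lines))           # upstream tap: fresh but messy
curated = list(curate(raw_records))             # downstream tap: cleaner, later
batches = list(micro_batches(curated, size=2))  # micro-batch delivery

print(raw_records)
print(curated)
print(batches)
```

A real-time consumer reading `raw_records` sees all three records, including the bad one, while a batch consumer reading `curated` sees only the two that survived cleaning.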

What Is Data Management?

Data management is a nebulous topic in and of itself, so we'll settle for a simple overview for the purposes of this blog. Data management will help you ask and answer several key questions.

  • Where do you need your data? On-premises? Private cloud? Public cloud? A mixture?
  • Inside of these locations, which database technology(ies) will you use? Relational? NoSQL? In-Memory?
  • How will you govern standard CRUD (create, read, update, delete) operations on the data as it becomes fragmented across locations and databases?
  • What is the cost of your data being down, and does your HA/DR plan show the right investment to mitigate this risk?
  • How will you secure and appropriately audit the access of your sensitive data?
  • What are the right retention policies to ensure that you are keeping the right amount of data in the right spot?
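
As one small illustration of the retention question, here is a sketch of an age-based policy. The tier names and the 30-day and 365-day thresholds are made-up assumptions, not recommendations; your own policy would come from your business and compliance requirements.

```python
from datetime import date, timedelta

# Hypothetical thresholds, chosen only for illustration.
HOT_DAYS = 30
ARCHIVE_DAYS = 365

def retention_action(created: date, today: date) -> str:
    """Decide where a record belongs based on its age in days."""
    age = (today - created).days
    if age <= HOT_DAYS:
        return "keep-hot"   # recent data stays in the fast, expensive store
    if age <= ARCHIVE_DAYS:
        return "archive"    # older data moves to cheaper storage
    return "delete"         # data past the retention window is removed

today = date(2022, 6, 1)
for days_old in (5, 90, 400):
    created = today - timedelta(days=days_old)
    print(days_old, retention_action(created, today))
```

Writing the policy down this explicitly, even as a toy, forces the conversation about which data is worth paying for and for how long.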

The DIY or Have-It-Built-for-You Dilemma

With all of that considered, choosing the right data infrastructure can be a daunting task. While absolutely worth the investment, defining this landscape and deploying the options can be cumbersome for companies to take on by themselves.

There has been a major shift in the IT industry over the past few years where more and more companies are pivoting their focus to their value proposition and shying away from running massive IT departments. This is what I like to describe as "focusing on IP rather than IT." All the cool companies are doing it. If you can have your IT team focus on creating value, and have a cloud partner take over the mundane tasks such as OS updates, database patching, etc., why wouldn't you? When you also consider how tight the IT labor market has become in 2022, it just makes so much sense.

This isn't black and white, however. A good cloud services partner will be able to supplement your IT team where and how you need it. Some companies need to share their problem statement and have a partner run with the solution space. Some companies want to be more actively partnered and merge teams to co-develop. And some companies need a partner to educate them, set the foundation, and help them along their way.

In any of these scenarios, a worthy partner will bring abundant experience to help you solve problems faster and avoid common pitfalls. The most valuable advice I've ever received in technology is usually along the lines of "Hey, I tried that before and it was painful! Here's a slightly different approach that worked well for me. Let me help you."
