It’s nearly impossible to read an article about application development these days without running into the concept of containers – portable, standalone, software-based packages that contain everything an application requires.
That’s especially true of cloud-native environments, which are fast becoming the primary mode of application development. Gartner predicts that by 2025, more than 95% of new digital workloads will be deployed on cloud-native platforms, up from 30% in 2021.
Cloud-native application development works best with a different type of storage: storage developed specifically for containers.
What Is Container-based Storage?
Container-based storage caters to distributed databases and applications. It packages everything a storage environment needs inside a container, without external dependencies. That makes the storage infrastructure agnostic and easy to move between environments.
Most importantly, container-based storage is dynamically provisioned, application aware, and agile. As such, it is very useful for data centers and edge infrastructures, which require flexible workloads.
“Traditionally, storage volumes must be created ahead of time by a storage administrator, but in a cloud-native environment, it’s all about self-service on demand,” explained Goutham Rao, chief technologist for Portworx by Pure Storage.
In addition, traditional storage technologies can’t scale to the level container-based workloads require. With containers, a single server can run several hundred containers, and a typical environment can have tens of thousands of containers and volumes – far more than most environments’ back-end storage can handle.
That’s why so many container-forward organizations are incorporating storage that is managed directly by the container orchestrator – in most cases, Kubernetes. To make it work, the industry developed the Container Storage Interface (CSI), which manages storage allocation and provisioning. CSI essentially creates a communication path between the container and the underlying storage subsystem of the cloud-native storage layer. CSI also specifies the storage class, which carries all the required properties, such as shared volumes, object storage, or NVMe. The result is a persistent volume, which ties a specific storage volume to a specific container, no matter where the container runs in a cluster.
With the ability to wrap storage classes, create persistent volumes, and include policy around everything, it’s also much easier to optimize storage for the load in ephemeral environments.
“Kubernetes automatically creates persistent volume claims based on your desired profile, policy, or desired state,” said Michael Cade, a senior global technologist at Veeam. “It can then use that methodology to choose the best storage for your application or data.”
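As a rough sketch of what that self-service flow looks like, here are a storage class and a persistent volume claim expressed as plain Python dicts mirroring the Kubernetes manifests an administrator and a developer would write. The class name, CSI driver, and parameters are illustrative assumptions, not from any specific vendor:

```python
# Hypothetical StorageClass: defined once by an administrator, it tells
# Kubernetes which CSI driver to call and with what properties.
storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast-nvme"},        # illustrative name
    "provisioner": "csi.example.com",         # assumed CSI driver
    "parameters": {"type": "nvme", "replication": "3"},
    "reclaimPolicy": "Delete",
    "volumeBindingMode": "WaitForFirstConsumer",
}

# A developer's claim requests 10 GiB from that class; Kubernetes
# dynamically provisions a matching PersistentVolume on demand --
# no ticket to a storage administrator required.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": storage_class["metadata"]["name"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

print(pvc["spec"]["storageClassName"])  # fast-nvme
```

The split of responsibilities is the point: the class encodes policy once, and every subsequent claim against it is self-service.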
Take the example of developers working with Cassandra, a popular database used in cloud-native computing. A simple deployment of Cassandra can easily involve six or more different Cassandra instances, each running on a different machine. Putting all the data associated with those six containers on the same disk would create an undesirable application hotspot. This is an ideal use case for container-based storage.
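A minimal sketch of how such a deployment avoids the hotspot: a Kubernetes StatefulSet with a volume claim template gives each Cassandra replica its own independently provisioned volume. The manifest is built as a Python dict here for inspection; the image tag, storage class, and sizes are assumptions:

```python
# Sketch of a StatefulSet whose volumeClaimTemplates give each of the
# six Cassandra replicas a separate, dynamically provisioned volume
# instead of piling all six onto one shared disk.
cassandra = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "cassandra"},
    "spec": {
        "serviceName": "cassandra",
        "replicas": 6,
        "selector": {"matchLabels": {"app": "cassandra"}},
        "template": {
            "metadata": {"labels": {"app": "cassandra"}},
            "spec": {
                "containers": [{
                    "name": "cassandra",
                    "image": "cassandra:4.1",   # illustrative tag
                    "volumeMounts": [{"name": "data",
                                      "mountPath": "/var/lib/cassandra"}],
                }],
            },
        },
        # One claim per replica, each bound to its own PersistentVolume.
        "volumeClaimTemplates": [{
            "metadata": {"name": "data"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": "fast-nvme",   # illustrative class
                "resources": {"requests": {"storage": "100Gi"}},
            },
        }],
    },
}

# Kubernetes names each pod's claim <template>-<statefulset>-<ordinal>:
claims = [f"data-cassandra-{i}" for i in range(cassandra["spec"]["replicas"])]
print(claims[0], claims[-1])  # data-cassandra-0 data-cassandra-5
```

Because each claim is independent, the scheduler and the storage layer are free to spread the six volumes across different disks and nodes.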
Tips for Choosing Container-based Storage
When considering container-based storage, the first decision is whether to opt for an open-source or commercial container storage offering. And just like any other type of software, it’s important to think about the trade-offs. In general, those trade-offs are between cost, supportability, and vendor lock-in. But either way, make sure to spend enough time on the initial architecture.
Without a full understanding of storage, for example, developers are likely to simply create new Kubernetes clusters, which can create cost and shadow IT problems. It’s critical that, from an operations point of view, you take the time to understand the capabilities you are going to sign up for before spinning up new storage instances.
It's also important to adopt a “security first” mindset. While containers may have security strengths, they are not inherently secure.
“With Kubernetes-managed storage, there are often persistent volumes that can be associated [with] or attached to a pod – a compute resource or unit of Kubernetes,” explained Gary Ogasawara, CTO of storage vendor Cloudian. “Even if you delete the pod, the persistent volume underneath isn’t deleted, so while a user might think their underlying storage is deleted, it isn’t. That’s an obvious security problem that almost all Kubernetes users run into at one time or another.”
Security issues often occur because people are in a hurry to get things up and running, which can lead to misconfigurations in container storage. Misconfigurations can expose the container’s system credentials to a potential attack, as well as any sensitive personal information stored in the container. To mitigate the issue, insist on procedures that require deletion of persistent volumes along with their pods.
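A simple way to enforce that procedure is an audit that flags claims no pod mounts anymore. The sketch below hard-codes stand-in data where a real script would query the Kubernetes API; all names are hypothetical:

```python
# Sketch of an orphaned-storage audit: find PersistentVolumeClaims that
# no running pod mounts. In a real cluster these lists would come from
# the Kubernetes API; here they are hard-coded stand-ins.
pods = [
    {"name": "web-0", "claims": ["data-web-0"]},
]
pvcs = ["data-web-0", "data-web-1"]  # data-web-1's pod was deleted

in_use = {claim for pod in pods for claim in pod["claims"]}
orphaned = [claim for claim in pvcs if claim not in in_use]

# data-web-1 still exists -- and still holds data -- after its pod is gone.
print(orphaned)  # ['data-web-1']
```

Running a check like this on a schedule catches exactly the scenario Ogasawara describes: a pod is deleted, its volume quietly lives on.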
Container-based storage has many benefits, but it can be complicated, at least at first. That’s why it can make sense to take advantage of managed Kubernetes as a service. Offered by major cloud providers and some specialized vendors, the idea is to provide easy-to-use resources to help organizations create, update, debug, resize, and better use container clusters. The Google Kubernetes Engine, for example, enables users to run as many pods as necessary at a time, along with the ability to attach multiple nodes to a cluster, isolate containers in sandbox environments, control configurations, and build applications with attached persistent storage.
But what about all the existing storage technology you already have? Is it worthless once you move more fully into the container world?
Not at all, Rao said. In fact, it’s usually possible to integrate existing software by using a software-defined storage overlay. The overlay can consume the existing storage technologies and present them in a cloud-native way. Essentially, it’s a virtualization layer that adds intelligence and glue to Kubernetes and presents virtual volumes in such a way that it has all the desired properties: dynamic provisioning, programmatic assignment of volumes, density, and availability.
Full Steam Ahead
Even organizations behind the curve on IT modernization and digitization will probably get on the container bandwagon at some point, and container-based storage will probably come along for the ride. Ogasawara said he has seen it over and over again in recent years: corporate-wide initiatives from the CIO’s office requiring that all new environments must be containerized, even environments at the edge. When that happens, the first question is how to best manage the storage layer.
All of these factors make one thing clear: If your organization is moving heavily into containers, it’s probably time to consider container-based storage. It’s an effective way to increase the agility of the infrastructure overall.
“Storage is frequently the lowest common denominator in a data center, and you don’t want to bring Kubernetes down to the lowest common denominator,” Rao said. “If you’re still manually provisioning storage, for example, everything else is hampered by that slow-moving part.”
While moving to container-based storage is a practical change, it also means changing your mindset.
“You have to understand that the industry is moving toward containers to give developers more agility, and that it means you’re turning over the keys to the developers,” he added. “You can’t be stuck in the mindset that everything needs to go through a ticketing system and has to be provisioned by a data center administrator.”
About the author
Karen D. Schwartz is a technology and business writer with more than 20 years of experience. She has written on a broad range of technology topics for publications including CIO, InformationWeek, GCN, FCW, FedTech, BizTech, eWeek and Government Executive.