The concepts behind Docker didn’t evolve in a day; they were based on ideas that had been percolating for a long time (in computer years). Indeed, Docker's container platform--and Docker container security specifically--owes a nod to several different schools of thought about how to group processes together, make them do work, and keep them from getting out of control.
There are long-held constructs in computing that divide an operating system into privileged and unprivileged sections. In Linux, derived from Unix, this translates into a root user and a lesser privileged user. Sometimes the processes used by a non-root user need to call or interact with processes that have root privileges. Gaining unauthorized root control has often meant goading an app with root privileges into doing something it shouldn't on behalf of the lesser-privileged user.
The need to find a way to build walls around processes--or put them in metaphorical sandboxes where they can play--is an idea a lot older than containers. In the differing branches of Unix, the first walls were built by confining an unprivileged process to a restricted view of the system.
This early isolation, invoked by the chroot command in BSD (Berkeley Software Distribution, a branch of AT&T's Unix), allowed an administrator to confine where any invocation of a command could operate. An unprivileged process could call a root process, but where that unprivileged process could operate was restricted. It was simple and somewhat effective. The developers of FreeBSD, a branch of BSD, then introduced the concept of jails. Jails were early containers, fulfilling a concept of having code, data and a wall between all that and root-privileged processes.
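The chroot "wall" can be sketched at the command line. This is an illustration only: the /tmp/jail path and the use of a statically linked busybox shell are assumptions, and actually pivoting into the jail requires root privileges.

```shell
# Sketch: building a minimal chroot jail (hypothetical path; entering it needs root)
JAIL=/tmp/jail
mkdir -p "$JAIL/bin"

# Copy in a statically linked shell if one is available (assumption: busybox is installed)
if command -v busybox >/dev/null 2>&1; then
    cp "$(command -v busybox)" "$JAIL/bin/sh"
fi

# Only root can pivot into the jail; once inside, everything outside $JAIL is invisible
if [ "$(id -u)" -eq 0 ] && [ -x "$JAIL/bin/sh" ]; then
    chroot "$JAIL" /bin/sh -c 'echo "root dir is now: $(pwd)"'
else
    echo "Not root (or no static shell): jail prepared at $JAIL but not entered."
fi
```

A process confined this way still calls the shared kernel normally; only its view of the filesystem is walled off, which is why chroot alone was "simple and somewhat effective" rather than a complete sandbox.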
Sun Microsystems' (now Oracle’s) Solaris further developed the jail idea into zones. In zones, Solaris made the code, its data and storage into more of an atomic element--that is, an object. A zone could be moved, given group access and treated as a collection of objects, perhaps glued together by other code. Zones were portable, but only if you used Solaris across your server or system turf.
Virtuozzo moved the concept back to more generic (than Solaris/SunOS, that is) platforms, where it became more of a resource virtualizer--along the lines of the first virtual machines. Virtuozzo had the elements of a hypervisor, permitting re-instantiating (re-representing) broad amounts of operating system and even hardware resources. Around this same time, Xen came into being, and there was a branching of efforts among hypervisors that could support many operating system functions with walls supported by CPU memory management features.
The hypervisor branch spawned SuSE (now SUSE) Xen, Citrix Xen and VMware. These systems were designed to carry a significant load of processes, including entire operating systems--all aided by the huge memory model of 64-bit CPUs from Intel and AMD that allowed servers to give plentiful memory space to operating systems living concurrently inside the same server.
Hypervisors branched to offer hardware services to operating systems, while the characteristics of those hardware services were controlled by administrative selection.
The container concept, however, was more minimalistic. Instead of supporting several operating systems, where each believed it owned the entire machine through hypervisor trickery, containers contained just enough code to perform a few functions, using the apps and resources inside a single host operating system instance. Hypervisor-based instances are full operating systems--Windows, Linux, BSD and even, in some cases, macOS--living inside a single machine. The instances are unaware of other OS instances, believing they own their own server hardware, that hardware having been carved out for them by hypervisor administrative control. By contrast, container constructs live inside a reduced privilege model controlled by chroot and something new, cgroups.
Process containers evolved in Linux, and Linux became the experimenter’s platform for both the container/minimalist and hypervisor/full-OS camps. Google evolved process containers into a Linux concept called (the aforementioned) cgroups (control groups), which govern how code executes and how much of a machine’s resources it can consume while sharing the same operating system kernel.
In addition, cgroups enable characteristics--such as the amount of memory that can be used, the amount of CPU time allowed, file system resources, disk/storage resources and network characteristics--to be controlled in a single command. Cgroups put barriers around a desired application's resource usage, while providing a method to monitor how much of those resources are being used. The same concept was applied to hypervisors, so that one virtual machine wouldn’t dominate the resources of its host to the detriment of other processes. The concept of cgroups provided boundaries and was forged into the Linux kernel.
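The single-command nature of those controls can be sketched against the modern cgroup v2 interface. The `demo` group name is an assumption, and writing under /sys/fs/cgroup requires root plus a cgroup2 mount with the relevant controllers enabled, so the sketch degrades to an explanatory message elsewhere.

```shell
# Sketch: capping memory and CPU for a group of processes with cgroup v2
# (hypothetical group name "demo"; needs root and a cgroup2 mount at /sys/fs/cgroup)
CG=/sys/fs/cgroup/demo
if [ "$(id -u)" -eq 0 ] && mkdir -p "$CG" 2>/dev/null && [ -f "$CG/memory.max" ]; then
    echo 100M > "$CG/memory.max" 2>/dev/null || true           # hard memory ceiling: 100 MiB
    echo "50000 100000" > "$CG/cpu.max" 2>/dev/null || true    # 50 ms of CPU per 100 ms (~half a core)
    echo $$ > "$CG/cgroup.procs" 2>/dev/null || true           # move this shell (and children) into the group
    cat "$CG/memory.current" 2>/dev/null || true               # monitor: bytes currently charged to the group
else
    echo "cgroup v2 not writable here; commands shown for illustration only."
fi
```

Each limit is one write to one file, and the same tree exposes the usage counters--the "barrier plus monitor" pairing the paragraph above describes.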
Namespaces further grouped objects into an entity, keeping each object demoted so that it couldn’t touch other running processes yet could still do work and be identified as a unique entity, still sharing the same Linux or BSD kernel. Namespaces can have child namespaces, providing a way to isolate characteristics of the object, child object and grandchild objects from each other. By isolating the processes, work can be performed by child processes without interfering with parent or other child processes. Networking, inter-process communications and, importantly, processes (assigned IDs by Linux) can be promoted and demoted. In addition, their accessibility can be controlled by how the namespace is defined and deployed.
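A quick way to see this demotion-yet-functioning behavior is the unshare(1) utility. This is a hedged sketch: whether unprivileged user namespaces are allowed varies by distribution and kernel configuration, so the commands fall back to a message where they are disabled.

```shell
# Sketch: fresh namespaces via unshare(1)
# -U: new user namespace; -r: map the calling user to uid 0 *inside* it
# (the caller gains no real root on the host)
inside_uid=$(unshare -Ur id -u 2>/dev/null || echo "unavailable")
echo "uid inside new user namespace: $inside_uid"

# A new PID namespace gives the workload its own process tree and its own /proc
# (--mount-proc implies a new mount namespace as well)
unshare -Ur --pid --fork --mount-proc ps -e 2>/dev/null \
    || echo "unprivileged user namespaces are disabled on this host"
```

Inside the namespace the process believes it is root with a tiny process table, while from the host it remains an ordinary, demoted process--exactly the walled-off-but-working arrangement described above.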
LXC (LinuX Containers) furthered the concept of multiple concurrent operating system environments by combining a demoted user/group privilege model based on chroot with cgroup resource limitation. LXC was less like a hypervisor and was focused squarely on Linux. There were a few attempts to host Windows, but hypervisors provided kernels better suited to the resource needs of the Windows OS.
LXC could create entire instances of shared-kernel operating system environments for software workloads in a single command-line string; in turn, those instances could do work without knowledge of, and little chance of disturbing, other concurrent instances of operating system resource requests and their associated workloads. The operating system instances could be "minimalized," in terms of the software installed, to suit the needs of applications.
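That single-command creation might look like the following. The container name `demo` and the download-template arguments are assumptions for illustration; the lxc userspace tools and root privileges are required, so the sketch prints a note where they are absent.

```shell
# Sketch: creating and starting a minimal shared-kernel instance with LXC
# (hypothetical container name "demo"; requires the lxc package and root)
if command -v lxc-create >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    lxc-create -n demo -t download -- -d ubuntu -r focal -a amd64  # fetch a minimal rootfs
    lxc-start -n demo                                              # start the instance
    lxc-attach -n demo -- uname -r                                 # note: same kernel as the host
    lxc_msg="created and started container 'demo'"
else
    lxc_msg="lxc tools not present (or not root); commands shown for illustration"
fi
echo "$lxc_msg"
```

The `uname -r` check inside the instance is the telling detail: unlike a hypervisor guest, the LXC instance shares the host's kernel and only its userspace is "minimalized."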
With LXC, many workloads could be stuffed into the same server box--running in an autonomous yet resource-confined way, and subject to the administrative control of desired settings.
Sewing these concepts together became the order of the day, and several control packages emerged to do the heavy lifting of managing workload cgroups and namespaces, communicating among workloads within these hierarchies of objects, and cloning these container elements. Think of these frameworks--along with their use of CPU, memory, storage, networking and linkage--as stacks.
Several container stacks emerged at this point, circa 2012 to 2013, including Docker.
Docker put all of the settings and control planes into a portable image format that has consistency, and specified basic image interactivity, networking, storage and relationships. Docker provides a common set of interfaces and controls to load containers and make them communicate with each other (perhaps in a hierarchy). Docker does this while keeping workloads in isolation from each other to the extent desired and mediating how containers use resources (CPU, storage, networking). The system also manages elements of security and configuration, providing a portable file format to package all of the components together into a single Docker image.
Docker is extraordinarily easy to use: After you download the Docker system for a given operating system host, a pull request (download) is made from a repository of Docker images. There are hundreds of thousands of Docker images available, providing the components to perform everything from database work to running a WordPress website to running an instance of an operating system. Docker images come in hundreds of spoken languages, customizations and variants. Generally, Docker images must be free and are almost always open source. The images can be easily customized for reuse.
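The pull-and-run flow looks like this. The `alpine` image is just a common public example, not something the article prescribes, and a running Docker daemon is assumed; without one, the sketch only prints a note.

```shell
# Sketch: fetching and running a public image (requires a running Docker daemon;
# "alpine" is an illustrative image choice)
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    docker pull alpine:latest                                  # pull request: download the image layers
    docker run --rm alpine:latest echo "hello from a container"
    docker images alpine                                       # the image is now cached locally
    docker_msg="pulled and ran alpine"
else
    docker_msg="Docker daemon not available; commands shown for illustration"
fi
echo "$docker_msg"
```

Two commands take you from nothing to a running, isolated workload, which is precisely why the repository model (and the provenance question it raises below) became so central.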
Images are typically small, as Docker provides the gateway to host resources; limitations on resources for any container or group of containers can be administratively controlled and monitored.
The Docker Container Security Problem
However, with all of this variety and customizability there is also an issue: The provenance of Docker images was often unknown in the early days, unless each element of a container was assayed. This is pretty much impossible for the average human, so the trustworthiness of images was an enormous concern. Part of the trust problem is that the patch/fix/update cadence of images can vary widely, raising great concern about potentially vulnerable image elements. These vulnerabilities can cause containers to quit, exfiltrate data (where compromised), or behave erratically or infectiously.
With the rise in popularity of container development, more and more organizations started producing “official” images of Docker-format base instances. Canonical, for example, provided a number of base images that were a known quantity, in terms of component provenance. Alterations of these images could be made for basic use-case profiles, such as database, web and LAMP stack images.
Vendor-sourced and developer-sanctioned images have fueled Docker's popularity. For example, images localized for different spoken languages became further variants of official repositories of images, and many of these variants found their way into the “official” Docker repository.
Still, all was not well, as there was no toolbox mechanism to attest to an image’s composite integrity. Rather, this had to be done outside the auspices of Docker’s runtime control. What this meant in real terms is that, although an image might be dubbed “official,” it needed a signature to verify that the contents hadn’t been altered. Alteration could indicate unexpected configuration changes, different software versions than expected, or even the insertion of malware.
Using even "official" images might have been a recipe for potential disaster back then, as the chain of authorities for the integrity of images wasn’t easy to determine and had to be administered by the user.
Meanwhile, Docker itself ran as a root process inside of its host. Protected in several ways, the Docker engine used the scheduler of its host to instantiate, run, modify and kill images--the entire lifecycle of a Docker deployment. Could Docker be controlled by the images it spawned? A different school of thought emerged.
Stay tuned for Part II: Docker competition improves the core of Docker, but also makes the Docker organization evolve security, too.