In many companies, moving to the cloud means shifting existing VM or server workloads from on-premises data centers to AWS, Azure, GCP, or other cloud vendors. The implication for cloud security architecture and architects: protect VM workloads adequately! Containerized and serverless applications might be the future, but VMs are today’s reality, though with one tricky twist. VMs will soon be the “new mainframes”: business-critical and out of fashion.
Architects had better design and implement holistic VM protection in the current phase, because funding might soon dry up and shift to containers and serverless. Thus, this article provides an overview of the various facets of securing VMs in the cloud, after putting VM workload security in the broader context of cloud security. The discussion is cloud-vendor agnostic. Concrete examples and technologies come from the world of Microsoft’s Azure and the Google Cloud Platform (GCP).
VM Workload Protection: Context and Big Picture
VM (workload) security looks at how to protect the VMs on which the application workloads run. Protecting VM workloads is one aspect of cloud security but not an isolated, self-sufficient task. IT organizations also need network security (including firewall topics and denial-of-service protection), identity and access management, backups, and business continuity management (Figure 1). Network security prevents attackers from reaching the actual VM; identity and access management is about authenticating and authorizing human and technical users accessing VMs and the applications running on them. Backups and business continuity management prepare the organization for operational mistakes and successful attacks.
Figure 1: Securing VM Workloads in the Public Cloud - The Big Picture
When looking specifically at VM security, measures fall into two categories:
- Creation-time hardening actions, i.e., tasks when setting up a VM
- Ongoing activities while VMs are up and executing application workloads
Figure 2 provides an example of a VM lifecycle from a security perspective. A VM is created with a secure initial setup (green). Later, a new vulnerability emerges on the OS level, and the risk of successful attacks increases (yellow). A hotfix allows the cloud operations team to fix the vulnerability (green again). Then, an attack succeeds: a virus compromises the VM (red). IT security detects the infiltration and removes the malware (green again). A later configuration change by engineers turns out to be a security risk (yellow). It is a typical example of configuration drift that must be noticed and fixed.
Figure 2: Security-related Events during a VM’s Lifecycle
Preventive Measures at Creation Time
Doing things right from the beginning is a pearl of wisdom that also applies to securing VMs. The initial configuration of a VM at creation time matters, and the term “hardening” subsumes the security-related actions and activities that reduce the attack surface. Choosing an up-to-date operating system version should always be the first priority. The Azure portal asks engineers to select an operating system image when creating a VM (Figure 3, A). It has never been easier to deploy the latest available build, which can safely be assumed to incorporate fixes for (most) known vulnerabilities. The Azure portal is, however, primarily for experimentation. Application teams should always set up their VMs using templates (a topic we discuss later), in which engineers can specify that the most recent version available at deployment time is always used. In other words, no one ever has to check again whether newer versions have become available, and there is no need to modify and update templates.
Figure 3: Creating a Virtual Machine in Azure - Choosing an Operating System Image (A, B, C) and Security Type options
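To make the “always the most recent version” idea more concrete, the following Python sketch builds the image-related fragment of a VM definition in the style of an Azure Resource Manager template, with the image version set to “latest”. It is an illustration only: the helper names image_reference and follows_latest are hypothetical, and the publisher, offer, and SKU values are placeholders rather than real marketplace entries.

```python
import json


def image_reference(publisher: str, offer: str, sku: str, version: str = "latest") -> dict:
    """Build an ARM-style imageReference fragment (hypothetical helper).

    With version "latest", the most recent image available at
    deployment time is used, so the template itself never needs an update.
    """
    return {"publisher": publisher, "offer": offer, "sku": sku, "version": version}


def follows_latest(ref: dict) -> bool:
    """Return True if the fragment tracks the newest image instead of a pinned build."""
    return ref.get("version") == "latest"


# Placeholder values, not real marketplace identifiers.
ref = image_reference("ExamplePublisher", "example-server-os", "example-lts")
pinned = image_reference("ExamplePublisher", "example-server-os", "example-lts", "1.0.0")

print(json.dumps({"storageProfile": {"imageReference": ref}}, indent=2))
print("tracks latest:", follows_latest(ref))
print("pinned build tracks latest:", follows_latest(pinned))
```

A platform team could run a check like follows_latest over all templates in a repository to spot pinned image versions that would otherwise age silently.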
Besides the Microsoft-provided images, cloud engineers can opt for images available via the Azure Marketplace, e.g., ones implementing the CIS security benchmark (Figure 3, B). Another option is using organization-internal images. They are vital for IT organizations with sophisticated server engineering. Larger organizations tend to create their own images incorporating third-party agents, tools, and solutions. They want to optimize and harmonize, e.g., vulnerability and patch management, anti-malware solutions, or endpoint data loss prevention, a hot topic in multi-cloud and hybrid cloud-on-prem application landscapes. These company-specific images help their engineers set up VMs that follow company standards and integrate easily into the organization’s security and application landscape.
Similar to Azure, GCP allows cloud engineers to create basic VMs from GCP-provided images, VMs based on templates and images, and VMs from the GCP marketplace (Figure 4).
Figure 4: Creating plain vanilla or image-based VMs in GCP (left, middle) and Shielded VM settings (right).
In addition, GCP offers a Shielded VM option, a simple one-click configuration. Shielded VM is a feature protecting against malware and rootkits that run before or in parallel to a Linux or Windows operating system. Rootkits hide from classic anti-malware solutions such as anti-virus software. Because anti-malware software runs within the operating system, it struggles to identify malware that runs in parallel to, but outside the scope of, the operating system, or that ran before the operating system started.
On a technical level, a Shielded VM enforces that only software signed with a certificate from Google’s Certificate Authority runs when the system starts its various components and drivers. In addition, the feature takes a fingerprint (i.e., a hash) of the components at startup to identify deviations from a previous baseline (note: when the startup routines change, the baseline must be updated). Azure provides a similar concept named Trusted Launch Virtual Machine (Figure 3, C). However, its usage has implications for license requirements and costs. Both Trusted Launch and Shielded VMs are options to be selected at creation time, and both improve VM security during the whole VM lifecycle.
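The fingerprint-and-baseline idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Shielded VM or Trusted Launch is implemented: the component names and contents are invented, and the functions fingerprint, measure, and deviations are hypothetical helpers. The real features rely on firmware and a virtual TPM rather than application code.

```python
import hashlib


def fingerprint(component: bytes) -> str:
    """Hash one startup component (SHA-256 chosen for illustration)."""
    return hashlib.sha256(component).hexdigest()


def measure(components: dict[str, bytes]) -> dict[str, str]:
    """Fingerprint every startup component by name."""
    return {name: fingerprint(blob) for name, blob in components.items()}


def deviations(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """List components that were added, removed, or changed versus the baseline."""
    names = sorted(set(baseline) | set(current))
    return [name for name in names if baseline.get(name) != current.get(name)]


# Invented startup components of a VM.
boot_ok = {"bootloader": b"loader-1.0", "kernel": b"kernel-5.15", "net-driver": b"net-2.3"}
baseline = measure(boot_ok)

# A rootkit that modifies the kernel changes its fingerprint.
boot_tampered = dict(boot_ok, kernel=b"kernel-5.15+rootkit")
print(deviations(baseline, measure(boot_tampered)))  # ['kernel']

# A legitimate change of the startup routine requires a new baseline.
boot_patched = dict(boot_ok, kernel=b"kernel-5.16")
baseline = measure(boot_patched)
print(deviations(baseline, measure(boot_patched)))  # []
```

The second half of the example mirrors the note above: after a legitimate update, the old baseline would raise a false alarm until it is replaced.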
Enforcing the Creation of Secure VMs
Would you prefer to be a circus director making ten dolphins jump through a burning ring, or a CISO who has to ensure that three cloud engineers set up all VMs correctly for a whole year? I would immediately choose the latter option if, and only if, the organization has two concepts in place: templates and policies.
While GUI or console deployments are possible, most organizations prefer “infrastructure as code,” typically combined with a declarative specification of resources. Templates (or VM images) are blueprints for setting up a VM correctly, including security-related configurations and agents for monitoring the VM. Engineers write configuration files describing all the VMs, network components, databases, and other resources needed. Then, the cloud creates the resources as specified. The cloud vendors provide cloud-native tools for this purpose, such as the GCP Deployment Manager and Azure Resource Manager (ARM). However, many IT organizations opt for third-party solutions such as Terraform to reduce their cloud-vendor lock-in.
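What “declarative” means in practice can be illustrated with a small Python sketch: engineers describe the desired resources, and a tool works out which actions lead from the actual state to that description. The plan function and all resource names and attributes below are hypothetical; real tools such as the ones named above are far more elaborate.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired and actual resources and return the actions needed."""
    return {
        # Described in the configuration but not yet deployed.
        "create": sorted(desired.keys() - actual.keys()),
        # Deployed, but the configuration differs from the description.
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
        # Deployed but no longer described.
        "delete": sorted(actual.keys() - desired.keys()),
    }


# Invented example: what the configuration files describe ...
desired = {
    "vm-web": {"size": "small", "monitoring_agent": True},
    "vm-db": {"size": "large", "monitoring_agent": True},
}
# ... versus what currently runs in the cloud.
actual = {
    "vm-web": {"size": "small", "monitoring_agent": False},
    "vm-old": {"size": "small", "monitoring_agent": False},
}

print(plan(desired, actual))
```

In this example, vm-web would be updated because its monitoring agent is missing, vm-db would be created, and vm-old would be removed.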
Templates ease deploying standardized resources but do not enforce their use. Engineers can still set up highly insecure VMs. So, how can a cloud platform team ensure that (nearly) 100% of all VMs have the correct configuration? The solution: policies. Policies compare a state defined in a template, or the state observed in reality, against a desired state. When an Azure Policy identifies non-compliance at the creation time of a resource, it can report the non-compliance (“audit”). Alternatively, the policy can try to fix the creation command (“append” and “modify”) or let the creation of the resource fail (“deny”). These are the main effects policies can have. Figure 5 illustrates the concept of policies and highlights the lines related to (potential) policy effects.
Figure 5: Excerpt from a predefined Azure policy with general metadata (A) and a list of effects allowed to be defined (B) when assigning the policy. Additional details that follow are the exact rules of the policy.
Azure evaluates policies mainly in three situations: first, when a resource is created or updated; second, during periodic evaluations every 24 hours; and third, when an evaluation is initiated on demand. In the context of periodic reevaluations, policies have an “audit” effect, even if defined as “deny” or “modify.” They do not change, stop, or shut down an existing, already created resource. They just report non-compliance. Still, that is very helpful, e.g., if hardening requirements change. At least the cloud platform management and IT security know about the resources not fulfilling the new requirements and can address them in a dedicated process or project.
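The interplay of effects and evaluation time can be sketched as follows. This is a conceptual model in Python, not the Azure Policy engine: the Policy class, the evaluate function, and the two example policies are hypothetical, and only the “audit,” “deny,” and “modify” effects and the two triggers “creation” and “periodic reevaluation” are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    name: str
    effect: str  # one of "audit", "deny", "modify"
    is_compliant: Callable[[dict], bool]
    fix: Optional[Callable[[dict], dict]] = None  # only used by "modify"


def evaluate(policy: Policy, resource: dict, at_creation: bool):
    """Return (resource, findings); raise PermissionError if creation is denied."""
    if policy.is_compliant(resource):
        return resource, []
    finding = f"{policy.name}: non-compliant"
    # Periodic reevaluation never touches an existing resource: report only.
    if not at_creation or policy.effect == "audit":
        return resource, [finding]
    if policy.effect == "deny":
        raise PermissionError(finding)
    if policy.effect == "modify" and policy.fix is not None:
        return policy.fix(resource), [finding + " (corrected)"]
    raise ValueError(f"unsupported effect: {policy.effect}")


# Invented policies and resources for illustration.
require_shielding = Policy(
    name="require-shielded-vm",
    effect="deny",
    is_compliant=lambda vm: vm.get("shielded", False),
)
require_owner = Policy(
    name="require-owner-tag",
    effect="modify",
    is_compliant=lambda vm: "owner" in vm,
    fix=lambda vm: {**vm, "owner": "unassigned"},
)

new_vm = {"name": "vm-web", "shielded": True}
existing_vm = {"name": "vm-legacy", "shielded": False, "owner": "team-a"}

# At creation time, "modify" corrects the request.
print(evaluate(require_owner, new_vm, at_creation=True))
# During a periodic reevaluation, even a "deny" policy only reports.
print(evaluate(require_shielding, existing_vm, at_creation=False))
```

Creating existing_vm from scratch with at_creation=True would raise PermissionError, which corresponds to the failed deployment an engineer sees with a “deny” policy.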
While the exact details differ, the corresponding concept to an Azure Policy in the Google world is a GCP Organization Policy.
OS Patch Management in the Cloud
A securely configured and started VM is just the beginning. Keeping the VM and the application workload on the VM secure is an ongoing challenge, and not one that is new with the cloud. Securing the workloads requires, as in the past, securing the operating system and the applications. Typical tasks are patch management, vulnerability scanning, and malware protection. Patch management helps close vulnerabilities caused by software bugs, but not those caused by configuration mistakes. Malware protection means finding viruses (or other malware) running on the VM. Vulnerability scanning identifies insecure applications or application configurations that allow attackers to get into the systems.
All security tools from the pre-cloud environment (should) work as well in the cloud for VM workloads. Thus, the architectural question is: Should you continue using (just) your old tools? Can you improve your efficiency by replacing old tools with cloud-native technologies? Or should you move your old tools to the cloud and complement them with cloud-native options?
The GCP VM Manager helps with patch management when using GCP-provided Windows and Linux images. It identifies VMs whose OS needs patching and carries out the actual patching. Azure Update Management is the Microsoft equivalent, helping to identify out-of-date OS versions and deploy patches at scale. However, a new option in the cloud context, beyond traditional patch and update management, is “killing” VMs with outdated operating system versions and redeploying new VMs with the most recent image and patch level. Also, the cloud solves one big patch management challenge once and for all: building up an accurate and complete inventory. Resource inventories and network topology in the cloud are just a click away. Cloud vendors know all resources they run (after all, they make their money by charging for them), not only the ones listed in a more or less accurate and up-to-date configuration management database. They typically provide reports via GUIs and APIs.
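The “kill and redeploy” option builds directly on the complete inventory. A minimal Python sketch, assuming the inventory has already been exported from the cloud provider: the VM class, the needs_redeploy function, the image family names, and all dates are invented for illustration and do not correspond to any vendor API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VM:
    name: str
    image_family: str
    image_date: date  # release date of the image the VM was created from


def needs_redeploy(inventory: list[VM], latest_images: dict[str, date]) -> list[str]:
    """Return the names of VMs built from an image older than the newest one."""
    return [
        vm.name
        for vm in inventory
        if vm.image_date < latest_images[vm.image_family]
    ]


# Invented data: newest image per family and the current inventory.
latest_images = {"linux-lts": date(2024, 5, 1), "windows-server": date(2024, 4, 15)}
inventory = [
    VM("vm-web", "linux-lts", date(2024, 5, 1)),
    VM("vm-batch", "linux-lts", date(2023, 11, 20)),
    VM("vm-ad", "windows-server", date(2024, 1, 10)),
]

print(needs_redeploy(inventory, latest_images))  # ['vm-batch', 'vm-ad']
```

Whether a flagged VM is patched in place or replaced remains a design decision; replacing works best for workloads that keep no state on the VM itself.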
To clarify the focus of patch management in contrast to our next topic, vulnerability management: patch management is about precision and discipline. Bookkeepers and nitpickers love such work. A new patch comes out? Make sure engineers deploy it to each and every system, better yesterday than in a month. In contrast, vulnerability management is more of a detective game, and mitigation is more company-specific.
Vulnerability Management in the Cloud
Vulnerability management covers identifying vulnerabilities and misconfigurations independently of the vendor, including looking at company-specific code and integrations. Scanning source code and images is a popular option, pen-testing another one. Both measures are known from the pre-cloud era. Thus, companies can move their tooling from their pre-cloud on-premises world to cloud VMs and have the topic covered.
Reuse of existing solutions and licenses, however, might not be in the mid-term business interest of cloud vendors. They aim for a bigger piece of everyone’s IT (security) budget and lure their customers to cloud-native solutions. Once upon a time, Microsoft coupled its Internet Explorer with Windows, pushing competitors out of the market. Today, cloud providers do not give security features away for free. Most are pretty costly. However, they are intriguing for IT organizations because of their tight integration into the cloud ecosystem and the absence of upfront investments. They follow a pay-as-you-go model. It is perfect for most organizations other than large enterprises, especially for IT organizations that do not have a solution in place yet.
Two solutions, one in Azure and one in GCP, are concrete manifestations of the trend of cloud providers trying to become a one-stop shop for IT departments in the area of application security. Google offers the Web Security Scanner, an automated pen-testing solution for applications running on VMs, serverless, or as containerized applications. It actively navigates through web pages to find vulnerabilities and potential security issues. Azure has the Microsoft Defender product suite, which Microsoft markets as its cloud posture management and workload threat protection solution. Customers can switch on Microsoft Defender for various resources, including their VMs, for which Microsoft integrated one of the leading vulnerability detection solutions on the market, Qualys.
Recommendations and Alarming
Once applications are live in the cloud, the security dashboards of the cloud providers, such as the GCP Security Command Center or Microsoft Defender for Cloud (Figure 6), are essential. They collect risky configuration options and alarms, allow benchmarking against industry standards, and make engineers aware of anomalies and potential attacks. Policies, either from the cloud vendor or developed by the customer, form the conceptual and technical foundation.
Figure 6: Microsoft Defender for Cloud. Menu for selecting, e.g., recommendations to improve the security or dedicated alarms (A) and pane with scan results and recommendations (B).
Admittedly, there is no obligation to use these tools. Everybody can continue using and extending their current security dashboards or build new dashboards with consolidated feeds from various clouds. However, the tools mentioned are highly efficient, especially in the early phase of building a new cloud environment. They are available from the very first second. Larger organizations might want to understand the costs in detail, especially if they have more extensive sets of VMs. However, many companies benefit from de facto outsourcing specific tooling to cloud vendors. So, setting up VM security is not just some engineering task but also relates to IT, cloud, vendor management, and security strategy!