Guide to Kubernetes Autoscaling

Autoscaling enables you to rightsize your Kubernetes resource allocations to strike the right balance between performance and cost. Here's a look at the three types of Kubernetes autoscaling.

Christopher Tozzi, Technology analyst

October 14, 2022

Kubernetes autoscaling is a great way to ensure that your clusters always have sufficient resources to host your workloads, even if you're not managing resource allocations manually.

However, configuring autoscaling in Kubernetes can be challenging because there are multiple types of autoscaling features — and depending on how you deploy your clusters, they may or may not all be available to you.

So, let's take a look at which types of autoscaling are available in Kubernetes, how they differ, and how to use each type of autoscaler.

What Is Kubernetes Autoscaling?

Kubernetes autoscaling is a set of features designed to ensure that workloads in a Kubernetes cluster always have sufficient resources to run properly. At the same time, autoscaling aims to avoid overprovisioning, which wastes resources by assigning them to workloads that don't actually need them.

By taking advantage of autoscaling, then, you can automatically rightsize your Kubernetes resource allocations to strike the right balance between performance and cost.

You always have the choice of managing Kubernetes resource allocations manually, of course. But by setting up autoscaling, you allow tools to make adjustments automatically, in real time. That saves you a lot of effort, while also helping to ensure that your resource allocations are continuously and immediately adjusted based on shifting requirements.

Types of Kubernetes Autoscaling

There are three types of autoscaling available in Kubernetes:

  • horizontal pod autoscaling

  • vertical pod autoscaling

  • cluster autoscaling

1. Horizontal Pod Autoscaling

The horizontal pod autoscaler adds pods to a workload when demand for that workload increases, and removes them again when demand drops. When you add more pod instances, your workload can handle more requests because there are more pods available to process them.
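To make the scaling decision concrete, here is a minimal Python sketch of the core calculation described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, the replica bounds and the 0.1 tolerance default are illustrative assumptions, and the real controller also accounts for factors such as pod readiness and stabilization windows that this sketch ignores.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the replica count the horizontal pod autoscaler aims for."""
    ratio = current_metric / target_metric
    # Leave the workload alone when usage is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Never go below the configured minimum or above the maximum.
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(2, 90, 50))    # 2 pods at 90% CPU, 50% target -> 4
print(desired_replicas(4, 20, 50))    # usage fell to 20% -> 2
print(desired_replicas(8, 100, 50))   # would be 16, capped at the maximum -> 10
```

The last call shows why the maximum matters: without it, a sustained spike could keep doubling the pod count.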

How to set up horizontal pod autoscaling

Horizontal pod autoscaling is a native Kubernetes feature, so you can use it in any cluster.

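That said, horizontal autoscaling is not enabled by default. You must explicitly configure it by first creating a workload, such as a Deployment, then using kubectl to configure autoscaling for that workload. The autoscaler also needs a source of resource metrics, which in most clusters means the Kubernetes Metrics Server must be running.

For example, here's a command (borrowed from the Kubernetes documentation) that configures horizontal autoscaling for the php-apache Deployment, targeting 50% average CPU utilization with between 1 and 10 pods: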

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
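If you prefer to keep the policy in version control rather than in a one-off command, the same settings can be written as a HorizontalPodAutoscaler object. The following Python sketch builds what should be an equivalent autoscaling/v2 manifest and prints it as JSON, which kubectl accepts alongside YAML. The helper name hpa_manifest is hypothetical, and naming the autoscaler after the Deployment is an assumption made for simplicity.

```python
import json

def hpa_manifest(deployment, cpu_percent, min_replicas, max_replicas):
    """Build an autoscaling/v2 HorizontalPodAutoscaler for a Deployment."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            # The workload whose replica count the autoscaler manages.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU utilization across the pods.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_percent},
                },
            }],
        },
    }

manifest = hpa_manifest("php-apache", 50, 1, 10)
print(json.dumps(manifest, indent=2))
```

You could save the output to a file and apply it with kubectl apply -f, then review the result with kubectl get hpa.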

For the horizontal Pod autoscaler to work, you need enough nodes in your cluster to host additional pods. If you don't have any free capacity on your nodes, there won't be anywhere for the autoscaler to deploy additional pods. Cluster autoscaling (which we discuss below) can help ensure that there are always enough nodes available, but it's a separate feature from horizontal autoscaling.

2. Vertical Pod Autoscaling

When you autoscale pods vertically, you automatically modify the CPU and memory resources allocated to an existing pod without increasing the number of pods available. Vertical pod autoscaling is another way to accommodate changes in demand for a workload, but without changing the total number of pods deployed.

Vertical pod autoscaling works by automatically changing the limit and request settings for pods. Requests are the amounts of CPU and memory that Kubernetes reserves for a pod when scheduling it, and limits cap how much the pod can consume. By automatically estimating how many resources a pod actually needs, then changing its limits and requests accordingly, the vertical pod autoscaler tries to ensure that pods have optimal resource availability.
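As a rough illustration of what "estimating how many resources a pod actually needs" can look like, the Python sketch below picks a CPU request from a list of observed usage samples by taking a high percentile and adding some headroom. This is a simplified stand-in, not the vertical pod autoscaler's actual recommendation algorithm; the function name, the 90th percentile and the 15% margin are all assumptions chosen for the example.

```python
def recommend_request(samples_millicores, percentile=90, margin_percent=15):
    """Suggest a CPU request (in millicores) from observed usage samples."""
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile, using integer ceiling division.
    rank = max(1, -(-percentile * len(ordered) // 100))
    base = ordered[rank - 1]
    # Add headroom so small spikes do not immediately exceed the request.
    return -(-base * (100 + margin_percent) // 100)

usage = [120, 150, 130, 400, 160, 140, 155, 145, 135, 150]
print(recommend_request(usage))   # 184
```

Note that the single 400-millicore outlier does not drive the result. Basing the request on a percentile rather than the peak is one way to avoid overprovisioning for rare spikes.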

Of course, like horizontal autoscaling, vertical pod autoscaling only works if the necessary resources are actually available in your cluster. If you don't have enough nodes, or your nodes lack enough available memory or CPU, there won't be resources available to autoscale your pods.

Setting up the vertical pod autoscaler

Vertical pod autoscaling isn't available by default in most Kubernetes distributions, but you can install it from GitHub. Some managed Kubernetes services also include built-in vertical autoscaling features, so check your provider's documentation.

Once the vertical pod autoscaler is installed, you enable it by creating and applying a VerticalPodAutoscaler object that targets your Deployment. For example:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"

3. Cluster Autoscaling

Cluster autoscaling changes the number of nodes within a cluster. Increasing the total node count can also help respond to changes in workload demand, although it works in an indirect way because you would also need to deploy additional pods to take advantage of the nodes you've added. On their own, more nodes don't automatically increase the capacity of your workloads.

Cluster autoscaling can also help reduce total node count when demand for your workloads decreases, which may in turn save you money (since the cost of Kubernetes hosting typically reflects the number of nodes you run).
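To get a feel for how node count relates to what your pods request, here is a small Python sketch that estimates how many identical nodes a set of pods would need, using a first-fit-decreasing packing of CPU and memory requests. It is a simplification for illustration only: the real cluster autoscaler simulates the Kubernetes scheduler and considers many more constraints. The function name, pod sizes and node sizes are all hypothetical.

```python
def nodes_needed(pod_requests, node_cpu, node_mem):
    """Count nodes required to place pods using first-fit decreasing."""
    free = []  # remaining (cpu, mem) capacity on each node opened so far
    for cpu, mem in sorted(pod_requests, reverse=True):
        if cpu > node_cpu or mem > node_mem:
            raise ValueError("pod does not fit on an empty node")
        for i, (free_cpu, free_mem) in enumerate(free):
            if cpu <= free_cpu and mem <= free_mem:
                # Place the pod on the first node with enough room.
                free[i] = (free_cpu - cpu, free_mem - mem)
                break
        else:
            # No existing node can host the pod, so add a node.
            free.append((node_cpu - cpu, node_mem - mem))
    return len(free)

# Requests as (CPU millicores, memory MiB) per pod.
pods = [(500, 512)] * 6 + [(1000, 2048)] * 2
print(nodes_needed(pods, node_cpu=2000, node_mem=4096))      # 3
print(nodes_needed(pods[:2], node_cpu=2000, node_mem=4096))  # 1
```

The second call mirrors the scale-down case: once most pods are gone, the remaining ones fit on a single node and the others become candidates for removal.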

Cluster autoscaling is not built into Kubernetes by default, but an open source cluster autoscaler is available on GitHub. Note, however, that the cluster autoscaler only works with certain public clouds, such as Amazon Web Services (AWS) and Google Cloud. That means that you can't use the cluster autoscaler to change node count for a Kubernetes cluster that you host in your own data center. It typically works only with managed Kubernetes services, like Amazon EKS and Google GKE.

How to set up cluster autoscaling

The steps for setting up cluster autoscaling vary depending on where you are running Kubernetes. But in general, you must first install the cluster autoscaler from GitHub with a command like the following:

curl -o cluster-autoscaler-autodiscover.yaml

You may then need to perform some configuration steps related to your Kubernetes service. Check your provider's documentation for details.

Finally, you can configure the autoscaler itself by opening its Deployment in an editor with:

kubectl -n kube-system edit deployment.apps/cluster-autoscaler

Here, you can modify the values that define how the cluster autoscaler works.

Which Type of Kubernetes Autoscaler Do You Need?

In Kubernetes, the horizontal pod autoscaler is the easiest to configure and use, and it will suffice in most cases for ensuring that your workloads can keep up with changes in demand. However, vertical pod autoscaling may serve as a more efficient alternative in some cases because it doesn't require you to add pods to (and therefore increase the complexity of) your cluster.

As for cluster autoscaling, it's a powerful feature for rightsizing the total nodes available within your cluster. But in general, cluster autoscaling only works with managed Kubernetes services hosted in public clouds, so it may not be an option for you.

About the Author(s)

Christopher Tozzi

Technology analyst, Fixate.IO

Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” was published by MIT Press.
