Adding Guardrails to your Kubernetes Stack

Resource limits and requests


Deployment Resources

This series will walk you through improving your stack’s overall resilience. In this part, we do that by giving Kubernetes (k8s) the resource requirements it needs to react to resource consumption.


Introduction

I have often noticed teams run into stability problems because they don’t know how Kubernetes (k8s) performs resource management. Since many of k8s’ resource management features won’t kick in unless you provide the necessary information, k8s ends up flying blind.

There are two main ways k8s manages resources. The first is resource allocation, the classic bin-packing problem: how do I fit as many items of volume x, y, …, n as possible into a bin of volume z? The second is resource utilization: k8s adjusts the number of nodes to match current resource utilization, scaling up if the cluster is short on capacity and down if there is a surplus. Both features require resource definitions in the container configuration to work.

Kubernetes is not a magical system. It doesn’t know how many resources your application needs if you don’t tell it. Specifying resource requests and limits on your deployment/pod helps your cluster properly bin-pack your apps onto nodes (see the official documents for details).

Resource Requests:

Setting a resource request tells k8s the baseline resources your app needs to run. This informs how k8s packs your app into the cluster based on what is available (available resources on the node ≥ your app’s requested resources).

Example:
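A minimal Deployment sketch with resource requests; the app name and the nginx image are illustrative placeholders, while the request values match the walkthrough below:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: nginx:1.25    # placeholder image
          resources:
            requests:
              cpu: 250m        # 0.25 of a vCPU
              memory: 500M     # 500 megabytes
```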

1 vCPU/core equals 1000m (m stands for millicpu). CPU is allocated as CPU time, which by default is accounted in periods of 100ms (milliseconds). So 0.5 vCPU equals 50ms of CPU time every 100ms.

Memory is defined in bytes and denoted by the suffixes E, P, T, G, M, K, or their power-of-two equivalents Ei, Pi, Ti, Gi, Mi, Ki. Note the difference: 500M is 500,000,000 bytes, while 500Mi is 524,288,000 bytes.

So in the example above, the app needs 0.25 of a vCPU (250m) and 500 megabytes (500M) of memory. K8s will look at the nodes in your cluster, find one where this app fits, and schedule it there.

Once you specify resources on your containers, four critical features are enabled.

1. K8s will now schedule your apps onto nodes based on resource availability. Each node has a finite amount of allocatable CPU and memory, and each container consumes some of it. Without resource requests or limits, k8s has no idea what each app needs and may pack too many apps onto a node, overwhelming it.

2. Enables the ability to use the Horizontal Pod Autoscaler (HPA). The HPA scales pods based on the target utilization of specific resources; CPU and memory are supported by default, and it can be extended to custom metrics. The HPA also needs a metrics server to work, and it measures utilization as a percentage of each pod’s request, which is why requests are required. This helps your applications scale up and down so that pods won’t over-utilize resources (a sample manifest is sketched after this list).

3. Enables the ability to use the Cluster Autoscaler. The Cluster Autoscaler is an application provided by the k8s community: when the resources requested in the cluster exceed what the cluster has available, it kicks in and adds nodes. This typically surfaces as a pod stuck in the “Pending” state due to insufficient resources.

4. Worker node reserves are respected. Worker node reserves act like resource requests, but for node-level components such as the kubelet and the OS. K8s allocates pods onto a node until no allocatable resources remain. If your pods don’t set requests, k8s has no idea whether it is over-allocating, which can lead to node instability as your applications eat into the resources the node itself needs.
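To make point 2 concrete, here is a minimal sketch of an HPA manifest using the autoscaling/v2 API. The Deployment name reuses the illustrative example-app from earlier, and the replica bounds and 70% CPU target are placeholder values, not recommendations:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app          # the illustrative Deployment from earlier
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average usage exceeds 70% of the CPU request
```

Because averageUtilization is computed against the CPU request, this manifest does nothing useful unless the target pods declare a request.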

Limits

Limits bound how many resources your application can consume. A memory limit behaves like having only X amount of memory on your machine: from the app’s perspective, the usable memory is whatever limit you set in the container definition. On top of that, if your container consumes more than the limit set in the deployment, k8s makes your pod a candidate for out-of-memory killing (OOMKilled). As explained earlier, CPU is allocated in time slices, so when your app needs more CPU than its limit during a single period, it gets throttled until the next period.

Be extremely cautious with CPU limits. A badly chosen CPU limit can hurt your application. From my experience, most metering tools are not precise enough to accurately track CPU consumption, so it is possible that your app is getting throttled by its limit without it ever showing up in your metrics. CPU limits are also especially hard on applications that spawn many threads, like most HTTP services that use a blocking IO thread per incoming request. For example, say you set a limit of 0.5 vCPU (50ms of CPU time per 100ms period) on a node with 10 cores, and your application runs 10 threads. If all 10 threads run at the same time, they consume 10ms of CPU time for every 1ms of wall-clock time, so the 50ms quota is exhausted after just 5ms, and your application is throttled for the remaining 95ms of the period.

CPU limits are tricky and worth a deep dive. I highly recommend this article, which explains them in much more depth: https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b

Example:
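A minimal sketch of the same illustrative container spec with limits added on top of the requests; the limit values (0.5 vCPU, 1G of memory) are placeholders, not recommendations:

```yaml
containers:
  - name: example-app          # same illustrative container as above
    image: nginx:1.25          # placeholder image
    resources:
      requests:
        cpu: 250m              # scheduling baseline: 0.25 vCPU
        memory: 500M           # 500 megabytes
      limits:
        cpu: 500m              # throttled once usage exceeds 0.5 vCPU within a period
        memory: 1G             # exceeding 1G makes the container an OOM-kill candidate
```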

Summary

Setting both requests and limits on your deployments gives k8s the basic information it needs to place your apps appropriately in the cluster, maximizing efficiency and preventing your apps from overwhelming the nodes. This greatly increases your cluster’s resilience against erratic load and buggy apps that over-consume resources. It will also force you to better understand your app’s resource needs.

Part 2: Pod Quality of Service and Priority
Part 3: RollingUpdate and Pod Disruption Budgets
Part 4: Horizontal Pod Autoscaler and Cluster Autoscaler
