Load Testing Microservices with K6

Oliver Bannister
Published in Dev Genius · Feb 17, 2024

There are different strategies for testing the load that will be put on a service. The two most common are load testing and stress testing: the former determines how your service behaves under normal and peak load conditions, while the latter determines how it behaves beyond normal or peak conditions.
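In K6 terms, the difference usually comes down to how you shape the traffic. As a rough sketch (the VU numbers and durations below are illustrative, not taken from our tests), a load test ramps up to a realistic level and holds it, while a stress test keeps ramping past the expected peak:

// Illustrative only. K6 reads a single exported `options` object,
// so you would pick one profile per script/run.

// Load test: ramp up to a realistic level and hold it.
export const options = {
  stages: [
    { duration: '2m', target: 50 }, // ramp up to normal load
    { duration: '5m', target: 50 }, // hold at normal/peak load
    { duration: '2m', target: 0 },  // ramp down
  ],
};

// A stress test would instead keep ramping past the expected peak, e.g.
// stages: [
//   { duration: '2m', target: 50 },
//   { duration: '2m', target: 100 },
//   { duration: '2m', target: 200 },
//   { duration: '2m', target: 0 },
// ],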

Load testing is necessary for services that need to go into production and handle a lot of users; it lets you gain confidence in your changes by testing in a live-like environment (usually staging).

Prerequisites

To properly execute load testing you’ll need to understand your system and its dependencies. In our case, we have an external dependency that our service relies on.

Architecture

Since we want to test the Data Parsing Service in isolation, we don’t “care” about the reliability of the third-party platform and we can assume it can handle the load. Before our tests, we created a test harness in place of the third-party platform so we could stub the response and focus on testing the Data Parsing Service.
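As an illustration, a test harness like this can be as small as a single stubbed HTTP endpoint. The sketch below uses Node's built-in http module; the port and payload are made up for the example and are not our actual harness. It returns a canned response for every request, so the Data Parsing Service never calls the real third-party platform during the test:

// Hypothetical stub standing in for the third-party platform (Node.js).
const http = require('http');

const server = http.createServer((req, res) => {
  // Every request gets the same canned payload, so the Data Parsing Service
  // is tested in isolation from the real third-party platform.
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ data: 'stubbed third-party response' }));
});

server.listen(8080, () => {
  console.log('Test harness listening on port 8080');
});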

In place of the App, we would be running a K6 script that simulates X virtual users of the service.

In addition, we need to understand the limitations of the system itself. Our service is a Docker container that runs on AWS Elastic Kubernetes Service (EKS). The Kubernetes cluster handles the load balancing for us, but it is rate limited to around 1–2k requests per second to protect us from things like DDoS attacks. You can work around these limits for testing purposes, but we decided not to, since 1–2k requests per second will suffice for our service.

Writing the K6 script

We decided to use K6 with Grafana Cloud so we could visualise the metrics from our load tests. The script is usually run locally, and the metrics are then pushed to Grafana Cloud so we can analyse the results.

Grafana Cloud has plenty of integrations with third-party software, for instance Prometheus, InfluxDB and OpenTelemetry. You can push your metrics to any of these and visualise them in Grafana Cloud.
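For example, if you were using one of those backends instead of Grafana Cloud, it is mainly the output flag on the run command that changes. The URLs below are placeholders, and the exact output names depend on your K6 version, so check the K6 output documentation:

# Push metrics to a local InfluxDB v1 instance (placeholder URL/database)
k6 run --out influxdb=http://localhost:8086/k6 script.js

# Push metrics via Prometheus remote write (experimental output in recent K6 versions)
k6 run --out experimental-prometheus-rw script.js

Below is the full K6 script we ran against the Data Parsing Service.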

import http from 'k6/http';
import { check } from 'k6';
import { Trend } from 'k6/metrics';

// Trend to track failure rate
const failureRate = new Trend('failure_rate');

export const options = {
  // A number specifying the number of VUs to run concurrently.
  vus: 50,
  // A string specifying the total duration of the test run.
  duration: '5m',
  ext: {
    loadimpact: {
      // Project: Default project
      projectID: 12345,
      // Test runs with the same name groups test runs together.
      name: 'Test (13/02/2024-08:50:45)'
    }
  }
};

export default function () {
  const response = http.batch([
    ['GET', 'https://data-parsing-service.aws.com/game/config1', null, {}],
    ['GET', 'https://data-parsing-service.aws.com/game/config2', null, {}],
    ['GET', 'https://data-parsing-service.aws.com/game/config3', null, {}],
  ]);
  const checkRes = check(response[0], {
    'status is 200': (r) => r.status === 200,
  });
  failureRate.add(!checkRes);
}

The K6 script above contains the following:

  • vus — the number of virtual users to simulate, set to 50
  • duration — how long the test should run for
  • projectID — this should match your project ID in Grafana Cloud
  • failureRate — a custom metric that records whether the HTTP response status check passed for each iteration

To run the script and push the results to Grafana Cloud, you can use this shell command:

k6 run --out=cloud script.js

I have added links to the documentation at the end, detailing how to set up your K6 script and hook it up to Grafana Cloud.
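One detail worth calling out: before the cloud output works, the local K6 binary needs to be authenticated against your Grafana Cloud k6 account, typically with an API token. The exact command varies between K6 versions, so treat this as a sketch and defer to the documentation:

# Authenticate the local K6 binary with your Grafana Cloud k6 API token
k6 login cloud --token <YOUR_API_TOKEN>

# Then run the test locally and stream the results to the cloud
k6 run --out=cloud script.js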

Determining if your testing was successful

If everything was hooked up correctly, your metrics should now exist in Grafana Cloud, ready for analysis!

Load Test Results

You can see from the load test results that in the 5 minutes the script was running, we made 391k requests, peaking at 2.2k req/s, with a latency of 189ms. Although that latency is high, the average latency was 104ms, which is more acceptable. Since we went over the aforementioned rate limit, I believe the rate limiting came into play and slowed down the response times.

Although not shown in the image, the minimum latency was 24ms, which is a good response time. As you can see from the red Failure Rate line on the graph, there were no unsuccessful requests, i.e. no non-200 response codes.

The service itself is running in K8s, and we already have a dashboard in Grafana that gives metrics on how much compute our pod is using. We can cross-reference this with our tests to get a better picture of how our service performed.

CPU Usage vs Requested

The pod's CPU usage was consistently below half of the CPU allocated to it. This is a good sign: it leaves plenty of CPU headroom, so the service can likely handle more requests than we simulated in our testing.

RAM Usage vs Requested

The total RAM allocated to the pod is 500 MiB, but at peak only 50 MiB was used, again leaving most of the RAM unused.

CPU Throttle

The CPU throttle was high to start, then flattened out. In K8s, each pod's container is given a CPU quota for each scheduling period, and the work it does consumes time slices of that quota; for example, a limit of 500m CPU corresponds to roughly 50ms of CPU time per 100ms period. When the quota is exhausted, the container is throttled, i.e. no longer scheduled onto a CPU, until the next period starts.

Overall we are happy with the results and believe that our service can handle the expected load.

The future

As you have seen, this can be quite a manual process, and it is left to the developer to execute these tests. Similarly to unit and integration tests, we would like to run the load tests on every PR so we can have more confidence when making changes to our microservices. We can then assert a maximum / average latency for the tests, failing the PR checks if the results are outside of this threshold.
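K6 supports this directly through thresholds: if a threshold is crossed, the run finishes with a non-zero exit code, which is enough to fail a CI step. A minimal sketch (the numbers here are placeholders, not our agreed limits) could look like this:

export const options = {
  vus: 50,
  duration: '5m',
  thresholds: {
    // Built-in request duration metric: fail if the average or p95 is too high.
    http_req_duration: ['avg<150', 'p(95)<300'],
    // Our custom metric from the script above: fail if more than 1% of checks failed.
    failure_rate: ['avg<0.01'],
  },
};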

Links
