Introduction to Kestra, a Declarative Orchestration Alternative to Apache Airflow

Event-driven open-source data orchestrator with a built-in code editor and a rich plugin ecosystem

Anna Geller
Dev Genius


This post shares the story behind Kestra and explains how you can use it to orchestrate data processing workflows in a declarative way.

What is Kestra

Kestra is an open-source, event-driven data orchestrator that strives to make data workflows accessible to a broader audience. The product offers a declarative YAML interface for defining workflows so that everyone in an organization who benefits from analytics can participate in building data pipelines.

User Experience

The Kestra UI automatically updates the YAML definition whenever you change a workflow from the UI or via an API call. The orchestration logic therefore remains declaratively defined in code, even when workflow components are modified through other means.

Adding new tasks and modifying task properties from Kestra UI automatically adjusts the YAML definition.

The topology view (aka the DAG view) updates live as you type. Documentation with examples and explanations is embedded next to the code editor, helping onboard new users as the team grows. The only requirement for participating in workflow development is basic familiarity with Kestra’s key components, explained in this video published on Kestra’s YouTube channel.

The story behind Kestra

Kestra was created due to the challenges that Leroy Merlin faced when adopting Apache Airflow:

  • The platform didn’t scale from a developer-productivity standpoint: tasks that data engineers used to accomplish within two minutes took 20 minutes in Airflow.
  • Airflow didn’t support API- and event-driven workflows.
  • Tasks failed not because of flawed business logic but because the orchestrator couldn’t handle the workload.
  • Managing Python environments and dependencies introduced a barrier to entry for BI engineers, who were mostly proficient in SQL.
  • The platform wasn’t the right fit for the needs of their data organization, which was looking for more structure and team-level isolation to manage sensitive data.

As a result, Ludovic Dehon started building Kestra with the goal of providing a more scalable, performant, and easier-to-use orchestration solution. Early adopters gave overwhelmingly positive feedback from production deployments, and the project was open-sourced in February 2022.

How does Kestra compare to other orchestration solutions

Here is how Kestra’s creators described the product’s differentiation as compared to existing alternatives:

“From our experience, the orchestration logic of data pipelines can be configured via a simple YAML interface as long as it’s built on top of API-first design with a reliable and scalable architecture, a well-supported ecosystem of integrations (plugins), and UX with autocompletion, code validation, and templating features. Many design patterns popularized by Airflow and used by its successors might be difficult to maintain over the long run due to performance at scale and Python dependency management requiring users to rebuild container images for every change.”

While Kestra’s orchestration logic is defined in a language-agnostic YAML syntax, your business logic can be written in SQL, Python, R, Rust, Bash, Node.js, or any other language you’d like to use. Such custom scripts are isolated at the task level using process or Docker runners, as the sketch below shows. Beyond those options, a wide range of plugins provides additional integrations.
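For illustration, here is a minimal sketch of a flow that runs a script inside a Docker container. The Bash task type and the runner and dockerOptions properties follow the plugin naming used at the time of writing; treat the exact names as assumptions and check the plugin documentation for your Kestra version:

id: isolated_script
namespace: dev
tasks:
  - id: transform
    type: io.kestra.core.tasks.scripts.Bash
    # DOCKER isolates this task in its own container;
    # PROCESS would run it as a separate local process instead
    runner: DOCKER
    dockerOptions:
      image: python:3.11-slim
    commands:
      - python -c "print('business logic in any language')"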

The common question: why YAML rather than Python?

Python is an incredibly versatile language, but it comes with tradeoffs you need to weigh against your requirements. Supporting orchestration logic in only a single programming language raises the barrier to entry for engineers working on a different tech stack (e.g., C# developers, front-end engineers) and for less technical users.

At the moment, Kestra supports Python-based workflows via the Python, Bash, or Git plugins (running in isolated processes or containers per task), but the platform itself is built in Java and JavaScript for better performance. Kestra orchestrates Python scripts or applications, not Python functions. If you need fine-grained visibility into what’s happening within your Python script, you can send that information to the orchestrator as metadata using metrics and outputs, helping you decouple your orchestration logic from business logic.
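As an example of that decoupling, a script can print a small JSON payload that Kestra’s script tasks parse into outputs and metrics. This is a sketch assuming the ::{...}:: convention described in Kestra’s documentation; the names and values are illustrative:

id: script_metadata
namespace: dev
tasks:
  - id: process
    type: io.kestra.core.tasks.scripts.Bash
    commands:
      # expose an output value to downstream tasks
      - echo '::{"outputs": {"rows": 100}}::'
      # record a counter metric visible in the UI
      - echo '::{"metrics": [{"name": "rows_processed", "type": "counter", "value": 100}]}::'

Downstream tasks can then reference the captured output through Kestra’s templating, e.g., something along the lines of {{ outputs.process.vars.rows }}, depending on your Kestra version.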

Treating the YAML configuration as an important component rather than an afterthought

Engineers who run serious Apache Airflow deployments often build a configuration system on top of it, e.g., in the form of templated YAML files. This way, end users don’t need to write boilerplate configuration in Python DAGs.

With that approach, the YAML configuration is the interface exposed to end users. Approached from first principles, this YAML definition shouldn’t be an afterthought but an important part of the product. Kestra treats it as such and provides a great user experience for writing and modifying workflow definitions, offering auto-completion, syntax validation, versioning, embedded documentation, and a live-updating DAG view.

Core concepts in Kestra

Flow

The main component of every Kestra workflow is a flow. As shown below, a flow can be created from scratch in the UI’s embedded editor or imported from an existing YAML file:

Interface to create or import a flow from Kestra UI

Hello world flow

Here is an example flow logging a “Hello world!” message to the console:

Hello world flow in Kestra
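In YAML, that flow looks like this (a minimal sketch reconstructed from the three attributes described below; the Log task type reflects Kestra’s core plugin naming at the time of writing):

id: hello_world
namespace: dev
tasks:
  # a single task that writes a message to the execution logs
  - id: hello
    type: io.kestra.core.tasks.log.Log
    message: Hello world!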

This example shows three required attributes of every Kestra flow: id, namespace, and tasks.

  1. The id represents the name of the flow.
  2. The namespace is used to provide logical separation, e.g., to separate development and production environments. Namespaces are like folders on your file system — they organize flows into logical categories and can be nested to provide a hierarchical structure.
  3. Tasks are atomic actions in a workflow. By default, all tasks in the list are executed sequentially, with customization options to, among other things, run tasks in parallel or tolerate the failure of specific tasks when needed (see the sketch after this list).
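As a sketch of those customization options, the snippet below wraps two tasks in a Parallel flowable task and allows one of them to fail without failing the whole execution. The Parallel task type and the allowFailure property follow Kestra’s core plugin naming at the time of writing:

id: parallel_example
namespace: dev
tasks:
  - id: run_in_parallel
    type: io.kestra.core.tasks.flows.Parallel
    tasks:
      # this task exits with an error, but allowFailure lets the execution continue
      - id: may_fail
        type: io.kestra.core.tasks.scripts.Bash
        allowFailure: true
        commands:
          - exit 1
      - id: sibling
        type: io.kestra.core.tasks.log.Log
        message: Hello from a parallel branch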

Triggers and inputs

Apart from these three required attributes, there are several optional components, including triggers and inputs; a sketch combining both follows the list below.

  • Triggers define when a flow should run. Kestra is event-driven, and flows are triggered based on events. Examples of such events include a regular time-based schedule, an API call (webhook trigger), ad-hoc execution from the UI, and custom events, including a new file arriving, a new message in a message bus, query or flow completion, and more. Flows can also be triggered from other flows (subflows), enabling highly modular workflow development.
  • You can pass runtime-specific variables to a flow using inputs. Inputs are strongly typed and can be either required or optional.
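Here is a sketch combining both concepts: an optional, typed input and a daily schedule trigger. The Schedule trigger type and the input syntax reflect the version current at the time of writing, so verify them against the documentation for your release:

id: scheduled_greeting
namespace: dev
inputs:
  # a strongly typed, optional input with a default value
  - name: greeting
    type: STRING
    required: false
    defaults: Hello from a schedule!
triggers:
  # runs the flow every day at 9 AM
  - id: every_morning
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 9 * * *"
tasks:
  - id: log_greeting
    type: io.kestra.core.tasks.log.Log
    message: "{{ inputs.greeting }}"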

For a video explanation of key concepts and a hands-on demo, check the video linked below.

Give it a try

The best way to understand a product is by trying it out. You can download Kestra’s docker-compose file using the following command:

curl -o docker-compose.yml https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml

Then run docker compose up -d and open the UI at http://localhost:8080/. From there, you can start building your first flows.

Install and start Kestra locally

Next steps

Kestra strives to simplify data workflows, open them to a broader audience, and save development time. If you have questions, Kestra has a small but active Slack community. And if you want to support this open-source project, give Kestra a GitHub star.
