Deploying Airflow on GKE or EKS

Airflow is a popular open-source tool for writing, scheduling, and monitoring workflows, particularly complex pipelines for moving data into warehouses.

There are three main options for deploying Airflow in production. The tradeoffs of the first two, a managed service and a self-run EC2 instance, are well summarized in this Hacker News comment:

Managed Airflow Scheduler on AWS with "large" size costs $0.99/hour, or $8,672/year per instance. That's ~ $17,500 considering Airflow for at least non-prod and prod instances.
Building it on your own on same size EC2 instance would cost $3,363/year for the EC2. Times two for two environments, let's say $6,700. $4,000 if you prepay the instance.
That looks way cheaper, but then you have to do the engineering and the operational support yourself.
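
The arithmetic behind those figures is easy to reproduce. Here is a quick sketch in shell that uses only the prices quoted in the comment. The variable names are mine, and the prices are the commenter's numbers, not necessarily current AWS list prices.

```shell
#!/bin/sh
# Reproduce the yearly cost figures quoted above.
# All prices come from the quoted comment; variable names are illustrative.

hours_per_year=$((24 * 365))                 # 8760 hours

# Managed Airflow, "large" size: $0.99/hour, kept in cents to stay in integers.
managed_cents_per_hour=99
managed_per_instance=$((managed_cents_per_hour * hours_per_year / 100))
managed_two_envs=$((managed_per_instance * 2))

# Self-managed EC2 instance of the same size: $3,363/year as quoted.
ec2_per_instance=3363
ec2_two_envs=$((ec2_per_instance * 2))

echo "Managed, one instance:  \$${managed_per_instance}/year"
echo "Managed, prod+non-prod: \$${managed_two_envs}/year"
echo "EC2, prod+non-prod:     \$${ec2_two_envs}/year"
echo "Difference:             \$$((managed_two_envs - ec2_two_envs))/year"
```

The totals come out to roughly $17,300 and $6,700 a year, close to the rounded figures in the comment. The difference is what you are paying the managed service for engineering and operational support.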

The third option, deploying your own Airflow instance on Kubernetes, has achieved some consensus as the right way to go. The problem is that Airflow is a fairly complicated stateful application, with a SQL database and a Redis cache, which makes for a tricky setup.

In this post, I'm going to show you how the process can be simplified using Plural. Plural is a tool that makes it easy to deploy open-source applications on managed Kubernetes.

Plural deploys a properly configured Airflow on top of GKE or EKS, sets it up with an appropriate Postgres instance (using an already-integrated Postgres operator), and ensures that it's plugged into our observability, support, upgrade, and DNS systems.

The upshot is that you get the ease and experience of a managed service at the price point of doing it yourself.

Airflow Plural Installation

1. Sign up at app.plural.sh and do some setup

a) Create an account at app.plural.sh

2. Install the Plural CLI and some dependencies

brew install pluralsh/plural/plural

You'll also want to make sure that you have chosen and enabled a cloud provider (GCP or AWS) and installed its CLI.
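
Before moving on, it can help to confirm that the tools are actually on your PATH. This is a minimal sketch, not part of Plural itself. The helper name missing_tools is hypothetical, and the list of tools (plural, git, and your provider's CLI) is my assumption based on the steps in this post, so add anything else your setup needs.

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Pick the CLI for your provider: gcloud for GCP, aws for AWS.
cloud_cli="gcloud"

missing=$(missing_tools plural git "$cloud_cli")
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```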

3. Create a new Git repo to store your Plural installation in and initialize the repo

a) Create a new GitHub repo

b) Clone the repo on your desktop

git clone <ssh-url-of-new-github-repo>

c) Initialize the repo for Plural

# navigate to my-plural-demo-repo
cd my-plural-demo-repo

# initialize the repo for Plural
plural init

This will ask you to select your cloud provider and some cloud provider configurations. It will record that information in a workspace.yaml file.

4. Install the Airflow Plural bundle for your cloud provider of choice, either

plural bundle install airflow gcp-airflow

or

plural bundle install airflow aws-airflow
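
If you script this step for more than one cloud, a small helper keeps the provider-to-bundle mapping in one place. This is only a sketch: the function name airflow_bundle_for is hypothetical, and the two bundle names are the ones shown above.

```shell
#!/bin/sh
# Map a cloud provider name to the Airflow bundle name used in this post.
airflow_bundle_for() {
    case "$1" in
        gcp) echo "gcp-airflow" ;;
        aws) echo "aws-airflow" ;;
        *)   echo "unsupported provider: $1" >&2; return 1 ;;
    esac
}

provider="gcp"   # or "aws"
bundle=$(airflow_bundle_for "$provider") || exit 1

# Print the command rather than running it, so you can review it first.
echo "plural bundle install airflow $bundle"
```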

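The Plural CLI will ask you a few questions to configure Airflow and its dependencies. The prompts are defined in the bundle's configuration, an excerpt of which looks something like this: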

      configuration:
      - name: vpc_name
        type: STRING
        documentation: Arbitrary name for the virtual private cloud to place your cluster in, eg "plural"
    - type: HELM
      name: bootstrap
      configuration:
      - name: pluralDns
        documentation: use plural as your dns provider
        type: BOOL
      - name: txt_owner
        documentation: Arbitrary name for externaldns to use to track ownership of dns records, eg "plural"
        type: STRING
        default: plural
        condition:
          operation: NOT
          field: pluralDns
      - name: dns_domain
        documentation: Top level domain to deploy Plural applications to, eg topleveldomain.com. This might also look something like plural.existingcompanydomain.com if you have an existing company domain and you want to aggregate all the plural resources under a single subdomain.
        type: STRING
      - name: ownerEmail
        type: STRING
        documentation: Email to be used for certificate renewal notifications

You'll be prompted for the following values:

    • vpc_name (use arbitrary name, eg plural)
    • pluralDns (true)
    • txt_owner (use arbitrary name, eg plural)
    • ownerEmail (use your email, eg yiren@plural.sh)
    • airflowBucket (use arbitrary name, eg plural-airflow-logs)
    • hostname (use Fully Qualified Domain Name of the form airflow.<subdomain>, where subdomain is the subdomain you created in step 1, eg airflow.tryunitofwork.onplural.sh)
    • dagRepo (use arbitrary name, eg plural-airflow-dags)
    • branchName (use master)
    • adminUsername (choose username, eg yirenlu)
    • adminFirst (your first name, eg Yiren)
    • adminLast (your last name, eg Lu)
    • adminEmail (your email, eg yiren@plural.sh)
    • Do you want to enable plural OIDC? (yN) (y)
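
One easy mistake here is entering a hostname that doesn't sit under your dns_domain. The check below is a minimal sketch of that rule. The function name hostname_matches_domain is hypothetical, and the example values are the ones used in this post.

```shell
#!/bin/sh
# Succeed only if hostname is a subdomain of dns_domain
# (for example airflow.example.onplural.sh under example.onplural.sh).
hostname_matches_domain() {
    hostname="$1"
    dns_domain="$2"
    case "$hostname" in
        # At least one character, then a dot, then the exact domain.
        ?*."$dns_domain") return 0 ;;
        *) return 1 ;;
    esac
}

if hostname_matches_domain "airflow.tryunitofwork.onplural.sh" "tryunitofwork.onplural.sh"; then
    echo "hostname OK"
else
    echo "hostname is not under dns_domain" >&2
fi
```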

All the values you enter are written to a context.yaml file at the root of your repo. The file will look something like this:

    apiVersion: plural.sh/v1alpha1
    kind: Context
    spec:
      bundles:
      - repository: airflow
        name: gcp-airflow
      - repository: console
        name: console-gcp
      configuration:
        airflow:
          adminEmail: yiren@plural.sh
          adminFirst: Yiren
          adminLast: Lu
          adminUsername: yirenlu
          airflowBucket: ren-plural-2-airflow-bucket
          branchName: master
          dagRepo: ren-plural-dag-repo
          hostname: airflow.tryunitofwork.onplural.sh
        bootstrap:
          dns_domain: tryunitofwork.onplural.sh
          ownerEmail: yiren@plural.sh
          pluralDns: true
          txt_owner: ren-plural-3
          vpc_name: ren-plural-cloud-3
        monitoring: {}
        postgres:
          wal_bucket: ren-plural-2-wal-archives
    

At this point, your directory should look something like this:

    yirenlu@Yirens-Air-2 my-plural-airflow-repo % ls
    README.md	context.yaml	workspace.yaml

5. Build

    plural build

At this point, your directory should look something like this:

    yirenlu@Yirens-Air-2 my-plural-airflow-repo % ls
    README.md	bootstrap	context.yaml	postgres
    airflow		monitoring	workspace.yaml
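
If you'd rather check this from a script than by eye, something like the following works. It is a sketch under the assumption that the four application directories shown in the listing above (airflow, bootstrap, monitoring, postgres) are what the build should produce for this bundle. The function name missing_build_dirs is hypothetical.

```shell
#!/bin/sh
# List the expected application directories that are missing from a repo.
missing_build_dirs() {
    repo_dir="$1"
    for name in airflow bootstrap monitoring postgres; do
        [ -d "$repo_dir/$name" ] || echo "$name"
    done
}

# Run from the root of your Plural repo.
missing=$(missing_build_dirs .)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "Not generated yet:" $missing
else
    echo "All application directories are present."
fi
```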

6. Deploy

    plural deploy

7. Commit and push your changes

    git add . && git commit -m "Initial plural setup"
    git push
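
Steps 5 through 7 can also be chained so that a failed build or deploy stops the sequence before anything is committed. This is a sketch of one way to do it, using only the commands already shown. The function name build_deploy_commit is hypothetical, and whether you want to skip the commit after a failed deploy is a judgment call.

```shell
#!/bin/sh
# Run build, deploy, and commit in order, stopping at the first failure.
build_deploy_commit() {
    plural build &&
    plural deploy &&
    git add . &&
    git commit -m "Initial plural setup" &&
    git push
}

# Call it from the root of your Plural repo:
# build_deploy_commit
```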
    

8. Profit!

You should now be able to navigate to the fully qualified domain name that you entered earlier (in our case airflow.tryunitofwork.onplural.sh). Because you've already enabled OIDC (single sign-on) above, you should be able to log in with your Plural account.

Congratulations, you've successfully deployed Airflow on Kubernetes!

Next Steps With Plural

That was a quick overview of the simplest way to use Plural to deploy Airflow on Kubernetes. Plural offers a number of other goodies, in particular the Admin Console, which serves as a central control panel for all your Plural-deployed applications.

For more about Plural, our full docs are at docs.plural.sh.

If you run into any problems or have suggestions for what else you’d like to use Plural for, please let us know in our Discord.

Yiren Lu