January 7, 2021

AWS Announces Managed Workflows for Apache Airflow

By Dalton Conley

phData has been working with Amazon Managed Workflows for Apache Airflow (MWAA) since before its release and, now that it’s generally available, we want to share our initial thoughts on how it compares to our own data workflow strategy for Apache Airflow.

Amazon MWAA is a fully managed deployment for Apache Airflow that provides easy management of the Airflow configuration and integrations with other AWS services. Amazon MWAA also has the capability to easily deploy DAGs from S3 buckets and manage custom Airflow plugins. 
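To make the S3-based deployment concrete, here is a minimal sketch of pushing a DAG file into the bucket an environment reads from. It assumes the environment is configured to load DAGs from a `dags/` prefix; the helper names, the bucket name, and the file paths are hypothetical, and `s3_client` is expected to be a boto3 S3 client.

```python
from pathlib import Path


def dag_object_key(local_path: str, dags_prefix: str = "dags") -> str:
    """Build the S3 object key for a DAG file under the DAG folder prefix."""
    # Assumption: the environment's DAG folder is a single prefix such as "dags".
    return f"{dags_prefix.strip('/')}/{Path(local_path).name}"


def upload_dag(s3_client, bucket: str, local_path: str, dags_prefix: str = "dags") -> str:
    """Upload one DAG file and return the key it was written to.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    key = dag_object_key(local_path, dags_prefix)
    # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
    s3_client.upload_file(local_path, bucket, key)
    return key
```

With a real client this would be called as `upload_dag(boto3.client("s3"), "my-airflow-bucket", "pipelines/daily_load.py")`, where the bucket and path are placeholders. Passing the client in as an argument keeps the helper easy to exercise without AWS credentials.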

Amazon Managed Workflows for Apache Airflow Benefits

While Amazon MWAA doesn’t introduce new capabilities to Apache Airflow, it does provide value in new ways, most notably:

  1. Eliminating the need to stand up new infrastructure
  2. Reducing the burden of managing ECS and Docker
  3. Providing access to AWS-level security for your organization

The primary driver for utilizing a managed Airflow solution is simplifying infrastructure management. Before Amazon MWAA, managing infrastructure for Airflow could be challenging. Now, you can leverage Amazon’s managed solution out of the box instead of worrying about Linux configuration, allowing you to focus on more important activities like DAG creation. 

Another great improvement that Amazon has implemented is automatically sending Airflow metrics into CloudWatch. This capability allows you to view DAG-level, task-level, and other operational metrics related to Airflow all within CloudWatch. This improves the production-readiness of Airflow on AWS and gives data engineers better insights into their data workloads.
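As a rough illustration of reading those metrics back out, the sketch below builds a request for the CloudWatch `get_metric_statistics` call. The `AmazonMWAA` namespace, the `Environment` dimension, and any metric name you pass in are assumptions to verify against your own CloudWatch console; the helper names are hypothetical, and `cloudwatch_client` is expected to be a boto3 CloudWatch client.

```python
from datetime import datetime, timedelta, timezone


def metric_query(environment_name, metric_name, hours=24, now=None,
                 namespace="AmazonMWAA", dimension_name="Environment"):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    # Assumption: the namespace and dimension name match what your
    # environment publishes; both can be overridden by the caller.
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension_name, "Value": environment_name}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Sum"],
    }


def fetch_metric(cloudwatch_client, environment_name, metric_name, hours=24):
    """Return the metric's datapoints, oldest first."""
    query = metric_query(environment_name, metric_name, hours=hours)
    response = cloudwatch_client.get_metric_statistics(**query)
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Keeping the query builder separate from the call makes it simple to inspect or log exactly what is being requested before anything is sent to AWS.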

Amazon MWAA also provides a highly available, elastic architecture that uses multiple availability zones and autoscaling to run Airflow tasks only when you need them. This enables you to scale your infrastructure up when demand rises while saving on costs when your workload is idle.

As you can see, there are a lot of reasons to consider Amazon Managed Workflows for Apache Airflow! That said, we know that managed services aren’t the right fit for every organization. The good news is that you can still manage your own instance of Apache Airflow without relying on Amazon’s managed service.

Apache Airflow has been a core offering that we build for our customers who prefer to maintain the infrastructure themselves. Our solution includes a highly available service that uses autoscaling to support the elasticity of any data workload. This offers the flexibility to configure different integrations and isolate different enterprise teams. It also includes ECS, which requires building a Docker image and managing an ECS cluster in a secure environment, but it enables our customers to manage Apache Airflow as they see fit!

As with any newly released AWS service, it’s important to understand how AWS iterates on features for that service over time. Currently, you can only use IAM to authenticate into the MWAA service, which could be a limitation for enterprise customers who use Active Directory or a third-party identity service such as Okta.
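For a sense of what IAM-only access looks like in practice, here is a minimal sketch that exchanges IAM credentials for a short-lived Airflow UI login link through the MWAA `create_web_login_token` API. The helper name and environment name are hypothetical, the URL path reflects the pattern in AWS’s documentation and should be treated as an assumption, and `mwaa_client` is expected to be a boto3 MWAA client.

```python
def web_login_url(mwaa_client, environment_name: str) -> str:
    """Request a web login token and build the Airflow UI sign-in URL."""
    # The caller's IAM identity must be allowed to create a web login token
    # for this environment; there is no separate username/password step.
    response = mwaa_client.create_web_login_token(Name=environment_name)
    hostname = response["WebServerHostname"]
    token = response["WebToken"]
    # Assumption: this path matches the sign-in URL pattern AWS documents.
    return f"https://{hostname}/aws_mwaa/aws-console-sso?login=true#{token}"
```

Because the token is tied to an IAM identity, teams that sign in through Active Directory or Okta would need to map those users onto IAM roles before they could reach the Airflow UI this way.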

For more information on Amazon Managed Workflows for Apache Airflow, building your own Apache Airflow environment, or which solution is the right fit for you, reach out to us at info@phdata.io. Our team is always available to answer questions and help you find the best solution for your needs.
