phData’s longtime client, a nationwide restaurant chain, faced a technical task with long-term consequences as its cloud capabilities grew more mature and complex: migrating individual use cases from its existing Airflow stack to AWS Managed Workflows.
With the migration complete, the customer will be able to create new workflows without having to change the infrastructure—giving them both scalability and security.
In Phase 1, phData provided the customer with the documentation and workflows they needed to assess what it would mean to take pipelines from their existing stack into the MWAA instance.
In Phase 2, we proved the new, managed approach by moving both the enterprise sales forecasting and Customer Lifetime Value use cases to MWAA. With both milestones complete, the customer now has the roadmap they need to do the larger part of the migration on their own.
Amazon Managed Workflows for Apache Airflow (MWAA) is a relatively new offering from AWS, designed to bring scalability, availability, and security to Apache Airflow instances.
AWS describes MWAA as a “managed orchestration service” that makes it easier to “set up and operate end-to-end data pipelines in the cloud at scale.”
“Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as ‘workflows.’ With Managed Workflows, you can use Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security.”
With many use cases in Apache Airflow across their organization, the customer was eager to take advantage of the flexibility (and even cost-savings) achieved through MWAA. But, first, they needed an expert partner to scope and document the migration so that they had a clear path forward.
phData is already the customer’s leading consulting firm in data and machine learning. Our expert team built the company’s original Airflow platform—naturally, they turned to us to help them migrate to the new AWS managed service. With extensive Airflow experience, the phData team has consistently demonstrated success to the customer.
This project introduced more observability and repeatability into the machine learning workflows as we equipped the customer to carry the migration forward.
Using the internal codespawn tool developed within the customer organization, along with templates created for MWAA migrations, phData transitioned all the necessary processes for the sales forecasting and Customer Lifetime Value use cases.
The before/after images below show the simplified architecture; with the managed service, there is far less underlying infrastructure for the customer to maintain.
Previous Airflow architecture
Newly implemented MWAA architecture
Perhaps most importantly, we implemented MLflow to add a new level of model observability to the organization’s machine learning (ML) applications. MLflow is an ML tracking tool: users can record model parameters, performance, and other key metrics from their experimental runs and compare them across runs. With MLflow in place, the customer can now bring new models into MWAA on their own.
Altogether, the phData team used a handful of core technologies to migrate these first two use cases, including Apache Airflow, Amazon MWAA, MLflow, and the customer’s internal codespawn tool.
We successfully migrated the existing enterprise sales forecasting (ESF) and Customer Lifetime Value (CLV) model pipeline Directed Acyclic Graphs (DAGs) from the traditional Airflow stack to MWAA. Additionally, we upgraded the DAGs to be compatible with Airflow 2.0.2.
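Part of an Airflow 2 upgrade is mechanical, because several operator modules were renamed between Airflow 1.10 and 2.x. The sketch below shows that one step, assuming the original DAGs used 1.10-style import paths (the source does not say which version they ran on). It is not the tooling used on this project: the function name modernize_imports and the sample DAG text are hypothetical, and the mapping covers only three well-known renames.

```python
import re
from typing import List, Tuple

# Airflow 1.10 module path -> Airflow 2.x module path.
# Only three common renames are listed; a real upgrade touches more.
LEGACY_MODULES = {
    "airflow.operators.python_operator": "airflow.operators.python",
    "airflow.operators.bash_operator": "airflow.operators.bash",
    "airflow.operators.dummy_operator": "airflow.operators.dummy",
}


def modernize_imports(source: str) -> Tuple[str, List[str]]:
    """Rewrite legacy module paths in DAG source code.

    Returns the updated source and the list of legacy modules replaced.
    """
    replaced = []
    for old, new in LEGACY_MODULES.items():
        # Word boundaries keep us from touching longer, unrelated names.
        pattern = re.compile(r"\b" + re.escape(old) + r"\b")
        source, count = pattern.subn(new, source)
        if count:
            replaced.append(old)
    return source, replaced


# Hypothetical fragment of a 1.10-style DAG file, used only as sample input.
LEGACY_DAG_SOURCE = """\
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.bash_operator import BashOperator
"""

if __name__ == "__main__":
    updated, replaced = modernize_imports(LEGACY_DAG_SOURCE)
    print(updated)
    print("Rewrote:", ", ".join(replaced))
```

A text rewrite like this only handles renamed imports. Behavioral changes between versions still need review and testing DAG by DAG.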
By adding documentation and education to this project (on top of the migration itself), we set a precedent for others at the customer’s organization who want to do the same upgrade. Between documentation and standing up an MLflow environment in both development and production, the customer is now set up for success. The step-by-step documentation means they’re ready to migrate the rest of their ML use cases seamlessly.
In short: we helped them take the first few steps to MWAA, and then provided the roadmap to take them the rest of the way.
Looking into managed workflows or other ML solutions for your organization? Learn how phData can help solve your most challenging problems.