As a major healthcare insurance provider, the client has relatively complex models for assessing everything from risk to pricing for new policies.
Until they engaged phData, all of the computing infrastructure for these machine learning models was hosted on-prem. The client knew that on-prem infrastructure is both expensive and hard to maintain, and that moving to the cloud offered a cheaper and more sustainable way forward.
The client wanted to move to AWS, but they couldn’t see how to migrate their legacy data pipelines without losing either some of their data or some of the power of their machine learning models. Before fully committing to a cloud-first approach, the client wanted to try out AWS and assess how it would affect their existing processes.
Our challenge was to build on AWS infrastructure using open-source technologies so that the client would remain free to move between vendors in the future.
AWS recommended phData for this proof-of-concept (POC) project because of our team’s experience with open-source technologies. The client had also worked with phData before, which added confidence in an outcome that depended on expertise in both machine learning and cloud technologies.
phData needed to demonstrate a path forward for the client to host their models in the cloud.
We took on the challenge by migrating a small number of model-scoring jobs to AWS using a handful of open-source platforms, which kept the solution platform-agnostic even though it was hosted on AWS. Apache Airflow, for example, is well suited to orchestrating complex pipelines and logic, whether it runs as a managed deployment such as Amazon Managed Workflows for Apache Airflow (MWAA) or on a custom ECS/EC2 architecture like the one selected for this project.
In short: we built a set of Airflow pipelines that run the client’s proprietary code for the model-scoring jobs.
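As a rough illustration of what an orchestrated scoring pipeline looks like, here is a minimal sketch that models one as a dependency graph in plain Python (3.9+). It is not Airflow code and not the client's pipeline: the `Task` class is a hypothetical stand-in that only mimics Airflow's `upstream >> downstream` notation, and the task names (`extract_features`, `score_risk_model`, and so on) are invented for illustration. The standard-library `graphlib.TopologicalSorter` then groups the tasks into stages whose members could run concurrently, which is loosely analogous to how an orchestrator decides which tasks are ready to run.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Task:
    """Hypothetical stand-in for an orchestrator task (not Airflow's API).

    Each task records its upstream dependencies in a shared registry that
    maps task_id -> set of upstream task_ids.
    """

    def __init__(self, task_id: str, registry: dict[str, set[str]]):
        self.task_id = task_id
        self.registry = registry
        registry.setdefault(task_id, set())

    def __rshift__(self, other):
        # `self >> other` or `self >> [a, b]`: the right side depends on self.
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            self.registry[target.task_id].add(self.task_id)
        return other

    def __rrshift__(self, other):
        # `[a, b] >> self`: Python falls back to this because lists have no `>>`.
        for upstream in other:
            self.registry[self.task_id].add(upstream.task_id)
        return self


def build_scoring_pipeline() -> dict[str, set[str]]:
    """Wire up a hypothetical pipeline with two scoring jobs in parallel."""
    deps: dict[str, set[str]] = {}
    extract = Task("extract_features", deps)
    validate = Task("validate_inputs", deps)
    risk = Task("score_risk_model", deps)
    pricing = Task("score_pricing_model", deps)
    publish = Task("publish_scores", deps)

    extract >> validate >> [risk, pricing] >> publish
    return deps


def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; tasks in one stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for stage in parallel_stages(build_scoring_pipeline()):
        print(stage)
```

Running it prints four stages, with `score_pricing_model` and `score_risk_model` sharing the third stage because neither depends on the other. A real Airflow DAG would express the same shape with Airflow's own operators, and its scheduler would also handle retries, schedule intervals, and task state, none of which this toy models.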
But what did the setup look like in more technical terms?
To set up the initial compute on AWS, our team ran containerized Apache Airflow on Amazon ECS, backed by EC2 instances. This combination gave the client control, flexibility, and scalability in their cloud infrastructure from the start.
First, Apache Airflow is open source, and the client was already familiar with it. This gave them the control they needed to keep their existing business logic intact as they moved hosting and CI/CD to the cloud.
Second, ECS on EC2 is a relatively low-level container setup, which leaves the client free to migrate to a different platform down the road if necessary. You can see how all of this is mapped out in the image below.
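For a more concrete sense of the ECS-on-EC2 piece, the following minimal sketch builds ECS task definitions for two Airflow components as plain dictionaries. The field names (`family`, `requiresCompatibilities`, `containerDefinitions`, and so on) follow the ECS task definition schema, but the values are assumptions for illustration, not the project's actual configuration: the public `apache/airflow:2.9.3` image tag (a production setup would likely use a custom image pushed to Amazon ECR), the Airflow 2.x command names `scheduler` and `webserver`, and the CPU and memory sizes. The portability point shows up in the `image` field: the workload is an ordinary container image that another container platform could also run.

```python
import json

# Placeholder image tag; assumed for illustration only.
AIRFLOW_IMAGE = "apache/airflow:2.9.3"


def airflow_task_definition(component: str, image: str = AIRFLOW_IMAGE) -> dict:
    """Build a hypothetical ECS task definition for one Airflow component.

    `component` is passed to the container as its command, e.g. "scheduler"
    or "webserver" (Airflow 2.x command names assumed).
    """
    container = {
        "name": f"airflow-{component}",
        "image": image,
        "command": [component],
        "essential": True,
        "cpu": 512,      # CPU units; placeholder sizing
        "memory": 2048,  # MiB; placeholder sizing
        "environment": [
            # Airflow reads AIRFLOW__<SECTION>__<KEY> variables as config overrides.
            {"name": "AIRFLOW__CORE__LOAD_EXAMPLES", "value": "False"},
        ],
    }
    if component == "webserver":
        # Expose the Airflow UI port (8080 by default) on the EC2 host.
        container["portMappings"] = [{"containerPort": 8080, "hostPort": 8080}]

    return {
        "family": f"airflow-{component}",
        "requiresCompatibilities": ["EC2"],  # run on EC2-backed container instances
        "networkMode": "bridge",
        "containerDefinitions": [container],
    }


if __name__ == "__main__":
    for component in ("scheduler", "webserver"):
        print(json.dumps(airflow_task_definition(component), indent=2))
```

One of these dictionaries could be registered with boto3's `ecs.register_task_definition(**definition)`. A working Airflow deployment would also need a shared metadata database and DAG storage, which this sketch deliberately leaves out.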
The POC solution that phData developed gave the client the tangible pieces they needed to move forward with cloud computing, including:
Taken together, these deliverables gave the client what they needed to move forward with migrating their machine learning capabilities to the cloud.
Although the client initially contracted phData only for the POC, these early insights have led them to move further down the road with AWS and to engage phData for the implementation.
Looking to get more use out of your customer data in a cloud environment? Learn how phData can help solve your most challenging problems.