Long story short: you can’t take your on-prem Hadoop environment and simply move it to the cloud with the expectation that you’ll immediately save time and money.
If you don’t configure everything correctly for a cloud environment, you can often end up spending more money. From the beginning, it was clear that we’d need to come alongside the client as a partner, rather than simply as a vendor in a lift-and-shift project.
In addition to our core services for moving to the cloud, we helped the client answer questions like:
phData had already been working with the client for a while, helping them manage their existing Hadoop cluster. When the client decided that it was time to move off Cloudera and into an AWS environment, they knew where to turn.
With the wellspring of trust already in place, and our expertise in Spark, AWS EMR, and AWS S3, they realized that working with phData was the fastest and most promising path forward into the cloud.
While this seemed like a straightforward cloud migration at first, the client faced a handful of interrelated problems that we would help them solve over the course of the project.
First, the client had a clear motivation for moving to the cloud: with the on-prem setup, they just couldn’t get the Hadoop environment to work well for their actuaries, let alone in a cost-efficient way. The main question became: how do we get this Spark application, which originally ran on Cloudera, to run efficiently in AWS EMR?
But the client had a secondary problem, too: the Spark application hadn’t been configured to make full use of the cluster resources available to it. (This is just one example of how the engineering team uncovered previously unrecognized issues in their architecture.)
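To make the resource-utilization point concrete, here is a minimal sketch of a widely used rule of thumb for sizing Spark executors on a worker node. It is not the tuning applied in this project; the function names, the five-cores-per-executor default, the one-core and 1 GB reservation, and the 10% overhead share are all assumptions chosen for illustration. Only the three `spark.executor.*` keys are standard Spark settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorPlan:
    executors_per_node: int
    cores_per_executor: int
    memory_per_executor_gb: int


def plan_executors(node_cores: int, node_memory_gb: int,
                   cores_per_executor: int = 5,
                   overhead_fraction: float = 0.10) -> ExecutorPlan:
    """Split one worker node into executors (hypothetical heuristic)."""
    # Assumption: reserve one core and 1 GB for the OS and node daemons.
    usable_cores = node_cores - 1
    usable_memory_gb = node_memory_gb - 1

    executors = usable_cores // cores_per_executor
    if executors < 1:
        raise ValueError("node too small for the requested cores per executor")

    # Split memory evenly, then set aside a share for off-heap overhead.
    total_per_executor_gb = usable_memory_gb / executors
    heap_gb = int(total_per_executor_gb * (1 - overhead_fraction))
    return ExecutorPlan(executors, cores_per_executor, heap_gb)


def to_spark_conf(plan: ExecutorPlan, node_count: int) -> dict:
    """Translate the plan into standard spark.executor.* settings."""
    # Assumption: leave one executor slot free for the application master.
    instances = plan.executors_per_node * node_count - 1
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(plan.cores_per_executor),
        "spark.executor.memory": f"{plan.memory_per_executor_gb}g",
    }


if __name__ == "__main__":
    plan = plan_executors(node_cores=16, node_memory_gb=64)
    print(to_spark_conf(plan, node_count=10))
```

A heuristic like this only gives a starting point. Real tuning depends on the workload, and EMR applies its own Spark defaults that may already cover some of these settings.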
Finally, we identified opportunities to take a more cloud-native approach, allowing the client to leverage the scalability of the cloud while minimizing costs. Given the client’s complex enterprise architecture, our team quickly realized that we couldn’t simply move the client to the cloud and expect a positive outcome. After testing and validating the new solution, we documented a path forward by offering advice on cost and performance optimization.
Our expert team came together to address all three interrelated issues, setting the client up for long-term success with its new cloud environment.
Since the client placed an emphasis on cost savings, we recommended using AWS EMR, which the client can run as an on-demand resource instead of an always-on platform.
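As an illustration of what "on-demand instead of always-on" can look like in practice, the sketch below builds a request for a transient EMR cluster that runs one Spark step and then shuts itself down. It is a hedged example, not the configuration used for this client: the function name, cluster name, instance types, release label, and S3 paths are hypothetical placeholders. The dictionary keys follow the parameters of the `run_job_flow` call in the AWS SDK for Python (boto3).

```python
def build_transient_cluster_request(name: str, log_uri: str, app_uri: str,
                                    worker_count: int = 4) -> dict:
    """Build run_job_flow parameters for a cluster that ends with its work."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any Spark-capable release
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "workers", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": worker_count},
            ],
            # Terminate once all steps finish, so nothing idles on the bill.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-job",
            # Shut the cluster down if the step fails.
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", app_uri],
            },
        }],
        # Default EMR role names; a real account may use custom roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_transient_cluster_request(
        name="nightly-actuarial-run",              # hypothetical
        log_uri="s3://example-bucket/emr-logs/",   # hypothetical
        app_uri="s3://example-bucket/jobs/app.py", # hypothetical
    )
    print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

With credentials configured, a request like this could be submitted with `boto3.client("emr").run_job_flow(**request)`. An orchestration tool would typically make the equivalent call on a schedule, so the cluster exists only while a job is running.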
Based on our previous work with the client, we already had Talend set up for orchestration. (Originally, it was set up to let any dev team replicate data in the cloud.) With Talend already in place, we recommended using its integration with EMR to allow for a better pricing strategy.
From there, the majority of the project focused on getting Terraform and CI/CD set up correctly to work with AWS. Our team defined the cluster and the framework to go with it in Cloud Foundation, ensuring the right configurations and effective deployment in the cluster. Finally, our team set up Livy for managing security access.
With dozens of ‘consumers’ for the cloud data (projects and products), we needed to set up the architecture in a way that would allow for simultaneous pulls and storage in multiple S3 buckets. This is what the architecture looks like now:
Here’s the upshot: the client can now make an informed migration plan that will save them over a million dollars per year in licensing costs and seamlessly handle the existing processes to maintain their data lake.
While the move to the cloud was an eventuality for such a large company, phData helped our client get there faster and more efficiently with our proof-of-concept project. By distilling dozens of complications and dependencies into an action plan, our expert team helped the client fully understand how to move to the cloud.
Now, the client is set up for long-term cost savings and more accessible data for everyone from actuaries to the C-suite.
Learn how phData can help solve your most challenging cloud infrastructure problems.