Case Study

Global Financial Services Powerhouse Migrates From Hadoop to Snowflake

The Customer’s Challenge

A prominent global professional services organization had been using Cloudera’s Hadoop distribution to meet its cloud-based data storage and analytics needs. News that the platform was approaching end of life prompted the client to evaluate options across the industry, looking for a more advanced solution that would take its data analytics and reporting to the next level. It also wanted a solution that could scale up or down more easily with changing workload demands and that would be more cost-effective to manage and maintain.

phData’s Solution

phData proposed migrating the client’s data to the Snowflake Data Cloud and recommended adding StreamSets and Tableau to ease the transition from the capabilities the client had grown accustomed to in Hadoop.

Working with the client, the phData Data Engineering team executed a fast, efficient migration with minimal errors.

The Full Story

The technology ecosystem behind the client’s organization consists of a blend of commercial software and cloud-native Platform-as-a-Service (PaaS) offerings, all hosted within AWS cloud accounts. Collectively, these integrated tools provide a workbench and hosting service for teams and a foundation for the client’s data strategy.

As support for its existing platform neared end of life, the client engaged phData to guide it toward more modern, scalable technology solutions.

phData carefully evaluated the client’s existing environment to understand:

  • Technologies in the current landscape
  • Challenges associated with large-scale multi-tenant migration
  • Breadth of migration activity (number of databases, tables, views, workloads, etc.)
  • Depth of migration (technology interdependencies and compatibility)

The client hosts several tenants on its platform. Working with phData, the client’s platform team played an essential role in collaborating with tenants, setting up environments, capturing inventories, and planning the multi-tenant migration across the following stages:

  • Tenant discovery – Prepared key information to support the migration.
  • Target-state data architecture – Captured data flows, workloads, data endpoints, and technology following DASP patterns.
  • Data inventory & migration – Confirmed all data assets for migration, rehearsed the migration, validated and reconciled the data, and agreed on cutover.
  • Environment setup confirmation – Finalized any remaining tenant setup; tenants confirmed access and smoke-tested their environments.
  • Support & review refactoring – Executed and supported workload refactoring and data migration through joint stand-ups and clinics.
  • Cutover & hyper-care – Coordinated the final cutover, entered a 30-day hyper-care period, and confirmed tenant satisfaction.

Because the client’s platform is a multi-tenant environment hosting applications across regions, much of the migration process could be repeated for each tenant using a factory model:

  • Migration script generation (DDL and data copy)
  • Translation of transformation and ETL code
  • Statistical data validation
  • Snowflake environment setup
  • Snowflake grant setup
  • Parallel data loads to keep data in sync with CDH
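The first factory step, generating Snowflake DDL from source table metadata with a reusable template, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not phData’s Streamliner: the column metadata is hard-coded instead of being read through SchemaCrawler, and the names `Column`, `IMPALA_TO_SNOWFLAKE`, `DDL_TEMPLATE`, and `generate_ddl` are hypothetical. The type mapping is deliberately small and would need to be extended for a real workload.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical sketch of template-driven DDL generation; not phData's Streamliner.


@dataclass
class Column:
    name: str
    source_type: str  # Impala/Hive type name, e.g. "STRING", "INT"


# Assumed, deliberately incomplete mapping from Impala types to Snowflake types.
IMPALA_TO_SNOWFLAKE = {
    "STRING": "VARCHAR",
    "INT": "NUMBER(10,0)",
    "BIGINT": "NUMBER(19,0)",
    "DOUBLE": "FLOAT",
    "BOOLEAN": "BOOLEAN",
    "TIMESTAMP": "TIMESTAMP_NTZ",
}

# One reusable template, rendered once per table.
DDL_TEMPLATE = Template(
    "CREATE TABLE IF NOT EXISTS $database.$schema.$table (\n$columns\n);"
)


def generate_ddl(database, schema, table, columns):
    """Render a Snowflake CREATE TABLE statement from source column metadata."""
    lines = []
    for col in columns:
        target = IMPALA_TO_SNOWFLAKE.get(col.source_type.upper())
        if target is None:
            # Fail loudly instead of silently guessing a target type.
            raise ValueError(f"No Snowflake mapping for {col.source_type!r} ({col.name})")
        lines.append(f"    {col.name} {target}")
    return DDL_TEMPLATE.substitute(
        database=database, schema=schema, table=table, columns=",\n".join(lines)
    )


if __name__ == "__main__":
    cols = [
        Column("customer_id", "BIGINT"),
        Column("name", "STRING"),
        Column("created_at", "TIMESTAMP"),
    ]
    print(generate_ddl("ANALYTICS", "SALES", "CUSTOMERS", cols))
```

Raising on an unmapped type, rather than defaulting to something like `VARCHAR`, is a design choice that surfaces gaps in the mapping during script generation instead of during data validation.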

The client’s migration relied on a series of automations developed by the phData team, including:

  • Streamliner: Uses SchemaCrawler to query database table metadata. It then generates code from a series of reusable templates, which can generate scripts to create tables (DDL) or ingest data into Snowflake.
  • SQL Translation: Converts queries from one SQL dialect to another. Automated translation is indispensable when migrating between data platforms, and it saves time and reduces errors whenever SQL has to move from one dialect to another. In this migration, SQL Translation was used extensively to create Snowflake-compatible views and translate ETL transformations.
  • Data Source Automation: Validates the notebooks that have been developed. It performs structural validation (data types and column order) and data validation by computing statistics such as min, max, length, row count, average, and sum for each table, then comparing the results between Impala and Snowflake. This ensures that the migrated data conforms to expected standards.
  • Provision Tool: Our automated Snowflake resource creation tool (Dev and Prod workspaces) helped keep the hierarchy of resources and privileges structured and consistent.
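The structural and statistical checks described for Data Source Automation can be illustrated with a few small functions. This is a hedged sketch, not phData’s tool: it assumes the rows of each table are already in memory as lists of dicts, whereas a real validation would push these aggregates down as SQL queries against Impala and Snowflake. The names `compare_structure`, `profile`, `compare_stats`, and the `rel_tol` tolerance are hypothetical.

```python
import math
from statistics import fmean

# Hypothetical sketch of structural + statistical validation; not phData's tool.


def compare_structure(source_cols, target_cols):
    """Structural check: same column names, in the same order, with the same types.

    Each argument is a list of (column_name, type_name) pairs; type names are
    assumed to be normalized to a shared vocabulary before comparison.
    """
    issues = []
    if len(source_cols) != len(target_cols):
        issues.append(f"column count: {len(source_cols)} != {len(target_cols)}")
    for pos, (src, tgt) in enumerate(zip(source_cols, target_cols)):
        if src != tgt:
            issues.append(f"position {pos}: {src} != {tgt}")
    return issues


def profile(rows, columns):
    """Per-column statistics for a table held as a list of dicts (NULLs ignored)."""
    result = {"row_count": len(rows), "columns": {}}
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        stats = {}
        if values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            stats = {"min": min(values), "max": max(values),
                     "sum": sum(values), "avg": fmean(values)}
        elif values:
            # Non-numeric columns are compared by value length.
            lengths = [len(str(v)) for v in values]
            stats = {"min_length": min(lengths), "max_length": max(lengths)}
        result["columns"][col] = stats
    return result


def compare_stats(source, target, rel_tol=1e-9):
    """Return human-readable mismatches between two profiles (empty list = match)."""
    issues = []
    if source["row_count"] != target["row_count"]:
        issues.append(f"row_count: {source['row_count']} != {target['row_count']}")
    for col, src_stats in source["columns"].items():
        tgt_stats = target["columns"].get(col, {})
        for name, src_val in src_stats.items():
            tgt_val = tgt_stats.get(name)
            if isinstance(src_val, float) or isinstance(tgt_val, float):
                # Floating-point aggregates can differ slightly between engines.
                ok = tgt_val is not None and math.isclose(src_val, tgt_val, rel_tol=rel_tol)
            else:
                ok = src_val == tgt_val
            if not ok:
                issues.append(f"{col}.{name}: {src_val!r} != {tgt_val!r}")
    return issues


if __name__ == "__main__":
    cols = ["id", "name", "amount"]
    impala_rows = [{"id": 1, "name": "Ann", "amount": 10.5},
                   {"id": 2, "name": "Bob", "amount": None}]
    snowflake_rows = [{"id": 1, "name": "Ann", "amount": 10.5},
                      {"id": 2, "name": "Bobby", "amount": None}]
    print(compare_stats(profile(impala_rows, cols), profile(snowflake_rows, cols)))
    # → ['name.max_length: 3 != 5']
```

Comparing floating-point aggregates with a relative tolerance, rather than exact equality, avoids false alarms when two engines sum or average the same values in a different order.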

Why phData?

The client knew that phData was a reliable, experienced partner for data migration and analytics recommendations and services.

  • phData offers a team of skilled data engineers who are knowledgeable about a wide range of data migration methodologies and automations. They are well-prepared to deal with any issues that may emerge during a data migration.
  • phData collaborates with each customer to understand their specific objectives and goals, creating a tailored data transfer strategy to meet those needs. This helps ensure that the data migration is effective and achieves the desired results.
  • phData provides complete data migration services, including data extraction, transformation, loading, and validation. This can speed the data transfer process and alleviate the pressure on in-house IT personnel.
  • phData offers ongoing support and guidance throughout the migration so that challenges are addressed as soon as they arise.

Results

By the end of the engagement, the client had migrated smoothly to Snowflake and shifted its focus from maintenance to fully scaling its BI reporting and analytics.

  • The use of a repeatable process has improved execution efficiency.
  • Overall migration timeframes have been shortened from weeks to days.
  • The validation method is automated, with high accuracy and low manual intervention.
  • Tenants can now proceed with functional validation confidently.
  • New tenant onboarding is completed more quickly.

Migration Statistics

Below are statistics of multi-tenant migration carried out using the phData migration Toolkit:


Take the next step with phData.

Looking into better data options for your organization? Learn how phData can help solve your most challenging problems. 
