Introducing Data Source, phData’s Data Migration Validation Tool

phData is excited to announce the release of the Data Source tool, the first tool of its kind to focus on data migration validation at scale.

Data migrations with thousands of tables are now commonplace as users migrate from legacy on-premises databases to low-administration, highly scalable cloud-based platforms. Migrating a data platform with thousands of tables to another data platform is a tedious, error-prone, and time-consuming task.

The Data Source tool discovers and compares data quality across a source and target database, so you can track the status of your migration project. By automating cross-data-source validations, the Data Source tool can save thousands of hours in a typical large-scale migration.

A diagram of the Data Source Tool from phData that shows how the tool can validate tables.


One Tool For Multiple Data Sources

To help automate large-scale migrations, we needed to connect to a wide variety of data sources. Each data platform has its own data types, functions, and connection methods. The Data Source tool handles all these differences, including type mappings, and makes connecting to each data source simple.
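To make the idea of type mappings concrete, here is a minimal sketch of how platform-specific column types might be normalized before comparison. The platforms, the mapping table, and the function names (canonical_type, types_match) are illustrative assumptions, not the Data Source tool's actual implementation.

```python
# Hypothetical mapping from native type names to a shared canonical type.
# Illustrative only; a real mapping would cover many more types and platforms.
CANONICAL_TYPES = {
    "oracle": {"VARCHAR2": "string", "NUMBER": "decimal", "DATE": "timestamp"},
    "snowflake": {"VARCHAR": "string", "NUMBER": "decimal", "TIMESTAMP_NTZ": "timestamp"},
}


def canonical_type(platform: str, native_type: str) -> str:
    """Return the canonical type for a native type, or 'unknown' if unmapped."""
    platform_map = CANONICAL_TYPES.get(platform.lower(), {})
    return platform_map.get(native_type.upper(), "unknown")


def types_match(source: tuple, target: tuple) -> bool:
    """Compare two (platform, native_type) pairs by their canonical type."""
    src = canonical_type(*source)
    tgt = canonical_type(*target)
    # Unmapped types never count as a match; they need a manual look.
    return src != "unknown" and src == tgt


same = types_match(("oracle", "VARCHAR2"), ("snowflake", "VARCHAR"))
```

Comparing canonical types rather than raw type names avoids flagging every column as different simply because the two platforms spell their types differently.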

After connecting to a data source, users can perform a scan to collect database, table, and column metadata, or run a profile to collect table and column metrics and histograms. 

Scans and profiles can run against the entire platform or be filtered down to specific databases or even individual tables. They collect information about:

  • Entire databases 
  • Individual tables
  • Each column within a table
  • Column data types and other metadata (like a primary key)
  • Column metrics like count, null count, min/max, and any other aggregation
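As a rough illustration of the column metrics listed above, the sketch below profiles one column with plain SQL aggregates. It uses Python's built-in sqlite3 module and an in-memory table as a stand-in for a real data source; the function name profile_column and the sample table are assumptions made for this example, not part of the tool.

```python
import sqlite3


def profile_column(conn, table, column):
    """Collect count, null count, min, and max for one column.

    Table and column names are interpolated directly, so this sketch
    assumes they come from trusted metadata, not user input.
    """
    row = conn.execute(
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    ).fetchone()
    count, null_count, minimum, maximum = row
    # SUM returns NULL on an empty table, so fall back to 0.
    return {"count": count, "null_count": null_count or 0,
            "min": minimum, "max": maximum}


# Stand-in data source: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 25.5)],
)
metrics = profile_column(conn, "orders", "amount")
```

Running the same aggregates against the source and the target yields two sets of metrics that can be compared column by column.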

Users can then compare two scans or profiles, from different data sources, to identify data quality issues.

Compare All Your Data, In One Go

The Data Source tool creates a visual view of the differences between two data sources. Any difference found is shown in the Difference UI, where users can see whether an object exists in the source only, exists in the target only, or differs between the two. The Difference view compares all the data collected in scans and profiles:

Another screenshot from the Data Source UI that shows how it can compare different data sources at a glance.
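The three outcomes described above (source only, target only, or different) can be sketched as a simple dictionary comparison. This is a minimal illustration, assuming each profile is a mapping from an object name to its collected metrics; the function name diff_profiles and the sample data are hypothetical and do not reflect the tool's internal format.

```python
def diff_profiles(source, target):
    """Classify each object as source-only, target-only, or different."""
    result = {"source_only": [], "target_only": [], "different": {}}
    for name in sorted(source.keys() | target.keys()):
        if name not in target:
            result["source_only"].append(name)
        elif name not in source:
            result["target_only"].append(name)
        elif source[name] != target[name]:
            # Record only the metrics whose values disagree,
            # as (source value, target value) pairs.
            keys = source[name].keys() | target[name].keys()
            result["different"][name] = {
                key: (source[name].get(key), target[name].get(key))
                for key in sorted(keys)
                if source[name].get(key) != target[name].get(key)
            }
    return result


# Hypothetical column-level profiles from two data sources.
source = {
    "orders.amount": {"count": 3, "null_count": 1},
    "orders.legacy_flag": {"count": 3, "null_count": 0},
}
target = {
    "orders.amount": {"count": 3, "null_count": 0},
    "orders.created_at": {"count": 3, "null_count": 0},
}
report = diff_profiles(source, target)
```

Objects that match in both profiles are left out of the report, so an empty result means the two sides agree on everything that was collected.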

This single visual diff can replace hours of manual checks. It also helps catch problems before end users complain about missing data or, in the worst case, make a decision based on bad data.

