phData is excited to announce the release of the Data Source tool, the first tool of its kind to focus on data migration validation at scale.
Data migrations with thousands of tables are now commonplace as users migrate from legacy on-premises databases to low-administration, highly scalable cloud-based platforms. Migrating a data platform with thousands of tables to another data platform is a tedious, error-prone, and time-consuming task.
The Data Source tool discovers and compares data quality across a source and target database, so you can track the status of your migration project. By automating cross-data-source validations, the Data Source tool can save thousands of hours in a typical large-scale migration.
One Tool For Multiple Data Sources
To help automate large-scale migrations, we needed to be able to connect to a wide variety of data sources. Each data platform has its own data types, functions, and connection methods. The Data Source tool handles all of these differences, including type mappings, and makes connecting to each data source simple.
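To illustrate what a type mapping involves, here is a minimal sketch that normalizes platform-specific type names to a shared canonical type so that columns can be compared across platforms. The mapping table, the canonical type names, and the functions `canonical_type` and `types_match` are hypothetical and are not the Data Source tool's actual implementation.

```python
# Hypothetical mapping from native type names to a shared canonical type.
# The entries below are illustrative, not the tool's real mappings.
CANONICAL_TYPES = {
    "oracle": {"NUMBER": "decimal", "VARCHAR2": "string", "DATE": "timestamp"},
    "snowflake": {"NUMBER": "decimal", "VARCHAR": "string", "TIMESTAMP_NTZ": "timestamp"},
}


def canonical_type(platform: str, native_type: str) -> str:
    """Return the canonical type for a native type, or 'unknown' if unmapped."""
    # Drop length/precision arguments such as "(50)" before the lookup.
    base = native_type.split("(")[0].strip().upper()
    return CANONICAL_TYPES.get(platform.lower(), {}).get(base, "unknown")


def types_match(src_platform: str, src_type: str, tgt_platform: str, tgt_type: str) -> bool:
    """True when both columns map to the same known canonical type."""
    src = canonical_type(src_platform, src_type)
    tgt = canonical_type(tgt_platform, tgt_type)
    return src != "unknown" and src == tgt
```

With a mapping like this, an Oracle `VARCHAR2(50)` column and a Snowflake `VARCHAR(50)` column would be treated as equivalent rather than flagged as a mismatch.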
After connecting to a data source, users can perform a scan to collect database, table, and column metadata, or run a profile to collect table and column metrics and histograms.
By default, scans and profiles run against the entire platform, but they can be filtered down to specific databases or even individual tables. Scans and profiles collect information about:
- Entire databases
- Individual tables
- Each column within a table
- Column data types and other metadata (like a primary key)
- Column metrics such as count, null count, min/max, and other aggregations
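As a rough illustration of the column metrics listed above, the following sketch computes count, null count, min, and max for one column using Python's built-in `sqlite3` module. The function `profile_column` and the `orders` table are assumptions made for this example; the Data Source tool's own queries and output format may differ.

```python
import sqlite3


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Collect basic metrics for one column (hypothetical helper)."""
    # Identifiers are interpolated directly, so pass trusted names only.
    sql = (
        f'SELECT COUNT(*), '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END), '
        f'MIN("{column}"), MAX("{column}") '
        f'FROM "{table}"'
    )
    row_count, null_count, min_value, max_value = conn.execute(sql).fetchone()
    return {
        "row_count": row_count,
        "null_count": null_count or 0,  # SUM returns NULL for an empty table
        "min": min_value,
        "max": max_value,
    }


# Example data: an in-memory table with one NULL amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 25.5)],
)
orders_amount_profile = profile_column(conn, "orders", "amount")
```

Running the same profile against the source and the target yields two small dictionaries per column that can be compared directly.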
Users can then compare two scans or profiles, from different data sources, to identify data quality issues.
Compare All Your Data, In One Go
The Data Source tool creates a visual view of the differences between two data sources. Any difference found is shown in the Difference UI, where users can see whether an object exists only in the source, only in the target, or differs between the two. The Difference view compares all the data collected in scans and profiles.
This single visual diff can replace hours of manual checks. It also helps prevent end users from complaining about missing data or, in the worst case, making a decision based on bad data.
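The three difference states described above (source only, target only, and different) can be sketched as a comparison of two dictionaries keyed by object name. The function `diff_profiles`, the key format, and the sample data are hypothetical and only illustrate the idea, not the tool's internals.

```python
def diff_profiles(source: dict, target: dict) -> dict:
    """Classify each object as source_only, target_only, or different."""
    result = {"source_only": [], "target_only": [], "different": []}
    for name in sorted(source.keys() | target.keys()):
        if name not in target:
            result["source_only"].append(name)
        elif name not in source:
            result["target_only"].append(name)
        elif source[name] != target[name]:
            result["different"].append(name)
        # Objects with identical metrics are omitted from the diff.
    return result


# Hypothetical profile results keyed by "table.column".
source_profile = {
    "orders.amount": {"row_count": 3, "null_count": 1},
    "orders.id": {"row_count": 3, "null_count": 0},
    "legacy.flag": {"row_count": 7, "null_count": 0},
}
target_profile = {
    "orders.amount": {"row_count": 3, "null_count": 0},
    "orders.id": {"row_count": 3, "null_count": 0},
    "audit.loaded_at": {"row_count": 3, "null_count": 0},
}
differences = diff_profiles(source_profile, target_profile)
```

Here `orders.amount` would be reported as different because its null counts disagree, while `orders.id` matches on both sides and does not appear in the diff.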