March 1, 2023

phData Toolkit February 2023 Update

By Nick Goble

Hello and welcome to your regularly scheduled phData Toolkit update!  We hope that you all did something special with a special someone on Valentine’s Day!  We had a lovely Valentine’s Day getting together and spending quality time with customers like you! Did somebody remember to send the flowers? Anybody?

In all seriousness, our development team has been working on their labor of love and adding new functionality to the phData Toolkit.  We’re passionate about all things data and have purpose-built this toolkit to provide a unified interface for all of phData’s apps and tools.  These are leveraged to accelerate and automate your data projects, information architecture, migrations, and audit functions.

Let’s dive in and look into what new functionality we’ve been able to bring you this month.

Toolkit CLI

While we love the user interface that we’ve built for the Toolkit, having the ability to write scripts and programmatically use the Toolkit is important as well.  We’ve released a few versions in the last month with the newest being 0.7.0. This brings a number of new features to the CLI.

The first one we want to talk about is the  Toolkit SQL analyze command.  

When customers are looking to perform a migration, one of the first things that needs to occur is an assessment of the level of effort to migrate existing data definition language (DDL) and data markup language (DML).  This command will take in the source and target dialects, a directory of SQL to analyze (along with some other inputs), and generate a coverage report.  This analysis serves as a starting point for how much of the SQL conversion can be programmatically vs. manually performed.

The other major new feature we want to talk about is the Toolkit SQL translate command.  If you’re familiar with the SQL Translation tool in the Toolkit UI, this command allows you to perform a SQL dialect translation just like you would in the UI.  This can perform single SQL statement conversion or can take in a directory (like the analyze command) and will programmatically convert each file in the directory!

We’ve also introduced some quality-of-life upgrades and bug fixes to the CLI for SQL translation functionality. These include:

  • Improved performance of the Oracle Scanner when targeting large schemas.
  • Improved error messages when a JDBC driver isn’t found.
  • Improved the Redshift to Snowflake data type mapping.

One of the other major components of the Toolkit CLI is the Data Source tool. This tool ​​collects, compares, analyzes, and acts on data source metadata and metrics. This allows you to perform tasks such as ensuring data quality against data sources (once or over time), compare data metrics and metadata across environments, and create/manage data pipelines for all your tables and views.  

We’ve added the following changes to the latest version of the Toolkit:

  • Removed the experimental flags from the Exec and Codegen commands. These tools are now considered stable.
  • Fixed an issue where numeric types would sometimes not compare correctly.
  • Added a configurable tolerance when comparing float types. 
    • The tolerance defaults to 0.00001.
  • Fixed an issue showing invalid timestamp/precision issues when scanning an Impala database.
  • Fixed an issue where table filters were filtering too much.

SQL Translation

Our SQL Translation tool also received updates to the translations it provides. Performing SQL translations between dialects is the most common task when migrating between data platforms and it’s incredibly common for customers to have hundreds of thousands of lines of SQL across hundreds of files.  

If you were to manually translate each of those files, it would be incredibly expensive, error-prone, and repetitive work. Instead, we’ve built our SQL Translation tool to do this for you!

Regardless of whether you’re leveraging the Toolkit UI or the Toolkit CLI, you’ll receive these updates!

Over the past few months, our primary focus for additional coverage between source and target translations has been Microsoft SQL Server to the Snowflake Data Cloud. We’ve seen a number of customers either looking to perform this specific migration or are currently providing our customers with support to migrate.

Here are some of the updates we’ve made to this translation:

  • Translated the DELETE statement with common table expression.
  • Translate the EXEC statement with a negative integer argument.
  • Parsed @@sqlstatus method.
  • Implemented the NEXT VALUE FOR expression.
  • Implemented the Create Sequence statement.
  • Parsed keywords as identifiers.
  • Implemented the Select @localVariable statement.
  • Fixed the Execute dynamic sql statement.
  • Added unsupported error message for while loop.

As we mentioned previously, we can leverage the SQL Translation tool to provide a coverage report for migrations. Adding support for these functions from an implementation and parsing perspective allows us to generate additional coverage and support for these migrations.

Bringing It All Together

We hope you’ve learned something new from this month’s update. We are continuing to add additional features and functions to different parts of the phData Toolkit as we work with our customers and identify needs.  

If you are looking for a specific functionality or are interested in chatting with us, you can reach us via the Contact Us form on our website! We look forward to hearing from all of you and to what 2023 brings!

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022. 

Be sure to follow: this series for more updates on the phData Toolkit tools and features. 

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit