March 31, 2023

phData Toolkit March 2023 Update

By Nick Goble

Hello and welcome to the next monthly installation of the phData Toolkit blog series!  We’re excited to talk through the changes we’ve brought into the platform and how it has enabled our customers to build data products with confidence.  

It’s no secret that seasonal depression is something that impacts us all.  Winter makes it challenging to get outside, it’s cold, and the sun is up-and-down before you even finish your work day if you live far enough north.

However, as we turn the corner into spring, it’s warming up outside and people are antsy to get outside.  People want to get outside, soak up the sun, and have a cookout with friends and family.

But alas!  You are stuck performing a critical project or migration for your organization.  You have tons of data and SQL to migrate, and you need to ensure that it’s done right, or it’s your job on the line!

Enter the phData Toolkit to save the day!

The toolkit has been designed to empower your organization with confidence.  Whether it’s translating SQL, ensuring parity between platforms, allowing for programmatic and consistent platform administration, or auditing your system for best practices and organizational policies, we’ve got you covered.

Let the toolkit do the heavy lifting so you can host the next cookout stress free!

So What’s New?

Over the last month, we’ve been heavily focused on adding functionality to the Toolkit command line interface (CLI).  Our Toolkit is primarily broken into two components: the user interface (UI) and the CLI.  UIs are great when it comes to user experience, especially for more non-technical users.  However, many engineers need the ability to programmatically leverage functionality within scripts.

WIthin our Toolkit CLI, we primarily have two different tools: our SQL Translation Tool and our Data Source Tool.  For those who are familiar with our SQL Translation offering in the Toolkit UI, the CLI allows for the same translations against local files or entire directories.  Our Data Source tool is unique to the CLI and enables a wide variety of use cases:

  • Platform migration validation
  • Platform migration automation
  • Metadata collection and visualization
  • Tracking platform changes over time
  • Data profiling and quality at scale
  • Data pipeline generation and automation
  • dbt project generation
 

By leveraging profiling information of your source or target system, you can programmatically generate information, metadata, or use these profiles as template inputs to generate output code.

In late February, we released a new version of the Toolkit CLI: v0.8.0.  This has added additional configuration and fixes.  Let’s take a look at these.

For the SQL Tool, we’ve addressed the following:

  • Continue translation/analysis in toolkit sql when an error happens.
  • Added a default name (the folder or file name) when generating SQL reports.
  • Fix some metrics in the HTML report.
  • Ignore hidden files by default.
  • Added user-configurable options to toolkit sql to mirror the options in the UI.
  • Added a read timeout setting to the toolkit sql command (to be increased for extremely large queries)
    • This is configured in the toolkit.conf with: readTimeout = # In seconds in sql block.

For the Data Source Tool, we’ve addressed the following:

  • Fixed an issue where view filters wouldn’t be disabled when using enabled = false.
  • Fixed an issue where having an empty excludeFilter = “” could break source scanning.
  • Fixed an issue when filtering tables in a database where only the first table listed would be scanned.
  • Fixed an issue in the diff logic where mapped data types were causing tables and views to show as modified.
  • Fixed an issue in the diff HTML where profile metrics were not displaying the correct diff type.
  • Added templates in the codegen historical-load template set to create a Snowpipe and load data incrementally.
  • Removed the init command from the historical-load template set.
    • Instead, set export TOOLKIT_PROJECT_HOME=/path/to/project
 

As users and phData engineers continue to leverage the Toolkit CLI, we’ll continue to iterate on features, functionality, and bug fixes.  If you’ve ran into a bug or want to see something within the platform, let us know!

SQL Translation Updates

Leveraged by both the Toolkit UI and CLI, our SQL Translation API has seen a number of updates as well!  This tool is important because one of the biggest challenges when migrating between platforms is handling variances in SQL dialects.  Much like spoken languages, each language/platform has their own variant on the underlying standard (ANSI SQL typically).  Large organizations frequently have hundreds of thousands of lines of SQL that need to be translated, and performing this manually is not only extremely error prone but expensive if done manually.

Our SQL Translation Tool aims to programmatically perform this with consistent translations so you can focus on more important aspects of your migration.

Over the last month, we’ve been heavily focused on increasing coverage and support for our MSSQL to Snowflake migrations.  We’ve seen this migration path happening more frequently with our customers and thus has been an area of focus.

We’ve added the following to our SQL Translation Tool for both the UI and CLI:

  • Parse issue with LOOP as an identifier
  • Implement STRING_AGG function
  • Parse the keyword IMAGE
  • Translate the DROP INDEX statement
 

Looking at that last statement, you may be thinking “but Nick, Snowflake doesn’t have indexes!” and you’d be right!  This is one of the challenges when migrating between platforms that the Toolkit solves for you.  When migrating between platforms, not only do you have to physically translate the SQL dialects, but you also have to handle functionality that doesn’t exist in the target platform, even if the SQL is valid!

This is what that translation looks like:

Source:

				
					Drop index index1 on obj1;
				
			

Target:

				
					-- WARNING: line 1, character 0: Statement is not supported: DROP INDEX statement.
/* Drop index index1 on obj1; */
				
			

Wrapping Up

We’re really excited to continue to add functionality that allows our customers to accelerate their migrations and build data products.  We’re continuing to iterate on these tools and really appreciate feedback from you!  Check out our Toolkit UI!  It’s free to use and may just be the secret weapon to allow you to finish your migration ahead of schedule and get outside in that wonderful March sun!

As far as what’s coming next, some of our tools that have been around the longest are going through a new iteration based on feedback from customers and our engineers.  Check in next month to hear all about the latest version of Tram, our provisioning automation tool!

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022. 

Be sure to follow: this series for more updates on the phData Toolkit tools and features. 

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit