April 30, 2022

phData Toolkit April Update

By Nick Goble

Hello and welcome to our April update of the phData Toolkit! Our toolkit is built to not only expedite your data migration project but to give you tools to build and enforce an information architecture and data governance practice. 

For those of you who are new to our toolkit, we provide four main tools: Privilege Audit (Traverse), SQL Translation (SQLMorph), Project Administration (Tram), and Pipeline Automation (Streamliner). Each of these tools focuses on a critical part of your data strategy.

SQL Translation Updates

SQLMorph, our free-to-use transpiler, is purpose-built to facilitate query translations between SQL dialects. We frequently have customers moving between platforms, and many platforms use a slightly different dialect of SQL.  

For example, you may have hundreds to thousands of stored procedures in SQL Server and you’re migrating your data platform to the Snowflake Data Cloud. Manually converting all of these stored procedures could take months. SQLMorph can programmatically translate an entire directory and all sub-directories of SQL files and provides warnings and errors for these translations.

Let’s take a look at the changes we’ve made to these translations.

Netezza to Snowflake

For the last month, our primary focus has been on the Netezza to Snowflake translation. We’ve added support for the following statements:

  • Alter database
  • Generate statistics
  • Create/alter user
  • Create table national character type and default constraints
  • Set system default
  • Groom table
  • Create materialized view
  • Create scheduler rule
  • Create/alter sequence
  • Alter schema
  • Create external table
  • Create history configuration

As you can see, our team has been extremely busy! While some of these statements are very similar between the two dialects, some are not as trivial. For example, the alter database statement is identical between Netezza and Snowflake:

				
					Source:
ALTER DATABASE emp RENAME TO employees;

Target:
ALTER DATABASE emp RENAME TO employees;
				
			

However, it’s very common for translation to be more complicated:

				
					Source:
set
  ON_ERROR_STOP TRUE \ connect PCDPROCURE \ unset ON_ERROR_STOP CREATE TABLE t1 (FISCAL_YEAR smallint);

Target:
-- ERROR: NetezzaStmtVisitor: line 1, character 0: Equivalent of this NZSQL command not found in snowflake.
/* \set ON_ERROR_STOP TRUE; */
USE DATABASE PCDPROCURE;
-- ERROR: NetezzaStmtVisitor: line 3, character 0: Equivalent of this NZSQL command not found in snowflake.
/* \unset ON_ERROR_STOP; */
CREATE TABLE t1 (FISCAL_YEAR SMALLINT);
				
			

Sometimes the SQL statement doesn’t translate at all as well. Netezza in this case has some functionality that Snowflake does not. In those cases, we provide errors to the user and they will need to be investigated to ensure parity between the two systems.

As customers leverage tooling, we develop additional functionality to support those engagements. If you don’t see what you need, contact us!

Pipeline Automation Updates

Streamliner, our purpose-built tool for automating the creation of Snowflake ingestion and transformation, has had a few important updates as well over the last month.  One of the main features of Streamliner is the ability to connect to your data source via a JDBC connection and auto-generate data definition language (DDL), data markup language (DML), as well as other objects depending on your needs.  

This allows you to programmatically create Snowpipes, tables, streams, and tasks to load data into your Snowflake account.

Over the last month, we’ve added additional JDBC support for connecting to AWS Redshift. A few of our customers have been making the migration from AWS Redshift to Snowflake, and this unlocks a fast path for our customers to set up all the ingestion jobs necessary to extract from Redshift and load the data into Snowflake in a repeatable fashion.

We’ve also added the ability to configure the log level that Streamliner uses when executing. This allows developers to set a debug log level and dig into any issues they may run into when executing Streamliner.

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022. 

Be sure to follow this series for more updates to the phData Toolkit tools and features. 

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit