June 29, 2022

phData Toolkit June 2022 Update

By Nick Goble

Hello and welcome to your monthly exclusive on all the amazing updates our team has been making to the phData Toolkit.  For the uninitiated, the phData Toolkit is a set of tools we’ve built for customers to mature their information architecture, provide data auditing capabilities, and accelerate their data platform migrations.  The toolkit is available both as a user interface (UI) and a command-line interface (CLI), depending on the tool.

Over the course of the last month, our team has been heavily focused on adding additional coverage to various SQL dialects in our SQL Translation tool.  We’ll get into this in more detail, but first, let’s talk about why this is important.

SQL Translation and Why

We frequently talk about additions we’ve made to the SQL Translation tool, but it’s also important to understand why this tool exists and continues to evolve.  Likely the most common type of engagement we at phData have with clients is platform and data migrations.  

For example, let’s say a customer has been working with SQL Server, SQL Server Integration Services (SSIS), and SQL Server Reporting Services (SSRS) on-premises and is now looking to migrate to a modern data platform. Since we frequently see customers utilizing Snowflake, we’ll use that as a target system in our example.  

The customer has existing data definition language (DDL) and data manipulation language (DML) that defines how their data is structured and how it is queried for data pipelines and reporting needs. All of this DDL and DML will need to be translated from one SQL dialect (SQL Server) to another (Snowflake), since each platform uses a different dialect (and not all of the functionality exists in both platforms).  Without tooling, this requires a manual migration, and it’s very common for customers to have tens of thousands of DDL/DML statements that need to be migrated.
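
As a simplified, hand-written illustration of the kind of dialect differences involved (the table and columns below are made up, and this is not output captured from the SQL Translation tool), consider how a small piece of SQL Server DDL might look once expressed in Snowflake’s dialect:

SQL Server:

CREATE TABLE [Order Details] (
 OrderID INT IDENTITY(1, 1) NOT NULL,
 Quantity SMALLINT NOT NULL,
 Notes NVARCHAR(MAX)
);

Snowflake:

-- Bracketed identifiers become double-quoted identifiers, IDENTITY(1, 1)
-- maps to AUTOINCREMENT, and NVARCHAR(MAX) becomes VARCHAR.
CREATE TABLE "Order Details" (
 OrderID INT AUTOINCREMENT NOT NULL,
 Quantity SMALLINT NOT NULL,
 Notes VARCHAR
);

Multiply small differences like these across tens of thousands of statements and the value of automating the translation becomes clear.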

This is where the SQL Translation tool accelerates your migration process.  The tool can generate a coverage report ahead of time so you can determine how much of the migration can be automated, and you can leverage the API to execute translations programmatically.

Let’s Talk Updates

Over the course of the last month, we’ve made several updates to different tools within the toolkit.  Let’s take a look at some of these updates.

SQL Translation Updates

When we talk about updates to the SQL Translation tool, they mostly fall into three different categories:

  • New features
  • Bug fixes
  • Parsing issues

One of the main ways we grow our coverage of supported SQL statements is by gathering more real-world examples.  Generating test SQL is challenging and generally relies on assumptions about what customers are actually using. The customer migrations we perform give us experience and exposure to real-world SQL, which allows us to find potential gaps and edge cases we haven’t covered in previous versions of the tool.

For example, we recently fixed a parsing issue when translating Oracle to Snowflake. Both systems can define a primary key constraint with an optional argument indicating that the constraint isn’t enforced.

Oracle:

CREATE TABLE t1 (
 C1 NUMBER(12, 0) NOT NULL,
 CONSTRAINT PK PRIMARY KEY (C1) NOT ENFORCED
);

Snowflake:

CREATE TABLE t1 (
 C1 NUMBER(12, 0) NOT NULL,
 CONSTRAINT PK PRIMARY KEY (C1) NOT ENFORCED
);

While this is a trivial parsing issue to fix, since both systems support the same argument, it wasn’t something we had added initially because we didn’t see customers using this argument until recently.

Another way we grow coverage for SQL translations is based on customer utilization, or in other words, how frequently a given source/target pair is being used.  We’ve recently seen a lot of usage for Netezza to Snowflake and have been working diligently to add additional capabilities to this translation.  We’ve added support and fixes for the following:

  • A warning for the “DISTRIBUTE ON” clause (see the example after this list)
  • Column default value data type mismatches
  • Alter table statements that have a table constraint
  • Parsing the “CT” keyword
  • Parsing delete/update statements that include an alias
  • Create temp table statements
  • The extract function
  • Double quotes and a warning message around Snowflake reserved words
  • A fix for Not Null clause translation
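
As a quick illustration of the “DISTRIBUTE ON” warning mentioned above, here is a hand-written sketch (the table is hypothetical, and the Snowflake output is paraphrased rather than captured from the tool).  Netezza uses DISTRIBUTE ON to control how rows are spread across data slices, while Snowflake manages data distribution automatically, so there is no direct equivalent to carry over.

Netezza:

CREATE TABLE sales (
 SALE_ID BIGINT NOT NULL,
 REGION VARCHAR(20)
)
DISTRIBUTE ON (SALE_ID);

Snowflake:

-- DISTRIBUTE ON has no direct Snowflake equivalent; Snowflake handles data
-- distribution automatically, hence the warning during translation.
CREATE TABLE sales (
 SALE_ID BIGINT NOT NULL,
 REGION VARCHAR(20)
);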

To better illustrate the column default value data type mismatch handling, here’s an example of a translation that exercises this functionality.

Netezza:

CREATE TABLE employee(
 EMP_ID numeric(9, 0) not null default 0,
 JOINING_DATE date not null default '2/4/2022'
);

Snowflake:

CREATE TABLE employee (
 EMP_ID NUMERIC(9, 0) NOT NULL DEFAULT CAST(0 AS NUMERIC(9, 0)),
 JOINING_DATE DATE NOT NULL DEFAULT CAST('2/4/2022' AS DATE)
);

In the Snowflake translation, we have to add a cast to the default value to make it match the data type of the column itself.  Within Netezza, this is implied, but within Snowflake, this needs to be explicit.

Analytics Library

New to the phData Toolkit is the Analytics Library. Found within the Toolkit UI, it serves as a home for a number of example analytics dashboards and visualizations built by the phData team.  These are frequently paired with a blog post where you can learn more about the use cases the dashboard or visualization serviced, how it was built, and what decisions were made.

This most recent addition aims to guide customers through how analytics could be represented within their organization, show how analytics can drive business value, and demonstrate how different tools can serve different users within an organization.

What’s In The phData Toolkit?

Our toolkit is made up of several different tools:

  • Privilege Audit
  • SQL Translation
  • Project Administration
  • Pipeline Administration
  • Application Monitoring
  • Automated Infrastructure
  • Analytics Library

While we don’t make updates to every single one of these each month, it’s important to know about these tools, what purpose they serve, and when to leverage them.  If you want to learn more about each of these tools, check out the phData Toolkit for yourself.

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022. 

Be sure to follow this series for more updates to the phData Toolkit tools and features. 
