May 31, 2022

phData Toolkit May Update

By Nick Goble

Hello and welcome to our May update of the phData Toolkit! Our toolkit is built to expedite your data migration project and give you tools to build and enforce an information architecture and data governance practice.

This includes capabilities within the Snowflake Data Cloud such as:

  • Track/audit changes to objects, roles, and individual permissions
  • Visualize your role hierarchy and object relationships
  • Integrating your ITSM tool with Snowflake for access requests
  • Automating the provisioning of resources
  • Decoupling engineering teams from servicing common business requests

For our May update, we’ve got some great updates to call out for the platform. So let’s get right into it.

SQL Translation Updates

One of the core tools within the phData Toolkit is the free-to-use SQL Translation tool that allows customers and our own engineers to programmatically translate SQL queries between different SQL dialects. This is incredibly useful for enterprises looking to migrate their data platform from say Hadoop to Snowflake

It’s common for enterprises to have hundreds of thousands of lines of Data Markup Language (DML) and Data Definition Language (DDL).

We’ve been focusing heavily on our Netezza to Snowflake translation over the last month to enable on-going projects with customers. Let’s take a deep dive into the new functionality and fixes we’ve accomplished.

Netezza to Snowflake

We’ve added support for the following statements:

  • Implemented the “/=/” operator
  • Implemented the “Quadruple” indentifer (a.b.c.d)
  • Implemented the Create Library statement
  • Implemented the TO_HEX function

We’ve also fixed some parsing issues with the following:

  • bitwise operators (>>, <<, &)
  • Create History statement
  • using “del” as an identifier
  • Sub-query operator

Let’s take a look at some examples.

One of the more interesting operators within Netezza is the “/=/”.  This operator is commonly used in join operators to allow matches on both equal values and where both columns are null.

Netezza:

				
					select
  *
from
  t1
where
  a /=/ 4;
select
  *
from
  t2
where
  c1 /=/ b;
				
			

Snowflake:

				
					SELECT
  *
FROM
  t1
WHERE
  a = 4;
SELECT
  *
FROM
  t2
WHERE
  (
    c1 = b
    OR (
      c1 IS NULL
      AND b IS NULL
    )
  );
				
			

One of the hardest parts when transitioning between data platforms is feature parity.  It’s common that functionality exists in one system that doesn’t match up or exist in another. These systems often have different ways that they tackle similar problems.

For example, let’s look at the Netezza “Create History” translation.

Netezza:

				
					CREATE HISTORY CONFIGURATION hist1 NPS LOCALHOST USER ' user1' PASSWORD 'password' LOADINTERVAL -1;
				
			

Snowflake:

				
					-- ERROR: NetezzaStmtVisitor: line 1, character 0: Snowflake does not support `CREATE HISTORY CONFIGURATION` statement.
/* CREATE HISTORY CONFIGURATION hist1
 NPS               LOCALHOST
 USER              ' user1'
 PASSWORD          'password'
 LOADINTERVAL      -1; */
				
			

In this case, Snowflake handles history differently. By nature of time travel and streams, engineers are able to create audit logs and view the history of the table.

Next, let’s look at the bitwise operators that are frequently used for bit-shifting.

Netezza:

				
					select
  SUM((("val1".USAGE & (1 < < 0)) > > 0)) AS col1
from
  t1;
				
			

Snowflake:

				
					SELECT
  SUM(
    (
      BITSHIFTRIGHT((BITAND("val1".USAGE, (BITSHIFTLEFT(1, 0)))), 0)
    )
  ) AS col1
FROM
  t1;
				
			

Project Administration Updates

One of the other major parts of the toolkit is our Project Administration tool (formerly known as Tram). This tool allows you to integrate ITSM workflows (or git workflows) with user, permission, and object provisioning automation. Project Administration is a configuration based tool that is executed by a phData provided Java jar file that is executed either manually or within an automation tool.

Over the course of the last month, we’ve been focusing on platform level upgrades to the tool.

One of the quality of life upgrades that we’ve made is having Project Administration check for schema existence before trying to execute statements against that schema. When performing the initial configuration/setup of Project Administration, there are a few tables that it automatically creates to track executed statements and other metadata tables. Previously, these would cause Project Administration to error if the schema for those tables didn’t exist.

We also fixed a minor bug with the “MUST_RESET_PASSWORD” feature for user creation.

Pipeline Automation Updates

Pipeline Automation (formerly known as Streamliner)is our pipeline automation tool, which has received a few updates this month as well. This tool helps with creating the necessary objects required within Snowflake to quickly ingest data into the platform via Snowpipe.  

Pipeline Automation has the ability to (for some systems) crawl your existing database schema and gather information about the tables that you’re planning to migrate to Snowflake.  Previously, this schema crawling was only available for Oracle systems, however we’ve now added functionality to support Hive and Impala as well!  We’ve also increased the performance of crawling these schemas.

As part of our on-going security scrutiny, we’ve also reduced the number of dependencies within Pipeline Automation to minimize our security vulnerability footprint. This included an unused dependency that has not resolved the latest Log4J vulnerability.

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022. 

Be sure to follow this series for more updates to the phData Toolkit tools and features. 

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit