July 18, 2022

phData Toolkit July 2022 Update

By Nick Goble

Hello and welcome to our phData Toolkit series! Each month, we spend some time talking about the changes that we’ve introduced and why the phData Toolkit exists.  

For those unfamiliar, the phData Toolkit contains several tools aimed at accelerating platform migrations, providing governance/audit tools, and allowing for automation of repetitive tasks.

If you want to check out an overview of the Toolkit, check out June’s update or head over to the phData Toolkit!

Over the last month, we’ve made many changes to the various tools in the platform. Let’s take a look at each of these tools, the changes that were made, and talk about why those changes were made.

Project Administration Updates

Our Project Administration tool (formerly known as Tram) is a utility written in Java that provides a template-based approach for provisioning Snowflake resources. It uses the concept of a group and a member to generate and apply a set of SQL statements for each member of the group.  

For example, suppose you have defined what a workspace for a user looks like. In that case, you can provision all the resources for that workspace for each user in your organization (which can be automated using LDAP or ITSM tools like ServiceNow).

Over the last month, we’ve made a few notable quality of life improvements to the tool as well as added some additional functionality.

One of the most notable quality of life improvements was adding the ability to detect multiple keys within a member list. Sometimes groups would have multiple ITSM tickets created to add a member to a group (say to provision a project workspace).  When the first ticket was resolved, and the user was added to the group, everything would work as expected.  When the second one was merged, you’d end up getting an error. This no longer causes an error (and the same issue has been resolved for grants as well)!

The Project Administration tool can cache statements that have previously been run to minimize the number of SQL statements being run against the Snowflake Data Cloud. This is managed by a statements table in the configured schema.  Previously, there was a bug that would cause statements to be inserted into this table before a successful execution.

We’ve also added the ability to provision multiple workspaces or other individual Snowflake resources through one ITSM ticket.  Previously, for each workspace that you wanted to generate, you had to submit individual tickets.  This was a pain point for customers who wanted to generate a large number of workspaces at once.

SQL Translation Updates

Our SQL Translation tool (formerly known as SQLMorph) received several updates this month as well.  This free-to-use transpiler converts between SQL dialects within a user interface and a command line interface! 

When customers need to migrate between data platforms, SQL needs to be converted between the different dialects that those platforms use.  Functions may take different parameters, parameters may not exist, and/or new functionality has been added that needs to be mapped appropriately.  

Our SQL Translation tool will provide this conversion while providing warnings and errors for imperfect translations that can manually be reviewed by engineers.

For example, here’s a query that uses the SELECT INTO syntax within MSSQL that is converted into Snowflake syntax:

MSSQL:

				
					SELECT
  * INTO Bicycles
FROM
  AdventureWorks2012.Production.Product
WHERE
  ProductNumber LIKE 'BK%';
				
			

Snowflake:

				
					-- WARNING: line 1, character 9: Snowflake does not support `into clause` in select statement. Snowflake CTAS is a workaround for this.
CREATE TABLE Bicycles AS
SELECT
  *
FROM
  AdventureWorks2012.Production.Product
WHERE
  ProductNumber LIKE 'BK%';
				
			

While MSSQL supports a SELECT INTO syntax, Snowflake doesn’t directly support this.  The workaround is to use a CREATE TABLE AS statement where you’re creating a table and using a query to insert data into that table. This is very powerful as it allows you to automate something that would be tedious to do manually and allows engineers to focus on the platform migration rather than SQL dialect differences.

There are other situations where a translation is much more nuanced and requires manual intervention to resolve.  For example, if we have a SQL query to drop a partition in Hive and we’re migrating to Snowflake, there isn’t a direct translation.

Hive Family:

				
					ALTER TABLE
  t1 DROP IF EXISTS PARTITION ('key1' = value1);
				
			

Snowflake:

				
					-- WARNING: line 1, character 30: Snowflake supports `DROP PARTITION` in alter external table statement. Please update the location_path to drop partitions.
ALTER EXTERNAL TABLE t1 IF EXISTS DROP PARTITION LOCATION < location_path >;
				
			

Here the tool generates a warning and there’s a requirement to update the location_path to the location of the external partition. Hive expects you to hand it the partition key and value, whereas Snowflake requires the actual external storage path.  These cannot directly be mapped.

Analytics Library Updates

The phData Analytics Library is a showcase of a number of different dashboards and visualizations that have been built by the phData team. This includes tools like Power BI, SIgma, and Tableau across a number of different business verticals and sectors. Over the course of the last month, we’ve added a number of different dashboards that have been built in Sigma.  

Sigma, a partner of phData, gives you the capability to embed dashboards and visualizations outside of the tool, so you can view live data instead of point-in-time examples. We also partner each of these with a blog post detailing what the dashboard is built to accomplish and how it was built.

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022. 

Be sure to follow this series for more updates to the phData Toolkit tools and features. 

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit