October 27, 2022

phData Toolkit October 2022 Update

By Nick Goble

Picture this: you’ve decided to move across the country. With that decision made, you have a number of tasks that you need to complete before the date you’ve set for the move. This may include deciding to:

  • Sell your house
  • Buy a new house
  • What to take with vs. leave or sell
  • How you’re going to move your stuff
  • How to be cost-conscious

Much like moving across the country, migrating between data platforms can be a daunting task. You have to decide what technologies you’re going to keep, which you may need to buy, inventory your data pipelines and products, determine how you’re going to migrate your data, make the move, make sure everything made it to the new platform, and how to ultimately not break the bank!

phData, being an expert in data migration projects, has built a number of tools to make these areas easier for you. We regularly work on these types of projects and have identified areas where tooling would accelerate your move.  

These tools are all encompassed within our phData Toolkit:

Today, we’re going to talk about updates for the month of October, improvements that we’ve made to the platform, and reiterate the value that these tools provide.

SQL Translation

SQLMorph, our SQL translation tool, is built to translate between SQL dialects. For those who don’t know what that means, let me explain.  

Take the English language for example: the basics of the language are the same regardless of where you’re located, but dialects can differ depending on location. 

What do you call the part of your car that covers the engine in the front end of your car? If you’re in the USA, you likely call that a “hood” but if you’re in England you likely call that the “bonnet”.

This type of difference in dialects is exactly what SQLMorph solves for. You provide our tool with the SQL from your current platform, tell us what your target platform is, and we will translate all these differences for you programmatically!

What’s New?

Going back to our language metaphor, the English language continues to get new words, and sometimes translations change due to changes in other languages. Much like language, SQLMorph is continuously updated to support new translations and cover additional functionality added within systems.

Over the course of the last month, we’ve added the ability to translate “merge into” statements from Hive to the Snowflake Data Cloud. This looks like the following:

Source:
				
					MERGE INTO customer USING (
  SELECT
    *
  FROM
    new_customer_stage
) sub ON sub.id = customer.id
WHEN MATCHED THEN
UPDATE
SET
  name = sub.name,
  state = sub.new_state
  WHEN NOT MATCHED THEN
INSERT
VALUES
  (sub.id, sub.name, sub.state);


				
			

Target:

				
					MERGE INTO customer USING (
  SELECT
    *
  FROM
    new_customer_stage
) sub ON sub.id = customer.id
WHEN MATCHED THEN
UPDATE
SET
  name = sub.name,
  state = sub.new_state
  WHEN NOT MATCHED THEN
INSERT
VALUES
  (sub.id, sub.name, sub.state);


				
			

While this is a simple translation, as the dialects are exactly the same, this is frequently not the case. For example, we’ve also fixed an issue where “break” statements were not translating while inside of a loop for our MSSQL to Snowflake translation. 

This looks like the following:

Source:

				
					CREATE PROCEDURE proc1 AS
DECLARE @caseVal bigint = case
  when 1 = 1 then 3
end;
WHILE (
  SELECT
    AVG(lprice)
  FROM
    t1
) < 300
BEGIN
UPDATE
  t1
SET
  lprice = lprice * 2;
Select
  MAX(lprice)
FROM
  t1;
BREAK;
END;
				
			
Target:
				
					CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN
LET caseVal BIGINT DEFAULT CASE
  WHEN 1 = 1 THEN 3
END;
WHILE (
  (
    SELECT
      AVG(lprice)
    FROM
      t1
  ) < 300
) DO
UPDATE
  t1
SET
  lprice = lprice * 2;
SELECT
  MAX(lprice)
FROM
  t1;
break;
END
WHILE;
END;
$$;
				
			

With this translation, you can start to see how the two dialects start to diverge. Variables are defined differently, the stored procedure definition syntax is different, and Snowflake also defines the language that is being used for the stored procedure (as Snowflake supports Javascript and SQL).

You can imagine that these types of variances can be tricky to remember exactly how each dialect handles this, and trying to perform this mapping manually would be both error-prone and tedious to complete. 

This is exactly why SQLMorph was built: to accelerate your data definition language (DDL) and data markup language (DML) migration!

Project Administration

Tram, our project administration tool, is built to guide and enforce your information architecture. 

Using our moving metaphor, imagine you’re in your new house and you need to ensure that the house is maintained correctly.  You likely have a set of normalized tasks (such as chores) that need to be completed on a regular basis and require checking things off your list.

You may also need to update parts of your house which may require permits and approvals to make those changes.

Tram allows you to accomplish these tasks.

Tram provides an infrastructure-as-code approach to your Snowflake data platform.  It gives you the ability to define resources that belong to a group and members of that group. 

For example, let’s say you need to create a standardized set of resources that each project gets.  Traditionally, this could be solved with something like a stored procedure that takes in the new project name that you want to provision resources for. However, changes to existing projects may also need to be made as part of your updates. This requires a number of manual changes to alter things correctly. 

What if instead, you could define metadata about the types of objects a project should have, and you could just add your new project to the existing list of projects, and any changes you need to be made to all projects could be handled automatically in one location?

This is a major component of Tram and why it adds significant value to your data platform. 

Tram also gives you the ability to automatically provision resources by integrating with your ITSM tools such as ServiceNow or Jira. 

Instead of a DBA making changes to your platform by getting a ticket and having to go manually run a set of SQL statements, this can now be automated by the ITSM tool integration. Requests go through the normal approval process, and once the approvals are submitted, Tram automatically provisions the requested resources.

What’s New?

We’ve made a number of updates to this tool over the last month.  Most notably, if you’re using Tram today, you will need to change the way that you download Tram in your CI/CD pipeline.  You can read more about that here.

For new functionality, we’ve added a lot of support for tags within Snowflake. Tags give you the ability to catalog your data, apply data masking policies, and filter query results based on the tags associated with columns.

We’ve added the following support for tags:

  • Added support to SET and UNSET COMMENT for tags
  • Added support to ADD, DROP, and UNSET ALLOWED_VALUES for tags
  • Added support to deprovision tag objects
  • Added support to deprovision role grants for users

This continues our ability to utilize Tram to administer and apply tags.

In Summary

By now it’s our hope that we’ve shown the reason and value for the phData Toolkit along with how it accelerates project migrations. 

While we only hardly covered two of the tools available in the Toolkit this month, there are many more tools and apps available. We encourage you to take advantage of these tools, review the tools available within the Toolkit, and provide feedback if these tools are missing the functionality that you need. 

You can always contact us!

Check out our previous month’s toolkit update to learn more about our data source tool!

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022.

Be sure to follow this series for more updates to the phData Toolkit tools and features.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit