SQLMorph February 2022 Update

Welcome to our latest installment of the SQLMorph blog series! Part of the phData Toolkit, SQLMorph is a free-to-use transpiler that converts code between various combinations of SQL dialects.

Offered as a Software as a Service (SaaS), SQLMorph is available for use in both bulk transpilations (via the CLI) or for manual usage through the toolkit UI. Depending on the volume of queries that you’re looking to transpile, you may need to contact phData for extended usage.

Let’s dig into the recent changes to the platform.

Log4J

If you don’t know about this vulnerability yet, you can read more about it here. Essentially, a vulnerability was found within Log4J that could cause Remote Code Execution (RCE). This caused a large ripple within the development community, and SQLMorph was no exception to this.

The surface area of this vulnerability is so large that the Cybersecurity and Infrastructure Security Agency (CISA) published an article regarding it.

There were zero known attacks against SQLMorph, and it has since been patched to ensure this attack vector is blocked.

Microsoft SQL Server

One of the areas we’ve been adding additional functionality to is Microsoft SQL Server (MSSQL) to the Snowflake Data Cloud. As we continue to engage with customers, we see additional use cases for this source-target pair. We’ve added the following features and bug fixes:

Truncate table statement
Create user statement

Fixed translation issue with numeric literals in pivot clause
Parse unsupported statements and give error message

				
					Source: 
CREATE INDEX index1 ON schema1.table1 (column1); 
DEALLOCATE @MyCursor;
CLOSE Employee_Cursor;
FETCH NEXT FROM cur1;
    
Target: 
-- ERROR: MSSqlStmtVisitor: line 1, character 0: Snowflake does not support `Create Index` statement.
/* CREATE INDEX index1 ON schema1.table1 (column1); */
-- ERROR: MSSqlStmtVisitor: line 2, character 0: Snowflake does not support `Deallocate` statement.
/* DEALLOCATE @MyCursor; */
-- ERROR: MSSqlStmtVisitor: line 3, character 0: Snowflake does not support `Close` statement.
/* CLOSE Employee_Cursor; */
-- ERROR: MSSqlStmtVisitor: line 4, character 0: Snowflake does not support `Fetch` statement.
/* FETCH NEXT FROM cur1; */

Hive Family

Another area we’ve been consistently adding functionality to is the Hive Family of SparkSQL. This includes Hive, Impala, Databricks SparkSQL, and SparkSQL. Each of these uses nearly identical SQL dialects and allows us to target a larger number of sources with one transpilation!

We’ve added the following pieces of functionality:

Add common table expression (CTE) with insert statement
Implement MONTHS_BETWEEN and NULLIFZERO functions

We’ve also added fixes for the following statements:

Fixed translation issue with function `unix_timestamp` and `date_sub`
Fixed parse issue with time, option, week, cost, hours, string, system, items, zone, check, end, admin as identifier
Fixed `schema.table.*` parse issue
Fixed parse issue with explode function
Fixed parse issue with “With clause”
Fixed identifier parse issue if $ present with alias
Fixed Interval expression parse issue

As you can see, not all transpilations are perfect. We regularly try to gather as many test cases as possible for our unit testing. These are more complex usages of SQL from systems like Hive. We continue to iterate and validate edge cases as we see usage.

Here are a couple of Hive Family SQL translations:

				
					Source: 
INSERT OVERWRITE TABLE t1 WITH req AS(SELECT pid FROM t2) select eid from all
    
Target: 
INSERT OVERWRITE INTO t1 WITH req AS (SELECT pid FROM t2)SELECT eid FROM all;

				
					Source: 
select cast(from_unixtime(unix_timestamp(date_col,"MMddyyyy HH:mm:ss")) as timestamp) as date_col , date_sub(timestamp_col, id) , timestamp_col, id from alltypessmall order by 1  limit 20;
    
Target: 
SELECT TRY_CAST(TO_VARCHAR(TO_VARCHAR(TO_TIMESTAMP_LTZ(DATE_PART(EPOCH_SECOND,TRY_TO_TIMESTAMP(date_col, 'MMDDYYYY HH:MI:SS'))),'yyyy-MM-dd HH:mm:ss.SSSSSS')) AS TIMESTAMP) AS date_col,
            DATEADD(DAY, - (id), timestamp_col), timestamp_col, id FROM alltypessmall ORDER BY 1 LIMIT 20;

				
					Source: 
select d_id, d_type from d_devices ex1 
                lateral view explode(split((split((split(d_attributes, "\\{")[1]),  "\\}")[0]), ',')) point AS points
    
Target: 
SELECT d_id, d_type, VALUE as points FROM d_devices ex1,
                LATERAL FLATTEN(INPUT = > SPLIT((SPLIT((SPLIT(d_attributes, '\\{') [1]), '\\}') [0]),','));

phData Toolkit

Over the last few months, we integrated SQLMorph directly into the phData Toolkit. This toolkit is a one-stop shop for all phData SaaS offerings as well as information regarding non-SaaS tooling built and maintained by phData. For those who have historically used SQLMorph from its previously dedicated UI, you can now find it under SQL Translation in the Toolkit.

We will continue to add SaaS solutions to the toolkit as we identify additional tools that benefit our customers.

If you’re using Snowflake for your data warehouse, we urge you to check out the Privilege Audit tool to audit your organization.

Direction Going Forward

Over the last year we’ve had this blog series focused exclusively on the changes we’ve been making to the SQLMorph platform. Now that SQLMorph is included as part of the phData Toolkit, we want to take this opportunity to reflect on how we notify users of changes.

We believe it’s beneficial to notify users of changes to the phData tooling ecosystem in totality to ensure users stay informed.

As a result of this, we will be modifying this blog series to encompass changes to the phData Toolkit in its entirety as opposed to being SQLMorph specific. This will start in March of 2022.

SQLMorph February 2022 Update

Log4J

Microsoft SQL Server

Hive Family

phData Toolkit

Direction Going Forward

More to explore

Snowflake Query Tagging Best Practices

Data Ingestion from PostgreSQL to Snowflake using Openflow

The GenAI Strategy Playbook

Join our team

Partners

Resources

Software

Accelerate and automate your data projects with the phData Toolkit

Industries

Solutions

Company

Technology Partners

Other Technology Partners

Check out our latest insights

Snowflake Query Tagging Best Practices

Data Ingestion from PostgreSQL to Snowflake using Openflow

Data Engineering

Consulting, Migrations, Data Pipelines, DataOps

Change Management, Enablement & Learning

COE, Coaching, PMO

Data Science and Machine Learning Services

MLOps Enablement, Prototyping, Model Development and Deployment

Strategy Services

Data, Analytics, and AI Strategy, Architecture and Assessments

Reporting, Analytics, and Visualization Services

Self-Service, Integrated Analytics, Dashboards, Automation

Elastic Operations

Data Platforms, Data Pipelines, and Machine Learning