May 5, 2021

SQLMorph May 2021 Update

By Nick Goble

Our free-to-use transpiler, built to convert SQL statements from source to target, has received a fairly large refactor over the last month. We frequently see the need to be able to use the SQL dialect grammars we have built internally in other use cases and technologies.

Validating SQL can be a very complicated task due to the different dialects. Over the course of SQLMorph’s development, we have written over 19,000 lines of ANTLR SQL grammar in nine different dialects.

A graphic showing 11 different technologies that use the SQLMorph dialect

NOTE: You may have noticed there are more than nine here. Some of the technologies above use the same SQL dialect.

This month we primarily focused on the core functionality of the platform. SQLMorph has a lot of functionality built into it, and we’re starting to unlock that functionality for other use cases. The first one we’re addressing is SQL parsing and validation.

Core Functionality Updates

SQL validation is a very common task that occurs in various toolings, pipelines, and projects. It’s a good practice to validate that your SQL can be executed against a source system before actually attempting to execute that SQL. This minimizes errors in your development lifecycle and reduces risk of deployment failures or partial updates to your system.

Internally to SQLMorph, when we are transpiling queries between dialects, we parse the provided SQL into an internal representation of the query. This allows us to take that internal representation and output it into a different dialect. This parsing stage is core to the SQLMorph functionality and historically was tightly coupled with the SQLMorph code base.

The biggest core update for the month is the decoupling of this SQL parsing into its own module named SQLValidator. This is a huge step towards adding additional functionality and extensibility to existing phData tooling and external tools.

SQLValidator has the ability to, given a dialect, parse a given SQL statement and provide any errors for the given statement. This unlocks the ability for syntax validation in pull requests or from tools such as Streamliner or dbt.

We are also considering adding this module to a command line interface (CLI) utility (similar to the SQLMorph CLI). If this is something that your organization would benefit from, give us a shout.

Microsoft SQL Server to Snowflake Updates

This month, the primary work we’ve been doing around MSSQL and Snowflake is support for stored procedure transpilation. This is a complex task, and is still considered pre-alpha.

Currently, we have basic functionality around parsing stored procedures and generating a templated result for basic use cases. We will continue to iterate on this functionality to allow users to simplify their migration path. More to come on this in the future.

Oracle to Snowflake Updates

  • Fixed user feedback 
  • Fixed parse issue on Double data type
  • Fixed rewrite issue with FULL OUTER JOIN
  • Fixed grammar of FROM clause to be mandatory in Oracle Select statement
  • Added support for translating identifiers with #
  • Removed genTb1.genExpr1 logic for case statement expression
  • Added support for handling semicolon if present after “remark” statement or in new line
  • PLSQL – Added support for translating multiple PLSQL statements separated by ‘/’
  • PLSQL – Added support for parenthesis with “execute immediate” statement

Example

				
					
Source: 
DECLARE message varchar2(20) := 'Hello, World!';
BEGIN 
  dbms_output.put_line(message);
END;
/ 
DECLARE procedure for_loop_example(a float) is 
BEGIN 
  FOR i IN 1..10 LOOP 
    a := a + i;
  END LOOP;
  RETURN a;
END;

/ 

Target: 
var message = 'Hello, World!';

/ / -- ERROR: PlSqlStmtVisitor: line 4, character 3: Snowflake does not support PLSQL `DBMS_OUTPUT` stmt
dbms_output.put_line(message);

create or replace procedure for_loop_example ("a" FLOAT)
returns string 
language javascript 
as 
$$ 
for (var i = 1; i <= 10; i++) { 
  var a = a + i;
} 
return a;
$$;


				
			

Impala to Snowflake

  • Fix to implement ANTI join with conditions in parentheses.

Example

				
					
Source:
select
    *
from
    employee e left anti
    join department d on (
        (e.deptId = d.Id)
        and (e.status = 'Active')
    )
    and (e.salary = 1000)
    and (
        substr(split_part(d.office, '_', 4), 1, 3) = e.office
    );

select
    *
from
    employee e left anti
    join department d on (e.deptId = d.Id)
    and (
        e.name = d.name
        OR e.salary = d.salary
    );

Target:
SELECT
    *
FROM
    employee e
    LEFT OUTER JOIN department d ON (
        (e.deptId = d.Id)
        and (e.status = 'Active')
    )
    and (e.salary = 1000)
    and (
        SUBSTR(SPLIT_PART(d.office, '_', 4), 1, 3) = e.office
    )
WHERE
    e.deptId NOT IN (
        SELECT
            Id
        FROM
            department
    )
    AND e.status NOT IN ('Active')
    AND e.salary NOT IN (1000)
    AND e.office NOT IN (
        SELECT
            SUBSTR(SPLIT_PART(d.office, '_', 4), 1, 3)
        FROM
            department
    );

SELECT
    *
FROM
    employee e
    LEFT OUTER JOIN department d ON (e.deptId = d.Id)
    and (
        e.name = d.name
        OR e.salary = d.salary
    )
WHERE
    e.deptId NOT IN (
        SELECT
            Id
        FROM
            department
    )
    AND e.name NOT IN (
        SELECT
            name
        FROM
            department
    )
    AND e.salary NOT IN (
        SELECT
            salary
        FROM
            department
    );


				
			

Hive Family to Snowflake

  • Improve identifiers to support dynamic variables
  • Fixed parse issue for where expression with multiple conditions

Example

				
					Source: 
select * from ${hivevar:env}_WEB_USER.TABLE1
Target: 
SET var0 = $env || '_WEB_USER' || '.TABLE1';
SELECT * FROM identifier($ var0);
				
			

User Interface Updates

There have been a few minor updates to the user interface over the last month. Primarily these have been bug fixes for the following:

  • Fixed Show Parse Tree 300 error
  • Fixed one of our demo queries
  • Expanded Hive Family as list in supported statements

If you notice any other bugs within the user interface or with a particular translation, you can report feedback in the upper right-hand corner of the user interface. We welcome any and all feedback.

Tidbits

  • Every supported statement is flagged by the stage of maturity and support within SQLMorph by each dialect pair. You can view them here!
  • When using SQLMorph, you may see the term “HiveFamily.” This is due to the shared SQL dialect that this family (Hive, Impala, Databricks SparkSQL And SparkSQL) uses.
  • When using the SQLMorph CLI, you can translate either an individual file or an entire directory (and subdirectories). This is based on whether you pass a file or a directory to the –input parameter. SQLMorph will translate each file and mimic the input directory file structure into the output directory.

SQLMorph is Now SQL Translation

Fantastic news! phData combined all of its tools and apps into a unified interface called the phData Toolkit. With the Toolkit, you’ll have a centralized place to utilize all of your favorite phData tools including SQLMorph, now called SQL Translation.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit