Welcome to another installment of our monthly SQLMorph update! SQLMorph, part of a set of tools that phData provides for customers, is a free-to-use transpiler built to convert SQL statements from source to target. This empowers our customers to migrate faster between SQL platforms.
We’ve touched many times on the complexity that SQLMorph aims to solve for our customers (see previous series posts). Migrating between database systems and providers is a complex task, especially if your company is a larger enterprise with lots of data pipelines and applications. It’s not uncommon for developers to focus on converting existing processes/SQL scripts during a migration rather than building new functionality out. SQLMorph allows you to script this task (via our CLI) or via our UI.
Over the last month, we’ve been focusing on additional functionality for the Hive family of systems (Hive, Impala, Databricks SparkSQL, and SparkSQL). Since these all mostly use the same variant of SQL syntax, it allows us to enable more of our users to perform transpilations.
Let’s take a look at what new functionality has been added.
Hive Family to Snowflake
As previously mentioned, the Hive family consists of Hive, Impala, Databricks SparkSQL, and SparkSQL. This is a large surface area within data engineering and SQL dialects. Many of the transpilations for Hive to Snowflake seem trivial, transpilations aren’t always as easy as they seem. While the following examples are trivial to transpile manually, users would have to spend the time learning both SQL dialects to know that they’re similar. SQLMorph allows you to transpile your SQL and not focus on comparing and contrasting your SQL dialects.
Source: delete from t1 where c4 is not null; Target: DELETE FROM t1 WHERE c4 IS NOT NULL;
Source: truncate table t1; Target: TRUNCATE TABLE t1;
Source: set mem_limit=64g; Target: SET mem_limit = 64g;
Source: use test1; Target: USE SCHEMA test1;
Source: select sum(hash(*)) from t1; Target: SELECT SUM(HASH(*)) FROM t1;
While it might start to look like Hive and Snowflake SQL dialects are identical, this is not true. Let’s take a look at some more complex examples of Hive to Snowflake transpilations that we’ve added to SQLMorph.
Source: show views like 'test_view1|test_view2'; Target: SHOW views LIKE 'test_view1|test_view2'; Source: show databases; Target: SHOW SCHEMAS; Source: show create table tb1; Target: -- INFO: HiveFamilyStmtVisitor: line 1, character 0: Snowflake equivalent of `show create table` statement is `get_ddl` function. SELECT GET_DDL('TABLE', tb1)
You can imagine that a developer responsible for transpiling existing Hive SQL to Snowflake may just expect to be able to copy and paste their script into Snowflake and have it execute. While that may work for the majority of cases, Snowflake would throw an exception when trying to execute show create table tb1.
One of the tools we’re working on for customers is the ability to generate a report within SQLMorph that will give customers a sense of what their migration will look like. This includes answering questions like:
- How much of my existing SQL can be converted?
- How much of my existing SQL has warnings that I need to look into?
- How much of my existing SQL cannot be directly converted?
- How much of my existing SQL cannot be parsed or isn’t supported by SQLMorph?
By generating this report of the current state, it will give both our customers and phData a sense of the scope of work involved with a platform migration. This drives more accurate business decisions, budgets, timelines, and expectations in projects and project deliverables. This will require some up-front work to gather a customer’s existing SQL and locate it somewhere that the SQLMorph API can reach. This will not require publicly exposing your company’s SQL.
How Does SQLMorph Work?
Throughout this series, we’ve talked about the benefits of using a tool like SQLMorph and the complexities that it aims to solve. What we haven’t shared with you is how SQLMorph actually works. We mentioned in our May SQLMorph update that we’ve written over 19,000 lines of ANTLR grammar in nine different dialects, but what does that ANTLR actually look like?
Let’s take a look at a basic example for the drop view statement in Snowflake. Note that for the sake of simplicity we won’t go through every piece of the grammar.
// --------------------- DROP VIEW --------------- drop_view: DROP VIEW IF_EXISTS? object_reference ;
This drop_view grammar defines what our drop view statement looks like. This checks the input string for the following:
- Starts with DROP VIEW (case insensitive)
- The optional (indicated by the ?) grammar IF_EXISTS is evaluated. This is looking for the string literal IF EXISTS
- String ends with grammar object_reference. This is defined as:
object_reference : (outer_namespace'.')?(inner_namespace'.')?(name | '$'NUMERIC_LITERAL) ;
This basically just checks for database.schema.name where database.schema. Is optional.
As you can see, these grammars can reference each other and be reused for different statements.