This blog was co-written by Arnab Mondal & David Beyer.
A few decades ago, Teradata might've been the perfect data warehousing solution for you, but since then, the speed and volume of data ingestion have grown exponentially. This explosion in data availability and use helped give rise to cloud computing.
In response to this growing demand, the Snowflake Data Cloud offers a modern solution that can instantly and automatically scale storage and compute, something Teradata could not do. Snowflake also separates storage from compute, which offers significant benefits over competing technologies.
If you’re curious about why you should consider migrating from Teradata to Snowflake, check out our previous blog that covers that question in more detail. For this blog, we’re going to explore the technical details of how to migrate from Teradata to Snowflake successfully.
How to Prepare for a Migration From Teradata to Snowflake
To help ensure a smooth, issue-free transition to Snowflake, we created this handy checklist to work through before beginning the migration process. Completing it up front will keep the migration itself as efficient as possible.
- Create a list of Teradata databases to migrate
- Create a list of Teradata database objects to migrate
- Create a list of processes and tools that populate and pull data from Teradata
- Create a list of current security roles, users, and permissions
- Create a list of Snowflake accounts that exist or need creating
- Document the frequency of security provisioning processes
- Document the existing Teradata solution into an architectural diagram
- Create a list of current processes to migrate with no changes
- Create a list of current processes to migrate that need re-engineering
- Create a list of current processes to migrate that need to be fixed
- Create a draft of deliverables for the migration
- Create a planned architecture diagram in Snowflake
- Create a planned information architecture in Snowflake
- Determine a migration strategy (Bulk transfer versus a staged migration)
- Create a list of new tools that will be used in the migration
- Create a list of current tools that will no longer be used post migration
- Create a list of development environments needed for the migration process
- Create a list of deployment processes used for migration
- Create a list of prioritized data sets to migrate first
- Identify and document process dependencies for data sets
- Create a list of migration team members and their roles
- Identify a migration deadline
- Create a list of expectations by the business for the migration deadline
- Document the budget allocated for the migration
- Create a list of high-level outcomes expected when the migration is complete
Once the checklist is completed, you are ready to begin your migration from Teradata to Snowflake!
How to Migrate From Teradata to Snowflake
The migration will be the most challenging part of the process, but it can be made simpler if you follow the steps outlined below in the order given.
Step 1: Create the Information Architecture
The perfect place to begin the migration is to implement an information architecture that includes users, roles, accounts, databases, and schemas. This will provide a good foundation for the rest of the migration process. You should create a database in Snowflake for each environment in Teradata and one schema for every database that you have in Teradata.
Snowflake’s Role-Based Access Control (RBAC) provides a significant advantage in the amount of control and flexibility versus Teradata’s user-based access control, but it does have a learning curve to implement the first time. Creating a role hierarchy will take the most time of this step.
A best practice with RBAC is to create an access role for each database object and virtual warehouse and a functional role for each function (ex., Data Analyst) within your organization. This allows the access roles to be assigned to the functional roles, and the functional roles assigned to users.
For more information on RBAC please refer to Snowflake’s documentation for RBAC.
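The role hierarchy described above can be sketched in SQL. This is a minimal, illustrative example only; the database, schema, warehouse, role, and user names (EDW_DEV, SALES, ANALYST_WH, JDOE) are assumptions, not from the source:

```sql
-- One Snowflake database per Teradata environment, one schema per Teradata database.
CREATE DATABASE edw_dev;
CREATE SCHEMA edw_dev.sales;

-- Access roles: one per database object / virtual warehouse.
CREATE ROLE sales_read_ar;
GRANT USAGE ON DATABASE edw_dev TO ROLE sales_read_ar;
GRANT USAGE ON SCHEMA edw_dev.sales TO ROLE sales_read_ar;
GRANT SELECT ON ALL TABLES IN SCHEMA edw_dev.sales TO ROLE sales_read_ar;

CREATE ROLE analyst_wh_ar;
GRANT USAGE ON WAREHOUSE analyst_wh TO ROLE analyst_wh_ar;

-- Functional role (e.g., Data Analyst): built from access roles, granted to users.
CREATE ROLE data_analyst_fr;
GRANT ROLE sales_read_ar TO ROLE data_analyst_fr;
GRANT ROLE analyst_wh_ar TO ROLE data_analyst_fr;
GRANT ROLE data_analyst_fr TO ROLE sysadmin;  -- roll the hierarchy up to SYSADMIN
GRANT ROLE data_analyst_fr TO USER jdoe;
```

Granting each functional role up to SYSADMIN keeps the hierarchy connected, so administrators retain visibility over all objects created under it.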
Step 2: Implement the Authentication Method
Once the information architecture is created, an authentication method must be chosen and deployed. Snowflake offers several authentication techniques:
- Username/Password
  - Basic authentication is the default for Snowflake, and it is also the least secure, as it only requires a username and password.
- Federated Authentication/SSO
  - Federated authentication enables your users to connect to Snowflake using secure Single Sign-On (SSO). With SSO enabled, your users authenticate through an external, SAML 2.0-compliant identity provider (IdP).
  - This authentication works with on-premises providers such as ADFS and cloud providers such as Azure AD and Google Workspace.
- Key Pair
  - Snowflake supports key pair authentication as a more secure alternative to basic authentication.
  - Snowflake also supports rotating public keys, allowing compliance with more robust security and governance postures.
- OAuth
  - Snowflake supports the OAuth 2.0 protocol for authentication and authorization.
  - OAuth is an open-standard protocol that allows supported clients authorized access to Snowflake without sharing or storing user login credentials.
  - This can be used with applications such as Tableau, Looker, and Alation.
- Multi-Factor Authentication (optional)
  - Snowflake supports Multi-Factor Authentication (MFA) to increase login security for users connecting to Snowflake. MFA support is provided as an integrated Snowflake feature, powered by the Duo Security service, and managed entirely by Snowflake.
  - MFA is enabled on a per-user basis; at this time, users are not automatically enrolled in MFA and must enroll themselves.
  - At a minimum, Snowflake strongly recommends that all users with the ACCOUNTADMIN role be required to use MFA.
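As a sketch, key pair authentication is attached per user with `ALTER USER`; the user name and truncated key values below are placeholders:

```sql
-- Generate the key pair outside Snowflake (e.g., with OpenSSL), then attach
-- the public key to the user. The key values here are truncated placeholders.
ALTER USER jdoe SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqhkiG9w0BAQEFAAOCA...';

-- Key rotation: stage the new key in the second slot, verify clients still
-- connect, then unset the old key.
ALTER USER jdoe SET RSA_PUBLIC_KEY_2 = 'MIIBIjANBgkqhkiG9w0BAQEFAAOCB...';
ALTER USER jdoe UNSET RSA_PUBLIC_KEY;
```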
Step 3: Implement Security Features
The existing Teradata system security can be a good starting point for setting up security within Snowflake. You should take the time to determine if there are Teradata roles and users that are no longer needed or should be implemented differently as part of your migration to Snowflake.
In Snowflake, all data at rest is always encrypted, and all data in transit is protected with Transport Layer Security (TLS, the successor protocol to SSL).
Snowflake decrypts data when it is transformed or operated on in a table and then re-encrypts it when the transformations and operations are complete.
If your organization requires further security measures, they should be implemented before any data is moved to Snowflake.
By default, Snowflake allows users to connect to the service from any computer or device IP address. A security administrator (or higher) can create a network policy to allow or deny access to a single IP address or create an explicit list of trusted or blocked addresses.
Depending on the cloud provider you choose when creating your Snowflake account, you can alternatively implement native security features such as AWS Privatelink or Azure Private Link. Either of these options will provide a direct connection between your organization and Snowflake without having to traverse the public internet.
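For example, a network policy restricting access to trusted ranges might look like this sketch (the policy name and IP ranges are illustrative):

```sql
-- Allow only the corporate ranges; explicitly block one address inside them.
CREATE NETWORK POLICY corp_only
  ALLOWED_IP_LIST = ('192.168.1.0/24', '10.0.0.0/16')
  BLOCKED_IP_LIST = ('192.168.1.99');

-- Apply the policy account-wide (requires SECURITYADMIN or higher).
ALTER ACCOUNT SET NETWORK_POLICY = corp_only;
```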
Step 4: Move the Data Model
After security and authentication have been established, you need to recreate the architecture in Snowflake: extract the Data Definition Language (DDL) scripts from the Teradata environments and run the converted versions in the new Snowflake environment.
One point to remember is that Teradata environments become Snowflake databases, and Teradata databases become Snowflake schemas. Under each Snowflake schema, there will be tables and views. You might need to update views as necessary.
You will also need to convert data types between Teradata and Snowflake, and the following list will help identify the conversions that may be required:
| Teradata Column Type | Teradata Data Type | Snowflake Conversion Notes |
|---|---|---|
| "++" | TD_ANYTYPE | TD_ANYTYPE data type isn't supported in Snowflake. |
| BO | BLOB | BLOB data type isn't directly supported but can be replaced with BINARY (limited to 8MB). |
| CO | CLOB | CLOB data type isn't directly supported but can be replaced with VARCHAR (limited to 16MB). |
| DH | INTERVAL DAY TO HOUR | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| DM | INTERVAL DAY TO MINUTE | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| DS | INTERVAL DAY TO SECOND | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| DT | DATASET | DATASET data type isn't supported in Snowflake. |
| DY | INTERVAL DAY | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| HM | INTERVAL HOUR TO MINUTE | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| HR | INTERVAL HOUR | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| HS | INTERVAL HOUR TO SECOND | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| LF | CHAR | This data type is in DBC only and can't be converted to Snowflake. |
| LV | VARCHAR | This data type is in DBC only and can't be converted to Snowflake. |
| MI | INTERVAL MINUTE | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| MO | INTERVAL MONTH | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| MS | INTERVAL MINUTE TO SECOND | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| PD | PERIOD(DATE) | Can be converted to VARCHAR or split into 2 separate dates. |
| PM | PERIOD(TIMESTAMP WITH TIME ZONE) | Can be converted to VARCHAR or split into 2 separate timestamps (TIMESTAMP_TZ). |
| PS | PERIOD(TIMESTAMP) | Can be converted to VARCHAR or split into 2 separate timestamps (TIMESTAMP_TZ). |
| PT | PERIOD(TIME) | Can be converted to VARCHAR or split into 2 separate times. |
| PZ | PERIOD(TIME WITH TIME ZONE) | Can be converted to VARCHAR or split into 2 separate times, but WITH TIME ZONE isn't supported for TIME. |
| SC | INTERVAL SECOND | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| SZ | TIMESTAMP WITH TIME ZONE | Converts to TIMESTAMP_TZ. |
| TZ | TIME WITH TIME ZONE | TIME WITH TIME ZONE isn't supported because TIME is stored using "wall clock" time only, without a time zone offset. |
| UF | CHAR | This data type is in DBC only and can't be converted to Snowflake. |
| UT | UDT | UDT data type isn't supported in Snowflake. |
| UV | VARCHAR | This data type is in DBC only and can't be converted to Snowflake. |
| YM | INTERVAL YEAR TO MONTH | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
| YR | INTERVAL YEAR | INTERVAL data types aren't supported in Snowflake, but date calculations can be done with the date comparison functions (e.g., DATEDIFF and DATEADD). |
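To illustrate a few of these conversions together, here is a hedged sketch of a hypothetical Teradata table and its Snowflake counterpart; the database, table, and column names are invented for illustration:

```sql
-- Teradata original (shown as a comment):
-- CREATE TABLE sales.orders (
--   order_id  INTEGER,
--   valid_pd  PERIOD(DATE),
--   note      CLOB
-- ) PRIMARY INDEX (order_id);

-- Snowflake version: PRIMARY INDEX is dropped, PERIOD(DATE) is split into
-- two DATE columns, and CLOB becomes VARCHAR (16 MB limit).
CREATE TABLE edw_prod.sales.orders (
  order_id       INTEGER,
  valid_from_dt  DATE,
  valid_to_dt    DATE,
  note           VARCHAR
);
```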
Step 5: Move the Existing Data Set
After your database objects are created in Snowflake, the next step is to move the historical data to Snowflake. This can be done in one of three ways:
- Using a third-party migration tool
- With an Extract Transform Load (ETL) tool (or Extract Load Transform – ELT)
- Through a manual process
The diagram below illustrates the general data loading path depending on which option is used.
It is important to consider the amount and complexity of the data you need to move when choosing a process.
For tens of terabytes up to a few petabytes, a best practice is to extract all the data to files and move it with a service such as AWS Snowball or Azure Data Box. For larger volumes than that, AWS Snowmobile or Azure Data Box Heavy is more appropriate.
To optimize Snowflake’s parallel bulk loading, the individual files to be loaded should have a file size of around 250 MB compressed, but no more than 1 GB each.
Once in a staging bucket (cloud storage location), the files can be loaded into their tables using the COPY command in Snowflake.
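A minimal sketch of that load, assuming an illustrative S3 bucket, stage, file format, and target table (credentials/storage integration are omitted for brevity):

```sql
-- Describe the extracted files and point an external stage at them.
CREATE FILE FORMAT td_csv_gz
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  COMPRESSION = GZIP;

CREATE STAGE td_extract
  URL = 's3://my-migration-bucket/teradata/'   -- illustrative bucket
  FILE_FORMAT = td_csv_gz;                     -- add a STORAGE_INTEGRATION in practice

-- Bulk load; Snowflake parallelizes across the ~250 MB files.
COPY INTO edw_prod.sales.orders
  FROM @td_extract/orders/
  PATTERN = '.*orders_.*[.]csv[.]gz';
```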
Step 6: Migrate Queries and Business Intelligence Tools
Since Snowflake uses ANSI-compliant SQL, most of your existing queries, views, and stored procedures from Teradata will execute on Snowflake without requiring changes. Teradata does, however, use some specific notations which are different in Snowflake, so there are a few differences worth noting:
| Teradata Code | Snowflake Converted Code |
|---|---|
| `TOP 3 WITH TIES *` | No direct `WITH TIES` support; typically rewritten with `QUALIFY RANK() OVER (...) <= 3` |
| `CURRENT_DATE + INTERVAL '2' DAY` | `CURRENT_DATE() + INTERVAL '2 DAY'` |
| `Sum(COALESCE(wr.wr_return_quantity, 0)) AS DEC(15, 4)` | `SUM(COALESCE(wr.wr_return_quantity, 0)) AS NUMBER(15, 4)` |
Migrating Business Intelligence (BI) tools simply involves changing their connections from Teradata to Snowflake. This should not be difficult since Snowflake supports standard Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) connections.
Native connectors for Snowflake are available for many of the mainstream BI tools, allowing a seamless connection. The examples above were taken from our in-house tool called SQL Translation, where you can convert Teradata code to Snowflake code in seconds.
You could also use it to translate scripts one at a time, and SQL Translation supports batch processing through our Application Programming Interface (API) endpoints. Head over to our phData Toolkit for a free demo of this versatile conversion tool, which has been used to great success at many well-known organizations, like this medical device manufacturer.
Step 7: Migrate the Data Pipeline and ETL Processes
Although Snowflake is optimized for an Extract Load Transform (ELT) approach, it supports many traditional ETL and data integration solutions. Because of this, simply pointing your existing pipelines at Snowflake keeps the migration process as simple as possible.
The pipelines should run in both Snowflake and Teradata during the initial migration. This allows easy validation by enabling a quick comparison between the two systems, ensuring that the migration has taken place as expected and that the cutover will be smooth and error-free. The behavior will remain the same post-cutover.
For pipelines that require reengineering, this would be a good time to modernize them to an ELT approach. Take advantage of Snowflake native tools such as Snowpipe, which can load data continuously as it arrives in your cloud storage. You can also begin to use modern ELT external tools like Fivetran and others for replication and Data Build Tool (dbt) for transformations.
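As a sketch of the Snowpipe pattern (the pipe, stage, and table names are invented for illustration):

```sql
-- Continuously load new files as they land in the external stage.
-- AUTO_INGEST relies on cloud event notifications (e.g., S3 -> SQS)
-- being configured for the stage's bucket.
CREATE PIPE sales_orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO edw_prod.sales.orders
  FROM @td_extract/orders/;
```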
Step 8: Cutting Over From Teradata to Snowflake
Now that you have migrated your data model, historical data, data loads, and reporting to Snowflake, the last step is the final cutover from Teradata to Snowflake. Here are the recommended steps for a successful cutover:
- Execute a historic, one-time load to move all the existing data.
- Set up ongoing, incremental loads to collect new data.
- Communicate the cutover to all Teradata users so they know what’s changing and what to expect.
- Ensure all development code is checked in and backed up.
- Be sure to re-point all analytical tools to pull data from Snowflake.
- Turn off the data pipeline and access to Teradata for the affected users and tools.
How the phData Toolkit Helps With Teradata Migrations
At phData, we pride ourselves on being a leader in data engineering consulting and the 2022 Snowflake Partner of the Year. We have created a free Toolkit of software to make migrations painless.
Listed below are a few different components of the phData Toolkit and how they may be able to help your team with a migration project.
Provision
The Provision tool (formerly known as Tram) automates the process of creating databases, schemas, users, and roles in Snowflake. It generates SQL scripts and runs them directly against your Snowflake account. Provisioning with the Provision tool streamlines user onboarding and maintenance, ensuring compliance with your organization's information architecture.
Additionally, it makes scaling Snowflake easier, eliminates manual errors, and saves valuable time for employees.
SQL Translation
Using SQL Translation will save time, improve accuracy, and reduce costs by eliminating the need to manually translate hundreds or thousands of SQL statements.
Data Source Automation
Data Source is a versatile tool that collects and analyzes metadata and profile metrics from data sources. It’s particularly useful for validating data migration by comparing profile metrics from tables in Teradata and Snowflake. Data Source can be run on all tables simultaneously and requires little to no code.
Privilege Audit
Privilege Audit is an application that helps users and administrators understand roles and privileges in Snowflake. It features a searchable interface to drill down into metadata, including users, roles, privileges, databases, and schemas. The app provides a table view and graph view of metadata, pre-built reports, and export options to CSV.
With Privilege Audit, you can answer common questions about data access, ownership, and usage in your account.
Platform Monitoring
Platform Monitoring is a tool that collects, evaluates, and alerts you about critical moments in your modern data platform. It simplifies monitoring of Snowflake and cloud resources with a configurable interface optimized to run multiple queries in parallel for speed.
Platform Monitoring evaluates data to generate alerts and take actions, such as sending email notifications to operators.
Find out more about the phData Toolkit here.
Customer Use Cases From phData
At phData, we have performed many Teradata to Snowflake migrations. Through these projects, we have found many of the “gotchas” along the way that have slowed progress.
Most of these caveats were found during the historical data migration step as Teradata’s proprietary nature came heavily into play.
The best tip we can provide is to utilize replication software (such as Fivetran) to migrate the historical data. One particular client asked us to build migration software of our own in Python to reduce costs, which proved to be both challenging and enlightening.
If your organization chooses to go this route, here are some of the tips/caveats of extracting data directly from Teradata to be inserted into Snowflake:
- CASE SENSITIVITY: Teradata is not case-sensitive by default, but Snowflake is. This means that EMPLOYEE, employee, and Employee are all different identifiers. You need to be careful about this to ensure that there are no broken links after the migration. One way to tackle it is to use the UPPER function to compare or convert, or you could choose another way that suits your architecture best.
- ENFORCEMENT OF PRIMARY KEYS AND FOREIGN KEYS: Teradata enforces primary and foreign key constraints. Snowflake has the syntax to define primary and foreign keys, but it does not enforce them. Thus, you have to re-design any load processes that depend on a constraint so that duplicates do not occur and orphaned records do not enter the database.
- DATE VS. TO_DATE(): In Teradata, you could prefix a string with DATE to make it a date value, but that's not the same in Snowflake. Here you use the TO_DATE() function to convert a string to a date. Snowflake does implicitly convert strings to dates where necessary, so you might not need to replace every occurrence, but a good practice is to replace DATE with TO_DATE() to ensure proper functionality.
- TERADATA PERIOD DATA TYPE: Snowflake does not have a PERIOD data type. When migrating into Snowflake, you either need to split the column into separate columns for each timestamp element or cast it to a timestamp or varchar. Make sure the data loading processes are changed accordingly.
- UPDATING DATA THROUGH A VIEW: In Teradata, you can perform CRUD operations on a view, and those updates are applied to the source tables; the same is not true in Snowflake, where the CRUD operations must be applied to the table directly. Any load tasks or processes relying on this behavior might need to be re-designed.
- UPDATE SYNTAX: In Teradata, you can put FROM inside an UPDATE statement before the SET clause, but in Snowflake, the UPDATE syntax requires FROM to come after the SET clause. You need to take care of this during the SQL statement conversions.
- DELETE ALL SYNTAX: Teradata has the option of adding “ALL” to the end of a delete statement, but the same is not true for Snowflake. This should also be taken care of during converting SQL statements from Teradata to Snowflake to ensure they work as expected.
- TERADATA-SPECIFIC SYNTAX: Teradata has some keywords for creating tables (DDL) that do not exist in Snowflake and need to be removed or spelled out as required:
- PRIMARY INDEX
- PARTITION BY
- LOCKING ROW FOR ACCESS
- SEL (must be spelled out as SELECT)
- DEL (must be spelled out as DELETE)
All occurrences of any REPLACE VIEW syntax in Teradata should be changed to CREATE OR REPLACE VIEW in Snowflake.
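Several of the caveats above can be shown side by side. The tables, columns, and values in this sketch are invented for illustration:

```sql
-- Teradata originals (as comments):
--   SEL * FROM orders WHERE order_dt = DATE '2023-01-01';
--   UPDATE o FROM orders o, returns r SET o.status = 'RETURNED'
--     WHERE o.order_id = r.order_id;
--   DEL FROM staging_orders ALL;

-- Snowflake conversions:
SELECT * FROM orders
WHERE order_dt = TO_DATE('2023-01-01');      -- SEL spelled out; DATE prefix -> TO_DATE()

UPDATE orders
SET status = 'RETURNED'
FROM returns r
WHERE orders.order_id = r.order_id;          -- FROM moves after SET

DELETE FROM staging_orders;                  -- DEL spelled out; trailing ALL dropped

SELECT * FROM customers
WHERE UPPER(cust_name) = UPPER('Acme Corp'); -- normalize case to avoid broken matches
```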
After going through the article, you should have a clearer idea of how to migrate data from Teradata to Snowflake. Moving forward in the modern world of data is crucial, and the data landscape is ever-changing. Teradata was once a pioneer of data warehousing, but it’s time to move towards a cloud-based, auto-scaling modern data warehouse tool like Snowflake.
Data migration might not be easy, but the experts at phData will make it feel like a walk in the park.
Contact phData for all your Snowflake migration needs; we’re here to help!