May 7, 2024

Exploring Snowflake Hybrid Tables

By Justin Delisi

Everyone knows Snowflake Data Cloud is a powerhouse for analytical use cases, but could it also handle your transactional data? Snowflake has an answer for that: Hybrid Tables. 

In this blog, we will look at Hybrid tables, how they stack against standard Snowflake tables, and how you can begin using Snowflake as your operational and analytical database.

What are Hybrid Tables?

Unlike standard Snowflake tables, built primarily for analytical use cases, Hybrid tables were created to handle lightweight transactional use cases like serving data or storing an app’s state.

Hybrid tables enforce unique constraints for required primary keys and include indexes to retrieve data faster. Enforcing the primary keys also allows the use of referential integrity constraints to better define the relationship between primary and foreign keys. Hybrid tables can be used alongside standard tables to combine transactional and analytical data to create deeper insights into your business.

Hybrid Table Features and Limitations

Hybrid tables come with many features, not standard Snowflake tables, but they also have many limitations.

Features

Row-oriented Data Layout with Secondary Columnar Storage

Hybrid tables are built with a row-oriented layout to provide better operational query performance. However, Snowflake will cache some of the data into its normal columnar format storage for use in analytical queries. The best part is that Snowflake’s Query Optimizer will choose which storage to use based on the query, optimizing performance so you don’t have to.

Row-level Locking

Unlike standard tables, which lock tables by partition or table when a query is being run, Hybrid tables have row-level locking. This means that only the rows accessed by a transaction are locked, allowing several transactions to take place at the same time (as long as the transactions are using different rows)

Constraints

Standard tables can have primary keys, foreign keys, and unique constraints in Snowflake, but they aren’t actually enforced. With Hybrid tables, all 3 are enforced, with primary keys required for each table, ensuring referential integrity. 

Index

The table’s primary key orders the data in Hybrid tableskey, but secondary indexes can be added to speed up lookups on non-primary columns. These are especially useful with attributes used in comparisons in queries to reduce the number of records to be scanned.

Unsupported Snowflake Features

Because hybrid tables are so vastly different from standard Snowflake tables, there are several features as of the writing of this blog that are not supported:

These features will likely be added to hybrid tables as Snowflake works to perfect the new table type.

Limitations

Currently, in public preview, Hybrid tables are new to Snowflake and have some limitations that perfect them:

  • Hybrid tables are only available for accounts created in Amazon Web Services (AWS) in certain regions.

  • Snowflake limits you to 500GB of data in Hybrid tables per account.

  • Constraints cannot be defined after the table has been created.

  • Constraints are not enforced at the statement or transaction level.

  • Hybrid tables can’t be used if your account has period rekeying enabled.

  • The SQL API is not supported for hybrid tables. Consequently, if you try using the SQL API, it will produce errors or unexpected results for queries on Hybrid tables.

How to Use Hybrid Tables

Like most features in Snowflake, creating Hybrid tables is simple. The create statement is like any standard Snowflake table, with the added HYBRID. Of course, the table needs to be created with a primary key. Note that when creating a Hybrid table, a warehouse is specified in the current session. 

Here’s an example:

				
					CREATE OR REPLACE HYBRID TABLE PHDATA_HYBRID_TABLE (
 PHDATA_ID NUMBER PRIMARY KEY AUTOINCREMENT START 1 INCREMENT 1,
 COLUMN_ONE VARCHAR NOT NULL,
 COLUMN_TWO VARCHAR NOT NULL
);
CREATE OR REPLACE INDEX PHDATA_INDEX(COLUMN_ONE) ON PHDATA_HYBRID_TABLE;
				
			

Alternatively, Hybrid tables can be created using the CREATE HYBRID TABLE … AS SELECT or CREATE HYBRID TABLE…  LIKE syntax. 

Indexes are also simple to add and drop from Hybrid tables. They can either be added at table creation or later like this:

				
					CREATE OR REPLACE INDEX PHDATA_INDEX(COLUMN_ONE) ON PHDATA_HYBRID_TABLE;
DROP INDEX PHDATA_INDEX ON PHDATA_HYBRID_TABLE;
				
			

All other queries are completed as you usually would on a standard table.

Optimizing Hybrid Tables

With Hybrid and standard tables being so different, they require different optimization techniques:

Warehouses

Snowflake recommends using an XS (extra small) warehouse for use on Hybrid table queries as this is sufficient for small transactional queries. However, as it is more likely that several transactions will take place at once, it’s advantageous to use multi-cluster warehouses to increase the node count and performance. 

DML

It is recommended to use INSERT, UPDATE, or DELETE statements instead of MERGE for small numbers of rows, as this will optimize performance.

Data Loading

Hybrid tables should be loaded using CREATE TABLE… AS SELECT whenever possible, as it is optimized for performance and is about 10 times faster than other commands. COPY or INSERT INTO… SELECT is also supported, but data loading with these commands will be slower with large amounts of data.

Understanding Hybrid Table Costs

Like any other cloud service, it’s best to know how you’re being charged before it comes back to bite you. Snowflake’s Hybrid tables are priced a little differently than standard tables. With Hybrid tables, there are 3 modes of consumption:

Storage

Hybrid table storage costs are calculated as a flat rate per GB depending on the region in which the account is based. Hybrid table storage is more expensive than standard table storage.

Virtual Warehouse Compute

Using a virtual warehouse to query Hybrid tables consumes the same credits as when querying standard tables:

Table Requests

Hybrid tables consume additional credits on the underlying row storage clusters because they are serverless resources. The consumption of these types of resources in Snowflake is based on the amount of data being read or written to the clusters. Because hybrid tables store data in pages, even small read or write operations incur a minimum of 4 KB of hybrid table request usage.

Closing

Hybrid tables can provide a distinct advantage to Snowflake users who want their analytical and transactional data in one place. If you’re looking for an all-in-one data management solution, Snowflake has added Hybrid tables!

For additional information on Snowflake Hybrid Tables, reach out to our team of experts.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit