Managing data pipelines efficiently is paramount for any organization. The Snowflake Data Cloud has introduced a groundbreaking feature that promises to simplify and supercharge this process: Snowflake Dynamic Tables.
These dynamic tables are not just another table type; they represent a game-changing approach to data pipeline development and management.
In this blog, we’ll dive into what Snowflake Dynamic Tables are, how they work, the benefits they offer, and relatable use cases for them. From real-time streaming to batch processing and beyond, these tables offer a new level of flexibility and efficiency for data teams.
What are Snowflake Dynamic Tables?
Snowflake Dynamic Tables are a new table type that enables data teams to build and manage data pipelines with simple SQL statements. Dynamic tables are automatically refreshed as the underlying data changes, only operating on new changes since the last refresh. The scheduling and orchestration needed to achieve this is also transparently managed by Snowflake.
Listed below is a simple example of creating a dynamic table in Snowflake with a refresh lag of five minutes:
```sql
CREATE OR REPLACE DYNAMIC TABLE PRODUCT
  TARGET_LAG = '5 MINUTES'
  WAREHOUSE = INGEST_WH
AS
SELECT PRODUCT_ID
  ,PRODUCT_NAME
  ,PRODUCT_DESC
FROM STG_PRODUCT;
```
What are the Advantages of Using Dynamic Tables?
There are several advantages of using dynamic tables, including:
Simplicity: Dynamic tables allow users to declaratively define the result of their data pipelines using simple SQL statements. This eliminates the need to define data transformation steps as a series of tasks and then monitor dependencies and scheduling, making it easier to manage complex pipelines.
Automation: Dynamic tables materialize the results of a query that you specify. Instead of creating a separate target table and writing code to transform and update the data in that table, you can define the target table as a dynamic table, and you can specify the SQL statement that performs the transformation. An automated process updates the materialized results automatically through regular refreshes.
Cost-Effectiveness: Dynamic tables provide a reliable, cost-effective, and automated way to transform data for consumption. They eliminate the need for manual updates, saving time and effort.
Flexibility: Dynamic tables allow batch and streaming pipelines to be specified in the same way. Traditionally, the tools for batch and streaming pipelines have been distinct, and as such, data engineers have had to create and manage parallel infrastructures to leverage the benefits of batch data while still delivering low-latency streaming products for real-time use cases.
What are Some Use Cases for Dynamic Tables?
Real-Time Data Streaming
One scenario to utilize Snowflake Dynamic Tables is real-time data streaming. Data streaming enables the collection and processing of data from multiple data sources in real time to derive insights and meaning from it. This means you can analyze data and act upon it as soon as it’s generated, allowing for faster and more informed decision-making.
Streaming data has historically posed challenges due to the separation of streaming and batch architectures, resulting in dual systems to manage, increased overhead, and more potential failure points. Integrating batch and streaming data adds pipelining complexity and latency.
Additionally, previous-generation streaming systems have steep learning curves, limiting accessibility. Inefficient processing increases costs, hindering scalability and often leading to projects remaining at the proof of concept stage. Moreover, relying on multiple vendors in the critical path can compromise governance and security.
Thanks to Snowflake Dynamic Tables, customers can combine simple, ubiquitous SQL with powerful stream processing capabilities, opening streaming use cases to many more teams without requiring stream processing expertise in Spark, Flink, or other streaming systems.
Additionally, dynamic tables automatically apply incremental updates for both batch and streaming data, removing additional logic traditionally needed for incremental updates.
Finally, with dynamic tables, you can use the lag parameter, which sets your objective for data freshness in your complex pipelines. With a simple ALTER statement, you can switch your pipeline from delivering data freshness of, say, six hours to 60 seconds, with no rework required for your pipeline or its dependencies.
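For instance, changing the freshness objective for the PRODUCT table created earlier is a single statement:

```sql
-- Tighten the freshness objective to 60 seconds;
-- no other changes to the pipeline are required.
ALTER DYNAMIC TABLE PRODUCT SET TARGET_LAG = '60 seconds';
```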
These features of Snowflake Dynamic Tables significantly reduce design complexity and reduce compute costs by eliminating inefficient and inflexible processes for streaming data.
Change Data Capture (CDC)
Change Data Capture (CDC) is a technique used in data management to identify and capture changes in data over time. It records modifications, inserts, and deletions in a database, enabling real-time or near-real-time tracking of data changes.
CDC is crucial for various applications, including maintaining data integrity, auditing, replication, and real-time analytics. CDC processes can become complex though, which is where Snowflake Dynamic Tables can be of value.
Dynamic tables provide a streamlined and efficient mechanism for capturing and processing changes in data. First, dynamic tables automatically capture changes as they happen in the source data, eliminating the need for complex, manual CDC processes. This ensures that changes are immediately available for analysis, reporting, and transformation, allowing data teams to work with the most up-to-date information seamlessly.
Second, Snowflake’s Dynamic Tables employ an auto-refresh mechanism. As the underlying data evolves, these tables are automatically refreshed, and only the new changes since the last refresh are processed. This approach significantly reduces the computational load compared to traditional CDC methods, where entire datasets might need to be scanned and compared.
Third, dynamic tables simplify the setup and management of CDC through declarative SQL, ensuring that the CDC process can be defined and maintained in a straightforward and intuitive manner.
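As a sketch of this declarative style (the table and column names here are hypothetical), a dynamic table can reduce a raw change feed to the latest version of each record:

```sql
CREATE OR REPLACE DYNAMIC TABLE CUSTOMER_CURRENT
  TARGET_LAG = '1 MINUTE'
  WAREHOUSE = INGEST_WH
AS
SELECT CUSTOMER_ID
  ,CUSTOMER_NAME
  ,UPDATED_AT
FROM RAW_CUSTOMER_CHANGES
-- Keep only the most recent change per customer
QUALIFY ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY UPDATED_AT DESC) = 1;
```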
The ability to seamlessly integrate historical and real-time data, coupled with Snowflake’s scalability and performance capabilities, makes dynamic tables a powerful tool for organizations looking to implement robust and efficient CDC processes.
Data Vault Modeling
Data vault modeling is a hybrid approach that combines traditional relational data warehouse models with newer big data architectures to build a data warehouse for enterprise-scale analytics. Data Vault is an insert-only data modeling pattern; therefore, updates and deletes to source data are not required.
While Snowflake Dynamic Tables do not support append-only data processing, they act like materialized views that update automatically as the underlying data changes, which makes them well suited to the information mart layer in a Data Vault architecture.
Snowflake Dynamic Tables can be harnessed effectively to create point-in-time tables, which capture the historical state of data at specific moments. To achieve this, data engineers can create a dynamic table and set up an automated refresh mechanism based on a timestamp or a versioning field within the source data.
As the source data evolves, the dynamic table is refreshed regularly, and it captures the changes while maintaining previous versions. By querying this dynamic table with a timestamp or version parameter, users can access historical data snapshots, effectively creating point-in-time tables that reflect the data at specific points in the past. This is particularly valuable for auditing, compliance, and analytical scenarios where historical data is crucial for analysis and reporting.
Furthermore, Snowflake’s Time Travel feature can complement dynamic tables for point-in-time analysis. Users can leverage time-travel functions to access historical data directly from their main tables, eliminating the need to maintain separate point-in-time tables. This offers a flexible and efficient way to achieve point-in-time analysis in Snowflake, allowing users to query the database as it existed at various moments in the past without additional table maintenance.
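For example, assuming Time Travel is enabled and the data falls within the retention period, a table can be queried as it existed at an earlier moment:

```sql
-- Read the table as of one hour ago (offset is in seconds)
SELECT *
FROM DWH.DV_CUSTOMER_ORDER_PIT
AT(OFFSET => -3600);
```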
By combining Snowflake Dynamic Tables and Time Travel features, organizations can create a powerful system for capturing and analyzing historical data, simplifying the management of point-in-time tables and providing valuable insights for a range of use cases.
Listed below is an example of a dynamic table that builds a point-in-time structure for customer orders:

```sql
CREATE OR REPLACE DYNAMIC TABLE DWH.DV_CUSTOMER_ORDER_PIT
  TARGET_LAG = '1 MINUTE'
  WAREHOUSE = VAULT_WH
AS
SELECT CO.CUSTOMER_KEY
  ,CO.ORDER_KEY
  ,CO.ORDER_DATE
  ,CO.ORDER_TOTAL
  ,ROW_NUMBER() OVER (PARTITION BY CO.CUSTOMER_KEY ORDER BY CO.ORDER_DATE DESC) AS ROW_NUM
FROM DWH.STAGING_CUSTOMER_ORDER CO
JOIN DWH.DV_CUSTOMER C
  ON CO.CUSTOMER_KEY = C.CUSTOMER_KEY;
```
Tips and Notes about Dynamic Tables
Finally, here are a few quick tips and notes about dynamic tables:
Optimize your query: The query that defines a dynamic table is executed every time the table is refreshed. Therefore, it is important to optimize your query to improve performance. Techniques such as filtering early, selecting only the columns you need, and pre-aggregating data can all improve the performance of your query.
Monitor your tables: It is important to monitor your dynamic tables to ensure they refresh properly. You can use the Snowflake web UI or the Snowflake API to monitor the status of your tables.
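For example, refresh behavior can be inspected with built-in commands (the column selection below is illustrative):

```sql
-- List dynamic tables along with their scheduling state
SHOW DYNAMIC TABLES;

-- Review recent refreshes, including any failures
SELECT name, state, refresh_start_time, refresh_end_time
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
ORDER BY refresh_start_time DESC;
```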
Use fine-grained privileges for governance: Fine-grained privileges allow more control over who can manipulate business-critical pipelines. You can apply row-based access policies or column masking policies on dynamic tables or sources to maintain a high bar for security and governance.
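For instance, access to a dynamic table can be limited with standard grants (the role names here are illustrative):

```sql
-- Analysts may query the results but not alter the pipeline
GRANT SELECT ON DYNAMIC TABLE PRODUCT TO ROLE ANALYST_ROLE;

-- The operations role may observe refresh behavior
GRANT MONITOR ON DYNAMIC TABLE PRODUCT TO ROLE PIPELINE_OPS;
```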
Monitor your credit spend: With any kind of automated process in Snowflake, you’ll want to monitor the amount of credits it’s using. This applies to dynamic tables as well. There are built-in services that can be used to monitor costs in Snowflake, as well as third-party tools such as DataDog.
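As one built-in option, the ACCOUNT_USAGE views can report credit consumption for the warehouse that powers the refreshes:

```sql
-- Credits consumed over the last 7 days, per warehouse
SELECT warehouse_name
  ,SUM(credits_used) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```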
Snowflake Dynamic Tables have redefined the landscape of data pipeline development and management, offering a versatile and powerful solution that meets the evolving needs of today’s data-driven organizations. Their seamless integration of real-time streaming and batch processing, combined with their automated change data capture capabilities, brings a level of agility and efficiency that was previously elusive.
If you’re looking to leverage the full power of the Snowflake Data Cloud, let phData be your guide. As Snowflake’s 2023 Partner of the Year, phData has unmatched experience with Snowflake migrations, platform management, automation needs, and machine learning foundations. Reach out today for advice, guidance, and best practices!
How are Dynamic Tables Different from Materialized Views?
In Snowflake, dynamic tables and materialized views are both used to materialize the results of a query. However, there are some key differences between the two. Materialized views are designed to improve query performance transparently by automatically rewriting queries to use the materialized view instead of the base table. In contrast, dynamic tables are designed to build multi-level data pipelines and transform streaming data in a data pipeline. While dynamic tables can improve query performance, the query optimizer in Snowflake does not automatically rewrite queries to use dynamic tables. Another difference is that a materialized view can only use a single base table, while a dynamic table can be based on a complex query that can include joins and unions.
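To illustrate that last point (the object names here are hypothetical): a materialized view is defined over a single base table, while a dynamic table may join several sources:

```sql
-- Materialized view: single base table only
CREATE MATERIALIZED VIEW MV_PRODUCT AS
SELECT PRODUCT_ID, PRODUCT_NAME
FROM STG_PRODUCT;

-- Dynamic table: joins and more complex queries are allowed
CREATE DYNAMIC TABLE DT_PRODUCT_SALES
  TARGET_LAG = '10 MINUTES'
  WAREHOUSE = INGEST_WH
AS
SELECT P.PRODUCT_ID
  ,P.PRODUCT_NAME
  ,S.SALE_AMOUNT
FROM STG_PRODUCT P
JOIN STG_SALES S
  ON P.PRODUCT_ID = S.PRODUCT_ID;
```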
Are There Any Limitations to Dynamic Tables?
Yes, there are some limitations to dynamic tables, including certain query constructs and functions that are not supported, such as:
Most non-deterministic functions
Sources that include shared tables, external tables, streams, and materialized views
Views on dynamic tables or other unsupported objects