Nowadays, data is plentiful, and it is an almost magical feeling to have all of it at our fingertips. Yet such a plethora of data raises a number of questions:
How accurate is the data I am querying?
If I make changes to this table, will I break something else?
Who should have access to sensitive data?
How can my analysts discover where data is located?
All of these questions fall under a concept known as data governance. The Snowflake AI Data Cloud offers an entire suite of features called Horizon, which tackles all of these questions and more.
In this blog, we will explain what Horizon is, what features it includes, how you can use it, and how phData can help along the way.
What is Snowflake Horizon?
Horizon consists of a suite of data governance tools from Snowflake, which help users locate the data they need and ensure it is up-to-date, accurate, and compliant with regulatory standards. Horizon also facilitates secure and controlled data sharing for collaborative data product development.
Horizon addresses key aspects of data governance, including:
Compliance
Security
Access
Privacy
Interoperability
Throughout the remainder of this blog, we will dive deeper into each of the above components and take a look at the ways in which Horizon can help. We will begin with compliance.
Compliance
Data Quality
Snowflake handles data quality monitoring through Data Metric Functions (DMFs), which are tools for measuring the quality of your data in tables and views. They come in two flavors: system DMFs (provided by Snowflake) and user-defined DMFs that you can create yourself.
System DMFs provide a way to quantify various aspects of your data, such as:
Freshness: How recent is my data?
Volume: How many rows does my data contain?
Accuracy: Are there any null or blank values?
Uniqueness: How many distinct or duplicate values are in a column?
Statistics: What are the minimum, maximum, and average values?
User-defined DMFs can be customized to create metrics that meet your specific needs, and all DMFs can be scheduled to run automatically, so you do not have to manually run data quality checks.
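To make these metric categories more concrete, here is a minimal Python sketch of the kind of checks a DMF performs. It is not Snowflake's implementation or API: the sample rows, column names, and function names are all hypothetical, and the metric definitions are simplified assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample rows; the column names are illustrative only.
ROWS = [
    {"order_id": 1, "email": "a@example.com", "loaded_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"order_id": 2, "email": None, "loaded_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"order_id": 2, "email": "b@example.com", "loaded_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]


def row_count(rows):
    """Volume: how many rows does the data contain?"""
    return len(rows)


def null_count(rows, column):
    """Accuracy: how many values in the column are missing?"""
    return sum(1 for row in rows if row[column] is None)


def duplicate_count(rows, column):
    """Uniqueness: how many non-null values repeat an earlier value?"""
    values = [row[column] for row in rows if row[column] is not None]
    return len(values) - len(set(values))


def freshness(rows, column, now):
    """Freshness: time elapsed since the most recent timestamp."""
    return now - max(row[column] for row in rows)


# A fixed "current time" keeps the example deterministic.
now = datetime(2024, 1, 4, tzinfo=timezone.utc)
metrics = {
    "row_count": row_count(ROWS),
    "null_count(email)": null_count(ROWS, "email"),
    "duplicate_count(order_id)": duplicate_count(ROWS, "order_id"),
    "freshness(loaded_at)": freshness(ROWS, "loaded_at", now),
}
for name, value in metrics.items():
    print(f"{name}: {value}")
```

In Snowflake itself, checks like these are attached to a table or view and run on a schedule rather than being called by hand.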
Data Lineage
The Snowsight UI includes lineage visualizations for tables, views, and machine learning (ML) assets, giving users a bird’s-eye view of upstream and downstream object lineage. Users can see which downstream objects will be impacted when a table or view changes, and they can trace feature and ML model lineage end to end.
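Conceptually, answering "what will I break if I change this table?" is a graph traversal. The sketch below illustrates that idea in plain Python, assuming a hypothetical set of object names and dependency edges. It does not show how Snowsight computes or stores lineage.

```python
from collections import deque

# Hypothetical lineage edges: each object maps to the objects built directly from it.
DOWNSTREAM = {
    "raw.orders": ["staging.orders_clean"],
    "staging.orders_clean": ["analytics.daily_revenue", "ml.churn_features"],
    "ml.churn_features": ["ml.churn_model"],
    "analytics.daily_revenue": [],
    "ml.churn_model": [],
}


def impacted_objects(graph, changed):
    """Return every object reachable downstream of `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(impacted_objects(DOWNSTREAM, "raw.orders"))
# → ['staging.orders_clean', 'analytics.daily_revenue', 'ml.churn_features', 'ml.churn_model']
```

In this toy graph, a change to the raw table reaches both a reporting view and an ML model, which is exactly the kind of impact a lineage view is meant to surface before the change is made.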
Security
Encryption
As a cloud-based data platform, Snowflake has always prioritized customer data security. One method by which Snowflake implements security best practices is end-to-end encryption (E2EE), which prevents third parties from reading data at rest or data in transit to and from Snowflake, minimizing the attack surface.
Access Control
In Snowflake, access privileges are assigned to roles, which are, in turn, assigned to users. This is known as role-based access control (RBAC). RBAC provides greater flexibility and control over users and their data access privileges. Within Snowflake, RBAC can be applied across data, apps, and models.
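The privileges-to-roles-to-users chain can be sketched in a few lines of Python. This is a simplified illustration with hypothetical role, user, and object names. It leaves out details such as role hierarchies, and it is not Snowflake's implementation.

```python
# Hypothetical roles, privileges, and users; none of these names come from Snowflake.
ROLE_PRIVILEGES = {
    "analyst": {("SELECT", "sales.orders")},
    "engineer": {("SELECT", "sales.orders"), ("INSERT", "sales.orders")},
}
USER_ROLES = {
    "maria": {"analyst"},
    "dev": {"analyst", "engineer"},
}


def is_allowed(user, privilege, obj):
    """A user may act only if at least one of their roles holds the privilege."""
    return any(
        (privilege, obj) in ROLE_PRIVILEGES.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("maria", "SELECT", "sales.orders"))  # → True
print(is_allowed("maria", "INSERT", "sales.orders"))  # → False
print(is_allowed("dev", "INSERT", "sales.orders"))    # → True
```

Because privileges never attach directly to a user, changing what an analyst can do means editing one role rather than every analyst's account.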
Trust Center
The Trust Center is a new feature in Snowflake that monitors an account for security risks. Background processes scan the account for risks based on its configuration, and the scan results are evaluated against Snowflake’s security recommendations. If an account violates any of the recommendations, the Trust Center UI displays the violations along with suggested mitigation strategies.
Access Policies
Snowflake offers many granular policies to protect database assets, including:
Row Access Policies: Filter unauthorized rows at query time.
Tag-Based Masking Policies: Mask columns automatically based on tags.
Conditional Masking: Mask data based on values in other column(s).
Dynamic Masking: Dynamically mask columns of data at query time.
These policies can be created for an individual table or across the whole account.
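To show how row filtering and column masking combine at query time, here is a small Python sketch. It only models the concept: the roles, regions, and masking format are hypothetical, and real policies are defined in Snowflake as policy objects rather than in application code.

```python
# Hypothetical table contents.
ROWS = [
    {"region": "EMEA", "customer": "Ada", "ssn": "123-45-6789"},
    {"region": "APAC", "customer": "Lin", "ssn": "987-65-4321"},
]

# Hypothetical mapping of roles to the regions they are allowed to see.
ROLE_REGIONS = {"emea_analyst": {"EMEA"}, "global_admin": {"EMEA", "APAC"}}


def row_access_policy(role, row):
    """Row access: keep a row only if the role is authorized for its region."""
    return row["region"] in ROLE_REGIONS.get(role, set())


def mask_ssn(role, value):
    """Dynamic masking: only the admin role sees the full value."""
    return value if role == "global_admin" else "***-**-" + value[-4:]


def run_query(role, rows):
    """Apply both policies at query time; the stored rows are never modified."""
    return [
        {**row, "ssn": mask_ssn(role, row["ssn"])}
        for row in rows
        if row_access_policy(role, row)
    ]


print(run_query("emea_analyst", ROWS))
# → [{'region': 'EMEA', 'customer': 'Ada', 'ssn': '***-**-6789'}]
```

The same stored data yields different results for different roles, which is the key property of policies evaluated at query time.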
Privacy
Shared Data Policies
One of the core features of Snowflake is the platform’s ability to easily share data between external and disparate entities. However, an organization may not want to share its lowest-granularity data. To control the types of queries that can be run against shared data, Snowflake has created two policies:
Aggregation Policies: Require data consumers to run queries that aggregate data rather than retrieve individual rows.
Projection Policies: Protect sensitive data by preventing data consumers from returning identifier columns, like names and phone numbers, in their query results, while maintaining their ability to match records based on a particular value.
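As a rough illustration of the aggregation policy idea, the Python sketch below only ever returns grouped averages and hides any group that is too small to protect individuals. The minimum group size, the sample data, and the decision to drop small groups are assumptions made for this example, not a description of Snowflake's exact behavior.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 2  # hypothetical threshold chosen for this example

# Hypothetical shared data.
SHARED_ROWS = [
    {"city": "Austin", "spend": 10},
    {"city": "Austin", "spend": 20},
    {"city": "Boise", "spend": 30},
]


def aggregate_only(rows, group_by, value):
    """Return per-group averages, omitting groups smaller than MIN_GROUP_SIZE."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value])
    return {
        key: sum(values) / len(values)
        for key, values in groups.items()
        if len(values) >= MIN_GROUP_SIZE
    }


print(aggregate_only(SHARED_ROWS, "city", "spend"))
# → {'Austin': 15.0}
```

The single Boise row never appears in the output, because an "average" over one row would simply reveal that row.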
Data Clean Rooms
Another key feature for Snowflake data sharing is Data Clean Rooms. Data Clean Rooms allow users to combine and analyze data from different entities without worrying about the privacy concerns associated with sharing raw data. Without having to expose their raw data, users are able to collaborate and share worry-free.
Data Clean Rooms also include privacy-enhancing techniques. For instance, differential privacy adds noise to query results to prevent the exposure of Personally Identifiable Information (PII), and multi-party computations can run directly on encrypted data.
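To give a feel for how noise can protect individuals while keeping results useful, here is a minimal sketch of the Laplace mechanism, a common textbook approach to differential privacy. We are assuming this mechanism purely for illustration, since the specifics of what Snowflake uses are not covered here. The function names and the epsilon value are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng):
    """Noisy count. A count changes by at most 1 per person, so the scale is 1 / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(42)  # seeded only so the example is repeatable
ages = [23, 35, 41, 52, 67, 29, 38]
noisy = private_count(ages, lambda age: age > 30, epsilon=1.0, rng=rng)
print(f"noisy count of people over 30: {noisy:.2f}")
```

The true count here is 5, but each query returns a slightly different value around it. A smaller epsilon adds more noise and therefore more privacy, at the cost of accuracy.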
Access
Object Insights
Currently in private preview, the Governance tab within both tables and views in the Snowsight UI provides insights into objects without the use of SQL. Users can specify sensitive columns, view the most repeated queries on a table, understand the most frequent table users, and visualize table lineage.
Object Tagging
Tags are schema-level objects that allow data stewards to monitor sensitive data for compliance, protection, or discovery. To locate sensitive data, tags can be applied to tables, views, and even columns. As a result, data stewards can query for the tags themselves and create tag-based masking policies to discover sensitive data and specify how users should query the data.
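The two uses of tags described above, discovery and tag-based masking, can be sketched in Python as follows. All table names, tag names, and masking rules here are hypothetical, and the sketch only models the idea rather than Snowflake's tag or policy objects.

```python
# Hypothetical tag assignments: (table, column) -> {tag name: tag value}.
COLUMN_TAGS = {
    ("crm.customers", "email"): {"pii": "email"},
    ("crm.customers", "signup_date"): {},
    ("hr.employees", "ssn"): {"pii": "national_id"},
}

# Hypothetical masking rules keyed by the value of the "pii" tag.
MASKS_BY_TAG_VALUE = {
    "email": lambda v: "***@" + v.split("@")[-1],
    "national_id": lambda v: "***-**-" + v[-4:],
}


def find_tagged(tag):
    """Discovery: list every column that carries the given tag."""
    return sorted(col for col, tags in COLUMN_TAGS.items() if tag in tags)


def mask_value(table, column, value):
    """Tag-based masking: the column's tag value decides which mask applies, if any."""
    tag_value = COLUMN_TAGS.get((table, column), {}).get("pii")
    mask = MASKS_BY_TAG_VALUE.get(tag_value)
    return mask(value) if mask else value


print(find_tagged("pii"))
# → [('crm.customers', 'email'), ('hr.employees', 'ssn')]
print(mask_value("crm.customers", "email", "ada@example.com"))
# → ***@example.com
```

Tagging a new column is enough to bring it under the matching mask, so no per-column masking rule has to be written.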
Snowflake has created a new Governance Dashboard, currently in private preview, where users can see all of their tags, data masking, and row access policies.
Interoperability
Govern Iceberg Tables
Many customers utilize Iceberg Tables for external cloud storage outside of Snowflake. Data storage flexibility is beneficial for customers who cannot (or choose not to) store their data lakes in Snowflake. Supplementing this flexibility is Snowflake’s integration between Horizon and the Polaris Catalog, allowing for governance of Iceberg tables for all REST-compatible engines.
Even if the tables were not created in Snowflake, this integration allows for granular access policy creation and tagging.
Governance Partner Ecosystem
Snowflake is a leader in both its partner network and governance. If Horizon’s built-in tools are not enough, several of the security, data lineage, and observability providers with whom Snowflake works have pre-built extensions for quick and easy account connectivity.
phData Advisor Tool
Horizon adds a wealth of useful features to Snowflake. Because knowing how and when to use these tools can be daunting, the Snowflake experts at phData created the Advisor Tool, an application that identifies areas of opportunity to improve the configuration, security, performance, and efficiency of a Snowflake environment.
The Advisor Tool generates a “report card” that depicts the health of the target Snowflake environment and includes the following outputs:
Configuration Best Practices: Recommendations for your Snowflake environment configuration based on best practices and industry standards.
Security Enhancements: Identify and fix potential security risks in your account and maintain a compliant security posture going forward.
Operational Risks: Identify operational risks, such as data loss or failures, in the event of an unforeseen outage or disaster.
Performance Optimization: Identify and fix bottlenecks in your data pipelines to maximize your Snowflake investment.
Resource Utilization: Optimize the use of resources, such as compute and storage, to ensure that you are not overpaying for resources you do not need.
Environment Scale and Capacity: View critical metrics about the scale and features used in your account to help you understand your current state and plan for growth.
All of the above criteria are then represented in an interactive dashboard for users to review.
Incorporating the Advisor Tool into your data ecosystem is akin to having a Snowflake expert constantly monitor your account. Best of all, this application is free for all phData customers!
In Conclusion
Building a robust data governance program that instills confidence in your data’s accuracy, safety, and effective utilization is worth pursuing, and Snowflake Horizon can help you achieve this ideal state.
If you need help utilizing Snowflake Horizon or could use some data governance pointers and best practices, phData can help! For added guidance, check out our comprehensive guide to Building Data Governance in Snowflake.