Data governance is the key to ensuring your data is easily accessible, consistent, and protected. Effective data governance includes a wide range of principles and practices which may overlap with availability, usability, integrity, security, and compliance.
The Snowflake Data Cloud provides multiple features to enable effective governance for an account and its data. Snowflake’s data governance features can be categorized as shown in the below table.
In this blog, we will focus on one of Snowflake’s key data governance features, Object Tagging, and explore the concepts and advantages of using tags.
What is Object Tagging?
Object Tagging enables governance operations such as data classification, tracking sensitive data, resource usage monitoring, and protecting the data. Object tagging allows us to assign metadata to Snowflake objects in order to categorize a group of objects or describe the type of data stored in a table or column.
What is a Tag?
A tag is a schema-level object in Snowflake, which is created once and then applied to many different objects as needed. Therefore, tagging a Snowflake object is a two-step process:
Tag Definition (CREATE TAG)
Defining a tag is creating a tag with a name, an optional comment, and an optional allowed values attribute.
Example tag definitions:
Tag Assignment (APPLY TAG)
Tag assignment is applying a tag to Snowflake objects with an arbitrary string value that can classify Snowflake objects or data.
In the tag assignments example below, department and region tags are assigned to a database and a warehouse. security_class tag is assigned to a table.
Benefits of Using Tags
Tags act as the baseline for the data governance process by providing means to track sensitive data, classify a group of objects, resource usage monitoring, and enable data masking and row-level security. Some of these processes, such as tracking and masking sensitive data can be automated using tags. The below diagram depicts the benefits of using object tagging.
Object Tagging Characteristics
At first glance, tagging may appear straightforward. However, below are the important characteristics of object tagging to consider before implementing object tagging in your Snowflake account.
A tag must be defined before it can be applied to other Snowflake objects. Tags are created as schema-level objects and live in the schema where they are created. Tags can be assigned to any supported objects in Snowflake, including the database and schema where the tag is created.
An optional allowed values attribute can be specified to the tag definition with a maximum of 50 possible values. These values can be updated as needed.
Tags can be created under any database/schema, but they must be unique within the schema. A role with CREATE TAG privilege is required to create tags on the respective schema.
This example creates a tag admin role and grants CREATE TAG and other required privileges:
USE ROLE USERADMIN; CREATE ROLE tag_admin; USE ROLE ACCOUNTADMIN; GRANT CREATE TAG on SCHEMA to ROLE tag_admin; # grant usage access on database and schema to tag_admin role where the tags are created USE ROLE SYSADMIN; GRANT USAGE on DATABASE to ROLE tag_admin; GRANT USAGE on SCHEMA to ROLE tag_admin;
This example creates a tag definition using tag_admin role:
# Assume tag_admin role is assigned to executing user # assume the tags are created in database named GOVERNANCE and schema named TAG_LIBRARY USE ROLE tag_admin; USE SCHEMA GOVERNANCE.TAG_LIBRARY; CREATE TAG department ALLOWED_VALUES 'sales','finance' COMMENT='department tag';
Once a tag is created, it can be applied to as many different Snowflake objects with the same or different string values. If the allowed values attribute is defined, the tag value must be one of the possible string values listed in the tag allowed values. Tags can be applied to all supported objects while creating the objects or the objects can be altered to apply the tag.
A maximum of 20 unique tags can be assigned to one Snowflake object and the assigned string value is limited to 256 characters.
A role with APPLY TAG privilege is required to apply tags on Snowflake objects. The below example grants the APPLY TAG on account as a global privilege and for a single tag, respectively:
# global privilege - tag_admin role can apply tags on account USE ROLE ACCOUNTADMIN; GRANT APPLY TAG on ACCOUNT to ROLE tag_admin; # grant privileges to apply a single tag # assume the tag cost_center tag is created in database named GOVERNANCE and schema named TAG_LIBRARY GRANT APPLY on TAG GOVERNANCE.TAG_LIBRARY.cost_center to ROLE Data_Owner;
This example applies the department tag to warehouses my_warehouse1 and my_warehouse2.
# assume the tags are stored in database named GOVERNANCE and schema named TAG_LIBRARY USE ROLE tag_admin; # Apply tag while creating a new warehouse CREATE WAREHOUSE my_warehouse1 WITH TAG(GOVERNANCE.TAG_LIBRARY.department='finance'); # Apply tag for an existing warehouse by altering the object ALTER WAREHOUSE my_warehouse2 SET TAG GOVERNANCE.TAG_LIBRARY.department='finance';
The child objects in Snowflake’s securable objects hierarchy inherit all the tags applied to the parent object. This is called Tag Lineage. Example: Assigning a cost_center tag to a schema will inherit the tag to all tables and columns under the schema. The value of the inherited tag can be overridden on the child object with a more specific value.
Replication and Cloning
The tags and their assignments within the primary database are replicated to the secondary database using database replication and replication groups. When you clone a database, the tags and their assignments stored in the database are also cloned.
The below statements return a list of all tag definitions within an account, database, and schema, respectively.
# assume the tags are stored in database named GOVERNANCE and schema named TAG_LIBRARY SHOW TAGS in ACCOUNT; SHOW TAGS in DATABASE GOVERNANCE; SHOW TAGS in SCHEMA GOVERNANCE.TAG_LIBRARY;
The below statement returns the tag value assigned to a Snowflake object by calling the get_tag system function with the parameters ‘<tag_name>’ , ‘<object_name>’ , ‘<object_domain>’
# assume tag database is TAG_LIBRARY and tag schema is TAGS SELECT SYSTEM$GET_TAG('GOVERNANCE.TAG_LIBRARY.cost_center','MY_WAREHOUSE','warehouse')
Additionally, Snowflake supports tracking the tag usage through different views and functions under Snowflake account usage and information schemas. This information will have a latency of 45 minutes to 3 hours to extract updated information.
A conventional tagging strategy should be in place before implementing object tagging in your account. Define a tagging plan with a list of tags, allowed tag values, tag naming convention, and use cases for assigning specific tags. Audit the tags periodically and make changes to the tagging plan according to the business context. Below are the different approaches to creating and maintaining Snowflake tags.
Centralized vs. Decentralized Approach
In a centralized tag governance approach, the tags are created and applied by a ‘Tag Administrator’ role (a custom role created to manage all the tags, for example: TAG_ADMIN).
In a decentralized approach, the tag definitions are managed by the ‘Tag Administrator’ and the respective teams or data owners are granted access to apply tags. While the centralized strategy provides ease of managing the tags through a single administrator, the latter provides the flexibility of managing tag assignments to data owners.
Tags are schema-level objects in Snowflake. Tags can be created in a single database/schema and assigned to other Snowflake objects across the account. Although tags can be created in multiple databases/schemas, creating all the tags in a centralized schema makes it easier to maintain the tags and adds better control on who can create and apply tags.
Example Tag Use Cases
In the below two examples, assume the tags are stored in database named GOVERNANCE and schema named TAG_LIBRARY.
How to Identify All Resources Belonging to a Department
The below statement returns all the resources belonging to the department ‘sales’ by querying Snowflake’s ACCOUNT_USAGE views.
SELECT domain,object_name FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES WHERE TAG_DATABASE='GOVERNANCE' AND TAG_SCHEMA='TAG_LIBRARY' AND TAG_NAME='DEPARTMENT' AND TAG_VALUE='sales';
How to Apply Tag-Based Masking Policies
The below example demonstrates how to mask data based on a tag value. That is, mask data based on tag assignment security_class=’PII’
1. Create a tag-based masking policy.
CREATE MASKING POLICY pii_masking_policy as (val string) RETURNS string -> CASE WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('GOVERNANCE.TAG_LIBRARY.security_class') = 'PII' THEN "**Masked**" ELSE val END;
2. Apply the masking policy to the tag: ‘security_class’.
ALTER TAG GOVERNANCE.TAG_LIBRARY.security_class SET MASKING POLICY pii_masking_policy;
3. Assigning the tag security_class=’PII’ to a column will mask the column data automatically.
ALTER TABLE User_DB.User_Schema.UserInfo_Table ALTER COLUMN User_Email_Column SET TAG security_class='PII';
Along with the guidelines provided on this blog, it is also essential to automate the process of creating and assigning tags to Snowflake Objects. This can be easily achieved using phData’s project administration tool called Tram.
Tram is a provisioning and management tool for managing a Snowflake environment. Using Tram, you can automate the onboarding of project resources such as warehouses, databases, object tagging, and other Snowflake objects.
With Tram, it is possible to organize tags in a single configuration file and manage the different operations like CREATE, ALTER and APPLY tags for multiple project resources simultaneously. To know more about managing Snowflake object tagging through Tram, refer to our blog here.
Need More Help?
At phData, we aim to solve every business problem using a tool or automation which ensures quality and increased productivity. If you have further questions, or if you’re interested in exploring phData’s Snowflake services, do not hesitate to reach out!