June 2, 2022

How Do You Secure PII Data in Snowflake?

By Zach Golden

In the increasingly interconnected and digital world we live in, data is often the fuel that drives innovation and growth, but it is more crucially the glue that binds everything together.

A study by Domo in 2018 found that 2.5 quintillion bytes of data were generated every day, a figure that has likely increased and will continue to do so in our post-2020 world. This data often includes information about individuals that is sensitive or confidential, or could be used to identify and distinguish an individual from that dataset. 

This is referred to as Personally Identifiable Information (PII). It can range from something as simple as someone’s name or address to a bank account number or medical records. Any system with users or interfaces with humans is likely to capture some form of PII, as it is often necessary for functionality or user experience. 

For example, a user providing their Social Security number to verify their identity when creating an account with their health insurance provider. To this point, PII and its security must be a priority for IT professionals in every industry. 

In this post, we will be walking through securing PII data in Snowflake and the tools provided by Snowflake to make this process simple yet still extremely powerful. 

Why Does PII Matter?

It goes without saying that when a user provides PII as input, they expect that information to be secured and access to it to be restricted. A breach of a platform that contains PII could be devastating for users, as nefarious parties gain access to their PII but could have significant effects on the breached platform. Users could lose trust in the provider and leave the platform, and increasingly governments and law enforcement agencies are investigating data breaches and imposing fines. 

A 2017 study by the security provider VIPRE found that an estimated 66% of small and medium-sized businesses would either go out of business or be forced to shut down for an extended time if their data was compromised. 

As a leader in cloud data warehousing, Snowflake understands the critical importance of PII and provides its users with several methods to secure PII to accommodate any scenario.

How to Secure PII Data in Snowflake

The methods to secure PII within Snowflake include Snowflake’s access control framework, row-level security (RLS), column-level security, and views and secure views.

Access Control Framework

Snowflake’s access control framework combines elements of discretionary access control (DAC) and role-based access control (RBAC). In the DAC model, each object has an owner who can grant access to that object, while in RBAC, access privileges are assigned to roles assigned to users. 

In Snowflake’s access control framework, by default, users cannot access database objects unless they are assigned to roles with the proper permissions. PII can be stored in a database or schema that is only accessible to predetermined roles. 

To illustrate, let’s use the example of a Snowflake table containing employee records, including PII. This table, named ‘Employees,’ was created and is owned by the role ‘HR_ADMIN,’ with no other roles given access to query this table. When the role ‘HR_ADMIN’ is used, the ‘Employees’ table is able to be queried with results returned:

example Snowflake table of employee records

However, if we switch to another role, such as ‘IT_ANALYST’, we will not be able to query ‘Employees’ unless the proper privileges are granted to the ‘IT_ANALYST’ role. Without the proper privileges, an error message will always be returned:

SQL compilation error in Snowflake

Row-Level Security (RLS)

RLS is a way to control access to data by restricting which rows can be returned by queries and therefore are visible to the user. RLS in Snowflake is controlled through row access policies, which exist at the schema level and are enforced on tables and views within that schema. 

Row access policies are simply expressions that identify the specific rows that should be returned by queries referencing the objects covered by the policy. The expressions contained within these policies can be as simple or complex as necessary, with the ability to reference other Snowflake objects such as tables and views. 

Row access policies are evaluated at query runtime and will dynamically filter the rows returned in users’ queries based on the policy definitions. Row access policies can also be nested, allowing separate policies to be applied to tables and views built off those tables.  

Continuing with our example table, ‘Employees,’ let’s imagine that our employees work in different regions, and HR employees should only be able to access the data of employees in the same region, while HR administrators should still have access to all employee records. A mapping table called ‘HR_Regions’ can be used to align roles to regions:

example Snowflake table of employee records filtered by HR regions

This mapping table can be used in a row access policy to determine which regions’ data can be accessed by different roles. Now we can create a new row access policy called ‘HR_Access’ like this:

creating a new row access policy in Snowflake

This row access policy accepts one input, HR_Region, which along with the current role the query is being executed under, is used to query the ‘HR_Regions’ table to determine which regions should be visible. A boolean value is returned, which Snowflake uses to determine which rows should be returned in the query results. When applied to our employees table, this policy enables queries executed as HR_Admin to see all rows, while queries executed as roles for specific regions will only see their region-specific data:

queries by region-specific data
filtering HR region by those who are located in Mandalore
filtering by those who are in the HR Tatooine region

RLS enables a great deal of flexibility when it comes to protecting PII, with policies able to be implemented or modified at any time by roles with the correct privileges. 

Column-Level Security

A counterpart to RLS, column-level security in Snowflake refers to a masking policy applied to a column in a table or view. Similar to role access policies for RLS, masking policies for column-level security are created at the schema level but are applied only to specified columns in a table or view.

The masking policy consists of a SQL CASE expression that will be evaluated at query runtime, masking column values as defined in the policy. Masking policies can also be nested, the same as row access policies. This masking policy, ‘employee_ssn_mask,’ accepts a string variable ‘ssn,’ which in this case would refer to a column containing Social Security numbers. 

If the current role of the query execution equals ‘hr_admin,’ then the value of the column is not asked. If the role does not equal ‘hr_admin,’ the value in the column is not displayed to the user, and ‘###-##-####’ is returned in its place.

an example of column-level security in Snowflake with a mask for SSN

Query executed as HR_Admin with full access to SSN:

example data showing SSN shown with full access

Query executed as HR_Mandalore, showing masked SSN. Note: the row access policy created above is still in place, limiting the result set for this role:

SSN data masked in Snowflake column-level security example

Once a masking policy has been created, it must be applied to a column of a table or view, either when it is created or after creation by using an ALTER command. Masking policies can be combined with RLS for even more control over access. However, the same column(s) cannot be specified in both a row access policy and masking policy at the same time.

Views and Secure Views

A view is a database object that returns the results of a query as if it were a table. While views can be used for various purposes, they are particularly useful to protect PII, as the query returned by the view can be written in a way that hides PII as necessary. 

For example, suppose an organization has a table that contains employee names, phone numbers, and Social Security numbers. In that case, a view could be written that only returns the employee names and phone numbers while excluding the sensitive PII, the Social Security numbers. Employees could be granted access to either the view or the underlying table, depending on their permissions. It is important to note that by default, users can view the definition of views (the query that is executed).

Snowflake offers secure views to address aspects of views that could make PII vulnerable. Secure views function the same way as views but provide additional security by only allowing authorized users to access the view definition and characteristics regarding the execution of the query. 

Additional layers of security can be added by creating secure views off of tables or views that have defined access control, RLS, column-level security, or a combination of the three. 

Closing

Snowflake’s multi-faceted approach to securing PII is simple to implement while being robust and customizable enough to meet your security needs. These techniques, orchestrated as part of your overall data strategy, can limit your risk of inadvertently exposing PII.  

Have more questions about Secure PII Data or Snowflake? Our team of experts is here to help!

FAQs

Snowflake’s access control framework and views are available on all editions of Snowflake. Row-level security, column-level security, and secure views are available on Enterprise and higher editions.

Yes – Snowflake offers a classification process that analyzes and categorizes information stored in the columns in database tables and views. Classification can apply tags to columns, which can be used for auditing purposes or for identifying columns to include in masking policies. 

Yes – customers using the Enterprise edition or higher of Snowflake can make use of external tokenization within a masking policy to only de-tokenize data at query runtime for authorized data. This approach can help ensure that PII is not accidentally exposed, because the data is already masked through the tokenization process before landing in Snowflake.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit