September 7, 2022

Introduction to Data Clean Rooms

By Manoj Danthuluru

As more data is produced and consumed, data has become central to how organizations operate. Many organizations analyze the data their customers generate to better understand their customers’ needs, feedback, and market behavior, so they can serve them better and stay ahead of competitors.

During analysis, data often needs to be shared, either with another department within the organization or with another organization. Sharing data can be challenging because it has to comply with privacy regulations.

This is where data clean rooms come in.

In this blog post, we’ll cover what a data clean room is, why organizations should implement one, and how data clean rooms can be implemented in the Snowflake Data Cloud.

What is a Data Clean Room?

A data clean room is a safe and secure environment where multiple organizations or multiple divisions of an organization join data to perform analysis.

Data clean rooms reduce the challenges of sharing data by helping organizations comply with privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).

In other words, multiple organizations, or divisions within an organization, can share and consume data for analysis without exposing underlying details such as their customers’ personal information.

With data clean rooms, the data provider can control what data comes in, how it is joined with other datasets, and to what extent the consumer can see the data and perform analysis. This gives the data provider complete control over the datasets they share.

Why Should Organizations Implement Data Clean Rooms Using Snowflake?

As cloud technologies have advanced, sharing data has become more secure and flexible. Traditionally, sharing datasets meant moving the data around, and the data was accessible from only one location. As a cloud data platform, Snowflake makes it easier to share data or bring datasets together in one place.

Snowflake does not have a dedicated feature called a data clean room, but Snowflake users can combine several features into a framework that serves as one.

Features like Secure Data Sharing, streams, tasks, row access policies, and stored procedures support the data clean room in Snowflake.

How Does the Data Clean Room in Snowflake Work?

Data clean rooms can be implemented in Snowflake to share data between two organizations or between departments within one organization. The diagrams below explain the process of implementing a data clean room with Snowflake in these two scenarios.

A diagram illustrating how to implement data clean rooms using Snowflake.

A data clean room makes it possible to share datasets between two Snowflake accounts: the provider is the account holder sharing the data, and the consumer is the account that wants to consume that data to perform analysis.

The consumer will want to query the datasets shared by the provider, so the provider creates Query Templates that set boundaries on what consumers can run. These Query Templates are given descriptive names and stored in the Query Template Table.

The Query Templates contain substitution parameters, for which consumers choose from a list of available values. These values are stored in the Available Values Table in the provider’s account.
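To make the idea of a Query Template with substitution parameters concrete, here is a minimal Python sketch. It is only an illustration, not how Snowflake stores or renders templates: the template name, the SQL text, the column names, and the `render_template` helper are all hypothetical, and plain dictionaries stand in for the Query Template and Available Values Tables.

```python
from string import Template

# Hypothetical stand-in for the provider's Query Template Table.
QUERY_TEMPLATES = {
    "customer_overlap_count": Template(
        "SELECT $dimension, COUNT(*) AS overlap "
        "FROM provider_customers p "
        "JOIN consumer_customers c ON p.email_hash = c.email_hash "
        "GROUP BY $dimension"
    ),
}

# Hypothetical stand-in for the provider's Available Values Table.
AVAILABLE_VALUES = {
    "customer_overlap_count": {"dimension": {"region", "age_band"}},
}


def render_template(template_name: str, selections: dict[str, str]) -> str:
    """Fill a template, rejecting any value the provider has not listed."""
    template = QUERY_TEMPLATES[template_name]
    allowed = AVAILABLE_VALUES[template_name]
    for param, value in selections.items():
        if value not in allowed.get(param, set()):
            raise ValueError(f"{value!r} is not an available value for {param!r}")
    return template.substitute(selections)


print(render_template("customer_overlap_count", {"dimension": "region"}))
```

Because the consumer can only pick values from the provider's list, a request for a column such as a raw email address would be rejected before any query text is produced.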

When a consumer wants to run a query on the provider’s datasets, they select one Query Template and one or more Available Values. The Generate Query Request stored procedure then merges the selected Query Template and Available Values and stores them as a record in the Query Request Table in the consumer’s data clean room database.
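The request-generation step can be sketched in a few lines of Python. This is a simplified model under stated assumptions: in Snowflake this logic would live in a stored procedure, and the record layout (`request_id`, `template_name`, `selected_values`, `requested_at`) is a hypothetical one chosen for illustration.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical stand-in for the Query Request Table in the consumer's database.
query_request_table: list[dict] = []


def generate_query_request(template_name: str, selected_values: dict[str, str]) -> dict:
    """Merge a template choice and the chosen values into one request record."""
    request = {
        "request_id": str(uuid.uuid4()),  # unique key so the provider can acknowledge it
        "template_name": template_name,
        "selected_values": json.dumps(selected_values, sort_keys=True),
        "requested_at": datetime.now(timezone.utc).isoformat(),
    }
    query_request_table.append(request)
    return request


request = generate_query_request("customer_overlap_count", {"dimension": "region"})
print(request["template_name"], request["selected_values"])
```

The point of the record is that the consumer never sends free-form SQL, only a template name and a set of selections that the provider can check.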

The Validate Query stored procedure reads the Query Request records through a Snowflake stream, which returns only the Query Requests that have not yet been processed. The procedure compares each request with the Query Template and Available Values Tables. A Query Request is approved only if it matches, and the acknowledged record is then written to the Request Status Table.
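The validation step might look roughly like the following Python sketch. It is a toy model, not Snowflake's stream implementation: `RequestStream` simply remembers how many rows it has already returned, and the record fields and status values are hypothetical. Writing a row for rejected requests as well as approved ones is also an assumption made here for clarity.

```python
class RequestStream:
    """Toy stand-in for a stream: returns only rows added since the last read."""

    def __init__(self, table: list[dict]):
        self._table = table
        self._offset = 0

    def read_new(self) -> list[dict]:
        new_rows = self._table[self._offset:]
        self._offset = len(self._table)
        return new_rows


# Hypothetical provider-side Available Values Table, keyed by template name.
AVAILABLE_VALUES = {"customer_overlap_count": {"dimension": {"region", "age_band"}}}


def validate_requests(stream: RequestStream, status_table: list[dict]) -> None:
    """Compare each unprocessed request with the provider's templates and values."""
    for req in stream.read_new():
        allowed = AVAILABLE_VALUES.get(req["template_name"])
        approved = allowed is not None and all(
            value in allowed.get(param, set())
            for param, value in req["selected_values"].items()
        )
        status_table.append(
            {"request_id": req["request_id"], "status": "APPROVED" if approved else "REJECTED"}
        )
```

Calling `validate_requests` twice in a row does not reprocess old requests, which mirrors the role the stream plays in the description above.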

These steps enable the connection between the two Snowflake accounts.

With Snowflake’s Secure Data Sharing, the Query Request Table is shared with the provider’s account, while the Query Template, Available Values, and Request Status Tables are shared from the provider’s account with the consumer’s account.

With Row Access Policies, the provider can share its datasets with the consumer while preventing the consumer from executing any query other than the approved queries in the Approved Request Table. The consumer can store the results in the Query Results Table in the consumer account.
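Conceptually, a row access policy is a boolean check evaluated for each row, and here the check is whether the statement being run is an approved one. The Python sketch below models only that idea. In Snowflake the policy would be written as a SQL expression attached to the table, so the function names and the exact-match comparison used here are assumptions for illustration.

```python
def row_access_policy(current_statement: str, approved_statements: set[str]) -> bool:
    """Return True only when the running statement is an approved request."""
    return current_statement.strip() in approved_statements


def run_query(statement: str, rows: list[dict], approved_statements: set[str]) -> list[dict]:
    """Apply the policy row by row; unapproved statements see no rows at all."""
    return [row for row in rows if row_access_policy(statement, approved_statements)]


# Hypothetical approved request and provider rows.
approved = {"SELECT region, COUNT(*) FROM customers GROUP BY region"}
provider_rows = [{"region": "EU"}, {"region": "US"}]

print(len(run_query("SELECT region, COUNT(*) FROM customers GROUP BY region", provider_rows, approved)))
print(len(run_query("SELECT * FROM customers", provider_rows, approved)))
```

The share itself stays in place the whole time. What changes is which rows a given statement is allowed to see.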

Data Clean Room Implementation Using Snowflake Between Two Departments Within an Organization

In this scenario, the implementation requires all of the functionality described above. However, because both departments work in the same Snowflake account, the provider and consumer can share data using custom roles. The Snowflake account holder is responsible for creating the custom roles, granting privileges to them, and assigning them to the respective departments. Roles in Snowflake are very flexible in terms of permissions and access.
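The role-based setup can be pictured with a small Python model. This is only a sketch of the idea of creating roles, granting privileges, and assigning roles to departments. The role names, object names, and the `Role` class are hypothetical, and real Snowflake roles are managed with SQL statements rather than application code.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Toy custom role holding (privilege, object) pairs."""

    name: str
    privileges: set[tuple[str, str]] = field(default_factory=set)

    def grant(self, privilege: str, obj: str) -> None:
        self.privileges.add((privilege, obj))

    def can(self, privilege: str, obj: str) -> bool:
        return (privilege, obj) in self.privileges


# The account holder creates the roles and grants privileges (hypothetical names).
provider_role = Role("SALES_PROVIDER")
provider_role.grant("SELECT", "clean_room.sales_raw")
consumer_role = Role("SERVICE_CONSUMER")
consumer_role.grant("SELECT", "clean_room.query_results")

# Each department is then assigned its role.
department_roles = {"Sales": provider_role, "Service": consumer_role}

print(department_roles["Service"].can("SELECT", "clean_room.sales_raw"))
```

In this model the consuming department can read results but has no privilege on the other department's raw table, which is the separation the clean room is meant to preserve.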

Data Clean Room Use Cases

Data clean rooms play a vital role in the advertising industry. With stricter privacy laws and the phasing out of cookies as a way to collect data, data clean rooms have become a handy way for many industries to use first-party data without directly accessing its underlying attributes. Some use cases across industries are:

Brands Targeting a Precise Group of Consumers

If a brand wants to showcase its new product to a younger group of consumers by advertising on a popular streaming service, the streaming service can share its customer data in a data clean room without revealing underlying details like name, age, and address. The brand can analyze this data to identify which shows are popular among a particular group of consumers and advertise its new product during those shows.

To Monitor the Performance of an Organization

If a car dealership wants to analyze its performance in terms of revenue, data from different teams like Sales, Service, and Accessories needs to be brought to a common point, which is a data clean room. This lets the teams run analysis across each other’s data without exposing the raw data to one another.

Closing

In this post, we have covered what a data clean room is and how it works in Snowflake. It is one of the important techniques for sharing data securely across organizations or across departments within an organization. Sharing data can be challenging when sensitive information is involved, and such cases require expertise.

As the 2022 Snowflake Partner of the Year, phData has the expertise and skills to help your organization handle today’s complex data. Reach out to us to tackle the toughest obstacles and ease your experience in handling data.
