Effective teamwork requires orchestrating many different roles, each executing its own tasks. Sports teams typically structure this hierarchy with a manager (coach), players, and a fanbase.
The manager studies the opponent, sets the lineup, and communicates the strategy to the players. The players then execute the coach’s strategy, deploying varying levels of complex movements between each of the different skill positions. And finally, fans are the ever-present entity that watches managers and players come and go, providing true passion, tradition, and emotional support.
Like sports teams, data teams need clearly defined roles so that data products and development responsibilities are well understood. Architects are like the manager, shaping the course of the platform. DBAs and developers take on separate, complex tasks, like the players. Finally, analysts are like the fans, helping carry the narrative back to the business.
To help empower data teams, many organizations are increasingly choosing the Snowflake Data Cloud to drive their data decisions and better understand how their users are going to interact with data.
In this post, we will explore Snowflake role-based access control (RBAC), describe an information architecture we call Source-Product (which enables a well-structured data catalog), and expand on a scalable RBAC model that builds on our previous blog on organizing your Snowflake role hierarchy.
What Is Information Architecture?
If you are unfamiliar with the term Information Architecture, the Wikipedia definition is a good starting point:
“Information architecture (IA) is the structural design of shared information environments…”
In short, your IA is how you structure databases, schemas, warehouses, and environments (dev, qa, prod, etc.) so that your user community understands where data lives.
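One common convention (purely illustrative — the database names and the environment-as-prefix pattern here are assumptions, not a prescription) is to encode the environment directly into object names:

```sql
-- Hypothetical: one database per environment, distinguished by prefix
CREATE DATABASE DEV_ANALYTICS;
CREATE DATABASE QA_ANALYTICS;
CREATE DATABASE PROD_ANALYTICS;
```

Whatever convention you pick, the point of an IA is that users can predict where data lives without asking.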
In A Functional Approach For Snowflake’s Role-Based Access Controls, Craig Warman does a fantastic job explaining why you should consider dividing roles into what he terms as “Functional Roles” and “Access Roles.” In our experience, this Snowflake best practice is critical to scaling an RBAC model.
This allows you to clearly define who does what job and what data each role can access. Standardizing and simplifying this process gives everyone a common framework and minimizes the administrative burden.
Typically, the first Snowflake project brings in a few data sources. While these data sources could be mapped into Snowflake with the first names that come to mind, we see successful customers take the time to lay them out in an IA that allows for further growth.
Thoughtful IA is important because we find that even the most tactically focused Snowflake customer is going to find additional datasets they want to bring in after their first successful project.
Oftentimes this takes the form of repenting for "sins of the past" (i.e., technical debt): Access databases, underutilized data marts, partially integrated data warehouses from acquisitions, and so on.
At other times, it's something more exciting: a new IoT dataset, or finally bringing a long-sought-after dataset into the fold, perhaps one that was "too large" for prior data warehouses.
In any case, the data sources we bring into the Snowflake environment are simply known as "sources."
Sources are prefixed, typically by “S_” in Snowflake.
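As a sketch, a CRM feed might land like this (the database and schema names are illustrative assumptions):

```sql
-- Hypothetical source database for a CRM feed
CREATE DATABASE S_CRM;
CREATE SCHEMA S_CRM.RAW;     -- landing zone for loaded data
CREATE SCHEMA S_CRM.SHARED;  -- curated objects exposed to consumers
```

The "S_" prefix makes it immediately obvious, to both humans and automation, that the database mirrors an external system.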
As simple as that is, it's important because we want to distinguish sources inside Snowflake and ensure that new data doesn't flow back into them. Sources represent a snapshot of a "source of truth" that lives outside Snowflake, and we don't want to intermix that external truth with new insights generated inside Snowflake.
A Product (sometimes called a “Project”) is a new insight that is created inside Snowflake. Products are derived on top of sources. They are prefixed, typically by “P_” in Snowflake.
To be clear, existing reports will point directly at the source inside Snowflake. This is what happens in, say, an Oracle Exadata to Snowflake migration where the goal is to eliminate the Exadata post-migration. (This is extremely common!)
Products are something new.
In a common IT use case, you might import your device management data, your vulnerability data, and your ticketing system data (all three datasets are sources) to identify high-risk users. The resulting data set would be a product.
Another common use case is taking CRM data along with customer support data, both sources, to generate a propensity to churn score nightly. The score is a product.
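The churn example above might look something like the following sketch. Every database, schema, table, and column name here is a hypothetical stand-in; the point is only the shape: a "P_" product built by reading from two "S_" sources.

```sql
-- Illustrative: a nightly churn-signal table (product) derived from two sources
CREATE OR REPLACE TABLE P_CHURN.ANALYTICS.CHURN_SCORES AS
SELECT c.CUSTOMER_ID,
       COUNT(t.TICKET_ID) AS OPEN_TICKETS,  -- support pressure signal
       DATEDIFF('day', MAX(c.LAST_ORDER_DATE), CURRENT_DATE()) AS DAYS_SINCE_ORDER
FROM S_CRM.SHARED.CUSTOMERS AS c
LEFT JOIN S_SUPPORT.SHARED.TICKETS AS t
  ON t.CUSTOMER_ID = c.CUSTOMER_ID
 AND t.STATUS = 'OPEN'
GROUP BY c.CUSTOMER_ID;
```

Note the direction of data flow: the product reads from sources, but nothing ever writes back into an "S_" database.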
Similar to Craig's structure, we break the RBAC structure into components. His concept of "Functional Roles" maps directly to ours. What he describes as "Access Roles" we split into two parts: "Access Roles" and "Data Roles."
Craig said it best:
“… Functional Roles. These are roles that map to business functions within the organization, such as those for HR Specialists or Financial Analysts, or Managers. Naturally, such roles are typically assigned to users based on their job functions.”
Functional roles are prefixed, typically by “FR_” in Snowflake.
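Creating and assigning a functional role is straightforward (the role and user names below are hypothetical):

```sql
-- Hypothetical functional role for financial analysts
CREATE ROLE FR_FINANCIAL_ANALYST;

-- Assign it to a user based on their job function
GRANT ROLE FR_FINANCIAL_ANALYST TO USER JSMITH;
```

Functional roles themselves own nothing; they accumulate their privileges by being granted data roles and access roles.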
A Data Role defines the access level to a specific database or schema. Data Roles can be used for granting access to other databases. The default Source-Product template defines three access levels.
- RW: Grants the product read/write access to the data source.
- RO: Grants the product read-only access to all schemas in the source database.
- SHARED: Grants the product read-only access to the SHARED schema in the source database.
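As a sketch of what these access levels might look like in SQL — the "DR_" prefix and the S_CRM names are illustrative assumptions, not part of the template above:

```sql
-- Hypothetical data roles for the S_CRM source, one per access level
CREATE ROLE DR_S_CRM_RW;      -- read/write
CREATE ROLE DR_S_CRM_RO;      -- read-only, all schemas
CREATE ROLE DR_S_CRM_SHARED;  -- read-only, SHARED schema only

-- Example grants for the most restrictive level
GRANT USAGE ON DATABASE S_CRM TO ROLE DR_S_CRM_SHARED;
GRANT USAGE ON SCHEMA S_CRM.SHARED TO ROLE DR_S_CRM_SHARED;
GRANT SELECT ON ALL TABLES IN SCHEMA S_CRM.SHARED TO ROLE DR_S_CRM_SHARED;
```

A functional role then receives whichever data role matches the access level it needs, rather than receiving object grants directly.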
Craig describes Access Roles as roles "to which access privileges have been granted for specific Snowflake account objects."
As discussed above, we break Craig’s “Access Roles” into “Data Roles” and “Access Roles.”
In our template, Access Roles grant a functional role access to other Snowflake objects. The most common Access Role grants warehouse access to functional roles. Access Roles are also used to grant access to storage integrations or task execution.
In short, Access Roles in this context provide access to account-level objects (warehouses, storage integrations, and tasks).
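The warehouse case might be sketched like this (the "AR_" prefix and the warehouse name are illustrative assumptions):

```sql
-- Hypothetical access role wrapping warehouse usage
CREATE ROLE AR_WH_ANALYTICS;
GRANT USAGE ON WAREHOUSE ANALYTICS_WH TO ROLE AR_WH_ANALYTICS;

-- Grant the access role to a functional role, not to users directly
GRANT ROLE AR_WH_ANALYTICS TO ROLE FR_FINANCIAL_ANALYST;
```

Keeping compute grants in their own roles means you can change which warehouse a team uses without touching any data grants.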
Please note, the templates below can be customized to fit specific customer requirements.
Here we have a full Source example:
Does this feel complex?
Yes, manually implementing this would be a challenge (it amounts to tens of thousands of lines of SQL in our internal environment). We honestly never do that. Instead, we have two tools, which we give to our customers at no additional cost. These help them reduce the complexity, automate the process, and help us understand the deployment.
- Tram automates the deployment and enforcement of Information Architectures and RBAC.
- Traverse takes information about your Snowflake account and turns it into a graph that is used to provide an easy-to-use, navigable representation of your Snowflake account.
An easy first step is to sign up for Traverse, for free, and inspect your Snowflake account.