What is the Snowflake Data Cloud and How Much Does it Cost?

The Snowflake Data Cloud was unveiled in 2020 as the next iteration of Snowflake’s journey to simplify how organizations interact with their data. The Data Cloud applies technology to the data problems every customer faces, namely: availability, performance, and access. Simplifying how everyone interacts with their data lowers the barrier to entry by providing a consistent experience anywhere around the globe.

The primary objective of the Data Cloud is to democratize data and make it transparent by breaking down the data silos that cause friction when solving business problems.

In addition to breaking down internal data silos, Snowflake can break down external data silos as well, accelerating partnerships and efficiency via data sharing and data exchange.

What components make up the Snowflake Data Cloud?

The Snowflake Data Cloud is new terminology, but breaking down each of its components shows how they form a complete data solution:

  • Cloud Data Warehouse
    • Compute isolation
    • Connect existing tools and reports
    • Be the center of your Business Intelligence Strategy
  • Cloud Data Lake
    • Centralized repository to store any type of data
      • Structured
      • Unstructured
  • Data Engineering
    • Build reliable data pipelines with Snowflake automation
      • Streams
      • Tasks
      • Snowpipe
  • Data Science
    • Prepare, standardize, and serve data for building models
      • Feature Store
      • Experiment and Coefficient History
  • Data Applications
    • Availability of data and compute is taken care of
    • Access and Store your data anywhere across clouds
  • Data Exchange and Sharing
    • Access external datasets as if they were your own, without actually having to move or ingest the data
    • Share your data inside or outside the business with security guaranteed

What is a cloud data warehouse?

A cloud data warehouse takes a concept every organization knows, the data warehouse, and optimizes its components for the cloud.

As an example, an IT team could easily take the knowledge of database deployment from on-premises and deploy the same solution in the cloud on an always-running virtual machine.

This is “lift-and-shift.” While it works, it doesn’t take full advantage of the cloud. For example, most data warehouse workloads peak during certain times, say during business hours. Lift-and-shift models mean you continue to pay for computing resources even when they are not being used. The approach also suffers from the same heavy operational and performance burdens as on-premises offerings, namely contention from multiple users on the system as well as the need for operating system and disk maintenance.
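To make the idle-compute point concrete, here is a small back-of-the-envelope comparison in Python. The hourly rate and hours are purely illustrative assumptions, not actual cloud or Snowflake prices:

```python
# Hypothetical comparison of always-on ("lift-and-shift") vs usage-based
# compute billing. All rates and hours below are illustrative assumptions.
hourly_rate = 4.0     # assumed $/hour for a warehouse-class instance
always_on_hours = 24  # lift-and-shift VM runs around the clock
busy_hours = 8        # actual query activity: business hours only

lift_and_shift_daily = hourly_rate * always_on_hours  # pay for every hour
usage_based_daily = hourly_rate * busy_hours          # pay only when running

print(f"always-on:   ${lift_and_shift_daily:.0f}/day")  # $96/day
print(f"usage-based: ${usage_based_daily:.0f}/day")     # $32/day
```

Under these assumptions, two thirds of the always-on spend buys compute that sits idle.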

Since the cloud offers the promise of elasticity, the ideal solution should:

  • Scale automatically, regardless of usage, with minimal contention
  • Require no hardware (virtual or physical) to select, install, configure, or manage
  • Require virtually no software to install, configure, or manage
  • Leave ongoing maintenance, management, upgrades, and tuning to the platform vendor

To enable this vision, Snowflake modernized the architecture by offering the following:

  • Single source for all data regardless of data type
  • Available worldwide
    • Available on any cloud
  • Separate compute and storage
  • Elastically scale compute
  • Elastically scale storage
  • Usage-based pricing model
  • Always available metadata
    • Snowflake maintains your metadata while listening for work to do
  • Always available compute
    • Snowflake maintains a pool of instances ready to serve your queries

What is a Data Warehouse?

Without belaboring this topic (Wikipedia defines it at great length), for the purposes of this discussion a Data Warehouse is generally a place to store, process, and analyze structured data.

What is a Data Lake?

Similarly, Wikipedia defines a Data Lake as a location to store raw data that is in any format that an organization may produce or collect. Effectively this is a way to store the source of truth and build (or rebuild) your downstream data products (including data warehouses) from it.

What is the difference between a Data Lake and a Data Warehouse? 

Historically, there were big differences. If you go back to 2014, data warehouse platforms were built using legacy architectures that had drawbacks when it came to cost, scale, and flexibility. The data lake platforms, meanwhile, were built using more modern architectures and featured easier scale and lower cost. As a result, if you had lots of data (which can often happen with raw or unstructured data), you’d typically go with the data lake platform.

Today, data lakes and data warehouses are colliding. Snowflake was founded on the capability to handle data warehouse workloads in the cloud. Building on that success, it has also been able to process unstructured data natively on the platform.

When using a platform like Snowflake, there is effectively no difference between a data lake and a data warehouse. With the Snowflake VARIANT data type, 90% of data lake use cases can be solved natively. This removes the barrier that file-based data lakes present: you no longer have to worry about file formats or compression and can instead focus on data access and management.
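As a rough analogy for what a VARIANT-style column buys you: raw JSON can be stored as-is and addressed by path at query time, instead of being flattened into fixed columns up front. A pure-Python sketch of that idea (the `get_path` helper is hypothetical, not Snowflake’s API):

```python
# Conceptual sketch only: store raw JSON as-is, query it by path later.
# This illustrates the idea behind VARIANT-style access, not Snowflake itself.
import json

raw_event = json.loads('{"user": {"id": 7, "tags": ["beta", "eu"]}, "ts": "2020-01-01"}')

def get_path(doc, path):
    """Walk a dotted path through nested dicts/lists, e.g. 'user.tags.0'."""
    for key in path.split("."):
        doc = doc[int(key)] if isinstance(doc, list) else doc[key]
    return doc

print(get_path(raw_event, "user.id"))      # 7
print(get_path(raw_event, "user.tags.0"))  # beta
```

The point is that no schema or file-format decision was needed before the data landed; structure is applied at read time.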

Transition to the Data Cloud

With multiple ways to interact with your company’s data, Snowflake has built a common access point that combines data lake access, data warehouse access, and data sharing access into one protocol.

This is why we believe the traditional definitions of data management will change as the platform becomes able to handle each type of data requirement natively.

What kinds of workloads does Snowflake handle?

Snowflake has many types of workloads that were highlighted in the components section.

All data workloads follow a common pattern, and Snowflake helps simplify these with solutions tailored for each task.

Looking for more information on data pipeline best practices? This Snowflake article will walk you through continuous data loading, change data tracking, recurring tasks, and more!
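The change-data-tracking idea behind Streams can be sketched in a few lines. Everything below is a hypothetical illustration of the offset-tracking concept, not Snowflake’s actual implementation: a stream is, in effect, a bookmark into a table’s change log that advances when the changes are consumed.

```python
# Minimal sketch of the change-tracking pattern behind Snowflake Streams.
# All class and method names here are hypothetical illustrations.

class ChangeLog:
    """Append-only log of (action, row) changes to a table."""
    def __init__(self):
        self.entries = []

    def insert(self, row):
        self.entries.append(("INSERT", row))

class Stream:
    """Tracks an offset into a change log, like a stream on a table."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def read_and_advance(self):
        changes = self.log.entries[self.offset:]
        self.offset = len(self.log.entries)  # consuming the stream advances it
        return changes

orders = ChangeLog()
stream = Stream(orders)

orders.insert({"id": 1, "amount": 10})
orders.insert({"id": 2, "amount": 25})
print(len(stream.read_and_advance()))  # 2 -- both new rows since creation
orders.insert({"id": 3, "amount": 5})
print(len(stream.read_and_advance()))  # 1 -- only the change since last read
```

A Task, in this analogy, would be a scheduler that calls `read_and_advance` on a recurring basis and processes whatever changes it returns.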

How much does Snowflake cost?

This is one of the most common questions we encounter because switching from a fixed cost pricing model to a usage-based pricing model can cause significant heartburn. We have thoughts on how to control and estimate costs with Snowflake but here we aim to give you a rough cost estimate.

Since every situation is different we are going to provide estimates and ranges based on data size and quantity.

Before diving into the actual costs, it is worth noting that Snowflake is also beneficial for customers that are smaller than our “Small” bucket and larger than our “Large” bucket. These t-shirt-size buckets were selected because they are the typical sizes we see customers initially choose when engaging phData, but many Snowflake customers start out with much less than 5 TB of data and lower spend.

On the flip side, we have seen “Large” customers mature and require larger spend due to security, data localization, or even data replicated across different clouds. Each of these can increase the cost significantly, especially as you scale out beyond your initial Snowflake instance or require higher-tier features.

How do Snowflake costs change over time?

All Snowflake customers begin their journey with an empty environment. This means that before a customer can start realizing the benefits the Snowflake platform offers, they must ingest data from source systems and train and onboard developers.

With this in mind, there is a typical learning curve (or ramp-up time) that is required to use any product and Snowflake is no different. We typically see early work begin as handwritten code and shift to automation to help scale migrations, transformations, and user onboarding.

We also observe that the typical customer’s computing costs change over time. Snowflake makes this visible by showing which part of the process consumes the most credits, and therefore where the money is being spent.

Since ingest pipelines are 1-to-1 (one pipeline per source) while reporting and analytics are 1-to-many (many users and reports per dataset), we witness a shift in spending as customers mature:

(Figure: Snowflake Cost Changes Over Time)

It is phData’s perspective that the data culture inside an organization matures by building increasingly valuable assets inside a data platform that deliver results to the end-users (analysts, BI, reports, DS, etc). As this happens, spend shifts from ELT to analytics.

This is why successful customers spend proportionally more on analytic workloads: those workloads are in high demand and drive business objectives.

Sizing Cost of Snowflake

Snowflake bills per second.

Snowflake also acts as a serverless compute layer, where the virtual warehouses doing the work can be turned on and off many times over the course of the day. To simplify this discussion and smooth out assumptions across a longer time period, we typically estimate how many hours a day a virtual warehouse cluster is required to be on, which is why the following sections work with hourly rates.

Additionally, since Snowflake offers the unique ability to track costs for each step of the data lifecycle, we are able to better understand what type of computing requirements a customer will have and plug that into our calculations.

Small

Customers in this range typically spend between $25k-$75k.

To get to this amount we make the following assumptions:

  • 10-20 analytics users
  • 10-20 ELT pipelines
  • Under 5 TB of data
  • Most work is done by analysts during business hours
 

Here is a more concrete example from one of our customers:

  • Small and Large-sized warehouses to perform ELT work
  • Medium-sized warehouses for analytics
Cluster Size | Cluster Count | Hours per Day | Credits per Hour | Credit Price | Daily Cost
Small        | 3             | 1             | 2                | $3           | $18
Medium       | 2             | 4             | 4                | $3           | $96
Large        | 1             | 1             | 8                | $3           | $24

Storage Size | Annual Storage Cost | Annual Compute Cost | Total Annual Cost
5 TB         | $1,380              | $50,370             | $51,750
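The arithmetic behind the “Small” estimate can be reproduced directly. This sketch uses the figures implied by the table above, a $3 credit price and roughly $23 per TB per month for storage; actual Snowflake pricing varies by edition, region, and contract:

```python
# Reproduces the "Small" customer estimate above. The $3/credit price and
# ~$23/TB/month storage rate are the figures implied by the table, not
# official quotes.

CREDIT_PRICE = 3.0           # dollars per credit (from the table)
STORAGE_PER_TB_MONTH = 23.0  # dollars per TB per month (implied by the table)

# (cluster count, hours on per day, credits burned per hour)
warehouses = [
    (3, 1, 2),  # Small warehouses for ELT
    (2, 4, 4),  # Medium warehouses for analytics
    (1, 1, 8),  # Large warehouse for ELT
]

daily_compute = sum(count * hours * credits * CREDIT_PRICE
                    for count, hours, credits in warehouses)
annual_compute = daily_compute * 365
annual_storage = 5 * STORAGE_PER_TB_MONTH * 12  # 5 TB of data

print(f"daily compute:  ${daily_compute:,.0f}")   # $138
print(f"annual compute: ${annual_compute:,.0f}")  # $50,370
print(f"annual storage: ${annual_storage:,.0f}")  # $1,380
print(f"total:          ${annual_compute + annual_storage:,.0f}")  # $51,750
```

The same formula, daily cost = clusters × hours × credits per hour × credit price, annualized over 365 days, reproduces the Medium and Large estimates as well.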

Medium

Customers in this range typically spend between $100k-$200k.

To get to this amount we make the following assumptions:

  • 30-50 analytics users
  • 30-50 ELT pipelines
  • Under 50 TB of data
  • Most work is done by analysts during business hours
 

Here is a more concrete example from one of our customers:

  • Small and Large-sized warehouses to perform ELT work
  • Medium-sized warehouses for analytics
Cluster Size | Cluster Count | Hours per Day | Credits per Hour | Credit Price | Daily Cost
Small        | 5             | 2             | 2                | $3           | $60
Medium       | 4             | 6             | 4                | $3           | $288
Large        | 2             | 2             | 8                | $3           | $96

Storage Size | Annual Storage Cost | Annual Compute Cost | Total Annual Cost
50 TB        | $13,800             | $162,060            | $175,860

Large

Customers in this range typically spend between $300k-$500k.

To get to this amount we make the following assumptions:

  • 100+ analytics users
  • 100s – 1000s of ELT pipelines
  • 100+ TB of data
  • Work being done around the clock
 

Here is a more concrete example from one of our customers:

  • Small, Medium, and Large-sized warehouses to perform ELT work
    • Ensure workloads are right-sized
  • Medium and Large-sized warehouses for analytics
Cluster Size | Cluster Count | Hours per Day | Credits per Hour | Credit Price | Daily Cost
Small        | 10            | 2             | 2                | $3           | $120
Medium       | 10            | 6             | 4                | $3           | $720
Large        | 5             | 2             | 8                | $3           | $240

Storage Size | Annual Storage Cost | Annual Compute Cost | Total Annual Cost
200 TB       | $55,200             | $394,200            | $449,400

Of course, we have seen larger Snowflake customers that span multiple clouds with complex backup strategies. These certainly require larger budgets, but they are typically more mature customers that have been working with Snowflake for multiple years.

Why does this matter?

Data and demand for information have been increasing exponentially since the dawn of the information age.

Snowflake has been able to get in front of and understand the unique challenges that this growth and demand has presented to businesses across the globe.

Enabling access to data when you want it, where you want it, without delay is a core principle of their success.

We hope that distilling each of these terms and paradigm shifts helps educate you on your journey to solve your business objectives.

If you are interested in learning more, we offer discovery sessions, data strategy sessions, whitepapers, automation, and services.

Ready for the next step in your Snowflake journey?

Our “Getting Started with Snowflake” guide will walk you through everything you need to know to effectively implement Snowflake. 
Get the Guide