Part of moving to a new technology is understanding what it is going to cost to get started.
Traditionally, the estimation process had many parts, but moving between technologies or projects followed the same basic steps: determine hardware costs, electricity, labor hours, licensing, and so on.
Add these up and you have an estimate of what a technology will cost you for the year, regardless of how much it is actually used.
The cloud has flipped this on its head and allowed you as a consumer to pay for what you use.
But what does that actually mean?
If you don’t use a product, you will not pay for it, but if you use a product A LOT, then you will pay what appears to be a lot of money for it.
This is amplified when you use a product A LOT and use it inefficiently. Inefficiency doesn’t scale, and when you repeat it A LOT, you pay for it every single time.
This continuous cost fear is normal and every customer we have has gone through this process.
In fact, phData has gone through this internally, and we continually evaluate how our own use of the cloud, and of Snowflake in particular, supports development and drives the business forward.
This is difficult because there aren’t clear guidelines or guardrails to ensure your 12-month budget isn’t gone in 3 months.
In practice, we see customers solve this problem by setting reasonable constraints around their warehouses, building dashboards to pinpoint issues, and constantly re-evaluating the way data is being used and consumed across the platform.
Let’s unpack exactly what this means and help you understand where we see customers struggle.
A Breakdown of Snowflake Costs
- Estimate the number of warehouses by size that are required
- Estimate the amount of compute hours that each warehouse will use by size
- Estimate the storage size of your data
- Determine the features you need, which in turn determines the Snowflake account level (edition) required
Snowflake Warehouse Sizes and Credit Usage per Hour

| Warehouse Size   | X-Small | Small | Medium | Large | X-Large | 2X-Large | 3X-Large | 4X-Large |
|------------------|---------|-------|--------|-------|---------|----------|----------|----------|
| Credits Per Hour | 1       | 2     | 4      | 8     | 16      | 32       | 64       | 128      |
Once all of the above are determined, it is pretty straightforward to provide an estimate based on cost per credit and number of credits that will be consumed.
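To make that arithmetic concrete, here is a minimal sketch of the estimate in Python. The per-credit and storage rates, warehouse mix, and usage hours below are assumed placeholders for illustration, not actual Snowflake pricing; plug in the rates and workloads from your own contract.

```python
# Back-of-the-envelope Snowflake cost estimate.
# All rates and workloads below are hypothetical examples, not real pricing.

# Credits consumed per hour by warehouse size (from the sizing table above).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def estimate_annual_credits(warehouses):
    """warehouses: list of (size, hours_per_day) tuples."""
    return sum(CREDITS_PER_HOUR[size] * hours * 365 for size, hours in warehouses)

def estimate_annual_cost(warehouses, cost_per_credit, storage_tb,
                         storage_cost_per_tb_month):
    """Combine compute (credits x rate) and storage (TB x rate x 12 months)."""
    compute = estimate_annual_credits(warehouses) * cost_per_credit
    storage = storage_tb * storage_cost_per_tb_month * 12
    return compute + storage

# Example: one Medium warehouse running 8 hours/day, one X-Small running 24x7,
# 10 TB of storage, at assumed rates of $3/credit and $23/TB-month.
cost = estimate_annual_cost(
    [("M", 8), ("XS", 24)],
    cost_per_credit=3.0,
    storage_tb=10,
    storage_cost_per_tb_month=23.0,
)
```

Note how compute dominates the total in this example; for most workloads, storage is a rounding error next to warehouse credits, which is why the warehouse estimates above deserve the most scrutiny.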
BUT there is more to consider…
- Are you going to be using any “serverless” Snowflake functions?
- Are you going to be using Materialized Views?
- Are you going to be using Snowpipe?
- Are you going to be using UDFs or external functions?
Why do each of the above matter?
While tracking SQL query usage and credit consumption is relatively straightforward via query history, the features listed above are much harder to track. As a new customer, you may not know about them at all; even as an experienced customer, they may be a black box for you.
When you do blow out your budget in 3 months (or are on track to), how can you pinpoint the areas that need to improve? Can you point to bad (long-running) SQL? Is it tied to your ingest process? Is there a business unit that is consuming far more credits than necessary? Are developers letting queries run over the weekend?
These are all things customers fear when moving to Snowflake, and they can be difficult for an organization to wrap its head around: the promise is on-demand data for anyone and everyone across the organization, but teams worry they don’t have the controls needed to rein in spending.
In practice, phData recommends combining a well-defined Information Architecture with metadata tagging, integrated with reports that tie into SQL history and credit analysis. This gives users insight into exactly where spend is going via a simple dashboard, where they can take actionable steps to mitigate issues before they become avalanches.
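To illustrate the kind of analysis behind such a dashboard, here is a small Python sketch that rolls query-history records up by business unit and flags units running over budget. The record shape, tag names, and budgets are all invented for the example; in a real deployment the data would come from Snowflake's query history joined with your metadata tags.

```python
from collections import defaultdict

# Hypothetical query-history records tagged with a business unit.
# In practice these would come from Snowflake's query history views
# joined with your own metadata tagging.
queries = [
    {"unit": "finance",   "warehouse": "WH_M", "credits": 1.25},
    {"unit": "finance",   "warehouse": "WH_M", "credits": 0.75},
    {"unit": "marketing", "warehouse": "WH_L", "credits": 9.5},
    {"unit": "marketing", "warehouse": "WH_L", "credits": 7.0},
]

def credits_by_unit(records):
    """Sum credit consumption per business unit."""
    totals = defaultdict(float)
    for r in records:
        totals[r["unit"]] += r["credits"]
    return dict(totals)

def flag_overspenders(totals, budget_per_unit):
    """Return the units whose consumption exceeds their allotted budget."""
    return sorted(u for u, c in totals.items() if c > budget_per_unit.get(u, 0))

totals = credits_by_unit(queries)
flagged = flag_overspenders(totals, {"finance": 10.0, "marketing": 10.0})
```

The same aggregation works along any tagged dimension (warehouse, developer, ingest pipeline), which is what lets a dashboard answer the "who is spending what" questions above.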
Common Snowflake Cost Problems
The most common inefficiencies we see typically boil down to design standards. In a well-designed architecture, raw data gets enriched and standardized through multiple stages. As standards are defined, this data should be persisted to a physical table rather than repeatedly joining views together or deriving join columns via subselects, which can cause unnecessary table scans and longer run times.
The most common problem our customers face is ingesting data into Snowflake efficiently. This can manifest itself in multiple forms (file sizes, file counts, arrival speed, file types), all of which lead to increased credit consumption. The problem can typically be solved via tweaks to the ingestion process, but it sometimes requires tooling that complements the Snowflake Data Cloud and helps it meet broader enterprise needs.
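One common tweak is batching many small files into fewer, larger loads before they reach Snowflake, so each load does useful work instead of paying per-file overhead. The sketch below is a hypothetical greedy batcher; the 150 MB target is an assumption you would tune for your own pipeline and compression.

```python
# Hypothetical sketch: pack many small files into batches near a target size
# before staging them for loading. The target below is an assumed sweet spot,
# not an official figure; tune it for your own files and compression.

TARGET_BATCH_BYTES = 150 * 1024 * 1024  # ~150 MB

def batch_files(file_sizes, target=TARGET_BATCH_BYTES):
    """Greedily pack (name, size_bytes) pairs into batches close to `target`."""
    batches, current, current_size = [], [], 0
    for name, size in file_sizes:
        # Start a new batch if adding this file would overshoot the target.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Example: eight 40 MB files. Three fit under 150 MB (120 MB total),
# a fourth would push the batch to 160 MB, so we get batches of 3, 3, and 2.
files = [(f"events_{i}.json", 40 * 1024 * 1024) for i in range(8)]
batches = batch_files(files)
```

Collapsing eight loads into three is exactly the kind of small process tweak that reduces per-file overhead and, at A LOT of scale, meaningfully cuts credit consumption.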