Case Study

Top North American Grocery Company Leans on phData for Snowflake Cost-Saving Optimizations

Snowflake

Customer's Challenge

A North American grocery giant was one year into utilizing Snowflake and Snowpipe for their combined data warehousing, analytics, and streaming needs and noticed an exponential increase in their data processing costs. They knew they needed advisory assistance to optimize their processes for improved efficiency and cost-effectiveness.

phData's Solution

phData began their 75-day relationship with the client by analyzing the landscape of their current data processing needs. phData quickly advised on three key areas that the client could address to optimize their processes significantly: data archiving, data streaming frequency, and cluster key usage.

Results

Based solely on the data archiving recommendation, phData estimates that the client’s cost of running one table would fall from $9,000/year to $300/year.

Reducing data streaming frequency from every second to every five minutes would cut the customer’s costs from $448,000/year to an estimated $23,000/year.

Lastly, phData compared the current run time and cost of two of the client’s frequently used queries with the estimated run time and cost once the recommendations were implemented, and found significant improvements:

Query     Version     Run time       Cost/year
Query 1   Original    69 minutes     $67,000
Query 1   Optimized   7 minutes      $7,000
Query 2   Original    118 minutes    $38,000
Query 2   Optimized   30 minutes     $9,000
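
To put the reported figures on a common scale, the short sketch below converts each before/after pair into a percentage reduction. The numbers are copied from the results above; the function and variable names are illustrative only and are not part of phData's analysis.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# (before, after) pairs copied from the results reported in this case study.
reported = {
    "archiving, cost of one table ($/year)": (9_000, 300),
    "streaming frequency, cost ($/year)": (448_000, 23_000),
    "query 1, run time (minutes)": (69, 7),
    "query 1, cost ($/year)": (67_000, 7_000),
    "query 2, run time (minutes)": (118, 30),
    "query 2, cost ($/year)": (38_000, 9_000),
}

# Percentage reduction for each reported pair, rounded to one decimal place.
reductions = {
    name: round(percent_reduction(before, after), 1)
    for name, (before, after) in reported.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% lower")
```

On these inputs, every reported estimate works out to a reduction of roughly 75% or more.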

The Full Story

The customer, understanding the importance of data science in maintaining their position among the top two North American grocery companies, had migrated to Snowflake for data warehousing and analysis the year before. They were also using Snowpipe for data streaming but knew, based on their exponential cost increases, that there had to be additional optimizations and efficiencies they hadn’t yet implemented.

The client had been working with Snowflake to optimize their processes but also partnered with phData – based on a recommendation from Snowflake – to gain an impartial, third-party expert that could fast-track the project.

During the 75-day process, the phData team assessed the customer’s data processing needs and reviewed their current Snowflake and Snowpipe processes, looking for simple changes that would significantly improve their outcomes.

In addition to some Snowpipe configuration optimizations, phData noted three key areas the client could easily address to cut compute times and the associated costs: data archiving, data streaming frequency, and cluster key usage.


Why phData?

The client had been working with Snowflake to troubleshoot their cost concerns. When Snowflake recommended phData, the customer jumped at the chance to fast-track their cost optimizations.

phData's Recommendations

When data processing is cleaner, faster, and more efficient, the associated costs decrease significantly:

Archive historical data older than two months

The client had been querying historical data for their real-time reporting needs, which lengthened compute times. phData advised that the customer archive historical data and query only the past two months of data for their real-time reporting needs.
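
As a rough illustration of what such an archiving step could look like, the sketch below builds two Snowflake SQL statements: one that copies rows older than a cutoff into an archive table, and one that removes them from the live table. This is an assumption-laden sketch, not the client's actual implementation: the table and column names (`sales_events`, `sales_events_archive`, `event_ts`) are hypothetical, and the identifiers are interpolated without escaping, so it is suitable only for trusted, hard-coded names.

```python
from typing import List


def archive_statements(
    table: str, archive_table: str, ts_column: str, months: int = 2
) -> List[str]:
    """Build SQL that moves rows older than `months` months into an archive table.

    All identifiers are hypothetical and are inserted as-is (no escaping).
    """
    # Cutoff expressed with Snowflake's DATEADD and CURRENT_DATE functions.
    cutoff = f"DATEADD(month, -{months}, CURRENT_DATE())"
    return [
        # Step 1: copy the old rows into the archive table.
        f"INSERT INTO {archive_table} SELECT * FROM {table} "
        f"WHERE {ts_column} < {cutoff};",
        # Step 2: delete the same rows from the live table.
        f"DELETE FROM {table} WHERE {ts_column} < {cutoff};",
    ]


# Hypothetical table and column names, for illustration only.
statements = archive_statements("sales_events", "sales_events_archive", "event_ts")
for statement in statements:
    print(statement)
```

In practice the two statements would normally run inside a single transaction so that both see the same cutoff and no rows are lost between the copy and the delete.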

Reduce data streaming frequency from every second to every five minutes

The client had been running small batches of data through Snowpipe every second to keep real-time reporting data consistently available. phData recommended that the client reduce their Snowpipe streaming frequency to every five minutes to increase their reporting speeds.
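
To give a sense of scale, the sketch below counts how many load operations per day each interval implies and shows one simple way to group events into five-minute batches. It does not model Snowpipe pricing, and the helper names and sample events are hypothetical; it only illustrates why fewer, larger batches mean far fewer loads.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_DAY = 24 * 60 * 60


def loads_per_day(interval_seconds: int) -> int:
    """Number of load operations per day when loading once per interval."""
    return SECONDS_PER_DAY // interval_seconds


def batch_by_window(
    events: Iterable[Tuple[int, str]], window_seconds: int = 300
) -> Dict[int, List[str]]:
    """Group (timestamp_in_seconds, payload) events into fixed-size time windows."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for timestamp, payload in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        batches[window_start].append(payload)
    return dict(batches)


every_second = loads_per_day(1)          # one load per second
every_five_minutes = loads_per_day(300)  # one load per five minutes
print(every_second, every_five_minutes, every_second // every_five_minutes)

# Hypothetical events: (seconds since start, payload).
sample = [(0, "a"), (299, "b"), (300, "c"), (601, "d")]
print(batch_by_window(sample))
```

Moving from a one-second to a five-minute interval reduces the number of daily loads by a factor of 300 while each load carries correspondingly more data.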

Adjust cluster key usage to aid in data pruning

Unlike traditional database indexes, which are typically used on high-cardinality columns, Snowflake cluster keys improve data processing speed when used on low-cardinality columns. phData also recommended not using cluster keys on small tables (i.e., three terabytes).
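
For readers unfamiliar with how cluster keys are set, the sketch below builds a Snowflake `ALTER TABLE ... CLUSTER BY` statement and a `SYSTEM$CLUSTERING_INFORMATION` query that can be used to inspect how well a table is clustered on the chosen columns. The table and column names (`sales_events`, `store_region`, `sale_date`) are hypothetical and are not taken from the client's environment; identifiers are interpolated without escaping.

```python
from typing import List


def cluster_by_statement(table: str, columns: List[str]) -> str:
    """Build an ALTER TABLE statement that defines a cluster key on `columns`."""
    column_list = ", ".join(columns)
    return f"ALTER TABLE {table} CLUSTER BY ({column_list});"


def clustering_info_query(table: str, columns: List[str]) -> str:
    """Build a query that reports clustering information for `columns`."""
    column_list = ", ".join(columns)
    return f"SELECT SYSTEM$CLUSTERING_INFORMATION('{table}', '({column_list})');"


# Hypothetical table and lower-cardinality columns, for illustration only.
key_columns = ["store_region", "sale_date"]
alter_sql = cluster_by_statement("sales_events", key_columns)
info_sql = clustering_info_query("sales_events", key_columns)
print(alter_sql)
print(info_sql)
```

Running the information query before and after a change is one way to check whether a candidate key actually improves pruning before committing to it.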

