Case Study

Luxury Automaker Improves Forecasting Capabilities with AI

The Customer’s Challenge

As a high-end auto manufacturer geared up to release its first-ever all-electric model, its North American sales and marketing arm found itself heavily dependent on a manual, Excel-based approach to sales forecasting. To determine the right order volume for the new electric model, and to support future sales decisions, the company realized it would need to move from a sales forecast based on ‘gut feeling’ and manual regression analysis to one supported by centralized data and machine learning.

phData’s Solution

phData brought this automaker’s forecasting up to 2021 standards, using Snowflake’s data platform and a custom Machine Learning (ML) framework to make order demand and sales forecasting for new car models faster, more accurate, and more user-friendly. With the client’s data centralized in Snowflake, our team engineered the solution to connect the data to AWS (and back again) and introduced novel, trim-level ML forecasting models to support more accessible Business Intelligence decisions in the future.

The Full Story

There’s a reason sales forecasting is a critical piece of a company’s financial health: it means better planning, more accurate inventory allocation, and the ability to course-correct more quickly.

For one high-end auto manufacturer ($34BN+), forecasting meant running manual regression analysis in Excel, at least until the company rolled out a new electric model with no historical sales data.

Manual analysis, limited insights

Analysts had built confidence in the automaker’s sales and marketing pipeline by using a standard forecasting model, collaborating in Excel when necessary. But this approach had a number of limiting factors for the company:

  • They needed analysts with specific domain experience to create and fix formulas.
  • The process took months, including multiple teams and stakeholders.
  • Copies of data and multiple data sources led to clutter and confusion.

Most importantly, the company had no historical sales data to accurately forecast sales for its debut electric model.

Centralized data was just the beginning

The client’s needs quickly stacked up — one led to the other until the full scope of the project became clear: 

  • First, they needed a specific way to forecast orders for their new electric model (the first of its kind for the brand).
  • That led to the realization that they needed a more efficient and accurate way to forecast in general.
  • But before any of that, they needed centralized data (via a data platform) instead of relying on their legacy systems and individual analysts.

Why phData?

Since the client didn’t already have its data centralized, they needed a partner to own the whole project — from moving to Snowflake to designing and implementing the Machine Learning model.

The client came to phData with all 3 of their needs in mind because we could:

  • Build the foundation: centralized data in Snowflake for better BI and data-informed decisions.
  • Engineer the solution: connect the right pieces of technology to enable powerful data science moves, namely Snowflake, Docker, and AWS Batch.
  • Create the model: accurate forecasting for a brand new model of automobile using multiple Machine Learning models with Python and TensorFlow (for deep learning).
  • Implement the analysis: establish the tools/dashboards needed for future forecasting and end users by feeding forecast data back into Snowflake for their BI tools. 

Building the engine

Our data science team knew they were facing a unique problem: creating an ML model that can accurately forecast order demand and sales for new and existing car models, often without the baseline of historical sales data. 

We needed to create an ML framework that could: 

  • Handle time series, seasonal, trend and intermittent data types. 
  • Compare multiple cutting-edge forecasting algorithms simultaneously and dynamically select the most appropriate model.
  • Enable those with domain knowledge to compare models and trim levels. 
  • Solve for the immediate business problem (forecasting for the new electric model) and establish better processes for long-term business efficiency. 

Our data science team decided to build a model that uses data clustering to create a new dataset from similar car models’ historical data. The model presents derived features and feature forecasts, drawn from the significant features in both this internal dataset and a collection of economic datasets, and allows the team to use this data in the multivariate time-series forecasting framework (see Figure 1).

The framework provides two kinds of outputs: backtesting results based on historical data and future forecasts at user-defined time intervals. To account for the lack of historical data for the new model, the framework uses data clustering to make like-for-like comparisons within the sales patterns of previous models (see Figure 2), which allows the data for the new model to be augmented.
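
The clustering idea can be illustrated with a small sketch. The case study does not say which clustering algorithm or features were used, so everything here is an assumption: plain k-means over normalized launch-period sales curves, hypothetical model names and sales figures, and a hypothetical `proxy_history` helper that scales the matching cluster's average shape to an expected launch volume. It is a toy illustration, not the framework phData built.

```python
from statistics import mean

# Hypothetical first-six-month sales for existing models (illustrative numbers only).
LAUNCH_SALES = {
    "sedan_a": [10, 20, 30, 40, 50, 50],
    "suv_a": [50, 50, 40, 30, 20, 10],
    "sedan_b": [5, 10, 15, 20, 25, 25],
    "suv_b": [25, 25, 20, 15, 10, 5],
}

def normalize(curve):
    """Scale a sales curve to shares of its total so shape, not volume, is compared."""
    total = sum(curve)
    return [value / total for value in curve]

def distance(a, b):
    """Squared Euclidean distance between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(curves, k, iterations=20):
    """Plain k-means; the first k curves seed the centroids (simplistic but deterministic)."""
    names = list(curves)
    centroids = [curves[name] for name in names[:k]]
    labels = {}
    for _ in range(iterations):
        # Assign every model to its nearest centroid.
        labels = {
            name: min(range(k), key=lambda c: distance(curves[name], centroids[c]))
            for name in names
        }
        # Move each centroid to the mean curve of its members.
        for c in range(k):
            members = [curves[name] for name in names if labels[name] == c]
            if members:
                centroids[c] = [mean(column) for column in zip(*members)]
    return labels, centroids

def proxy_history(analog, labels, centroids, expected_total):
    """Build a stand-in history for a new model from the cluster of a chosen analog."""
    shape = centroids[labels[analog]]
    return [round(share * expected_total) for share in shape]

SHAPES = {name: normalize(curve) for name, curve in LAUNCH_SALES.items()}
labels, centroids = kmeans(SHAPES, k=2)
print(labels)
print(proxy_history("sedan_a", labels, centroids, expected_total=400))
```

In this sketch the analog (`"sedan_a"`) is chosen by hand, which mirrors the role the case study gives to subject matter experts in identifying analogous sales scenarios.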

The team built advanced functionality into the forecasting system so that subject matter experts can use their domain knowledge to further inform the system’s output by identifying analogous sales scenarios.

Best of all, the ML framework adapts to data drift at run time. In other words, the forecasting models continually adapt, re-fit to the latest data, and keep improving as more data is collected.
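
One way to picture how model comparison and run-time re-fitting can work together is a rolling backtest: each candidate forecaster is scored on the most recent history, and the one with the lowest error is used for the next forecast. The sketch below is illustrative only and is not phData’s framework. The three simple forecasters, the function names, and the order figures are all hypothetical stand-ins for the more advanced algorithms mentioned above.

```python
from statistics import mean

# Hypothetical monthly order counts for one trim (three identical years, for illustration).
YEAR = [20, 18, 25, 30, 34, 40, 42, 38, 33, 28, 24, 45]
MONTHLY_ORDERS = YEAR * 3

def naive_last(history, horizon):
    """Repeat the most recent observation."""
    return [history[-1]] * horizon

def moving_average(history, horizon, window=3):
    """Repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon

def seasonal_naive(history, horizon, season=12):
    """Repeat the value observed one season earlier (needs at least `season` points)."""
    return [history[-season + (h % season)] for h in range(horizon)]

def backtest_mae(model, series, horizon, min_train):
    """Rolling-origin backtest: forecast from each cutoff and average the absolute errors."""
    errors = []
    for cutoff in range(min_train, len(series) - horizon + 1):
        train = series[:cutoff]
        actual = series[cutoff:cutoff + horizon]
        predicted = model(train, horizon)
        errors.extend(abs(p - a) for p, a in zip(predicted, actual))
    return mean(errors)

def select_model(candidates, series, horizon=3, min_train=12):
    """Score every candidate on the same backtest and return the lowest-error one."""
    scores = {
        name: backtest_mae(model, series, horizon, min_train)
        for name, model in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

CANDIDATES = {
    "naive_last": naive_last,
    "moving_average": moving_average,
    "seasonal_naive": seasonal_naive,
}

best, scores = select_model(CANDIDATES, MONTHLY_ORDERS)
print(best, scores)
```

Re-running `select_model` whenever new months arrive means the choice is always made against the latest data, which is one simple reading of what adapting to drift could look like.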

Pedal to the metal

All in all, our data science team created an ML framework that not only accurately forecasted demand for the new electric model, but also set the team up to deal effectively with data drift and incomplete datasets and to incorporate additional data in the future.

The integrated forecasting models described above can be run on demand to forecast sales data at both the model and trim level.

Instead of just handing raw data over to the sales team, our engineering team built pipelines to feed the forecasted data back into Snowflake. Now, the client has access to a continuous loop of data for faster, more accurate forecasting and ad-hoc analysis. Our analogous forecasting engine allows domain experts to derive sales forecasts for brand new scenarios using only the data available from potentially similar trims and models.

With Snowflake’s User Defined Functions (UDFs), end-users don’t need to use custom Python to access insights on the command line. Instead, they can hook into relevant data in Snowflake using their existing BI products, searching for any combination of models and trim levels to turn up forecast data.


Our expert data scientists:
  • Built a foundational forecasting platform to address complex data science use cases.
  • Solved the immediate problem of forecasting the new electric vehicle model without extensive historical data, while building a system that will improve over time and can address multiple types of forecasting problems within the business.
  • Utilized data pipelines built on Snowflake to ensure that the system always uses the latest data for forecasting; by writing the data back to Snowflake, the customer maintains their single source of truth and is able to use their current BI dashboard natively to view model output.

The outcomes included:
  • A sales forecast model that can forecast results 6 months in advance with significantly higher accuracy than current approaches.
  • In 87% of cases, the deviation between the forecast and the observed data was fewer than 10 cars over a 6-month period.
  • Trim-level forecasting results that are up to 50% more accurate than traditional forecasting models such as ARMA or ARIMA (see Figure 3).
  • Because Snowflake is both the source of the input data and the destination for the output data, forecasts can be easily stored and measured against future results to evaluate the accuracy of the system. Snowflake also allows the customer’s standard BI tools to be used to directly track lift and overall business ROI (see Figure 4).
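
For readers who want to track a similar accuracy measure on their own forecasts, the "share of cases within 10 cars" figure can be computed along these lines. This is a sketch under assumptions: it treats each case as one 6-month total per model and trim combination, and the forecast and observed numbers are made up for illustration. They are not the client's data.

```python
def share_within_tolerance(forecasts, actuals, tolerance):
    """Fraction of cases whose absolute forecast error is strictly below `tolerance`."""
    hits = sum(1 for f, a in zip(forecasts, actuals) if abs(f - a) < tolerance)
    return hits / len(forecasts)

# Hypothetical 6-month totals, one entry per model/trim combination.
forecast_totals = [120, 45, 80, 200]
observed_totals = [115, 60, 83, 195]

share = share_within_tolerance(forecast_totals, observed_totals, tolerance=10)
print(f"{share:.0%} of cases deviate by fewer than 10 cars")
```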

Take the next step with phData.

Sometimes even the “classic” machine learning problems such as forecasting can have intricacies and gotchas that one might not expect. In this case, the customer had to overcome a severe lack of historical data in order to forecast quantities that had not been measured before.

Do your business problems qualify as “classic” machine learning that can be solved trivially, or are there special circumstances that will require special adaptations? More significantly, are you confident that you have a solid understanding of what scenarios would call for more advanced machine learning methods?

phData’s machine learning experts can help you plan a comprehensive project from day one, evaluate your existing plans and algorithms to help identify possible improvements, and provide full project management and execution capabilities in order to get you started right away.

Talk to an expert today, and make sure you’re getting the highest possible ROI from your machine learning program.
