Case Study

Outdoor Vehicle Manufacturer Drives Value with Machine Learning

The Customer’s Challenge

A large U.S. Outdoor Vehicle Manufacturer was struggling to accurately forecast demand across hundreds of thousands of unique products, many of which lacked historical sales data. To account for complex seasonal factors and sporadic variability, they needed far more than simplistic data extrapolation. They needed modern machine learning.

phData’s Solution

In just three months, phData’s two-person team delivered a modern, end-to-end ML solution that can account for complex variables and dependencies. Now for the first time, the manufacturer can forecast demand with high confidence (even for new products with zero historical sales data) to better support product launches and minimize stocking costs.

The Full Story

After a string of recent acquisitions, a top Outdoor Vehicle Manufacturer (10,000+ employees, revenue of $5+ billion) had rapidly expanded their product catalog, not just with new vehicles but also with the thousands of parts required to service them.

The number of SKUs in their inventory ballooned with each new acquisition. It wasn’t long before they found themselves with 700,000 unique SKUs for various vehicle parts. And because they guarantee that replacement parts for any given vehicle will remain available for 10 years, the vast majority of those items aren’t going anywhere anytime soon.

Naturally, managing such a massive catalog presents a slew of challenges. One of the largest, as the company found out, was accurately forecasting demand across hundreds of thousands of unique products, particularly when accounting for seasonal or regional factors.

They realized that machine learning (ML) might help supplement their existing forecasting system, which was built on simple extrapolations of historical sales data for a given part, but they weren’t sure exactly how, or where to start. So they turned to phData.


Taking the plunge with machine learning

The two-person phData ML team worked closely with the manufacturer to understand their existing forecasting system, identify potential opportunities, and explore how ML might help them meet their key business goals — namely, improving their ability to predict demand and minimize the costs associated with purchasing and storing parts that turn out to be superfluous.

The price of poor forecasting


The next step was to review their options, establish the right goals for machine learning, and draw up a workable plan to get there.

The manufacturer’s existing forecasting system could generally provide adequate predictions for spare parts associated with the older vehicles in their catalog. However, these predictions suffered from notable inconsistencies, especially for newer parts with less historical data to extrapolate from.

Forecasting difficulties included:

The teams using the system lacked visibility into how the forecasts actually worked and what specific input data underpinned the demand projections, so it was nearly impossible to diagnose what was causing these problems, much less figure out how to solve them. As a result, the customer relied on a small team to create manual forecasts. Given the size of the catalog, manual forecasting was time-consuming and could be done for only a small percentage of the total items.

Meanwhile, because the customer lacked experience with more sophisticated ML models and algorithms, they weren’t sure what might or might not be possible with a more modern forecasting solution. For example, because their existing system was based solely on historical patterns (requiring at least 3 months of existing sales data), they hadn’t considered that forecasting demand tied to new vehicle launches was even a real possibility.

Unearthing novel opportunities

Our ML team walked the manufacturing company through their own process, helping them understand where their biggest pain points really were and what they would need to do to address them. From there, we designed a modern solution for more powerful, reliable demand forecasting. We leveraged Spark to manage the full data engineering and ML lifecycle, and we deployed it on Microsoft Azure.

And we didn’t stop there. While working to overhaul the customer’s existing forecasting, phData ML engineers identified their inability to forecast demand for new vehicle launches as a major gap — one the customer hadn’t realized it was even possible to solve for.

It posed a tricky data science problem. A successful model would need to deliver accurate predictions with zero baseline of historical sales data to extrapolate from — associating new parts with existing parts while correcting historical biases and accounting for the complex web of variables and interdependencies described above.

With the right ML expertise, however, a solution was both possible and eminently feasible — with the potential to deliver massive results.

Inventing novel solutions

Our team realized early on that the problem of forecasting demand for net-new items couldn’t be solved with a single algorithm. For example, using a single clustering algorithm to group similar parts or vehicles together yielded demand trendlines that either weren’t predictive or placed demand at entirely the wrong scale.

To account for the complexities at play, we designed two distinct ML models (one for predicting the average level of demand factors and another for predicting the level of variance across those factors), combining advanced algorithms such as XGBoost, RandomForest, KNN, and LSH.
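To make the idea of associating a new part with existing parts more concrete, here is a minimal sketch of a nearest-neighbor cold-start estimate. It is not phData’s implementation: the production solution combined several algorithms and ran on Spark, whereas this toy version uses plain Python. The part IDs, feature vectors, demand histories, and function names below are all hypothetical. It only illustrates how a part with zero sales history could borrow both an average demand level and a variance from its most similar existing parts.

```python
import math
from statistics import mean, pvariance

# Hypothetical catalog: feature vector and monthly demand history per existing part.
# The two features are illustrative only (for example, scaled price and scaled weight).
EXISTING_PARTS = {
    "belt-a": ((0.20, 0.10), [12, 14, 13, 15]),
    "belt-b": ((0.25, 0.12), [10, 11, 9, 10]),
    "axle-a": ((0.90, 0.95), [2, 1, 3, 2]),
    "axle-b": ((0.85, 0.90), [1, 2, 2, 3]),
}


def nearest_parts(features, k):
    """Return the IDs of the k existing parts closest to `features` (Euclidean distance)."""
    ranked = sorted(
        EXISTING_PARTS,
        key=lambda part_id: math.dist(features, EXISTING_PARTS[part_id][0]),
    )
    return ranked[:k]


def cold_start_estimate(features, k=2):
    """Estimate (average demand level, demand variance) for a part with no sales history.

    Both numbers are simple averages over the k nearest existing parts.
    """
    neighbors = nearest_parts(features, k)
    levels = [mean(EXISTING_PARTS[part_id][1]) for part_id in neighbors]
    variances = [pvariance(EXISTING_PARTS[part_id][1]) for part_id in neighbors]
    return mean(levels), mean(variances)


# A hypothetical new belt with no sales history, described only by its features.
new_belt_level, new_belt_variance = cold_start_estimate((0.22, 0.11))
print(new_belt_level, new_belt_variance)
```

Keeping the level and the variance as separate outputs mirrors the two-model split described above, although a real system would learn each with a dedicated model rather than averaging neighbors.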

As a result, the customer can now forecast demand for new products with high confidence. In testing the new solution on forecasts for 915 items, 82 percent had a mean absolute error (MAE) below 0.3.
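For readers unfamiliar with the metric, the sketch below shows one way a result like “82 percent of items had an MAE below 0.3” could be computed: calculate the mean absolute error per item, then count the share of items under the threshold. The sample data, item names, and function names are hypothetical, and the source does not state the units or scale on which demand was measured, so the numbers here are purely illustrative.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between actual and forecast values for one item."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def share_below_threshold(items, threshold=0.3):
    """Fraction of items whose per-item MAE is strictly below `threshold`."""
    errors = [mean_absolute_error(actual, forecast) for actual, forecast in items.values()]
    return sum(error < threshold for error in errors) / len(errors)


# Hypothetical test set: item -> (actual demand, forecast demand).
SAMPLE_ITEMS = {
    "part-1": ([1.0, 2.0, 1.0], [1.1, 1.9, 1.2]),  # MAE about 0.13
    "part-2": ([0.0, 1.0, 0.0], [0.5, 0.5, 0.5]),  # MAE of 0.5
    "part-3": ([3.0, 3.0, 4.0], [3.0, 3.5, 4.0]),  # MAE about 0.17
    "part-4": ([2.0, 2.0, 2.0], [2.0, 2.0, 2.0]),  # MAE of 0
}

share = share_below_threshold(SAMPLE_ITEMS)
print(f"{share:.0%} of items have MAE below 0.3")
```

In this made-up sample, three of the four items fall under the threshold, so the reported share is 75 percent.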

As the new solution rolls into production, the customer can forecast their entire SKU catalog for the very first time, without having to rely on manually produced forecasts.


Percentage of items by regression error


Because our ML teams are highly interdisciplinary, bringing both data science skills and data engineering know-how, we helped ensure the solution would grow with the customer’s needs. Implementing the algorithms with PySpark enables the solution to scale horizontally, so it can process high volumes of data at very high speed.

And best of all? Our two-person ML team managed all this in record time — just three months!

On the art of algorithm arrangement:

“A real solution to a real problem is never just a single algorithm you can pick up from a library in Spark. The art is in how you choose and combine all the right algorithms and models for the job.” — phData ML Engineer

Powerful new predictive powers, in a matter of months

The modern forecasting and ML solution delivered by phData goes beyond improving the company’s existing demand forecasting — unlocking all-new predictive capabilities to better support product launches and further minimize stocking costs.

Business Outcomes:

The next phase will include additional refinement of the model, as well as work to automate the customer’s ML pipelines using DevOps processes and toolchain components. This is expected to further improve forecasts and lay a foundation for easily productionizing new ML pipelines in other parts of the business. But the payoff has already been huge for the customer: an improved ability to predict demand for products new and old, so they can get customers the parts they need without running up costs stocking warehouses full of superfluous parts.
