Case Study

IoT on Snowflake

1.

THE CUSTOMER:

A leading manufacturer of mining and earth-moving equipment sought to increase top line revenue through new products and services, including smart-connected equipment and post-purchase proactive maintenance services. To accomplish this, they needed to transform their existing sensor-based analytics platform into a more efficient, centralized IoT data solution.

2.

THE CHALLENGE THEY FACED:

The manufacturer knew they wanted to take advantage of the latest cloud-native technologies. But they needed help choosing those technologies, executing a successful migration from their existing Hadoop solution, and ensuring the new solution could handle the high volume of IoT data transmitted daily from their equipment sensors.

3.

HOW WE HELPED:

phData designed and built a new cloud-native IoT solution based on Snowflake, together with Spark, Kafka, and Microsoft Azure. phData used a Cloud 2.0 approach for both data engineering and cloud infrastructure, giving the customer automated infrastructure provisioning through infrastructure-as-code, CI/CD for automated deployment, and an architecture that supports dynamic scale and fault tolerance. Then, to validate the solution’s in-production viability, phData helped the manufacturer successfully migrate their application from Hadoop to Snowflake.

4.

WHAT WE GOT DONE:

The manufacturer has transformed what started out as a small web application into a unified IoT data store, analytics, and visualization platform — architected and optimized by phData to maximize the value of the cloud.
  • Production-tested foundation for enterprise IoT, built on Snowflake
  • Cloud-native efficiencies and simplified management
  • More unified data for improved collaboration
  • Simplified security with Azure AD
IoT to Snowflake by the numbers:
  • Daily IoT data:
      • 8-10 billion sensor records
      • 2 million alarm and event records
      • 1 million KPI-derived values
  • Historical data intake:
      • 40.8 TB
      • 127 tables
      • 3.8 trillion rows

Full story: Industrial Manufacturer Extracts Big Efficiencies for Big Equipment, with IoT on Snowflake

A leading manufacturer of earth-moving equipment, including construction, mining, and forestry equipment, has increasingly come to rely on sensor data to understand how their machines are performing.

Most machines they make — from excavators and front-end loaders to subsurface mining equipment and drills — include sensors that track a variety of indicators such as hydraulic pressure, engine RPMs, engine oil temperature, and wheel speed. This Internet of Things (IoT) data not only allows them to predict when individual machines require maintenance, but also shows how they might help customers operate their products more efficiently by analyzing operational cycles and patterns.

Because many of these machines run 24×7 — and because such large machines often require similarly large capital outlays — these insights provide enormous business value. That value has materialized as increased top line revenue through new products and services, including smart-connected equipment and post-purchase proactive maintenance services.

After several previous iterations, the manufacturer had been using a Hadoop-based solution to process, store, and analyze all their sensor data. However, maintaining the platform required their small analytics team to spend more time administering the cluster than getting value from the data they collected. In addition, the static resource allocation model meant they could not scale dynamically, and their compute costs kept rising.

As a result, they decided to explore how they might take advantage of the latest cloud-native services and data technologies to streamline systems management and improve efficiency, while simultaneously consolidating their siloed data sources.

Mountains of sensor data

To meet their goals and justify the costs of moving to a new platform, the manufacturer would need to design a modern, cloud-based data analytics solution; they also needed to ensure this solution could ingest their existing data from Hadoop and handle the high volume of new IoT data pushed daily from their equipment. Key challenges:

  • Designing and validating the right solution architecture — The manufacturer knew they wanted to move off of Hadoop and take advantage of cloud-native data technologies; however, they were less sure about which of those technologies were right for the job, how they should be optimized, and how to demonstrate the feasibility of the new solution.
  • Moving mountains of IoT data — With sensors generating thousands of data points per minute, per individual machine, the revamped solution would need to handle billions of sensor records per day. And with 40+ TB of data to ingest, migration from their existing Hadoop-based solution was bound to be a complex challenge.
  • Unifying disparate systems and data — To handle the volume and heterogeneity of sensor data from all their different equipment (often transmitted from highly remote locations with poor internet connectivity), the manufacturer was parceling the data into files and uploading them once a minute. Accordingly, the new solution would need to incorporate their existing proprietary API, then somehow convert these files into a consistent and usable format. It would also need to serve as a central repository to help tear down corporate data silos and unify the multitude of existing systems of record.

The manufacturing corporation realized they were out of their depth. They needed help understanding how best to move on from their Hadoop-based solution and take advantage of the latest cloud-native technologies, but their own data science and analytics team was already tied up keeping the current analytics systems running. Seeking a partner with proven data engineering expertise, they turned to phData.

Digging deeper with a Cloud 2.0 Architecture

The phData team worked closely with the manufacturer’s analytics team to understand both their existing Hadoop-based platform and their goals for overhauling it, then provided the technology recommendations and support needed to successfully transform it.

To deliver the required improvements in efficiency, maintainability, and data accessibility, phData designed a new architecture around Snowflake. They leveraged both the right mix of cloud-native services and data technologies (such as Spark and Kafka for data processing, and Microsoft Azure and Kubernetes for infrastructure and orchestration) and the right “Cloud 2.0” design and deployment practices (such as taking a containerized, infrastructure-as-code approach to deploying the Kafka Connector on Azure Kubernetes Service) to make the most of those technologies. Finally, they proved the viability of the new solution by helping to successfully migrate one of the manufacturer’s large applications from Hadoop to Snowflake.

The data files generated once per minute by the sensors are now uploaded to Azure blob storage via a proprietary REST API; then, these hundreds of millions of small files are processed and normalized by Spark before being transmitted to Snowflake via the Spark-Snowflake connector.
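To make that pipeline stage concrete, here is a minimal PySpark sketch of the normalize-and-load step. It assumes an illustrative file layout, schema (machineId, eventTime, readings), staging table name, and connection options; none of these are the manufacturer’s actual configuration.

```python
# Minimal sketch (illustrative only): normalize minute-level sensor files from
# Azure blob storage and load them into Snowflake via the Spark-Snowflake
# connector. All paths, column names, and connection values are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("iot-sensor-load").getOrCreate()

# Read the small JSON files the equipment uploads once per minute.
raw = spark.read.json("wasbs://sensor-files@exampleaccount.blob.core.windows.net/incoming/")

# Normalize: flatten nested readings and standardize column names and types.
normalized = (
    raw.select(
        F.col("machineId").alias("machine_id"),
        F.to_timestamp("eventTime").alias("event_ts"),
        F.explode("readings").alias("reading"),
    )
    .select(
        "machine_id",
        "event_ts",
        F.col("reading.sensor").alias("sensor_name"),
        F.col("reading.value").cast("double").alias("sensor_value"),
    )
)

# Snowflake connection options (placeholders; real credentials would come from a secret store).
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfDatabase": "IOT",
    "sfSchema": "RAW",
    "sfWarehouse": "LOAD_WH",
    "sfUser": "LOAD_USER",
    "sfPassword": "********",
}

# Append the normalized records to a Snowflake staging table.
(
    normalized.write.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "SENSOR_READINGS_STAGE")
    .mode("append")
    .save()
)
```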

Once in Snowflake, the data is consolidated and enhanced, every two minutes, via a series of tables and schemas designed to flatten data structures and introduce new data columns that provide more ways to break down the data.
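One way that two-minute consolidation could be expressed is sketched below, using the Snowflake Python connector to define a scheduled task that flattens staged records into a wider, analysis-friendly table. The task, table, and derived column names are assumptions for illustration, not the customer’s actual schema.

```python
# Illustrative sketch of a two-minute consolidation step using the Snowflake
# Python connector. Object names, columns, and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="ETL_USER",
    password="********",
    warehouse="TRANSFORM_WH",
    database="IOT",
    schema="ANALYTICS",
)

consolidate_task = """
CREATE OR REPLACE TASK CONSOLIDATE_SENSOR_READINGS
  WAREHOUSE = TRANSFORM_WH
  SCHEDULE  = '2 MINUTE'
AS
  INSERT INTO SENSOR_READINGS_FLAT
      (machine_id, event_ts, sensor_name, sensor_value, event_date, event_hour)
  SELECT
      machine_id,
      event_ts,
      sensor_name,
      sensor_value,
      TO_DATE(event_ts)           AS event_date,  -- derived column for date-level rollups
      DATE_PART('hour', event_ts) AS event_hour   -- derived column for hourly breakdowns
  FROM RAW.SENSOR_READINGS_STAGE
  WHERE event_ts > (SELECT COALESCE(MAX(event_ts), '1970-01-01'::TIMESTAMP_NTZ)
                    FROM SENSOR_READINGS_FLAT)
"""

with conn.cursor() as cur:
    cur.execute(consolidate_task)
    # Tasks are created suspended, so resume it to start the schedule.
    cur.execute("ALTER TASK CONSOLIDATE_SENSOR_READINGS RESUME")

conn.close()
```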

The final result? A common data warehouse that’s easily accessible via Power BI dashboards.

Striking paydirt with IoT on Snowflake

Thanks to the solution architecture design and migration support from phData, the manufacturing corporation has transformed what started out as a small Microsoft SQL Server-based web application into a unified IoT data store, analytics, and visualization platform — one with the potential to now support the entire business:
  • A proven foundation for IoT — By successfully executing a large Hadoop-to-Snowflake migration, phData helped prove the value and viability of the new Snowflake-based solution; as a result, the platform is already seeing surging adoption across more and more equipment types and product lines, and has garnered additional funding from corporate leadership.
  • 8 billion IoT data points daily — The solution processes billions of sensor records on a daily basis, coming in from mining, construction, and other industrial equipment all around the world.
  • Dynamic scale using the cloud — Snowflake makes it easy to “right-size” warehouses to the use case at hand. For example, to write all 8-10 billion daily sensor records to a persistent table, they can spin up a single 4X-Large warehouse and complete the job in minutes, for roughly the same cost as running much slower, smaller clusters (see the sketch after this list). And by making the most of cloud infrastructure technologies like Microsoft Azure and Azure Kubernetes Service, the solution maximizes utilization and keeps costs to a minimum.
  • More unified data, more collaboration — Data sets previously stored across multiple silos are now consolidated in Snowflake, with new data being added on a regular basis; this makes it much easier to share intelligence between groups, organizations, and customers.
  • Simplified security and access control — Thanks to Snowflake’s Azure AD integration, the manufacturer can securely extend user access and manage identities and permissions with ease, with Single Sign-On (SSO) for internal teams and external customers alike.
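As a rough illustration of the right-sizing pattern described above, the sketch below temporarily scales a warehouse up for a large daily write and scales it back down afterward. The warehouse, table, and connection values are placeholders rather than the customer’s actual objects.

```python
# Illustrative right-sizing sketch: scale a Snowflake warehouse up for a heavy
# daily load, then scale it back down to control cost. All names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="BATCH_USER",
    password="********",
    warehouse="BATCH_WH",
    database="IOT",
    schema="ANALYTICS",
)

with conn.cursor() as cur:
    # Scale up just for the bulk write of the day's sensor records.
    cur.execute("ALTER WAREHOUSE BATCH_WH SET WAREHOUSE_SIZE = 'X4LARGE'")
    cur.execute(
        "INSERT INTO SENSOR_READINGS_DAILY "
        "SELECT * FROM SENSOR_READINGS_FLAT "
        "WHERE event_ts >= DATEADD(day, -1, CURRENT_TIMESTAMP())"
    )
    # Scale back down and suspend once the job completes.
    cur.execute("ALTER WAREHOUSE BATCH_WH SET WAREHOUSE_SIZE = 'XSMALL'")
    cur.execute("ALTER WAREHOUSE BATCH_WH SUSPEND")

conn.close()
```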

Results

The manufacturing corporation now has a proven path to further break down data silos and migrate more large applications from Hadoop to the modern Snowflake-based solution architected by phData. Before long, they’ll be able to empower customers using equipment across their entire product portfolio with all the improved efficiencies of an IoT data and analytics platform built from the ground up with phData’s Cloud 2.0 approach.

Ready to learn more about phData? Let's chat.