case study

Data Streaming & IoT

1.

THE CUSTOMER:

A large U.S. Oil & Gas (O&G) company seeking to minimize risks to onsite workers and equipment by getting more out of their sensor-based safety monitoring system.

2.

THE CHALLENGE THEY FACED:

Adding capabilities like automated alerting and historical forecasting would require building a new data streaming architecture that could ingest sensor data in the O&G-standard WITSML format while meeting very high performance and uptime requirements.

3.

HOW WE HELPED:

The phData team architected an Internet of Things (IoT) solution to deliver both real-time and historical data into production. The solution uses StreamSets and Kudu to handle the sensor data coming in from the field, custom software to ingest data from WITSML, and HDFS for long-term storage. The team implemented a lambda architecture between Kudu and HDFS, providing a unifying Impala view to query both hot and cold datasets.

4.

WHAT WE GOT DONE:

The O&G company’s safety monitoring system now goes far beyond simple dashboards to support automated anomaly detection, alerting, and historical forecasting, empowering them to act more decisively to protect their people, their capital investments, and their brand.
  • Automated alerts for reliable detection
  • Historical data analysis for future worker safety
  • High fault tolerance for high confidence
  • Better performance for better accuracy
      - Streaming (~30 days): 2 billion records
      - Historical (20 years): 300 billion records

Case Study: Fortune 500 O&G Company Taps Data Streaming and IoT to Fuel Best-in-Class Safety Monitoring System

In the unpredictable world of oil and gas (O&G), the safety of onsite workers is paramount; and with all the variables at play in the extraction process, successful operators know to expect the unexpected. That’s what drives one Fortune 500 O&G company based in the U.S. to do whatever they can to improve their safety monitoring systems.

Their drilling rigs are equipped with a variety of sensors that transmit status readings such as well pressure, flow rate, and temperature to a third-party data vendor. This allows the O&G company’s operations team to monitor status indicators — looking for aberrations that might suggest any potential danger to personnel or equipment — via dashboards on their vendor’s web application.

However, because the company was unable to build additional automation via the vendor tooling, they decided to build a more robust solution that went beyond basic dashboards to flag anomalies and send out urgent alerts automatically, minimizing risk in the event of an incident. In addition, their data scientists sought a long-term store for the incoming sensor data to power historical analyses and predictive analytics that could help them preempt accidents before they happen.

Complexities of real-time data streaming for O&G

To support monitoring and forecasting across all their drilling sites, the O&G company would need to build out a new data streaming framework that could process the high volume of data coming from their IoT sensors in real time — all while ensuring a consistent, unified view across their internal teams, their third-party data vendor, and their ecosystem of smaller contractors and subcontractors.

Challenges included:

  • A gusher of data volume — The solution needed to process a massive volume and frequency of IoT data from dozens (often hundreds) of wells every day, each of which generates sensor values every single second. To stream that kind of data in real time, architecture design, technology selection, and performance tuning would all be paramount.
  • O&G industry complexities — The O&G world necessarily involves integrating industry-specific technologies, parsing jargon, and wrangling different subcontractors with differing standards for metric naming and taxonomy. For example, the solution would need to ingest sensor data in the idiosyncratic WITSML format used in O&G, which most streaming platforms, including StreamSets, do not natively support.
  • High fault-tolerance requirements — Drill sites are often in remote locations that lack the server and network infrastructure to transmit this sensor data reliably (more often, the data might be coming from a laptop). It would therefore be critical for the solution to absorb any potential disruption without losing data or interrupting visibility in a way that risks a preventable disaster.

Although the O&G company did have a Cloudera data platform in place, they lacked the specific data streaming expertise they needed around technologies like StreamSets and Kudu. Realizing they were out of their depth, they turned to phData.

Digging deeper with IoT

The phData team designed a solution architecture to deliver both real-time and historical data into production, using StreamSets and Kudu to handle the sensor data coming in from the field.
StreamSets to Kudu to Spotfire architecture diagram
  • Hot-to-cold data — phData chose Apache Kudu for storing hot data and HDFS for long-term storage, with Apache Spark, Impala and Kudu all running on Cloudera CDH. They implemented a lambda architecture between Kudu and HDFS for cold data, and a unifying Impala view to query both hot and cold datasets.
  • Data ingestion — phData built a custom StreamSets origin to read the sensor data from the O&G industry’s standard WITSML format, in order to support both real-time alerting and future analytics processing.
  • Designed for fault tolerance — To account for the high likelihood of downtime, the solution includes two StreamSets and two Cloudera pipelines and buffers messages in Kafka, minimizing data loss if one system goes down.
  • Tuned for performance — To process the enormous volume of IoT data coming from all those sites, the team used their deep knowledge of Kudu, HDFS, and StreamSets to tune the solution for optimal performance at every layer.
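
To make the ingestion step more concrete, here is a minimal Python sketch of the kind of parsing a WITSML reader has to do: turning comma-separated log rows into flat per-second records. It is not phData's custom StreamSets origin (the case study does not describe its internals). The sample document is a heavily simplified, hypothetical WITSML-style log with no XML namespace, and the well ID, mnemonics, and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified WITSML-style log. Real documents carry an
# XML namespace and many more elements; every name and value here is made up.
SAMPLE_LOG = """
<logs>
  <log uidWell="well-1">
    <logCurveInfo><mnemonic>TIME</mnemonic><unit>s</unit></logCurveInfo>
    <logCurveInfo><mnemonic>PRESSURE</mnemonic><unit>psi</unit></logCurveInfo>
    <logCurveInfo><mnemonic>FLOW</mnemonic><unit>gal/min</unit></logCurveInfo>
    <logData>
      <data>2020-01-01T00:00:00Z,3010.5,412.0</data>
      <data>2020-01-01T00:00:01Z,3011.2,</data>
    </logData>
  </log>
</logs>
"""


def parse_log(xml_text):
    """Turn each <data> row into one flat record keyed by curve mnemonic."""
    records = []
    root = ET.fromstring(xml_text.strip())
    for log in root.iter("log"):
        # The curve definitions give the column order of every data row.
        mnemonics = [c.findtext("mnemonic") for c in log.findall("logCurveInfo")]
        for row in log.iter("data"):
            values = (row.text or "").split(",")
            record = {"well": log.get("uidWell")}
            for name, value in zip(mnemonics, values):
                if name == "TIME":
                    record[name] = value  # keep the timestamp as text
                else:
                    # An empty field means no reading arrived for that second.
                    record[name] = float(value) if value else None
            records.append(record)
    return records


records = parse_log(SAMPLE_LOG)
```

Each resulting dictionary is one row that could be written to a hot store; the missing flow reading in the second row is kept as `None` rather than dropped, so gaps in transmission stay visible downstream.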

As a result, the company’s monitoring systems and dashboards can continuously query the Kudu table where this data is stored and compare it against the historical data now being compiled by their data science team.
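
The unifying Impala view over hot and cold data can be sketched as a single `UNION ALL` split at a time boundary. The Python helper below only builds the SQL text, and it is an illustration under stated assumptions rather than the statement phData deployed: the view, table, and column names are hypothetical, and the time column is assumed to hold integer epoch seconds.

```python
def unified_view_sql(view, hot_table, cold_table, columns, time_col, boundary):
    """Build a CREATE VIEW statement spanning a hot and a cold table.

    Rows at or after `boundary` (epoch seconds, an assumption) come from the
    hot Kudu-backed table and older rows from the cold HDFS-backed table, so
    a row present in both during a migration is never returned twice.
    """
    cols = ", ".join(columns)
    return (
        f"CREATE VIEW {view} AS\n"
        f"SELECT {cols} FROM {hot_table} WHERE {time_col} >= {boundary}\n"
        f"UNION ALL\n"
        f"SELECT {cols} FROM {cold_table} WHERE {time_col} < {boundary}"
    )


# All identifiers below are hypothetical placeholders.
sql = unified_view_sql(
    view="sensor_readings",
    hot_table="sensor_readings_kudu",
    cold_table="sensor_readings_hdfs",
    columns=["well_id", "ts", "pressure", "flow_rate"],
    time_col="ts",
    boundary=1577836800,
)
print(sql)
```

In a design like this, a periodic job would copy aged rows from the hot table to the cold one and then move the boundary forward, so queries against the view see one continuous dataset.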

Safer extraction through smarter technology

With the solution designed by phData, the O&G company’s safety monitoring system now goes far beyond simple dashboards to include automated anomaly detection and alerting, as well as the foundation for future forecasting and historical analysis.
Architecture Diagram phData Monitoring and Streaming Data
  • Automated alerts for reliable detection — The ability to automatically trigger safety alerts when an anomaly is detected at a drill site is a boon for worker safety, safeguarding against human error and protecting workers when monitoring teams may be short-handed.
  • Historical data analysis for future worker safety — By rolling cold sensor data into long-term HDFS storage, the O&G company’s data science team can now create all-new historical forecasting and analysis capabilities, such as identifying sensor patterns that may indicate a given piece of equipment is likely to begin failing.
  • High fault tolerance for high confidence — With high redundancy built into their solution architecture, the O&G company can have faith in their safety monitoring systems.
  • Better performance for better accuracy — Supporting a higher frequency of data gives the O&G operations team a more fine-grained view into the status of drilling sites, getting the most out of their sensors’ capabilities to help ensure that no anomaly goes undetected.
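
The case study does not say which detection method the alerting uses. As one simple illustration of automated anomaly flagging on a per-second sensor stream, the Python sketch below applies a rolling z-score check; the technique, the window size, the threshold, and the sample pressure values are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings far from the mean of the preceding window.

    A reading is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` readings.
    """
    recent = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            # Skip perfectly flat windows, where sigma is zero.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(i)
        recent.append(value)
    return flagged


# Invented well-pressure samples (psi) with one obvious spike.
pressures = [3000, 3002, 2999, 3001, 3000, 3001, 3450, 3002]
alerts = detect_anomalies(pressures)
```

A production system would likely need something more robust, for example keeping flagged readings out of the baseline window and tuning thresholds per sensor type, but the shape is the same: compare each new reading to recent history and raise an alert when it deviates sharply.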

Results

With their new automated alerting and historical forecasting capabilities, made possible by the data streaming and IoT solution delivered by phData, the company has a vastly improved ability to know what’s going on at their drilling sites. And in the unpredictable world of O&G exploration and drilling, knowledge is power. Detecting issues faster empowers them to act more decisively to protect their people, their capital investments, and their brand.

Ready to learn more about phData? Let's chat.