May 25, 2022

Driving Healthcare Analytics Forward with HL7 Data and Snowflake

By John Nowak

This blog was co-written by John Nowak and Troy Fokken

HL7 has long been the dominant format for healthcare messaging, yet it is often one of the hardest data sources to extract insights from.

In this blog, we’ll explore what HL7 data is and how pairing it with the Snowflake Data Cloud can support a successful, data-driven future in healthcare.

What is HL7 Data?

Health Level Seven (HL7) is a set of international standards for exchanging healthcare data. It is widely used by organizations of all sizes to transmit patient information.

HL7 messages are event-based: when a patient is admitted, a lab test is ordered, or a medication is given, the source system generates a message composed of standard segments that identify the system, the patient, the visit, and the action.

These messages are the digital imprint of nearly every interaction for each patient at the point of care.


Example HL7 message excerpt with segments annotated

The example shown above is a typical HL7 message for a patient being admitted to the hospital, as indicated by the highlighted event type ‘ADT’, which stands for Admit, Discharge, Transfer.
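To make the idea of reading the event type concrete, here is a minimal Python sketch that pulls the message type out of the MSH (message header) segment of an HL7 v2 message. The sample message and the function name `message_type` are made up for illustration, and the code assumes the conventional layout where the message type sits in field MSH-9. It is not a full parser.

```python
from typing import Tuple

# Hypothetical sample message (not real patient data).
# HL7 v2 segments are separated by carriage returns.
SAMPLE = "\r".join([
    "MSH|^~\\&|SENDAPP|SENDFAC|RECAPP|RECFAC|20220525120000||ADT^A01|MSG00001|P|2.3",
    "EVN|A01|20220525120000",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F",
    "PV1|1|I|WARD1^101^A",
])


def message_type(message: str) -> Tuple[str, str]:
    """Return (message code, trigger event) read from MSH-9."""
    msh = message.split("\r")[0]
    if not msh.startswith("MSH"):
        raise ValueError("message does not start with an MSH segment")
    field_sep = msh[3]              # MSH-1 is the field separator, usually '|'
    fields = msh.split(field_sep)
    component_sep = fields[1][0]    # first encoding character in MSH-2, usually '^'
    # After splitting, MSH-2 lands at index 1, so MSH-9 sits at index 8.
    parts = fields[8].split(component_sep)
    code = parts[0]
    event = parts[1] if len(parts) > 1 else ""
    return code, event


print(message_type(SAMPLE))  # ('ADT', 'A01')
```

Here `ADT` is the message code and `A01` is the trigger event, which in HL7 v2 denotes an admit.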

Analytics Potential and Challenges with HL7 Data

For decades, HL7 has allowed systems of record across vendors and technologies to exchange data. Because messages are transmitted in real time, HL7 data can also bridge gaps left by a vendor’s own reporting solution, providing more flexibility around when and how data is made available and consumed.

As described above, these messages carry extensive and valuable information from the clinical setting. That information can be used for research and analysis, or joined with other datasets.

The problem is that an HL7 message is “semi-structured” data. Each segment and field is marked by specific delimiters so that the receiving system can read it properly, but the message does not map easily onto the rows and columns of a data warehouse table.

Traditionally, working with HL7 has involved ETL tools, which are costly to license and time-consuming to code and support. Unfortunately, this approach keeps HL7 data out of reach and out of mind for many analytics initiatives.

HL7 Data and Snowflake

Fortunately, solutions are emerging on the Snowflake Data Cloud, a powerful and extensible platform that can help make this important data broadly accessible.

Snowflake provides native streaming capabilities, independently scalable storage and compute, and, most importantly here, support for User-Defined Table Functions within the Snowpark product. These features allow software engineers from a variety of backgrounds to use Java, Scala, or Python to deliver purpose-built data integrations and transformations, all within Snowflake.

This provides a way to rapidly take raw HL7 and apply open-source, third-party libraries to it. A number of these libraries exist across languages, and they can parse HL7 from its native, segmented form and represent the data in JSON format.

Unlike many other databases, Snowflake provides a VARIANT data type to store and query JSON data. Once the parsed HL7 is stored as JSON in a VARIANT column, it can be flattened and exposed through structured views and tables with ease.
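As a rough illustration of the parse-to-JSON step, the sketch below follows the general shape of a Python table-function handler: a class whose `process` method yields one output row per HL7 segment. The class name `Hl7ToJson`, the `SEG.n` key naming, and the hand-rolled splitting are all assumptions made for this example. A real solution would more likely rely on one of the open-source HL7 libraries mentioned above, and registering the class with Snowflake is left out here.

```python
import json


class Hl7ToJson:
    """Handler-style class: process() yields (segment_id, json_text) rows."""

    def process(self, message: str):
        # Accept either carriage returns or newlines between segments.
        segments = [s for s in message.replace("\n", "\r").split("\r") if s.strip()]
        if not segments or not segments[0].startswith("MSH"):
            return  # not an HL7 v2 message; emit no rows
        field_sep = segments[0][3]  # MSH-1, usually '|'
        for segment in segments:
            fields = segment.split(field_sep)
            seg_id, values = fields[0], fields[1:]
            if seg_id == "MSH":
                # MSH-1 is the separator itself, so re-insert it to keep
                # the remaining field numbers aligned with the standard.
                values = [field_sep] + values
            # Keep only populated fields, keyed as "PID.5", "MSH.9", ...
            doc = {f"{seg_id}.{i}": v for i, v in enumerate(values, start=1) if v}
            yield (seg_id, json.dumps(doc))


# Usage with a made-up two-segment message.
sample = "MSH|^~\\&|A|B|C|D|20220525||ADT^A01|1|P|2.3\rPID|1||12345||DOE^JANE"
for seg_id, doc in Hl7ToJson().process(sample):
    print(seg_id, doc)
```

Emitting one row per segment keeps the function simple. Components within a field (separated by `^`) are left unsplit in this sketch.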
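The flattening idea can be sketched outside the database as well. The Python snippet below walks a nested JSON document and produces one flat record keyed by path, which is the shape a structured view would expose as columns. Within Snowflake this reshaping would normally be done in SQL over the VARIANT column. The JSON layout and field names are invented for illustration.

```python
import json


def flatten(value, prefix=""):
    """Yield (path, scalar) pairs for every leaf in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Hypothetical parsed message; the structure is an assumption.
doc = json.loads(
    '{"PID": {"id": "12345", "name": {"family": "DOE", "given": "JANE"}},'
    ' "PV1": {"class": "I"}}'
)
row = dict(flatten(doc))
print(row)
# {'PID.id': '12345', 'PID.name.family': 'DOE', 'PID.name.given': 'JANE', 'PV1.class': 'I'}
```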


Sample HL7 data solution in a Snowflake environment

Parsing and flattening HL7 data in Snowflake opens up numerous possibilities, including:

  • HL7 messages could be at the center of a new analytics project that is not well served by the data accessible from a vended system alone
  • The data could be further mapped into FHIR, a more modern, API-based standard format also administered by HL7
  • The data could be placed into an OMOP Common Data Model along with other clinical data sources

Allowing HL7 data to finally co-exist with the rest of the enterprise data sources in a format analysts and reporting teams are familiar with can both save time and pay off immediately.


Delivering healthcare-specific business value has never been easier with new features and capabilities supported in the Snowflake Data Cloud. Snowpark and User-Defined Table Functions bring software engineering and open source libraries directly to the data, allowing for agile development, continuous integration and delivery, unit and integration testing, and data quality validation.

phData’s Data Engineering team has extensive experience designing, implementing, and supporting data products for healthcare customers. If you’re interested in a demo of phData’s HL7 solution on Snowflake, please contact us!
