Case Study

Pharmacy Benefits Manager Solves Claims Challenge with Spark

The Customer’s Challenge

After three failed attempts, a Pharmacy Benefits Manager serving 27+ million members needed help getting a mission-critical Operational Data Store (ODS) into production. Even with 12+ months and millions of USD sunk in the project, the Spark-based ODS was still too slow, unstable, and prone to data reliability issues to be released to end-users.

phData’s Solution

After diagnosing a number of critical issues, phData reimplemented the data pipeline and Spark jobs in line with platform-specific best practices. The ODS is now ready to serve tens of terabytes of data ingested, processed, and served using Spark — with 1.5 million new and updated claims per day — in a sustainable, reliable, and fully compliant way.

The Full Story

For Pharmacy Benefits Managers (PBMs), centralizing pharmacy claims information is what empowers them to identify fraud, waste, and other potential savings for both pharmacies and insurers. In other words, their data is their value.

So, when one PBM (serving 27+ million members throughout the U.S.) found themselves struggling to get their operational data store (ODS) into production — key to delivering critical claims data to their insurance clients — they realized they needed a solution, and needed it fast.

Diagnosing the problem

The PBM in question had contracted a large global outsourcing firm to stand up their ODS in Cloudera. But a year and a half into the project, and after three failed attempts to get the ODS into production, almost nothing was working.

The data coming in from the source systems was unreliable, with frequent duplicates and missing data, and Spark processes were extremely slow. The PBM still couldn’t deliver the platform to their customers.
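To make these two symptoms concrete, here is a minimal sketch in plain Python of how duplicate and missing records in a change feed might be detected and handled. It is not the PBM's or phData's actual code. The field names (`claim_id`, `updated_at`, `status`) and both helper functions are hypothetical, chosen only for illustration.

```python
def latest_per_claim(events):
    """Collapse a change feed to one record per claim_id, keeping the newest version."""
    latest = {}
    for event in events:
        key = event["claim_id"]  # hypothetical business key
        current = latest.get(key)
        # Keep the event with the highest updated_at; on a tie, keep the first one seen.
        if current is None or event["updated_at"] > current["updated_at"]:
            latest[key] = event
    return latest


def find_missing(events, expected_ids):
    """Return the claim IDs that were expected from the source but never arrived."""
    seen = {event["claim_id"] for event in events}
    return sorted(set(expected_ids) - seen)


# Hypothetical sample feed: C1 arrives twice unchanged, then once updated.
events = [
    {"claim_id": "C1", "updated_at": 1, "status": "submitted"},
    {"claim_id": "C1", "updated_at": 1, "status": "submitted"},  # exact duplicate
    {"claim_id": "C1", "updated_at": 2, "status": "paid"},       # later update
    {"claim_id": "C2", "updated_at": 1, "status": "submitted"},
]

claims = latest_per_claim(events)
missing = find_missing(events, ["C1", "C2", "C3"])
```

In a Spark pipeline the same idea would typically be expressed as a keyed deduplication over the incoming data rather than a Python loop. The sketch only shows the logic.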

Realizing that throwing outsourcers at the problem wasn’t the answer, they decided to bring in a small team of specialists from phData. The phData engineers and solutions architects analyzed the code developed by the PBM’s previous partner and uncovered a range of critical issues.

Altogether, these issues (many stemming from a lack of technology-specific best practices) explained why the PBM was still unable to get the system into production.

After years of taking a “go-wide” approach to software development (contracting large teams of lower-cost, less-specialized developers), the PBM needed to try something different. It was time to go deep.

The antidote: depth over breadth

phData brought in a small team of data engineering specialists with hands-on knowledge and experience using the specific technologies and processes involved in delivering an ODS on this scale — from Spark, Spark Streaming, and NiFi, to DevOps, agile, and change data capture (CDC).

Although the team was much smaller than the one provided by the PBM’s global outsourcer, they were able to do in a matter of months what the big team of generalists had been unable to do in years: get the ODS platform ready to go into production.

phData reimplemented the data pipeline and Spark jobs in line with platform-specific best practices.

The PBM’s ODS is finally ready to serve tens of terabytes of data ingested, processed, and served via Spark. By taking a “go-deep” rather than a “go-wide” approach, bringing in a small, experienced team that actually understood the technologies and design principles at hand, the PBM was able to build, deploy, and run the data pipeline that their users depend on, in a sustainable, reliable, and fully compliant way.

