March 1, 2024

How to Rapidly Find Patient Cohorts for Clinical Research with Sigma Computing & Snowflake

By Mitch Lee

Creating and analyzing patient cohorts for modern clinical research is more challenging than ever. With growing competition for patients and higher rates of non-adherence, clinical trials often take longer than they used to, delaying the delivery of important treatments to patients.

Academic institutions, clinical research organizations (CROs), drug developers, and others doing clinical research need tools to find and evaluate patient groups faster. This is especially important as more healthcare data is available than ever.

In this blog, we will show you how phData, Sigma Computing, and the Snowflake Data Cloud have partnered to create a Clinical Cohort Creation Accelerator. It combines Snowflake’s scalable computing power with Sigma’s user-friendly data modeling and visualization capabilities to quickly identify and evaluate patient cohorts based on sophisticated inclusion criteria and geospatial trial requirements.

What is the Clinical Cohort Creation Accelerator?

With our accelerator, clinical analysts and coordinators can use a simple interface to quickly filter through millions of patients based on their demographic and clinical data to find those that match specific inclusion criteria for a clinical trial.

Users can also explore where best to host a trial by selecting possible locations, matching potential participants to those locations by distance, and exploring the risk of non-adherence and loss to follow-up based on that distance and other predictors.

The final output is a list of locations and participants that, combined, best suit a clinical trial’s requirements.

Best of all, everything happens in one place and takes minutes or less.

Key Benefits of the Accelerator

  1. Speed to Insight: Snowflake’s massive computing power compares billions of records from millions of patients against sophisticated inclusion and geospatial requirements to identify trial candidates in seconds.
  2. No Data Movement: The data never leaves Snowflake and is only directly queried by Sigma, improving performance and minimizing data security risks.
  3. No Code: Thanks to Sigma’s intuitive point-and-click interface, users don’t have to write code to set inclusion criteria.
  4. Customizable: The solution can be tailored to an institution’s particular clinical research needs.
  5. Access to Machine Learning: Users can access machine learning models (and other custom functions) written in Python (and other languages) that are trained, saved, and applied on Snowflake’s Snowpark – all directly through the tool in Sigma.
  6. Direct User Data Input: Users can add comments, notes, and other information directly through the solution in Sigma that is saved with the data on Snowflake for later reference.
  7. Geospatial Analysis & Mapping: Users can visualize the distribution of potential candidates relative to trial locations, and can evaluate other geospatial considerations.
  8. Full Visualization Capabilities: Our accelerator can include a wide range of visualizations that allow users to segment and analyze created cohorts and find relevant patterns in patient traits.

Why Sigma Computing & Snowflake?

Snowflake’s speed and scalability make it ideal for storing and processing the enormous volumes of data that exist in the healthcare and clinical research space.

Its rich set of text processing and user-defined functions also enables it to handle the unstructured text fields that pervade clinical data, while its native machine learning capabilities through Snowpark allow for easy application of models to categorize patients and predict crucial trial outcomes like non-adherence and loss to follow-up.

Sigma Computing makes organizing, displaying, and understanding clinical information easy with its intuitive interface, clear tables & visualizations, and convenient suite of data exploration options – all operating at Snowflake speed.

Sigma’s ability to seamlessly call many of Snowflake’s powerful capabilities, like Snowpark ML models, also makes it an ideal platform to empower users in their search for cohorts.

Who Would Benefit Most from this Accelerator?

This accelerator is ideal for organizations that struggle to keep pace with their patient cohort creation demands, especially when their clinical analysts and coordinators spend too much time:

  • Switching environments to:
    • Search for patients
    • Apply predictive modeling
    • Conduct geospatial analysis
    • Visualize and analyze their cohorts
  • Writing code to query for qualifying patients
  • Waiting for data returns

With this accelerator, all of these steps can be completed seamlessly in one place in seconds.

How the Clinical Cohort Creation Accelerator Works

Before we go into how this solution works, it’s important to cover how the data is structured. 

The data for this accelerator is structured as a flattened table, stored on Snowflake, that can involve billions of rows and contains demographic and clinical data from millions of patients. Conveniently, this data source can be created by structuring and materializing a Sigma dataset using Sigma’s easy point-and-click interface. 

A great benefit of this approach is that the developer working in Sigma to create the dataset does not need Snowflake credentials. That said, the creation of the flattened table could be pushed upstream of Sigma into Snowflake (with the opportunity to employ data modeling software like dbt).
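
To make the idea of a flattened table concrete, here is a minimal sketch in plain Python. It assumes one output row per clinical event, with the patient’s demographics repeated on each row. The field names (`patient_id`, `birth_year`, `event_type`, `description`) and the sample records are hypothetical; the accelerator’s actual schema is not described in this post, and in practice the join would run in Snowflake rather than in local Python.

```python
# Hypothetical source records; the real schema is not described in this post.
demographics = [
    {"patient_id": 1, "birth_year": 1975, "gender": "Female"},
    {"patient_id": 2, "birth_year": 1968, "gender": "Male"},
]
clinical_events = [
    {"patient_id": 1, "event_type": "procedure", "description": "coronary artery stent"},
    {"patient_id": 1, "event_type": "medication", "description": "simvastatin"},
    {"patient_id": 2, "event_type": "condition", "description": "stroke"},
]


def flatten(demographics, clinical_events):
    """Join each clinical event to its patient's demographics (one row per event)."""
    by_id = {d["patient_id"]: d for d in demographics}
    rows = []
    for event in clinical_events:
        demo = by_id.get(event["patient_id"])
        if demo is None:
            continue  # skip events with no matching patient record
        rows.append({**demo, **event})
    return rows


flat_rows = flatten(demographics, clinical_events)
for row in flat_rows:
    print(row)
```

A wide, denormalized table like this trades storage for simplicity: every filter in the tool can be expressed against a single table without further joins.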

This accelerator works through the following steps:

  1. The user selects patient inclusion criteria and potential trial locations.
  2. The accelerator then:
    • Identifies patients who match the criteria.
    • Finds the closest potential trial location for each of those patients.
    • Applies a Python machine learning model – trained and saved on Snowflake through Snowpark – to predict each patient’s risk of dropout based on their distance and other traits.*
    • Presents a list of potential participants to the user based on the above steps.
  3. The user confirms which of the potential participants returned in step 2 qualify for the trial.
    • The user can review the risk of dropout and detailed clinical records for potential participants at this step to help confirm their qualification for the trial.
    • The user can also add notes and comments on patients directly through the tool. That information is saved permanently with the data.
  4. The user finalizes the cohort by selecting which potential locations, and their corresponding closest patients, to use for the trial.
    • The user has the opportunity here to examine the geographic distribution of patients around potential locations to adjust selections before finalizing the cohort.
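
The location-matching and risk-scoring parts of step 2 can be sketched as follows. This is an illustration under stated assumptions, not the accelerator’s implementation: it uses the haversine great-circle formula for distance (the post does not say how distance is computed), and `placeholder_dropout_risk` is a stand-in logistic function with made-up coefficients in place of the trained Snowpark model. The sample coordinates and field names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_location(patient, locations):
    """Return (location name, distance in km) for the closest candidate site."""
    distances = [
        (loc["name"], haversine_km(patient["lat"], patient["lon"], loc["lat"], loc["lon"]))
        for loc in locations
    ]
    return min(distances, key=lambda pair: pair[1])


def placeholder_dropout_risk(distance_km, age):
    """Stand-in for the trained model: a logistic curve with made-up coefficients."""
    z = -2.0 + 0.02 * distance_km + 0.01 * (age - 50)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical candidate sites and already-filtered patients.
locations = [
    {"name": "Minneapolis", "lat": 44.98, "lon": -93.27},
    {"name": "Chicago", "lat": 41.88, "lon": -87.63},
]
patients = [
    {"patient_id": 1, "age": 52, "lat": 44.95, "lon": -93.09},
    {"patient_id": 2, "age": 47, "lat": 43.04, "lon": -87.91},
]

roster = []
for patient in patients:
    name, km = nearest_location(patient, locations)
    roster.append({
        "patient_id": patient["patient_id"],
        "location": name,
        "distance_km": km,
        "dropout_risk": placeholder_dropout_risk(km, patient["age"]),
    })

for entry in roster:
    print(entry)
```

Comparing every patient against every site is fine for a handful of candidate locations; at the scale described here, the same comparison would be pushed down to the warehouse.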

The Final Product

The final output of the accelerator is a roster of validated participants and locations, along with visualizations that empower users to segment and analyze traits of the cohort. Users can then materialize that final roster back to Snowflake as a queryable table or export it locally.

Accelerator in Action

Below is a demonstration of how to create a cohort for a (pretend) clinical trial of a new drug to prevent coronary artery stent failure in women.

Step 1: Set the Following Patient Inclusion Criteria:

  • Age: 40 – 60
  • Gender: Female
  • Race: All
  • Ethnicity: All
  • Currently has a coronary artery stent (for any amount of time)
  • Never had a stroke
  • Currently taking Simvastatin (for any amount of time)
  • Not allergic to lisinopril
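
In the accelerator these criteria are set by pointing and clicking, with no code. For readers who want to see the logic spelled out, the criteria above amount to a filter like the following sketch. The `Patient` fields and sample patients are hypothetical, and the sketch assumes the age bounds are inclusive; race and ethnicity are set to "All", so they are not filtered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical per-patient summary; not the accelerator's actual schema."""
    patient_id: int
    age: int
    gender: str
    has_coronary_stent: bool
    ever_had_stroke: bool
    current_medications: frozenset
    allergies: frozenset


def meets_inclusion_criteria(p):
    """Apply the Step 1 criteria; age bounds are assumed to be inclusive."""
    return (
        40 <= p.age <= 60
        and p.gender == "Female"
        and p.has_coronary_stent
        and not p.ever_had_stroke
        and "simvastatin" in p.current_medications
        and "lisinopril" not in p.allergies
    )


candidates = [
    Patient(1, 52, "Female", True, False, frozenset({"simvastatin"}), frozenset()),
    # Excluded: allergic to lisinopril.
    Patient(2, 45, "Female", True, False, frozenset({"simvastatin"}), frozenset({"lisinopril"})),
    # Excluded: outside the age range.
    Patient(3, 63, "Female", True, False, frozenset({"simvastatin"}), frozenset()),
    # Excluded: prior stroke.
    Patient(4, 50, "Female", True, True, frozenset({"simvastatin"}), frozenset()),
]

eligible_ids = [p.patient_id for p in candidates if meets_inclusion_criteria(p)]
print(eligible_ids)
```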

Step 2: Select Potential Trial Locations

Step 3: Confirm Participant Eligibility

Step 4: Finalize Locations & Cohort

Step 5: Explore and Go


This accelerator makes finding cohorts for clinical research a rapid, one-stop process. With it, clinical analysts and coordinators will spend less time searching for who they need and more time getting trials up and running to develop new treatments and deliver them to the patients who need them.

Interested in expediting your clinical research journey with our accelerator?

We’ve got you covered. Contact us today for a demo.
