A leading healthcare technology company that predicts customer revenue cycles was searching for a solution to optimize and enhance its existing data transformation processes.
Leveraging Snowpark, phData helped the client cut its daily data run time from 20 hours to just 13 minutes, while also improving pipeline performance and reducing unnecessary costs.
A prominent healthcare technology platform company was in search of a solution to drastically reduce the run time of its daily data transformations, which took over 20 hours.
The client’s technology platform is entrusted to take raw data and transform it to predict trends in the varying payment cycles of healthcare customers. This service gives healthcare organizations more clarity into their revenue cycles.
As an existing user of the Snowflake Data Cloud, the client already stored its data in Snowflake; however, it needed additional support in the realm of data transformation and machine learning.
The client’s team of data specialists performed daily data transformations that supported its revenue cycle predictions. These daily runs took in excess of 20 hours, leaving very little flexibility to accommodate data errors or failed runs: if a daily job failed after 20 hours, it was impossible to recover and generate output in time for the next business day.
As a part of the current process, the client copied their data out of Snowflake and into a separate system for transformation. An environment external to Snowflake had historically been a requirement to accommodate the open-source libraries used to generate predictions with machine learning. The transformed data – often just one or two columns – were then loaded back into Snowflake.
The client wished to optimize this process by bringing the code to the data within Snowflake, so transformations could take place without the costly and time-consuming data transfers. This is exactly the kind of problem Snowpark by Snowflake is designed to solve.
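Conceptually, "bringing the code to the data" means packaging the transformation logic as an ordinary Python function and registering it with Snowflake as a UDF, so it runs next to the tables instead of on exported copies. The sketch below is illustrative only: the function, column names, and scoring logic are placeholders, not the client's actual model.

```python
# Illustrative sketch: prediction logic as a plain Python function
# that Snowpark could register and execute inside Snowflake.
# The scoring formula and column names are hypothetical.

def score_payment_cycle(days_outstanding: int, balance: float) -> float:
    """Toy stand-in for a payment-risk score computed from two
    hypothetical claim attributes."""
    if days_outstanding <= 0:
        return 0.0
    return min(1.0, (days_outstanding / 90.0) * (balance / (balance + 1000.0)))

# With an active Snowpark session, the same function can be registered
# as a UDF and applied where the data lives (placeholder names):
#
#   score_udf = session.udf.register(score_payment_cycle, name="SCORE_UDF")
#   df = session.table("CLAIMS").with_column(
#       "RISK", score_udf("DAYS_OUTSTANDING", "BALANCE"))
```

Because the function is pure Python, the same logic can be unit-tested locally before it is ever registered in Snowflake.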
The client knew that phData had worked extensively with Snowpark since its inception and had a proven track record of helping businesses of all sizes better collaborate on a single copy of their data.
phData proposed adding Snowpark to the client’s data pipeline as it integrates seamlessly with Snowflake and removes the lengthy and unnecessary process of transferring data between systems for transformation. The implementation of Snowpark would also support and enhance the client’s predictive models that leverage open-source machine learning tools.
As always, phData would also look for process and software optimizations along the way to help increase the client’s pipeline performance and reduce unnecessary costs and time.
phData began by migrating the client’s existing Flask data model to Pandas Series, which move easily into Snowpark. JSON was then used to allow for flexible data output without resorting to user-defined table functions (UDTFs). The expertise of phData’s data engineers kept the resulting code efficient and succinct.
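One way to read the JSON approach (a sketch under our own assumptions, not the client's code): rather than registering a UDTF to return multiple output columns, a scalar UDF can serialize several related outputs into a single JSON string, which Snowflake can store as a VARIANT and query field by field. The function name, "model," and field names below are all hypothetical.

```python
import json

def predict_with_details(amount: float) -> str:
    """Hypothetical scalar UDF body: returns several related outputs
    as one JSON document instead of multiple columns via a UDTF."""
    prediction = round(amount * 0.5, 2)  # toy stand-in for a model
    return json.dumps({
        "prediction": prediction,
        "confidence": 0.9,               # placeholder value
        "model_version": "v1",
    })

# Inside Snowflake, the returned string can be parsed to VARIANT and
# individual fields selected, e.g. result:prediction, result:confidence.
```

This keeps the UDF's signature to a single return column while preserving a flexible, multi-valued output.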
Utilizing a combination of Snowpark and a Python package called cachetools, phData reduced the duration of the client’s daily data run from 20 hours to just 13 minutes.
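The value of caching here is that expensive one-time work, such as deserializing a model file, happens once per UDF process instead of once per row. cachetools provides this memoization; since the client's actual code isn't shown, the sketch below illustrates the same pattern with the standard library's functools.lru_cache and a counter to prove the expensive step runs only once.

```python
import functools

# Counter used only to demonstrate that the expensive step runs once.
load_calls = {"count": 0}

@functools.lru_cache(maxsize=None)
def load_model() -> dict:
    """Stand-in for expensive one-time work, e.g. unpickling a model
    file from a Snowflake stage. With caching, this executes once per
    process instead of once per row."""
    load_calls["count"] += 1
    return {"coef": 0.5}  # toy model

def predict(amount: float) -> float:
    model = load_model()  # cache hit after the first call
    return round(amount * model["coef"], 2)

results = [predict(x) for x in (100.0, 200.0, 300.0)]
```

Applied across millions of rows, eliminating the repeated load is exactly the kind of change that collapses a multi-hour run into minutes.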
Additionally, phData performed an optimization audit of the client’s overall Snowflake credit usage and minimum credit requirements, which allowed for a reduction in the client’s ongoing expenses.
With Snowpark, the client can now perform data transformations 200% faster and at one-twentieth of the previous cost.
Looking into better data options for your organization? Learn how phData can help solve your most challenging problems.