Python for Snowpark Overview

At phData, we have worked quite a bit with Snowpark since it became available in the Snowflake Data Cloud. Historically, Snowpark has been based on Scala and Java, and those languages had to be used when building data transformations or data applications. At the 2022 Snowflake Summit, Snowflake announced that Python is now supported […]

Snowpark Performance Best Practices

Snowpark is a powerful programming abstraction that runs data pipelines and applications inside the Snowflake Data Cloud without moving data to another product. Snowpark makes it easy to convert code into SQL commands that are executed on Snowflake, which makes creating data transformations and applications easier. Because Snowflake is a SaaS […]
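To make the "code becomes SQL" idea concrete, here is a toy illustration of lazy query building. This is not the real Snowpark API (which lives in the `snowflake.snowpark` package); the class and table name below are made up purely to show how chained DataFrame-style calls can be recorded and compiled into a single SQL statement that runs on the warehouse.

```python
# Toy sketch (NOT the Snowpark API): DataFrame-style calls are recorded
# lazily, then compiled into one SQL statement executed on Snowflake.
class LazyFrame:
    def __init__(self, table, filters=None, columns=None):
        self.table = table
        self.filters = filters or []
        self.columns = columns

    def filter(self, condition):
        # Record the predicate; nothing is executed yet.
        return LazyFrame(self.table, self.filters + [condition], self.columns)

    def select(self, *cols):
        # Record the projection; still no execution.
        return LazyFrame(self.table, self.filters, list(cols))

    def to_sql(self):
        # Compile every recorded step into a single SQL statement.
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

df = LazyFrame("ORDERS").filter("AMOUNT > 100").select("ORDER_ID", "AMOUNT")
print(df.to_sql())  # SELECT ORDER_ID, AMOUNT FROM ORDERS WHERE AMOUNT > 100
```

Because each call only records intent, the data never leaves Snowflake: the one generated statement is what actually executes there.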

What is Snowpark — and Why Does it Matter? A phData Perspective


This blog was originally written by Keith Smith and updated for 2022 by Nick Goble. In 2021, the overall amount of data generated in the world was estimated to be around 79 zettabytes. To understand how large that is, consider that we usually talk about data in terabytes. A zettabyte is 1TB * […]
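The scale comparison can be checked with quick arithmetic. Assuming decimal (SI) units, where 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes:

```python
# Back-of-the-envelope check using decimal (SI) units.
terabyte = 10**12   # bytes
zettabyte = 10**21  # bytes

tb_per_zb = zettabyte // terabyte
print(tb_per_zb)       # 1000000000 -> one zettabyte is a billion terabytes
print(79 * tb_per_zb)  # the 2021 estimate, expressed in terabytes
```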

How to Implement Association Rule Mining in Snowpark

Have you ever wondered how recommendations pop up as you click on products while shopping online? Think about that “frequently bought together” window you see when you go to check out. If so, you’re in luck. In this article, we introduce the algorithm commonly used by large e-commerce companies like Amazon to uncover associations between products—known […]
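At the heart of association rule mining are two measures, support and confidence, which a "frequently bought together" rule must exceed. A minimal sketch on made-up basket data (real systems use scalable algorithms such as Apriori or FP-Growth):

```python
# Toy association-rule measures on hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    # Fraction of baskets that contain every item in the itemset.
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    # Estimated P(consequent | antecedent) over the baskets.
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))       # 0.6
print(confidence({"bread"}, {"butter"}))  # 0.75
```

A rule like {bread} → {butter} is surfaced when both numbers clear chosen thresholds; here, 75% of baskets containing bread also contain butter.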

Demand Forecasting Models In Snowpark

Welcome back to our blog series on Snowpark, the latest product from the Snowflake Data Cloud. In this post, we highlight demand forecasting with Snowpark by applying the most popular time series forecasting model (ARIMA), implemented in Java. We will then use Snowpark to forecast, in Snowflake, the future demand […]
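The core idea of an autoregressive forecast can be shown in a few lines. This is a deliberately simplified stand-in for ARIMA (an AR(1) model with no differencing or moving-average terms, fit by least squares) on made-up demand numbers; the post itself uses a full ARIMA implementation:

```python
# Simplified AR(1) sketch: fit y_t ~ phi * y_(t-1) by least squares,
# then forecast one step ahead. (ARIMA adds differencing and MA terms.)
demand = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]

pairs = list(zip(demand[:-1], demand[1:]))  # (y_(t-1), y_t) lag pairs
phi = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

next_forecast = phi * demand[-1]  # one-step-ahead demand forecast
print(round(phi, 3))
print(round(next_forecast, 1))
```

With a fitted coefficient slightly above 1, the model extrapolates the gentle upward drift in the series.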

Snowflake ML: How to do Document Classification with Snowpark


Join us on this technical walkthrough as we determine the practicality of the Snowflake Data Cloud and Snowpark for machine learning use cases. Document Vectors: With the success of word embeddings, it’s understood that entire documents can be represented in a similar way. In this case study, we will build a vector that represents a document […]
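One simple way to build a document vector from word embeddings is to average the vectors of the document's words. The tiny 3-dimensional vectors below are made up for illustration; in practice they would come from a trained embedding model:

```python
# Minimal document-vector sketch: a document is represented by the
# mean of its word vectors. Vectors here are hypothetical 3-d toys.
word_vectors = {
    "data":  [0.9, 0.1, 0.0],
    "cloud": [0.8, 0.2, 0.1],
    "snow":  [0.1, 0.9, 0.3],
}

def doc_vector(tokens):
    # Keep only tokens with a known embedding, then average per dimension.
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(3)]

print(doc_vector(["data", "cloud"]))
```

Documents with similar wording end up near each other in this space, which is what makes the vector useful for downstream classification.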

A Spark Developer’s Guide to Snowpark

As a Spark developer who uses the Snowflake Data Cloud, you’ve undoubtedly heard all the buzz around Snowpark. Having the ability to execute arbitrary Scala code in your Snowflake compute environment could be game-changing! You begin to wonder how this works in a practical way, and ask yourself the following questions: What is the architecture […]

Machine Learning on Snowflake: Clustering Data with Snowpark

Next up in our blog series on Snowpark, we’ll discuss machine learning basics and K-Means clustering in Snowpark with an example. What is Machine Learning? Machine learning (ML) grew out of the study of pattern recognition and computational learning theory in artificial intelligence. ML uses algorithms that can learn from and make predictions on […]
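K-Means itself alternates just two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal one-dimensional sketch with k = 2 and made-up data (production code would use a library implementation such as scikit-learn's):

```python
# Tiny 1-d K-Means (k = 2) showing the assign/update loop.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids = [1.0, 10.0]  # simple fixed initialization for determinism

for _ in range(10):
    # Assignment step: each point goes to its nearest centroid.
    clusters = {0: [], 1: []}
    for p in points:
        nearest = min((0, 1), key=lambda c: abs(p - centroids[c]))
        clusters[nearest].append(p)
    # Update step: each centroid moves to the mean of its cluster.
    centroids = [sum(clusters[c]) / len(clusters[c]) for c in (0, 1)]

print(centroids)  # [1.5, 10.5]
```

The two centroids settle at the centers of the two obvious groups after a single pass and stay there, which is the algorithm's convergence condition.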

Executing Machine Learning Models In Snowpark

Welcome back to our blog series on Snowpark, the latest product from the Snowflake Data Cloud. In this post, we highlight machine learning with Snowpark by applying the XGBoost algorithm to a dataset using scikit-learn (sklearn) in Python and exporting the model to an open format called PMML, the […]
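The intuition behind XGBoost is gradient boosting: repeatedly fit a weak learner to the current residuals and add a damped version of its predictions to the ensemble. The toy below sketches that loop with one-split "stumps" on made-up 1-d data; it is not XGBoost itself, which adds regularization, tree learning, and much more:

```python
# Toy residual boosting with one-split regression stumps.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]

def fit_stump(xs, residuals):
    # Try each midpoint split; predict the mean residual on each side.
    best = None
    for i in range(1, len(xs)):
        t = (xs[i - 1] + xs[i]) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

pred = [0.0] * len(xs)
for _ in range(20):
    residuals = [y - p for y, p in zip(ys, pred)]   # what is still unexplained
    stump = fit_stump(xs, residuals)                # weak learner on residuals
    pred = [p + 0.5 * stump(x) for p, x in zip(pred, xs)]  # learning rate 0.5

mse = sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)
print(round(mse, 4))
```

Each round shrinks the residuals, so the ensemble's error drops steadily; the learning rate trades convergence speed for robustness, just as in the real library.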

Complete Installation Guide of Snowpark on Linux

In this technical blog post, we’ll walk you through how to install Snowpark on Linux, specifically on Ubuntu 20.04 x86-64, using IntelliJ version 2021.1.1. It should be noted that this installation process is similar on other operating systems. In particular, once IntelliJ is installed, the process should be identical to […]