May 24, 2024

What is Snowflake Cortex?

By Justin Delisi

Every day, businesses around the world collect more data. Through machine learning and artificial intelligence, we’ve seen how organizations can harness this information to make informed decisions and grow their businesses. However, using these technologies efficiently and effectively can be challenging. That’s why Snowflake created Cortex, an AI service built directly into the Snowflake Data Cloud that’s easy to use and understand.

In this blog, we’ll explain Cortex, how its features can be used with simple SQL, and how it can help you make better business decisions.

What is Snowflake Cortex?

Snowflake Cortex is an intelligent, fully-managed service within Snowflake that lets businesses leverage the power of machine learning (ML) and artificial intelligence (AI) directly on their data with minimal ML or AI knowledge. 

Cortex offers pre-built ML functions for tasks like forecasting and anomaly detection and access to industry-leading large language models (LLMs) for working with unstructured text data. 

These functionalities are integrated directly within Snowflake and accessible through the SQL environment, making it extremely easy to gain insights and automate tasks without specialized programming expertise. 

LLM Functions

COMPLETE

COMPLETE takes in a prompt from the user (along with the name of the model that should process it) and returns the model’s response, much like any other LLM chat interface. The difference is that it all happens within SQL. Here is a simple example using the snowflake-arctic model:
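
-- Illustrative prompt; any prompt and any supported model name can be substituted
SELECT
    SNOWFLAKE.CORTEX.COMPLETE(
        'snowflake-arctic',
        'Explain what a data warehouse is in one sentence.');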

EXTRACT_ANSWER

EXTRACT_ANSWER answers a question based on a text document, which can be plain English text or a string representation of semi-structured (JSON) data.

SELECT
    SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
        blog_post,
        'Why does the author think Mace Windu is the best Star Wars character?')
FROM blogs LIMIT 5;

EMBED_TEXT_768

EMBED_TEXT_768 takes unstructured text and creates a 768-dimension vector embedding from it. These embeddings can then be compared to one another to measure how similar two pieces of text are, which powers applications such as semantic search and recommendations. Again, a simple SQL command is used to create the vector:
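
-- 'snowflake-arctic-embed-m' is one of the supported embedding models; check the documentation for the current list
SELECT
    SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', blog_post)
FROM blogs LIMIT 5;

The resulting vectors can then be compared, for example with Snowflake’s VECTOR_COSINE_SIMILARITY function, to find the most semantically similar documents.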

SUMMARIZE

Another command that feels similar to chatting with other LLM chatbots is SUMMARIZE, which returns a summary of the given content.

SELECT
    SNOWFLAKE.CORTEX.SUMMARIZE(blog_post)
FROM blogs LIMIT 10;

SENTIMENT

SENTIMENT returns a floating-point number between -1 and 1 based on the text, giving -1 for the most negative text, around 0 for neutral, and 1 for the most positive.

SELECT
    SNOWFLAKE.CORTEX.SENTIMENT(blog_post)
FROM blogs LIMIT 10;

TRANSLATE

Lastly, TRANSLATE will translate text from one supported language to another. 
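
For example, a query along the following lines (the language codes are just an illustration) would translate each blog post from English to French:

SELECT
    SNOWFLAKE.CORTEX.TRANSLATE(blog_post, 'en', 'fr')
FROM blogs LIMIT 10;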

ML Functions

Forecasting

Using Cortex, you can train a model with time series data and receive predictions from the model in just a few short SQL commands. In order to produce a forecast, simply:

  • Prepare the data to have:

    • At least one target column that you want the model to make a prediction on

    • A timestamp column with a fixed frequency (daily, weekly, hourly, etc.)

    • Optionally, you can include other columns (exogenous features) that may have influenced changes in the target column

    • This can be a table or a view that is passed as a reference for the model to use

  • Train the model

    • Create a forecast object in Snowflake based on your prepared data:

CREATE SNOWFLAKE.ML.FORECAST phdata_model(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'phdata_view'),
    TIMESTAMP_COLNAME => 'daily_timestamp',
    TARGET_COLNAME => 'sales'
);

    • Retraining the model can be achieved by recreating the forecast object and should be done at regular intervals to keep predictions accurate as new data arrives

  • Run a forecast

    • Then, it’s as easy as calling the model’s forecast method to receive a prediction

    • This call returns a prediction for the next two timestamp intervals from the previously trained forecast object:

CALL phdata_model!FORECAST(FORECASTING_PERIODS => 2);
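
If you want to persist the predictions, one common pattern (shown here as a sketch; the table name is hypothetical) is to capture the output of the call immediately afterward with RESULT_SCAN:

CREATE TABLE sales_forecast AS
    SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));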

Anomaly Detection

Anomaly Detection is another time series function that, like Forecast, allows you to train a model to find outliers in your data. Detecting and removing outliers can significantly improve the accuracy of any other machine learning models you train on that data.

To use Anomaly Detection, prepare the data the same way it’s prepared for a Forecast, then create an ANOMALY_DETECTION object with the same parameters as the Forecast shown above (a sketch follows the list below). You can then use the following methods with the trained model:

  • !DETECT_ANOMALIES

    • Returns the input data as a table with an extra column showing whether or not the row is an anomaly

  • !EXPLAIN_FEATURE_IMPORTANCE

    • Returns the importance of each feature relative to your target data, as a value from 0 (lowest importance) to 1 (highest)

  • !SHOW_EVALUATION_METRICS

    • Returns the cross-validation metrics generated when the model was trained, or you can call it with additional data that was not available at training time and receive metrics based on how well the model predicts that data.

  • !SHOW_TRAINING_LOGS

    • Returns all the training logs from the model
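
For reference, here is a minimal sketch of creating and calling an anomaly detection model. The object and view names are hypothetical, and LABEL_COLNAME is left empty for unsupervised detection:

CREATE SNOWFLAKE.ML.ANOMALY_DETECTION phdata_anomaly_model(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'phdata_view'),
    TIMESTAMP_COLNAME => 'daily_timestamp',
    TARGET_COLNAME => 'sales',
    LABEL_COLNAME => ''
);

CALL phdata_anomaly_model!DETECT_ANOMALIES(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'new_sales_view'),
    TIMESTAMP_COLNAME => 'daily_timestamp',
    TARGET_COLNAME => 'sales'
);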

Contribution Explorer

Don’t understand why your data is trending a certain way? Contribution Explorer will analyze your data and determine which data segments are driving trends within your target. This way, you can quickly determine what is driving an unwanted result and take immediate action to fix the problem. This can be an excellent feature, particularly when the dataset has a large number of dimensions.

Contribution Explorer can be used with a function called TOP_INSIGHTS, which takes the dimension mappings, the target metric, and a flag indicating whether each row belongs to the control or test portion of the data. It then outputs a table with each contributor and a relative change indicator, which shows how much the contributor positively or negatively affected the target metric.
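
As a rough sketch of how this might be invoked (the table and column names are hypothetical, and the exact signature should be confirmed against the Snowflake documentation), the dimensions are passed as objects alongside the metric and the test/control flag:

WITH input AS (
    SELECT
        OBJECT_CONSTRUCT('region', region, 'product_line', product_line) AS categorical_dimensions,
        OBJECT_CONSTRUCT() AS continuous_dimensions,
        daily_sales::FLOAT AS metric,
        is_test_period AS label
    FROM sales_data
)
SELECT res.*
FROM input, TABLE(
    SNOWFLAKE.ML.TOP_INSIGHTS(
        input.categorical_dimensions,
        input.continuous_dimensions,
        input.metric,
        input.label
    ) OVER (PARTITION BY 0)
) AS res;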

Features Powered by Cortex

Document AI

Snowflake created Document AI to take in documents (PDFs, Word docs, even handwritten scanned documents) and answer questions about them. Users can now extract key information buried within large documents without any code or ML knowledge required. Simply upload your documents, ask a question, and get the answer!

Universal Search

Using AI, Snowflake has created Universal Search, which takes natural language input from the user and interprets it to return results not only for the objects in your account but also from Snowflake Marketplace, Snowflake documentation, and Knowledge Base articles.

Snowflake Copilot

Within Cortex, Copilot is an LLM-powered assistant that works alongside your data analysts to help analyze data and build SQL queries. Input into Copilot is always natural language, and it can be asked to analyze structured and unstructured data, build SQL queries, or refine and optimize queries written by humans.

Closing

As you can see, Cortex makes AI accessible to more businesses and users within your business. By simplifying data analysis, automating tasks, and fostering deeper insights, Cortex equips you to confidently make data-driven decisions and propel your business forward in the age of AI.

Unlock the Power of Cortex with phData

If you’re interested in maximizing the impact of your data with Cortex, phData can help! From implementation to mentorship, our experts can help accelerate your success story with Cortex.
