July 1, 2024

Upcoming Snowflake Features

By Loc Dao

The recent Snowflake Summit 2024 brought plenty of exciting upcoming features, GA announcements, strategic partnerships, and many more opportunities for customers on the Snowflake AI Data Cloud to innovate. Announcements spanned interoperability initiatives and partnerships, advancements in AI, collaboration tools, observability, security, and more.

In this blog, we highlight some of the most important upcoming features and updates, specifically around AI and developer tools, for those who could not attend the event.

Polaris Catalog and Iceberg Tables General Availability

One of the biggest announcements leading up to Snowflake Summit 2024 was Polaris Catalog, an open-source catalog for Apache Iceberg. Apache Iceberg is one of the leading open table formats, enabling engines such as Snowflake, Spark, Trino, and Flink to query and manage massive datasets.

Polaris Catalog lets customers avoid moving and copying data for different engines and catalogs; instead, multiple engines can interoperate on a single copy of the data stored in one place.

Snowflake announced on June 3rd, 2024, that Polaris Catalog will be open-sourced within 90 days and can be hosted either on Snowflake-managed infrastructure or on other infrastructure of choice. Along with this update, Snowflake announced the General Availability of Iceberg tables during the Summit.
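To make the Iceberg tables GA a little more concrete, here is a minimal sketch, assuming a Snowflake-managed Iceberg table (Snowflake acting as the catalog) and an external volume that an administrator has already created. The helper `build_iceberg_table_ddl` and the names `my_iceberg_table` and `my_ext_vol` are hypothetical; the function only assembles a `CREATE ICEBERG TABLE` statement, which you could then run with `session.sql(ddl).collect()` on an open Snowpark `Session`.

```python
from __future__ import annotations


def build_iceberg_table_ddl(
    table: str,
    columns: dict[str, str],
    external_volume: str,
    base_location: str,
) -> str:
    """Assemble a CREATE ICEBERG TABLE statement that uses Snowflake as the catalog.

    Note: identifiers are interpolated as-is, so only pass trusted names.
    """
    # Render column definitions as "NAME TYPE" pairs, e.g. "ID INT, NAME STRING"
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE ICEBERG TABLE {table} ({column_defs})\n"
        "  CATALOG = 'SNOWFLAKE'\n"
        f"  EXTERNAL_VOLUME = '{external_volume}'\n"
        f"  BASE_LOCATION = '{base_location}'"
    )


if __name__ == "__main__":
    # Hypothetical names; the external volume must already exist in your account
    ddl = build_iceberg_table_ddl(
        "my_iceberg_table",
        {"ID": "INT", "NAME": "STRING"},
        external_volume="my_ext_vol",
        base_location="my_iceberg_table",
    )
    print(ddl)
    # With an open Snowpark session: session.sql(ddl).collect()
```

Keeping the DDL in a small function makes it easy to template across many tables; in production you would want to validate or quote identifiers rather than interpolate them directly.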

NVIDIA Partnership and AI Updates

Another major announcement before the Summit was a new collaboration between NVIDIA and Snowflake on customized AI applications. Snowflake plans to integrate NVIDIA AI Enterprise software into Snowflake Cortex AI.

This integration includes NVIDIA NeMo Retriever, which brings high-performance information retrieval for building retrieval-augmented generation (RAG) applications, and Triton Inference Server, which can deploy and run AI inference at scale.

NVIDIA NIM inference microservices, which are prebuilt AI containers, will also be deployable within Snowflake as a native app running on Snowpark Container Services. Furthermore, Snowflake Arctic LLM will be available as an NVIDIA NIM, so Snowflake customers and partners can begin innovating with AI in seconds.

Snowflake Cortex AI Updates

Generative AI is advancing rapidly and transforming businesses across many industries. Likewise, Snowflake Summit 2024 showed no shortage of exciting upcoming features for Snowflake Cortex AI. If you are new to Snowflake Cortex AI, check out this introductory blog.

Below are several of the upcoming features announced during the Summit:

  • Cortex Analyst: This upcoming feature, built with Meta’s Llama 3 and Mistral Large models, allows business users to chat with their data on Snowflake. Whereas Snowflake Copilot, soon to be GA, helps technical users convert questions into SQL, Cortex Analyst will provide answers to business questions directly.

  • Cortex Search: This feature provides a search solution that Snowflake fully manages, covering data ingestion, embedding, retrieval, reranking, and generation. Use cases include needle-in-a-haystack lookups as well as multi-document synthesis and reasoning.

  • Cortex Fine-Tuning: This highly impactful feature will enable customers to fine-tune models for their use cases through a low-code/no-code interactive interface. Because organizations and industries have different needs, this feature allows them to customize models to address those needs more effectively.

  • Snowflake AI & ML Studio: Snowflake is set to release new capabilities that allow users of all skill levels to try out LLMs in Snowflake and compare outputs from multiple models in a playground-style environment. These features build on the existing UI capabilities for Anomaly Detection, Classification, and Forecasting.

  • Snowflake Cortex Guard: Generative AI carries risks, most notably potentially unsafe and harmful responses. To address this, Snowflake plans to launch Cortex Guard, which will enable organizations to identify and filter out such content.
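Cortex Search hides its retrieval pipeline behind a managed service, but the stages it covers can be illustrated with a toy, dependency-free sketch. This is not the Cortex Search API: `embed` uses bag-of-words term counts as a stand-in for a real embedding model, `rerank` uses a simple term-overlap score in place of a learned reranker, and all function names are hypothetical. The final stage only builds the grounded prompt that would be handed to an LLM for generation.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Stage 1: score every document against the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: reorder candidates by the fraction of query terms they contain."""
    q_terms = set(embed(query))
    return sorted(
        candidates,
        key=lambda d: len(q_terms & set(embed(d))) / max(len(q_terms), 1),
        reverse=True,
    )


def build_prompt(query: str, context: list[str]) -> str:
    """Stage 3: assemble the grounded prompt an LLM would receive for generation."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Iceberg tables are generally available",
        "Snowflake Notebooks support Python and SQL",
        "Cortex Search manages embedding and retrieval",
    ]
    query = "how does cortex search handle retrieval"
    print(build_prompt(query, rerank(query, retrieve(query, docs, k=2))))
```

Splitting a cheap first-pass retriever from a more selective reranker is the usual design in such pipelines; a managed service lets teams skip operating these pieces themselves.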

Check out this blog for a deeper look at these features, phData’s new partnership with LandingAI, and their recently released native app on Snowflake. 

Development Tools and Enhancements

The exciting updates and feature announcements don’t just stop at AI. Snowflake announced many new features that will enhance development and collaboration:

  • Snowflake Notebooks: Currently available in public preview, Snowflake Notebooks provide a notebook interface that enables data teams to collaborate with Python and SQL in one place. Notebooks can also be run on a schedule.

  • Snowflake Command Line Interface (CLI): A highly anticipated feature, the Snowflake CLI allows data and infrastructure teams to build, manage, and deploy Snowflake infrastructure and objects from the command line. 

  • Snowflake Python API: In addition to the CLI, the Snowflake Python API will soon be generally available, giving teams another option for managing Snowflake resources and data pipelines via Python.

Code example for creating Snowflake resources with the Snowflake Python API:

from snowflake.snowpark import Session
from snowflake.core import CreateMode, Root
from snowflake.core.database import Database
from snowflake.core.schema import Schema
from snowflake.core.table import Table, TableColumn

# Define connection parameters using RSA key pair authentication.
# "private_key_file" is the local path to your PEM-encoded private key.
connection_params = {
    "account": "YOUR_ACCOUNT_NAME",
    "user": "YOUR_ACCOUNT_USER",
    "authenticator": "SNOWFLAKE_JWT",
    "private_key_file": "/path/to/your/private_key.pem",
    "warehouse": "YOUR_WAREHOUSE",
    "database": "YOUR_DATABASE",
    "schema": "YOUR_SCHEMA"
}

# Establish a connection to Snowflake and get the root resource object
session = Session.builder.configs(connection_params).create()
root = Root(session)

# Create (or replace) a new database
my_db = root.databases.create(Database(name="my_db"), mode=CreateMode.or_replace)

# Create (or replace) a new schema within the database
my_schema = my_db.schemas.create(Schema(name="my_schema"), mode=CreateMode.or_replace)

# Define a table with two columns and create it within the schema
my_table = Table(
    name="my_table",
    columns=[
        TableColumn(name="ID", datatype="int", nullable=False),
        TableColumn(name="NAME", datatype="string")
    ]
)
my_schema.tables.create(my_table)

print("Database, schema, and table created successfully.")

  • Snowpark Pandas API: Recently launched in public preview, this feature allows data engineers familiar with Python and Pandas to run their Pandas code in a scalable, distributed manner. Snowflake built it on the open-source Modin API, enabling customers to process large datasets using familiar Pandas functionality.

  • Snowflake Trail: This feature provides a new set of observability capabilities that data and infrastructure teams can use to get deeper insights and alerts from their applications and data pipelines. These capabilities, some in public preview and others in private preview, include Snowpark metrics, automatic Python DataFrame tracing, a code profiler for Python, log attributes, and serverless alerts.
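Log attributes are easiest to picture with the standard Python `logging` module, which Snowflake documents as the way Python handler code emits log messages to an event table. The sketch below is a hypothetical pipeline step that attaches key/value attributes through the `extra` argument; the function and logger names are made up, and whether these keys surface as log attributes in Snowflake Trail exactly as shown is an assumption to check against Snowflake's logging documentation. The `ListHandler` class simply captures records so the output can be inspected locally without a Snowflake account.

```python
from __future__ import annotations

import logging


class ListHandler(logging.Handler):
    """Collects LogRecords in memory so they can be inspected locally."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Hypothetical logger name for an orders pipeline
logger = logging.getLogger("pipeline.orders")


def load_orders(batch_id: str, rows: list[dict]) -> int:
    """Hypothetical pipeline step: keep rows that have an 'id' and log what happened."""
    # Keys in 'extra' become attributes on the LogRecord (e.g. record.batch_id);
    # how Snowflake Trail surfaces them as log attributes is an assumption to verify.
    logger.info("loading batch", extra={"batch_id": batch_id, "row_count": len(rows)})
    valid = [row for row in rows if row.get("id") is not None]
    dropped = len(rows) - len(valid)
    if dropped:
        logger.warning("dropped rows without id", extra={"batch_id": batch_id, "dropped": dropped})
    return len(valid)


if __name__ == "__main__":
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    load_orders("batch-001", [{"id": 1}, {"id": None}, {"id": 3}])
    for rec in handler.records:
        print(rec.levelname, rec.getMessage(), rec.batch_id)
```

Attaching identifiers such as a batch ID as structured attributes, rather than formatting them into the message text, is what makes log entries filterable later.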

Closing

The upcoming Snowflake features highlighted in this blog are just a subset of what Snowflake has in store. Other notable features and updates include AI-powered Object Descriptions, Universal Search, and Sensitive Data Classification with Snowflake Horizon.

The data landscape is changing rapidly, and organizations must innovate quickly to stay competitive and address new customer demands. At phData, we are committed to staying at the forefront and helping our clients modernize their data strategy, analytics, data engineering, and AI/ML workloads. If you want to discuss how we can help your organization, please contact us today! 


At phData, we have early access to Snowflake features and many other tools and vendors in the ecosystem. Our team has extensive experience building POCs for various clients. Please do not hesitate to reach out if you need help or a consultation. 
