May 31, 2024

Prompt Programming: A Novel Approach to Prompt Engineering with Stanford’s DSPy

By Garrett Springer

Visualizing data through natural language queries has become a crucial need for businesses aiming to gain insights quickly and efficiently. Previously, phData showcased a solution utilizing ChatGPT, Snowflake, and dbt to allow users to write natural language queries, which were then converted into SQL and Python to automatically query, summarize, and visualize data. This approach, while effective, can be improved with higher-quality LLM prompts built using Stanford’s DSPy–a framework for systematically optimizing prompts.

In this blog, we will demonstrate an improved solution by using DSPy and integrating Snowflake Cortex–a fully managed Generative AI service that integrates directly with your data housed in the Snowflake AI Data Cloud.

Motivation for DSPy

In our previous implementation, we used OpenAI’s ChatGPT to convert natural language queries into SQL and to summarize the data returned. Getting ChatGPT to produce consistent responses and syntactically correct SQL statements required a lot of prompt engineering. This approach works, but it can be improved with better prompts.

DSPy often produces better prompts than human-written ones. This is not because DSPy is more creative but because it can generate, evaluate, and iterate on prompts far more efficiently than a human can. 

Why Snowflake Cortex?

Snowflake Cortex is an intelligent, fully managed service offering machine learning and AI solutions natively within Snowflake. By using Cortex, we can generate SQL queries and Python visualization code without the need for external APIs, keeping all data within the secure Snowflake perimeter.
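For illustration, here is a minimal sketch of invoking a Cortex-hosted LLM from Python. It assumes the snowflake-ml-python package and an active Snowflake session; the model name and prompt are placeholders.

# Minimal sketch: call a Cortex-hosted LLM from Python.
# Assumes the snowflake-ml-python package and an active Snowflake
# session; model name and prompt are illustrative placeholders.
from snowflake.cortex import Complete

response = Complete(
    'mistral-large',
    'Write a SQL query that totals sales by year from the SALES table'
)
print(response)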

Advantages of Snowflake Cortex

  • Simplified Architecture: Removing the need for an external LLM simplifies the overall solution.

  • Enhanced Security: Keeping all data within Snowflake eliminates the risk associated with sending data over external networks.

Optimizing LLM Behavior through Programming – Not Prompting

Crafting high-quality prompts is essential for creating consistent and reliable performance from LLM applications. Traditional prompt engineering can be tedious and time-consuming, which is why Stanford created DSPy, a framework to systematically optimize prompts.

What is DSPy?

DSPy replaces the guess-and-check prompt engineering process with a framework that constructs and iteratively optimizes prompts using algorithms that converge on the desired behavior. The process is analogous to training a neural network: it uses labeled data (LLM input/output pairs) and optimization algorithms to iteratively improve the solution. By providing a few examples of expected input/output and a metric function to score the LLM’s responses, DSPy can produce high-quality prompts.

Let’s walk through the code to show how we used DSPy to generate a prompt for constructing Streamlit graphs using queried data.

  1. Define Behavior: Specify the high-level task we want the LLM to perform. In our case, the LLM will receive data returned from a SQL query and output code to produce a Streamlit graph.

import dspy

class GeneratePythonVisualizationCode(dspy.Signature):
    """Generate python code that produces streamlit visualizations of the input data"""
    input_data = dspy.InputField(desc='Input data to use for creating a streamlit visualization')
    visualization_code = dspy.OutputField(desc='Valid python code that creates a graphic visualization with the streamlit lib')

predict_obj = dspy.Predict(signature=GeneratePythonVisualizationCode)

DSPy uses a Signature to define the expected input and output behavior. This semantic description helps DSPy construct the prompts. The signature is then used to create a Predict object, which holds the configuration needed to invoke an LLM (such as Snowflake Cortex or ChatGPT).
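For illustration, once DSPy is configured with a language model, the Predict object can be called directly. The LM client shown here is a placeholder for whichever backend you use:

import dspy

# Configure DSPy with an LM backend (placeholder; swap in your own,
# such as a Snowflake Cortex client).
lm = dspy.OpenAI(model='gpt-3.5-turbo')
dspy.settings.configure(lm=lm)

# Invoke the Predict object; the output field matches the Signature.
prediction = predict_obj(input_data='YEAR: 2022 TOTAL_SALES: 315079.50')
print(prediction.visualization_code)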

  2. Provide Examples: Offer examples to help guide the LLM. These are typically constructed by hand–we only needed to create one to start seeing improved results.

example_1 = dspy.Example(
    input_data="""
                YEAR: 2020 TOTAL_SALES: 251381.25
                YEAR: 2021 TOTAL_SALES: 293456.50
                YEAR: 2022 TOTAL_SALES: 315079.50
    """,
    visualization_code="""
        import streamlit as st
        import pandas as pd

        year = [2020, 2021, 2022]
        sales = [251381.25, 293456.50, 315079.50]

        df = pd.DataFrame({'YEAR': year, 'TOTAL_SALES': sales})

        st.line_chart(df)
    """
).with_inputs('input_data')

The example shows the input data returned from a SQL query and the Python code used to construct a graph.
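The optimizer consumes a list of such examples as its training set. With our single example, the trainset used later looks like this:

# The trainset passed to the optimizer below; more examples can be
# appended as they are written.
training_examples = [example_1]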

  3. Metric Function: Create a metric function to score the LLM responses. This is used for DSPy to gauge the quality of its responses when optimizing the prompt.

def visualization_metric(example, prediction, trace=None):
    # DSPy passes the gold example and the model's prediction to the metric
    visualization_code = prediction.visualization_code
    try:
        # Try to compile the code to check for syntax errors
        compiled_code = compile(visualization_code, '<string>', 'exec')
        # Try to execute the compiled code to check for runtime errors
        exec(compiled_code)
        print('Code ran successfully')
        score = 1.0
    except Exception as e:
        print(f'Encountered error in code: {e}')
        score = 0.0
    return score

Our metric function is very simple: it only checks whether the generated code runs without errors. Metric functions can be much more sophisticated, providing a more granular score to guide DSPy.
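For example, a hypothetical metric might award partial credit, giving half for syntactically valid code, more if it executes, and full marks only if it actually uses Streamlit:

def granular_visualization_metric(example, prediction, trace=None):
    # Hypothetical sketch of a more granular metric: partial credit
    # for code that compiles, more if it runs, full score only if it
    # actually uses the streamlit library.
    code = prediction.visualization_code
    score = 0.0
    try:
        compiled_code = compile(code, '<string>', 'exec')
        score += 0.5   # syntactically valid
        exec(compiled_code)
        score += 0.25  # executes without error
        if 'streamlit' in code:
            score += 0.25  # uses streamlit
    except Exception:
        pass
    return score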

  4. Optimization: Now that all the setup is finished, DSPy is ready to construct prompts! We used a DSPy optimizer to explore and refine prompts automatically.

from dspy.teleprompt import COPRO

teleprompter = COPRO(
    metric=visualization_metric
)

optimized_predict_obj = teleprompter.compile(
    predict_obj,
    trainset=training_examples
)

The code above compiles the training examples, metric function, and Predict object into an optimized prompt. We applied the COPRO optimizer, which refines the prompt through an iterative algorithm that generates candidate prompts and scores them with the metric function on the provided examples.

Tip: Be mindful of token usage and cost when running the optimization process. DSPy is repeatedly invoking an LLM service in this example, which can quickly rack up a bill. We recommend setting a hard limit on the number of iterations and tokens produced in the COPRO settings.
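For instance, COPRO exposes breadth (candidate prompts per round) and depth (number of refinement rounds) parameters that bound how many LLM calls the optimization makes. The values below are illustrative:

# Illustrative settings to bound optimization cost: fewer candidate
# prompts per round (breadth) and fewer refinement rounds (depth)
# mean fewer LLM calls.
teleprompter = COPRO(
    metric=visualization_metric,
    breadth=5,
    depth=2
)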

[Figure: The COPRO optimization algorithm]

Final Product

In the video below, we walk through the full working solution using the optimized prompt from DSPy and Snowflake Cortex.

Conclusion

In this blog, we demonstrated how to improve the Talk to Your Data application by integrating Snowflake Cortex and constructing prompts with DSPy. DSPy removes the burden of manually tweaking prompts. This systematic approach saves time and provides a modular framework for prompt construction. 

If you’re interested in taking your data insights to the next level, schedule a consultation with one of our architects at phData to explore how we can empower your data and AI journey.

Lastly, we’re offering a series of free Generative AI workshops aimed at exploring the potential of AI for businesses like yours. Regardless of what stage you are on your AI journey, these workshops will help steer your business in the right direction.
