March 15, 2023

How to Create 3D Basketball Shot Charts With Streamlit and Snowflake

By Lawrence Liu

Traditional 2D shot charts for basketball have been around for decades. In recent years, sports networks have begun to weave 3D shot charts into their programming for some extra flair. At first glance, this extra dimension may seem daunting to implement.

In honor of March Madness, I’ve brushed up on my Algebra II skills and have broken down the math so that anyone can begin to create their own 3D shot chart visualization. In this tutorial, we’ll re-create the shot chart from the 2022 UNC vs Kansas Men’s National Championship game in 3D using Snowpark Python and Streamlit.

3D shot chart visualization

Why Snowpark Python and Streamlit?

The Snowflake Data Cloud ecosystem and Streamlit go hand-in-hand when it comes to making powerful data applications for Python programmers. Streamlit provides an elegant framework that only requires a few lines of Python code to create data applications, while Snowpark Python provides intuitive APIs to run queries on data in Snowflake.

Requirements

This tutorial will be connecting to Snowflake with the Snowpark Python connector, which requires a Python 3.8 environment. To set up your environment along with all the necessary packages, run the following commands: 

				
					conda create --name snowpark --override-channels -c https://repo.anaconda.com/pkgs/snowflake python=3.8 numpy pandas plotly


conda activate snowpark


conda install snowflake-snowpark-python


pip install streamlit 


				
			

The dimensions I used to draw the court lines come from a modutile diagram of the NCAA College Basketball half-court.  

The play-by-play table that contains the (x, y) shot coordinates and the schedule table loaded into Snowflake were retrieved using the sportsdataverse.mbb Python package. The specific columns of data we’ll be using from the play-by-play table are:

play-by-play table

The specific columns of data we’ll be using from the schedule table are:

schedule table

Setup

The entire visualization will be contained in one 3D plot. Streamlit has really simple and effective native charts out of the box, while also supporting other Python plotting libraries including Plotly, which is the library we will be using today.

To start up a Streamlit application, create a main.py file, import the following packages, and call st.title() to add a title.

				
					import streamlit as st
import plotly.express as px
import pandas as pd
from snowflake.snowpark import Session

st.set_page_config(layout="wide")
st.title("UNC vs Kansas Men's Basketball Championship - National Championship 2022 ")


				
			

Typing streamlit run main.py in your command line will start up your application.

Building The Court

To draw the court lines, we will create one DataFrame with x, y, and z columns for the line coordinates, a line_group column to differentiate between separate lines, and a color column that we can use later with a dictionary to color the lines.    

To align ourselves in this 3D scene, the x-axis will be the width of the court, the y-axis will be the length, and the z-axis will be the height. This tutorial will treat each unit as one foot; for example, 0.25 units equals 3 inches.

The easiest lines to map out are the outer perimeter and the half-court line. Knowing that the full court measures 94 ft by 50 ft, you can keep the z-axis (the height) at 0, and create a DataFrame with 5 points, starting from the (0, 0) coordinate, going to each corner, and repeating (0,0) at the end to close the loop. 

				
					width = 50 
length = 94
court_perimeter_bounds = [[0, 0, 0],[width, 0, 0],[width, length, 0],[0, length, 0],[0, 0, 0]]


court_df = pd.DataFrame(court_perimeter_bounds, columns=['x','y','z'])
court_df['line_group'] = 'outside_perimeter'
court_df['color'] = 'court'
				
			

For the half court line, you only need two points as it’s just one straight line.

				
half_court_bounds = [[0, length/2, 0], [width, length/2, 0]]

half_df = pd.DataFrame(half_court_bounds, columns=['x','y','z'])
half_df['line_group'] = 'half_court'
half_df['color'] = 'court'
				
			

To display a Plotly 3D Line plot on Streamlit, first create the Plotly figure in your main.py file, passing in the DataFrame of line coordinates, and then pass the figure to the Streamlit call st.plotly_chart(). Notice that the lines were created in separate DataFrames but are combined into court_lines_df before plotting.

				
					fig = px.line_3d(
   data_frame=court_lines_df, x='x', y='y', z='z', line_group='line_group', color='color',
   color_discrete_map={
       'court': '#000000',
       'hoop': '#e47041'
   }
)
st.plotly_chart(fig, use_container_width=True)


				
			

Your Streamlit app should now look something like this.

Streamlit app
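The step that combines the separate line DataFrames into court_lines_df is not shown in the snippets above. A minimal sketch, assuming court_df and half_df are built exactly as in the earlier blocks, could use pd.concat:

```python
import pandas as pd

width, length = 50, 94

# outer perimeter: five points so the loop closes back at (0, 0)
court_df = pd.DataFrame(
    [[0, 0, 0], [width, 0, 0], [width, length, 0], [0, length, 0], [0, 0, 0]],
    columns=['x', 'y', 'z'],
)
court_df['line_group'] = 'outside_perimeter'
court_df['color'] = 'court'

# half-court line: two points, one straight segment
half_df = pd.DataFrame(
    [[0, length / 2, 0], [width, length / 2, 0]],
    columns=['x', 'y', 'z'],
)
half_df['line_group'] = 'half_court'
half_df['color'] = 'court'

# stack every line's coordinates into the single DataFrame passed to px.line_3d
court_lines_df = pd.concat([court_df, half_df], ignore_index=True)
```

Because each row carries its own line_group value, Plotly draws the perimeter and the half-court line as separate lines even though they live in one DataFrame.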

I’m going to speed-run through the remainder of the court lines so we can get to the fun part of 3D shot paths.

The backboard measures 6 ft wide by 4 ft tall, is offset from the baseline by 3 ft, and is offset from the ground by 9 ft. The center of the hoop is offset from the baseline by 4 ft 3 in and from the ground by 10 ft. The coordinates to create the backboards are pretty self-explanatory. To create the hoop, knowing that the hoop has a radius of 9 in, use the equation of a circle:

$$(x - h)^2 + (y - k)^2 = r^2$$

Where (h, k) is the center of the hoop, (25, 4.25), and r is the radius of the hoop, 0.75.
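As a rough sketch of these two shapes in plain Python: the function names, the 50-segment resolution, and the assumption that the backboard is centered on the court's width (x = 25) are mine, not taken from the project code. The hoop points are generated with the parametric form of a circle (cosine and sine), which satisfies the circle equation without solving for y directly.

```python
import math

COURT_WIDTH = 50  # feet


def backboard_points(baseline_offset=3.0, bottom=9.0, width=6.0, height=4.0):
    """Corners of the backboard rectangle; the first corner is repeated to close the loop."""
    left = COURT_WIDTH / 2 - width / 2    # assumes the backboard is centered at x = 25
    right = COURT_WIDTH / 2 + width / 2
    top = bottom + height
    y = baseline_offset
    return [
        [left, y, bottom],
        [right, y, bottom],
        [right, y, top],
        [left, y, top],
        [left, y, bottom],
    ]


def hoop_points(h=25.0, k=4.25, r=0.75, rim_height=10.0, segments=50):
    """Points on the circle (x - h)^2 + (y - k)^2 = r^2, drawn at rim height."""
    points = []
    for i in range(segments + 1):
        theta = 2 * math.pi * i / segments
        points.append([h + r * math.cos(theta), k + r * math.sin(theta), rim_height])
    return points
```

Either list can be wrapped in a DataFrame with x, y, and z columns and given its own line_group, in the same way as the perimeter lines.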

The men’s three-point line is 22 ft 1.75 in from the center of the hoop, but it has a straight portion along each side that is 8 ft 10.75 in long and only 21 ft 8 in from the center of the hoop. To plot out the three-point line, first create one side of the straight portion, next the arc, and lastly the other side of the straight portion. The arc can be calculated using the equation for the distance between two points:

$$d = \sqrt{(x - h)^2 + (y - k)^2}$$

Which can be rearranged into the following form:

$$y^2 - 2ky + \left(k^2 + x^2 - 2xh + h^2 - d^2\right) = 0$$

Where (h, k) is the center of the hoop, d is the distance from the hoop, and (x, y) is every point along the arc of the three-point line. We can loop through every x coordinate along the three-point line and solve for y using the quadratic formula:

$$y = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Where $a = 1$, $b = -2k$, and $c = k^2 + x^2 - 2xh + h^2 - d^2$.
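The loop described above can be sketched in plain Python. This is an illustration rather than the project's actual code: the function name, the 100-step resolution, and the assumption that the arc spans the x-range between the two straight portions (21 ft 8 in either side of the hoop center) are mine.

```python
import math


def three_point_arc(h=25.0, k=4.25, d=22 + 1.75 / 12, side_offset=21 + 8 / 12, steps=100):
    """Return (x, y, z) points along the arc, solving for y with the quadratic formula."""
    x_start = h - side_offset   # assumed x of the left straight portion
    x_end = h + side_offset     # assumed x of the right straight portion
    points = []
    for i in range(steps + 1):
        x = x_start + (x_end - x_start) * i / steps
        a = 1.0
        b = -2.0 * k
        c = k**2 + x**2 - 2 * x * h + h**2 - d**2
        discriminant = b**2 - 4 * a * c
        # the larger root is the point on the arc farther from the baseline
        y = (-b + math.sqrt(discriminant)) / (2 * a)
        points.append((x, y, 0.0))
    return points
```

The smaller root of the quadratic falls behind the hoop, so only the larger one is kept.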

This was my first time, and probably your first time, using the quadratic formula outside of a classroom setting. With the coordinates of the backboard, hoop, and three-point line calculated, we are ready to graduate to shot paths.

coordinates of the court

The complete code to generate the coordinates of the court can be found here

Creating One Shot Path

Just like how you must learn to walk before you run, you must learn to plot shot paths in 2D before you plot shot paths in 3D. 

We will be using the first three-pointer of the game, made by Ochai Agbaji of Kansas, for this example.

The shot has the coordinate pair value of (35, 24). This would place the start of the shot on the left half of the court. ESPN shot charts usually have the home team on the right half of the court, so to follow that convention, we’ll need to flip the y-coordinate to 70 (94 - 24) and then subtract another 4.25 to get 65.75, because the y-coordinate in the dataset is measured with respect to the hoop, which is offset from the baseline by 4 ft 3 in.
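The arithmetic for this one shot can be written as a small helper. The function and constant names are hypothetical, and the sketch only covers shots that need to be flipped onto the right half of the court, as in this example.

```python
COURT_LENGTH = 94.0
HOOP_OFFSET = 4.25  # hoop center is 4 ft 3 in from the baseline


def to_court_y(dataset_y):
    """Flip a hoop-relative y-coordinate onto the right half of the court."""
    return COURT_LENGTH - dataset_y - HOOP_OFFSET


shot_x, shot_y = 35, to_court_y(24)
```

With the same helper, a shot taken directly at the hoop (dataset y of 0) maps to 89.75, which is the hoop's y-coordinate on the right half of the court.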

To draw the shot path in 2D, first orient yourself under the left hoop facing the right hoop. Your x-axis remains the width of the court, but your y-axis is now the height. We will assume a perfect parabola that passes through the shot start and the hoop and find the equation of that parabola.

The vertex form of a parabola is:

$$y = a(x - h)^2 + k$$

Where (h, k) is the vertex. At first glance, you might say: “Wait! We only have two known (x, y) coordinates and there are three unknown variables in a, h, and k. This is not possible to solve!” And to that, I say: “touché.”

However, if we make an educated guess on the peak height of the shot, say k=17, then we can solve for a and h without having to sit around and wait for 3D ball tracking data to become available to the public.

The image below may ease any confusion regarding what we are trying to do.

shots

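To solve for h, first rearrange the vertex form to isolate a:

$$a = \frac{y - k}{(x - h)^2}$$

Next, use our two known (x, y) coordinates from the shot start (x1, y1) and the hoop (x2, y2) and set the two equations equal to each other:

$$\frac{y_1 - k}{(x_1 - h)^2} = \frac{y_2 - k}{(x_2 - h)^2}$$

And then rearrange the equation into this monstrosity of a quadratic equation in h:

$$(y_1 - y_2)h^2 + 2\left[(y_2 - k)x_1 - (y_1 - k)x_2\right]h + \left[(y_1 - k)x_2^2 - (y_2 - k)x_1^2\right] = 0$$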

The quadratic formula will come to the rescue again to help us solve for h, the x-coordinate of the vertex of our parabola. Now that we’ve solved that h=28.91, we plug all our values back into the vertex form of the parabola to solve for the remaining unknown, a=-0.46.

Congrats, we’ve now found the equation for our shot path in 2D. If we were to calculate all the (x, y) coordinate pairs along the curve using our equation, and plot it in our scene, we end up with a 2D parabola in 3D space.
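To check these numbers, here is a minimal sketch of the solve in plain Python. The function name is hypothetical, and it assumes the shot is released at a height of 0 and enters the hoop at a height of 10, which is what the values h=28.91 and a=-0.46 imply. The quadratic coefficients come from cross-multiplying the two expressions for a.

```python
import math


def solve_parabola(x1, z1, x2, z2, k):
    """Return (h, a) for z = a * (x - h)**2 + k through (x1, z1) and (x2, z2)."""
    # coefficients of A*h**2 + B*h + C = 0, from cross-multiplying
    # (z1 - k) / (x1 - h)**2 = (z2 - k) / (x2 - h)**2
    A = z1 - z2
    B = 2 * ((z2 - k) * x1 - (z1 - k) * x2)
    C = (z1 - k) * x2**2 - (z2 - k) * x1**2
    root = math.sqrt(B**2 - 4 * A * C)
    candidates = [(-B + root) / (2 * A), (-B - root) / (2 * A)]
    # keep the vertex that lies between the shooter and the hoop
    lo, hi = min(x1, x2), max(x1, x2)
    h = next(r for r in candidates if lo <= r <= hi)
    a = (z1 - k) / (x1 - h) ** 2
    return h, a


# shot start (x=35, height 0), hoop (x=25, height 10), assumed peak height k=17
h, a = solve_parabola(x1=35, z1=0, x2=25, z2=10, k=17)
print(round(h, 2), round(a, 2))  # 28.91 -0.46
```

This sketch does not handle the degenerate cases where the start and end heights are equal (A becomes 0) or where the shot is taken from the same x as the hoop; a fuller implementation would need to treat those separately.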

make our parabola

The last step is to make our parabola move along the y-axis (in 3D space), starting from the start of the shot and ending at the center of the hoop. We will split the shot path into 100 equal steps, which gives 101 coordinates including both endpoints.

First, we’ll find the y-distance between the start of the shot and the hoop, and then calculate the y-offset required so that each coordinate will be one y-offset unit closer to the hoop until the path ends at the hoop.

				
import numpy as np
import pandas as pd

shot_start_x = 35
shot_start_y = 65.75
hoop_x = 25
hoop_y = 89.75
shot_vertex_x = 28.91  # h, solved with the quadratic formula
shot_vertex_z = 17     # k, our educated guess for the peak height
a = -0.46              # solved from the vertex form
num_coords = 100
y_shift = hoop_y - shot_start_y
y_shift_per_coord = y_shift / num_coords

shot_path_coords = []
for index, x in enumerate(np.linspace(shot_start_x, hoop_x, num_coords + 1)):
    # in 3D space, the "y-axis" we used to calculate the 2D parabola is the z-axis
    z = a * (x - shot_vertex_x)**2 + shot_vertex_z
    shot_path_coords.append([index, x, shot_start_y + (y_shift_per_coord * index), z])

shot_path_coordinates_df = pd.DataFrame(shot_path_coords, columns=['shot_coord_index', 'x', 'y', 'z'])


				
			

With all the logic in place, passing the shot_path_coordinates_df DataFrame into our Plotly 3D Line plot gives us one plotted shot path!

shot path

The complete code to generate the shot path coordinates for any shot can be found here.

Adding All of The Shots

To finish up our visualization, we will connect to Snowflake, retrieve all the shot chart data from the championship game, and use the class created in basketballShot.py that encapsulates all the logic to add all of the shot paths into our 3D scene.

Back in the main.py file, we first initiate a connection to our Snowflake account using a Snowpark Session.

				
					@st.cache_resource
def create_session_object():
   connection_parameters = {
      "account": "<ACCOUNT>",
      "user": "<USER>",
      "password": "<PASSWORD>",
      "role": "<ROLE>",
      "warehouse": "<WAREHOUSE>",
      "database": "<DATABASE>",
      "schema": "<SCHEMA>"
   }
   session = Session.builder.configs(connection_parameters).create()
   return session

session = create_session_object()


				
			

Next, we will query the play-by-play data and return the output into a pandas DataFrame.

				
					@st.cache_data
def load_data(query):
   return session.sql(query).to_pandas()
  
play_by_play_query = """
   SELECT  sequence_number,
           coordinate_x,
           coordinate_y,
           team_id,
           text,
           scoring_play,
           case
               when team_id = home_team_id
                   then 'home'
               else 'away'
           end as scoring_team
   FROM    play_by_play
   WHERE   game_id = 401408636 -- national championship game id
   AND     shooting_play
   AND     score_value != 1  -- shot charts typically do not include free throws
"""

game_shots_df = load_data(play_by_play_query)


				
			

Each row in the DataFrame is one attempted shot, so we’ll pass each row into our BasketballShot class that encapsulates all the math logic we went over. 

				
from basketballShot import BasketballShot  # the class created in basketballShot.py

game_coords_df = pd.DataFrame()
# generate coordinates for shot paths
for index, row in game_shots_df.iterrows():
   shot = BasketballShot(
       shot_start_x=row.COORDINATE_X,
       shot_start_y=row.COORDINATE_Y,
       shot_id=row.SEQUENCE_NUMBER,
       play_description=row.TEXT,
       shot_made=row.SCORING_PLAY,
       team=row.SCORING_TEAM)
   shot_df = shot.get_shot_path_coordinates()
   game_coords_df = pd.concat([game_coords_df, shot_df])


				
			

Lastly, pass the coordinates DataFrame to a Plotly 3D Line plot and append the shot traces to the original court plot.

				
					shot_path_fig = px.line_3d(
   data_frame=game_coords_df,
   x='x',
   y='y',
   z='z',
   line_group='line_id',
   color='team',
   color_discrete_map=color_mapping,
   custom_data=['description']
)
shot_path_fig.update_traces(opacity=0.55)

# add each shot trace to the court plot
for trace in shot_path_fig.data:
   fig.add_trace(trace)


				
			

Ba bam. You now have a 3D shot chart.

3D shot chart.

To take our data application to the next level, we can use Snowpark to query our schedule table and create a game selector with Streamlit's selectbox widget. Remember to remove the game_id filter from our original play-by-play query and add game_id to its SELECT list so the results can be filtered by game. Then, use the selected game to simultaneously filter our play-by-play data, the colors corresponding to each school, and the game name.

				
					schedule_query = """
   SELECT  concat(away_display_name_short, ' @ ', home_display_name_short, ' - ', notes_headline) as GAME,
           game_id,
           home_color,
           away_color
   FROM    schedule
   ORDER BY game_id desc
"""

schedule_df = load_data(schedule_query)
play_by_play_df = load_data(play_by_play_query)

# create streamlit single selection option
schedule_options = schedule_df[['GAME','GAME_ID']].set_index('GAME_ID')['GAME'].to_dict()
game_selection = st.sidebar.selectbox('Select Game', schedule_options.keys(), format_func=lambda x:schedule_options[x])

# filter game specific values
game_shots_df = play_by_play_df[(play_by_play_df['GAME_ID'] == game_selection)]
home_color = schedule_df.loc[schedule_df['GAME_ID'] == game_selection]['HOME_COLOR'].item()
away_color = schedule_df.loc[schedule_df['GAME_ID'] == game_selection]['AWAY_COLOR'].item()
game_text = schedule_options[game_selection]
st.title(game_text)


				
			

With only one additional Streamlit call, we now have an interactive app that can create the 3D shot chart of any game available in your dataset.

styling

This tutorial did not include all of the required code, such as the styling; the complete project code can be seen here.

Conclusion

Through this tutorial, we were able to kill three birds with one stone. We combined college basketball, a favorite pastime of many, with the Snowflake ecosystem, while also finally finding a use for the quadratic formula.

In retrospect, 3D shot charts are not revolutionary, and they’re not going to change the basketball landscape. Even so, this was fun to make and that’s sometimes all you can hope for.
