September 3, 2022

How to Make Effective Scatter Charts in Sigma

By Will Strouse

Like a baseball scout finding a big left-handed reliever with a 100mph fastball and a disappearing sinker, I love looking for outliers in data. Scatter Charts are a great chart type for understanding outliers using multiple performance metrics and Sigma Computing is an ideal platform to visualize them.

All-Star players are not the only outliers in baseball. Having recently attended a game with my teammates at phData, I realized the importance of concession stand placement and menu options.

Fans are unlikely to go up or down to a different level from where their seat is, but still expect a wide selection of goods. Concessions are a huge part of the fan experience and a significant source of revenue for the stadium. Understanding which vendors and items are most popular with the fans can help organizations meet the needs of their fans and maximize profit.

In this article, I will guide you through three examples of scatter plots from a sample data set included in your Sigma start-up sample connection (MLB_STADIUM_SALES) that will help you make more effective scatter charts in Sigma.

But first, we need to combine information from three different tables.

The Data

A sample data set from Sigma Computing

F_POINT_OF_SALES has rows of sales transactions within the stadium at the product level. Each row has an order number, keys to identify the stand and product, and some pre-aggregated values such as the unit cost, unit price, and quantity.

D_STAND is a lookup table giving the stand type and stadium level.

D_STADIUM_PRODUCT is a lookup table giving the product type, product line, and unit price.

Because I want multiple fields from D_STAND and D_STADIUM_PRODUCT right in my row level POS data, I am using a join with the Product and Stand keys. Lookups are great for single field additions between two tables.

A screenshot from Sigma that shows a number of data points

I am only interested in food and beverage sales, so this Product Type control element excludes Apparel.

Example 1: By Stand Type Profit Profile Scatter Plot

My first scatter chart will be to investigate the sales and profitability by stand type. I list out the key elements of the scatter plot configuration before each how-to steps for reference. It is important to understand the row level aggregation of your source data to build a new scatter chart. Whatever field or combination of fields you want each circle in the scatter to represent needs to be reflected in the source table grouping(s).

Final Scatter Chart Configuration:

  • Circles: Stand Type
  • X Axis: Total Revenue
  • Y Axis: Total Cogs
  • Color: Profitability %
  • Size: Profit $
A screenshot that lists a number of columns with pertinent information for the example.

1. Create a child table from F_POINT_OF_SALES + 2 and rename it “Child of F_POINT_OF_SALES + 2 for Stand Scatter Plot Example”

  • Group on [Stand Type]
  • In the calculations add,
    • [Sum of Revenue]: Sum([Revenue])
    • [Sum of Cogs]: Sum([Cogs])
    • [Sum of Profit $]: Sum([Profit $]) 

2. Create a child data visualization; select scatter chart then rename it “Stand Profit Scatter”:

A screenshot from Sigma that's titled, "Stand Profit Scatter"
  • Create a new column, [Stand Profit %]: (Sum([Sum of Revenue]) – Sum([Sum of Cogs])) / Sum([Sum of Revenue]). (We already have Profit % in the data, but we want to make sure it is being calculated at the same level of detail (Stand) as the scatter.)
  • Drag [Sum of Revenue] onto the X axis
  • Drag [Sum of Cogs] onto the Y axis
  • Drag [Sum of Profit] onto Size
A small pop-up menu that is titled, "MARKS"
  • Set the color to by scale, add [Stand Profit %], then select a color range.
Another pop up menu titled, "MARKS" that has a drop-down menu titled, "By scale"
  • Drag [Stand Type] into the tool tip, then edit the calculation to First([Stand Type])
Another pop-up from the Marks navigator tab with a drop-down that's titled, "Select Column"
A scatter chart in Sigma with several data points.

After adding a pair of dashed reference lines to show the average Cogs and Revenue by Stand, a key outlier and two clear groups are visualized. In the tooltips, we can see that…

  • High cost / low profit stands in blue
  • Low cost / high profit in dark orange
  • The light orange outlier with the highest revenue of all stands; the Beer Stand.

Example 2: By Product Type and Line Profit Profile Scatter Chart w/ Parameter Grouping Calculation

The second scatter plot shows sales and profitability by product type, and a drill down to the product line. You could also use parameter controls to switch the X and Y axis values, adding even more layers of analysis into a single chart.

Circles: Product Types OR Product Lines

X Axis: Total Revenue

Y Axis: Total Cogs

Color: Profitability %

Size: Profit $

A grouping of data from Sigma that has several columns of data points

1. Create a child table from F_POINT_OF_SALES + 2 and rename it “Child of F_POINT_OF_SALES + 2 for Product Scatter Plot Example”

2. Create a List Values Control Element (This will allow the user to flip between different groupings in a single chart)

A List Values Controls Element screenshot from Sigma
  • Select Parameter
  • Select List in the Control Element Dropdown
  • Value Type Text
  • For values, add “By Type” and “By Line”
  • Deselect the Show Null Option
  • Rename “LOD Parameter”

3. Go back to the child table and create a new column [LOD Selection]: If([LOD-Parameter] = “By Type”, [Product Type], [Product Line])

  • Group by [LOD Parameter]
  • Group by [Product Type]
  • In the calculations add,
    • [Sum of Revenue]: Sum([Revenue])
    • [Sum of Cogs]: Sum([Cogs])
    • [Sum of Profit] $: Sum([Profit $])

4. Create a child data visualization; select scatter chart then rename it “Product Profit Scatter”

A screenshot titled, "Product Profit Scatter"
  • Create a new column, [Product Profit %]: (Sum([Sum of Revenue]) – Sum([Sum of Cogs])) / Sum([Sum of Revenue])
  • Drag [Sum of Revenue] onto the X axis
  • Drag [Sum of Cogs] onto the Y axis
  • Drag [Sum of Profit] onto Size
A screenshot titled, "MARKS"
  • Set the color to by scale, add [Product Profit %], then select a color range. 
A screenshot titled, "MARKS" with a drop-down menu named "By scale"
  • [LOD Selection] into the tool tip, then edit the calculation to First([LOD Selection]), then rename the field to “Product Type / Line”
A screenshot titled, "MARKS"

Using the parameter control, I can see the profit profile for the Product Types or Lines with a single click. In Types, Premium Food is a high revenue, high cost Product Type with Chicken Tenders as a significant outlier. 

A scatter chart in Sigma with 5 data points illustrating the profit of the example scenario.
Another scatter chart with more data points in Sigma.

Using the parameter control, I can see the profit profile for the Product Types or Lines with a single click. In Types, Premium Food is a high revenue, high cost Product Type.

Another option is to use categorical colors. By applying Product type to color, the user can clearly see what Type grouping each Line is in at the Product Line level of detail.

  1. Change the color type from Scale to Category
  2. Drag [Product Type to Color
  3. Select a color palette
A scatter chart titled, "Product Profit Scatter W Color on Product Type"

Example 3: Detailed Product/Stand Scatter Plot

Lastly, I want to combine these two concepts to find the most profitable product at each stand, then trellis the scatter by stadium level. At this level, we can see true outliers in our most detailed data groups.

Circles: Product Lines Sold at the Stand ID Level

X Axis: Total Revenue

Y Axis: Total Cogs

Color: Profitability %

Size: Profit $

Several columns of POS data

1. Create a child table from F_POINT_OF_SALES + 2 and rename “F_POINT_OF_SALES + 2 for Final Scatter”

  • Group by [Stand Key]
  • Group by [Product Key]
  • Group by [Stadium Level]
  • Group by [Stand Type]
  • Group by [Product Type]
  • Group by [Product Line]
  • In the calculations add,
    • [Sum of Revenue]: Sum([Revenue])
    • [Sum of Cogs]: Sum([Cogs])
    • [Sum of Profit $]: abs(Sum([Profit $]))

2. Create a child data visualization; select scatter chart then rename it “Stand ID, Product ID Outliers Scatter”

A pop-up menu titled, "Stand ID, Product ID Outliers"
  • Create a new column,[ Profit %]: (Sum([Sum of Revenue]) – Sum([Sum of Cogs])) / Sum([Sum of Revenue])
  • Drag [Sum of Revenue] onto the X axis
  • Drag [Sum of Cogs]] onto the Y axis
  • Drag [Sum of Profit onto Size
A screenshot titled, "MARKS"
  • Set the color to by scale, add [Profit %], then select a color range.
A screenshot titled, "MARKS"
  • Drag [Stand Key], [Stand Type], [Product Type], and[ Product Line] into the tool tip and edit the default aggregation from count() to first().
  • Trellis the rows by Stadium Level, and be sure to add a descending custom sort so level 300 is on top. 

For more cool trellis charts, check out Katrina Johnson’s blog on How to Create A Trellis Chart in Sigma.

 The control elements on Revenue, Cogs, Profit $ and Profit % allow the user to zero in on the top and bottom performance outliers. The large, dark orange circles with the highest profit $’s and % are all in the Liquor Product Type. The light blue outliers with the lowest profit % are all premium foods, with cheeseburgers on level 100 and “Astro Poke’s (hotdogs)” minimally profitable on all levels.

A final scatter chart titled, "Stand ID, Product ID Outliers Scatter"

In Closing

In these three scatter charts, we found that the Beer Stand was driving the most revenue and is relatively profitable, increasing profit margins on chicken tenders and other premium foods would have a huge impact, and no matter what stand or stadium level fans are on, alcoholic beverages generate the most profit.

Interested in uncovering actionable insights from your business data using Sigma Computing? As Sigma’s largest consulting partner, phData helps businesses of all sizes succeed with Sigma. Whether you work for the MLB or a mid-sized accounting firm, phData is here to help! 

FAQs

Aside from detecting outliers, what other analytical concepts are best visualized using a scatter plot?

Scatter charts are great for visualizing deviation, distribution, magnitude, and correlation using multiple values. Where I would not use scatter charts in Sigma is when the goal is to describe a process flow, display rankings, show part-to-whole, or highlight change over time (connected scatter plots are not currently available in Sigma).

How do I add reference lines to my scatter chart?

Reference lines are an option in the Element Format window (the paintbrush on the left toolbar). From there, add a new REFERENCE MARK and set the configuration to use a line or band, select the axis and measure, line format, and label preferences.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit