July 21, 2021

How to Create Network Visualizations in Tableau

By Alex Johnson

Network visualizations display relationships between elements by linking nodes with common characteristics. This allows the user to visualize clusters quickly and determine relationships. Unfortunately, the development process in Tableau isn’t as straightforward. To get the data into a Tableau-friendly format for network visualization, we’ll first do some data preparation then bring it into Gephi to produce spatial coordinates.

This example is expanding on the blog published by The Data Surfers. I’ll be showing the steps in both Python and Excel. I prefer Python, but either (or a tool of your choice) should produce the same result. 

Prepare a Dataset Capable of Creating Network Visualizations

To accomplish this, you’ll need a dataset capable of showing a network of relationships. In this example, I’ll be using Harry Potter screen time statistics gathered from IMDB. This allows us to link each actor to the movies they appeared in. I’ll be referring to this as the main dataset.

If you’d like to follow along with the same data, do it! I even cleaned it up a bit for you; thank me later.   

To prepare the data for Gephi, we need to create two CSV files: nodes and edges.

Prepare Data for Gephi: Create a CSV Nodes File

The nodes file will contain three columns:

  • ID: a unique identifier for each label
  • Nodes: a unique identifier to create each node
  • Label: a list of all unique values of the two columns you’re visualizing (in my case: Actors and Movies)

Python

# 1. Read in the main dataset

import pandas as pd

import numpy as np

main = pd.read_csv(yourfilename)

# 2.  Take a unique list of the two network columns (Actor and Movie)

movies = main.Movie.unique()

actors = main.Actor.unique()

# 3. Concatenate the two list into one array

label = np.concatenate([movies, actors])

# 4. Create the nodes dataframe from the label array

nodes = pd.DataFrame(label, columns = [“Label”])

# 5. Add the ID and Nodes unique identifier columns

nodes[“ID”] = range(1, 1 + len(nodes))

nodes[“Nodes”] = range(1, 1 + len(nodes))

# 6. Export the nodes dataframe as a CSV

nodes.to_csv(‘nodes.csv’, index = False)

Excel 

  1. Create a new sheet named nodes
  2. Create three columns “ID,” “Nodes,” and “Label”
  3. Copy the Movies column from main and paste it under “Label”
  4. Copy the Actors column from main and paste it under “Label” below the Movies values
  5. Select the entire “Label” column and remove duplicates
    • Data – Data Tools – Remove Duplicates
  6.  Create a unique ID and Node for every value in “Label” 
    • Make sure the “ID” and “Node” columns are identical
  7. Save the sheet as a CSV
  1.  

Prepare Data for Gephi: Create a CSV Edges File

Before we attack the edges file, let’s add two columns to the main dataset: “Source” and “Target.” We’ll use the nodes file to get the unique ID of both the Actor and Movie labels. The “Source” value will contain the Movie ID and “Target” the Actor ID.

Python

# use the map() function to add the columns with data from nodes

main[“Source”] = main.Movie.map(nodes.set_index(‘Label’)[‘Nodes’].to_dict())

main[“Target”] = main.Actor.map(nodes.set_index(‘Label’)[‘Nodes’].to_dict())

I found this article helpful in implementing .map()

Excel

Use the Index Match function combination to transfer the IDs from the nodes to the “Source” and “Target” columns.

Here’s a helpful article on using Index Match 

Build the Edges Dataset

Ok, now we’re ready to build the edges dataset. 

The edges file will contain four columns:

  • Source: The unique Movie ID from the main dataset
  • Target: The unique Actor ID from the main dataset 
  • Type: “Directed,” which tells Gephi that edges can have a direction. If you’d like to read further into it, here ya go
  • Weight: “1,” which tells Gephi the total connections between Source and Target. Again, if you’d like to read further into it, have at it.  

Python 

# 1. Grab the Source and Target columns from main

edges = main[[“Source”, “Target”]]

 

# 2. Add the Type and Directed columns

edges.insert(2, ‘Type’, “Directed”)

edges.insert(3, ‘Weight’, 1)

 

# 3. Export the edges dataframe as a CSV

edges.to_csv(“edges.csv”, index = False)

 

# 4. Export the main data frame as a CSV 

main.to_csv(“main.csv”, index = False)

 

Excel

  1. Create a new sheet named edges
  2. Copy the “Source” and “Target” columns and paste them into the edges sheet
  3. Create a column named “Type” and populate every row with “Directed”
  4. Create a column named “Weight” and populate every row with “1”
  5. Save the edges sheet as a CSV
  6. Save the main sheet as a CSV

Loading Data into Gephi

We now have all the data necessary for Gephi. If you haven’t already, install the free software

When loading the software, I got an error “Cannot find Java 1.8 or higher”. This site helped resolve it. To summarize, add a new line to the top of the gephi.conf file directing it to your instance of Java (ex. jdkhome=”C:/Program Files (x86)/Java/jre1.8.0_291″)

Let’s start loading some data. 

  • Click File – Open and select your nodes CSV file
    • Gephi should recognize this as a Nodes table
  • Click Next – Finish 
  • Change Graph Type from Mixed to Directed
  • Click OK

At this point, you should see an array of nodes that appear randomly distributed in your workspace. 

Let’s input the edges file (a very similar process)

  • Click File – Open and select your edges CSV file
    • Gephi should recognize this as an Edges table
  • Click Next – Finish
  • Change Graph Type to Directed
  • Select “Append to existing workspace”
  • Click OK

We’re getting somewhere! It’s time to utilize the Layout dropdown to format our network. 

 

I recommend going through each layout and altering its preset properties until you find one that meets your needs. I ended up using the Yifan Hu Proportional. 

 

Once you’ve selected a layout, you can manually move nodes using the drag tool (select Configure to change the diameter of your cursor).  

 

To export the coordinate data, select File – Export – Graph File and save the file as a .gexf

Final Data Prep

We’re almost to Tableau, I promise. To extract the coordinates, we must open the .gexf file as an XML table. I found this easiest to implement in Excel. 

  • Open an empty Excel sheet
  • Click File – Open and navigate to the .gexf file you just created
    • Make sure that “All Files” is selected from the file search dropdown
  • Select the .gexf file and click Open
  • When prompted to select how to open the file, choose “As an XML table”

You’re looking at something like this, right?

Perfect. We’ll call this new table network. The only columns you’ll need from network are “label,” “value,” “x,” and “y.” Do what you wish with the remaining columns. 

The goal is to transfer the x and y values to our main dataset using the unique ID of each node. Notice the “value” column for each label in the network sheet corresponds to the “Source” and “Target” columns in the main dataset. 

For the links in the network visualization, we need to create two nodes for each source and target combination in the main dataset. (ex. one at Harry Potter and one at The Sorcerer’s Stone)

Python 

Save the network table as an .XLSX file.

# 1. Read in the main dataset

main = pd.read_csv(“main.csv”)

# 2. Read in the network dataset 

network = pd.read_excel(“network.xlsx”)

# 3. Merge the ‘x’ and ‘y’ coordinates corresponding to the “Target” value for each main dataset combination. Save this as a new dataframe

target = main.merge(network[[‘x’, ‘y’, ‘value’]], how = ‘left’,

          left_on = ‘Target’, right_on = ‘value’).drop(columns = [‘value’])

# 4. Merge the ‘x’ and ‘y’ coordinates corresponding to the “Source” value for each main dataset combination. Save this as a new dataframe

source = main.merge(network[[‘x’, ‘y’, ‘value’]], how = ‘left’,

          left_on = ‘Source’, right_on = ‘value’).drop(columns = [‘value’])

# 5. Append the source data to target and save it as main

main = target.append(source)

# 6. Export main as a CSV

main.to_csv(“main.csv”, index = False)

Excel

  1. Make a copy of the main sheet
  2. In the original sheet, use the Index Match function combination to match the Target value of each pair with its corresponding x and y coordinates. 
    • ex. = INDEX(array of x coordinates, MATCH(label in main, array of labels in network, 0))
  3.  In the copy of the sheet, use the same function combination to match the Source value of each pair with its corresponding x and y coordinates
  4. Copy and paste all the data in the second sheet (the source value coordinates) under the data in the original sheet (the target value coordinates).
  5. Delete the second sheet and save

Connect to Main Dataset in Tableau

About time, right? Now, all you’re going to need is your main dataset. Once you’ve connected to the main dataset, in a new sheet, create a dual-axis chart using the X and Y coordinates.

  • Add X to columns
  • Add Y to rows twice
    • Note both the X and Y variables must be continuous dimensions 
  • Right-click on the second Y variable and select “Dual Axis” 
  • Right-click on either Y-axis and select “Synchronize Axis”
  • Set the first value on the Marks card, Y, to Line
  • Set the second value on the Marks card, Y (2), to Circle

Select All on the Marks card and add Source and Target to detail as continuous dimensions.

Bingo.

I’ll leave the design decisions up to you, but this was the process I used to create the Network Visualization – Harry Potter dashboard.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit