August 4, 2020

Building an Emotion Classification System for Videos with Deep Learning

By Robert Coop

Introduction to Deep Learning with Dataiku

At phData, we have seen the value that deep learning brings to organizations that can successfully harness it. From reducing diagnostic errors in radiology to more accurately detecting manufacturing defects, we’ve certainly seen our share of wins, but not without pain. Many organizations fail to adopt deep learning because of the complexity of the systems and development work it requires.

In this blog, we’ll show you how we have used the Dataiku Data Science Studio (Dataiku DSS), a tool that we already use to simplify data science, to build out deep learning applications with relative ease. More specifically, we will show how to build an emotion classification system on videos using the Dataiku deep learning for images plugin. Our emotion classification dataset will be the speech video data from RAVDESS. An example of one of the videos can be found on YouTube.

We will leverage the Dataiku deep learning for images plugin, which lets us download pre-trained deep learning networks and provides recipes, such as image classification retraining and scoring, for classifying the emotion in each video. Under the hood, the plugin uses TensorFlow and Keras in Python for image classification.

The approach we will take for video classification is to break each emotion video into a fixed number of frames and then use these images to train a deep residual neural network (ResNet) to classify the emotion within each image. This ResNet has been pre-trained on the ImageNet dataset, so we do not have to start from scratch. Finally, we’ll determine the predicted emotion for a video by taking a majority vote over the labels predicted across all of its frames.

Majority-vote process used to classify emotion videos

Dataiku Deep Learning Tutorial Overview 

Obtaining Dataiku DSS

The Dataiku free trial version can be used on your local machine, or you can deploy Dataiku to AWS using their AWS Marketplace image.

Using the Dataiku Deep Learning for Images Plugin

Before we dive into the tutorial, let’s get familiar with some of the recipes provided by the Dataiku deep learning plugin and their requirements.

The retraining image classification model recipe takes a previously trained TensorFlow neural network and retrains one or more layers on a new set of images.

Inputs:
  • A folder containing the previously trained weights (in TensorFlow .h5 format)
  • A folder containing images to use for training
  • A dataset containing the relative path of each image in the folder along with the label to use for that image
Outputs:
  • A folder containing the new weights and information about the network structure

The image classification recipe uses a trained neural network to generate classification scores for images.

Inputs:
  • A folder containing the network weights and information about the network structure
  • A folder containing the images to classify
Outputs:
  • A dataset containing classes and their respective scores for each image

Steps Required for the Project

Refer to the diagram and the plan below, which indicate where each step fits into the project’s workflow.

Overview of the emotion classification training and testing procedure

Step 1: Prepare the Emotion Classification Dataset

  1. Download emotion video data using a shell code recipe.
  2. Extract frames from videos and create a dataset using a Python code recipe. 
  3. Split the data into training and testing sets using a split visual recipe.

Step 2: Train the Deep Learning Model

  1. Download resnet weights using the download pre-trained model macro.
  2. Create, configure, and run the retraining image classification model recipe.

Step 3: Score Frames and Evaluate Labels for Emotion Videos

  1. Create a folder containing all frames/images of videos to be tested.
  2. Use the image classification recipe to score test images.
  3. Extract labels for images. 
  4. Evaluate and assign labels to videos. 

Step 4: Visualize Results

  1. Analyze results.
  2. Build a Dataiku dashboard displaying the confusion matrix and accuracy analysis.

Prerequisites

A: Install the Deep Learning for Images Plugin

Open up Plugins settings.

Dataiku DSS settings menu

Install the appropriate Dataiku deep learning for images plugin, depending on whether you are using a CPU or a GPU.

Dataiku deep learning on images plugins

B: Create a Python Code Environment Containing OpenCV

Open Administration and go to Code Envs.

Dataiku DSS settings menu

Create a new Python env called py27opencv. Install the Jupyter notebook packages if you want to be able to use a notebook to experiment with code.

DSS Python environment creation options

Open this environment, go to Packages to install, and add:

opencv-python==4.2.0.32

Dataiku code env settings

Select Save and update to install the package.

Step 1: Prepare the Emotion Classification Dataset

Flow for creating emotion classification images

A: Download Emotion Video Dataset

Create a shell recipe to download videos.

Dataiku shell recipe

For the output, create a new folder called Emotion Videos.

Dataiku recipe configuration

Use this bash script to download and extract the video files (note that the extracted videos take up 6.31 GB):

#!/bin/bash

# Download each actor's archive from Zenodo, extract it, keep only the
# video-only (02-*) files, and clean up the archive and leftover folder
for i in $(seq -f "%02g" 1 24); do
    curl "https://zenodo.org/record/1188976/files/Video_Speech_Actor_$i.zip?download=1" \
        -o "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip"
    unzip "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip" -d "$DKU_OUTPUT_0_FOLDER_PATH"
    mv "$DKU_OUTPUT_0_FOLDER_PATH/Actor_$i"/02-* "$DKU_OUTPUT_0_FOLDER_PATH"
    rm -rf "$DKU_OUTPUT_0_FOLDER_PATH/Actor_$i"
    rm "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip"
done

Note that we keep only the videos starting with 02-*; the 01-* videos are duplicates that include audio. Run the recipe.

B: Extract Frames From Videos

Create a Python code recipe.

Dataiku Python Code recipe

For the input, select Emotion Videos. For the output, create a local folder called Emotion Images. We also want to create a new dataset, EmotionImagesCSV, to hold information about the images.

Dataiku Dataset configurations

Dataiku recipe inputs/outputs

In the Advanced settings menu, select the code environment containing OpenCV (py27opencv). 

Dataiku code env specification

Save the recipe, then add and run the frame-extraction code (a condensed sketch follows; the full version is in our GitHub repository, linked at the end of this post). The code iterates through all video files in Emotion Videos, extracting frames from each video at different points. We extract frames at 5% intervals (i.e., the frame that occurs when the video is 5% complete, 10% complete, and so on) and store them as .png files in Emotion Images. It also writes information about every frame, such as the video interval for the frame, the image file name, and the emotion, to EmotionImagesCSV.
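Here is a minimal sketch of that recipe. It assumes local managed folders, the RAVDESS filename convention (the third dash-separated field encodes the emotion, the seventh the actor), and a frame-naming scheme of video_label_interval.png; adapt the folder and dataset names to your project.

# -*- coding: utf-8 -*-
import os
import cv2
import dataiku
import pandas as pd

videos = dataiku.Folder("Emotion Videos")  # input folder (name or id)
images = dataiku.Folder("Emotion Images")  # output folder (name or id)

# RAVDESS emotion codes (third field of the filename)
EMOTIONS = {'01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
            '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'}

rows = []
for video_path in videos.list_paths_in_partition():
    name = os.path.splitext(os.path.basename(video_path))[0]  # e.g. 02-01-06-01-02-01-12
    parts = name.split('-')
    label, actor = EMOTIONS[parts[2]], parts[6]

    cap = cv2.VideoCapture(os.path.join(videos.get_path(), video_path.lstrip('/')))
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    # Grab one frame at every 5% of the video's duration
    for pct in range(5, 100, 5):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(n_frames * pct / 100.0))
        ok, frame = cap.read()
        if not ok:
            continue
        image_name = '{}_{}_{}.png'.format(name, label, pct)
        cv2.imwrite(os.path.join(images.get_path(), image_name), frame)
        rows.append({'image_path': image_name, 'label': label, 'actor': actor,
                     'interval': pct, 'video_path': video_path})
    cap.release()

# Write the frame metadata to the output dataset
dataiku.Dataset("EmotionImagesCSV").write_with_schema(pd.DataFrame(rows))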

You should now be able to see frames sampled from each video in Emotion Images.

Images from emotion classification videos

EmotionImagesCSV should contain metadata on each frame with columns like image path, label, actor, etc. These values are obtained from the file name according to the following table:

Dataiku filename identifiers

Dataiku dataset containing emotion video data

C: Split Data Into Train and Test 

Apply the split recipe on EmotionImagesCSV to create TestingImages and TrainingImages, grouping the rows on video_path. Grouping ensures that all frames from a given video land entirely in either the training set or the test set; this prevents frames from one video ending up on both sides of the split, so the network learns to recognize emotion rather than the specific characteristics of each video.

Dataiku split recipe

Then, run the recipe.
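The visual recipe handles the grouping for us, but conceptually the split chooses whole videos rather than individual frames. A rough pandas equivalent (the column names match EmotionImagesCSV; the test fraction and seed are illustrative):

import numpy as np
import pandas as pd

def split_by_video(df, test_frac=0.2, seed=42):
    # Pick a random subset of videos; every frame of a video stays on one side
    videos = df['video_path'].unique()
    rng = np.random.RandomState(seed)
    test_videos = set(rng.choice(videos, size=int(len(videos) * test_frac), replace=False))
    is_test = df['video_path'].isin(test_videos)
    return df[~is_test], df[is_test]

# training_images, testing_images = split_by_video(emotion_images_df)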

Step 2: Train the Deep Learning Model

Flow for training the deep learning classifier

A: Download the Dataiku TensorFlow ResNet Model Using a Macro

Go to Macros.

Dataiku project menu

Click on Download pre-trained model.

Set the output folder name to Original Resnet and the pre-trained model to download to Resnet Trained on ImageNet.

Resnet pre-trained model download

Run the macro. You should now see .h5 files in the Original Resnet model folder.

Pre-trained Tensorflow weights for resnet

B: Create the Retraining Recipe

Create a new folder TrainedResnet. This is where we will store the weights after retraining the model on our dataset.

Go to add Recipe. Under Plugins, choose Deep Learning on images.

Dataiku deep learning on images recipe

Choose Retraining Image Classification model. 

Retraining the resnet deep learning network

For input, set Label Dataset to TrainingImages, Image Folder to Emotion Images, and Model folder to Original Resnet. Set the output model folder to TrainedResnet.

Click Create Recipe. This opens a window where you can set the training ratio, the hyperparameters, and the columns that contain the path and label information for the training images.

Set the Image filename column to path and the Label column to label.

Retraining image classification settings

Set the rest of the configuration as shown below.

Deep learning network training settings

Deep learning image augmentation ImageDataGenerator settings

The optimization settings use the Keras stochastic gradient descent optimizer, and we use custom parameters to set the momentum value and to use Nesterov momentum. Details about the algorithm and parameters can be found in the Keras documentation.
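For reference, the equivalent Keras call looks roughly like the following; the learning rate and momentum values below are placeholders, so use whatever you set in the recipe:

from keras.optimizers import SGD

# Placeholder hyperparameters -- match these to your recipe's optimization settings
optimizer = SGD(lr=0.001, momentum=0.9, nesterov=True)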

We also use data augmentation to apply random transformations to each image to increase the number of training images. The details of each of these parameters can be found in the Keras documentation for ImageDataGenerator.
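Behind the scenes, these augmentation settings map onto a Keras ImageDataGenerator. A sketch with illustrative values:

from keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation parameters -- set them to match your recipe
datagen = ImageDataGenerator(
    rotation_range=20,       # rotate images randomly by up to 20 degrees
    width_shift_range=0.2,   # shift horizontally by up to 20% of the width
    height_shift_range=0.2,  # shift vertically by up to 20% of the height
    zoom_range=0.2,          # zoom in or out by up to 20%
    horizontal_flip=True     # randomly mirror images left/right
)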

If you are using the Dataiku deep learning for images plugin (GPU), add the following settings:

Dataiku GPU settings

Run the Retraining Recipe.

Retrained model files will be available in the output folder TrainedResnet.

Step 3: Score Frames and Evaluate Labels for Emotion Videos

Flow for scoring emotion images and classifying emotion videos

The Dataiku image classification (score) recipe takes in a Dataiku TensorFlow model folder and a folder of images to be scored. It returns a dataset with the path of each image and a JSON object containing the predicted labels as keys and their respective prediction probabilities as values.
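For example, the prediction object for a single image might look like this (the probabilities are purely illustrative):

{"angry": 0.02, "calm": 0.85, "disgust": 0.01, "fearful": 0.03,
 "happy": 0.04, "neutral": 0.02, "sad": 0.02, "surprised": 0.01}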

A: Prepare the Test Images Folder

Add a Python recipe. Set the inputs to the TestingImages dataset and the Emotion Images folder. Set the output to a new folder, Testing Image Files.

Add the following code snippet and run the recipe. It copies every image from Emotion Images whose path appears in the TestingImages dataset into the Testing Image Files folder.

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
emotion_test = dataiku.Dataset("TestingImages")
emotion_test_df = emotion_test.get_dataframe()

# CHANGE THIS IDENTIFIER TO MATCH YOUR Emotion Images FOLDER
local_images = dataiku.Folder("StYjUWGk")
local_images_info = local_images.get_info()

# CHANGE THIS IDENTIFIER TO MATCH YOUR Testing Image Files FOLDER
test_images = dataiku.Folder("cUcmzSP8")
test_images_info = test_images.get_info()

# Clear the recipe output before writing
test_images.clear()

# Copy each test image from the source folder to the output folder
for ind, row in emotion_test_df.iterrows():
    stream = local_images.get_download_stream(row.image_path)
    test_images.upload_stream(row.image_path, stream)

B: Use Plugin Recipe to Score

Go to the Dataiku deep learning for images plugin. Click on Image Classification.

Image classification recipe

Set input to Testing Image Files and the new output dataset name to ScoredImages. 

Run the recipe. Your ScoredImages dataset should look like this:

Dataiku dataset for scored emotion images

C: Extract Predicted Labels for Images

To get a final class for a frame/image, we will use the label that has been predicted with maximum confidence. 

Add Prepare recipe. Set input to ScoredImages and output to a new dataset ScoredImagesPrepared.

Add a New Step in the Script. Use the Python function.

Dataiku prepare dataset recipe Python function

Set the Mode to row: return a row for each row.

Dataiku prepare recipe python settings

Click on Edit Python Source Code and add the following snippet of code:

def process(row):
    # Track the highest-probability emotion seen so far
    max_val = 0
    max_label = None

    emotions = ['calm', 'sad', 'surprised', 'neutral',
                'fearful', 'angry', 'happy', 'disgust']

    # Find the emotion predicted with the maximum probability
    for e in emotions:
        p = float(row['prediction_{}'.format(e)])
        if p > max_val:
            max_val = p
            max_label = e

    row['max_prediction'] = max_val
    row['max_label'] = max_label
    # The image filename encodes the true label and the source video
    row['label'] = row['images'].split('_')[1]
    row['correct'] = 1 if row['label'] == row['max_label'] else 0
    row['video_path'] = '/{}.mp4'.format(row['images'].split('_')[0])

    return row

Run the recipe.

ScoredImagesPrepared should now have a new column, max_label (the label predicted with the maximum probability). This is the final label assigned to the frame/image, and we will also use it to classify labels at the video level.

There is also a column called correct, which is set to 1 or 0 depending on whether the original label of the image equals max_label.

D: Evaluate Labels for Emotion Videos

Add a Group recipe. Set the input to ScoredImagesPrepared and name the output dataset ScoredImages_prepared_by_video_path. Create the recipe.

Add column video_path to Group Keys.

Dataiku group recipe

Select Concat for max_label.

Dataiku group recipe settings

You can leave the other columns as they are. Run the group recipe.

The next step is the final one. This is when we finally assign labels to the videos!

Use the Prepare recipe once again with input as ScoredImages_prepared_by_video_path. Set output to a new dataset ScoredVideos.

Add New Step in the Script. Use Python function. Set the Mode to row: return a row for each row. Edit Python Code and add this:

from collections import Counter

def most_frequent(labels):
    # Return the label that occurs most often in the list
    occurrence_count = Counter(labels)
    return occurrence_count.most_common(1)[0][0]

def process(row):
    row['label'] = row['label_first']

    # Majority vote: the video's label is its most frequent frame label
    row['most_freq_label'] = most_frequent(row['max_label_concat'].split(','))
    row['most_freq_correct'] = 1 if row['label'] == row['most_freq_label'] else 0

    return row

This code takes the most frequent label for every video and stores it in the most_freq_label column. We also create the most_freq_correct column, which compares the original label with the most frequent label and yields 1 or 0 accordingly.

Step 4: Visualize the Predictions

A: Analyzing the Results

Click Explore on ScoredVideos. Go to the most_freq_correct column and click on Analyze.

Analyzing the Dataiku dataset results

You should see the percentage of correct and incorrect predictions.

Emotion video classification results

B: Building a Confusion Matrix

Go to Charts -> Tables -> Colored.

Dataiku chart settings

Set rows to label, columns to most_freq_label, and content to Count of records.

Confusion matrix for emotion classification videos

Publish this chart to a Dataiku dashboard by clicking the Publish button on the top right corner.
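If you prefer to compute the same numbers in code, a quick sketch from a Python notebook inside the project (reading the ScoredVideos dataset) might look like this:

import dataiku
import pandas as pd

df = dataiku.Dataset("ScoredVideos").get_dataframe()

# Confusion matrix: true label vs. majority-vote prediction
print(pd.crosstab(df['label'], df['most_freq_label']))

# Overall accuracy across videos
print(df['most_freq_correct'].mean())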

C: Univariate Analysis

Click Explore on ScoredVideos dataset. Go to the Statistics tab.

Click on New Card at the top right corner. Then choose Univariate Analysis.

Dataiku analysis options

Drag and drop the most_freq_correct variable into Variables to describe. Then, click Create Card.

Dataiku univariate analysis options

Analysis of the emotion classification error rate

Click on the top right corner of the card and publish to the Dataiku dashboard.

Conclusion

In this post, we examined how Dataiku’s deep learning for images plugin can be used to perform advanced deep learning with relative ease. You can continue the process and learn how to deploy this plugin in this follow-up post.

Of the 287 videos in our test set, we were able to correctly classify the emotion within the video 97% of the time. Dataiku DSS allows advanced users to directly apply custom code and Python packages like OpenCV, while the Dataiku deep learning for images plugin uses Keras and Tensorflow to simplify the usage of deep learning networks. The same technique we’ve used here for emotion video classification can be applied to many different video classification tasks, from determining if retail customers in surveillance footage are enjoying themselves to identifying defective parts on a manufacturing line. Looking for more ideas about how you can use cutting-edge tools to advance your business? Need deep learning consulting for your advanced machine learning projects? phData’s Machine Learning practice is here to help! 

The code for this project has been made available on our GitHub.
