Building an Emotion Classification System for Videos with Deep Learning

Introduction to Deep Learning with Dataiku

At phData, we have seen the value that deep learning brings to organizations that can successfully harness it. From reducing diagnostic errors in radiology to more accurately detecting manufacturing defects, we’ve certainly seen our share of wins, but not without pain. Most organizations will fail to adopt deep learning because of the complexity of its systems and development.

In this blog, we’ll show you how we have used the Dataiku Data Science Studio (Dataiku DSS), a tool that we already use to simplify data science, to build out deep learning applications with relative ease. More specifically, we will show how to build an emotion classification system on videos using the Dataiku deep learning for images plugin. Our emotion classification dataset will be the speech video data from RAVDESS. An example of one of the videos can be found on YouTube.

The Dataiku deep learning for images plugin lets us download pre-trained deep learning networks and provides recipes, such as image classification retraining and scoring, that we will use to classify emotions in the videos. The plugin uses TensorFlow and Keras on Python for image classification.

Our approach to video classification is to break each emotion video into a fixed number of frames and then use these images to train a deep residual neural network (ResNet) to classify the emotion in each image. This ResNet was previously trained on the ImageNet dataset, so we do not have to start from scratch. Finally, we’ll predict the emotion for a video by taking a majority vote over the labels predicted for its frames.

Dataiku Deep Learning Tutorial Overview

Obtaining Dataiku DSS

The Dataiku free trial version can be used on your local machine, or you can deploy Dataiku to AWS using their AWS Marketplace image.

Using the Dataiku Deep Learning for Images Plugin

Before we dive into the tutorial, let’s get familiar with some of the recipes provided by the Dataiku deep learning plugin and their requirements.

The retraining image classification model recipe takes a previously trained Tensorflow neural network and retrains one or more layers on a new set of images.

Inputs:

A folder containing the previously trained weights (in Tensorflow h5 format)
A folder containing images to use for training
A dataset containing the relative path of each image in the folder along with the label to use for that image

Outputs:

A folder containing the new weights and information about the network structure

The image classification recipe uses a trained neural network to generate classification scores for images.

Inputs:

A folder containing the network weights and information about the network structure
A folder containing the images to classify

Outputs:

A dataset containing classes and their respective scores for each image

Steps Required for the Project

Refer to the diagram and the plan below, which indicates where each step fits into the project’s workflow.

Step 1: Prepare the Emotion Classification Dataset

Download emotion video data using a shell code recipe.
Extract frames from videos and create a dataset using a Python code recipe.
Split the data into training and testing sets using a split visual recipe.

Step 2: Train the Deep Learning Model

Download resnet weights using the download pre-trained model macro.
Create, configure, and run the retraining image classification model recipe.

Step 3: Score Frames and Evaluate Labels for Emotion Videos

Create a folder containing all frames/images of videos to be tested.
Use the image classification recipe to score test images.
Extract labels for images.
Evaluate and assign labels to videos.

Step 4: Visualize Results

Analyze results.
Build a Dataiku dashboard displaying the confusion matrix and accuracy analysis.

Prerequisites

A: Install the Deep Learning for Images Plugin

Open up Plugins settings.

Install the appropriate Dataiku deep learning for images plugin, depending on whether you are using a CPU or a GPU.

B: Create a Python Code Environment Containing OpenCV

Open Administration and go to Code Envs.

Create a new Python env called py27opencv. Install the Jupyter notebook packages if you want to be able to use a notebook to experiment with code.

Open this environment, go to Packages to install, and add:

opencv-python==4.2.0.32

Select save and update to install the packages.

Step 1: Prepare the Emotion Classification Dataset

A: Download Emotion Video Dataset

Create a shell recipe to download videos.

For the output, create a new folder called Emotion Videos.

Use this bash script to download and extract the video files (note that the extracted videos take up 6.31 GB):

#!/bin/bash

# Download each actor's archive, keep only the 02-* videos, then clean up.
for i in $(seq -f "%02g" 1 24); do
    # -L makes curl follow redirects if the download URL has moved
    curl -L "https://zenodo.org/record/1188976/files/Video_Speech_Actor_$i.zip?download=1" -o "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip"
    unzip "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip" -d "$DKU_OUTPUT_0_FOLDER_PATH"
    mv "$DKU_OUTPUT_0_FOLDER_PATH/Actor_$i"/02-* "$DKU_OUTPUT_0_FOLDER_PATH"
    rm -rf "$DKU_OUTPUT_0_FOLDER_PATH/Actor_$i"
    rm "$DKU_OUTPUT_0_FOLDER_PATH/Video_Speech_Actor_$i.zip"
done

Note that we only keep the videos whose names start with 02-, because the 01-* videos are duplicates that include audio. Run the recipe.

B: Extract Frames From Videos

Create a Python code recipe.

For the input, select Emotion Videos. For the output, create a local folder called Emotion Images. We also want to create a new dataset, EmotionImagesCSV, to hold information about the images.

In the Advanced settings menu, select the code environment containing OpenCV (py27opencv).

Save and run the frame extraction code in the recipe. It iterates through all video files in Emotion Videos and extracts frames from each video at 5% intervals (that is, the frame that occurs when the video is 5% complete, 10% complete, and so on), storing them as PNG files in Emotion Images. It also records information about every frame, such as its interval within the video, image file name, and emotion, and writes it to EmotionImagesCSV.
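The sampling logic described above can be sketched as follows. This is a minimal illustration and not the recipe code used in the project: the helper names (sample_frame_indices, extract_frames), the choice to stop at 95%, and the output file naming pattern are all assumptions. The naming pattern was chosen so that the video name and label can later be recovered by splitting on underscores. OpenCV is imported inside extract_frames so that the index arithmetic can be tried without it.

```python
import os


def sample_frame_indices(total_frames, step_percent=5):
    """Return (percent, frame_index) pairs at every `step_percent` of a video.

    Assumption: sampling stops before 100%, because the very last frame
    is not always readable.
    """
    if total_frames <= 0:
        return []
    pairs = []
    for percent in range(step_percent, 100, step_percent):
        index = min(total_frames - 1, (total_frames * percent) // 100)
        pairs.append((percent, index))
    return pairs


def extract_frames(video_path, output_dir, label):
    """Save the sampled frames of one video as PNG files; return their names."""
    import cv2  # provided by the opencv-python package

    stem = os.path.splitext(os.path.basename(video_path))[0]
    capture = cv2.VideoCapture(video_path)
    total_frames = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    saved = []
    for percent, index in sample_frame_indices(total_frames):
        # Jump to the requested frame, then decode it
        capture.set(cv2.CAP_PROP_POS_FRAMES, index)
        ok, frame = capture.read()
        if not ok:
            continue  # skip frames the decoder cannot return
        # Hypothetical naming: <video name>_<label>_<percent>.png
        name = "{}_{}_{:02d}.png".format(stem, label, percent)
        cv2.imwrite(os.path.join(output_dir, name), frame)
        saved.append(name)
    capture.release()
    return saved
```

With these assumptions, each video yields at most 19 images (5% through 95%).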

You should now be able to see frames sampled from each video in Emotion Images.

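EmotionImagesCSV should contain metadata on each frame, with columns like image path, label, and actor. These values are obtained from the video file name, which follows the RAVDESS naming convention of seven hyphen-separated two-digit fields: modality, vocal channel, emotion, emotional intensity, statement, repetition, and actor.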
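As an illustration of how metadata can be read from a RAVDESS file name, here is a small sketch based on the published RAVDESS naming convention (seven two-digit fields: modality, vocal channel, emotion, emotional intensity, statement, repetition, actor). The function name and the dictionary keys are hypothetical and may not match the exact columns in EmotionImagesCSV.

```python
import os

# Emotion codes from the RAVDESS file naming convention
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def parse_ravdess_name(video_path):
    """Split a RAVDESS file name into its seven fields and decode them."""
    stem = os.path.splitext(os.path.basename(video_path))[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError("unexpected RAVDESS file name: {}".format(video_path))
    modality, channel, emotion, intensity, statement, repetition, actor = fields
    return {
        "video_path": "/{}.mp4".format(stem),
        "label": EMOTIONS[emotion],
        "intensity": "strong" if intensity == "02" else "normal",
        "statement": statement,
        "repetition": repetition,
        "actor": int(actor),
    }


info = parse_ravdess_name("/02-01-06-01-02-01-12.mp4")
```

Here info["label"] is "fearful" and info["actor"] is 12.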

C: Split Data Into Train and Test

Apply the split recipe on EmotionImagesCSV to create TestingImages and TrainingImages. Group the rows on video_path so that all the frames of a video land in either the training set or the test set, never both. If frames from one video appeared on both sides, the network could score well by recognizing specific characteristics of that video instead of learning to recognize emotion.

Then, run the recipe.
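To make the grouping idea concrete outside of the visual recipe, the sketch below assigns whole videos to train or test by hashing video_path. This is not how the Dataiku split recipe is implemented; the function name, the 80/20 ratio, the hashing scheme, and the sample rows are assumptions chosen for illustration.

```python
import hashlib


def assign_split(video_path, test_ratio=0.2):
    """Deterministically map a video to 'train' or 'test'."""
    digest = hashlib.md5(video_path.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_ratio * 100 else "train"


# Invented rows: two frames from each of two videos
rows = [
    {"path": "a_happy_05.png", "video_path": "/a.mp4"},
    {"path": "a_happy_10.png", "video_path": "/a.mp4"},
    {"path": "b_sad_05.png", "video_path": "/b.mp4"},
    {"path": "b_sad_10.png", "video_path": "/b.mp4"},
]
for row in rows:
    row["split"] = assign_split(row["video_path"])

# Every video should map to exactly one split
splits_by_video = {}
for row in rows:
    splits_by_video.setdefault(row["video_path"], set()).add(row["split"])
```

Because the split depends only on video_path, every frame of a video receives the same assignment.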

Step 2: Train the Deep Learning Model

A: Download the Dataiku Tensorflow Resnet Model Using Macro

Go to Macros.

Click on Download pre-trained model.

Set the output folder name to Original Resnet and, for the pre-trained model to download, choose Resnet Trained on ImageNet.

Run Macro. You should be able to see .h5 files in the model folder Original Resnet.

B: Create the Retraining Recipe

Create a new folder TrainedResnet. This is where we will store the weights after retraining the model on our dataset.

Go to add Recipe. Under Plugins, choose Deep Learning on images.

Choose Retraining Image Classification model.

For input, set Label Dataset to TrainingImages, Image Folder to Emotion Images, and Model folder to Original Resnet. Set the output model folder to TrainedResnet.

Click Create Recipe. This will open a window that will allow you to set training ratio, hyperparameters, and columns that contain the path and label information on training images.

Set the Image filename column to path and the Label column to label.

Set the rest of the configuration as shown below.

The optimization settings use the Keras stochastic gradient descent optimizer, and we use custom parameters to set the momentum value and to use Nesterov momentum. Details about the algorithm and parameters can be found in the Keras documentation.
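For intuition about what the momentum and Nesterov settings do, the sketch below applies the SGD update rule given in the Keras documentation to a one-dimensional toy problem. It is plain Python, not the plugin's code, and the learning rate, momentum value, step count, and function name are illustrative assumptions and not the settings used in this project.

```python
def sgd_minimize(grad, w, learning_rate=0.1, momentum=0.9,
                 nesterov=True, steps=200):
    """Run SGD with (optionally Nesterov) momentum on a scalar parameter."""
    velocity = 0.0
    for _ in range(steps):
        g = grad(w)
        velocity = momentum * velocity - learning_rate * g
        if nesterov:
            # Nesterov: step using the look-ahead velocity
            w = w + momentum * velocity - learning_rate * g
        else:
            # Classical momentum
            w = w + velocity
    return w


# Toy objective f(w) = w ** 2: its gradient is 2 * w, its minimum is at 0
w_nesterov = sgd_minimize(lambda w: 2.0 * w, w=5.0)
w_classical = sgd_minimize(lambda w: 2.0 * w, w=5.0, nesterov=False)
```

Both variants approach the minimum at 0 on this toy problem.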

We also use data augmentation to apply random transformations to each image to increase the number of training images. The details of each of these parameters can be found in the Keras documentation for ImageDataGenerator.

If you are using the Dataiku deep learning for images plugin (GPU), add the following settings:

Run the Retraining Recipe.

Retrained model files will be available in the output folder TrainedResnet.

Step 3: Score Frames and Evaluate Labels for Emotion Videos

The Dataiku image classification (score) recipe takes in a Dataiku TensorFlow model folder and a folder with images to be scored. It returns a dataset with the path of each image and a JSON object containing predicted labels as keys and their respective prediction probabilities as values.
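As a rough illustration of that output, the sketch below parses one prediction object and picks the highest-scoring label. The example JSON string and the function name are made up; the exact column names and formatting produced by the plugin may differ.

```python
import json


def top_prediction(prediction_json):
    """Return (label, probability) for the highest-scoring class."""
    scores = json.loads(prediction_json)
    label = max(scores, key=scores.get)
    return label, scores[label]


# Invented example of a label -> probability object
example = '{"happy": 0.91, "calm": 0.06, "sad": 0.03}'
label, probability = top_prediction(example)
```

For this example, label is "happy" and probability is 0.91.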

A: Prepare the Test Images Folder

Add Python recipe. Set inputs to TestingImages dataset and Emotion Images folder. Set output to a new folder Testing Image Files.

Add the following code snippet and run the recipe. It copies every image whose path appears in the TestingImages dataset from Emotion Images into the Testing Image Files folder.

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu


# Read recipe inputs
emotion_test = dataiku.Dataset("TestingImages")
emotion_test_df = emotion_test.get_dataframe()

# CHANGE THIS IDENTIFIER TO MATCH YOUR INPUT FOLDER (Emotion Images)
local_images = dataiku.Folder("StYjUWGk")
local_images_info = local_images.get_info()

# Write recipe outputs -- CHANGE THIS IDENTIFIER TO MATCH YOUR OUTPUT FOLDER
test_images = dataiku.Folder("cUcmzSP8")
test_images_info = test_images.get_info()

# Clear recipe output before writing
test_images.clear()

# Copy each test image from the input folder to the output folder
for ind, row in emotion_test_df.iterrows():
    with local_images.get_download_stream(row.image_path) as stream:
        test_images.upload_stream(row.image_path, stream)

B: Use Plugin Recipe to Score

Go to the Dataiku deep learning for images plugin. Click on Image Classification.

Set input to Testing Image Files and the new output dataset name to ScoredImages.

Run the recipe. Your ScoredImages file should look like this:

C: Extract Predicted Labels for Images

To get a final class for a frame/image, we will use the label that has been predicted with maximum confidence.

Add Prepare recipe. Set input to ScoredImages and output to a new dataset ScoredImagesPrepared.

Add a New Step in the Script. Use the Python function.

Set the Mode to row: return a row for each row.

Click on Edit Python Source Code and add the following snippet of code:

def process(row):
    # Track the highest-scoring emotion for this frame
    max_val = 0
    max_label = None
    
    emotions = ['calm', 'sad', 'surprised', 'neutral', 
                'fearful', 'angry', 'happy', 'disgust']
    
    for e in emotions:
        p = float(row['prediction_{}'.format(e)])
        if p > max_val:
            max_val = p
            max_label = e
    
    row['max_prediction'] = max_val
    row['max_label'] = max_label
    # The image name starts with <video name>_<label>_; recover both parts
    name_parts = row['images'].split('_')
    row['label'] = name_parts[1]
    row['correct'] = 1 if row['label'] == row['max_label'] else 0
    row['video_path'] = '/{}.mp4'.format(name_parts[0])
    
    return row

Run the recipe.

ScoredImagesPrepared should now have a new column, max_label, holding the label predicted with maximum probability. This is the final label assigned to the frame/image, and we will also use it to assign labels at the video level.

There is also a column called correct. It is evaluated as 1 or 0 based on whether the original label of the image is equal to max_label.

D: Evaluate Labels for Emotion Videos

Add a Group recipe. Set input to ScoredImagesPrepared and call the new output file ScoredImages_prepared_by_video_path. Create recipe.

Add column video_path to Group Keys.

Select Concat for max_label.

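You can leave the other columns as they are, as long as label is aggregated with First, since the next Python step reads the resulting label_first column. Run the Group recipe.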

The next step is the final one. This is when we finally assign labels to the videos!

Use the Prepare recipe once again with input as ScoredImages_prepared_by_video_path. Set output to a new dataset ScoredVideos.

Add New Step in the Script. Use Python function. Set the Mode to row: return a row for each row. Edit Python Code and add this:

from collections import Counter


def most_frequent(labels):
    # Return the label that occurs most often in the list
    occurrence_count = Counter(labels)
    return occurrence_count.most_common(1)[0][0]


def process(row):
    # The original label of the video, taken from its first frame
    row['label'] = row['label_first']
    
    # Majority vote over the per-frame predictions
    row['most_freq_label'] = most_frequent(row['max_label_concat'].split(','))
    row['most_freq_correct'] = 1 if row['label'] == row['most_freq_label'] else 0
    
    return row

This code will take the most frequent label for every video and store it in the most_freq_label column. We also create the most_freq_correct column that compares the original label with the most frequent label and gives 1 or 0 accordingly.
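The voting helper can be checked on its own outside of DSS. The sample strings below are invented. The tie case relies on Counter.most_common in Python 3.7 and later, where labels with equal counts keep first-seen order, which is worth keeping in mind when a video's frames split evenly between two labels.

```python
from collections import Counter


def most_frequent(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


# A clear majority for "happy"
clear_vote = most_frequent("happy,happy,sad,happy".split(","))

# A 2-2 tie: the label seen first ("sad") wins on Python 3.7+
tied_vote = most_frequent("sad,happy,happy,sad".split(","))
```

If ties matter for your use case, one option is to break them with the per-frame max_prediction values instead of relying on order.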

Step 4: Visualize the Predictions

A: Analyzing the Results

Click Explore on ScoredVideos. Go to the most_freq_correct column and click on Analyze.

You should see the percentage of correct and incorrect predictions.

B: Building a Confusion Matrix

Go to Charts -> Tables -> Colored.

Set rows to label, columns to most_freq_label, and content to Count of records.

Publish this chart to a Dataiku dashboard by clicking the Publish button on the top right corner.
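The same table can be computed by hand, which is useful for sanity-checking the chart. This sketch counts (label, most_freq_label) pairs and derives the accuracy from them; the three sample rows are invented and do not come from our results.

```python
from collections import Counter

# Invented rows in the shape of the ScoredVideos dataset
rows = [
    {"label": "happy", "most_freq_label": "happy"},
    {"label": "sad", "most_freq_label": "calm"},
    {"label": "sad", "most_freq_label": "sad"},
]

# Confusion matrix as a mapping from (actual, predicted) to a count
confusion = Counter((r["label"], r["most_freq_label"]) for r in rows)

# Accuracy is the share of videos on the diagonal of the matrix
correct = sum(n for (actual, predicted), n in confusion.items()
              if actual == predicted)
accuracy = correct / float(len(rows))
```

Cells off the diagonal, such as ("sad", "calm") here, show which emotions the model confuses with each other.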

C: Univariate Analysis

Click Explore on ScoredVideos dataset. Go to the Statistics tab.

Click on New Card at the top right corner. Then choose Univariate Analysis.

Drag and drop the most_freq_correct variable into Variables to describe. Then, click Create Card.

Click on the top right corner of the card and publish to the Dataiku dashboard.

Conclusion

In this post, we examined how Dataiku’s deep learning for images plugin can be used to perform advanced deep learning with relative ease. You can continue the process and learn how to deploy this plugin in this follow-up post.

Of the 287 videos in our test set, we were able to correctly classify the emotion within the video 97% of the time. Dataiku DSS allows advanced users to directly apply custom code and Python packages like OpenCV, while the Dataiku deep learning for images plugin uses Keras and Tensorflow to simplify the usage of deep learning networks. The same technique we’ve used here for emotion video classification can be applied to many different video classification tasks, from determining if retail customers in surveillance footage are enjoying themselves to identifying defective parts on a manufacturing line. Looking for more ideas about how you can use cutting-edge tools to advance your business? Need deep learning consulting for your advanced machine learning projects? phData’s Machine Learning practice is here to help!

The code for this project has been made available on our Github.
