Publications

Multimodal Emotion Recognition Based on Hybrid Ensemble Deep Learning Framework

  • Authority: ICIC Express Letters
  • Category: Journal Publication

Understanding emotions is crucial for accurately predicting human behavior. By anticipating emotions, we can forecast decisions and respond effectively. Emotion recognition models can be deployed on robots and computers to enhance a variety of business environments. Recognizing emotions is challenging because they are conveyed through diverse sources such as facial expressions, audio, text, and electroencephalogram (EEG) signals. In this research, we propose a hybrid ensemble deep learning framework for multimodal emotion recognition using emotional facial images and audio. The framework extracts features from facial images (visual) and audio using well-known convolutional neural network (CNN) models, then processes these features with bidirectional long short-term memory (BiLSTM) networks. We employed DenseNet121-BiLSTM and ResNet50-BiLSTM, referred to as V-Emotion and A-Emotion, respectively. Additionally, the audio and visual features were concatenated and fed into a BiLSTM, named VA-Emotion. In the final step, the proposed framework integrates the outputs of the V-, A-, and VA-Emotion models using a weighted average ensemble learning method, assigning higher weights to models with greater classification accuracy. We evaluated the proposed framework on the RAVDESS dataset, achieving an accuracy of 91.67%. Our experimental results demonstrate that the proposed framework outperforms existing methods.
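
To make the framework's structure concrete, below is a minimal, hedged sketch (not the authors' code) of the CNN-BiLSTM branches and the weighted-average ensemble described in the abstract, written with TensorFlow/Keras. The input shapes, sequence lengths, layer sizes, class count, and example weights are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the V-/A-Emotion branches and the weighted-average ensemble.
# All shapes, sizes, and weights below are assumptions for illustration only.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121, ResNet50

NUM_CLASSES = 8          # RAVDESS emotion classes (assumption)
FRAMES = 16              # illustrative sequence length per clip

def cnn_bilstm_branch(backbone, input_shape, name):
    """CNN feature extractor applied per time step, followed by a BiLSTM classifier."""
    inputs = layers.Input(shape=(FRAMES,) + input_shape)
    cnn = backbone(include_top=False, pooling="avg", input_shape=input_shape)
    x = layers.TimeDistributed(cnn)(inputs)              # per-frame CNN features
    x = layers.Bidirectional(layers.LSTM(128))(x)        # temporal modelling
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return Model(inputs, outputs, name=name)

# V-Emotion: DenseNet121 on facial frames; A-Emotion: ResNet50 on audio spectrogram patches.
v_emotion = cnn_bilstm_branch(DenseNet121, (224, 224, 3), "V_Emotion")
a_emotion = cnn_bilstm_branch(ResNet50, (128, 128, 3), "A_Emotion")

def weighted_average_ensemble(prob_list, weights):
    """Combine per-model class probabilities, weighting more accurate models higher."""
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()                              # normalize to sum to 1
    stacked = np.stack(prob_list, axis=0)                 # (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)         # weighted mean over models

# Example usage with illustrative per-model validation accuracies as weights:
# final_probs = weighted_average_ensemble([p_v, p_a, p_va], weights=[0.88, 0.84, 0.90])
# predictions = final_probs.argmax(axis=-1)
```

The VA-Emotion branch (omitted here for brevity) would concatenate the visual and audio feature sequences before the BiLSTM; its softmax output would then enter the same weighted average alongside the V- and A-Emotion outputs.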