Learning Spontaneity to Improve Emotion Recognition in Speech


We investigate the effect and usefulness of spontaneity in speech (i.e., whether a given speech sample is spontaneous or not) in the context of emotion recognition. We hypothesize that the emotional content of speech is interrelated with its spontaneity, and we therefore propose using spontaneity classification as an auxiliary task for emotion recognition. We propose two supervised learning settings that use spontaneity to improve speech emotion recognition. The first is a hierarchical model that performs spontaneity detection before emotion recognition. The second is a multitask learning model that jointly learns to recognize both spontaneity and emotion. Through experiments on a benchmark database, we show that using spontaneity as additional information yields a significant improvement (3%) over systems that are unaware of spontaneity. We also observe that spontaneity information is especially useful for recognizing positive emotions, whose recognition accuracy improves by 12%.
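The two settings can be illustrated with a minimal sketch. The abstract does not specify the features, classifiers, or loss weighting, so everything below is an assumption. It assumes each task is trained with a softmax cross-entropy loss and that the multitask objective is the emotion loss plus a hypothetical weight `alpha` times the spontaneity loss. The hierarchical setting is sketched as routing each sample to an emotion classifier chosen by a spontaneity detector. The names `multitask_loss` and `hierarchical_predict` are illustrative, not the authors' code.

```python
import math
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class under the softmax."""
    return -math.log(softmax(logits)[label])


def multitask_loss(emotion_logits: Sequence[float], emotion_label: int,
                   spont_logits: Sequence[float], spont_label: int,
                   alpha: float = 0.5) -> float:
    """Multitask objective (assumed form): emotion loss plus a weighted
    spontaneity loss. `alpha` is a hypothetical hyperparameter."""
    return (cross_entropy(emotion_logits, emotion_label)
            + alpha * cross_entropy(spont_logits, spont_label))


def hierarchical_predict(features: Features,
                         spontaneity_clf: Callable[[Features], bool],
                         emotion_clfs: Dict[bool, Callable[[Features], str]]) -> str:
    """Hierarchical setting (assumed form): detect spontaneity first, then
    apply the emotion classifier trained for that branch."""
    is_spontaneous = spontaneity_clf(features)
    return emotion_clfs[is_spontaneous](features)


if __name__ == "__main__":
    # Toy stand-ins for trained models (hypothetical).
    detect = lambda x: x[0] > 0.5
    branches = {True: lambda x: "happy", False: lambda x: "neutral"}
    print(hierarchical_predict([0.9], detect, branches))  # prints "happy"
```

In the multitask form, both tasks would typically share a feature encoder so that gradients from the spontaneity loss shape the representation used for emotion. In the hierarchical form, the spontaneity decision is made explicitly and selects a specialized emotion model.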

Interspeech 2018 (Oral)