We aim to develop a measure of engagement/synchrony between two speakers in a face-to-face conversation. The measure draws on several modalities, such as audio-visual cues and gestures, and uses emotion recognition and sentiment analysis. We use the IEMOCAP database for this research.
As a baseline model, my undergraduate research project for the Vth and VIth semesters (junior year) uses speech cues to learn the synchrony between the speakers' audio signals. Dynamic Time Warping (DTW) is proposed as a measure of synchrony for speech signals.
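A minimal sketch follows. It assumes the warping measure above is standard dynamic time warping (DTW) applied to one per-frame feature contour per speaker, such as pitch or energy extracted from an IEMOCAP recording. The choice of feature, the absolute-difference local cost, and the `synchrony_score` mapping are illustrative assumptions, not the project's settled method. A low DTW distance means the two contours can be aligned cheaply even when one speaker lags the other. That is the intuition behind using DTW as a synchrony measure.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic O(n*m) dynamic time warping between two 1-D feature sequences.

    Local cost is the absolute difference between frames; the allowed steps
    are the standard match, insertion and deletion moves.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances while y stays on the same frame
                cost[i][j - 1],      # y advances while x stays on the same frame
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def synchrony_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical mapping of DTW distance to a score in (0, 1].

    The distance is divided by the combined sequence length so that longer
    utterances are not penalised just for having more frames; 1.0 means the
    contours align perfectly.
    """
    return 1.0 / (1.0 + dtw_distance(x, y) / (len(x) + len(y)))


if __name__ == "__main__":
    # Hypothetical energy contours: speaker B follows speaker A one frame late.
    speaker_a = [0.1, 0.5, 0.9, 0.5, 0.1]
    speaker_b = [0.1, 0.1, 0.5, 0.9, 0.5, 0.1]
    print(dtw_distance(speaker_a, speaker_b))  # → 0.0
    print(synchrony_score(speaker_a, speaker_b))  # → 1.0
```

The lagged contour warps onto the original at zero cost, so a plain frame-by-frame comparison would understate the synchrony that DTW recovers. The full cost matrix is quadratic in utterance length. For long conversations, a band constraint such as the Sakoe-Chiba window is a common way to limit both run time and implausibly large lags.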
This preliminary project report details our approach and assumptions. This research is still in progress.