Faculty Articles

Speech emotion recognition based on transfer learning from the FaceNet framework

Shuhua Liu, Northeast Normal University, Changchun, Jilin Province 130117, China.
Mengyu Zhang, Northeast Normal University, Changchun, Jilin Province 130117, China.
Ming Fang, Northeast Normal University, Changchun, Jilin Province 130117, China.
Jianwei Zhao, Northeast Normal University, Changchun, Jilin Province 130117, China.
Kun Hou, Northeast Normal University, Changchun, Jilin Province 130117, China.
Chih-Cheng Hung, College of Computing and Software Engineering, Kennesaw State University, Marietta, Georgia 30060, USA.

Department

Computer Science

Document Type

Article

Publication Date

2-1-2021

Abstract

Speech plays an important role in human-computer emotional interaction. FaceNet, a model developed for face recognition, owes much of its success to its strong feature extraction. In this study, we adopt the FaceNet model and improve it for speech emotion recognition. To apply the model to speech, each signal is divided into segments at a given time interval, and the segments are transformed into discrete waveform diagrams and spectrograms. The waveforms and spectrograms are then fed separately into FaceNet for end-to-end training. Our empirical study shows that pretraining FaceNet on spectrograms is effective. Hence, we pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. Pretraining on CASIA is intended to maximize the knowledge transferred, because the network reaches high accuracy on that dataset, possibly owing to its clean signals. Our preliminary experimental results show accuracies of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively. Cross-training is then conducted on the dataset, and comprehensive experiments are performed. The results indicate that, among single-modal approaches, the proposed approach outperforms state-of-the-art methods on the IEMOCAP dataset.
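The preprocessing step described in the abstract (cutting a speech signal into fixed-duration segments and turning each segment into a spectrogram) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `split_into_segments` and `log_spectrogram`, the 2-second segment length, the 400-sample frame, the 160-sample hop, and the Hann window are all assumptions chosen for illustration, since the abstract does not state these values.

```python
import numpy as np


def split_into_segments(signal, sample_rate, segment_seconds=2.0):
    """Cut a 1-D signal into non-overlapping, equal-length segments.

    The segment duration is an illustrative assumption; any trailing
    samples that do not fill a whole segment are dropped.
    """
    seg_len = int(round(sample_rate * segment_seconds))
    if seg_len <= 0:
        raise ValueError("segment_seconds must cover at least one sample")
    signal = np.asarray(signal, dtype=float)
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)


def log_spectrogram(segment, frame_len=400, hop=160):
    """Log-magnitude spectrogram of one segment via a short-time FFT.

    frame_len and hop are hypothetical defaults (25 ms and 10 ms at 16 kHz).
    Returns an array of shape (frame_len // 2 + 1, n_frames).
    """
    n_frames = 1 + (len(segment) - frame_len) // hop
    if n_frames < 1:
        raise ValueError("segment is shorter than one analysis frame")
    window = np.hanning(frame_len)
    # Slice the segment into overlapping frames and taper each one.
    frames = np.stack(
        [segment[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0); transpose so rows are frequency bins.
    return 20.0 * np.log10(magnitude + 1e-10).T


if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(5 * sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000.0 * t)  # synthetic stand-in for speech
    segments = split_into_segments(tone, sample_rate, segment_seconds=2.0)
    spectrogram = log_spectrogram(segments[0])
    print(segments.shape, spectrogram.shape)
```

In a pipeline like the one the abstract outlines, each row of `segments` would correspond to one waveform input and each `spectrogram` to one image-like input for the network; how the arrays are rendered and resized for FaceNet is not specified in the abstract and is left out here.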

Journal Title

The Journal of the Acoustical Society of America

Volume

149

Issue

First Page

1338

Digital Object Identifier (DOI)

10.1121/10.0003530
