EmoConv-Diff: Speech Emotion Conversion
This website contains supplementary material to the paper:
- EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data. Submitted to the IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), September 2023.
EmoConv-Diff: MSP-Podcast v1.10
| | MSP-PODCAST-0538-0133.wav | MSP-PODCAST-0001-0100.wav | MSP-PODCAST-0003-0442.wav | MSP-PODCAST-0202-0002.wav | MSP-PODCAST-0114-0141.wav | MSP-PODCAST-2329-1765.wav |
|---|---|---|---|---|---|---|
| Ground-truth Arousal | 3.2 | 4.8 | 6.2 | 5.2 | 3.6 | 3.0 |
| Ground-truth | | | | | | |
| Target Arousal 1 | | | | | | |
| Target Arousal 7 | | | | | | |
| Target Arousal 4 | | | | | | |
----------------------------------------------------------------------------
HiFiGAN-based speech emotion conversion
Audio samples for our prior work on HiFiGAN-based speech emotion conversion can be found at the link below:
- In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis. 15th ITG Conference on Speech Communication, September 2023.