Analysing Diffusion-based Generative Approaches against Discriminative Approaches for Speech Restoration

This website contains supplementary material to the paper:

Analysing Diffusion-based Generative Models against Discriminative Approaches for Speech Restoration Tasks, Oct 2022 [1]

Code

The code is available at https://github.com/sp-uhh/sgmse on the "icassp_2023" branch.

Enhancement

Dereverberation

Bandwidth Extension

Full method comparison

Methods marked with an asterisk (*) were trained for the specific input bandwidth; the other methods are bandwidth-agnostic. All models were trained on the VCTK corpus.

	14p351317down4.wav	116p351098down2.wav	1346p360146down8.wav	1511p360340down2.wav
Input Sampling Frequency	4 kHz	8 kHz	2 kHz	8 kHz
Clean
Bandlimited
SGMSE+M [1]
VoiceFixer [3]
TUNet* [4]
NU-Wave 2 [5]
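
The file names and input sampling frequencies in the table are consistent with downsampling a 16 kHz full-band signal by the factor given in each file-name suffix (for example, "down4" pairs with 4 kHz). The 16 kHz rate is an inference from the table, not something stated on this page, and the helper name input_sampling_rate is hypothetical. Under those assumptions, the relation can be sketched in Python as:

```python
import re

# Assumption: full-band rate inferred from the table (16 kHz / 4 = 4 kHz, etc.);
# it is not stated on the page itself.
FULL_BAND_HZ = 16_000


def input_sampling_rate(filename: str, full_band_hz: int = FULL_BAND_HZ) -> int:
    """Return the input sampling frequency implied by a 'down<factor>' suffix."""
    match = re.search(r"down(\d+)\.wav$", filename)
    if match is None:
        raise ValueError(f"no 'down<factor>' suffix in {filename!r}")
    factor = int(match.group(1))
    # Downsampling by an integer factor divides the sampling rate by that factor.
    return full_band_hz // factor


# File names copied verbatim from the table header.
examples = [
    "14p351317down4.wav",
    "116p351098down2.wav",
    "1346p360146down8.wav",
    "1511p360340down2.wav",
]
rates = {name: input_sampling_rate(name) for name in examples}
```

With the assumed 16 kHz full-band rate, the computed values match the "Input Sampling Frequency" row of the table (4 kHz, 8 kHz, 2 kHz, 8 kHz).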

References

[1] Jean-Marie Lemercier, Julius Richter, Simon Welker and Timo Gerkmann. Analysing Diffusion-based Generative Models against Discriminative Approaches for Speech Restoration Tasks. arXiv preprint arXiv:2211.02397. 2022.

[2] Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, and Timo Gerkmann. Speech Enhancement and Dereverberation with Diffusion-Based Generative Models. arXiv preprint arXiv:2208.05830. 2022.

[3] Haohe Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang and Yuxuan Wang. VoiceFixer: Toward General Speech Restoration with Neural Vocoder. ISCA Interspeech. 2022.

[4] Viet-Anh Nguyen, Anh H. T. Nguyen and Andy W. H. Khong. TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining. ICASSP. 2022.

[5] Seungu Han and Junhyeok Lee. NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates. ISCA Interspeech. 2022.