Normalized Features for Improving the Generalization of DNN Based Speech Enhancement
Sound Examples
Click the boxes below to show the sound examples of the respective category. The sound examples in the listening experiments use a minimum gain of -15 dB, while the remaining examples use a minimum gain of -20 dB. The following notation is used for the algorithms:
- noisy: noisy signal
- OMLSA: optimally modified log-spectral amplitude estimator [1, 2] taken from here
- non-ML: speech enhancement approach without machine learning, based on [3-5] and described in Section II
- ML (|Y|²): DNN-based enhancement scheme using the noisy log-spectra as input features
- ML (|Y|², λ): DNN-based enhancement scheme using the noisy log-spectra with appended logarithmized noise PSD as input features
- ML (ξ, γ): DNN-based enhancement scheme using the logarithmized a priori and a posteriori SNRs as input features
- ML (γ): DNN-based enhancement scheme using the logarithmized a posteriori SNR as input feature
- ML (ξ): DNN-based enhancement scheme using the logarithmized a priori SNR as input feature
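The SNR-based input features and the minimum gain mentioned above can be illustrated with a minimal sketch. It assumes the common definitions γ = |Y|² / λ (a posteriori SNR) and ξ = speech PSD / λ (a priori SNR), and that the minimum gain is an amplitude gain, so that dB values convert with a factor of 20. The function names, the `EPS` floor, the natural logarithm, and the already available speech PSD estimate are illustrative assumptions, not the implementation used for the sound examples.

```python
import math

EPS = 1e-12  # hypothetical floor that keeps the logarithm finite


def log_snr_features(noisy_periodogram, noise_psd, speech_psd):
    """Per-bin logarithmized a priori (xi) and a posteriori (gamma) SNRs.

    All inputs are lists with one value per frequency bin of a single frame.
    speech_psd is assumed to come from some external estimator.
    """
    log_xi, log_gamma = [], []
    for y2, lam_n, lam_s in zip(noisy_periodogram, noise_psd, speech_psd):
        lam_n = max(lam_n, EPS)
        # xi = speech PSD / noise PSD
        log_xi.append(math.log(max(lam_s / lam_n, EPS)))
        # gamma = |Y|^2 / noise PSD
        log_gamma.append(math.log(max(y2 / lam_n, EPS)))
    return log_xi, log_gamma


def apply_min_gain(gains, min_gain_db):
    """Limit amplitude gains from below, e.g. min_gain_db = -15 or -20."""
    floor = 10.0 ** (min_gain_db / 20.0)
    return [max(g, floor) for g in gains]


# Usage with made-up numbers for three frequency bins.
y2 = [4.0, 1.0, 0.25]
lam_n = [1.0, 1.0, 0.5]
lam_s = [3.0, 0.1, 0.01]
xi, gamma = log_snr_features(y2, lam_n, lam_s)

# Scaling the signal level by a common factor leaves both features unchanged.
scale = 4.0
xi_s, gamma_s = log_snr_features(
    [scale * v for v in y2],
    [scale * v for v in lam_n],
    [scale * v for v in lam_s],
)

limited = apply_min_gain([0.05, 0.5, 1.0], -20.0)
```

Because ξ and γ are ratios of power spectral densities, they do not change when the overall signal level changes, whereas the log-spectrum of |Y|² shifts with the level.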
[2] I. Cohen, “Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, pp. 466–475, Sep. 2003.
[3] C. Breithaupt, T. Gerkmann, and R. Martin, “A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, NV, USA, 2008, pp. 4897–4900.
[4] T. Gerkmann and R. C. Hendriks, “Noise Power Estimation Based on the Probability of Speech Presence,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2011, pp. 145–148.
[5] T. Gerkmann and R. C. Hendriks, “Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1383–1393, May 2012.
Listening experiment, factory 1 noise (seen)
signal | female speaker | male speaker |
---|---|---|
noisy | | |
clean | | |
non-ML | | |
ML (\|Y\|²) | | |
ML (ξ, γ) | | |
Listening experiment, traffic noise (seen)
signal | female speaker | male speaker |
---|---|---|
noisy | | |
clean | | |
non-ML | | |
ML (\|Y\|²) | | |
ML (ξ, γ) | | |
Listening experiment, factory 1 noise (unseen)
signal | female speaker | male speaker |
---|---|---|
noisy | | |
clean | | |
non-ML | | |
ML (\|Y\|²) | | |
ML (ξ, γ) | | |
Listening experiment, traffic noise (unseen)
signal | female speaker | male speaker |
---|---|---|
noisy | | |
clean | | |
non-ML | | |
ML (\|Y\|²) | | |
ML (ξ, γ) | | |