Normalized Features for Improving the Generalization of DNN Based Speech Enhancement

Sound Examples

Click the following boxes to show the sound examples of the respective category. The sound examples used in the listening experiments use a minimum gain of -15 dB, while a minimum gain of -20 dB has been used for the remaining examples. The following notation is used for the algorithms:

noisy: noisy signal
OMLSA: optimally modified log-spectral amplitude estimator [1, 2] taken from here
non-ML: non-ML based speech enhancement approach based on [3-5] described in Section II
ML (|Y|²): DNN-based enhancement scheme using the noisy log-spectra as input features
ML (|Y|², λ): DNN-based enhancement scheme using the noisy log-spectra with appended logarithmized noise PSD as input features
ML (ξ, γ): DNN-based enhancement scheme using the logarithmized a priori and a posteriori SNRs as input features
ML (γ): DNN-based enhancement scheme using the logarithmized a posteriori SNR as input features
ML (ξ): DNN-based enhancement scheme using the logarithmized a priori SNR as input features
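
The feature sets listed above can be illustrated with a minimal sketch. It assumes the standard definitions γ = |Y|² / λ (a posteriori SNR) and ξ = speech PSD / λ (a priori SNR), where λ is the noise PSD, and it assumes that the minimum gain in dB refers to an amplitude gain (20·log10). The function names, the `EPS` floor, and the externally supplied speech PSD estimate are hypothetical; the estimators from [3-5] are not reproduced here.

```python
import math

EPS = 1e-12  # hypothetical floor that keeps the logarithm finite


def log_features(noisy_periodogram, noise_psd, speech_psd):
    """Compute per-bin log features for one frame.

    noisy_periodogram: |Y|^2 per frequency bin
    noise_psd:         noise PSD estimate (lambda) per frequency bin
    speech_psd:        speech PSD estimate per bin (assumed to be given)
    """
    feats = {"log_Y2": [], "log_lambda": [], "log_xi": [], "log_gamma": []}
    for y2, lam_n, lam_s in zip(noisy_periodogram, noise_psd, speech_psd):
        lam_n = max(lam_n, EPS)
        feats["log_Y2"].append(math.log(max(y2, EPS)))
        feats["log_lambda"].append(math.log(lam_n))
        # a priori SNR: speech PSD relative to noise PSD
        feats["log_xi"].append(math.log(max(lam_s / lam_n, EPS)))
        # a posteriori SNR: noisy periodogram relative to noise PSD
        feats["log_gamma"].append(math.log(max(y2 / lam_n, EPS)))
    return feats


def apply_gain_floor(gain, min_gain_db):
    """Limit an amplitude gain from below, e.g. min_gain_db = -15 or -20."""
    floor = 10.0 ** (min_gain_db / 20.0)
    return [max(g, floor) for g in gain]
```

Under these definitions, scaling the noisy periodogram, the noise PSD, and the speech PSD by a common factor leaves `log_xi` and `log_gamma` unchanged, whereas `log_Y2` and `log_lambda` shift by the logarithm of that factor.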

[1] I. Cohen and B. Berdugo, “Speech enhancement for non-stationary noise environments,” Signal Processing, vol. 81, no. 11, pp. 2403–2418, 2001.
[2] I. Cohen, “Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, pp. 466–475, Sep. 2003.
[3] C. Breithaupt, T. Gerkmann, and R. Martin, “A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, NV, USA, 2008, pp. 4897–4900.
[4] T. Gerkmann and R. C. Hendriks, “Noise power estimation based on the probability of speech presence,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2011, pp. 145–148.
[5] T. Gerkmann and R. C. Hendriks, “Unbiased MMSE-based noise power estimation with low complexity and low tracking delay,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1383–1393, May 2012.

Sound examples, mod. white noise (unseen)

Speaker	Noisy	OMLSA	non-ML	ML (|Y|²)	ML (|Y|², λ)	ML (ξ, γ)	ML (γ)	ML (ξ)
Speaker 1	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 2	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 3	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 4	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 5	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 6	▸	▸	▸	▸	▸	▸	▸	▸

Sound examples, traffic noise (unseen)

traffic noise, 5 dB SNR

Speaker	Noisy	OMLSA	non-ML	ML (|Y|²)	ML (|Y|², λ)	ML (ξ, γ)	ML (γ)	ML (ξ)
Speaker 1	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 2	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 3	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 4	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 5	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 6	▸	▸	▸	▸	▸	▸	▸	▸

Sound examples, factory 1 noise (unseen)

Speaker	Noisy	OMLSA	non-ML	ML (|Y|²)	ML (|Y|², λ)	ML (ξ, γ)	ML (γ)	ML (ξ)
Speaker 1	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 2	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 3	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 4	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 5	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 6	▸	▸	▸	▸	▸	▸	▸	▸

Sound examples, babble noise (unseen)

Speaker	Noisy	OMLSA	non-ML	ML (|Y|²)	ML (|Y|², λ)	ML (ξ, γ)	ML (γ)	ML (ξ)
Speaker 1	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 2	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 3	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 4	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 5	▸	▸	▸	▸	▸	▸	▸	▸
Speaker 6	▸	▸	▸	▸	▸	▸	▸	▸