Nonlinear spatial filtering in multichannel speech enhancement
This website accompanies the journal paper
Kristina Tesch, Timo Gerkmann, "Nonlinear Spatial Filtering in Multichannel Speech Enhancement", IEEE/ACM Trans. Audio, Speech, Language Proc., Vol. 29, pp. 1795-1805, 2021. [doi] [arxiv] [pdf].
The sound examples on this website demonstrate the performance advantage of a joint spatial-spectral nonlinear filter, TMMSE, over the classical combination of a linear beamformer and a spectral postfilter, referred to as TMVDR-MMSE. We provide sound examples for the following experimental settings:
- (Section V) An inhomogeneous noise field created by five directional sources emitting non-overlapping Gaussian signals
- (Section IV-B) An inhomogeneous noise field created by five interfering human speakers
- (Section IV-C) Real-world noise data from the CHiME3 database
- (Section IV-A) Noise sampled from a Gaussian mixture distribution with varying heavy-tailedness
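For readers unfamiliar with the linear baseline, the beamforming stage of a classical beamformer-plus-postfilter system is commonly an MVDR beamformer, whose weights for one frequency bin are w = Φ⁻¹d / (dᴴΦ⁻¹d). The following is a minimal sketch of that textbook formula, not the implementation used in the paper: the function name `mvdr_weights`, the number of microphones, the random noise covariance matrix, and the all-ones steering vector are hypothetical placeholders chosen only for illustration.

```python
import numpy as np


def mvdr_weights(noise_cov, steering):
    """Textbook MVDR weights w = Phi^{-1} d / (d^H Phi^{-1} d) for one frequency bin."""
    # Solve Phi x = d instead of forming the explicit inverse.
    phi_inv_d = np.linalg.solve(noise_cov, steering)
    return phi_inv_d / (steering.conj() @ phi_inv_d)


rng = np.random.default_rng(0)
n_mics = 4  # hypothetical array size, not taken from the paper

# Hypothetical Hermitian positive-definite noise covariance matrix.
a = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
noise_cov = a @ a.conj().T + 1e-3 * np.eye(n_mics)

# Hypothetical steering vector: identical phase at every microphone.
steering = np.ones(n_mics, dtype=complex)

w = mvdr_weights(noise_cov, steering)

# Distortionless constraint: the response towards the target, w^H d, equals 1.
response = w.conj() @ steering
```

The beamformer output is a linear combination of the microphone signals. A spectral postfilter would then be applied to that single-channel output, which is the two-stage structure the joint nonlinear filter is compared against.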
V. Interpretation: a nonlinear spatial filter enables superior spatial selectivity
The experimental setup is illustrated by the figure on the right. A target speech source is placed in the broadside direction of the two-dimensional microphone array, and five directional interfering sources are positioned as indicated by the black boxes. Here we use non-overlapping Gaussian bursts as interfering signals.
IV-B. Inhomogeneous noise field (interfering speech)
IV-C. Real-world CHiME3 noise
The differences between the proposed spatially nonlinear TMMSE and the classical combination of a linear beamformer and a spectral postfilter, TMVDR-MMSE, are most obvious for directional impulsive noise, such as the clattering of dishes in the cafeteria noise examples.
|Example 1 (CAF)|
|Example 2 (CAF)|
|Example 3 (PED)|
|Example 4 (PED)|
IV-A. Heavy-tailed noise distribution
|Kurtosis factor q ≈ 8 (female)|
|Kurtosis factor q ≈ 8 (male)|
|Kurtosis factor q ≈ 4 (female)|
|Kurtosis factor q ≈ 4 (male)|
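To give an intuition for how a Gaussian mixture can produce heavy-tailed noise, the sketch below draws samples from a zero-mean Gaussian scale mixture and computes its kurtosis in closed form. This page does not state how the paper defines the kurtosis factor q or which mixture parameters it uses, so everything here is an assumption made for illustration: the function names, the two-component mixture, its weights and standard deviations, and the use of real-valued rather than complex-valued samples.

```python
import numpy as np


def sample_scale_mixture(n, weights, sigmas, rng):
    """Draw n samples from a zero-mean Gaussian scale mixture."""
    # Pick a mixture component per sample, then scale a standard normal draw.
    component = rng.choice(len(weights), size=n, p=weights)
    return rng.standard_normal(n) * np.asarray(sigmas)[component]


def mixture_kurtosis(weights, sigmas):
    """Kurtosis E[x^4] / E[x^2]^2 of a zero-mean Gaussian scale mixture."""
    p = np.asarray(weights, dtype=float)
    var = np.asarray(sigmas, dtype=float) ** 2
    # Each zero-mean Gaussian component contributes E[x^4] = 3 * sigma^4.
    return 3.0 * np.sum(p * var**2) / np.sum(p * var) ** 2


rng = np.random.default_rng(0)

# Hypothetical mixture: mostly low-variance samples with occasional large outliers.
weights = [0.9, 0.1]
sigmas = [1.0, 3.0]

noise = sample_scale_mixture(100_000, weights, sigmas, rng)
kurt_gaussian = mixture_kurtosis([1.0], [2.0])  # a single Gaussian has kurtosis 3
kurt_mixture = mixture_kurtosis(weights, sigmas)  # larger than 3: heavier tails
```

Raising the variance of the rare component or making it rarer while keeping its variance large increases the kurtosis, which is one way to vary heavy-tailedness while keeping the noise a mixture of Gaussians.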