The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
These audio examples accompany the paper "The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement", accepted at Interspeech 2024. [preprint]
Authors: Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann
Abstract: To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects that the given metric does not see. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for evaluation. For this, we introduce enhancement models that exploit the widely used PESQ measure. Our "PESQetarian" model achieves 3.82 PESQ on VB-DMD while scoring very poorly in a listening experiment. While the obtained PESQ value of 3.82 would imply "state-of-the-art" PESQ-performance on the VB-DMD benchmark, our examples show that when optimizing w.r.t. a metric, an isolated evaluation on the same metric may be misleading. Instead, other metrics should be included in the evaluation and the resulting performance predictions should be confirmed by listening.
WARNING: Some of the files presented below contain high-pitched or loud artifacts. Please adjust your volume accordingly and protect your hearing!
Filename | Clean | Noisy | MSE | PESQetarian | PESQ-SDR | Direct Optimization |
---|---|---|---|---|---|---|
p232_093.wav | (PESQ = 4.64) | (PESQ = 2.29) | (PESQ = 3.69) | (PESQ = 4.17) | (PESQ = 3.42) | (PESQ = 4.38) |
p232_185.wav | (PESQ = 4.64) | (PESQ = 2.37) | (PESQ = 3.30) | (PESQ = 4.20) | (PESQ = 3.44) | (PESQ = 4.06) |
p232_225.wav | (PESQ = 4.64) | (PESQ = 1.29) | (PESQ = 2.02) | (PESQ = 2.49) | (PESQ = 3.30) | (PESQ = 3.63) |
p257_027.wav | (PESQ = 4.64) | (PESQ = 1.37) | (PESQ = 1.83) | (PESQ = 3.38) | (PESQ = 2.56) | (PESQ = 3.20) |
p257_335.wav | (PESQ = 4.64) | (PESQ = 1.10) | (PESQ = 2.39) | (PESQ = 3.27) | (PESQ = 3.28) | (PESQ = 3.27) |
p257_365.wav | (PESQ = 4.64) | (PESQ = 3.04) | (PESQ = 3.56) | (PESQ = 4.55) | (PESQ = 3.75) | (PESQ = 4.00) |
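
As a rough aid for reading the table, the following Python sketch averages the listed PESQ values per system and ranks the systems by that average. The numbers are copied from the table; the names `pesq_scores` and `mean_pesq` are illustrative and not taken from the paper. These six files are only the examples shown on this page, so the averages are not the VB-DMD benchmark results reported in the paper. A ranking by PESQ alone also says nothing about how the files actually sound, which is the point of these examples.

```python
from statistics import mean

# PESQ values copied from the table, one list per column, in row order:
# p232_093, p232_185, p232_225, p257_027, p257_335, p257_365.
# The "Clean" column is omitted because it is the reference (4.64 throughout).
pesq_scores = {
    "Noisy": [2.29, 2.37, 1.29, 1.37, 1.10, 3.04],
    "MSE": [3.69, 3.30, 2.02, 1.83, 2.39, 3.56],
    "PESQetarian": [4.17, 4.20, 2.49, 3.38, 3.27, 4.55],
    "PESQ-SDR": [3.42, 3.44, 3.30, 2.56, 3.28, 3.75],
    "Direct Optimization": [4.38, 4.06, 3.63, 3.20, 3.27, 4.00],
}


def mean_pesq(scores):
    """Return the arithmetic mean PESQ for each system (hypothetical helper)."""
    return {system: mean(values) for system, values in scores.items()}


means = mean_pesq(pesq_scores)

# Sort systems from highest to lowest mean PESQ on these six files.
ranking = sorted(means, key=means.get, reverse=True)

for system in ranking:
    print(f"{system:<20} mean PESQ = {means[system]:.2f}")
```

Comparing this ranking with what you hear in the audio files gives a small-scale impression of the mismatch between PESQ and perceived quality that the paper describes.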