The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement
These audio examples accompany the paper "The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement", accepted at Interspeech 2024. [preprint]
Authors: Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann
Abstract: To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects that the given metric does not see. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for evaluation. For this, we introduce enhancement models that exploit the widely used PESQ measure. Our "PESQetarian" model achieves 3.82 PESQ on VB-DMD while scoring very poorly in a listening experiment. While the obtained PESQ value of 3.82 would imply "state-of-the-art" PESQ-performance on the VB-DMD benchmark, our examples show that when optimizing w.r.t. a metric, an isolated evaluation on the same metric may be misleading. Instead, other metrics should be included in the evaluation and the resulting performance predictions should be confirmed by listening.