Our research focuses on the derivation and development of novel digital signal processing algorithms for speech, audio, and multimodal signals. Addressed applications include communication devices such as hearing aids and mobile phones, as well as human-machine interfaces such as voice-controlled assistants and robots. We aim to find optimal solutions using statistical signal processing and machine learning under a set of practical constraints. These constraints may include low algorithmic latency requirements in communication devices, limited computational power on mobile devices, and limited resources for training. The employed methods range from statistical Bayesian models and estimators to modern machine learning techniques, including deep neural networks (DNNs).
The following video shows our real-time source separation demo in our varechoic sound studio.
Modern speech communication devices like smartphones and hearing devices are used in many different environments. Particularly in noisy and reverberant environments, communication devices, hearing devices, and acoustic human-machine interfaces still exhibit limited performance. This can be very annoying, for instance when hearing aid users are unable to follow a conversation in a noisy restaurant. Automatic speech recognition for human-machine interfaces also severely suffers from performance degradation when employed in noisy and reverberant environments.
Our group works on developing robust solutions that reduce additive noise and mitigate the negative effects of reverberation.
Single Channel Speech Enhancement and Source Separation
On these pages you can find an Introduction to Speech Enhancement Research.
Today, state-of-the-art performance in single-microphone speech enhancement and source separation is achieved by machine learning approaches using DNNs. However, the commonly employed supervised training methods are prone to generalization problems, leading to performance degradation in conditions not adequately represented in the training set. Here, Combining Statistical Signal Processing and Machine Learning can help DNNs generalize to unseen conditions. Another way to improve generalization are unsupervised approaches; a very interesting candidate is the Variational Autoencoder for Speech Enhancement, which combines statistical signal processing and machine learning in a particularly elegant way.
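To illustrate the basic idea behind mask-based single-channel enhancement, here is a minimal numpy sketch. In practice a DNN estimates the time-frequency mask from the noisy input; in this toy example we stand in an oracle Wiener-style mask computed from hypothetical speech and noise magnitudes.

```python
import numpy as np

def wiener_mask(speech_psd, noise_psd, eps=1e-12):
    """Wiener-style time-frequency mask; in practice a DNN would estimate this."""
    return speech_psd / (speech_psd + noise_psd + eps)

rng = np.random.default_rng(0)
# Toy spectrogram magnitudes (freq bins x frames); values are hypothetical.
speech = np.abs(rng.normal(0.0, 1.0, (257, 100))) * 2.0
noise = np.abs(rng.normal(0.0, 1.0, (257, 100)))
noisy = speech + noise  # crude additive mixture in the magnitude domain

mask = wiener_mask(speech**2, noise**2)
enhanced = mask * noisy

# The mask attenuates noise-dominated bins, so the enhanced magnitudes
# are closer to the clean speech than the noisy observation is.
err_noisy = np.mean((noisy - speech) ** 2)
err_enhanced = np.mean((enhanced - speech) ** 2)
assert err_enhanced < err_noisy
```

The generalization problem mentioned above arises precisely because a learned mask estimator, unlike this oracle, must infer the mask from noisy data it may never have seen during training.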
In many speech enhancement and source separation approaches, only the magnitude of the speech spectrum is estimated, and only the magnitude of the noisy observation is used to estimate the clean speech coefficients. In contrast, we have shown that Phase-Aware Signal Processing may yield relevant additional information for solving the speech enhancement task.
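The following sketch shows, for a single hypothetical time-frequency bin, why the phase matters: combining even a perfect magnitude estimate with the noisy phase leaves a residual error, while exploiting (here: oracle) phase information removes it. All values are illustrative.

```python
import numpy as np

# Toy complex STFT coefficients for one time-frequency bin (values hypothetical).
clean = 1.0 * np.exp(1j * 0.3)   # clean speech coefficient
noise = 0.6 * np.exp(1j * 2.5)   # additive noise coefficient
noisy = clean + noise            # observed noisy coefficient

mag_est = np.abs(clean)          # assume a perfect magnitude estimate

# Magnitude-only processing: combine the estimated magnitude with the *noisy* phase.
phase_blind = mag_est * np.exp(1j * np.angle(noisy))
# Phase-aware processing: additionally exploit (oracle) phase information.
phase_aware = mag_est * np.exp(1j * np.angle(clean))

err_blind = abs(phase_blind - clean)
err_aware = abs(phase_aware - clean)
assert err_aware < err_blind   # the phase carries exploitable information
```

Real phase-aware algorithms of course estimate the phase rather than assume it; the point of the sketch is only that the phase carries information that magnitude-only processing discards.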
Nonlinear Multichannel Speech Enhancement and Source Separation
If multiple microphones are available, the question of whether and how artificial intelligence, machine learning, and deep neural networks can be used to obtain robust results is less clearly answered than in the single-microphone case. Most research today restricts itself to linear spatial processing aided by DNNs. This solution is very convenient in practice, as spatial processing and DNN-based single-channel postprocessing can be separated. However, our research on Nonlinear Spatial Filtering showed that impressive performance gains can be obtained when a nonlinear joint spatial-spectral filter is used instead of the traditional concatenation of linear beamforming and postfiltering. Using DNNs as flexible nonlinear function approximators is therefore an exciting research topic for multichannel speech enhancement.
Do not miss our exciting audio examples, which strongly motivate further research into DNN-based nonlinear joint spatial-spectral processing.
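To make the conventional linear spatial stage concrete, here is a minimal numpy sketch of a standard MVDR beamformer in a toy narrowband scenario (steering vector, noise level, and all signals are hypothetical). This is the classical linear baseline; nonlinear joint spatial-spectral filters aim to go beyond it.

```python
import numpy as np

rng = np.random.default_rng(1)
n_mics, n_frames = 4, 2000

# Toy narrowband scenario: source arrives with a known steering vector d,
# plus spatially uncorrelated sensor noise (all values hypothetical).
d = np.exp(-1j * np.pi * np.arange(n_mics) * 0.5)               # steering vector
s = rng.normal(size=n_frames) + 1j * rng.normal(size=n_frames)  # source signal
noise = 0.5 * (rng.normal(size=(n_mics, n_frames))
               + 1j * rng.normal(size=(n_mics, n_frames)))
x = np.outer(d, s) + noise                                      # microphone signals

# MVDR beamformer: minimize output noise power subject to w^H d = 1.
Phi = noise @ noise.conj().T / n_frames    # noise covariance estimate
w = np.linalg.solve(Phi, d)
w /= d.conj() @ w                          # enforce the distortionless constraint
y = w.conj() @ x                           # beamformer output

# The linear spatial filter suppresses noise relative to a single microphone.
snr_mic0 = np.mean(np.abs(s) ** 2) / np.mean(np.abs(x[0] - d[0] * s) ** 2)
snr_out = np.mean(np.abs(s) ** 2) / np.mean(np.abs(y - s) ** 2)
assert snr_out > snr_mic0
```

Because the MVDR weights act linearly and identically on every frame, any interaction between spatial and spectral cues is left to a separate postfilter; a DNN-based joint filter can instead model both jointly and nonlinearly.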
Automatic Recognition of Affect in Social Interactions
The automatic recognition of affect and emotions is an exciting research topic. The Automatic Recognition of Affect in Social Interactions is challenging but of particular interest, as it can help make meetings more successful. An additional challenge arises in virtual meetings, as many of us have experienced during the pandemic.