Open Source Acoustic Models for German Distant Speech Recognition
Open acoustic models and speech data for German speech recognition
In the course of the BMBF project Dialog+, the LT and Telecooperation groups developed acoustic models for German distant speech recognition. These were built with the open source toolkits Sphinx and Kaldi. Unfortunately, the German data resources needed to train such acoustic models are rarely open source or easily accessible. We therefore decided to record our own German speech corpus, which we have now released under an open license (CC-BY). Pretrained models and the scripts to generate them are also available (see download links below) and are released under the same permissive CC-BY license.

Update: we have continued our efforts to build open source German acoustic models at Universität Hamburg and have added speech data from the Spoken Wikipedia Corpus (SWC) project to our training recipes. Note that we have discontinued the pretrained Sphinx models and now offer only pretrained Kaldi models, since Kaldi has become the de facto standard toolkit for (open source) automatic speech recognition.
The recording of our speech corpus was supported by the BMBF project Dialog+:
- Project homepage: dialogplus.eu
- GitHub project page with new and current Kaldi models: kaldi-tuda-de project
| Corpus statistics | |
|---|---|
| Overall duration per microphone | about 36 hours (31 hrs train / 2.5 hrs dev / 2.5 hrs test) |
| Number of microphones | 3 (Microsoft Kinect, Yamaha, Samson) |
| Number of WAV files per microphone | about 14,500 |
| Total number of speakers | 180 (130 male / 50 female) |
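Per-microphone durations like those above can be computed directly from the WAV headers. The sketch below uses only Python's standard-library `wave` module; the file paths passed in are placeholders, since the corpus's actual directory layout is not described here.

```python
import wave

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds,
    derived from the frame count and sample rate in its header."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def total_hours(paths):
    """Sum the durations of many WAV files and convert to hours."""
    return sum(wav_duration_seconds(p) for p in paths) / 3600.0
```

Running `total_hours` over all files recorded by one microphone should reproduce the roughly 36-hour figure reported in the table.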
How does our corpus differ from the freely available German VoxForge corpus?
- We recorded all speech data under controlled conditions: same room, same microphone distances, etc.
- We recorded with three microphones in parallel. An additional signal was recorded by the Microsoft Kinect with beamforming and noise reduction enabled.
- The data has been curated to reduce speaking errors and artefacts.