Resources

Software

LT develops and maintains a range of software tools and frameworks for language processing. Our NLP tools include a text segmenter and splitter, a named entity recognizer, a corpus processor and visualizer, a multi-word detector, and many more. We also develop annotation tools and visualizations for entity networks. Each tool comes with detailed documentation and a user guide. We strive to release our software as open source, and most of it is published under a permissive license that allows academic and industrial use without restrictions or charges. All of our software is hosted in public GitHub repositories.

Data

We also collect language resources for different NLP research projects. Our datasets range from web-scale pre-processed corpora and distributional thesauri, through annotations for named entities, semantic and lexical substitution, multi-words and complex words, to recordings and acoustic models for German speech recognition. Whenever possible, our datasets are distributed under the CC-BY 4.0 license, i.e. they are free for everyone to use.

Demos

To make the research projects conducted in the lab more accessible and visible, we provide public demos for some of our tools, especially the web-based ones. The demos are intended to showcase our software and technology, and each includes a detailed user guide.