Research

Expertise of our group

The Language Technology group focuses on Natural Language Processing (NLP) research. Specifically, we are interested in statistical methods that make use of large unannotated text corpora, an approach nowadays known as the Big Data paradigm.

We collaborate with a range of partners in academia and industry. Please also look at our software projects.

Structure Discovery

A focus of our group is on unsupervised, knowledge-free methods such as clustering of lexical graphs or topic models. These methods, which neither presuppose training data nor assume the existence of knowledge resources, identify regularities in large text collections and mark them back into the data, following the structure discovery paradigm. This markup, which is entirely data-driven and therefore independent of domain and language, is then used as features in supervised machine learning settings. The utility of structure discovery processes is thus assessed in an application-based manner.
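To make the idea of clustering a lexical graph concrete, the following is a minimal sketch, not the group's actual implementation. It uses a simple deterministic label-propagation scheme: every word starts in its own cluster and repeatedly adopts the label with the highest total edge weight among its neighbors. The function name `cluster_lexical_graph`, the toy graph and its weights are assumptions made purely for illustration.

```python
from collections import defaultdict


def cluster_lexical_graph(edges, max_iters=20):
    """Cluster an undirected weighted word graph by label propagation.

    edges: iterable of (word_a, word_b, weight) triples (hypothetical input format).
    Returns a sorted list of clusters, each a sorted list of words.
    """
    neighbors = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    # Every node starts with its own label.
    labels = {node: node for node in neighbors}

    for _ in range(max_iters):
        changed = False
        # Fixed (alphabetical) visiting order keeps the sketch deterministic.
        for node in sorted(neighbors):
            scores = defaultdict(float)
            for other, weight in neighbors[node].items():
                scores[labels[other]] += weight
            # Strongest label wins; ties go to the alphabetically first label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(sorted(members) for members in clusters.values())


# Toy lexical graph: two dense groups joined by one weak edge.
toy_edges = [
    ("apple", "banana", 1.0), ("apple", "cherry", 1.0), ("banana", "cherry", 1.0),
    ("car", "bus", 1.0), ("car", "train", 1.0), ("bus", "train", 1.0),
    ("apple", "car", 0.1),
]
toy_clusters = cluster_lexical_graph(toy_edges)
```

The resulting cluster membership of each word could then be written back into the text as an annotation and used as a feature by a supervised learner, which is the step the paragraph above refers to as marking regularities back into the data.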

Statistical Semantics and Explainable AI

We examine statistical methods that reflect natural-language semantics. Specifically, we compute semantic similarities and semantic relations between lexical items through the analysis of large texts, and make these available within texts in a contextualized fashion. These relations are used in applications such as semantic indexing, paraphrasing and the identification of lexical chains. In this realm, we rely not only on representation learning with deep neural networks, but also on sparse, interpretable models that work towards explainable AI.
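As an illustration of a sparse, interpretable similarity model, here is a minimal sketch that represents each word by counts of the words occurring near it and compares words by cosine similarity. It is one textbook way to compute distributional similarity, not necessarily the method used by the group. The helper names `context_vectors` and `cosine`, the window size and the toy sentences are all assumptions.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a sparse Counter of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical); real applications would use large text collections.
toy_sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
    "the car needs fuel",
]
vectors = context_vectors(toy_sentences)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])

# The shared dimensions are human-readable, which is what makes the model interpretable.
shared_contexts = sorted(vectors["cat"].keys() & vectors["dog"].keys())
```

Because every dimension of such a vector is an observable context word, a similarity score can be explained by listing the contexts two words share, here those in `shared_contexts`.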

Crowdsourcing and Interactivity

To obtain the markup necessary to train supervised language technology components, the group advocates the use of crowdsourcing techniques. Here, unskilled workers are paid small sums to perform small annotation tasks. The advantage is the virtually unlimited number of annotators, which makes the creation of training data quick and scalable. Quality is ensured by redundancy and by using qualification tests or test items. A major challenge lies in breaking down complex annotation tasks into simple subtasks suitable for the crowd, and in using crowd signals to improve NLP components and applications through usage.
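The two quality mechanisms mentioned above, test items and redundancy, can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of the group's pipeline: workers are kept only if they reach a minimum accuracy on items with known answers, and the remaining judgments are combined by majority vote. The function names, the `(worker, item, label)` format, the 0.7 threshold and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def trusted_workers(judgments, gold, min_accuracy=0.7):
    """Return workers whose accuracy on gold test items meets the threshold.

    Workers who answered no test item are not trusted (an assumption of this sketch).
    """
    seen = Counter()
    correct = Counter()
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            if label == gold[item]:
                correct[worker] += 1
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def aggregate(judgments, gold, min_accuracy=0.7):
    """Majority vote over redundant judgments from trusted workers only."""
    trusted = trusted_workers(judgments, gold, min_accuracy)
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        if worker in trusted and item not in gold:
            votes[item][label] += 1
    # Most frequent label wins; ties go to the alphabetically first label.
    return {
        item: min(counts, key=lambda lab: (-counts[lab], lab))
        for item, counts in votes.items()
    }


# Toy data: g1 and g2 are test items with known answers, t1 and t2 are real tasks.
toy_gold = {"g1": "POS", "g2": "NEG"}
toy_judgments = [
    ("w1", "g1", "POS"), ("w1", "g2", "NEG"), ("w1", "t1", "POS"), ("w1", "t2", "NEG"),
    ("w2", "g1", "POS"), ("w2", "g2", "NEG"), ("w2", "t1", "POS"), ("w2", "t2", "POS"),
    ("w3", "g1", "NEG"), ("w3", "g2", "POS"), ("w3", "t1", "NEG"), ("w3", "t2", "POS"),
    ("w4", "g1", "POS"), ("w4", "g2", "NEG"), ("w4", "t1", "POS"), ("w4", "t2", "NEG"),
]
toy_labels = aggregate(toy_judgments, toy_gold)
```

In this toy data, worker `w3` fails both test items and is excluded, which turns an otherwise tied vote on `t2` into a clear majority.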

Interdisciplinary Research

Text is a research material for many disciplines. In our group, we frequently work with researchers from fields as diverse as philosophy, linguistics, education, sociology, psychology and law to provide semi-automatic access to materials, to inform experimental setups with statistics from language data, and to enable advanced applications. While interdisciplinary research often requires long initial conversations to clarify terminology and research goals, we appreciate the core NLP research questions that arise in these collaborations.

Projects at HCDS