Web Interfaces for Language Processing Systems SS2017

The following systems were successfully completed as part of the master's project "Web Interfaces for Language Processing Systems" in the summer semester 2017.

I. AnonML

Machine Learning to anonymize court decisions.

Team

Mirco Franzek and Matthias Schildwächter

Purpose

Court decisions in Germany can only be published in anonymized form.
Anonymization is done by hand.
Only a very small fraction of all decisions is published.
Courts are not capable of anonymizing a larger share.
There is little research about how the law is applied by lower-level courts.

Idea

Create software that is able to anonymize court decisions automatically.
Enable legal researchers and practitioners to better predict court decisions.
Improve transparency and legitimacy – truly make the world a better place.
Create the foundation for a better database and better search tools.

Requirements

Recognize text fragments that have to be anonymized: names, addresses, locations, license plates, companies, identifiable descriptions, etc.
Possibly extract anonymization rules from already anonymized decisions.
Replace these fragments with placeholders or delete them.
Calculate confidence scores to flag suggestions that require manual review.
No legal knowledge required.
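
The requirements above can be illustrated with a minimal Python sketch. It is not the AnonML implementation: AnonML uses machine learning, while this sketch substitutes two hand-written regular expressions so that it stays self-contained. The pattern list, the labels, the confidence values and the threshold are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Suggestion:
    start: int         # character offset in the original text
    end: int
    label: str         # placeholder type, e.g. "PERSON"
    confidence: float  # 0.0 .. 1.0


# Hypothetical patterns and confidence values (illustration only).
PATTERNS = [
    ("LICENSE_PLATE", re.compile(r"\b[A-ZÄÖÜ]{1,3}-[A-Z]{1,2} \d{1,4}\b"), 0.9),
    ("PERSON", re.compile(r"\b(?:Herr|Frau) [A-ZÄÖÜ][a-zäöüß]+"), 0.6),
]


def suggest(text):
    """Return all pattern matches as suggestions, ordered by position."""
    found = []
    for label, pattern, confidence in PATTERNS:
        for match in pattern.finditer(text):
            found.append(Suggestion(match.start(), match.end(), label, confidence))
    return sorted(found, key=lambda s: s.start)


def anonymize(text, suggestions, threshold=0.8):
    """Replace confident suggestions with placeholders.

    Suggestions below the threshold are returned unchanged for manual
    review; their offsets refer to the original, unmodified text.
    Overlapping suggestions are not handled in this sketch.
    """
    accepted = [s for s in suggestions if s.confidence >= threshold]
    needs_review = [s for s in suggestions if s.confidence < threshold]
    # Replace from right to left so that earlier offsets stay valid.
    for s in sorted(accepted, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + f"[{s.label}]" + text[s.end:]
    return text, needs_review
```

Calling `anonymize(text, suggest(text))` replaces only the high-confidence matches and hands the remaining ones back to the user, which mirrors the accept-or-decline workflow described under Result.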

Data

A few hundred decisions from the European Court of Justice.

Result

The software suggests possible anonymizations which have to be accepted or declined.
The user can add missing anonymizations.
The application can be retrained with the information received from manually corrected decisions to improve the suggestions and speed up the process.
In the end the anonymized document can be exported.

Documentation

The documentation containing the installation guide and general descriptions is available here.

Source Code

The source code is available here.

Demo

II. new/s/leak Extension

This project extends the graph and document processing functionalities of the new/s/leak project.

Team

Alvin Fazrie and Thorben Wiese

Purpose

Enable adding new entities and keywords to the system.
Enable adding new entity types.
Provide keyword graphs alongside the entity graph.
Enhance entity and keyword blacklisting.
Improve analyzability of connections between entities, keywords and tags.
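
As a rough illustration of how a keyword graph and a blacklist can interact, the following Python sketch counts how often two keywords occur in the same document and skips blacklisted keywords. This page does not describe how new/s/leak builds its graphs, so the function name, the input format and the co-occurrence weighting are assumptions.

```python
from collections import Counter
from itertools import combinations


def keyword_graph(documents, blacklist=frozenset()):
    """Build a keyword co-occurrence graph.

    `documents` is an iterable of keyword lists, one list per document.
    Keywords are compared case-insensitively and blacklisted keywords are
    ignored. The result maps each alphabetically sorted keyword pair (an
    edge) to the number of documents in which both keywords occur.
    """
    blocked = {b.lower() for b in blacklist}
    edges = Counter()
    for keywords in documents:
        kept = sorted({k.lower() for k in keywords} - blocked)
        edges.update(combinations(kept, 2))
    return edges
```

An entity graph could be built the same way by passing lists of entity names instead of keywords.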

Documentation

The documentation is available here.

Source Code

The source code is available here.

Demo

The demo, which is based on the Enron email dataset, is available here.

III. News-crawler

Team

Sönke Behrendt

Project Description

This project crawls, extracts, indexes and processes the content of news articles published daily. The extracted content is indexed in Elasticsearch for further processing. The project also provides tooling to extract and preprocess the content for the NoD project.
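
The extract-and-index step can be sketched in a few lines of Python. This is not code from the news-crawler repository: the paragraph-based extraction, the index name `news`, the field names `url` and `content`, and the use of a URL hash as document ID are assumptions made for illustration. The sketch only builds a request body in the newline-delimited JSON format expected by the Elasticsearch bulk API and does not send it anywhere.

```python
import hashlib
import json
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    cleaned = (" ".join(p.split()) for p in parser.paragraphs)
    return "\n".join(p for p in cleaned if p)


def bulk_payload(articles, index="news"):
    """Build a bulk request body from (url, html) pairs.

    Each document contributes an action line followed by a source line.
    """
    lines = []
    for url, html in articles:
        doc_id = hashlib.sha1(url.encode("utf-8")).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"url": url, "content": extract_text(html)}))
    return "".join(line + "\n" for line in lines)
```

Deriving the document ID from the URL means that crawling the same article twice overwrites the existing entry instead of creating a duplicate.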

Documentation and Source Code

The source code and documentation are available at https://github.com/thesoenke/news-crawler

Demo

The demo is available here.
