2024-02-21: LREC-COLING 2024 accepts 7 papers from LT members<p>The LREC-COLING 2024 conference has accepted 7 papers co-authored by LT members:</p>
Ahmad Shallouf, Hanna Herasimchyk, Mikhail Salnikov, Rudy Garrido Veliz, Natia Mestvirishvili, Alexander Panchenko, Chris Biemann and Irina Nikishina: End-to-End Open Domain Comparative Question Answering System
Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow and Seid Muhie Yimam: EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation
Punyajoy Saha, Aalok Agrawal, Abhik Jana, Chris Biemann and Animesh Mukherjee: On Zero-Shot Counterspeech Generation by LLMs
Tim Fischer, Florian Schneider, Fynn Petersen-Frey, Anja Silvia Mollah Haque, Isabel Eiser, Gertraud Koch and Chris Biemann: Extending the Discourse Analysis Tool Suite with Whiteboards for Visual Qualitative Analysis
Fynn Petersen-Frey and Chris Biemann: Dataset of Quotation Attribution in German News Articles
Hans Ole Hatzel and Chris Biemann: Tell me again! A Large-Scale Dataset of Multiple Summaries for the Same Story
Viktor Moskvoretskii, Alexander Panchenko and Irina Nikishina: Are Large Language Models Good at Lexical Semantics? A Case of Taxonomy Learning
2023-10-25: Participation in ERC Synergy Grant CultCryo<p>Whether in logistics, science, or air conditioning in the home, the possibility of artificial cooling has a fundamental influence on the world we live in. Yet this "artificial cryosphere" and its consequences, for example for climate change, have hardly been researched to date. The ERC project "CultCryo" aims to change that. It is investigating how the infrastructure of artificial cooling on the planet is linked to cultural practices, exemplified by the areas of food, space cooling, biomedicine and computer science. Among other things, the project will involve a historical reconstruction as well as an ethical discussion of the practices and norms associated with the cryosphere.</p>
<p>The project, which will receive about 9.9 million euros, is coordinated by Dr. Alexander Friedrich of the Leibniz Center for Literary and Cultural Research in Berlin, which acquired the project together with TU Darmstadt as the lead institution. UHH is involved as a project partner. Prof. Dr. Chris Biemann, Professor of Language Technology at UHH, and his team will provide the technical and computer-science expertise for the digital conceptual history of the "artificial cryosphere" through the "Sense Clustering Over Time" (SCoT) program. In addition to TU Darmstadt and UHH, the universities of Paderborn and Duisburg-Essen, the Australian National University Canberra, the University of Halle and the Institute for Social-Ecological Research Frankfurt are also involved.</p>
<p><span class="--l --r sentence_highlight"><span class="--l --r hover:bg-[#B4DAE8]">Press</span> <span class="--l --r hover:bg-[#B4DAE8]">release</span></span> hier </p>NAGR-fakmin-35977287-production2023-10-22T22:00:00ZPaper Accepted in Language Resources and Evaluation (LREV)<p>The following paper has been accepted and available online in Language Resources & Evaluation:</p>
Anwar, S., Shelmanov, A., Arefyev, N., Panchenko, A., Biemann, C. (2023). Text augmentation for semantic frame induction and parsing. Language Resources & Evaluation. https://doi.org/10.1007/s10579-023-09679-8
<p>Abstract: Semantic frames are formal structures describing situations, actions or events, e.g., Commerce buy, Kidnapping, or Exchange. Each frame provides a set of frame elements or semantic roles corresponding to participants of the situation and lexical units (LUs)—words and phrases that can evoke this particular frame in texts. For example, for the frame Kidnapping, two key roles are the Perpetrator and the Victim, and this frame can be evoked with lexical units abduct, kidnap, or snatcher. While formally sound, the scarce availability of semantic frame resources and their limited lexical coverage hinder the wider adoption of frame semantics across languages and domains. To tackle this problem, firstly, we propose a method that takes as input a few frame-annotated sentences and generates alternative lexical realizations of lexical units and semantic roles matching the original frame definition. Secondly, we show that the obtained synthetically generated semantic frame annotated examples help to improve the quality of frame-semantic parsing. To evaluate our proposed approach, we decompose our work into two parts. In the first part of text augmentation for LUs and roles, we experiment with various types of models such as distributional thesauri, non-contextualized word embeddings (word2vec, fastText, GloVe), and Transformer-based contextualized models, such as BERT or XLNet. We perform the intrinsic evaluation of these induced lexical substitutes using FrameNet gold annotations. Models based on Transformers show overall superior performance, however, they do not always outperform simpler models (based on static embeddings) unless information about the target word is suitably injected. However, we observe that non-contextualized models also show comparable performance on the task of LU expansion. We also show that combining substitutes of individual models can significantly improve the quality of final substitutes.
Because intrinsic evaluation scores depend heavily on the gold dataset and on frame preservation, which cannot be ensured by an automatic evaluation mechanism due to the incompleteness of gold datasets, we also carried out experiments with manual evaluation on sample datasets to further analyze the usefulness of our approach. The results show that the manual evaluation framework significantly outperforms automatic evaluation for lexical substitution. For extrinsic evaluation, the second part of this work assesses the utility of these lexical substitutes for the improvement of frame-semantic parsing. We took a small set of frame-annotated sentences and augmented them by replacing corresponding target words with their closest substitutes, obtained from best-performing models. Our extensive experiments on the original and augmented set of annotations with two semantic parsers show that our method is effective for improving the downstream parsing task by training set augmentation, as well as for quickly building FrameNet-like resources for new languages or subject domains.</p>2023-09-21: GSCL master's thesis award 2023 goes to Florian Schneider
<p>Every two years, the German Society for Language Technology and Computational Linguistics (GSCL) awards prizes for the best bachelor's and master's theses. At the German Conference on Natural Language Processing, KONVENS 2023 in Ingolstadt, two master's thesis finalists were invited to present their theses.</p>
<p>This year's award for the best master's thesis goes to Florian Schneider for his thesis 'Self-supervised Multi-Modal Text-Image Retrieval Methods to Improve Human Reading' supervised by Özge Alaçam, Xintong Wang, and Chris Biemann.</p>
2023-07-16: Two papers accepted at ECAI 2023<p>The '26th European Conference on Artificial Intelligence' (ECAI 2023) accepted the following papers:</p>
"Using Self-Supervised Dual Constraint Contrastive Learning for Cross-modal Retrieval" - Xintong Wang, Xiaoyu Li, Liang Ding, Sanyuan Zhao, and Chris Biemann
<p>Abstract: In this work, we present a self-supervised dual constraint contrastive method for efficiently fine-tuning the vision-language pre-trained (VLP) models that have achieved great success on various cross-modal tasks, since fully fine-tuning these pre-trained models is computationally expensive and tends to result in catastrophic forgetting, constrained by the size and quality of labeled datasets. Our approach freezes the pre-trained VLP models as the fundamental, generalized, and transferable multimodal representation and incorporates lightweight parameters to learn domain and task-specific features without labeled data. We demonstrated that our self-supervised dual contrastive model performs better than previous fine-tuning methods on MS COCO and Flickr 30K datasets on the cross-modal retrieval task, with an even more pronounced improvement in zero-shot performance. Furthermore, experiments on the MOTIF dataset prove that our self-supervised approach remains effective when trained on a small, out-of-domain dataset without overfitting. As a plug-and-play approach, our proposed method is agnostic to the underlying models and can be easily integrated with different VLP models, allowing for the potential incorporation of future advancements in VLP models.</p>
"Dimensions of Similarity: Towards Interpretable Dimension-Based Text Similarity" - Hans Ole Hatzel, Fynn Petersen-Frey, Tim Fischer and Chris Biemann.
<p>Abstract: This paper paves the way for interpretable and configurable semantic similarity search, by training state-of-the-art models for identifying textual similarity guided by a set of aspects or dimensions. The similarity models are analyzed as to which interpretable dimensions of similarity they place the most emphasis on. We conceptually introduce configurable similarity search for finding documents similar in specific aspects but dissimilar in others. To evaluate the interpretability of these dimensions, we experiment with downstream retrieval tasks using weighted combinations of these dimensions. Configurable similarity search is an invaluable tool for exploring datasets and will certainly be helpful in many applied natural language processing research applications.</p>
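The weighted combination of interpretable similarity dimensions described in the abstract can be sketched as follows. This is a minimal illustration with invented dimension names and scores, not the paper's actual model:

```python
def combined_similarity(dim_scores: dict, weights: dict) -> float:
    """Combine per-dimension similarity scores into one configurable score.

    dim_scores: similarity per interpretable dimension, each in [0, 1]
    weights:    user-chosen emphasis per dimension (normalized below)
    """
    total = sum(weights.values())
    return sum(dim_scores[d] * w / total for d, w in weights.items())

# Two documents similar in topic but dissimilar in sentiment
# (hypothetical dimensions and scores):
scores = {"topic": 0.9, "sentiment": 0.2, "style": 0.6}

# Emphasize topic, ignore sentiment -> high combined similarity
print(combined_similarity(scores, {"topic": 2.0, "sentiment": 0.0, "style": 1.0}))
```

With the sentiment weight set to zero, the two documents count as highly similar; raising that weight would pull the combined score down, which is the "similar in some aspects, dissimilar in others" search the abstract describes.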
<p>The papers will soon be available in our "Publications" section.</p>2023-05-08: Two papers accepted at ACL 2023<p>The '61st Annual Meeting of the Association for Computational Linguistics' (ACL 2023) accepted the following demo paper and short findings paper, respectively:</p>
"The D-WISE Tool Suite: Multi-Modal Machine-Learning-Powered Tools Supporting and Enhancing Digital Discourse Analysis" - Florian Schneider, Tim Fischer, Fynn Petersen-Frey, Isabel Eiser, Gertraud Koch and Chris Biemann
<p>Abstract: This work introduces the D-WISE Tool Suite (DWTS), a novel working environment for digital qualitative discourse analysis in the Digital Humanities (DH). The DWTS addresses limitations of current DH tools induced by the ever-increasing amount of heterogeneous, unstructured, and multi-modal data in which the discourses of contemporary societies are encoded. To provide meaningful insights from such data, our system leverages and combines state-of-the-art machine learning technologies from Natural Language Processing and Computer Vision. Further, the DWTS is conceived and developed by an interdisciplinary team of cultural anthropologists and computer scientists to ensure the tool's usability for modern DH research. Central features of the DWTS are: a) import of multi-modal data like text, image, audio, and video b) preprocessing pipelines for automatic annotations c) lexical and semantic search of documents d) manual span, bounding box, time-span, and frame annotations e) documentation of the research process.</p>
" The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing" - Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann
<p>Abstract: In this work, we analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We perform experiments within the context of knowledge graph question answering (KGQA), where the task is to convert questions in natural language to the SPARQL query language. We observe that the query vocabulary is usually distinct from human vocabulary. Language Models (LMs) are predominantly trained for human language tasks, and hence, if the query vocabulary is replaced with a vocabulary from the LM tokenizer, the performance of models may improve. We carry out carefully selected vocabulary substitutions on the queries and find absolute gains in the range of 17% on the GrailQA dataset.</p>
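The vocabulary-substitution idea can be illustrated with a toy mapping: SPARQL keywords and symbols are rewritten into common English words before the query reaches the LM, and the mapping is inverted on the model output. The substitution table below is invented for illustration; the paper's actual substitutions may differ:

```python
# Toy mapping from SPARQL vocabulary to tokenizer-friendly English words.
# Assumes the replacement words do not otherwise occur in the query.
SUBSTITUTIONS = {
    "SELECT": "find",
    "WHERE": "such that",
    "{": "begin",
    "}": "end",
}
INVERSE = {word: tok for tok, word in SUBSTITUTIONS.items()}

def to_lm_vocab(query: str) -> str:
    """Rewrite a SPARQL query into LM-friendly words (training target)."""
    for sparql_tok, word in SUBSTITUTIONS.items():
        query = query.replace(sparql_tok, word)
    return query

def from_lm_vocab(text: str) -> str:
    """Map the model's output back to executable SPARQL."""
    for word, sparql_tok in INVERSE.items():
        text = text.replace(word, sparql_tok)
    return text

q = "SELECT ?x WHERE { ?x dbo:author dbr:Kafka }"
encoded = to_lm_vocab(q)
print(encoded)  # find ?x such that begin ?x dbo:author dbr:Kafka end
assert from_lm_vocab(encoded) == q  # round-trip recovers the query
```

A T2T model would then be fine-tuned to generate the encoded form, which is decoded back to SPARQL before execution against the knowledge graph.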
<p>The papers will soon be available in our "Publications" section.</p>2023-02-14: CodeAnno accepted for EACL System Demonstrations<p>The CodeAnno demo paper was accepted for the 17th Conference of the European Chapter of the Association for Computational Linguistics (System Demonstrations Track):</p>
Schneider, F., Yimam, S.M., Petersen-Frey, F., Biemann, C., von Nordheim, G., Kleinen-von Königslöw, K. (2023): CodeAnno: Extending WebAnno with Hierarchical Document Level Annotation and Automation. The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), System Demonstrations Track, Dubrovnik, Croatia
<p>Abstract: WebAnno is one of the most popular annotation tools that supports generic annotation types and distributive annotation with multiple user roles. However, WebAnno focuses on annotating span-level mentions and relations among them, making document-level annotation complicated. When it comes to the annotation and analysis of social science materials, it usually involves the creation of codes to categorize a given document. The codes, which are known as codebooks, are typically hierarchical, which enables coding the document either with a general category or with more fine-grained subcategories. CodeAnno is forked from WebAnno and designed to solve the coding problems faced by many social science researchers with the following main functionalities: 1) creation of hierarchical codebooks, with functionality to move and sort categories in the hierarchy 2) an interactive UI for codebook annotation 3) import and export of annotations in CSV format, hence being compatible with existing annotations conducted using spreadsheet applications 4) integration of an external automation component to facilitate coding using machine learning 5) project templating that allows duplicating a project structure without copying the actual documents. We present different use-cases to demonstrate the capability of CodeAnno.</p>2022-08-31: LT Group will co-organize the AfriSenti-SemEval shared task (Task 12)
<p>LT Group, in collaboration with Masakhane, HausaNLP (Nigeria), ICT4D Research group (Ethiopia), and other researchers working on low-resource NLP, will organize the first AfriSenti-SemEval shared task (Task 12).</p>
<p>The AfriSenti-SemEval Shared Task 12 is based on a collection of Twitter datasets in 13 African languages for sentiment classification. It consists of three sub-tasks. Participants can select one or more tasks depending on their preference.</p>
Task Overview
Task A: Monolingual Sentiment Classification
<p>Given training data in a target language, determine the polarity of a tweet in the target language (positive, negative, or neutral). For messages conveying both a positive and a negative sentiment, whichever is the stronger sentiment should be chosen.</p>
Task B: Multilingual Sentiment Classification
<p>Given combined training data from 10 African languages, determine the polarity of a tweet in the target language (positive, negative, or neutral).</p>
Task C: Zero-Shot Sentiment Classification
<p>Given unlabeled tweets in two African languages (Tigrinya and Kinyarwanda), leverage any or all of the available training datasets in Subtasks 1 and 2 to determine whether the sentiment of a tweet in the two target languages is positive, negative, or neutral.</p>2022-07-07: New Book on Text Mining
<p>The second edition of the German standard textbook on text mining, "Wissensrohstoff Text", has finally entered the bookstores.</p>
<p>The book, authored by Chris Biemann, Gerhard Heyer and Uwe Quasthoff, provides a comprehensive understanding of the fundamentals and applications of text mining, illustrated with many examples and sample applications. It is targeted at students of computer science, business informatics, media informatics, computational linguistics or comparable disciplines; computer scientists with a professional interest in language technology and text mining; researchers in application areas of text mining from the humanities and social sciences, especially digital humanities and linguistics.</p>
<p>The glossary of this book provides working definitions for a wide range of terms in text mining, NLP and related fields. It can be accessed freely.</p>2022-07-06: Student group receives UHH excellence funding<img width="293" height="165" style="float:left" src="https://assets.rrz.uni-hamburg.de/instance_assets/fakmin/31313406/multimodalgroup-23b81b8d5138b9828a7c3297356907994ce4970c.png" /><p>Ali Ebrahimi Pourasad, Daniel Djahangir, Robert Geislinger and Deniz Gül were selected to receive prestigious and competitive funding from the University's program for student research groups, which is implemented as part of the excellence strategy at the University of Hamburg to support promising student research activities with up to 10,000 euros.</p>
<p>Ali, Daniel, Robert and Deniz receive the full funding of 10,000 euros for their project idea "Multimodal Learning - An App to Improve Human Reading with Active Eye-Tracking".</p>
<p>The aim of this project is to develop an application to actively support non-native speakers in learning a new language. The application will automatically recognize difficult words in a text and enrich the text with matching images, so that the identified difficult words are depicted. This is done with the help of machine learning and by tracking the user's eye movements.</p>
<p>The LT group supports this initiative and actively guides this group of highly motivated students.</p><p>Photo: LT</p>2022-06-19: Best Student Paper Award at DESRIST 2022<p>As part of the INSTANT project, the LT Group collaborated with the WISTS Group on the topic of utilising AI in the area of online customer service, and as a result the following paper won the Best Student Paper award at DESRIST 2022:</p>
"Let’s Team Up with AI! Toward a Hybrid Intelligence System for Online Customer Service" - Mathis Poser, Christina Wiethof, Debayan Banerjee, Varun Shankar, Richard Paucar, Eva Bittner
<p>The paper can be found here.</p>2022-06-14: LT goes beyond research: glorious success in cricket<p>Cricket, a very popular sport in England, India, Australia and elsewhere, is becoming quite popular in Europe as well. Abhik Jana, a member of LT, is also a regular member of one of the cricket clubs in Hamburg, THCC Rot-Gelb. On 12 June 2022, THCC Rot-Gelb finished the NDCV T20 Regionalliga 2022 as the winner, and Abhik Jana, as a core member of this winning team, finished the league as the top scorer. Congratulations to Abhik Jana!</p>
<p>Photo courtesy: THCC Rot-Gelb members.</p>2022-04-25: A Paper Accepted at Semantic Web Journal<p>The following survey paper was accepted for the special issue 'Deep Learning and Knowledge Graphs' of the Semantic Web Journal:</p>
Sevgili, Ö., Shelmanov, A., Arkhipov, M., Panchenko, A., Biemann, C. (2022): Neural entity linking: A survey of models based on deep learning, Semantic Web Journal, vol. 13, no. 3, pp. 527-570, IOS Press (2022), doi:10.3233/SW-222986
<p>Abstract: This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the “deep learning revolution” in natural language processing. Its goal is to systemize design features of neural entity linking systems and compare their performance to the remarkable classic methods on common benchmarks. This work distills a generic architecture of a neural EL system and discusses its components, such as candidate generation, mention-context encoding, and entity ranking, summarizing prominent methods for each of them. The vast variety of modifications of this general architecture are grouped by several common themes: joint entity mention detection and disambiguation, models for global linking, domain-independent techniques including zero-shot and distant supervision methods, and cross-lingual approaches. Since many neural models take advantage of entity and mention/context embeddings to represent their meaning, this work also overviews prominent entity embedding techniques. Finally, the survey touches on applications of entity linking, focusing on the recently emerged use-case of enhancing deep pre-trained masked language models based on the Transformer architecture.</p>2022-04-05: Four papers accepted at LREC 2022<p>The '13th Edition of the Language Resources and Evaluation Conference' (LREC 2022) has accepted the following papers:</p>
Meriem Beloucif, Seid Muhie Yimam, Steffen Stahlhacke and Chris Biemann (2022): Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions
Debjoy Saha, Shravan Nayak and Timo Baumann (2022): Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts
Fynn Petersen-Frey, Marcus Soll, Louis Kobras, Melf Johannsen, Peter Kling and Chris Biemann (2022): Dataset of Student Solutions to Algorithm and Data Structure Programming Assignments
Xintong Wang, Florian Schneider, Özge Alaçam, Prateek Chaudhury, and Chris Biemann (2022): MOTIF: Contextualized Images for Complex Words to Improve Human Reading
2022-03-31: Two Papers Accepted at SIGIR 2022<p>The '45th International ACM SIGIR Conference on Research and Development in Information Retrieval' accepted the following demo and short papers, respectively:</p>
"Golden Retriever: A Real-Time Multi-Modal Text-Image Retrieval System with the Ability to Focus" - Florian Schneider , Chris Biemann
<p>Abstract: In this work, we present the Golden Retriever, a system leveraging state-of-the-art visio-linguistic models for real-time text-image retrieval. The unique feature of our system is that it can focus on words contained in the textual query, i.e., locate and highlight them within retrieved images. An efficient two-stage process implements real-time capability and the ability to focus. In the first stage, we drastically reduce the number of images processed by a VLM. Then, in the second stage, we rank the images and highlight the focussed word using the outputs of a VLM. Further, we introduce a new and efficient algorithm based on the idea of TF-IDF to retrieve images for short textual queries. One of multiple use cases where we employ the Golden Retriever is a language learner scenario, where visual cues for "difficult" words within sentences are provided to improve a user's reading comprehension. However, since the backend is completely decoupled from the frontend, the system can be integrated into any other application where images must be retrieved fast. We demonstrate the Golden Retriever with screenshots of a minimalistic user interface.</p>
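The TF-IDF idea for retrieving images from short textual queries can be sketched as follows. This is a toy corpus with invented captions and filenames, not the actual Golden Retriever implementation, which works over VLM representations:

```python
import math
from collections import Counter

# Toy corpus: each image is represented by caption-like text
# (hypothetical filenames and captions for illustration).
images = {
    "img1.jpg": "a dog catches a frisbee in the park",
    "img2.jpg": "a golden retriever dog on the beach",
    "img3.jpg": "city skyline at night",
}

docs = {name: text.lower().split() for name, text in images.items()}
n_docs = len(docs)
# Document frequency: in how many captions does each term occur?
df = Counter(tok for toks in docs.values() for tok in set(toks))

def score(query: str, toks: list) -> float:
    """Sum of TF-IDF weights of the query terms within one caption."""
    tf = Counter(toks)
    return sum(
        tf[t] / len(toks) * math.log(n_docs / df[t])
        for t in query.lower().split() if df[t]  # skip unseen terms
    )

def retrieve(query: str) -> str:
    """Return the image whose caption scores highest for the query."""
    return max(docs, key=lambda name: score(query, docs[name]))

print(retrieve("golden retriever"))  # img2.jpg
```

Terms that appear in few captions get a high inverse-document-frequency weight, so even a one- or two-word query discriminates well between images, which is what makes the scheme attractive for short queries.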
"Modern baselines for SPARQL Semantic Parsing" - Debayan Banerjee , Pranav Ajit Nair, Jivat Neet Kaur, Ricardo Usbeck, Chris Biemann<br>
<p>Abstract: In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entity and relations have been provided, and the remaining task is to arrange them in the right order along with SPARQL vocabulary, and input tokens to produce the correct SPARQL query. We experiment with BART, T5 and PGN (Pointer Generator Networks). We show that T5 requires special input tokenisation, but produces state of the art performance on LC-QuAD 1.0 and LC-QuAD 2.0 datasets, and outperforms task-specific models from previous works. Moreover, the methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing.</p>
<p>The papers will soon be available in our "Publications" section.</p>2022-02-23: A Paper Accepted at ACL 2022<p>The '60th Annual Meeting of the Association for Computational Linguistics' (ACL 2022) accepted the following paper:</p>
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English, Ilias Chalkidis (University of Copenhagen), Abhik Jana (Universität Hamburg), Dirk Hartung (Bucerius Law School - Center for Legal Technology and Data Science), Michael James Bommarito (Michigan State College of Law), Ion Androutsopoulos (Athens University of Economics and Business), Daniel Martin Katz (Illinois Tech - Chicago Kent College of Law), Nikolaos Aletras (University of Sheffield)
<p>Abstract: Law, interpretations of law, legal arguments, agreements, etc. are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this currently open question, we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way. We also provide an evaluation and analysis of several generic and legal-oriented models demonstrating that the latter consistently offer performance improvements across multiple tasks.</p>
<p>The paper will soon be available in our "Publications" section.</p>2021-11-30: House of Computing and Data Science founded<p>The House of Computing and Data Science (HCDS) was officially founded on December 1, 2021. Under the direction of Chris Biemann, the HCDS enables and shapes the digital transformation of science and the humanities at Universität Hamburg and other scientific institutions in the local region.</p>2021-10-31: LT member wins the 'GSCL doctoral thesis award in memory of Wolfgang Hoeppner'
<p>Seid Muhie Yimam from the LT group wins the 2018-2020 GSCL Award for the best doctoral thesis in memory of Wolfgang Hoeppner. The prize is shared with Thomas Proisl from FAU Erlangen-Nürnberg.</p>
<p>The prize was awarded in a virtual ceremony on 29 October 2021.</p>2021-10-18: Angelie Kraft's Master's thesis "Triggering Models: Measuring and Mitigating Bias in German Language Generation", supervised by the LT group, wins EXPO 2021<p>Angelie Kraft has won the EXPO 2021 with a poster presentation of her recently completed Master's thesis "Triggering Models: Measuring and Mitigating Bias in German Language Generation". Congratulations!</p>
<p>While large language models can generate plausible and human-like texts, unfortunately, they also reproduce harmful stereotypes and biases. The thesis explored the issue of gender bias in a German version of GPT-2 and in GPT-3, which is natively fluent in German. Different facets of gender bias were measured with an automated classifier-based approach and additional metrics grounded in the social sciences. The classifier was trained and evaluated on a new crowd-sourced dataset. Experiments with a debiasing technique yielded some promising indications.</p>
<p>The thesis repository provides all data, the trained classifier, and scripts for examining and alleviating gender bias: https://github.com/krangelie/bias-in-german-nlg/.</p>
<p>The LT group is proud to have won the EXPO three years in a row! (See the 2020 and 2019 reports.)</p>