2024-02-21: LREC-COLING 2024 accepts 7 papers from LT members<p>The LREC-COLING 2024 conference has accepted 7 papers co-authored by LT members:</p>
Ahmad Shallouf, Hanna Herasimchyk, Mikhail Salnikov, Rudy Garrido Veliz, Natia Mestvirishvili, Alexander Panchenko, Chris Biemann and Irina Nikishina: End-to-End Open Domain Comparative Question Answering System
Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow and Seid Muhie Yimam: EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation
Punyajoy Saha, Aalok Agrawal, Abhik Jana, Chris Biemann and Animesh Mukherjee: On Zero-Shot Counterspeech Generation by LLMs
Tim Fischer, Florian Schneider, Fynn Petersen-Frey, Anja Silvia Mollah Haque, Isabel Eiser, Gertraud Koch and Chris Biemann: Extending the Discourse Analysis Tool Suite with Whiteboards for Visual Qualitative Analysis
Fynn Petersen-Frey and Chris Biemann: Dataset of Quotation Attribution in German News Articles
Hans Ole Hatzel and Chris Biemann: Tell me again! A Large-Scale Dataset of Multiple Summaries for the Same Story
Viktor Moskvoretskii, Alexander Panchenko and Irina Nikishina: Are Large Language Models Good at Lexical Semantics? A Case of Taxonomy Learning
2023-10-25: Participation in ERC Synergy Grant CultCryo<p>Whether in logistics, science, or air conditioning in the home, the possibility of artificial cooling has a fundamental influence on the world we live in. Yet this "artificial cryosphere" and its consequences, for example for climate change, have hardly been researched to date. The ERC project "CultCryo" aims to change that. It is investigating how the infrastructure of artificial cooling on the planet is linked to cultural practices, exemplified by the areas of food, space cooling, biomedicine and computer science. Among other things, the project will involve a historical reconstruction as well as an ethical discussion of the practices and norms associated with the cryosphere.</p>
<p>The project, which will receive about 9.9 million euros, is coordinated by Dr. Alexander Friedrich of the Leibniz Center for Literary and Cultural Research in Berlin, which acquired the project together with TU Darmstadt as the lead institution. UHH is involved as a project partner. Prof. Dr. Chris Biemann, Professor of Language Technology at UHH, and his team will provide the technical and computer-science expertise for the digital conceptual history of the "artificial cryosphere" through the "Sense Clustering Over Time" (SCoT) program. In addition to TU Darmstadt and UHH, the universities of Paderborn and Duisburg-Essen, the Australian National University Canberra, the University of Halle and the Institute for Social-Ecological Research Frankfurt are also involved.</p>
<p><span class="--l --r sentence_highlight"><span class="--l --r hover:bg-[#B4DAE8]">Press</span> <span class="--l --r hover:bg-[#B4DAE8]">release</span></span> hier </p>NAGR-fakmin-35977287-production2023-10-22T22:00:00ZPaper Accepted in Language Resources and Evaluation (LREV)<p>The following paper has been accepted and available online in Language Resources & Evaluation:</p>
Anwar, S., Shelmanov, A., Arefyev, N., Panchenko, A., Biemann, C. (2023). Text augmentation for semantic frame induction and parsing. Language Resources & Evaluation. https://doi.org/10.1007/s10579-023-09679-8
<p>Abstract: Semantic frames are formal structures describing situations, actions or events, e.g., Commerce buy, Kidnapping, or Exchange. Each frame provides a set of frame elements or semantic roles corresponding to participants of the situation and lexical units (LUs)—words and phrases that can evoke this particular frame in texts. For example, for the frame Kidnapping, two key roles are the Perpetrator and the Victim, and this frame can be evoked with lexical units abduct, kidnap, or snatcher. While formally sound, the scarce availability of semantic frame resources and their limited lexical coverage hinder the wider adoption of frame semantics across languages and domains. To tackle this problem, firstly, we propose a method that takes as input a few frame-annotated sentences and generates alternative lexical realizations of lexical units and semantic roles matching the original frame definition. Secondly, we show that the obtained synthetically generated semantic frame annotated examples help to improve the quality of frame-semantic parsing. To evaluate our proposed approach, we decompose our work into two parts. In the first part of text augmentation for LUs and roles, we experiment with various types of models such as distributional thesauri, non-contextualized word embeddings (word2vec, fastText, GloVe), and Transformer-based contextualized models, such as BERT or XLNet. We perform the intrinsic evaluation of these induced lexical substitutes using FrameNet gold annotations. Models based on Transformers show overall superior performance, however, they do not always outperform simpler models (based on static embeddings) unless information about the target word is suitably injected. However, we observe that non-contextualized models also show comparable performance on the task of LU expansion. We also show that combining substitutes of individual models can significantly improve the quality of final substitutes.
Because intrinsic evaluation scores depend heavily on the gold dataset and on frame preservation, which cannot be ensured by an automatic evaluation mechanism due to the incompleteness of gold datasets, we also carried out experiments with manual evaluation on sample datasets to further analyze the usefulness of our approach. The results show that the manual evaluation framework significantly outperforms automatic evaluation for lexical substitution. For extrinsic evaluation, the second part of this work assesses the utility of these lexical substitutes for the improvement of frame-semantic parsing. We took a small set of frame-annotated sentences and augmented them by replacing corresponding target words with their closest substitutes, obtained from best-performing models. Our extensive experiments on the original and augmented set of annotations with two semantic parsers show that our method is effective for improving the downstream parsing task by training set augmentation, as well as for quickly building FrameNet-like resources for new languages or subject domains.</p>2023-09-21: GSCL master's thesis award 2023 goes to Florian Schneider
<p>Every two years, the German Society for Language Technology and Computational Linguistics (GSCL) awards prizes for the best bachelor's and master's theses. At the German Conference on Natural Language Processing, KONVENS 2023 in Ingolstadt, two master's thesis finalists were invited to present their theses.</p>
<p>This year's award for the best master's thesis goes to Florian Schneider for his thesis 'Self-supervised Multi-Modal Text-Image Retrieval Methods to Improve Human Reading' supervised by Özge Alaçam, Xintong Wang, and Chris Biemann.</p>
2023-07-16: Two papers accepted at ECAI 2023<p>The '26th European Conference on Artificial Intelligence' (ECAI 2023) accepted the following papers:</p>
"Using Self-Supervised Dual Constraint Contrastive Learning for Cross-modal Retrieval" - Xintong Wang, Xiaoyu Li, Liang Ding, Sanyuan Zhao, and Chris Biemann
<p>Abstract: In this work, we present a self-supervised dual constraint contrastive method for efficiently fine-tuning the vision-language pre-trained (VLP) models that have achieved great success on various cross-modal tasks, since fully fine-tuning these pre-trained models is computationally expensive and tends to result in catastrophic forgetting, constrained by the size and quality of labeled datasets. Our approach freezes the pre-trained VLP models as the fundamental, generalized, and transferable multimodal representation and incorporates lightweight parameters to learn domain and task-specific features without labeled data. We demonstrated that our self-supervised dual contrastive model performs better than previous fine-tuning methods on MS COCO and Flickr 30K datasets on the cross-modal retrieval task, with an even more pronounced improvement in zero-shot performance. Furthermore, experiments on the MOTIF dataset prove that our self-supervised approach remains effective when trained on a small, out-of-domain dataset without overfitting. As a plug-and-play approach, our proposed method is agnostic to the underlying models and can be easily integrated with different VLP models, allowing for the potential incorporation of future advancements in VLP models.</p>
"Dimensions of Similarity: Towards Interpretable Dimension-Based Text Similarity" - Hans Ole Hatzel, Fynn Petersen-Frey, Tim Fischer and Chris Biemann.
<p>Abstract: This paper paves the way for interpretable and configurable semantic similarity search, by training state-of-the-art models for identifying textual similarity guided by a set of aspects or dimensions. The similarity models are analyzed as to which interpretable dimensions of similarity they place the most emphasis on. We conceptually introduce configurable similarity search for finding documents similar in specific aspects but dissimilar in others. To evaluate the interpretability of these dimensions, we experiment with downstream retrieval tasks using weighted combinations of these dimensions. Configurable similarity search is an invaluable tool for exploring datasets and will certainly be helpful in many applied natural language processing research applications.</p>
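The weighted combination of interpretable similarity dimensions described in the abstract can be sketched as follows. This is a minimal illustration with invented dimension names and scores, not the paper's actual model:

```python
def combined_similarity(dim_scores: dict, weights: dict) -> float:
    """Combine per-dimension similarity scores into one configurable score.

    dim_scores: similarity per interpretable dimension, each in [0, 1]
    weights:    user-chosen emphasis per dimension (normalized below)
    """
    total = sum(weights.values())
    return sum(dim_scores[d] * w / total for d, w in weights.items())

# Two documents similar in topic but dissimilar in sentiment
# (hypothetical dimensions and scores):
scores = {"topic": 0.9, "sentiment": 0.2, "style": 0.6}

# Emphasize topic, ignore sentiment -> high combined similarity
print(combined_similarity(scores, {"topic": 2.0, "sentiment": 0.0, "style": 1.0}))
```

With the sentiment weight set to zero, the two documents count as highly similar; raising that weight would pull the combined score down, which is the "similar in some aspects, dissimilar in others" search the abstract describes.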
<p>The papers will soon be available in our "Publications" section.</p>2023-05-08: Two papers accepted at ACL 2023<p>The '61st Annual Meeting of the Association for Computational Linguistics' (ACL 2023) accepted the following demo paper and short findings paper, respectively:</p>
"The D-WISE Tool Suite: Multi-Modal Machine-Learning-Powered Tools Supporting and Enhancing Digital Discourse Analysis" - Florian Schneider, Tim Fischer, Fynn Petersen-Frey, Isabel Eiser, Gertraud Koch and Chris Biemann
<p>Abstract: This work introduces the D-WISE Tool Suite (DWTS), a novel working environment for digital qualitative discourse analysis in the Digital Humanities (DH). The DWTS addresses limitations of current DH tools induced by the ever-increasing amount of heterogeneous, unstructured, and multi-modal data in which the discourses of contemporary societies are encoded. To provide meaningful insights from such data, our system leverages and combines state-of-the-art machine learning technologies from Natural Language Processing and Computer Vision. Further, the DWTS is conceived and developed by an interdisciplinary team of cultural anthropologists and computer scientists to ensure the tool's usability for modern DH research. Central features of the DWTS are: a) import of multi-modal data like text, image, audio, and video b) preprocessing pipelines for automatic annotations c) lexical and semantic search of documents d) manual span, bounding box, time-span, and frame annotations e) documentation of the research process.</p>
" The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing" - Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann
<p>Abstract: In this work, we analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We perform experiments within the context of knowledge graph question answering (KGQA), where the task is to convert questions in natural language to the SPARQL query language. We observe that the query vocabulary is usually distinct from human vocabulary. Language Models (LMs) are predominantly trained for human language tasks, and hence, if the query vocabulary is replaced with a vocabulary from the LM tokenizer, the performance of models may improve. We carry out carefully selected vocabulary substitutions on the queries and find absolute gains in the range of 17% on the GrailQA dataset.</p>
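The vocabulary-substitution idea can be illustrated with a toy mapping: SPARQL keywords and symbols are rewritten into common English words before the query reaches the LM, and the mapping is inverted on the model output. The substitution table below is invented for illustration; the paper's actual substitutions may differ:

```python
# Toy mapping from SPARQL vocabulary to tokenizer-friendly English words.
# Assumes the replacement words do not otherwise occur in the query.
SUBSTITUTIONS = {
    "SELECT": "find",
    "WHERE": "such that",
    "{": "begin",
    "}": "end",
}
INVERSE = {word: tok for tok, word in SUBSTITUTIONS.items()}

def to_lm_vocab(query: str) -> str:
    """Rewrite a SPARQL query into LM-friendly words (training target)."""
    for sparql_tok, word in SUBSTITUTIONS.items():
        query = query.replace(sparql_tok, word)
    return query

def from_lm_vocab(text: str) -> str:
    """Map the model's output back to executable SPARQL."""
    for word, sparql_tok in INVERSE.items():
        text = text.replace(word, sparql_tok)
    return text

q = "SELECT ?x WHERE { ?x dbo:author dbr:Kafka }"
encoded = to_lm_vocab(q)
print(encoded)  # find ?x such that begin ?x dbo:author dbr:Kafka end
assert from_lm_vocab(encoded) == q  # round-trip recovers the query
```

A T2T model would then be fine-tuned to generate the encoded form, which is decoded back to SPARQL before execution against the knowledge graph.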
<p>The papers will soon be available in our "Publications" section.</p>2023-02-14: CodeAnno accepted for EACL System Demonstrations<p>The CodeAnno demo paper was accepted for the 17th Conference of the European Chapter of the Association for Computational Linguistics (System Demonstrations Track):</p>
Schneider, F., Yimam, S.M., Petersen-Frey, F., Biemann, C., von Nordheim, G., Kleinen-von Königslöw, K. (2023): CodeAnno: Extending WebAnno with Hierarchical Document Level Annotation and Automation. The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), System Demonstrations Track, Dubrovnik, Croatia
<p>Abstract: WebAnno is one of the most popular annotation tools that supports generic annotation types and distributive annotation with multiple user roles. However, WebAnno focuses on annotating span-level mentions and relations among them, making document-level annotation complicated. When it comes to the annotation and analysis of social science materials, it usually involves the creation of codes to categorize a given document. The codes, which are known as codebooks, are typically hierarchical, which enables coding the document either with a general category or with more fine-grained subcategories. CodeAnno is forked from WebAnno and designed to solve the coding problems faced by many social science researchers with the following main functionalities: 1) creation of hierarchical codebooks, with functionality to move and sort categories in the hierarchy 2) an interactive UI for codebook annotation 3) import and export of annotations in CSV format, hence being compatible with existing annotations conducted using spreadsheet applications 4) integration of an external automation component to facilitate coding using machine learning 5) project templating that allows duplicating a project structure without copying the actual documents. We present different use-cases to demonstrate the capability of CodeAnno.</p>2022-08-31: LT Group will co-organize the AfriSenti-SemEval shared task (Task 12)
<p>LT Group, in collaboration with Masakhane, HausaNLP (Nigeria), ICT4D Research group (Ethiopia), and other researchers working on low-resource NLP, will organize the first AfriSenti-SemEval shared task (Task 12).</p>
<p>The AfriSenti-SemEval Shared Task 12 is based on a collection of Twitter datasets in 13 African languages for sentiment classification. It consists of three sub-tasks. Participants can select one or more tasks depending on their preference.</p>
Task Overview
Task A: Monolingual Sentiment Classification
<p>Given training data in a target language, determine the polarity of a tweet in the target language (positive, negative, or neutral). For messages conveying both a positive and a negative sentiment, whichever is the stronger sentiment should be chosen.</p>
Task B: Multilingual Sentiment Classification
<p>Given combined training data from 10 African languages, determine the polarity of a tweet in the target language (positive, negative, or neutral).</p>
Task C: Zero-Shot Sentiment Classification
<p>Given unlabeled tweets in two African languages (Tigrinya and Kinyarwanda), leverage any or all of the available training datasets in Subtasks 1 and 2 to determine whether the sentiment of a tweet in the two target languages is positive, negative, or neutral.</p>2022-07-07: New Book on Text Mining
<p>The second edition of the German standard textbook on text mining, "Wissensrohstoff Text", has finally entered the bookstores.</p>
<p>The book, authored by Chris Biemann, Gerhard Heyer and Uwe Quasthoff, provides a comprehensive understanding of the fundamentals and applications of text mining, illustrated with many examples and sample applications. It is targeted at students of computer science, business informatics, media informatics, computational linguistics or comparable disciplines; computer scientists with a professional interest in language technology and text mining; researchers in application areas of text mining from the humanities and social sciences, especially digital humanities and linguistics.</p>
<p>The glossary of this book provides working definitions for a wide range of terms in text mining, NLP and related fields. It can be accessed freely.</p>2022-07-06: Student group receives UHH excellence funding<img width="293" height="165" style="float:left" src="https://assets.rrz.uni-hamburg.de/instance_assets/fakmin/31313406/multimodalgroup-23b81b8d5138b9828a7c3297356907994ce4970c.png" /><p>Ali Ebrahimi Pourasad, Daniel Djahangir, Robert Geislinger and Deniz Gül were selected to receive prestigious and competitive funding from the University's program for student research groups, which is implemented as part of the excellence strategy at the University of Hamburg to support promising student research activities with up to 10,000 euros.</p>
<p>Ali, Daniel, Robert and Deniz receive the full funding of 10,000 euros for their project idea "Multimodal Learning - An App to Improve Human Reading with Active Eye-Tracking".</p>
<p>The aim of this project is to develop an application to actively support non-native speakers in learning a new language. The application will automatically recognize difficult words in a text and enrich the text with matching images, so that the identified difficult words are depicted. This is done with the help of machine learning and by tracking the user's eye movements.</p>
<p>The LT group supports this initiative and actively guides this group of highly motivated students.</p><p>Photo: LT</p>2022-06-19: Best Student Paper Award at DESRIST 2022<p>As part of the INSTANT project, the LT Group collaborated with the WISTS Group on the topic of utilising AI in the area of online customer service, and as a result the following paper won the Best Student Paper award at DESRIST 2022:</p>
"Let’s Team Up with AI! Toward a Hybrid Intelligence System for Online Customer Service" - Mathis Poser, Christina Wiethof, Debayan Banerjee, Varun Shankar, Richard Paucar, Eva Bittner
<p>The paper can be found here.</p>2022-06-14: LT goes beyond research: glorious success in cricket<p>Cricket, a very popular sport in England, India, Australia and elsewhere, is becoming quite popular in Europe as well. Abhik Jana, a member of LT, is also a regular member of one of the cricket clubs in Hamburg, THCC Rot-Gelb. On 12 June 2022, THCC Rot-Gelb finished the NDCV T20 Regionalliga 2022 as the winner, and Abhik Jana, as a core member of this winning team, finished the league as the top scorer. Congratulations to Abhik Jana!</p>
<p>Photo courtesy: THCC Rot-Gelb members.</p>2022-04-25: A Paper Accepted at Semantic Web Journal<p>The following survey paper was accepted for the special issue 'Deep Learning and Knowledge Graphs' of the Semantic Web Journal:</p>
Sevgili, Ö., Shelmanov, A., Arkhipov, M., Panchenko, A., Biemann, C. (2022): Neural entity linking: A survey of models based on deep learning, Semantic Web Journal, vol. 13, no. 3, pp. 527-570, IOS Press (2022), doi:10.3233/SW-222986
<p>Abstract: This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015 as a result of the “deep learning revolution” in natural language processing. Its goal is to systemize design features of neural entity linking systems and compare their performance to the remarkable classic methods on common benchmarks. This work distills a generic architecture of a neural EL system and discusses its components, such as candidate generation, mention-context encoding, and entity ranking, summarizing prominent methods for each of them. The vast variety of modifications of this general architecture are grouped by several common themes: joint entity mention detection and disambiguation, models for global linking, domain-independent techniques including zero-shot and distant supervision methods, and cross-lingual approaches. Since many neural models take advantage of entity and mention/context embeddings to represent their meaning, this work also overviews prominent entity embedding techniques. Finally, the survey touches on applications of entity linking, focusing on the recently emerged use-case of enhancing deep pre-trained masked language models based on the Transformer architecture.</p>2022-04-05: Four papers accepted at LREC 2022<p>The '13th Edition of the Language Resources and Evaluation Conference' (LREC 2022) has accepted the following papers:</p>
Meriem Beloucif, Seid Muhie Yimam, Steffen Stahlhacke and Chris Biemann (2022): Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions
Debjoy Saha, Shravan Nayak and Timo Baumann (2022): Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts
Fynn Petersen-Frey, Marcus Soll, Louis Kobras, Melf Johannsen, Peter Kling and Chris Biemann (2022): Dataset of Student Solutions to Algorithm and Data Structure Programming Assignments
Xintong Wang, Florian Schneider, Özge Alaçam, Prateek Chaudhury, and Chris Biemann (2022): MOTIF: Contextualized Images for Complex Words to Improve Human Reading
2022-03-31: Two Papers Accepted at SIGIR 2022<p>The '45th International ACM SIGIR Conference on Research and Development in Information Retrieval' accepted the following demo and short papers, respectively:</p>
"Golden Retriever: A Real-Time Multi-Modal Text-Image Retrieval System with the Ability to Focus" - Florian Schneider , Chris Biemann
<p>Abstract: In this work, we present the Golden Retriever, a system leveraging state-of-the-art visio-linguistic models for real-time text-image retrieval. The unique feature of our system is that it can focus on words contained in the textual query, i.e., locate and highlight them within retrieved images. An efficient two-stage process implements real-time capability and the ability to focus. In the first stage, we drastically reduce the number of images processed by a VLM. Then, in the second stage, we rank the images and highlight the focussed word using the outputs of a VLM. Further, we introduce a new and efficient algorithm based on the idea of TF-IDF to retrieve images for short textual queries. One of multiple use cases where we employ the Golden Retriever is a language learner scenario, where visual cues for "difficult" words within sentences are provided to improve a user's reading comprehension. However, since the backend is completely decoupled from the frontend, the system can be integrated into any other application where images must be retrieved fast. We demonstrate the Golden Retriever with screenshots of a minimalistic user interface.</p>
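The TF-IDF idea for retrieving images from short textual queries can be sketched as follows. This is a toy corpus with invented captions and filenames, not the actual Golden Retriever implementation, which works over VLM representations:

```python
import math
from collections import Counter

# Toy corpus: each image is represented by caption-like text
# (hypothetical filenames and captions for illustration).
images = {
    "img1.jpg": "a dog catches a frisbee in the park",
    "img2.jpg": "a golden retriever dog on the beach",
    "img3.jpg": "city skyline at night",
}

docs = {name: text.lower().split() for name, text in images.items()}
n_docs = len(docs)
# Document frequency: in how many captions does each term occur?
df = Counter(tok for toks in docs.values() for tok in set(toks))

def score(query: str, toks: list) -> float:
    """Sum of TF-IDF weights of the query terms within one caption."""
    tf = Counter(toks)
    return sum(
        tf[t] / len(toks) * math.log(n_docs / df[t])
        for t in query.lower().split() if df[t]  # skip unseen terms
    )

def retrieve(query: str) -> str:
    """Return the image whose caption scores highest for the query."""
    return max(docs, key=lambda name: score(query, docs[name]))

print(retrieve("golden retriever"))  # img2.jpg
```

Terms that appear in few captions get a high inverse-document-frequency weight, so even a one- or two-word query discriminates well between images, which is what makes the scheme attractive for short queries.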
"Modern baselines for SPARQL Semantic Parsing" - Debayan Banerjee , Pranav Ajit Nair, Jivat Neet Kaur, Ricardo Usbeck, Chris Biemann<br>
<p>Abstract: In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entity and relations have been provided, and the remaining task is to arrange them in the right order along with SPARQL vocabulary, and input tokens to produce the correct SPARQL query. We experiment with BART, T5 and PGN (Pointer Generator Networks). We show that T5 requires special input tokenisation, but produces state of the art performance on LC-QuAD 1.0 and LC-QuAD 2.0 datasets, and outperforms task-specific models from previous works. Moreover, the methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing.</p>
<p>The papers will soon be available in our "Publications" section.</p>2022-02-23: A Paper Accepted at ACL 2022<p>The '60th Annual Meeting of the Association for Computational Linguistics' (ACL 2022) accepted the following paper:</p>
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English, Ilias Chalkidis (University of Copenhagen), Abhik Jana (Universität Hamburg), Dirk Hartung (Bucerius Law School - Center for Legal Technology and Data Science), Michael James Bommarito (Michigan State College of Law), Ion Androutsopoulos (Athens University of Economics and Business), Daniel Martin Katz (Illinois Tech - Chicago Kent College of Law), Nikolaos Aletras (University of Sheffield)
<p>Abstract: Law, interpretations of law, legal arguments, agreements, etc. are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this currently open question, we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way. We also provide an evaluation and analysis of several generic and legal-oriented models demonstrating that the latter consistently offer performance improvements across multiple tasks.</p>
<p>The paper will soon be available in our "Publications" section.</p>2021-11-30: House of Computing and Data Science founded<p>The House of Computing and Data Science (HCDS) was officially founded on December 1, 2021. Under the direction of Chris Biemann, the HCDS enables and shapes the digital transformation of science and the humanities at Universität Hamburg and other scientific institutions in the local region.</p>2021-10-31: LT member wins the 'GSCL doctoral thesis award in memory of Wolfgang Hoeppner'
<p>Seid Muhie Yimam from the LT group wins the 2018-2020 GSCL Award for the best doctoral thesis in memory of Wolfgang Hoeppner. The prize is shared with Thomas Proisl from FAU Erlangen-Nürnberg.</p>
<p>The prize was awarded in a virtual ceremony on 29 October 2021.</p>2021-10-18: Angelie Kraft's Master's thesis "Triggering Models: Measuring and Mitigating Bias in German Language Generation", supervised by the LT group, wins EXPO 2021<p>Angelie Kraft has won the EXPO 2021 with a poster presentation of her recently completed Master's thesis "Triggering Models: Measuring and Mitigating Bias in German Language Generation". Congratulations!</p>
<p>While large language models can generate plausible and human-like texts, unfortunately, they also reproduce harmful stereotypes and biases. The thesis explored the issue of gender bias in a German version of GPT-2 and in GPT-3, which is natively fluent in German. Different facets of gender bias were measured with an automated classifier-based approach and additional metrics grounded in the social sciences. The classifier was trained and evaluated on a new crowd-sourced dataset. Experiments with a debiasing technique yielded some promising indications.</p>
<p>The thesis repository provides all data, the trained classifier, and scripts for examining and alleviating gender bias: https://github.com/krangelie/bias-in-german-nlg/.</p>
<p>The LT group is proud to have won the EXPO three years in a row! (See the 2020 and 2019 reports.)</p>