1 April 2022, by Florian Schneider
The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval has accepted the following demo paper and short paper, respectively:
- "Golden Retriever: A Real-Time Multi-Modal Text-Image Retrieval System with the Ability to Focus" - Florian Schneider, Chris Biemann
Abstract: In this work, we present the Golden Retriever, a system leveraging state-of-the-art visio-linguistic models (VLMs) for real-time text-image retrieval. The unique feature of our system is that it can focus on words contained in the textual query, i.e., locate and highlight them within retrieved images. An efficient two-stage process provides both the real-time capability and the ability to focus. To this end, we first drastically reduce the number of images processed by a VLM. Then, in the second stage, we rank the images and highlight the focussed word using the outputs of a VLM. Further, we introduce a new and efficient algorithm based on the idea of TF-IDF to retrieve images for short textual queries. One of several use cases where we employ the Golden Retriever is a language learner scenario, where visual cues for "difficult" words within sentences are provided to improve a user's reading comprehension. However, since the backend is completely decoupled from the frontend, the system can be integrated into any other application where images must be retrieved fast. We demonstrate the Golden Retriever with screenshots of a minimalistic user interface.
- "Modern baselines for SPARQL Semantic Parsing" - Debayan Banerjee, Pranav Ajit Nair, Jivat Neet Kaur, Ricardo Usbeck, Chris Biemann
Abstract: In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entities and relations have been provided, and the remaining task is to arrange them in the right order, along with SPARQL vocabulary and input tokens, to produce the correct SPARQL query. We experiment with BART, T5 and PGN (Pointer Generator Networks). We show that T5 requires special input tokenisation, but produces state-of-the-art performance on the LC-QuAD 1.0 and LC-QuAD 2.0 datasets, and outperforms task-specific models from previous works. Moreover, the methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing.
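The Golden Retriever abstract mentions an algorithm "based on the idea of TF-IDF" for retrieving images for short textual queries, but does not describe it. What follows is not that algorithm. It is a generic TF-IDF ranking in Python, written under our own assumption that each image is represented by a hypothetical text caption (the names `build_index`, `rank_images` and the sample captions are illustrative only). It shows how a cheap lexical scoring step could shrink the candidate set before a more expensive VLM re-ranks the survivors.

```python
import math
from collections import Counter


def build_index(captions: dict[str, str]) -> tuple[dict[str, Counter], dict[str, float]]:
    """Tokenise each (hypothetical) image caption and compute a smoothed IDF per term."""
    docs = {img: Counter(text.lower().split()) for img, text in captions.items()}
    n_docs = len(docs)
    # Document frequency: in how many captions each term appears at least once.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    # Smoothed IDF so that terms present in every caption still get a small positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1.0 for term, freq in df.items()}
    return docs, idf


def rank_images(query: str, docs: dict[str, Counter], idf: dict[str, float], top_k: int = 3) -> list[str]:
    """Score images by the summed TF-IDF weight of the query terms; keep the top_k matches."""
    terms = query.lower().split()
    scores = {}
    for img, counts in docs.items():
        total = sum(counts.values())
        # Term frequency is normalised by caption length; unknown terms contribute 0.
        score = sum((counts[t] / total) * idf.get(t, 0.0) for t in terms)
        if score > 0:
            scores[img] = score
    # Highest-scoring images first; these would be the candidates passed to a VLM.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


captions = {
    "img1.jpg": "a golden retriever playing in the park",
    "img2.jpg": "a cat sleeping on a sofa",
    "img3.jpg": "a dog and a cat in the park",
}
docs, idf = build_index(captions)
print(rank_images("retriever park", docs, idf))
```

For the query "retriever park", the first caption matches both terms and ranks first, the third matches only "park", and the second is filtered out because it matches neither. Because short queries contain only a handful of terms, this kind of scoring stays cheap even over many images, which is the property a first retrieval stage needs.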
The papers will soon be available in our "Publications" section.