Knowledge Technology at ESANN'24
22 October 2024
Our group members Ozan Özdemir and Connor Gäde attended the ESANN'24 conference in Bruges, Belgium (October 9-11, 2024), where they presented their paper "Embodying Language Models In Robot Action". Here is more information about the paper:
Authors: Connor Gaede, Ozan Özdemir, Cornelius Weber, Stefan Wermter
Abstract: Large language models (LLMs) have achieved significant recent success in deep learning. The remaining challenges in robotics and human-robot interaction (HRI) still need to be tackled but off-the-shelf pre-trained LLMs with advanced language and reasoning capabilities can provide solutions to problems in the field. In this work, we realise an open-ended HRI scenario involving a humanoid robot communicating with a human while performing robotic object manipulation tasks at a table. To this end, we combine pre-trained general models of speech recognition, vision-language, text-to-speech and open-world object detection with robot-specific models of visuospatial coordinate transfer and inverse kinematics, as well as a task-specific motion model. Our experiments reveal robust performance by the language model in accurately selecting the task mode and by the whole model in correctly executing actions during open-ended dialogue. Our innovative architecture enables a seamless integration of open-ended dialogue, scene description, open-world object detection and action execution. It is promising as a modular solution for diverse robotic platforms and HRI scenarios.
You can reach the full paper here.