Intelligent Virtual Agents
HCI Master Project WS 25/26: Intelligent Virtual Agents
Goals and Motivation
In this project, the goal was to work with intelligent virtual agents (IVAs) - AI-controlled entities mimicking human behavior and communication style. We wanted to find out how to increase the usability, social presence, or user experience of IVAs in virtual reality. For this, four project groups of 3-4 students each worked on different topics, all related to IVAs. Over the course of the semester, the students received input on how to conduct user studies in HCI, how to identify suitable literature to strengthen their research, and how to implement a functional prototype. Finally, the students conducted a user study with their prototype, collected quantitative and qualitative data, and documented their work in the form of a research paper. A short version of their research, as well as images and videos, is presented on this website. The four groups researched the following topics:
- Deictic Interactions with an IVA: Evaluating Eye Gaze, Finger Pointing and Verbal Description in VR
- Virtual Agents vs. Static Picture Instructions in VR-Based Therapeutic Exercises: A Comparative Study
- Non-verbal Communication of Virtual Assistants
- Interrupting an AI: Effects of Verbal Interruption on Naturalness of Conversation and Agent Perception
Deictic Interactions with an IVA
Evaluating Eye Gaze, Finger Pointing and Verbal Description in VR
Goal
Intelligent virtual agents (IVAs) integrated into smart glasses and mixed reality headsets offer innovative and intuitive ways to interact with virtual environments. This study investigates how users most effectively reference objects in virtual spaces—using gaze, pointing, or speech. A total of 39 participants completed tasks in virtual reality to evaluate the performance and usability of these interaction methods.

Procedure
The study was conducted in a controlled virtual reality (VR) environment developed using the Unity engine. Participants wore a Meta Quest Pro head-mounted display (HMD) equipped with eye-tracking capabilities and were instructed to reference objects using one of three methods: gaze, pointing, or speech. To familiarize themselves with each method, participants first practiced in a "playground" scene. They then completed a specific task in a virtual room, such as identifying the oldest image among a set of images. To ensure reliable results and mitigate learning effects, the order of tasks and interaction methods was randomized. After completing each of the three conditions, participants removed the HMD and filled out a questionnaire. Task performance was measured based on completion time and the number of prompts required. Additionally, usability feedback was collected using tools such as the User Experience Questionnaire (UEQ) and NASA Task Load Index (NASA-TLX).
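As an illustration of how the gaze condition can be realized in Unity, the sketch below casts a ray from the eye-gaze pose reported by the headset and resolves the first object it hits as the referenced object. This is a minimal, hedged sketch and not the group's actual implementation; the `gazeOrigin` transform and the layer mask are assumed to be provided by the eye-tracking setup.

```csharp
using UnityEngine;

// Minimal sketch: resolve which scene object the user is currently looking at.
// Assumes an eye-tracking component drives `gazeOrigin` with the combined
// gaze origin and direction (names are hypothetical).
public class GazeReferenceResolver : MonoBehaviour
{
    [SerializeField] private Transform gazeOrigin;        // pose driven by the HMD's eye tracker
    [SerializeField] private float maxDistance = 10f;
    [SerializeField] private LayerMask referencableObjects;

    // The object the user is currently gazing at, or null if none.
    public GameObject CurrentTarget { get; private set; }

    private void Update()
    {
        if (Physics.Raycast(gazeOrigin.position, gazeOrigin.forward,
                            out RaycastHit hit, maxDistance, referencableObjects))
        {
            CurrentTarget = hit.collider.gameObject;
        }
        else
        {
            CurrentTarget = null;
        }
    }
}
```

When the user issues a spoken prompt, the currently gazed-at object could then be attached to the prompt so the agent knows which object is being referenced.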
Results
The study encountered several unforeseen challenges, including eye tracker malfunctions, inaccurate speech-to-text transcriptions, and ChatGPT loading issues. Despite these obstacles, significant differences and trends emerged among the interaction methods, highlighting opportunities for further research. Notably, most participants reported frequent use of ChatGPT, underscoring the relevance of this study within the context of growing conversational AI. Interaction using gaze proved to be the fastest method and produced the fewest unclear prompts, while verbal communication was the least precise. Although gaze-based interaction had the lowest accuracy rate (66.7%), this difference was not statistically significant. Usability scores indicated that all methods performed well overall; however, efficiency was lower for both verbal communication and pointing, while gaze scored highest in dependability and clarity. Workload levels were generally moderate, but verbal communication required the most effort. Pointing reduced mental workload but imposed a higher physical workload. Participants expressed distinct preferences: gaze-based interaction was considered the most enjoyable, pointing was seen as the most natural, and many participants favored multimodal interaction to combine the strengths of different methods.
Outlook
While gaze-based interaction shows promise, technical issues may have influenced the results and should be addressed in future research. Enhancing feedback mechanisms could improve usability across all input methods, reducing mental strain and increasing control. Multimodal interaction is a key avenue for future work, as user preferences vary and a flexible system could improve accessibility. This study highlights the strengths and limitations of each method, paving the way for more natural and effective IVA interactions in VR.
Project Team
- Lina Kaschub
- Ugur Turhan
- Bado Völckers
- Philipp Huesmann
Virtual Agents vs. Static Picture Instructions in VR-Based Therapeutic Exercises
A Comparative Study
This video shows the agent and pictorial instructions, as well as the audio cues during execution.
Motivation
Virtual Reality (VR) combined with Virtual Agents (VAs) offers new possibilities for home-based rehabilitation, particularly for patients with vestibular disorders such as Benign Paroxysmal Positional Vertigo (BPPV). Traditional instructional methods, such as static picture-text guides, may lack engagement and clarity, making correct execution of therapeutic exercises challenging. This study explores whether VA-guided instructions in VR can enhance exercise execution accuracy, improve User Experience, and increase engagement compared to conventional picture instructions.

Procedure
Participants performed two standard BPPV rehabilitation exercises — the Sémont manoeuvre and the Brandt-Daroff exercise — using the Meta Quest 3 VR headset. A within-subject design was implemented, where each participant experienced both conditions: one guided by a VA providing step-by-step demonstrations and one using static picture instructions. Motion tracking data was collected to assess movement accuracy, while validated questionnaires measured User Experience (UEQ), mental workload (NASA-TLX), Simulator Sickness (SSQ), and Social Presence. Additionally, qualitative feedback was gathered to provide deeper insights into participants’ experiences.
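To illustrate how motion-tracking data can feed the accuracy analysis, the sketch below records the headset pose once per frame so the executed head trajectory can later be compared against a reference trajectory for the manoeuvre. This is an assumed logging scheme for illustration, not the group's actual pipeline.

```csharp
using System.Collections.Generic;
using System.IO;
using UnityEngine;

// Sketch: log the headset pose each frame so that the executed movement can be
// compared offline against a reference trajectory (assumed analysis approach).
public class HeadPoseLogger : MonoBehaviour
{
    [SerializeField] private Transform hmd;   // main camera / centre-eye anchor
    private readonly List<string> samples = new List<string>();

    private void Update()
    {
        Vector3 p = hmd.position;
        Vector3 e = hmd.rotation.eulerAngles;
        samples.Add($"{Time.time:F3};{p.x:F4};{p.y:F4};{p.z:F4};{e.x:F2};{e.y:F2};{e.z:F2}");
    }

    private void OnDisable()
    {
        // Write one CSV-like file per session for the later accuracy analysis.
        File.WriteAllLines(Path.Combine(Application.persistentDataPath, "head_pose.csv"), samples);
    }
}
```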
Results
While no significant difference in movement accuracy was found between the VA and picture conditions, participants clearly preferred the VA-guided instructions. The VA condition was rated higher in terms of comprehensibility, engagement, and immersion, with the perspicuity and novelty dimensions of the UEQ scoring significantly better. Additionally, participants experienced significantly fewer symptoms of cybersickness, as measured by the SSQ, in the VA condition compared to the picture condition. These results suggest that VA-based instruction in VR can improve the overall User Experience and engagement in vestibular rehabilitation, potentially leading to better adherence in home-based therapy.

Project Team
- Helen Kuswik
- Anastasia Poletaykina
- Annika Rittmann
Non-verbal Communication of Virtual Assistants
Motivation
The motivation for this study was to gain a deeper understanding of the importance of non-verbal communication (NVC) of an intelligent virtual agent (IVA) for human conversational partners in a conversation. The aspects of perceived realism, trust, empathy and psychological well-being are particularly important here.
Perceived realism refers to the degree to which users feel that the IVA behaves like a human; trust refers to the user's confidence in the IVA's reliability and capabilities; empathy refers to the ability of users to relate emotionally to the IVA; and psychological well-being refers to the overall mental-health impact of interacting with IVAs. Understanding these elements can lead to more effective development of IVAs and thus improve conversations in VR.
Goal
The aim of the study was to investigate whether the perception of both the conversation as a whole and the IVA differs according to the available emotional non-verbal expressions of the IVA and how this affects the user's well-being.
The assumption was that a finer gradation of the basic emotions would improve the perceived realism of the IVA.
Procedure
For the study, we designed a virtual space resembling a therapeutic conversation setting. The test subject sits on a couch in a comfortable, but not overly crowded room. The IVA sits opposite them on an armchair, with a view out the window behind them.
The participants conduct two conversations in English with an identical IVA, each lasting three minutes. The difference between the conversations is that in one, the agent is in the simple condition, and in the other, it is in the advanced condition. The order of the conversations is randomized and counterbalanced across participants.

The simple agent is designed to express only six basic emotions—happiness, surprise, disgust, anger, fear, and sadness—at maximum intensity. The advanced agent expresses the same emotions, but each at five intensity levels; the level with which it reacts depends on how intense the agent perceives the emotion to be. In addition, the agent has an idle animation matching each emotion. The flow of the conversation is steered only by guiding the agent towards an emotional narrative; no concrete questions are scripted. The agent tries to steer the conversation towards all emotions.
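One plausible way to realize this gradation is to quantize the perceived emotion intensity into one of the five levels and trigger the matching expression; the sketch below illustrates the idea with hypothetical names and is not the group's actual code.

```csharp
using UnityEngine;

// Sketch: map a detected emotion and its perceived intensity (0..1) to one of
// five discrete expression levels, as in the "advanced" agent condition.
public enum Emotion { Happiness, Surprise, Disgust, Anger, Fear, Sadness }

public class EmotionExpressionSelector : MonoBehaviour
{
    [SerializeField] private Animator agentAnimator;   // hypothetical animator with per-level states

    public void Express(Emotion emotion, float intensity)
    {
        // Quantize intensity into levels 1..5; the simple condition would always use level 5.
        int level = Mathf.Clamp(Mathf.CeilToInt(Mathf.Clamp01(intensity) * 5f), 1, 5);

        // Triggers e.g. "Anger_3"; the naming scheme is an assumption for illustration.
        agentAnimator.SetTrigger($"{emotion}_{level}");
    }
}
```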

Results and Outlook
The results show that, overall, no significant differences were found on any of the metrics examined.
Looking ahead, several aspects of the application would need to be improved to achieve significant differences. In particular, the agent would need to conduct the guided conversation more realistically.

Project Team
- Ayko Schwedler
- Leon Korkmaz
- Rateb Karanzie
- Celestina Hermida da Costa
Interrupting an AI
Effects of Verbal Interruption on Naturalness of Conversation and Agent Perception
Motivation
AI agents such as ChatGPT often produce long text output. While web interfaces give users the ability to interrupt the agent via a button, this was previously not possible for the verbal interface. Without the ability to interrupt the agent, conversations feel more ritualised and structured, and lack the dynamism normally expected from natural conversations. It is therefore essential to give users intuitive ways to interrupt the verbal output of AI agents, such as using keywords like "stop" or simply talking over the agent.
The aim of this project is to create this functionality and test its influence on the user experience and the user's perception of the virtual agent. In the study, we examine whether allowing the user to verbally interrupt the agent's flow of speech and redirect the conversation improves the user experience (H1) and whether the ability to interrupt the AI makes the conversation seem more natural (H2).
Implementation
Starting from the Intelligent Virtual Agent SDK, which already allows basic human-agent communication, we developed a system that allows the user to interrupt the agent's output at any time. For this, we implemented an always-on microphone that listens whenever the agent is speaking. The user input is then transcribed from speech to text using Windows Speech Recognition. Unlike the Google Speech-to-Text API, this allows us to set timeout parameters to extend the length of user input being processed.
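A minimal sketch of such an interruption listener, assuming Unity's UnityEngine.Windows.Speech.DictationRecognizer (which wraps Windows Speech Recognition and exposes the silence-timeout parameters mentioned above); the agent-side calls are placeholders, not the SDK's actual API.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Sketch: keep a dictation recognizer running while the agent is speaking and
// treat any recognized user utterance as an interruption.
public class InterruptionListener : MonoBehaviour
{
    private DictationRecognizer recognizer;

    private void Start()
    {
        recognizer = new DictationRecognizer();
        // The timeouts control how long the recognizer waits before and after
        // speech, which lets longer user inputs be captured in one result.
        recognizer.InitialSilenceTimeoutSeconds = 10f;
        recognizer.AutoSilenceTimeoutSeconds = 2f;

        recognizer.DictationResult += OnDictationResult;
        recognizer.Start();
    }

    private void OnDictationResult(string text, ConfidenceLevel confidence)
    {
        // Placeholder hooks: stop the agent's audio output and hand the new
        // utterance to the dialogue pipeline (the actual SDK calls will differ).
        Debug.Log($"User interrupted with: {text}");
        // agent.StopSpeaking();
        // agent.ProcessUserInput(text);
    }

    private void OnDestroy()
    {
        recognizer.DictationResult -= OnDictationResult;
        recognizer.Dispose();
    }
}
```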
Study
The study used a within-subjects design with two conditions: one in which it was possible to interrupt the agent, and one in which interrupting was not an option. The order of the conditions was randomized. The participants' task was to answer four single-choice questions as quickly as possible by asking the agent. To restrain learning effects, the task list was switched after the first condition. Between the conditions, participants filled out two questionnaires, the Godspeed Questionnaire and the User Experience Questionnaire, as well as some open questions about the interaction with the agent. In total, 30 students participated.

Results
Our findings suggest that interruptions play a crucial role in verbal interactions with agents in VR. They significantly improve the user's overall impression of the system, as well as perceived efficiency and stimulation. The user's impression of the virtual agent does not seem to change significantly. While verbal interruptions are beneficial and should be included in such VR systems, there are technical hurdles to overcome, such as back-channeling and interruption detection. These findings highlight potential areas for improvement in conversational agents to optimise efficiency and user experience. For example, interactive agents in customer service or education can be equipped with interrupt capabilities, particularly in scenarios where users need to retrieve information efficiently. On a broader level, this feature allows for a more fluid conversation and improved user experience.
Project Team
- David Egelhofer
- Nils Heinsohn
- Jiafan Gao
- Sherwin Khabari