New Paper Published in Neural Computing and Applications
7 January 2025
Our group has a new paper published in the journal Neural Computing and Applications.
Title: FabuLight-ASD: unveiling speech activity via body language
Authors: Hugo Carneiro, Stefan Wermter
We introduce FabuLight-ASD, an active speaker detection (ASD) model that integrates facial, audio, and body pose information to improve detection accuracy and robustness. Building upon the Light-ASD framework, FabuLight-ASD incorporates human pose data represented as skeleton graphs, which keeps the additional computational overhead low. Experiments on the Wilder Active Speaker Detection (WASD) dataset show that FabuLight-ASD achieves an overall mean average precision (mAP) of 94.3%, outperforming Light-ASD's mAP of 93.7% across challenging scenarios. A performance breakdown reveals that body pose data is particularly beneficial in scenarios involving speech impairment, face occlusion, or human voice background noise. These improvements come with only a modest increase in parameter count and multiply-accumulate operations, affirming FabuLight-ASD's efficiency as a lightweight yet powerful model.
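For readers unfamiliar with the metric, the following is a minimal sketch of how average precision, the quantity underlying mAP, can be computed for a single ranked list of predictions. It uses the generic non-interpolated definition and is not the evaluation script used for WASD or in the paper; the function name `average_precision` and the toy scores and labels are illustrative assumptions, not data from the experiments.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision for one ranked list.

    scores: model confidence that each sample is "speaking" (hypothetical values).
    labels: ground truth, 1 for speaking and 0 for not speaking.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    num_positives = sum(labels)
    if num_positives == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank samples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision measured at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / num_positives


if __name__ == "__main__":
    # Toy example only: four predictions, two of which are true speakers.
    toy_scores = [0.9, 0.8, 0.7, 0.6]
    toy_labels = [1, 0, 1, 0]
    print(average_precision(toy_scores, toy_labels))
```

In this toy ranking the two positives sit at ranks 1 and 3, so the result is the mean of 1/1 and 2/3. A ranking that places every positive above every negative yields 1.0.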
The paper can be found here. Additionally, the code of this paper is available here.