A Bit of Clarification: Probabilistic User-in-the-Loop Disambiguation of Multimodal Input in AR

Joseph Rupertus, Parastoo Abtahi

Senior Thesis
Princeton University
April 22, 2026

Abstract

Multimodal inputs in Augmented Reality (AR) are inherently imprecise, noisy, and underspecified, so the same input can map to multiple valid interpretations. This ambiguity is not a limitation of recognition or inference systems; it is fundamentally an interaction problem. We introduce a proof-of-concept AR system that addresses it through user-in-the-loop disambiguation, using a Dynamic Bayesian Network to fuse gaze, voice, and gesture inputs into probability distributions over target and action spaces. When no candidate is sufficiently confident, the system enters a disambiguation mode that presents audio and visual feedback to guide users toward a single clarifying input. We evaluate the system through a two-part user study: a multimodal elicitation study and a comparison of disambiguation modes. Our fusion system correctly predicted intent in 53.1% of free-form elicitation trials, and the correct intent appeared in the candidate set in 83.3% of trials. Participants successfully disambiguated after a single input in 82.8% of trials.
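To make the fuse-then-disambiguate loop concrete, the following is a minimal sketch and not the system's implementation. It replaces the Dynamic Bayesian Network with a single-time-step naive Bayes product over modalities, and every name and number in it (`fuse`, `decide`, `confidence_threshold`, `candidate_floor`, the example likelihoods) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

Distribution = Dict[str, float]


def fuse(prior: Distribution, likelihoods: List[Distribution]) -> Distribution:
    """Multiply the prior by each modality's likelihood, then normalize.

    Simplification: modalities are treated as conditionally independent
    and there is no temporal model, unlike a Dynamic Bayesian Network.
    """
    posterior = dict(prior)
    for modality in likelihoods:
        for candidate in posterior:
            # Candidates a modality says nothing about get a small floor
            # (hypothetical value) instead of zeroing out the product.
            posterior[candidate] *= modality.get(candidate, 1e-6)
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}


def decide(
    posterior: Distribution,
    confidence_threshold: float = 0.8,  # hypothetical value
    candidate_floor: float = 0.15,      # hypothetical value
) -> Tuple[str, List[str]]:
    """Execute the top candidate if confident, else return a candidate set."""
    best = max(posterior, key=posterior.get)
    if posterior[best] >= confidence_threshold:
        return "execute", [best]
    candidates = sorted(
        (c for c, p in posterior.items() if p >= candidate_floor),
        key=posterior.get,
        reverse=True,
    )
    return "disambiguate", candidates


# Illustrative numbers only: gaze weakly prefers the lamp, and the
# utterance ("move that") does not distinguish between targets.
prior = {"lamp": 1 / 3, "chair": 1 / 3, "table": 1 / 3}
gaze = {"lamp": 0.5, "chair": 0.4, "table": 0.1}
voice = {"lamp": 0.5, "chair": 0.5, "table": 0.5}

posterior = fuse(prior, [gaze, voice])
mode, candidates = decide(posterior)

# One clarifying input (here, a pointing gesture) updates the posterior.
gesture = {"lamp": 0.05, "chair": 0.9, "table": 0.05}
clarified = fuse(posterior, [gesture])
final_mode, final_choice = decide(clarified)
```

In this sketch the first pass leaves no candidate above the threshold, so the lamp and chair are offered as the candidate set; the single clarifying gesture then pushes the chair past the threshold. Using the previous posterior as the next prior is the design choice that lets one extra input resolve the ambiguity rather than restarting inference.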