Listeners tend to gaze at the objects to which they resolve referring expressions. We show that this remains true even when these objects are presented in a virtual 3D environment in which listeners can move freely. We further show that an automated speech generation system that uses eyetracking information to monitor the listener's understanding of referring expressions outperforms comparable systems that do not draw on listener gaze.