Top-down effects of apparent humanness on vocal alignment toward human and device interlocutors

Abstract

Humans now regularly speak to voice-activated artificially intelligent (voice-AI) assistants. Yet the cognitive mechanisms at play during speech interactions with a voice-AI interlocutor, relative to a real human interlocutor, remain understudied. The present study tests whether a top-down "apparent humanness" guise affects vocal alignment toward human and text-to-speech (TTS) voices. In a between-subjects design, participants heard either four naturally produced voices or four TTS voices. Apparent humanness guise varied within subjects. Speaker guise was manipulated via a top-down label accompanied by images: either pictures of two voice-AI systems (Amazon Echos) or pictures of two human talkers. Vocal alignment in vowel duration revealed top-down effects of the apparent humanness guise: participants aligned more to TTS voices when these were presented with a device guise (the "authentic" guise), but aligned less in the two inauthentic guises (i.e., TTS voices presented with a human guise and human voices presented with a device guise). These results suggest a dynamic interplay of bottom-up and top-down factors in human and voice-AI interaction.
