A Dynamic Neural Field Model of Speech Cue Compensation

Abstract

Categorical speech content can often be perceived directly from continuous auditory cues in the speech stream, but human-level performance on speech recognition tasks requires compensation for contextual variables like speaker identity. Regression modeling by McMurray and Jongman (2011) suggested that for many fricative phonemes, a compensation scheme can substantially increase categorization accuracy beyond even that available from 24 raw, uncompensated speech cues. Here, we simulate the same dataset using a neurally rather than abstractly implemented model: a hybrid of a dynamic neural field model and a connectionist network. Our model achieved slightly lower accuracy than McMurray and Jongman's but showed similar accuracy patterns across most fricatives. Our results were also comparable to those of more recent models that are less neurally instantiated but fit human accuracy somewhat more closely. An even less abstracted model is an immediate future goal, as is extending the present model to additional sensory modalities and to other constancy/compensation effects.
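
To make the dynamic-neural-field component concrete, the sketch below Euler-integrates a one-dimensional Amari-style field with a Mexican-hat interaction kernel, the standard formulation on which dynamic neural field models are built. All parameter values (time constant, resting level, kernel widths) are illustrative assumptions, not the ones used in the reported simulations.

```python
import numpy as np

def simulate_dnf(steps=200, n=101, tau=10.0, h=-5.0, dt=1.0):
    """Euler-integrate a 1-D Amari dynamic neural field:
    tau * du/dt = -u + h + s + w * f(u), driven by a Gaussian stimulus.
    All parameters here are illustrative, not fit to the fricative data."""
    x = np.arange(n)
    u = np.full(n, h, dtype=float)  # field starts at its resting level h
    # Localized external input (e.g., a perceived speech cue value)
    s = 8.0 * np.exp(-((x - n // 2) ** 2) / (2 * 5.0 ** 2))
    d = x[:, None] - x[None, :]
    # Mexican-hat kernel: local excitation, broader surround inhibition
    w = 1.2 * np.exp(-d ** 2 / (2 * 4.0 ** 2)) \
        - 0.6 * np.exp(-d ** 2 / (2 * 12.0 ** 2))
    f = lambda a: 1.0 / (1.0 + np.exp(-a))  # sigmoid output nonlinearity
    for _ in range(steps):
        u += (dt / tau) * (-u + h + s + w @ f(u))
    return u

u = simulate_dnf()
print(u.argmax())  # peak forms at the stimulus center: 50
```

In a full model, the field's self-stabilized peak would feed a categorization layer; here the point is only the field dynamics themselves.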

