Learning Simple Statistics for Language Comprehension and Production: The CAPPUCCINO Model

Abstract

Whether the input available to children is sufficient to explain their ability to use language has been the subject of much theoretical debate in cognitive science. Here, we present a simple, developmentally motivated computational model that learns to comprehend and produce language when exposed to child-directed speech. The model uses backward transitional probabilities to create an inventory of 'chunks' consisting of one or more words. Comprehension is approximated as shallow parsing of adult speech, and production as the reconstruction of the child's actual utterances. The model functions in a fully incremental, on-line fashion, has broad cross-linguistic coverage, and fits child data from Saffran's (2002) statistical learning study. Moreover, word-based distributional information is found to be more useful than statistics over word classes. Together, these results suggest that much of children's early linguistic behavior can be accounted for in a usage-based manner using distributional statistics.
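
To make the chunking idea concrete, the following is a minimal illustrative sketch, not the model's actual implementation. It computes the backward transitional probability of a word pair as P(previous | current) = count(previous, current) / count(current), updating counts incrementally as each word is read. The boundary rule (join two words when their backward transitional probability is at least the running average seen so far, otherwise start a new chunk), the function names backward_tp and chunk_utterances, and the toy corpus are all assumptions introduced here for illustration.

```python
from collections import Counter


def backward_tp(pair_counts, word_counts, prev, curr):
    """Backward transitional probability P(prev | curr).

    Defined as count(prev followed by curr) / count(curr).
    Returns 0.0 if curr has not been seen.
    """
    if word_counts[curr] == 0:
        return 0.0
    return pair_counts[(prev, curr)] / word_counts[curr]


def chunk_utterances(utterances):
    """Segment each utterance into chunks, one pass, updating counts on-line.

    Assumed rule (for illustration only): two adjacent words are grouped
    when their backward transitional probability is at least the running
    average of all values computed so far; otherwise a boundary is placed.
    """
    word_counts = Counter()
    pair_counts = Counter()
    btp_sum, btp_n = 0.0, 0
    chunked = []

    for utterance in utterances:
        words = utterance.split()
        if not words:
            chunked.append([])
            continue

        word_counts[words[0]] += 1
        chunks = [[words[0]]]

        for prev, curr in zip(words, words[1:]):
            # Update counts before scoring, so the current pair is included.
            word_counts[curr] += 1
            pair_counts[(prev, curr)] += 1

            btp = backward_tp(pair_counts, word_counts, prev, curr)
            running_mean = btp_sum / btp_n if btp_n else 0.0

            if btp >= running_mean:
                chunks[-1].append(curr)   # extend the current chunk
            else:
                chunks.append([curr])     # place a chunk boundary

            btp_sum += btp
            btp_n += 1

        chunked.append([" ".join(chunk) for chunk in chunks])

    return chunked


if __name__ == "__main__":
    toy_corpus = ["the dog", "the dog", "a dog"]  # hypothetical input
    for utterance, chunks in zip(toy_corpus, chunk_utterances(toy_corpus)):
        print(utterance, "->", chunks)
```

On this toy input, "the dog" is kept as a single chunk because "the" reliably precedes "dog", whereas "a dog" is split because "a" precedes "dog" in only one of its three occurrences. A pair whose second word is new always scores 1.0 under this sketch, which is a limitation of the simplified rule rather than a claim about the model.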

