Optimal Language Learning: The Importance of Starting Representative


Child-directed speech has a distinctive structure and may have facilitatory effects on children’s language learning. We consider these facilitatory effects from the perspective of Marr’s levels of analysis: could they arise at the computational level, or must they be located at the algorithmic or implementation levels? To determine whether the effects could be due to computational-level benefits, we examine the question of what samples from a language should best facilitate learning by identifying the optimal linguistic input for an ideal Bayesian learner. Our analysis leads to a mathematical definition of the “representativeness” of linguistic data, which can be computed for any probabilistic model of language learning. We use this measure to re-examine the debate over whether language learning can be improved by “starting small” (i.e., learning from data that have limited complexity). We compare the representativeness of corpora with differing levels of complexity, showing that while optimal corpora for a complex language are also complex, it is possible to construct relatively good corpora with limited complexity. We discuss the implications of these results for the level of analysis at which a benefit of starting small must be located.
