All computational models of word learning solve the problem of referential ambiguity by integrating information across naming events. This solution is supported by a wealth of empirical evidence from both adults and young children. However, this evidence has recently been challenged by new data suggesting that human word learning mechanisms do not scale up to the ambiguity of real naming events. We replicate these experiments, collecting natural naming events both from a tripod-mounted camera and from a head-mounted camera that produced a “child’s-eye” view. Although individual naming events were equally ambiguous from both views, significant learning across events occurred only from the child’s own view. Thus, statistical word learning does scale to the ambiguity of real naming events, but only from the right perspective.