Word learning involves mapping observable words onto unobservable speaker intentions, and intention-reading co-develops with language. To explore this interaction we present an agent-based model in which an individual simultaneously learns a lexicon and learns about the speaker's perspective, given a shared context and the speaker's utterances, through Bayesian inference. Simulations with this model show that (i) lexicon-learning and perspective-learning are interdependent: learning one is impossible without some knowledge of the other, (ii) lexicon- and perspective-learning can bootstrap each other, resulting in successful inference of both even when the learner starts with no knowledge of the lexicon and unhelpful assumptions about the minds of others, and (iii) receiving initial input from a `helpful' speaker (who adopts the learner's perspective on the world) paves the way for later learning from speakers with perspectives which diverge from the learner's. This approach represents a first exploration of the co-development dynamics of language and mindreading.