Human communication has a remarkable capacity to describe events that occurred elsewhere and at other times. In particular, when describing complex narratives, speakers must communicate temporal structure using a mixture of words (e.g., “after”), gestures (e.g., pointing rightward for a later event), and discourse structure (e.g., mentioning earlier events first). How do listeners integrate these sources of temporal information to make sense of complex narratives? In two experiments, we systematically manipulated gesture, speech, and order-of-mention to investigate their respective impacts on comprehension of temporal structure. Gesture had a significant effect on interpretations of temporal order. This influence of gesture, however, was weaker than the influence of both speech and order-of-mention. Indeed, in some cases, order-of-mention trumped explicit descriptions in speech; for instance, if “earlier” events were mentioned second, they were sometimes thought to have occurred second. Listeners integrate multiple sources of information to interpret what happened when.