This study examined whether listeners maintain the spatial story representations created by a speaker's cohesive gestures beyond the concurrent sentence. Participants were presented with three-sentence discourses involving two protagonists. In the first and second sentences, the speaker's gestures consistently assigned each protagonist to either the right or the left side of the gesture space. The third sentence, presented without gestures, referred to one of the two protagonists, and participants pressed one of two response keys to indicate which protagonist had been mentioned. The response keys were either spatially congruent or incongruent with the gesturally established locations of the two protagonists. Experiments 1 and 2 showed that performance was better in the congruent condition than in the incongruent condition. Thus, listeners construct a spatial story representation based on gestures, and this representation persists beyond the concurrent sentence: the spatial information remains activated in a subsequent sentence that is not accompanied by gestures.