Screen media, such as television and videos, are a common part of young children’s lives. Yet infants and toddlers have been shown to learn less effectively from screens than from interactions with another person. Using a quasi-experimental design, we explored how social factors of screen media co-viewing affect children’s learning outcomes. We observed parents co-viewing a novel word training video with their children, then tested the children for immediate and delayed word learning. We then investigated the links between parental speech during co-viewing and children’s subsequent word learning. Parental speech that encouraged children to produce the novel words predicted better retention of the learned words, whereas speech that focused on the video itself rather than on its content was negatively associated with learning.