Parsing actions entails discovering the relations between objects. A pervasively neural account of this process must solve three fundamental problems: the neural pointer problem, the binding problem, and the problem of generating discrete processing steps from time-continuous neural processes. We present a prototypical solution to these problems in a neural dynamic model that comprises dynamic neural fields, which hold representations close to the sensorimotor surfaces, and dynamic nodes, which hold discrete, language-like representations. Connecting these two types of representation enables the model both to parse actions and to ground movement phrases, all based on real visual input. We demonstrate how the dynamic neural processes autonomously generate the processing steps required to parse or ground object-oriented action.
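Dynamic neural fields of the kind referred to above are commonly governed by an Amari-type field equation, in which activation evolves continuously in time under a resting level, external input, and lateral interaction. The sketch below is a minimal, generic illustration of such a field, not the specific model of this work; all parameter values (time scale, resting level, kernel widths, input location) are illustrative assumptions.

```python
import numpy as np

def gaussian(d, sigma, amplitude):
    """Gaussian profile used for both the interaction kernel and the input."""
    return amplitude * np.exp(-d**2 / (2.0 * sigma**2))

def simulate_dnf(n=101, steps=500, dt=1.0, tau=10.0, h=-5.0):
    """Euler-integrate a 1-D Amari-type field with a localized input bump.

    tau * du/dt = -u + h + s + w * sigmoid(u)   (w *: spatial convolution)
    """
    x = np.arange(n)
    u = np.full(n, h, dtype=float)              # field starts at resting level h
    d = x - n // 2
    # lateral interaction: local excitation minus broader inhibition
    w = gaussian(d, 3.0, 8.0) - gaussian(d, 10.0, 4.0)
    s = gaussian(x - 30, 3.0, 8.0)              # localized external input at x = 30
    for _ in range(steps):
        f = 1.0 / (1.0 + np.exp(-u))            # sigmoid output nonlinearity
        interaction = np.convolve(f, w, mode="same")
        u += dt / tau * (-u + h + s + interaction)
    return x, u
```

Run with the defaults, the field forms a self-stabilized peak of supra-threshold activation at the input location, the basic mechanism by which such fields make detection decisions from continuous input.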