A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding


Humans demonstrate remarkable abilities to predict physical events in complex scenes. Two classes of models for physical scene understanding have recently been proposed: "Intuitive Physics Engines" (IPEs), which posit that people make predictions by running approximate probabilistic simulations in causal mental models similar to physics engines, and memory-based models such as convolutional neural networks (CNNs), which make judgments based on analogies to stored experiences of previously encountered scenes and outcomes. Here we report four experiments that rigorously compare simulation-based and CNN-based models, where both approaches are concretely instantiated in algorithms that run on raw image inputs and produce physical judgments as outputs. Both approaches can achieve super-human accuracy levels and can quantitatively predict human judgments to a similar degree, but only the simulation-based models generalize to novel situations in the ways that people do and are qualitatively consistent with the systematic perceptual illusions and judgment asymmetries that people show.