Evaluating Models of Human Behavior in an Adversarial Multi-Armed Bandit Problem

Abstract

We consider the problem of predicting how humans learn interactively in an adversarial Multi-Armed Bandit (MAB) setting. We are motivated by the use of cyber deception in cybersecurity and the need to design effective decoys to lure attackers. We ran a behavioral study in which humans act as cyber attackers and try to learn the defense's strategy for repeatedly assigning nodes in a network to be decoys. We tested humans against three defenses: a stationary strategy, a static game-theoretic solution, and an adaptive MAB strategy. Our results show that humans have the most difficulty learning against the adaptive defense. We also evaluated five different models for predicting the observed human behavior. Comparing the predictive quality of these models on our experimental data, we find that a modified version of Thompson Sampling and a cognitive model based on Instance-Based Learning Theory best replicate human learning.
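The adaptive defense is described as a modified Thompson Sampling strategy. The paper's specific modification is not given here, but the standard Bernoulli-bandit Thompson Sampling it builds on can be sketched as follows (the arm reward probabilities are illustrative, not from the study):

```python
import random

def thompson_sampling(reward_fn, n_arms, n_rounds, seed=0):
    """Standard Thompson Sampling for Bernoulli bandits: maintain a
    Beta(successes+1, failures+1) posterior per arm, draw one sample
    from each posterior each round, and pull the arm whose sample is
    largest. This balances exploration and exploitation automatically."""
    rng = random.Random(seed)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # One posterior sample per arm; pick the arm with the best draw.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = reward_fn(arm, rng)  # 0 or 1
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures

if __name__ == "__main__":
    # Hypothetical arm probabilities for demonstration only.
    probs = [0.2, 0.5, 0.8]
    total, succ, fail = thompson_sampling(
        lambda a, rng: 1 if rng.random() < probs[a] else 0,
        n_arms=3, n_rounds=1000)
    print(total, [s + f for s, f in zip(succ, fail)])
```

In the cybersecurity framing, each "arm" would correspond to a candidate decoy assignment and the reward to whether the attacker was successfully deceived; an adaptive defender re-estimates these posteriors as the human plays.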
