Human Decisions on Targeted and Non-Targeted Adversarial Samples

Abstract

In a world that relies increasingly on large amounts of data and on powerful Machine Learning (ML) models, the veracity of decisions made by these systems is essential. Adversarial samples are inputs that have been perturbed to mislead an ML model's interpretation, and they are a dangerous vulnerability. Our research takes a first step into what can be an important innovation in cognitive science: we analyzed humans' judgments and decisions when confronted with targeted (inputs purposely constructed to make an ML model misclassify them as something else) and non-targeted (noisily perturbed inputs that try to trick the ML model) adversarial samples. Our findings suggest that although ML models that produce non-targeted adversarial samples can be more efficient than those that produce targeted samples, non-targeted samples result in more incorrect human classifications than targeted samples. In other words, non-targeted samples interfered more with human perception and categorization decisions than targeted samples.
