Visual Attention is Attracted by Text Features Even in Scenes without Text


Previous studies have found that viewers’ attention is disproportionately attracted by text, and one possible reason is that viewers have developed a “text detector” in their visual system that biases their attention toward text features. To test this hypothesis, we add a text detector module to a visual attention model and examine whether its inclusion increases the model’s ability to predict eye fixation positions, particularly in scenes without any text. A model combining a text detector, saliency, and center bias predicts viewers’ eye fixations better than the same model without the text detector, even in text-absent images. Furthermore, adding the text detector, which was designed for English text, improves the prediction of both English- and Chinese-speaking viewers’ attention, with a stronger effect for English-speaking viewers. These results support the conclusion that, owing to viewers’ everyday reading training, their attention in natural scenes is biased toward text features.
