Previous studies have found that viewers’ attention is disproportionately drawn to text. One possible explanation is that viewers have developed a “text detector” in their visual system that biases their attention toward text features. To test this hypothesis, we add a text detector module to a visual attention model and examine whether its inclusion improves the model’s ability to predict eye fixation positions, particularly in scenes that contain no text. A model combining the text detector, saliency, and center bias predicts viewers’ eye fixations better than the same model without the text detector, even in text-absent images. Furthermore, adding the text detector, which was designed for English text, improves the prediction of both English- and Chinese-speaking viewers’ attention, with a stronger effect for English-speaking viewers. These results support the conclusion that, owing to viewers’ everyday reading practice, their attention in natural scenes is biased toward text features.