Grammar-Based and Lexicon-Based Techniques to Extract Personality Traits from Text


Language provides an important source of information for predicting human personality. However, most studies that have predicted personality traits using computational linguistic methods have focused on lexicon-based information. We investigate how the performance of lexicon-based and grammar-based methods compares when predicting personality traits. We analyzed a corpus of student essays and the students' personality traits using two lexicon-based approaches, one top-down (Linguistic Inquiry and Word Count, LIWC) and one bottom-up (topic models), and one grammar-driven approach (the Biber model), as well as combinations of these models. Results showed that the models and their combinations performed similarly: lexicon-based top-down and bottom-up models did not differ, and neither did lexicon-based and grammar-based models. Moreover, combining models did not improve performance. These findings suggest that predicting personality traits from text remains difficult, but that the performance of lexicon-based and grammar-based models is on par.
