Generalized Representation of Syntactic Structures

Abstract

Language analysis provides important insights into the psychological properties of individuals and groups. While most language analysis work has focused on semantics, psychological information is encoded not only in semantics but also in syntax. We propose Conversation Level Syntax Similarity Metric-Group Representations (CASSIM-GR), a tool that builds generalized representations of the syntactic structures of documents, allowing researchers to distinguish between people and groups based on syntactic differences. CASSIM-GR applies spectral clustering to syntactic similarity matrices and calculates the center of each cluster. The resulting cluster centroid then represents the syntactic structure of the group of documents. To examine the effectiveness of CASSIM-GR, we conduct three experiments across three corpora. In each experiment, we calculate the clustering accuracy and compare our proposed technique to a bag-of-words approach. Our results provide evidence for the effectiveness of CASSIM-GR and demonstrate that combining syntactic similarity with tf-idf semantic information improves the overall accuracy of group classification.
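
As a rough illustration of the clustering step described above, the following sketch partitions a small similarity matrix into two groups and picks a representative document for each. It is not the authors' implementation. The matrix values are invented toy data, and the function names `spectral_bipartition` and `cluster_medoids` are hypothetical. The sketch makes two simplifying assumptions: it handles only the two-group case, using the sign of the Fiedler vector of the graph Laplacian, and it uses the medoid (the document most similar to the others in its cluster) as a stand-in for the cluster center, since the abstract does not specify how the center is computed.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split items into two groups using the sign of the Fiedler vector."""
    s = np.asarray(similarity, dtype=float)
    degree = s.sum(axis=1)
    laplacian = np.diag(degree) - s  # unnormalized graph Laplacian
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)


def cluster_medoids(similarity, labels):
    """Map each cluster label to the index of its most central document."""
    s = np.asarray(similarity, dtype=float)
    medoids = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        within = s[np.ix_(members, members)]  # similarities inside the cluster
        medoids[int(label)] = int(members[within.sum(axis=1).argmax()])
    return medoids


# Toy syntactic similarity matrix for six documents (invented values):
# documents 0-2 resemble each other, as do documents 3-5.
similarity = np.array([
    [1.00, 0.90, 0.80, 0.10, 0.10, 0.10],
    [0.90, 1.00, 0.85, 0.10, 0.10, 0.10],
    [0.80, 0.85, 1.00, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.70, 0.90],
    [0.10, 0.10, 0.10, 0.70, 1.00, 0.80],
    [0.10, 0.10, 0.10, 0.90, 0.80, 1.00],
])

labels = spectral_bipartition(similarity)
medoids = cluster_medoids(similarity, labels)
print(labels, medoids)
```

Which group receives label 0 and which receives label 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. For more than two groups, a general spectral clustering routine operating on a precomputed similarity matrix would be needed in place of the sign-based split.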

