Frame Augmented Language Model


N-gram based statistical language models are widely used in NLP applications such as automatic speech recognition and machine translation because they are easy to use and effective. Given the very simple assumption underlying these models, their effectiveness is somewhat surprising, but they have clear deficiencies, such as the inability to account for long-distance dependencies and the lack of consideration for overall meaning. Various approaches have been proposed to enhance the n-gram language model by incorporating syntactic and semantic elements. In this paper, we explore the use of frames from the Berkeley FrameNet as a way of augmenting the purely statistically driven language model with semantic information. While the conventional n-gram model supplies the overall probability score of the surface form of a sentence, the candidate frames evoked by the sentence provide the means to calculate the conceptual relatedness among the words within it. The two measures are combined via linear interpolation to give an overall score. We believe that, in addition to boosting overall performance, our approach brings the language model a bit closer to the reality of human language processing.
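
As a rough illustration of the interpolation step only, the sketch below combines two scores with a single weight. The function name `interpolate`, the parameter `lam`, and the sample values are hypothetical and chosen for illustration; the abstract does not specify the weight, the score scales, or how either score is computed.

```python
def interpolate(ngram_score: float, frame_score: float, lam: float = 0.5) -> float:
    """Linearly interpolate two sentence-level scores.

    ngram_score: score from the n-gram model (assumed to be on a comparable scale).
    frame_score: frame-based relatedness score (assumed to be on a comparable scale).
    lam: hypothetical weight in [0, 1] given to the n-gram score.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ngram_score + (1.0 - lam) * frame_score


# Illustrative values only: rank two candidate sentences by combined score.
candidates = {
    "candidate A": (0.20, 0.60),  # (n-gram score, frame score)
    "candidate B": (0.30, 0.10),
}
ranked = sorted(candidates, key=lambda s: interpolate(*candidates[s]), reverse=True)
```

With equal weights, the candidate with the stronger frame score ranks first here even though its n-gram score is lower, which is the kind of reordering a semantic component is meant to allow.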
