“The Godfather” vs. “Chaos”: Comparing Linguistic Analysis based on On-line Knowledge Sources and Bags-of-N-Grams for Movie Review Valence Estimation
B Schuller, J Schenk, G Rigoll… - 2009 10th International …, 2009 - ieeexplore.ieee.org
B Schuller, J Schenk, G Rigoll, T Knaup
2009 10th International Conference on Document Analysis and …, 2009•ieeexplore.ieee.org
In the fields of sentiment and emotion recognition, bag-of-words modeling has lately become popular for estimating valence in text. A typical application is the evaluation of reviews of, e.g., movies, music, or games. In this respect we suggest the use of back-off N-Grams as the basis for a vector space construction, in order to combine the advantages of word-order modeling with easy integration into potential acoustic feature vectors intended for spoken document retrieval. For a fine-grained estimate we consider data-driven regression alongside classification based on support vector machines. Alternatively, the on-line knowledge sources ConceptNet, General Inquirer, and WordNet serve not only to reduce out-of-vocabulary events, but also as the basis for a purely linguistic analysis. As a special benefit, this linguistic approach does not require labeled training data. A large set of 100k movie reviews spanning 20 years, drawn from Metacritic, is used throughout an extensive parameter discussion and a comparative evaluation that demonstrates the efficiency of the proposed methods.
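To make the vector space idea concrete, the following is a minimal sketch of a bag-of-N-grams representation in plain Python. It is not the authors' implementation: it assumes that "back-off" can be approximated by including every N-gram order from 1 up to a maximum, uses naive whitespace tokenization, and all function names (`ngrams`, `build_vocabulary`, `vectorize`) and the two example reviews are hypothetical.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Yield all contiguous n-grams of order 1..max_n as tuples."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_vocabulary(documents, max_n, min_count=1):
    """Map each sufficiently frequent n-gram to a vector dimension.

    Assumption: lowercasing and whitespace splitting stand in for
    whatever tokenization a real system would use.
    """
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(doc.lower().split(), max_n))
    kept = sorted(g for g, c in counts.items() if c >= min_count)
    return {g: idx for idx, g in enumerate(kept)}


def vectorize(document, vocabulary, max_n):
    """Count n-gram occurrences; n-grams outside the vocabulary are ignored."""
    vec = [0] * len(vocabulary)
    for g in ngrams(document.lower().split(), max_n):
        idx = vocabulary.get(g)
        if idx is not None:
            vec[idx] += 1
    return vec


# Hypothetical toy reviews, not taken from the Metacritic corpus.
reviews = ["a great movie", "not a great movie"]
vocab = build_vocabulary(reviews, max_n=2)
vectors = [vectorize(r, vocab, max_n=2) for r in reviews]
```

Here the bigram `("not", "a")` receives its own dimension, so the two reviews differ in a way a pure unigram model could only capture through the single word "not". Fixed-length count vectors of this kind could then be passed to a support vector classifier or regressor, or concatenated with other feature vectors; that step is not shown here.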