Abstract
We propose a novel Probabilistic Rating infErence Framework, known as Pref, for mining user preferences from reviews and mapping those preferences onto numerical rating scales. Pref applies existing linguistic processing techniques to extract opinion words and product features from reviews. It then estimates the sentimental orientations (SO) and strength of the opinion words using our proposed relative-frequency-based method. This method allows semantically similar words to have different SO, thereby addressing a major limitation of existing methods. When assigning ratings to reviews, Pref takes into consideration the intuitive relationships between class labels, which are scalar ratings. Empirical results validate the effectiveness of Pref against several related algorithms and suggest that Pref can produce reasonably good results using a small training corpus. We also describe a useful application of Pref as a rating inference framework. Rating inference transforms user preferences described in natural language text into numerical ratings. This allows Collaborative Filtering (CF) algorithms, which operate mostly on databases of scalar ratings, to utilize textual reviews as an additional source of user preferences. We integrated Pref with a classical CF algorithm and empirically demonstrated the advantages of using rating inference to augment ratings for CF.
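The abstract names two technical steps: estimating each opinion word's orientation and strength from relative frequencies in rated training reviews, and assigning a rating to a review in a way that respects the order of the rating scale. The Python sketch below illustrates one plausible reading of those steps; it is not Pref's published formulation. The 5-point scale, the input format of (opinion words, rating) pairs, the additive smoothing, the expected-value rounding, the toy data and all function names (`estimate_orientations`, `so_score`, `infer_rating`) are assumptions made for illustration, and opinion-word extraction is assumed to have been done already.

```python
from collections import Counter, defaultdict

# Assumed 5-point ordinal rating scale (not specified in the abstract).
RATINGS = (1, 2, 3, 4, 5)


def estimate_orientations(training, smoothing=0.1):
    """Map each opinion word to a probability distribution over rating classes.

    training: iterable of (opinion_words, rating) pairs (a hypothetical
    input format). For class c, a word's score is its count in c divided by
    the total number of opinion-word occurrences in c (a relative frequency,
    additively smoothed); the scores are then normalised across classes.
    """
    word_counts = defaultdict(Counter)  # word -> Counter({rating: count})
    class_totals = Counter()            # rating -> opinion-word occurrences
    for words, rating in training:
        for w in words:
            word_counts[w][rating] += 1
            class_totals[rating] += 1
    vocab_size = len(word_counts)
    orientations = {}
    for w, counts in word_counts.items():
        rel = {c: (counts[c] + smoothing) / (class_totals[c] + smoothing * vocab_size)
               for c in RATINGS}
        z = sum(rel.values())
        orientations[w] = {c: v / z for c, v in rel.items()}
    return orientations


def so_score(dist):
    """Signed orientation: expected rating minus the scale midpoint.
    The sign gives the polarity and the magnitude a rough strength."""
    midpoint = (RATINGS[0] + RATINGS[-1]) / 2
    return sum(c * p for c, p in dist.items()) - midpoint


def infer_rating(words, orientations):
    """Average the class distributions of known opinion words, then return the
    rating closest to the expected value. Using the expected value (rather than
    the single most probable class) lets adjacent ratings support each other,
    a simple way to respect the ordering of the scale."""
    known = [orientations[w] for w in words if w in orientations]
    if not known:
        return None  # no usable evidence in this review
    avg = {c: sum(d[c] for d in known) / len(known) for c in RATINGS}
    expected = sum(c * p for c, p in avg.items())
    return min(RATINGS, key=lambda c: abs(c - expected))


# Toy training data, invented purely for illustration.
TOY_TRAINING = [
    (["excellent", "sharp"], 5),
    (["good", "sharp"], 4),
    (["average"], 3),
    (["poor", "blurry"], 2),
    (["terrible", "blurry"], 1),
]

if __name__ == "__main__":
    so = estimate_orientations(TOY_TRAINING)
    print(infer_rating(["sharp", "good"], so))   # 4
    print(infer_rating(["blurry", "poor"], so))  # 2
```

Because each word's orientation in this sketch is learned from the ratings it co-occurs with rather than from a thesaurus such as WordNet, two near-synonyms can receive different scores in the same corpus, which mirrors the property the abstract highlights. Ratings inferred this way could then be written into the user-item matrix that a CF algorithm consumes, alongside explicit ratings.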
References
Adomavicius, G., Kwon, Y.: New recommendation techniques for multicriteria rating systems. IEEE Intell. Syst. 22(3), 48–55 (2007)
Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: ACM SIGMOD International Conference on Management of Data, pp. 207–216 (1993)
Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: 14th Conference on Uncertainty in Artificial Intelligence, pp. 43–52 (1998)
Bruce, R., Wiebe, J.: Recognizing subjectivity: a case study of manual tagging. Nat. Lang. Eng. 5(2), 187–205 (1999)
Chesley, P., Vincent, B., Xu, L., Srihari, R.: Using verbs and adjectives to automatically classify blog sentiment. In: Proc. of the Spring Symposia on Computational Approaches to Analyzing Weblogs (2006)
Das, S., Chen, M.: Yahoo! for Amazon: extracting market sentiment from stock message boards. In: Asia Pacific Finance Association Annual Conference (2001)
Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: 12th International World Wide Web Conference, pp. 519–528 (2003)
Esuli, A., Sebastiani, F.: Determining the semantic orientation of terms through gloss classification. In: ACM International Conference on Information and Knowledge Management (CIKM), pp. 617–624 (2005)
Esuli, A., Sebastiani, F.: SentiWordNet: a publicly available lexical resource for opinion mining. In: 5th International Conference on Language Resources and Evaluation (LREC) (2006)
Gamon, M., Aue, A., Corston-Oliver, S., Ringger, E.K.: Pulse: mining customer opinions from free text. In: 6th International Symposium on Intelligent Data Analysis, pp. 121–132 (2005)
Goldberg, A.B., Zhu, X.: Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization. In: Proc. of the HLT-NAACL Workshop on TextGraphs: Graph-based Algorithms for Natural Language Processing, pp. 45–52 (2006)
Hatzivassiloglou, V., McKeown, K.R.: Predicting the semantic orientation of adjectives. In: 8th Conference on European Chapter of the Association for Computational Linguistics, pp. 174–181 (1997)
Herlocker, J., Konstan, J., Riedl, J.: An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Inf. Retr. 5, 287–310 (2002)
Hu, M., Liu, B.: Mining and summarizing customer reviews. In: 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177 (2004)
Hu, M., Liu, B.: Mining opinion features in customer reviews. In: 19th National Conference on Artificial Intelligence, pp. 755–760 (2004)
Jindal, N., Liu, B.: Identifying comparative sentences in text documents. In: 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 244–251 (2006)
Joachims, T.: Making large-scale support vector machine learning practical. In: Scholkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods—Support Vector Learning, pp. 41–56. MIT Press (1999)
Kaji, N., Kitsuregawa, M.: Building lexicon for sentiment analysis from massive collection of HTML documents. In: 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1075–1083 (2007)
Kamps, J., Marx, M., Mokken, R.J., de Rijke, M.: Using WordNet to measure semantic orientations of adjectives. In: 4th International Conference on Language Resources and Evaluation (LREC), pp. 1115–1118 (2004)
Kim, S.-M., Hovy, E.: Determining the sentiment of opinions. In: Conference on Computational Linguistics, pp. 1367–1373 (2004)
Leung, C.W.K., Chan, S.C.F., Chung, F.L.: Integrating collaborative filtering and sentiment analysis: a rating inference approach. In: ECAI 2006 Workshop on Recommender Systems, pp. 62–66 (2006)
Leung, C.W.K., Chan, S.C.F., Chung, F.L.: Evaluation of a Rating Inference Approach to Utilizing Textual Reviews for Collaborative Recommendation. Cooperative Internet Computing, World Scientific Publisher (2008)
Liu, B., Hu, M., Cheng, J.: Opinion observer: analyzing and comparing opinions on the web. In: 14th International WWW Conference, pp. 342–351 (2005)
Liu, H.: MontyLingua: an end-to-end natural language processor with common sense. http://web.media.mit.edu/~hugo/montylingua/ (2004). Accessed 9 February 2011
Liu, J., Yao, J., Wu, G.: Sentiment classification using information extraction technique. In: Advances in Intelligent Data Analysis VI, pp. 216–227 (2005)
Manouselis, N., Costopoulou, C.: Analysis and classification of multi-criteria recommender systems. World Wide Web: Internet and Web Information Systems (WWWJ) 10(4), 415–441 (2007)
Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.: Introduction to WordNet: an online lexical database. Int. J. Lexicogr. 3(4) (Special Issue), 235–312 (1990)
Mishne, G., Glance, N.: Predicting movie sales from blogger sentiment. In: Spring Symposia on Computational Approaches to Analyzing Weblogs (2006)
Okanohara, D., Tsujii, J.: Assigning polarity scores to reviews using machine learning techniques. In: Second International Joint Conference on Natural Language Processing, pp. 314–325 (2005)
Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: 42nd Annual Meeting of the Association for Computational Linguistics, pp. 271–278 (2004)
Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: 43rd Annual Meeting of the Association for Computational Linguistics, pp. 115–124 (2005)
Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Conference on Empirical Methods in Natural Language Processing, pp. 79–86 (2002)
Popescu, A., Etzioni, O.: Extracting product features and opinions from reviews. In: Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp. 339–346 (2005)
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of Netnews. In: ACM 1994 Conference on Computer Supported Cooperative Work, pp. 175–186 (1994)
Ricci, F.: Travel recommender systems. IEEE Intell. Syst. 17(6), 55–57 (2002)
Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)
Riloff, E., Wiebe, J.: Learning extraction patterns for subjective expressions. In: Conference on Empirical Methods in Natural Language Processing, pp. 105–112 (2003)
Smola, A.J., Scholkopf, B.: A tutorial on support vector regression. Technical Report NC2-TR-1998-030, NeuroCOLT2, Royal Holloway College, University of London (1998)
Snyder, B., Barzilay, R.: Multiple aspect ranking using the good grief algorithm. In: Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 300–307 (2007)
Taboada, M., Anthony, C., Voll, K.: Methods for creating semantic orientation dictionaries. In: 5th International Conference on Language Resources and Evaluation (LREC), pp. 427–432 (2006)
Thomas, M., Pang, B., Lee, L.: Get out the vote: determining support or opposition from Congressional floor-debate transcripts. In: Conference on Empirical Methods in Natural Language Processing, pp. 327–335 (2006)
Turney, P.D.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: 40th Annual Meeting of the Association for Computational Linguistics, pp. 417–424 (2002)
Turney, P.D., Littman, M.L.: Measuring praise and criticism: inference of semantic orientation from association. ACM Trans. Inf. Sys. 21(4), 315–346 (2003)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer (1995)
Wiebe, J., Bruce, R., Bell, M., Martin, M., Wilson, T.: A corpus study of evaluative and speculative language. In: 2nd ACL SIGdial Workshop on Discourse and Dialogue (2001)
Wiebe, J., Bruce, R., O’Hara, T.: Development and use of a gold-standard data set for subjectivity classifications. In: 37th Annual Meeting of the Association for Computational Linguistics, pp. 246–253 (1999)
Wilson, T., Wiebe, J., Hwa, R.: Just how mad are you? Finding strong and weak opinion clauses. In: 19th National Conference on Artificial Intelligence, pp. 761–769 (2004)
Yamanishi, K., Li, H.: Mining open answers in questionnaire data. IEEE Intell. Syst. 17(5), 58–63 (2002)
Yi, J., Nasukawa, T., Bunescu, R., Niblack, W.: Sentiment Analyzer: extracting sentiments about a given topic using natural language processing techniques. In: 3rd IEEE International Conference on Data Mining (ICDM), pp. 427–434 (2003)
Additional information
This work is an extension of the preliminary work reported in the papers “Integrating collaborative filtering and sentiment analysis: A rating inference approach” [21] and “Evaluation of a Rating Inference Approach to Utilizing Textual Reviews for Collaborative Recommendation” [22].
This work was done when Cane Wing-ki Leung was with the Department of Computing, The Hong Kong Polytechnic University.
About this article
Cite this article
Leung, C.Wk., Chan, S.Cf., Chung, Fl. et al. A probabilistic rating inference framework for mining user preferences from reviews. World Wide Web 14, 187–215 (2011). https://doi.org/10.1007/s11280-011-0117-5
DOI: https://doi.org/10.1007/s11280-011-0117-5