Abstract
This paper addresses the problem of mining commodity information from threaded Chinese customer reviews. Chinese online commodity forums are developing rapidly and provide a good environment for customers to share reviews. However, because of noise and navigational limitations, it is hard to get a clear view of a commodity from thousands of related reviews. Furthermore, because of the differences between Chinese and English, research approaches for the two languages may vary considerably. This paper aims to automatically mine key information from commodity reviews. An effective algorithm, the Chinese Commodity Review Miner (CCRM), is proposed. The algorithm consists of two parts. First, we propose an efficient rule-based algorithm for commodity feature extraction, together with a probabilistic model for feature ranking. Second, we propose a top-down algorithm that reorganizes the extracted features into a hierarchical structure. A prototype system based on CCRM is also implemented. Using CCRM, users can easily obtain an outline of a commodity and navigate freely within it.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Duan, H., Bao, S., Yu, Y. (2007). CCRM: An Effective Algorithm for Mining Commodity Information from Threaded Chinese Customer Reviews. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_48
DOI: https://doi.org/10.1007/978-3-540-71701-0_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)