Abstract
The architecture of Blue Martini Software's e-commerce suite has supported data collection, data transformation, and data mining since its inception. With clickstreams being collected at the application-server layer, high-level events being logged, and data automatically transformed into a data warehouse using meta-data, common problems plaguing data mining using weblogs (e.g., sessionization and conflating multi-sourced data) were obviated, thus allowing us to concentrate on actual data mining goals. The paper briefly reviews the architecture and discusses many lessons learned over the last four years and the challenges that still need to be addressed. The lessons and challenges are presented across two dimensions: business-level vs. technical, and throughout the data mining lifecycle stages of data collection, data warehouse construction, business intelligence, and deployment. The lessons and challenges are also widely applicable to data mining domains outside retail e-commerce.
Cite this article
Kohavi, R., Mason, L., Parekh, R. et al. Lessons and Challenges from Mining Retail E-Commerce Data. Machine Learning 57, 83–113 (2004). https://doi.org/10.1023/B:MACH.0000035473.11134.83
DOI: https://doi.org/10.1023/B:MACH.0000035473.11134.83