Abstract
This paper presents a knowledge extraction system that provides sales intelligence based on information downloaded from the WWW. The information is first located and downloaded from the websites of relevant companies. Machine learning is then used to find those web pages that contain useful information, where useful is defined as containing news about orders for specific products. Several machine learning algorithms were tested, of which k-nearest neighbour, support vector machines, multi-layer perceptron and the C4.5 decision tree produced the best results in one or both experiments. However, k-nearest neighbour and support vector machines proved to be the most robust, which is a highly desired characteristic in this particular application. K-nearest neighbour slightly outperformed support vector machines in both experiments, which contradicts results previously reported in the literature.
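To make the classification step concrete, the following is a minimal sketch of a k-nearest-neighbour text classifier that labels a page as containing order news or not. It is an illustration only: the abstract does not state the features, term weighting, distance measure or value of k used in the paper, so the TF-IDF weighting, cosine similarity, similarity-weighted voting, the example sentences, the labels `useful` and `other`, and all function names here are assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]

def weigh(tokens, idf):
    """Build a sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def fit(docs):
    """Return (training vectors, idf table) for a list of raw documents."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [weigh(toks, idf) for toks in tokenized], idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(text, vectors, labels, idf, k=3):
    """Label `text` by a similarity-weighted vote of its k nearest neighbours."""
    query = weigh(tokenize(text), idf)
    scored = [(cosine(query, vec), label) for vec, label in zip(vectors, labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]

# Hypothetical training snippets; the paper's real data is not reproduced here.
TRAIN_DOCS = [
    "company wins order for gas turbines",
    "airline places order for new engines",
    "contract signed for delivery of turbines",
    "company appoints new chief executive",
    "annual report shows rising profit",
    "careers page lists open positions",
]
TRAIN_LABELS = ["useful", "useful", "useful", "other", "other", "other"]

VECTORS, IDF = fit(TRAIN_DOCS)
print(knn_predict("operator places order for turbines", VECTORS, TRAIN_LABELS, IDF))
print(knn_predict("annual report shows profit", VECTORS, TRAIN_LABELS, IDF))
```

Weighting each vote by similarity keeps neighbours that share no terms with the query from influencing the result, which matters on a training set this small. A support vector machine could be substituted for `knn_predict` over the same vectors to mirror the comparison described in the abstract.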
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Popova, V., John, R., Stockton, D. (2009). Sales Intelligence Using Web Mining. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2009. Lecture Notes in Computer Science, vol 5633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03067-3_12
DOI: https://doi.org/10.1007/978-3-642-03067-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03066-6
Online ISBN: 978-3-642-03067-3
eBook Packages: Computer Science, Computer Science (R0)