Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5633)

Abstract

This paper presents a knowledge extraction system that provides sales intelligence based on information downloaded from the WWW. The information is first located on and downloaded from relevant companies' websites, and machine learning is then used to find those web pages that contain useful information, where "useful" is defined as containing news about orders for specific products. Several machine learning algorithms were tested. Of these, k-nearest neighbour, support vector machines, a multi-layer perceptron and the C4.5 decision tree produced the best results in one or both experiments; however, k-nearest neighbour and support vector machines proved to be the most robust, which is a highly desirable characteristic in this particular application. K-nearest neighbour slightly outperformed support vector machines in both experiments, which contradicts results previously reported in the literature.
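
To illustrate the classification step, the following is a minimal sketch of a k-nearest-neighbour page classifier that labels a downloaded page by majority vote among its most similar labelled pages. The abstract does not state how pages were represented or how k was chosen, so the TF-IDF bag-of-words representation, cosine similarity, k=3, the class name `KNNPageClassifier`, the labels `order`/`other` and the training pages are all assumptions invented for illustration. This is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(docs):
    """Smoothed inverse document frequency for every term in the training pages."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector (dict); terms never seen in training are ignored."""
    tf = Counter(tokenize(text))
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class KNNPageClassifier:
    """Hypothetical k-NN classifier: majority vote of the k most similar pages."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, pages, labels):
        # k-NN is "lazy": training just stores the vectorised labelled pages.
        self.idf = idf_weights(pages)
        self.vectors = [tfidf_vector(p, self.idf) for p in pages]
        self.labels = list(labels)
        return self

    def predict(self, page):
        query = tfidf_vector(page, self.idf)
        # Rank every stored page by similarity to the query, most similar first.
        scored = sorted(
            zip((cosine(query, v) for v in self.vectors), self.labels),
            key=lambda pair: pair[0],
            reverse=True,
        )
        votes = Counter(label for _, label in scored[: self.k])
        return votes.most_common(1)[0][0]


# Invented toy data: "order" pages report product orders, "other" pages do not.
train_pages = [
    "Acme wins order for twelve gas turbines from utility",
    "Company receives contract and order for compressors",
    "New order placed for fifty turbines by energy group",
    "Annual general meeting held at company headquarters",
    "Careers page: apply for engineering graduate roles",
    "Privacy policy and cookie settings for this website",
]
train_labels = ["order", "order", "order", "other", "other", "other"]

clf = KNNPageClassifier(k=3).fit(train_pages, train_labels)
print(clf.predict("Firm secures order for six turbines"))      # → order
print(clf.predict("See our careers page and privacy policy"))  # → other
```

An odd k avoids tied votes in this two-class setting. Note that pages with zero similarity to the query can still fill the top k in this simple version; a practical system might discard them or fall back to a default label.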

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Popova, V., John, R., Stockton, D. (2009). Sales Intelligence Using Web Mining. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2009. Lecture Notes in Computer Science, vol 5633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03067-3_12

  • DOI: https://doi.org/10.1007/978-3-642-03067-3_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03066-6

  • Online ISBN: 978-3-642-03067-3

  • eBook Packages: Computer Science, Computer Science (R0)
