DOI: 10.1145/1401890.1402015
Research article

Learning from multi-topic web documents for contextual advertisement

Published: 24 August 2008

Abstract

Contextual advertising on web pages has become very popular and poses its own set of text mining challenges. Advertisers often wish to target (or avoid) specific content that may appear in only a small part of a web page. Learning for these targeting tasks is difficult because most training pages are multi-topic and need expensive human labeling at the sub-document level for accurate training. In this paper we investigate ways to learn sub-document classification when only page-level labels are available; these labels indicate only whether the relevant content exists in a given page. We propose applying multiple-instance learning to this task to improve the effectiveness of traditional methods. We apply sub-document classification to two different problems in contextual advertising. The first is "sensitive content detection", where the advertiser wants to avoid content relating to war, violence, pornography, etc., even if it occurs in only a small part of a page. The second involves opinion mining from review sites: the advertiser wants to detect and avoid negative opinion about their product when positive, negative and neutral sentiments co-exist on a page. In both scenarios we present experimental results showing that our proposed system obtains good block-level labeling without additional labeling effort and improves the performance of traditional learning methods.
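
To make the page-level labeling setup concrete, the following Python sketch treats each page as a bag of blocks and trains a block classifier from page labels alone. It is a minimal sketch under stated assumptions, not the system evaluated in the paper: the noisy-OR rule for combining block probabilities, the bag-of-words logistic scorer, the toy pages, and all names (`block_probs`, `page_prob`, `train`, `TOY_PAGES`) are hypothetical choices made for brevity.

```python
import math

def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def block_probs(weights, bias, blocks):
    """Probability that each block (a list of tokens) holds the target content."""
    return [sigmoid(bias + sum(weights.get(tok, 0.0) for tok in block))
            for block in blocks]

def page_prob(probs):
    """Noisy-OR (an assumed combiner): a page is positive if any block is."""
    none_positive = 1.0
    for p in probs:
        none_positive *= 1.0 - p
    return 1.0 - none_positive

def train(pages, epochs=200, lr=0.5):
    """Gradient ascent on the page-level log-likelihood; no block labels are used."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for blocks, label in pages:
            probs = block_probs(weights, bias, blocks)
            p_page = min(max(page_prob(probs), 1e-9), 1.0 - 1e-9)
            for block, p in zip(blocks, probs):
                # Derivative of the log-likelihood w.r.t. this block's score
                # under the noisy-OR combiner.
                grad = p * (label - p_page) / p_page
                bias += lr * grad
                for tok in block:
                    weights[tok] = weights.get(tok, 0.0) + lr * grad
    return weights, bias

# Hypothetical toy pages: (list of blocks, page-level label).
# Label 1 means "sensitive content appears somewhere on the page".
TOY_PAGES = [
    ([["weather", "sunny"], ["war", "attack"]], 1),
    ([["recipe", "pasta"], ["weather", "sunny"]], 0),
    ([["sports", "score"], ["attack", "war", "casualties"]], 1),
    ([["sports", "score"], ["recipe", "pasta"]], 0),
]

weights, bias = train(TOY_PAGES)
for blocks, label in TOY_PAGES:
    for block, p in zip(blocks, block_probs(weights, bias, blocks)):
        print(label, block, round(p, 3))
```

Running the script prints one probability per block of each toy page. Thresholding those probabilities gives block-level predictions even though training saw only page-level labels, which is the general idea behind applying multiple-instance learning to sub-document classification.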

Published In

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2008
1116 pages
ISBN: 9781605581934
DOI: 10.1145/1401890
General Chair: Ying Li
Program Chairs: Bing Liu, Sunita Sarawagi

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. contextual advertising
2. opinion mining
3. sensitive content detection
4. sub-document classification

Qualifiers

• Research-article

Conference

KDD08

Acceptance Rates

KDD '08 Paper Acceptance Rate: 118 of 593 submissions, 20%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Cited By

• (2024) AI-Driven Contextual Advertising: Toward Relevant Messaging Without Personal Data. Journal of Current Issues & Research in Advertising 45:3 (301-319). DOI: 10.1080/10641734.2024.2334939. Online publication date: 29-Apr-2024
• (2024) Double similarities weighted multi-instance learning kernel and its application. Expert Systems with Applications: An International Journal 238:PB. DOI: 10.1016/j.eswa.2023.121900. Online publication date: 27-Feb-2024
• (2024) Benchmarking Multilabel Topic Classification in the Kyrgyz Language. Analysis of Images, Social Networks and Texts (21-35). DOI: 10.1007/978-3-031-54534-4_2. Online publication date: 12-Mar-2024
• (2023) A Comprehensive Review on Multiple Instance Learning. Electronics 12:20 (4323). DOI: 10.3390/electronics12204323. Online publication date: 18-Oct-2023
• (2023) Natural-Annotation-Based Malay Multiword Expressions Extraction and Clustering. Computational Linguistics and Intelligent Text Processing (143-152). DOI: 10.1007/978-3-031-23793-5_13. Online publication date: 26-Feb-2023
• (2022) Multiple Instance Learning for Emotion Recognition Using Physiological Signals. IEEE Transactions on Affective Computing 13:1 (389-407). DOI: 10.1109/TAFFC.2019.2954118. Online publication date: 1-Jan-2022
• (2022) Machine‐Learning and Deep‐Learning Techniques in Social Sciences. Machine Learning Algorithms for Signal and Image Processing (409-428). DOI: 10.1002/9781119861850.ch23. Online publication date: 18-Nov-2022
• (2021) Structure-sensitive graph-based multiple-instance semi-supervised learning. Sādhanā 46:3. DOI: 10.1007/s12046-021-01659-4. Online publication date: 3-Aug-2021
• (2020) Identifying machine learning techniques for classification of target advertising. ICT Express 6:3 (175-180). DOI: 10.1016/j.icte.2020.04.012. Online publication date: Sep-2020
• (2019) Fusing Visual and Textual Information to Determine Content Safety. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA) (2026-2031). DOI: 10.1109/ICMLA.2019.00324. Online publication date: Dec-2019
