Article
DOI: 10.1145/1150402.1150482

A mixture model for contextual text mining

Published: 20 August 2006

Abstract

Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and with comparing and analyzing how those themes vary across contexts. Since the topics covered in a document are usually related to the context of the document, analyzing topical themes within context can potentially reveal many interesting theme patterns. In this paper, we generalize some of the models proposed in previous work and propose a new general probabilistic model for contextual text mining that covers several existing models as special cases. Specifically, we extend the probabilistic latent semantic analysis (PLSA) model by introducing context variables to model the context of a document. The proposed mixture model, called the contextual probabilistic latent semantic analysis (CPLSA) model, can be applied to many interesting mining tasks, such as temporal text mining, spatiotemporal text mining, author-topic analysis, and cross-collection comparative analysis. Empirical experiments show that the proposed mixture model can discover themes and their contextual variations effectively.
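
The abstract names PLSA as the base model and lists the EM algorithm among the author tags, but it does not give the CPLSA equations. As a rough illustration only, the following sketch fits plain PLSA (without any context variables) by EM on a small word-count matrix. The function name `plsa_em`, its parameters, and the toy data are hypothetical and are not taken from the paper; extending the sketch with context variables, as the abstract describes, is not attempted here.

```python
import math
import random


def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Fit plain PLSA by EM (illustrative sketch, not the paper's CPLSA).

    counts[d][w] is the count of word w in document d. Assumes every
    document contains at least one word. Returns p(z|d), p(w|z), and the
    log-likelihood recorded at the start of each iteration.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Random, strictly positive probability vector of length n.
        xs = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(xs)
        return [x / total for x in xs]

    p_z_d = [random_dist(n_topics) for _ in range(n_docs)]   # p(z | d)
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # p(w | z)
    log_likelihoods = []

    for _ in range(n_iters):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                c = counts[d][w]
                if c == 0:
                    continue
                # E-step: p(z | d, w) is proportional to p(z | d) * p(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint)
                ll += c * math.log(norm)
                for z in range(n_topics):
                    resp = c * joint[z] / norm
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [[v / sum(row) for v in row] for row in new_z_d]
        p_w_z = [[v / sum(row) for v in row] for row in new_w_z]
        log_likelihoods.append(ll)

    return p_z_d, p_w_z, log_likelihoods


if __name__ == "__main__":
    # Toy corpus: 4 documents over a 4-word vocabulary (made-up counts).
    toy_counts = [[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 3, 4], [0, 1, 4, 3]]
    doc_topics, topic_words, lls = plsa_em(toy_counts, n_topics=2, n_iters=30)
    for d, row in enumerate(doc_topics):
        print(d, [round(p, 3) for p in row])
```

In this sketch each document has its own topic mixture p(z|d) and all documents share the word distributions p(w|z). A contextual variant in the spirit of the abstract would presumably let these quantities depend on context such as time or location, but the exact parameterisation is not stated on this page.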

    Published In

    KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2006
    986 pages
    ISBN: 1595933395
    DOI: 10.1145/1150402
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. EM algorithm
    2. clustering
    3. context
    4. contextual text mining
    5. mixture model
    6. theme pattern

    Qualifiers

    • Article

    Conference

    KDD06

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Supervised probabilistic latent semantic analysis with applications to controversy analysis of legislative bills. Intelligent Data Analysis 28:1 (161-183). DOI: 10.3233/IDA-227202. Online publication date: 3-Feb-2024
    • (2023) Context-Aware Customer Needs Identification by Linguistic Pattern Mining Based on Online Product Reviews. IEEE Access 11 (71859-71872). DOI: 10.1109/ACCESS.2023.3295452. Online publication date: 2023
    • (2022) Hybrid Representation to Locate Vulnerable Lines of Code. International Journal of Software Innovation 10:1 (1-19). DOI: 10.4018/IJSI.292020. Online publication date: 4-Mar-2022
    • (2022) Defining Success in a Language MOOC From Learner Perspectives. International Journal of Computer-Assisted Language Learning and Teaching 12:1 (1-16). DOI: 10.4018/IJCALLT.291108. Online publication date: 18-Feb-2022
    • (2022) An Approach to Ensure Secure Inter-Cloud Data and Application Migration Using End-to-End Encryption and Content Verification. International Journal of Ambient Computing and Intelligence 13:1 (1-21). DOI: 10.4018/IJACI.293148. Online publication date: 8-Apr-2022
    • (2022) Topic Modeling Techniques for Text Mining Over a Large-Scale Scientific and Biomedical Text Corpus. International Journal of Ambient Computing and Intelligence 13:1 (1-18). DOI: 10.4018/IJACI.293137. Online publication date: 29-Apr-2022
    • (2022) Cloud Intrusion Detection Model Based on Deep Belief Network and Grasshopper Optimization. International Journal of Ambient Computing and Intelligence 13:1 (1-24). DOI: 10.4018/IJACI.293123. Online publication date: 8-Apr-2022
    • (2022) Generation of Adversarial Mechanisms in Deep Neural Networks. International Journal of Ambient Computing and Intelligence 13:1 (1-18). DOI: 10.4018/IJACI.293111. Online publication date: 25-Mar-2022
    • (2022) Hybrid Approach Using Deep Autoencoder and Machine Learning Techniques for Cyber-Attack Detection. International Journal of Ambient Computing and Intelligence 13:1 (1-21). DOI: 10.4018/IJACI.293098. Online publication date: 6-May-2022
    • (2022) Analytical Model of Customer Purchasing Behavior Considering Event Characteristics on Flower Delivery Business. Total Quality Science 7:3 (125-136). DOI: 10.17929/tqs.7.125. Online publication date: 11-May-2022
