DOI: 10.1145/1935826.1935870
research-article

Mining named entities with temporally correlated bursts from multilingual web news streams

Published: 09 February 2011

Abstract

In this work, we study a new text mining problem: discovering named entities whose mention counts exhibit temporally correlated bursts across multiple multilingual Web news streams. This problem has many interesting and important applications, such as identifying latent events that attract the attention of online media in different countries and acquiring valuable linguistic knowledge in the form of transliterations. While mining "bursty" terms in a single text stream has been studied before, detecting terms with temporally correlated bursts in multilingual Web streams raises two new challenges: (i) the bursts of correlated terms in different streams may differ in intensity by orders of magnitude, and (ii) the bursts of correlated terms may be separated by time gaps. We propose a two-stage method for mining items with temporally correlated bursts from multiple data streams that addresses both challenges. In the first stage, the temporal behavior of different entities is normalized by modeling it with a Markov-Modulated Poisson Process. In the second stage, a dynamic programming algorithm is used to discover correlated bursts of different items, which can potentially be separated by time gaps. We evaluated our method on the task of discovering transliterations of named entities from multilingual Web news streams. Experimental results indicate that our method not only effectively discovers named entities with correlated bursts in multilingual Web news streams, but also outperforms two state-of-the-art baseline methods for unsupervised discovery of transliterations in static text collections.
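
The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no parameter estimation procedure, scoring function, or data format, so everything below is an assumption made for illustration. Stage 1 stands in for the Markov-Modulated Poisson Process with a discrete-time, two-state Poisson hidden Markov model whose rates are picked by hand (the hypothetical parameters `lam_base`, `lam_burst`, and `p_stay`), decoded with the Viterbi algorithm. This turns raw counts of very different magnitudes into comparable 0/1 burst labels. Stage 2 is a simple LCS-style dynamic program (the hypothetical `matched_bursts`) that pairs burst onsets in order while tolerating a time gap of up to `max_gap` steps.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_burst_states(counts, lam_base, lam_burst, p_stay=0.9):
    """Label each time step 0 (base) or 1 (burst) with a two-state Poisson HMM.

    Assumption: rates and the self-transition probability are supplied by hand;
    a real system would estimate them from data.
    """
    rates = (lam_base, lam_burst)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    # score[s] = best log-probability of any state path ending in state s
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(ptr)
    # Trace the best path backwards from the better final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def burst_starts(states):
    """Indices where a run of burst labels (1s) begins."""
    return [t for t, s in enumerate(states)
            if s == 1 and (t == 0 or states[t - 1] == 0)]

def matched_bursts(starts_a, starts_b, max_gap):
    """Maximum number of in-order onset pairs that lie within max_gap steps."""
    n, m = len(starts_a), len(starts_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
            if abs(starts_a[i - 1] - starts_b[j - 1]) <= max_gap:
                dp[i][j] = max(dp[i][j], dp[i - 1][j - 1] + 1)
    return dp[n][m]

# Made-up daily mention counts for one entity in two streams: the second
# stream is an order of magnitude quieter and lags by one step.
stream_a = [1, 0, 2, 1, 40, 55, 38, 2, 1, 0, 1, 1, 30, 42, 1, 0]
stream_b = [0, 0, 0, 0, 0, 3, 5, 4, 0, 0, 0, 0, 0, 2, 4, 0]
onsets_a = burst_starts(viterbi_burst_states(stream_a, 1.0, 40.0))
onsets_b = burst_starts(viterbi_burst_states(stream_b, 0.2, 4.0))
print(onsets_a, onsets_b, matched_bursts(onsets_a, onsets_b, max_gap=2))
```

With these made-up counts, both bursts in each stream are labeled despite the difference in scale, and the two onset pairs match once a gap of one step is allowed. With `max_gap=0` nothing matches, which is the situation the second challenge in the abstract describes.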

Supplementary Material

JPG File (wsdm2011_kotov_mne_01.jpg)
MP4 File (wsdm2011_kotov_mne_01.mp4)

      Published In

      WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
      February 2011
      870 pages
      ISBN:9781450304931
      DOI:10.1145/1935826

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. correlated burst detection
      2. dynamic programming
      3. probabilistic modeling
      4. text streams

      Acceptance Rates

      WSDM '11 Paper Acceptance Rate 83 of 372 submissions, 22%;
      Overall Acceptance Rate 498 of 2,863 submissions, 17%

      Cited By

      • (2016) Data mining for building knowledge bases: techniques, architectures and applications. The Knowledge Engineering Review 31:02 (97-123). DOI: 10.1017/S0269888916000047. Online publication date: 31-Mar-2016
      • (2016) Early detection method for emerging topics based on dynamic bayesian networks in micro-blogging networks. Expert Systems with Applications: An International Journal 57:C (285-295). DOI: 10.1016/j.eswa.2016.03.050. Online publication date: 15-Sep-2016
      • (2016) Urban Sensing: Potential and Limitations of Social Network Analysis and Data Visualization as Research Methods in Urban Studies. Innovative Methods in Media and Communication Research (253-272). DOI: 10.1007/978-3-319-40700-5_13. Online publication date: 28-Dec-2016
      • (2015) Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (253-262). DOI: 10.1145/2766462.2767756. Online publication date: 9-Aug-2015
      • (2015) Mining Correlations on Massive Bursty Time Series Collections. Database Systems for Advanced Applications (55-71). DOI: 10.1007/978-3-319-18120-2_4. Online publication date: 9-Apr-2015
      • (2014) Open challenges for data stream mining research. ACM SIGKDD Explorations Newsletter 16:1 (1-10). DOI: 10.1145/2674026.2674028. Online publication date: 25-Sep-2014
      • (2013) Chelsea won, and you bought a t-shirt. Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (829-836). DOI: 10.1145/2492517.2500302. Online publication date: 25-Aug-2013
      • (2013) Emerging topic detection for organizations from microblogs. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (43-52). DOI: 10.1145/2484028.2484057. Online publication date: 28-Jul-2013
      • (2013) Bursty subgraphs in social networks. Proceedings of the sixth ACM international conference on Web search and data mining (213-222). DOI: 10.1145/2433396.2433423. Online publication date: 4-Feb-2013
      • (2012) Identifying event-related bursts via social media activities. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (1466-1477). DOI: 10.5555/2390948.2391116. Online publication date: 12-Jul-2012
