DOI: 10.1145/1367497.1367560
research-article

Automatic online news issue construction in web environment

Published: 21 April 2008

Abstract

In many cases, rather than performing a keyword search, people turn to the Internet to see what is going on. This calls for integrated, comprehensive information on news topics, which we call news issues, covering the background, history, current progress, different opinions, discussions, and so on. Traditionally, news issues are compiled manually by website editors. This is time-consuming, hard work, which makes real-time updating difficult. In this paper, we propose a three-step automatic online algorithm for news issue construction. The first step is topic detection, in which newly appearing stories are clustered into new topic candidates. The second step is topic tracking, in which those candidates are compared with previous topics and either merged into existing ones or used to create new ones. In the final step, news issues are constructed by combining related topics and updated by inserting new topics. We simulate an automatic online news issue construction process under practical Web conditions to run the experiments. The F-measure of the best results is either above 90% (topic detection) or close to 90% (topic detection and tracking). Four news issue construction results are generated at different time granularities: one meets needs such as "what's new", and the other three answer questions such as "what's hot" or "what's going on". With the proposed algorithm, news issues can be constructed effectively and automatically with real-time updates, freeing considerable human effort from tedious manual work.
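
The three steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the story representation, similarity measure, clustering method, or thresholds, so everything below is an assumption made for illustration: bag-of-words vectors, cosine similarity, single-pass clustering, and the threshold values are stand-ins, and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts (assumed representation, not from the paper)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def detect_topics(stories, threshold):
    """Step 1: cluster newly appearing stories into topic candidates (single pass)."""
    candidates = []  # each candidate is the summed term-count vector of its stories
    for story in stories:
        vec = vectorize(story)
        best = max(candidates, key=lambda c: cosine(c, vec), default=None)
        if best is not None and cosine(best, vec) >= threshold:
            best.update(vec)              # Counter.update adds counts
        else:
            candidates.append(Counter(vec))
    return candidates

def track_topics(candidates, topics, threshold):
    """Step 2: merge each candidate into a previous topic or create a new topic.
    Returns the indices of the topics that were newly created."""
    new_ids = []
    for cand in candidates:
        best = max(range(len(topics)), key=lambda i: cosine(topics[i], cand), default=None)
        if best is not None and cosine(topics[best], cand) >= threshold:
            topics[best].update(cand)
        else:
            topics.append(Counter(cand))
            new_ids.append(len(topics) - 1)
    return new_ids

def update_issues(new_ids, topics, issues, threshold):
    """Step 3: insert each new topic into a related issue, or open a new issue."""
    for tid in new_ids:
        for issue in issues:
            if any(cosine(topics[tid], topics[other]) >= threshold for other in issue):
                issue.append(tid)
                break
        else:
            issues.append([tid])

# Two simulated batches of incoming stories (made-up example data).
topics, issues = [], []
batches = [
    ["earthquake hits city center", "city center earthquake damage",
     "team wins football final"],
    ["earthquake rescue continues in city",
     "earthquake relief fund donations raised"],
]
for batch in batches:
    new_ids = track_topics(detect_topics(batch, 0.4), topics, 0.4)
    update_issues(new_ids, topics, issues, 0.15)
# issues is now [[0, 2], [1]]: topics 0 and 2 form one earthquake issue.
```

In this sketch the issue-level threshold is deliberately looser than the topic-level one, so that distinct but related topics (here, the earthquake itself and the relief fund) end up in the same issue. That choice is part of the illustration, not a claim about the paper's parameters.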

Information

Published In

WWW '08: Proceedings of the 17th international conference on World Wide Web
April 2008
1326 pages
ISBN: 9781605580852
DOI: 10.1145/1367497
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. news issue
  3. topic detection and tracking

Qualifiers

  • Research-article

Conference

WWW '08

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 22 Nov 2024

Citations

Cited By

  • (2023) Heterogeneous Graph Transformer for Meta-structure Learning with Application in Text Classification. ACM Transactions on the Web 17:3 (1-27). DOI: 10.1145/3580508. Online publication date: 22-May-2023
  • (2023) Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (832-840). DOI: 10.1145/3539597.3570418. Online publication date: 27-Feb-2023
  • (2023) A Novel Hybrid Machine Learning Model for Analyzing E-Learning Users’ Satisfaction. International Journal of Human–Computer Interaction 40:16 (4193-4214). DOI: 10.1080/10447318.2023.2209986. Online publication date: 28-May-2023
  • (2023) Overview of NLPCC 2023 Shared Task 6: Chinese Few-Shot and Zero-Shot Entity Linking. Natural Language Processing and Chinese Computing (257-265). DOI: 10.1007/978-3-031-44699-3_23. Online publication date: 12-Oct-2023
  • (2022) Research on Long Text Classification Model Based on Multi-Feature Weighted Fusion. Applied Sciences 12:13 (6556). DOI: 10.3390/app12136556. Online publication date: 28-Jun-2022
  • (2022) A Survey on Text Classification: From Traditional to Deep Learning. ACM Transactions on Intelligent Systems and Technology 13:2 (1-41). DOI: 10.1145/3495162. Online publication date: 8-Apr-2022
  • (2022) Generative Text Convolutional Neural Network for Hierarchical Document Representation Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (1-17). DOI: 10.1109/TPAMI.2022.3192319. Online publication date: 2022
  • (2022) A Chinese L2 Learners' Dynamic Vocabulary Growth Network Model Based on Graph Deep Learning. 2022 4th International Conference on Computer Science and Technologies in Education (CSTE) (156-163). DOI: 10.1109/CSTE55932.2022.00035. Online publication date: May-2022
  • (2021) geoGAT: Graph Model Based on Attention Mechanism for Geographic Text Classification. ACM Transactions on Asian and Low-Resource Language Information Processing 20:5 (1-18). DOI: 10.1145/3434239. Online publication date: 22-Sep-2021
  • (2021) Modeling multi-prototype Chinese word representation learning for word similarity. Complex & Intelligent Systems 7:6 (2977-2990). DOI: 10.1007/s40747-021-00482-y. Online publication date: 4-Aug-2021
