DOI: 10.5555/1613715.1613763

Studying the history of ideas using topic models

Published: 25 October 2008

Abstract

How can the development of ideas in a scientific field be studied over time? We apply unsupervised topic modeling to the ACL Anthology to analyze historical trends in the field of Computational Linguistics from 1978 to 2006. We induce topic clusters using Latent Dirichlet Allocation and examine the strength of each topic over time. Our methods find several trends in the field, including the rise of probabilistic methods starting in 1988, a steady increase in applications, and a sharp decline in research on semantics and understanding between 1978 and 2001, with that research possibly rising again after 2001. We also introduce topic entropy, a model of the diversity of ideas, and use it to show that COLING is a more diverse conference than ACL, but that both conferences, as well as EMNLP, are becoming broader over time. Finally, we apply Jensen-Shannon divergence of topic distributions to show that all three conferences are converging in the topics they cover.
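The abstract names three measures (topic strength over time, topic entropy, and Jensen-Shannon divergence) without defining them. The sketch below shows one common way to compute them. It assumes each paper already has a document-topic distribution p(z|d) from a fitted LDA model, and that every paper in a year (or conference-year) gets equal weight, so p(z|y) = sum over papers d in year y of p(z|d) * p(d|y) with p(d|y) = 1/|D_y|. Entropy is H = -sum_i p(z_i) log p(z_i). JS divergence is 1/2 KL(P||M) + 1/2 KL(Q||M) with M = (P + Q)/2. The function names and toy numbers are hypothetical, and the paper's exact estimators may differ.

```python
import math
from collections import defaultdict


def topic_strength(doc_topics, doc_keys):
    """Estimate p(z | key) by averaging p(z | d) over the documents with that key.

    A key can be a year y or a (conference, year) pair. Assumes every document
    with a given key is weighted equally, i.e. p(d | key) = 1 / |D_key|.
    """
    sums = {}
    counts = defaultdict(int)
    for theta, key in zip(doc_topics, doc_keys):
        if key not in sums:
            sums[key] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[key][k] += p
        counts[key] += 1
    return {key: [s / counts[key] for s in vec] for key, vec in sums.items()}


def topic_entropy(dist):
    """Shannon entropy in bits of a topic distribution; higher means a broader mix of topics."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits; zero-probability terms of p contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy input: 4 documents over 3 topics. These numbers are invented for
# illustration and are NOT results from the paper.
docs = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.1, 0.1, 0.8]]
keys = [("ACL", 2000), ("ACL", 2000), ("COLING", 2000), ("COLING", 2000)]

strength = topic_strength(docs, keys)
acl, coling = strength[("ACL", 2000)], strength[("COLING", 2000)]
diversity = {venue: topic_entropy(dist) for (venue, _), dist in strength.items()}
overlap = js_divergence(acl, coling)
```

To track convergence, you would compute `overlap` for each year and check whether it shrinks. Base-2 logarithms keep the JS divergence in [0, 1]. SciPy's `jensenshannon` returns the square root of this quantity (the JS distance), so its values are not directly comparable.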

Cited By

  • (2024) Automatic Construction of Sememe Knowledge Bases From Machine Readable Dictionaries. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 32, pp. 1023-1035. DOI: 10.1109/TASLP.2023.3347927. Online publication date: 1-Jan-2024.
  • (2019) A data-driven reflection on 36 years of security and privacy research. Proceedings of the 12th USENIX Conference on Cyber Security Experimentation and Test, p. 6. DOI: 10.5555/3359012.3359018. Online publication date: 12-Aug-2019.
  • (2019) Scalable Cross-lingual Document Similarity through Language-specific Concept Hierarchies. Proceedings of the 10th International Conference on Knowledge Capture, pp. 147-153. DOI: 10.1145/3360901.3364444. Online publication date: 23-Sep-2019.

Information

Published In

EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing
October 2008
1129 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%
