DOI: 10.1145/2232817.2232831
research-article

To better stand on the shoulder of giants

Published: 10 June 2012

Abstract

Scientists usually develop research ideas inspired by previous publications, but they cannot follow every publication in an ever-growing body of literature. The volume of literature is expanding rapidly, yet not all papers have equal impact on the academic community. Knowing which literature is likely to become influential puts a researcher in a better position to choose important references, so estimating potential influence is of great significance. We study the challenging problem of identifying potentially influential literature. We examine a set of hypotheses about the fundamental characteristics of highly cited papers and find some interesting patterns. Based on these observations, we learn to identify potentially influential literature via Future Influence Prediction (FIP), which aims to estimate the future influence of literature. The system takes a series of features of a particular publication as input and outputs the estimated citation count of that article after a given time period. We consider several regression models to formulate the learning process and evaluate their performance using the coefficient of determination (R^2). Experimental results on a large real-world data set show an average predictive performance of 83.6% measured in R^2. We apply the learned model to bibliography recommendation and obtain a marked performance improvement in terms of Mean Average Precision (MAP).
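The abstract names its two evaluation measures, R^2 and MAP, without defining them. As a rough illustration only, the sketch below computes both using their standard textbook definitions: R^2 = 1 - SS_res / SS_tot for predicted versus observed citation counts, and MAP as the mean over queries of average precision for a ranked recommendation list. The function names and the toy inputs are hypothetical and are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Sequence, Set, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Toy numbers, invented purely for illustration.
    observed_citations = [1.0, 2.0, 3.0, 4.0]
    predicted_citations = [1.0, 2.0, 3.0, 5.0]
    print("R^2:", r_squared(observed_citations, predicted_citations))

    queries = [
        (["a", "b", "c", "d"], {"a", "c"}),  # relevant items at ranks 1 and 3
        (["x", "y", "z"], {"y"}),            # relevant item at rank 2
    ]
    print("MAP:", mean_average_precision(queries))
```

An R^2 of 1 means the predictions match the observed counts exactly, while 0 means they do no better than always predicting the mean; a result such as the 83.6% reported above would correspond to an R^2 of 0.836 under this definition.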


Published In

JCDL '12: Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
June 2012
458 pages
ISBN: 9781450311540
DOI: 10.1145/2232817
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. citation pattern analysis
  2. digital libraries
  3. influence prediction

Conference

JCDL '12

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%
