research-article

Learning event count models with application to affiliation ranking

Authors:

Ebrahim Bagheri,

Jeong-Yoon Lee,

Song Chen

CASCON '17: Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering

Pages 132 - 139

Published: 06 November 2017

Abstract

Event count prediction is a class of time series analysis problems that has been studied extensively over the years. Its applications range from predicting the number of publications in the scientific community to predicting ATM cash withdrawal transactions in the banking industry. However, in applied data science, event count prediction models often struggle on real-world data, because such data not only violates the Poisson assumption that events occur at a constant rate but is also relatively sparse, i.e., only a few event counts are greater than zero. Traditional techniques do not work well under these two conditions. To overcome these limitations, some researchers have proposed generic autoregressive (AR) models for event count prediction, which accommodate non-constant event occurrence rates. However, because AR models rely solely on historical event counts for forecasting, they may not be flexible enough to incorporate domain knowledge. Similarly, AR models may not work well on relatively short time series. To address these challenges, we propose a machine learning approach to the event count prediction problem. We benchmark the proposed solution on the KDD Cup 2016 dataset by formulating affiliation ranking as an event count time series prediction problem. We map the time series onto a high-dimensional state space and systematically apply state-of-the-art machine learning algorithms to predict event counts. We then compare our approach against solutions submitted to the KDD Cup 2016 competition and show that it outperforms the best models in the competition, with an NDCG@20 score of 0.7573.
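The pipeline outlined above (turn each affiliation's event-count history into fixed-length state vectors, predict the next count, rank affiliations by that prediction, and score the ranking with NDCG@20) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the state-space mapping is a plain time-delay (lag-window) embedding, it uses the common DCG gain rel_i / log2(i + 1) (the exact NDCG variant used by KDD Cup 2016 is an assumption here), and a mean-of-window baseline stands in for the trained machine learning models. The function names, affiliation names, and counts are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def delay_embed(series: Sequence[float], lags: int) -> Tuple[List[List[float]], List[float]]:
    """Map one count series onto lag-window state vectors (time-delay embedding).

    Row t holds the `lags` counts preceding time t; the target is the count at t.
    """
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(lags, len(series)):
        X.append(list(series[t - lags:t]))
        y.append(series[t])
    return X, y


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k with gain rel_i / log2(i + 1) for 1-based rank i."""
    # enumerate is 0-based, so rank i corresponds to log2(index + 2).
    return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted: Dict[str, float], actual: Dict[str, float], k: int = 20) -> float:
    """Score the ranking induced by `predicted` against true relevance in `actual`."""
    ranked = sorted(predicted, key=lambda a: predicted[a], reverse=True)
    gains = [actual.get(a, 0.0) for a in ranked]
    ideal_dcg = dcg_at_k(sorted(actual.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical yearly paper counts per affiliation (illustrative, not KDD Cup data).
history = {
    "aff_A": [0, 1, 2, 3, 4],
    "aff_B": [2, 2, 0, 1, 1],
    "aff_C": [0, 0, 0, 0, 1],
}
actual_next = {"aff_A": 1.0, "aff_B": 3.0, "aff_C": 0.0}

LAGS = 3
# Training pairs a learned regressor would be fit on; not used by the baseline below.
X_train, y_train = delay_embed(history["aff_A"], LAGS)
print(X_train, y_train)  # → [[0, 1, 2], [1, 2, 3]] [3, 4]

# Stand-in for a trained model: predict the mean of the last LAGS counts.
predicted = {a: sum(s[-LAGS:]) / LAGS for a, s in history.items()}
print(round(ndcg_at_k(predicted, actual_next, k=20), 4))
```

With these made-up numbers, the mean-of-window baseline ranks aff_A first although aff_B has the most events in the target year, so NDCG@20 comes out below 1 (about 0.80). Replacing the baseline with a real regressor fit on pooled `delay_embed` rows across affiliations would only change how `predicted` is filled; the ranking and evaluation steps stay the same.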

Published In

CASCON '17: Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering

November 2017

380 pages

Conference Chair:
Marcellus Mindel, IBM Canada Ltd.

Program Chairs:
Kelly Lyons, University of Toronto
Joe Wigglesworth, IBM Canada Ltd.

Publisher

IBM Corp.

United States

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%

Bibliometrics

Total Citations: 0
Total Downloads: 43
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0

Reflects downloads up to 22 Nov 2024
