Learning event count models with application to affiliation ranking

Published: 06 November 2017

Abstract

Event count prediction is a class of problems in time series analysis that has been studied extensively over the years. Its applications range from predicting the number of publications in the scientific community to predicting ATM cash withdrawal transactions in the banking industry. In applied data science, however, event count prediction models often struggle with real-world data for two reasons: the data violates the Poisson distribution assumption, i.e., that the rate at which events occur is constant, and the data is relatively sparse, i.e., only a few event count values are greater than zero. Traditional techniques do not work well under these two conditions. To overcome these limitations, some researchers have proposed generic autoregressive (AR) models for event count prediction, which work with non-constant event occurrence rates. Because AR models use only historical event counts for forecasting, they are not flexible enough to incorporate domain knowledge. AR models also may not work well with relatively short time series. To overcome these challenges, we propose a machine learning approach to the event count prediction problem. We benchmark our proposed solution on the KDD Cup 2016 dataset by formalizing affiliation ranking as an event count time series prediction problem. We map the time series onto a high-dimensional state space and systematically apply state-of-the-art machine learning algorithms to predict event counts. We then compare our proposed approach against solutions in the KDD Cup 2016 competition and show that our work outperforms the best models in this competition, with an NDCG@20 score of 0.7573.
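The two technical steps named in the abstract, mapping a count series onto a state space and scoring a ranking with NDCG@k, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain sliding-window (delay) embedding and a linear-gain NDCG with a log2 discount; the exact embedding and the exact NDCG variant used in the paper and the competition are not stated here. The names `delay_embed`, `window`, `dcg_at_k`, and `ndcg_at_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def delay_embed(counts: Sequence[int], window: int) -> Tuple[List[List[int]], List[int]]:
    """Turn one count series into (lag vector, next count) training pairs.

    Assumption: a simple sliding window stands in for the state-space mapping.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    features: List[List[int]] = []
    targets: List[int] = []
    for end in range(window, len(counts)):
        features.append(list(counts[end - window:end]))  # the previous `window` counts
        targets.append(counts[end])                      # the count to be predicted
    return features, targets


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores: Sequence[float],
              true_relevances: Sequence[float], k: int) -> float:
    """NDCG@k of the ranking induced by sorting items by predicted score, descending."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevances[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(true_relevances, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

In this sketch, each row returned by `delay_embed` could be passed with its target to any regressor, and the predicted counts per affiliation could then be passed to `ndcg_at_k` with `k=20` alongside the true relevance values.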

Published In

CASCON '17: Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering
November 2017
380 pages

Publisher

IBM Corp.

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%
