
DOI: 10.1145/3583780.3614810

CLOCK: Online Temporal Hierarchical Framework for Multi-scale Multi-granularity Forecasting of User Impression

Published: 21 October 2023

Abstract

User impression forecasting underpins various commercial activities, from long-term strategic decisions to short-term automated operations. As a representative that involves both kinds, the highly profitable Guaranteed Delivery (GD) advertising focuses mainly on promoting brand effect by allowing advertisers to order target impressions weeks in advance and have them allocated online at the scheduled time. Such a business mode naturally incurs three issues that make existing solutions inferior: 1) the timescale-granularity dilemma of coherently supporting the sale of day-level impressions in the distant future and the corresponding fine-grained allocation in real time; 2) high dimensionality due to the Cartesian product of user attribute combinations; 3) the stability-plasticity dilemma of instantly adapting to emerging patterns of temporal dependency without catastrophic forgetting of repeated ones under non-stationary traffic.
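The high-dimensionality issue (2) follows directly from enumerating every user-attribute combination as its own impression series. A minimal sketch with hypothetical attribute cardinalities (the real Tencent attribute schema is not given in the abstract) shows how quickly the series count grows:

```python
from math import prod

# Hypothetical cardinalities of user-attribute dimensions; these
# numbers are illustrative only, not the production schema.
attribute_cardinalities = {
    "gender": 3,
    "age_bucket": 7,
    "region": 34,
    "device": 5,
    "interest": 20,
}

# Each attribute combination is one impression series to forecast,
# so the series count is the product of the cardinalities.
n_series = prod(attribute_cardinalities.values())
print(n_series)  # 71400 series from just five modest attributes
```

Even five modest attributes yield tens of thousands of series, which is why per-series models become untenable and a shared low-dimensional representation is attractive.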
To overcome these obstacles, we propose an online temporal hierarchical framework that functions analogously to a CLOCK, hence its name. Long-timescale, coarse-grained temporal data (e.g., the daily impressions of one quarter) and short-timescale but fine-grained data are handled separately by dedicated models, just like the hour/minute/second hands. Each tier in the hierarchy is triggered for forecasting and updating on demand at a different frequency, thus saving maintenance overhead. Furthermore, we devise a reconciliation mechanism that coordinates the tiers by aggregating the separately learned local variance and global trends tier by tier. CLOCK addresses the high-dimensionality issue by subsuming the autoencoder design to achieve an end-to-end, nonlinear factorization of streaming data into a low-dimensional latent space, where a neural predictor produces predictions for the decoder to project back to the high dimension. Lastly, we regulate CLOCK's continual refinement by combining the complementary Experience Replay (ER) and Knowledge Distillation (KD) techniques to consolidate and recall previously learned temporal patterns. We conduct extensive evaluations on three public datasets and a real-life user impression log from the Tencent advertising system, and the results demonstrate CLOCK's efficacy.
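To illustrate the forecast-in-latent-space idea (CLOCK's actual components are a learned nonlinear autoencoder and a neural predictor, neither of which is specified here), the following numpy sketch substitutes a rank-K SVD for the encoder/decoder and a per-dimension AR(1) least-squares fit for the predictor; all data and dimensions are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional traffic: 500 correlated series over 200 steps,
# a stand-in for the Cartesian-product impression series.
T, D, K = 200, 500, 8
latent_true = rng.standard_normal((T, K)).cumsum(axis=0)
mixing = rng.standard_normal((K, D))
Y = latent_true @ mixing + 0.1 * rng.standard_normal((T, D))

# "Encoder"/"decoder": a rank-K linear factorization via SVD
# (a linear proxy for the paper's nonlinear autoencoder).
Y_mean = Y.mean(axis=0)
U, S, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
encode = lambda y: (y - Y_mean) @ Vt[:K].T   # D-dim -> K-dim
decode = lambda z: z @ Vt[:K] + Y_mean       # K-dim -> D-dim

# Predictor in latent space: per-dimension AR(1) coefficients by
# least squares (a stand-in for CLOCK's neural predictor).
Z = encode(Y)
phi = (Z[:-1] * Z[1:]).sum(axis=0) / (Z[:-1] ** 2).sum(axis=0)

# One-step forecast: predict in the K-dim latent space, then decode
# back to the full D-dimensional impression vector.
z_next = phi * Z[-1]
y_next = decode(z_next)
print(y_next.shape)  # (500,)
```

The point of the design is that the predictor only ever operates on K dimensions rather than D, so forecasting cost is decoupled from the exploding number of attribute combinations.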


Cited By

  • (2024) Know in AdVance: Linear-Complexity Forecasting of Ad Campaign Performance with Evolving User Interest. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5926-5937. DOI: 10.1145/3637528.3671528. Online publication date: 25-Aug-2024.
  • (2024) Follow the LIBRA: Guiding Fair Policy for Unified Impression Allocation via Adversarial Rewarding. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 750-759. DOI: 10.1145/3616855.3635756. Online publication date: 4-Mar-2024.



      Published In

      CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
      October 2023
      5508 pages
      ISBN:9798400701245
      DOI:10.1145/3583780
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Author Tags

      1. display advertising
      2. hierarchical forecasting
      3. multivariate time series
      4. online learning

      Qualifiers

      • Research-article

      Funding Sources

      • China National Natural Science Foundation
      • National Key R&D Program of China

      Conference

CIKM '23

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


      Article Metrics

      • Downloads (Last 12 months)121
      • Downloads (Last 6 weeks)5
      Reflects downloads up to 24 Sep 2024

