Abstract
Large pre-trained language models (LPLMs) have shown spectacular success when fine-tuned on downstream supervised tasks. It is known, however, that their performance can drastically drop when there is a distribution shift between the data used during training and that used at inference time. In this paper we focus on data distributions that naturally change over time and introduce four Reddit datasets, namely the Wallstreetbets, AskScience, The Donald, and Politics sub-reddits. First, we empirically demonstrate that LPLMs can display average performance drops of about 79% in the best cases when predicting the popularity of future posts. We then introduce a methodology that leverages neural variational dynamic topic models and attention mechanisms to infer temporal language model representations for regression tasks. Our models display performance drops of only about 33% in the best cases when predicting the popularity of future posts, while using only about 7% of the total number of parameters of LPLMs and providing interpretable representations that offer insight into real-world events, like the GameStop short squeeze of 2021. Source code to reproduce our experiments is available online.
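The abstract does not spell out architectural details, but the general recipe it describes (a neural variational dynamic topic model whose evolving topic state attends over a post's tokens to produce a temporal representation for popularity regression) can be illustrated with a small sketch. The PyTorch snippet below is purely illustrative and is not the authors' implementation: the GRU-based temporal prior, the module sizes, and the single-query attention over token embeddings are assumptions made for this example.

```python
# Minimal sketch (NOT the paper's implementation) of a variational dynamic
# topic model whose per-time-step topic state conditions, via attention, a
# lightweight text encoder used for popularity regression. All names, sizes,
# and the GRU temporal prior are assumptions made for illustration only.
import torch
import torch.nn as nn


class VariationalDynamicTopicRegressor(nn.Module):
    def __init__(self, vocab_size, num_topics=32, embed_dim=128, hidden_dim=128):
        super().__init__()
        # Temporal prior over topic proportions: a GRU evolves a latent topic
        # state across time steps (e.g. weekly buckets of Reddit posts).
        self.topic_rnn = nn.GRU(num_topics, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, num_topics)
        self.to_logvar = nn.Linear(hidden_dim, num_topics)
        # Bag-of-words decoder used for the topic-model (ELBO) objective.
        self.decoder = nn.Linear(num_topics, vocab_size)
        # Text side: token embeddings attended by the current topic state.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.topic_to_query = nn.Linear(num_topics, embed_dim)
        # Regression head predicting post popularity (e.g. log upvote score).
        self.regressor = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, token_ids, topic_history):
        # topic_history: (batch, time, num_topics) past topic proportions.
        h, _ = self.topic_rnn(topic_history)
        mu, logvar = self.to_mu(h[:, -1]), self.to_logvar(h[:, -1])
        theta = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        theta = torch.softmax(theta, dim=-1)                         # topic proportions
        bow_logits = self.decoder(theta)                             # reconstruct word counts
        # The topic state acts as the attention query over the post's tokens.
        tokens = self.embed(token_ids)                               # (batch, seq, embed)
        query = self.topic_to_query(theta).unsqueeze(1)              # (batch, 1, embed)
        context, _ = self.attn(query, tokens, tokens)
        popularity = self.regressor(context.squeeze(1)).squeeze(-1)
        return popularity, bow_logits, mu, logvar


# Toy usage: 8 posts of 20 tokens each, with 5 past time steps of topic history.
model = VariationalDynamicTopicRegressor(vocab_size=1000)
tokens = torch.randint(1, 1000, (8, 20))
history = torch.softmax(torch.randn(8, 5, 32), dim=-1)
pred, bow_logits, mu, logvar = model(tokens, history)
print(pred.shape)  # torch.Size([8])
```

In a setting like the paper's, the regression target would be a measure of future post popularity, and the bag-of-words reconstruction term together with a KL penalty on (mu, logvar) would form the variational training objective alongside the regression loss.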
Acknowledgments
This research has been funded by the Federal Ministry of Education and Research of Germany and the state of North Rhine-Westphalia as part of the Lamarr Institute for Machine Learning and Artificial Intelligence, LAMARR22B. César Ojeda is supported by Deutsche Forschungsgemeinschaft (DFG) - Project-ID 318763901 - SFB1294.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cvejoski, K., Sánchez, R.J., Ojeda, C. (2024). The Future is Different: Predicting Reddits Popularity with Variational Dynamic Language Models. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14941. Springer, Cham. https://doi.org/10.1007/978-3-031-70341-6_25
DOI: https://doi.org/10.1007/978-3-031-70341-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70340-9
Online ISBN: 978-3-031-70341-6