Abstract
Real-world traffic flow data often contain missing values, which limits their usability. Although existing deep learning-based imputation methods have shown promising results by reconstructing observed values, they often overlook the missing patterns already present in the dataset and therefore perform worse when filling real missing values. This paper addresses this issue and proposes a novel masking method called Patternwise Missing Mix (M-Mix) for masked modeling-based traffic flow data imputation. M-Mix generates masks by mixing the missing patterns that already exist in the target datasets, thereby preserving the missing-pattern information and improving the imputation of real missing values. In addition, a dual-objective loss function is proposed for model optimization: it predicts masked values for greater robustness and reconstructs observed values to maintain semantic correctness. Extensive experiments on real-world datasets show that M-Mix consistently outperforms other masking methods.
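The following is a minimal, hypothetical sketch of the two ideas summarized above: building a training mask by mixing missing patterns observed in the target dataset, and a dual-objective loss that both predicts masked values and reconstructs observed ones. The function names, tensor shapes, and the specific mixing rule are assumptions made for illustration and do not reproduce the paper's exact formulation.

```python
import torch

def patternwise_missing_mix(observed_mask: torch.Tensor, num_patterns: int = 2) -> torch.Tensor:
    """Build a training mask (1 = visible, 0 = hidden) by overlaying the missing
    patterns of `num_patterns` randomly chosen samples from the same batch.

    observed_mask: (B, T, N) binary tensor, 1 where a value is actually observed.
    """
    B = observed_mask.shape[0]
    mix = torch.ones_like(observed_mask)
    for _ in range(num_patterns):
        perm = torch.randperm(B)            # borrow another sample's missing pattern
        mix = mix * observed_mask[perm]     # its missing entries become hidden here
    # only hide positions that are actually observed in the current sample
    return torch.where(observed_mask.bool(), mix, torch.zeros_like(mix))

def dual_objective_loss(pred, target, observed_mask, train_mask, alpha: float = 1.0):
    """Masked-value prediction loss plus observed-value reconstruction loss."""
    hidden = observed_mask * (1 - train_mask)    # observed but hidden during training
    visible = observed_mask * train_mask         # observed and visible to the model
    predict = ((pred - target) ** 2 * hidden).sum() / hidden.sum().clamp(min=1)
    reconstruct = ((pred - target) ** 2 * visible).sum() / visible.sum().clamp(min=1)
    return predict + alpha * reconstruct
```

In this sketch, a model trained to minimize `dual_objective_loss` is supervised on entries hidden by the mixed mask (prediction) while still being anchored to the entries it can see (reconstruction); the weighting `alpha` is likewise an assumed hyperparameter.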
Data availability
The datasets generated and/or analyzed during the current study are available in the mmix public GitHub repository, https://github.com/guoxiaoyuatbjtu/mmix.
Acknowledgements
The authors would like to thank the Beijing Natural Science Foundation (L231005, 4212025) and the National Natural Science Foundation of China (61876018, 61906014) for their support of this research.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, X., Xing, W., Wei, X. et al. M-Mix: Patternwise Missing Mix for filling the missing values in traffic flow data. Neural Comput & Applic 36, 10183–10200 (2024). https://doi.org/10.1007/s00521-024-09579-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-024-09579-0