Abstract
Missing data in time series is a pervasive problem in many fields, especially in intelligent transportation systems, where it hinders the application of time series analysis methods and the fine-tuning of control strategies. Prevalent imputation approaches reconstruct missing data with high accuracy by exploiting a precise distribution model. However, the multistate characteristic of time series data and the uncertainty of the imputation process make the temporal data distribution harder to model and reduce imputation performance. In this paper, a novel time series generative adversarial imputation network (TGAIN) model is proposed to deal with the missing time series data problem. The model combines the strength of GANs in data distribution modeling with that of multiple imputation in uncertainty handling. Specifically, the TGAIN network is designed and adversarially trained to learn the multistate distribution of missing time series data. Through the conditional vector constraint and the adversarial imputation process, the latent distribution for each missing position under different states can be effectively estimated from its implicit relationships with the partially observed information. A corresponding multiple imputation strategy is then proposed to handle the uncertainty of the imputation process and to determine the best fill value from the learned distribution. Furthermore, extensive experiments have been conducted on two real traffic flow datasets. The comparative results show that the proposed TGAIN not only models time series data distributions and handles imputation uncertainty better, but also performs more robustly and stably as the missing rate increases.
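The general idea of multiple imputation over a learned distribution can be illustrated with a minimal sketch. It is not the TGAIN implementation: the abstract does not specify the network or the rule for selecting the best fill value, so the sketch assumes a hypothetical `sampler` callable standing in for a trained generator and pools the draws with a median. The names `multiple_impute` and `toy_sampler` are illustrative only.

```python
import random
import statistics


def multiple_impute(x, mask, sampler, n_draws=20, seed=None):
    """Fill missing entries of a series by pooling several stochastic draws.

    x       : list of values; entries at missing positions are ignored
    mask    : list of 1 (observed) / 0 (missing), same length as x
    sampler : callable (observed_values, length, rng) -> candidate series;
              a hypothetical stand-in for a trained generator
    Returns (imputed_series, spread), where spread is the standard deviation
    of the draws at each missing position and 0.0 at observed positions.
    """
    rng = random.Random(seed)
    observed = [v for v, m in zip(x, mask) if m == 1]
    # Draw several complete candidate series from the (assumed) distribution.
    draws = [sampler(observed, len(x), rng) for _ in range(n_draws)]

    imputed, spread = [], []
    for t, (v, m) in enumerate(zip(x, mask)):
        if m == 1:
            # Observed values are always kept unchanged.
            imputed.append(v)
            spread.append(0.0)
        else:
            candidates = [d[t] for d in draws]
            # Assumed pooling rule: median of the draws.
            imputed.append(statistics.median(candidates))
            spread.append(statistics.pstdev(candidates))
    return imputed, spread


def toy_sampler(observed, length, rng):
    # Hypothetical generator: mean of the observed values plus Gaussian noise.
    center = statistics.mean(observed)
    return [rng.gauss(center, 1.0) for _ in range(length)]


series = [10.0, None, 12.0, None, 14.0]
observed_mask = [1, 0, 1, 0, 1]
filled, uncertainty = multiple_impute(
    series, observed_mask, toy_sampler, n_draws=50, seed=0
)
```

The per-position spread gives a rough indication of how uncertain each filled value is. A real generator would condition on the observed values and the state of the series rather than on their mean alone.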
Data availability
All datasets and code supporting the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Key Program) (52131202) and the Natural Science Foundation of Jilin Province (20190201107JC). The authors would like to thank the Digital Roadway Interactive Visualization and Evaluation Network (DRIVENet) for providing the traffic volume data used to validate this methodology.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Li, H., Cao, Q., Bai, Q. et al. Multistate time series imputation using generative adversarial network with applications to traffic data. Neural Comput & Applic 35, 6545–6567 (2023). https://doi.org/10.1007/s00521-022-07961-4
DOI: https://doi.org/10.1007/s00521-022-07961-4