Abstract
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest, which results in an extreme imbalance in the data. Many methods have been introduced in the literature to overcome this issue: simple data manipulation through undersampling and oversampling, cost-sensitive learning algorithms, and the generation of synthetic data points that follow the distribution of the existing data. While synthetic data generation has recently received a great deal of attention, there are real challenges involved in doing so for high-dimensional data such as multivariate time series. In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling in order to balance a large dataset of multivariate time series. We utilize a flare forecasting benchmark dataset, named SWAN-SF, and design two verification methods to evaluate, both quantitatively and qualitatively, the similarity between the generated minority samples and the ground-truth samples. We further assess the quality of the generated samples by training a classical, supervised machine learning algorithm on synthetic data and testing the trained model on unseen, real data. The results show that the classifier trained on the data augmented with the synthetic multivariate time series achieves a significant improvement compared with the case where no augmentation is used. The popular flare forecasting evaluation metrics, the true skill statistic (TSS) and the Heidke skill score (HSS), report 20-fold and 5-fold improvements, respectively, indicating remarkable statistical similarity between the synthetic and real samples and the usefulness of CGAN-based data generation for complicated tasks such as flare forecasting.
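For readers unfamiliar with the two skill scores named above, the following is a minimal sketch of how TSS and HSS can be computed from a binary confusion matrix. The abstract does not state which HSS variant is used; the sketch assumes the form common in the flare forecasting literature (often called HSS2, equivalent to Cohen's kappa). The class name `Confusion`, the function names, and the example counts are hypothetical and chosen for illustration only; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix; the positive class is the rare (flaring) class."""
    tp: int  # flares correctly forecast
    fn: int  # flares missed
    fp: int  # false alarms
    tn: int  # non-flaring instances correctly forecast


def tss(c: Confusion) -> float:
    """True skill statistic: recall minus false-alarm rate.

    Requires at least one positive and one negative instance,
    otherwise a denominator is zero.
    """
    return c.tp / (c.tp + c.fn) - c.fp / (c.fp + c.tn)


def hss(c: Confusion) -> float:
    """Heidke skill score, assumed here to be the HSS2 (Cohen's kappa) form."""
    numerator = 2 * (c.tp * c.tn - c.fn * c.fp)
    denominator = (c.tp + c.fn) * (c.fn + c.tn) + (c.tp + c.fp) * (c.fp + c.tn)
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative counts only: 40 positives among 1000 instances.
    example = Confusion(tp=30, fn=10, fp=20, tn=940)
    print(f"TSS = {tss(example):.3f}, HSS = {hss(example):.3f}")
```

Both scores equal 1 for a perfect forecast and 0 for a classifier that always predicts the majority class. That second property is why they are preferred over plain accuracy on extremely imbalanced data, where an always-negative classifier can still look highly accurate.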
Acknowledgment
This project has been supported in part by funding from the Division of Advanced Cyberinfrastructure within the Directorate for Computer and Information Science and Engineering and the Division of Atmospheric & Geospace Sciences within the Directorate for Geosciences, under NSF awards #193155 and #1936361.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, Y., Kempton, D.J., Ahmadzadeh, A., Angryk, R.A. (2021). Towards Synthetic Multivariate Time Series Generation for Flare Forecasting. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2021. Lecture Notes in Computer Science, vol 12854. Springer, Cham. https://doi.org/10.1007/978-3-030-87986-0_26
DOI: https://doi.org/10.1007/978-3-030-87986-0_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87985-3
Online ISBN: 978-3-030-87986-0
eBook Packages: Computer Science, Computer Science (R0)