A distributed approach to meteorological predictions: addressing data imbalance in precipitation prediction models through federated learning and GANs

Elaheh Jafarigol¹ &
Theodore B. Trafalis²

Abstract

The classification of weather data involves categorizing meteorological phenomena into classes, which supports nuanced analyses and precise predictions for sectors such as agriculture, aviation, and disaster management. Machine learning models are used to analyze large, multidimensional weather datasets for patterns and trends. These datasets may include variables such as temperature, humidity, wind speed, and pressure, which together describe meteorological conditions. Classification algorithms must also handle data imbalance, where certain weather events (e.g., storms or extreme temperatures) are underrepresented. This empirical study explores data augmentation methods to address imbalanced classes in tabular weather data in centralized and federated settings. Data augmentation techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) or Generative Adversarial Networks (GANs) can improve the model’s accuracy in classifying rare but critical weather events. Moreover, with advancements in federated learning, machine learning models can be trained across decentralized databases, preserving privacy and data integrity while reducing the need for centralized data storage and processing. Thus, the classification of weather data serves as a critical bridge, linking raw meteorological data to actionable insights and enhancing our capacity to anticipate and prepare for diverse weather conditions.
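
To make the oversampling idea in the abstract concrete, the following is a minimal sketch of SMOTE-style interpolation on tabular rows. It is not the authors' implementation: the function name `smote_sketch`, the toy feature layout, and the sample values are all assumptions made for illustration. It only shows the core step of creating a synthetic minority sample on the line segment between a minority row and one of its nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours.

    Simplified illustration of SMOTE-style oversampling (hypothetical helper,
    not taken from the article's code).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row (Euclidean distance)
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # random point on the segment between the base row and the neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy rows for the rare class: [temperature_C, humidity_pct, pressure_hPa]
rainy = [
    [18.0, 92.0, 1002.0],
    [17.5, 95.0, 1001.0],
    [19.0, 90.0, 1003.5],
    [16.8, 97.0, 999.8],
]
new_rows = smote_sketch(rainy, n_new=6)
balanced_minority = rainy + new_rows
print(len(balanced_minority))  # prints 10
```

Because each synthetic row is a convex combination of two real minority rows, every feature value stays within the range observed in the minority class. In a federated setting, one plausible arrangement (again an assumption, not a claim about the study) is for each client to apply such augmentation to its local data before local training.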

Data availability

The dataset analyzed during the current study is publicly available from the Australian Bureau of Meteorology at http://www.bom.gov.au/climate/data-services/.

Code availability

The code is available in the author’s GitHub repository: https://github.com/ElahehJafarigol.

Funding

Not applicable.

Author information

Authors and Affiliations

Data Science and Analytics Institute, University of Oklahoma, 202 W. Boyd St., Room 409, Norman, OK, 73019, USA
Elaheh Jafarigol
Industrial and Systems Engineering, University of Oklahoma, 202 W. Boyd St., Room 104, Norman, OK, 73019, USA
Theodore B. Trafalis

Corresponding author

Correspondence to Elaheh Jafarigol.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jafarigol, E., Trafalis, T.B. A distributed approach to meteorological predictions: addressing data imbalance in precipitation prediction models through federated learning and GANs. Comput Manag Sci 21, 22 (2024). https://doi.org/10.1007/s10287-024-00504-3

Received: 18 October 2023
Accepted: 19 January 2024
Published: 19 February 2024
DOI: https://doi.org/10.1007/s10287-024-00504-3
