Abstract
The classification of weather data involves categorizing meteorological phenomena into classes, thereby facilitating nuanced analyses and precise predictions for sectors such as agriculture, aviation, and disaster management. This involves using machine learning models to analyze large, multidimensional weather datasets for patterns and trends. These datasets may include variables such as temperature, humidity, wind speed, and pressure, all of which characterize meteorological conditions. Furthermore, classification algorithms must handle challenges such as data imbalance, where certain weather events (e.g., storms or extreme temperatures) are underrepresented. This empirical study explores data augmentation methods to address imbalanced classes in tabular weather data in centralized and federated settings. Data augmentation techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) or Generative Adversarial Networks (GANs) can improve a model’s accuracy in classifying rare but critical weather events. Moreover, with advances in federated learning, machine learning models can be trained across decentralized databases, which helps preserve privacy and data integrity while reducing the need for centralized data storage and processing. Thus, the classification of weather data serves as a critical bridge, linking raw meteorological data to actionable insights and enhancing our capacity to anticipate and prepare for diverse weather conditions.
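As a rough illustration of the oversampling idea behind the Synthetic Minority Over-sampling Technique, the sketch below generates synthetic minority-class rows by interpolating between a minority row and one of its nearest minority neighbours. It is a simplified, standard-library-only sketch and not the study's implementation; the function name `smote_like_oversample`, its parameters, and the toy feature values are assumptions introduced here for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows.

    Illustrative only: the name, parameters, and data are assumptions,
    not taken from the study's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row (Euclidean distance)
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # random point on the segment between the base row and its neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows: [temperature_C, humidity_pct, pressure_hPa]
rainy = [
    [18.2, 91.0, 1002.5],
    [17.5, 88.0, 1001.0],
    [19.1, 94.0, 1003.2],
    [16.8, 90.0, 1000.4],
]
new_rows = smote_like_oversample(rainy, n_new=6)
print(len(new_rows))
```

Because each synthetic row lies on a segment between two real minority rows, every feature value stays within the range observed in the minority class. In practice a maintained library implementation would normally be preferred over a hand-written one.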
Data availability
The dataset analyzed during the current study is publicly available at http://www.bom.gov.au/climate/data-services/.
Code availability
The code is available in the author’s GitHub repository: https://github.com/ElahehJafarigol.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Jafarigol, E., Trafalis, T.B. A distributed approach to meteorological predictions: addressing data imbalance in precipitation prediction models through federated learning and GANs. Comput Manag Sci 21, 22 (2024). https://doi.org/10.1007/s10287-024-00504-3
DOI: https://doi.org/10.1007/s10287-024-00504-3