A distributed approach to meteorological predictions: addressing data imbalance in precipitation prediction models through federated learning and GANs

  • Original Paper
  • Computational Management Science

Abstract

The classification of weather data involves categorizing meteorological phenomena into classes, which supports nuanced analyses and precise predictions for sectors such as agriculture, aviation, and disaster management. Machine learning models are used to analyze large, multidimensional weather datasets for patterns and trends. These datasets may include variables such as temperature, humidity, wind speed, and pressure, which together describe meteorological conditions. Classification algorithms must also cope with data imbalance, where certain weather events (e.g., storms or extreme temperatures) are underrepresented. This empirical study explores data augmentation methods to address imbalanced classes in tabular weather data in centralized and federated settings. Data augmentation techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) or Generative Adversarial Networks (GANs) can improve a model’s accuracy in classifying rare but critical weather events. Moreover, with federated learning, machine learning models can be trained across decentralized databases, preserving privacy and data integrity while reducing the need for centralized data storage and processing. The classification of weather data thus links raw meteorological data to actionable insights, enhancing our capacity to anticipate and prepare for diverse weather conditions.
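The two ideas named in the abstract, minority oversampling and federated training, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote_like_oversample` and `fedavg`, the toy data, and the parameter choices are assumptions made for illustration only. The first function interpolates between a minority sample and one of its nearest minority neighbours, which is the core step of SMOTE. The second computes a sample-size-weighted average of client parameter vectors, which is the aggregation step of federated averaging.

```python
import random
from typing import List, Sequence

Vector = List[float]


def smote_like_oversample(minority: Sequence[Vector], n_new: int,
                          k: int = 3, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority points by linear interpolation.

    Hypothetical helper: each new point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def fedavg(client_weights: Sequence[Vector],
           client_sizes: Sequence[int]) -> Vector:
    """Average client parameter vectors, weighted by local sample counts.

    Hypothetical helper: only the server-side aggregation step of
    federated averaging; local training is not shown.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Toy minority class with two features (values are made up).
    rare_events = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 2.0]]
    print(smote_like_oversample(rare_events, n_new=3, k=2, seed=1))
    # Two clients holding 1 and 3 samples respectively.
    print(fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3]))
```

In a federated setting, each client could oversample its own minority class locally before training, so that raw observations never leave the client and only model parameters are sent for aggregation. Whether the study applies augmentation in this way is not stated in the abstract.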

Data availability

The dataset analyzed during the current study is publicly available at http://www.bom.gov.au/climate/data-services/.

Code availability

The code is available in the author’s GitHub repository at https://github.com/ElahehJafarigol.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Elaheh Jafarigol.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jafarigol, E., Trafalis, T.B. A distributed approach to meteorological predictions: addressing data imbalance in precipitation prediction models through federated learning and GANs. Comput Manag Sci 21, 22 (2024). https://doi.org/10.1007/s10287-024-00504-3

  • DOI: https://doi.org/10.1007/s10287-024-00504-3
