Predicting PM2.5, PM10, SO2, NO2, NO and CO Air Pollutant Values with Linear Regression in R Language
Figure 1. Flowchart of the proposed air pollution prediction method, based on the developed R program.
Figure 2. Air pollutant and meteorological measurements loaded into the R GUI.
Figure 3. First part of the R program: creating data vectors, fitting linear regression relations, and drawing diagrams in the R GUI.
Figure 4. Second part of the R program: predicting air pollutant values, fitting the linear model, and printing the results.
Figure 5. Example of R program execution for linear regression on the CO and NO vectors, with the resulting statistics on the accuracy of the mathematical model.
Figure 6. Example output of the predict function for the CO data vector, based on its correlation with NO (from the execution of the R program).
Figure 7. Measured CO values compared with the predicted CO values (range), based on the linear correlation with NO values.
Figure 8. Correlation heat map of all air pollution parameters obtained from the sample.
Figure 9. Linear regression diagrams created in the R GUI with the R program: (a) PM10-PM2.5; (b) CO-NO; (c) SO2-NO2; (d) NO-NOX; (e) CO-NO2; (f) NO2-NOX.
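The workflow these captions describe (data vectors, a linear fit, a scatter diagram, then prediction) can be sketched in base R roughly as follows. This is a minimal illustration, not the authors' program: the vectors `no` and `co` hold made-up values rather than the study's measurements, and all variable names are assumptions.

```r
# Hypothetical example values (NOT the study's measurements).
no <- c(0.8, 1.1, 1.5, 2.0, 2.4, 3.1, 3.6, 4.2)          # predictor (x)
co <- c(0.21, 0.24, 0.27, 0.33, 0.35, 0.42, 0.44, 0.51)  # response (Y)

# Fit the simple linear model co = a + b * no.
fit <- lm(co ~ no)
print(summary(fit))   # coefficients, std. error, t-value, p-value, F-statistic

# Pearson correlation coefficient r for the pair.
r <- cor(no, co)

# Scatter diagram with the fitted regression line.
plot(no, co, xlab = "NO", ylab = "CO", main = "CO vs. NO")
abline(fit, col = "red")

# Predict CO for new NO values, with a 95% prediction interval.
new_no <- data.frame(no = c(1.0, 2.5, 4.0))
pred <- predict(fit, newdata = new_no, interval = "prediction", level = 0.95)
print(pred)           # columns: fit, lwr, upr
```

The `lwr` and `upr` columns give a range around each predicted value, which is one plausible reading of the "predicted values (range)" mentioned in the captions.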
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Pair (x-Y) | Regression equation | r | Std. error | t-value | p-value | Residual Std. error | F-statistic |
---|---|---|---|---|---|---|---|
PM2.5-PM10 | Y = 3.413 + 0.698x | 0.7895 | 0.0405 | 17.214 | <2 × 10⁻¹⁶ *** | 7.945 | 296.3 |
CO-NO | Y = 143.66x − 94.94 | 0.8515 | 6.751 | 21.28 | <2 × 10⁻¹⁶ *** | 19.3 | 452.9 |
SO2-NO2 | Y = 0.229x − 0.328 | 0.4468 | 0.02873 | 7.988 | 9.18 × 10⁻¹² *** | 4.084 | 63.82 |
CO-NO2 | Y = 8.428 + 39.489x | 0.6305 | 3.401 | 11.609 | <2 × 10⁻¹⁶ *** | 9.722 | 134.8 |
NO-NOX | Y = 36.108 + 1.755x | 0.9809 | 0.02757 | 63.64 | <2 × 10⁻¹⁶ *** | 12.27 | 4050 |
NOX-NO2 | Y = 4.822x − 104.375 | 0.7558 | 0.3084 | 15.637 | <2 × 10⁻¹⁶ *** | 43.83 | 244.5 |
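In a one-predictor linear regression the reported statistics are linked: the t-value is the slope divided by its standard error, and the F-statistic is the square of that t-value. The PM2.5-PM10 values are consistent with this up to rounding (0.698 / 0.0405 ≈ 17.2 and 17.214² ≈ 296.3). The sketch below shows where these quantities can be read from a fitted model in base R. The data are synthetic and the variable names are assumptions, so the printed numbers will not match the tables.

```r
# Synthetic data for illustration only (assumed, not from the study).
set.seed(1)
x <- runif(50, min = 5, max = 60)
y <- 3 + 0.7 * x + rnorm(50, sd = 8)

s <- summary(lm(y ~ x))

slope     <- s$coefficients["x", "Estimate"]
std_error <- s$coefficients["x", "Std. Error"]
t_value   <- s$coefficients["x", "t value"]
p_value   <- s$coefficients["x", "Pr(>|t|)"]
resid_se  <- s$sigma                          # residual standard error
f_stat    <- unname(s$fstatistic["value"])    # F-statistic
r         <- cor(x, y)                        # Pearson correlation

# Identities that hold for any one-predictor linear model:
print(all.equal(t_value, slope / std_error))  # t = slope / std. error
print(all.equal(f_stat, t_value^2))           # F = t^2
print(all.equal(s$r.squared, r^2))            # R^2 = r^2
```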
Air Quality Index | Excellent | Good | Acceptable | Polluted | Very Polluted |
---|---|---|---|---|---|
PM2.5 (relation with PM10): concentration intervals (µg/m³) | 0–15 | 15.01–30 | 30.01–55 | 55.01–110 | >110 |
PM2.5 (relation with PM10): number of predicted values | 3 | 36 | 33 | 9 | 0 |
NO2 (relation with SO2): concentration intervals (µg/m³) | 0–50 | 50.01–100 | 100.01–150 | 150.01–400 | >400 |
NO2 (relation with SO2): number of predicted values | 46 | 35 | 0 | 0 | 0 |
CO (relation with NO): concentration intervals (mg/m³) | 0–5 | 5.01–10 | 10.01–25 | 25.01–50 | >50 |
CO (relation with NO): number of predicted values | 81 | 0 | 0 | 0 | 0 |
SO2 (relation with NO2): concentration intervals (µg/m³) | 0–50 | 50.01–90 | 90.01–180 | 350.01–500 | >500 |
SO2 (relation with NO2): number of predicted values | 81 | 0 | 0 | 0 | 0 |
PM10 (relation with PM2.5): concentration intervals (µg/m³) | 0–25 | 25.01–50 | 50.01–90 | 90.01–180 | >180 |
PM10 (relation with PM2.5): number of predicted values | 16 | 37 | 27 | 1 | 0 |
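Counting how many predicted values fall into each Air Quality Index class can be sketched with `cut()` and `table()` in base R. The breakpoints below are the PM2.5 intervals from the table. The predicted values are hypothetical, and the variable names are assumptions rather than those of the authors' program.

```r
# AQI classes and PM2.5 breakpoints (µg/m³) taken from the table above.
aqi_labels  <- c("Excellent", "Good", "Acceptable", "Polluted", "Very Polluted")
pm25_breaks <- c(0, 15, 30, 55, 110, Inf)

# Hypothetical predicted PM2.5 values (NOT the study's predictions).
pm25_pred <- c(12.4, 18.9, 29.7, 31.2, 47.5, 56.0, 15.0)

# right = TRUE makes each interval closed on the right, e.g. (15, 30];
# include.lowest = TRUE also admits a value of exactly 0.
pm25_class <- cut(pm25_pred, breaks = pm25_breaks, labels = aqi_labels,
                  right = TRUE, include.lowest = TRUE)

# Number of predicted values per AQI class (empty classes are kept as 0).
counts <- table(pm25_class)
print(counts)
```

Because `cut()` returns a factor with all five levels, classes with no predicted values still appear in the count with a zero, which matches the layout of the table.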
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kazi, Z.; Filip, S.; Kazi, L. Predicting PM2.5, PM10, SO2, NO2, NO and CO Air Pollutant Values with Linear Regression in R Language. Appl. Sci. 2023, 13, 3617. https://doi.org/10.3390/app13063617