Identifying and Classifying Urban Data Sources for Machine Learning-Based Sustainable Urban Planning and Decision Support Systems Development
Figure 1. Flowchart (or workflow) of the urban data collection process.
Figure 2. Urban data features and corresponding structures.
Figure 3. Number of documents published by (a) year, (b) source journal.
Figure 4. (a) Top 10 affiliated institutions and (b) interconnections between the most frequent institutions’ countries.
Figure 5. Relations between the most frequent countries.
Figure 6. Distribution of urban data sources.
Figure 7. Learning type depending on the data sources involved.
Figure 8. Learning problems used depending on the data sources involved.
Figure 9. Use of ML depending on the data sources involved.
Figure 10. ML methods used depending on the data sources involved.
Figure 11. DL methods used depending on the data sources involved.
Figure 12. Urban-planning issues by urban data source.
Abstract
1. Introduction
2. Urban Data
2.1. Data Structures
- Linear or non-linear, indicating whether the data items are organized sequentially (for example, chronologically, as in a table) or non-sequentially, as in a graph. The data can also be periodic or seasonal.
- Homogeneous or heterogeneous, indicating whether all data elements in a given repository are of the same type (homogeneous) or of different types (heterogeneous). Heterogeneous data can also come from multiple sources and be aggregated or merged to better target a given indicator.
- Static or dynamic, describing how the data structure is allocated. A static structure has its size, layout, and memory location fixed at compile time. A dynamic structure can shrink or grow at run time, depending on how it is used.
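The three distinctions above can be illustrated with a minimal Python sketch. All variable names and values here are hypothetical and chosen only for illustration. Python has no compile-time fixed-size arrays, so the tuple is used as a rough analogy for a static structure rather than an exact equivalent.

```python
from array import array

# Linear: items follow one another in sequence (here, chronological readings).
hourly_traffic = [120, 135, 160, 142]  # hypothetical vehicle counts, one per hour

# Non-linear: items are linked by relations rather than by position (a graph).
road_graph = {  # hypothetical intersections and their neighbours
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

# Homogeneous: every element has the same type (a typed array of floats).
temperatures = array("d", [21.5, 22.0, 23.1])

# Heterogeneous: elements of different types, possibly merged from several sources.
parcel_record = {"parcel_id": 17, "land_use": "residential", "area_m2": 412.5}

# Static (analogy): a tuple cannot grow or shrink once created.
bounding_box = (0.0, 0.0, 10.0, 5.0)  # hypothetical (min_x, min_y, max_x, max_y)

# Dynamic: the structure grows as new observations arrive.
sensor_log = []
sensor_log.append({"t": 0, "pm10": 41.0})
sensor_log.append({"t": 1, "pm10": 39.5})


def is_homogeneous(items):
    """Return True when all items share a single Python type."""
    return len({type(x) for x in items}) <= 1
```

Calling `is_homogeneous(temperatures)` and `is_homogeneous(parcel_record.values())` separates the two cases: the typed array holds only floats, while the parcel record mixes an integer, a string, and a float.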
2.2. Spatialized Data Structuring and the Emergence of GIS
3. Research Method
4. Analysis of Results
4.1. Bibliometric Analysis Summary
4.2. Sources of Urban Data Analysis
4.3. Data Sources: Remote Sensing and Surveys
4.3.1. Remote Sensing
4.3.2. Survey/Statistical Urban Data
4.4. ML Methods Depending on the Urban Data Source
4.4.1. Data Sources and Methods Used: The Case of ML
4.4.2. Data Sources and Methods Used: The Case of DL
4.5. Urban Planning Issues According to the Urban Data Source
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Gómez, J.A.; Patiño, J.E.; Duque, J.C.; Passos, S. Spatiotemporal Modeling of Urban Growth Using Machine Learning. Remote Sens. 2020, 12, 109.
- Kafy, A.A.; Naim, M.N.H.; Subramanyam, G.; Faisal, A.A.; Ahmed, N.U.; Al Rakib, A.; Kona, M.A.; Sattar, G.S. Cellular Automata approach in dynamic modeling of land cover changes using RapidEye images in Dhaka, Bangladesh. Environ. Chall. 2021, 4, 100084.
- Okwuashi, O.; Ndehedehe, C.E. Integrating machine learning with Markov chain and cellular automata models for modelling urban land use change. Remote Sens. Appl. Soc. Environ. 2020, 21, 100461.
- Ibrahim, M.R.; Titheridge, H.; Cheng, T.; Haworth, J. predictSLUMS: A new model for identifying and predicting informal settlements and slums in cities from street intersections using machine learning. Comput. Environ. Urban Syst. 2019, 76, 31–56.
- Lu, S.; Zhang, Q.; Chen, G.; Seng, D. A combined method for short-term traffic flow prediction based on recurrent neural network. Alex. Eng. J. 2021, 60, 87–94.
- Liu, Z.; Liu, Y.; Meng, Q.; Cheng, Q. A tailored machine learning approach for urban transport network flow estimation. Transp. Res. Part C Emerg. Technol. 2019, 108, 130–150.
- Kabano, P.; Lindley, S.; Harris, A. Evidence of urban heat island impacts on the vegetation growing season length in a tropical city. Landsc. Urban Plan. 2021, 206, 103989.
- Rida, A.Z.M.I.; Koumetio, C.S.T.; Diop, E.B.; Chenal, J. Exploring the relationship between urban form and land surface temperature (LST) in a semi-arid region case study of Ben Guerir city-Morocco. Environ. Chall. 2021, 5, 100229.
- Geiß, C.; Schrade, H.; Pelizari, P.A.; Taubenböck, H. Multistrategy ensemble regression for mapping of built-up density and height with Sentinel-2 data. ISPRS J. Photogramm. Remote Sens. 2020, 170, 57–71.
- Choung, Y.J.; Kim, J.M. Study of the Relationship between Urban Expansion and PM10 Concentration Using Multi-Temporal Spatial Datasets and the Machine Learning Technique: Case Study for Daegu, South Korea. Appl. Sci. 2019, 9, 1098.
- Orlowski, C.; Sarzyński, A.; Karatzas, K.; Katsifarakis, N. Decision processes based on IoT data for sustainable smart cities. In Transactions on Computational Collective Intelligence XXXI; Springer: Berlin/Heidelberg, Germany, 2018; pp. 136–146.
- Chang, S.; Saha, N.; Castro-Lacouture, D.; Yang, P.P.J. Generative design and performance modeling for relationships between urban built forms, sky opening, solar radiation and energy. Energy Procedia 2019, 158, 3994–4002.
- Long, Y.; Mao, Q.z.; Shen, Z.j. Urban form, transportation energy consumption, and environment impact integrated simulation: A multi-agent model. In Spatial Planning and Sustainable Development; Springer: Dordrecht, The Netherlands, 2013; pp. 227–247.
- Liu, L.; Silva, E.A.; Wu, C.; Wang, H. A machine learning-based method for the large-scale evaluation of the qualities of the urban environment. Comput. Environ. Urban Syst. 2017, 65, 113–125.
- Kontokosta, C.E.; Hong, B.; Johnson, N.E.; Starobin, D. Using machine learning and small area estimation to predict building-level municipal solid waste generation in cities. Comput. Environ. Urban Syst. 2018, 70, 151–162.
- Koumetio Tekouabou, S.C.; Diop, E.B.; Azmi, R.; Jaligot, R.; Chenal, J. Reviewing the application of machine learning methods to model urban form indicators in planning decision support systems: Potential, issues and challenges. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 5943–5967.
- Culwick, C.; Washbourne, C.L.; Anderson, P.M.; Cartwright, A.; Patel, Z.; Smit, W. CityLab reflections and evolutions: Nurturing knowledge and learning for urban sustainability through co-production experimentation. Curr. Opin. Environ. Sustain. 2019, 36, 9–16.
- Madamori, O.; Max-Onakpoya, E.; Erhardt, G.D.; Baker, C.E. Enabling Opportunistic Low-cost Smart Cities By Using Tactical Edge Node Placement. In Proceedings of the 2021 16th Annual Conference on Wireless On-demand Network Systems and Services Conference (WONS), Klosters, Switzerland, 9–11 March 2021; pp. 1–8.
- Niu, H.; Silva, E.A. Crowdsourced data mining for urban activity: Review of data sources, applications, and methods. J. Urban Plan. Dev. 2020, 146, 04020007.
- Leguay, J.; Lindgren, A.; Scott, J.; Friedman, T.; Crowcroft, J. Opportunistic content distribution in an urban setting. In Proceedings of the 2006 SIGCOMM Workshop on Challenged Networks, Pisa, Italy, 11–15 September 2006; pp. 205–212.
- Lane, N.D.; Eisenman, S.B.; Musolesi, M.; Miluzzo, E.; Campbell, A.T. Urban sensing systems: Opportunistic or participatory? In Proceedings of the 9th Workshop on Mobile Computing Systems and Applications, Napa Valley, CA, USA, 25–26 February 2008; pp. 11–16.
- Llaguno, M. Opportunistic Mobile Urban Sensing Technologies. In Proceedings of the American Meteorological Society, Boston, MA, USA, 13 January 2020; Available online: http://hdl.handle.net/2078.1/243054 (accessed on 20 January 2022).
- Xu, B.; Chen, J.; Yu, P. Vectorization of classified remote sensing raster data to establish topological relations among polygons. Earth Sci. Inform. 2017, 10, 99–113.
- Sagl, G.; Blaschke, T. 14 Integrated Urban Sensing in the Twenty-First Century. Global Urban Monitoring and Assessment through Earth Observation; Taylor & Francis: Abingdon, UK, 2014; p. 269.
- Mainka, A.; Hartmann, S.; Meschede, C.; Stock, W.G. Mobile application services based upon open urban government data. In iConference 2015 Proceedings; iSchools: Grandville, MI, USA, 2015; Available online: http://hdl.handle.net/2142/73635 (accessed on 20 January 2022).
- Ozguven, E.E.; Horner, M.W.; Kocatepe, A.; Marcelin, J.M.; Abdelrazig, Y.; Sando, T.; Moses, R. Metadata-based needs assessment for emergency transportation operations with a focus on an aging population: A case study in Florida. Transp. Rev. 2016, 36, 383–412.
- Jetzek, T.; Avital, M.; Bjørn-Andersen, N. Generating Value from Open Government Data. In Proceedings of the ICIS 2013, Milano, Italy, 15–18 December 2013; Available online: http://aisel.aisnet.org/cgi/viewcontent.cgi?article=1181&context=icis2013 (accessed on 20 January 2022).
- Nikiforova, A. Smarter Open Government Data for Society 5.0: Are your open data smart enough? Sensors 2021, 21, 5204.
- Krasikov, P.; Eurich, M.; Legner, C. Unleashing the Potential of External Data: A DSR-based Approach to Data Sourcing. In Proceedings of the ECIS 2022 Research Papers—AISEL 2022, Timișoara, Romania, 18–24 June 2022; Available online: https://aisel.aisnet.org/ecis2022_rp/64 (accessed on 24 January 2022).
- Liggett, R.; Friedman, S.; Jepson, W. Interactive Design/Decision Making in a Virtual Urban World: Visual Simulation and GIS. 1995. Available online: https://proceedings.esri.com/library/userconf/proc95/to350/p308.html (accessed on 27 January 2022).
- Porat, I.; Shach-Pinsly, D. Building morphometric analysis as a tool for urban renewal: Identifying post-Second World War mass public housing development potential. Environ. Plan. B Urban Anal. City Sci. 2021, 48, 248–264.
- Wurm, M.; Droin, A.; Stark, T.; Geiß, C.; Sulzer, W.; Taubenböck, H. Deep learning-based generation of building stock data from remote sensing for urban heat demand modeling. ISPRS Int. J. Geo-Inf. 2021, 10, 23.
- Demšar, J.; Curk, T.; Erjavec, A.; Gorup, Č.; Hočevar, T.; Milutinovič, M.; Možina, M.; Polajnar, M.; Toplak, M.; Starič, A.; et al. Orange: Data mining toolbox in Python. J. Mach. Learn. Res. 2013, 14, 2349–2353. Available online: http://jmlr.org/papers/v14/demsar13a.html (accessed on 30 January 2022).
- Schneider, A. Monitoring land cover change in urban and peri-urban areas using dense time stacks of Landsat satellite data and a data mining approach. Remote Sens. Environ. 2012, 124, 689–704.
- Huang, B.; Zhao, B.; Song, Y. Urban land-use mapping using a deep convolutional neural network with high spatial resolution multispectral remote sensing imagery. Remote Sens. Environ. 2018, 214, 73–86.
- Vakalopoulou, M.; Karantzalos, K.; Komodakis, N.; Paragios, N. Building detection in very high resolution multispectral data with deep learning features. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 1873–1876.
- Robinson, C.; Dilkina, B.; Hubbs, J.; Zhang, W.; Guhathakurta, S.; Brown, M.A.; Pendyala, R.M. Machine learning approaches for estimating commercial building energy consumption. Appl. Energy 2017, 208, 889–904.
- Zhang, F.; Zhou, B.; Liu, L.; Liu, Y.; Fung, H.H.; Lin, H.; Ratti, C. Measuring human perceptions of a large-scale urban region using machine learning. Landsc. Urban Plan. 2018, 180, 148–160.
- Hagenauer, J.; Helbich, M. Mining urban land-use patterns from volunteered geographic information by means of genetic algorithms and artificial neural networks. Int. J. Geogr. Inf. Sci. 2012, 26, 963–982.
- Noulas, A.; Mascolo, C.; Frias-Martinez, E. Exploiting foursquare and cellular data to infer user activity in urban environments. In Proceedings of the 2013 IEEE 14th International Conference on Mobile Data Management, Milan, Italy, 3–6 June 2013; Volume 1, pp. 167–176.
- Persello, C.; Stein, A. Deep fully convolutional networks for the detection of informal settlements in VHR images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2325–2329.
- Zhan, X.; Zheng, Y.; Yi, X.; Ukkusuri, S.V. Citywide traffic volume estimation using trajectory data. IEEE Trans. Knowl. Data Eng. 2016, 29, 272–285.
- Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
- Caminha, P.H.C.; Costa, L.H.M.K.; de Souza Couto, R. A Bus-based Opportunistic Sensing Network. In Proceedings of the Anais Estendidos do XXXIX Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos—SBC, Online, 16–20 August 2021; pp. 57–64.
- Kamel Boulos, M.N.; Resch, B.; Crowley, D.N.; Breslin, J.G.; Sohn, G.; Burtner, R.; Pike, W.A.; Jezierski, E.; Chuang, K.Y.S. Crowdsourcing, citizen sensing and sensor web technologies for public and environmental health surveillance and crisis management: Trends, OGC standards and application examples. Int. J. Health Geogr. 2011, 10, 1–29.
- Lu, Y. Using Google Street View to investigate the association between street greenery and physical activity. Landsc. Urban Plan. 2019, 191, 103435.
- Ma, J.; Cheng, J.C.; Jiang, F.; Chen, W.; Zhang, J. Analyzing driving factors of land values in urban scale based on big data and non-linear machine learning techniques. Land Use Policy 2020, 94, 104537.
Data Structure | Questions | Description | Advantages | Disadvantages |
---|---|---|---|---|
Vector data | WHEN and WHERE | Not made up of a grid of pixels; instead, vector data consist of vertices and paths. The three basic symbol types for vector data are points, lines, and polygons (areas). | + Compact data structure; + Efficient for encoding topology; + True representation of shape | - Complex structure; - Overlay operations are difficult; - Might imply a false sense of accuracy |
Raster data | WHEN and WHERE | In its simplest form, a matrix of cells (or pixels) organized in rows and columns (a grid), in which each cell contains a value representing information. | + Suitable for complex analysis; + Efficient for overlays; + Common for imagery, where matrices are easy to analyze | - Large datasets that require substantial resources for processing and storage; - Topology is hard to represent; - Maps are less “realistic” because of the spatial resolution; - Linear features are difficult to represent adequately, depending on the cell resolution |
Attribute data | WHEN and WHERE and WHAT | Alphanumeric variables describing a given urban entity that may not have a spatial component (longitude and latitude). Technically, they are considered non-spatial tables that can be browsed and modified using the attribute table view in urban data analysis tools such as QGIS. | + Simple structure; + Suitable for simple analysis; + Efficient for overlays; + Easy to analyze linear features; + Requires few resources and little computing expertise; + Little data preprocessing | - Inefficient for complex analysis; - Subject to appraiser interpretation; - Topology is hard to represent |
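To make the three structures in the table more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the coordinates, land-cover codes, and the `block_17` record are hypothetical, and real projects would typically rely on a GIS library rather than hand-written lists.

```python
# Vector data: geometry stored as vertices (points, lines, polygons).
bus_stop = (2.0, 3.0)                                       # point
street = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]               # line: a path of vertices
block = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]    # polygon: ring closes implicitly


def polygon_area(vertices):
    """Area of a simple polygon computed with the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Raster data: a matrix of cells, each holding one value.
# Hypothetical land-cover codes: 0 = bare, 1 = built-up, 2 = vegetation.
land_cover = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 2, 2],
]


def count_cells(raster, value):
    """Count how many cells of the grid carry the given value."""
    return sum(row.count(value) for row in raster)


# Attribute data: a non-spatial table keyed by an entity identifier.
attributes = {
    "block_17": {"land_use": "residential", "population": 240},
}
```

The contrast matches the table: the polygon needs only four vertices to describe its shape exactly, whereas the raster stores one value per cell and its precision depends on the cell size.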
No. | Ref | Year | Data Source | Source | Cites | ACPY | Publisher |
---|---|---|---|---|---|---|---|
1 | [34] | 2012 | Sensing (satellite) | Remote Sensing of Environment | 288 | 32 | Elsevier |
2 | [35] | 2018 | Sensing (satellite) | Remote Sensing of Environment | 202 | 67.33 | Elsevier |
3 | [36] | 2015 | Sensing (satellite) | International Geoscience and Remote Sensing Symposium (IGARSS) | 199 | 33.17 | IEEE |
4 | [37] | 2017 | Survey | Applied Energy | 155 | 38.75 | Elsevier |
5 | [38] | 2018 | Hybrid | Landscape and Urban Planning | 103 | 34.33 | Elsevier |
6 | [39] | 2012 | Sensing (OpenStreetMap) | International Journal of Geographical Information Science | 92 | 10.22 | Taylor & Francis |
7 | [40] | 2013 | Survey (telecom and geotagged phone data) | Proceedings—IEEE International Conference on Mobile Data Management | 91 | 11.36 | IEEE |
8 | [14] | 2017 | Sensing (Baidu Map) | Computers, Environment and Urban Systems | 88 | 22 | Elsevier |
9 | [41] | 2017 | Sensing (satellite) | IEEE Geoscience and Remote Sensing Letters | 78 | 19.5 | IEEE |
10 | [42] | 2017 | Sensing (GPS trajectory dataset) | IEEE Transactions on Knowledge and Data Engineering | 71 | 17.75 | IEEE |
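The ACPY column can be checked with a short calculation. The sketch below assumes that ACPY stands for average citations per year and that it is computed as citations divided by the number of years between publication and a reference year of 2021. Both points are inferred from the table values and are not stated in the source, so treat them as assumptions.

```python
# Assumption: reference year inferred from the table (e.g. 288 / 32 = 9 years after 2012).
REFERENCE_YEAR = 2021

# (ref, publication year, citations, reported ACPY), copied from the table.
rows = [
    ("[34]", 2012, 288, 32.0),
    ("[35]", 2018, 202, 67.33),
    ("[36]", 2015, 199, 33.17),
    ("[37]", 2017, 155, 38.75),
    ("[38]", 2018, 103, 34.33),
    ("[39]", 2012, 92, 10.22),
    ("[40]", 2013, 91, 11.36),
    ("[14]", 2017, 88, 22.0),
    ("[41]", 2017, 78, 19.5),
    ("[42]", 2017, 71, 17.75),
]


def acpy(cites, year, reference_year=REFERENCE_YEAR):
    """Average citations per year under the assumed definition."""
    return cites / (reference_year - year)


# Largest gap between the recomputed and the reported values.
max_gap = max(abs(acpy(cites, year) - reported) for _, year, cites, reported in rows)
```

Under these assumptions, every recomputed value agrees with the reported one to within a small rounding difference, which supports the inferred definition without confirming it.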
Data Source | Device Type | Description | Complexity | Spatial Coverage | Temporal Information | Open Accessibility | Cost | Data Structure: Vectors | Data Structure: Attribute |
---|---|---|---|---|---|---|---|---|---|
Sensors | Satellite/radars | Data generated by remote sensing technologies using sensors carried by satellites | Low | Global | Yes | Yes/No | Low | ✓ | |
Sensors | Drone/plane | Data from sensors built into air vehicles (planes, drones, helicopters, etc.) | Low | Partial | Yes | No | High | ✓ | ✓ |
Sensors | Bikes/cars/motorbikes | Data from sensors on or integrated into ground transporters (bicycles, motorbikes, buses, taxis, trains, etc.) | Low | Partial | Yes | Yes | Low | ✓ | |
Sensors | Ubiquitous mobile devices | Data from sensors embedded in any mobile device or any other connected device of daily use (phones, watches, smart homes/infrastructures, etc.) | Low | Partial | Yes | Yes | Low | ✓ | |
Sensors | Fixed devices | Data from sensors either embedded in a dedicated fixed device or in any other device opportunistically (e.g., camera, street lights, …) | High | Partial | Yes | No | High | ✓ | ✓ |
Survey and institutional statistics | Social networks/media | Emerging data from social networks (including web surveys) such as Facebook, Twitter, etc. | High | Global | Yes | Yes | Low | ✓ | |
Survey and institutional statistics | Crowd-sourcing | Data from a large group of people in a study area, who submit (voluntarily) their data via the internet, social media, or smartphone applications | High | Partial | No | No | Low | ✓ | ✓ |
Survey and institutional statistics | Interviews | Data from interviews on urban issues that can be conducted offline or online via social media | High | Partial | No | No | High | ✓ | |
Survey and institutional statistics | Institutional statistics | Data from governmental and non-governmental institutions’ statistics | Low | Partial | No | No | High | ✓ | ✓ |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tékouabou, S.C.K.; Chenal, J.; Azmi, R.; Toulni, H.; Diop, E.B.; Nikiforova, A. Identifying and Classifying Urban Data Sources for Machine Learning-Based Sustainable Urban Planning and Decision Support Systems Development. Data 2022, 7, 170. https://doi.org/10.3390/data7120170