Knowledge extraction from crowdsourced data for the enrichment of road networks

Gregor Jossé ORCID: orcid.org/0000-0002-4044-0268¹,
Klaus Arthur Schmid¹,
Andreas Züfle²,
Georgios Skoumas³,
Matthias Schubert¹,
Matthias Renz²,
Dieter Pfoser² &
…
Mario A. Nascimento⁴

776 Accesses
9 Citations
Explore all metrics

Abstract

In current navigation systems quantitative metrics such as distance, time and energy are used to determine optimal paths. Yet, a “best path”, as judged by users, might take qualitative features into account, for instance the scenery or the touristic attractiveness of a path. Machines are unable to quantify such “soft” properties. Crowdsourced data provides us with a means to record user choices and opinions. In this work, we survey heterogeneous sources of spatial, spatio-temporal and textual crowdsourced data as a proxy for qualitative information of users in movement. We (i) explore the process of extracting qualitative information from uncertain crowdsourced data sets employing different techniques, (ii) investigate the enrichment of road networks with the extracted information by adjusting its properties and by building a meta-network, and (iii) show how to use the enriched networks for routing purposes. An extensive experimental evaluation of our proposed methods on real-world data sets shows that qualitative properties as captured by crowdsourced data can indeed be used to improve the quality of routing suggestions while not sacrificing their quantitative aspects.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Knowledge-Enriched Route Computation

Landmark-Based Route Recommendation with Crowd Intelligence

Article Open access 18 July 2016

On Reliability of Routes Computed Based on Crowdsourced Points of Interest

Notes

References

Goodchild MF (2007) Citizens as sensors: the world of volunteered geography. GeoJournal 69:211–221
Article Google Scholar
Skoumas G, Schmid KA, Jossé G, Züfle A, Nascimento MA, Renz M, Pfoser D (2014) Towards knowledge-enriched path computation. In: Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems, Dallas/Fort Worth, TX, USA, November 4-7, 2014, pp 485–488
Skoumas G, Schmid KA, Jossé G, Schubert M, Nascimento MA, Züfle A, Renz M, Pfoser D (2015) Knowledge-enriched route computation. In: Advances in spatial and temporal databases - 14th international symposium, SSTD 2015, Hong Kong, China, August 26-28, 2015. Proceedings, pp 157–176
Jossé G, Franzke M, Skoumas G, Züfle A, Nascimento MA, Renz M (2015) A framework for computation of popular paths from crowdsourced data. In: 31st IEEE international conference on data engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015, pp 1428–1431
Jossé G, Schmid KA, Züfle A, Skoumas G, Schubert M, Pfoser D (2015) Tourismo: A user-preference tourist trip search engine. In: Advances in spatial and temporal databases - 14th international symposium, SSTD 2015, Hong Kong, China, August 26-28, 2015. Proceedings, pp 514–519
Lv M, Chen L, Chen G (2012) Discovering personally semantic places from gps trajectories. In: ACM CIKM12, pp 1552–1556
Yan Z, Chakraborty D, Parent C, Spaccapietra S, Aberer K (2011) Semitri: A framework for semantic annotation of heterogeneous trajectories. In: Proceedings of the 14th international conference on extending database technology, pp 259–270
Palma AT, Bogorny V, Kuijpers B, Alvares LO (2008) A clustering-based approach for discovering interesting places in trajectories. In: Proceedings of the ACM symposium on applied computing , pp 863–868
Alvares LO, Bogorny V, Kuijpers B, de Macedo JAF, Moelans B, Vaisman A (2007) A model for enriching trajectories with semantic geographical information. In: Proceedings of the 15th annual ACM international symposium on advances in geographic information systems. ACM, pp 22:1–22:8
Parent C, Spaccapietra S, Renso C, Andrienko G, Andrienko N, Bogorny V, Damiani ML, Gkoulalas-Divanis A, Macedo J, Pelekis N, Theodoridis Y, Yan Z (2013) Semantic trajectories modeling and analysis. ACM Comput Surv ’13 45:42:1–42:32
Google Scholar
Spaccapietra S, Parent C (2011) Adding meaning to your steps. In: Proceedings of the 30th international conference on conceptual modeling, pp 13–31
Yan Z, Chakraborty D, Parent C, Spaccapietra S, Aberer K (2013) Semantic trajectories: Mobility data computation and annotation. ACM Trans Intell Syst Technol 4:49:1–49:38
Article Google Scholar
Yan Z, Spremic L, Chakraborty D, Parent C, Spaccapietra S, Aberer K (2010) Automatic construction and multi-level visualization of semantic trajectories. In: Proceedings of the 18th international conference on advances in geographic information systems, pp 524–525
Feldman D, Sugaya A, Sung C, Rus D (2013) idiary: From gps signals to a text-searchable diary. In: Proceedings of the 11th ACM conference on embedded networked sensor systems. ACM , pp 6:1–6:12
Duckham M, Kulik L (2003) Simplest paths: Automated route selection for navigation. In: Proceedings of the international conference on spatial information theory COSIT 2003. Springer, pp 169–185
Duckham M, Winter S, Robinson M (2010) Including landmarks in routing instructions. Journal on Location Based Services 4(4):28–52
Article Google Scholar
Richter KF, Duckham M (2008) Simplest instructions: Finding easy-to-describe routes for navigation. In: Proceedings of 5th international conference geographic information science 2008, pp 274–289
Westphal M, Renz J (2011) Evaluating and minimizing ambiguities in qualitative route instructions. In: Proceedings of the 19th ACM international conference on advances in geographic information systems, pp 171–180
Sacharidis D, Bouros P (2013) Routing directions: Keeping it fast and simple. In: Proceedings of the 21st ACM international conference on advances in geographic information systems, pp 164–173
Quercia D, Schifanella R, Aiello LM (2014) The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. CoRR (abs/1407.1031
Sun Y, Fan H, Bakillah M, Zipf A (2015) Road-based travel recommendation using geo-tagged images. Comput Environ Urban Syst 53:110–122
Article Google Scholar
Runge N, Samsonov P, Degraen D, Schöning J (2016) No more autobahn!: Scenic route generation using googles street view. In: Proceedings of the 21st international conference on intelligent user interfaces. ACM, pp 147–151
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Li F, Cheng D, Hadjieleftheriou M, Kollios G, Teng SH (2005) On trip planning queries in spatial databases. In: Proceedings of the 9th international conference on advances in spatial and temporal databases. SSTD’05. Springer-Verlag, Berlin, Heidelberg, pp 273–290
Kanza Y, Safra E, Sagiv Y, Doytsher Y (2008) Heuristic algorithms for route-search queries over geographical data. In: ACM SIGSPATIAL GIS, p 11
Chen H, Ku WS, Sun MT, Zimmermann R (2011) The partial sequenced route query with traveling rules in road networks. Geoinformatica 15:541–569
Article Google Scholar
Gavalas KC, Mastakas K, Pantziou G (2014) A survey on algorithmic approaches for solving tourist trip design problems. J Heuristics 20:291–328
Article Google Scholar
Garcia A, Arbelaitz O, Linaza MT, Vansteenwegen P, Souffriau W (2010) Personalized tourist route generation. Springer
Chen C, Zhang D, Guo B, Ma X, Pan G, Wu Z (2015) Tripplanner: personalized trip planning leveraging heterogeneous crowdsourced digital footprints. IEEE Trans Intell Transp Syst 16:1259–1273
Article Google Scholar
Hsieh HP, Li CT, Lin SD (2012) Exploiting large-scale check-in data to recommend time-sensitive routes. In: Proceedings of the ACM SIGKDD international workshop on urban computing. ACM
Brilhante I, Macedo JA, Nardini FM, Perego R, Renso C (2013) Where shall we go today?: Planning touristic tours with tripbuilder. In: Proceedings of the 22nd ACM international conference on information & knowledge management. CIKM ’13. ACM, New york, NY, USA, pp 757–762
Hsieh H, Li C (2014) Mining and planning time-aware routes from check-in data. In: Proceedings of the 23rd ACM international conference on conference on information and knowledge management, CIKM 2014, Shanghai, China, November 3-7, 2014, pp 481–490
De Choudhury M, Feldman M, Amer-Yahia S, Golbandi N, Lempel R, Yu C (2010) Automatic construction of travel itineraries using social breadcrumbs. In: Proceedings of the 21st ACM conference on hypertext and hypermedia. HT ’10. ACM, New york, NY, USA, pp 35–44
Vansteenwegen P, Souffriau W, Oudheusden DV (2011) The orienteering problem: a survey. Eur J Oper Res 209:1–10
Article Google Scholar
Souffriau W, Vansteenwegen P, Berghe GV, Oudheusden DV (2011) The planning of cycle trips in the province of east flanders. Omega 39:209–213
Article Google Scholar
Verbeeck C, Vansteenwegen P, Aghezzaf EH (2014) An extension of the arc orienteering problem and its application to cycle trip planning. Transport Res E-Log 68 :64–78
Article Google Scholar
Lu Y, Shahabi C (2015) An arc orienteering algorithm to find the most scenic path on a large-scale road network. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems. GIS ’15. ACM, New York, NY, USA, pp 46:1–46:10
Hauff C (2013) A study on the accuracy of flickr’s geotag data. In: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, pp 1037–1040
Zielstra D, Hochmair HH (2013) Positional accuracy analysis of flickr and panoramio images for selected world regions. J Spat Sci 58:251–273
Article Google Scholar
Zandbergen PA (2008) Positional accuracy of spatial data: Non-normal distributions and a critique of the national standard for spatial data accuracy. Trans GIS 12:103–130
Article Google Scholar
Skoumas G, Pfoser D, Kyrillidis A, Sellis T (2016) Location estimation using crowdsourced spatial relations. ACM Trans Spatial Algorithms Syst 2 :5:1–5:23
Article Google Scholar
Loper E, Bird S (2002) Nltk: The natural language toolkit. In: Proceedings of the ACL-02 workshop on effective tools and methodologies for teaching natural language processing and computational linguistics, vol 1, pp 63–70
Skoumas G, Pfoser D, Kyrillidis A (2013) On quantifying qualitative geospatial data: A probabilistic approach. In: Proceedings of the second ACM international workshop on crowdsourced and volunteered geographic information, pp 71–78
Schlieder C, Matyas C (2009) Photographing a city: an analysis of place concepts based on spatial choices. Spat Cogn Comput 9:212–228
Google Scholar
Dijkstra EW (1959) A note on two problems in connexion with graphs. Numer Math 1:269–271
Article Google Scholar
Schubert M, Renz M, Kriegel HP (2010) Route skyline queries: a multi-preference path planning approach. In: Proceedings ICDE, pp 261–272
Shekelyan M, Jossé G., Schubert M (2015) Paretoprep: Efficient lower bounds for path skylines and fast path computation SSTD15
Shekelyan M, Jossé G, Schubert M, Kriegel H (2014) Linear path skyline computation in bicriteria networks. In: Database systems for advanced applications - 19th international conference, DASFAA 2014, bali, indonesia, april 21-24 2014. Proceedings, Part I, pp 173–187
Shekelyan M, Jossé G, Schubert M (2015) Linear path skylines in multicriteria networks. In: ICDE15, pp 459–470
Neis P, Zielstra D (2014) Recent developments and future trends in volunteered geographic information research: The case of openstreetmap. Future Internet 6:76–106
Article Google Scholar
Neis P, Zielstra D, Zipf A (2013) Comparison of volunteered geographic information data contributions and community development for selected world regions. Future Internet 5:282–300
Article Google Scholar
Graf F, Kriegel HP, Renz M, Schubert M (2011) Mario: multi-attribute routing in open street map. In: Advances in spatial and temporal databases. vol 6849 of lecture notes in computer science , pp 486–490
Mousselly-Sergieh H, Watzinger D, Huber B, Döller M, Egyed-Zsigmond E, Kosch H (2014) World-wide scale geotagged image dataset for automatic image annotation and reverse geotagging. In: Proceedings of the 5th ACM multimedia systems conference, pp 47–52
Yang D, Zhang D, Chen L, Qu B (2015) Nationtelescope: monitoring and visualizing large-scale collective behavior in lbsns. J Netw Comput Appl 55:170–180
Article Google Scholar
Yang D, Zhang D, Qu B (2015) Participatory cultural mapping based on collective behavior in location based social networks. ACM Transactions on Intelligent Systems and Technology. in press
Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer-verlag New York, inc., secaucus, NJ, USA
Google Scholar
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc Ser B 39:1–38
Google Scholar

Download references

Author information

Authors and Affiliations

Ludwig-Maximilians-Universität München, 80538, München, Germany
Gregor Jossé, Klaus Arthur Schmid & Matthias Schubert
George Mason University, Fairfax, 22030, USA
Andreas Züfle, Matthias Renz & Dieter Pfoser
National Technical University of Athens, Zografou, 15773, Greece
Georgios Skoumas
University of Alberta, Edmonton, T6G2R3, Canada
Mario A. Nascimento

Authors

Gregor Jossé
View author publications
You can also search for this author in PubMed Google Scholar
Klaus Arthur Schmid
View author publications
You can also search for this author in PubMed Google Scholar
Andreas Züfle
View author publications
You can also search for this author in PubMed Google Scholar
Georgios Skoumas
View author publications
You can also search for this author in PubMed Google Scholar
Matthias Schubert
View author publications
You can also search for this author in PubMed Google Scholar
Matthias Renz
View author publications
You can also search for this author in PubMed Google Scholar
Dieter Pfoser
View author publications
You can also search for this author in PubMed Google Scholar
Mario A. Nascimento
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Matthias Schubert.

Additional information

Mario A. Nascimento’s work was partially supported by NSERC, Canada.

Appendix: Mining Spatial Closeness from Textual Data

As presented in [3], we rely on a probabilistic model to counter the ambiguity inherent in textual data. This is done in three steps. First, we create feature vectors for each occurrence of a particular spatial closeness relation. Second, employing the feature vectors we train a Gaussian Mixture Model (GMM) for each kind of closeness relation. Third, we use the derived GMMs to infer a closeness score which can be used to enrich the underlying road network.

For each occurrence of a closeness relation (e.g., “next to”) mined from the text (e.g., the triplet (P _i,“next to”,P _j)), a spatial feature vector v _{i
j} = (r, ϕ) is created. v _{i
j} describes the Euclidean distance between P _i and P _j and the orientation as the counterclockwise rotation of the x-axis, centered at P _j, to P _i. Thus, for any triplet extracted from the corpus (sufficiently often) we obtain a two-dimensional feature vector. We group these vectors by the relation they represent, obtaining a set of vectors $V_{REL^{k}}$ for each of the closeness relations R E L ^k. For each relation, we propose to train a probabilistic model. Following the argumentation in [43] where similar relations were examined, we propose employing GMMs, a well-proven and extensively studied method for many supervised and unsupervised learning problems [56].

A GMM is a weighted sum of a fixed number M of Gaussian component densities

$$\mathbb{P}_{\theta}(v) = \sum\limits_{i = 1}^{M} w_{i} g(v;\mu_{i},{\Sigma}_{i}) $$

where v is an l-dimensional vector, w _i are the mixture weights (summing to 1) and

$$g(v;\mu_{i}, {\Sigma}_{i}) = \frac{1}{\left( (2\pi)^{l} \det({\Sigma}_{i})\right)^{1/2}} \exp \left( -\frac{1}{2}(v-\mu_{i})^{T}{\Sigma}_{i}(v-\mu_{i})\right) $$

is an l-variate Gaussian density function with mean vector $\mu _{i} \in \mathbb {R}^{l}$ and covariance matrix ${\Sigma }_{i} \in \mathbb {R}^{l\times l}$. The model is fully characterized by the weights, mean vectors and covariance matrices, collectively represented in 𝜃 = {w _i, μ _i,Σ_i},i = 1,…,M. In our case, l = 2, the dimensionality of the spatial feature vectors v. For the parameter estimation of each Gaussian component, Expectation Maximization (EM) [57] is the state-of-the-art technique. It updates the parameters of the components iteratively w.r.t. a given (feature) vector set until a convergence threshold is reached.

Employing EM, we obtain a set of probabilistic models $\mathbb {P}(\cdot \mid \theta ^{k})$, one for each closeness relation R E L ^k ∈{R E L ¹,…,R E L ^m} in the set of all closeness relations. For a pair of POIs P _i and P _j with spatial feature vector v _{i
j}, $\mathbb {P}_{\theta ^{k}}(v_{ij})$ is the probability that P _i and P _j stand in spatial relation R E L ^k to each other. Based on this information, we now derive a closeness score for pairs of POIs by Bayesian inference.

For two distinct POIs, let R E L _{i
j} denote the set of all closeness relations existing between P _i and P _j. Note that R E L ^k denotes an abstract relation, while R E L _{i
j} denotes the set of occurrences of relations between a pair of POIs. Furthermore, let v _{i
j} denote the spatial feature vector of P _i and P _j. In order to determine a numeric score describing the closeness of the POIs, we estimate the posterior probability of all closeness relations R E L ^k and accumulate them. The posterior probability of relation R E L ^k is given by

$$\mathbb{P}(REL^{k} \mid v_{ij}) = \frac{\mathbb{P}(v_{ij} \mid \theta^{k})\ \mathbb{P}(\theta^{k})}{{\sum}_{l=1}^{m} \mathbb{P}(v_{ij} \mid \theta^{l})\ \mathbb{P}(\theta^{l})} $$

where $\mathbb {P}(\theta ^{k}) = \mathbb {P}(REL^{k})$ denotes the prior probability of relation R E L ^k given by the trained model represented by 𝜃 ^k.

In a traditional classification problem the task would be to choose the closeness relation with the highest posterior and to assign the pair of POIs to this class. We, in contrast, consider each posterior probability $\mathbb {P}(REL^{k} \mid v_{ij})$ as a measure of confidence of the existence of R E L ^k between P _i and P _j. Stressing that all considered relations represent spatial closeness, we combine all posteriors into one measure which we call closeness score c s _{i
j} of the pair of POIs P _i and P _j:

$$cs_{ij} = \frac{1}{m} \sum\limits_{k=1}^{m} \frac{\mathbb{P}(REL^{k} \mid v_{ij})}{\max\{\mathbb{P}(REL^{k} \mid v_{ij}) \mid \forall \ i\neq j\}} $$

This closeness score can be used to enrich the underlying road network.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jossé, G., Schmid, K., Züfle, A. et al. Knowledge extraction from crowdsourced data for the enrichment of road networks. Geoinformatica 21, 763–795 (2017). https://doi.org/10.1007/s10707-017-0306-1

Download citation

Received: 28 April 2016
Revised: 03 July 2017
Accepted: 31 July 2017
Published: 04 September 2017
Issue Date: October 2017
DOI: https://doi.org/10.1007/s10707-017-0306-1

Knowledge extraction from crowdsourced data for the enrichment of road networks

Abstract

Access this article

Subscribe and save

Buy Now