research-article
Open access

A Data Mining Approach to Assess Privacy Risk in Human Mobility Data

Published: 11 December 2017

Abstract

Human mobility data are an important proxy to understand human mobility dynamics, develop analytical services, and design mathematical models for simulation and what-if analysis. Unfortunately, mobility data are highly sensitive because they may enable the re-identification of individuals in a database. Existing frameworks for privacy risk assessment provide data providers with tools to control and mitigate privacy risks, but they suffer from two main shortcomings: (i) they have a high computational complexity; (ii) the privacy risk must be recomputed every time new data records become available and for every selection of individuals, geographic areas, or time windows. In this article, we propose a fast and flexible approach to estimate privacy risk in human mobility data. The idea is to train classifiers to capture the relation between individual mobility patterns and the level of privacy risk of individuals. We show the effectiveness of our approach through an extensive experiment on real-world GPS data in two urban areas and investigate the relations between human mobility patterns and the privacy risk of individuals.
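
The idea of training a classifier to map individual mobility patterns to a risk level can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the two features (number of distinct locations and radius of gyration), the toy labelling rule that stands in for an expensive risk computation, the synthetic data, and the choice of a plain k-nearest-neighbour classifier.

```python
import math
import random
from collections import Counter


def simulate_user(rng):
    """Return (features, risk_label) for one synthetic individual.

    Hypothetical features: number of distinct locations visited and
    radius of gyration in km. The label is produced by a made-up rule
    that stands in for a costly privacy risk computation.
    """
    n_locations = rng.randint(1, 50)
    radius_km = rng.uniform(0.5, 30.0)
    label = "high" if n_locations * radius_km > 300 else "low"
    return (n_locations, radius_km), label


def fit_scaler(rows):
    """Record per-feature (min, max) on the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Min-max scale one feature vector using the training bounds."""
    return tuple(
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, (lo, hi) in zip(row, bounds)
    )


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
data = [simulate_user(rng) for _ in range(400)]
train, test = data[:300], data[300:]

# Labels are computed once for the training individuals; afterwards new
# individuals are scored by the classifier alone.
bounds = fit_scaler([x for x, _ in train])
train_x = [scale(x, bounds) for x, _ in train]
train_y = [y for _, y in train]

predictions = [knn_predict(train_x, train_y, scale(x, bounds)) for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(f"accuracy on held-out individuals: {accuracy:.2f}")
```

In this sketch the risk labels are needed only for the training individuals, so newly arriving records can be scored without recomputing risk, which is the shortcoming the abstract attributes to existing frameworks.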

Supplementary Material

a31-pellungrini-supp.pdf (pellungrini.zip)
Supplemental movie, appendix, image, and software files for A Data Mining Approach to Assess Privacy Risk in Human Mobility Data

Information & Contributors

Information

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 9, Issue 3
Regular Papers and Special Issue: Urban Intelligence
May 2018
370 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3167125
  • Editor: Yu Zheng
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 December 2017
Accepted: 01 May 2017
Revised: 01 May 2017
Received: 01 December 2016
Published in TIST Volume 9, Issue 3

Author Tags

  1. Human mobility
  2. data mining
  3. privacy

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • European project SoBigData RI

Article Metrics

  • Downloads (Last 12 months): 245
  • Downloads (Last 6 weeks): 26
Reflects downloads up to 02 Oct 2024

Cited By

  • (2024) Privacy Preserving Human Mobility Generation Using Grid-Based Data and Graph Autoencoders. ISPRS International Journal of Geo-Information 13:7 (245). DOI: 10.3390/ijgi13070245. Online publication date: 9-Jul-2024
  • (2024) Personalized Privacy Preservation in Consumer Mobile Trajectories. Information Systems Research 35:1 (249-271). DOI: 10.1287/isre.2023.1227. Online publication date: Mar-2024
  • (2024) TrajectGuard: A Comprehensive Privacy-Risk Framework for Multiple-Aspects Trajectories. IEEE Access 12 (136354-136378). DOI: 10.1109/ACCESS.2024.3462088. Online publication date: 2024
  • (2024) Design of a smart parking space allocation system for higher energy efficiency. Smart Spaces (371-390). DOI: 10.1016/B978-0-443-13462-3.00016-9. Online publication date: 2024
  • (2024) Domain-Knowledge Enhanced GANs for High-Quality Trajectory Generation. Advanced Intelligent Computing Technology and Applications (386-396). DOI: 10.1007/978-981-97-5606-3_33. Online publication date: 5-Aug-2024
  • (2024) Leveraging Transformer Architecture for Effective Trajectory-User Linking (TUL) Attack and Its Mitigation. Computer Security – ESORICS 2024 (271-290). DOI: 10.1007/978-3-031-70903-6_14. Online publication date: 16-Sep-2024
  • (2023) Unique in the metro system: The likelihood to re-identify a metro user with limited trajectory points. Physica A: Statistical Mechanics and its Applications 628 (129176). DOI: 10.1016/j.physa.2023.129176. Online publication date: Oct-2023
  • (2023) EXPHLOT: EXplainable Privacy Assessment for Human LOcation Trajectories. Discovery Science (325-340). DOI: 10.1007/978-3-031-45275-8_22. Online publication date: 9-Oct-2023
  • (2022) Synthesis of Longitudinal Human Location Sequences: Balancing Utility and Privacy. ACM Transactions on Knowledge Discovery from Data 16:6 (1-27). DOI: 10.1145/3529260. Online publication date: 30-Jul-2022
  • (2022) Can I only share my eyes? A Web Crowdsourcing based Face Partition Approach Towards Privacy-Aware Face Recognition. Proceedings of the ACM Web Conference 2022 (3611-3622). DOI: 10.1145/3485447.3512256. Online publication date: 25-Apr-2022
