Efficient User Guidance for Validating Participatory Sensing Data

Published: 17 July 2019

Abstract

Participatory sensing has become a new data collection paradigm that leverages the wisdom of the crowd for big data applications without the cost of buying dedicated sensors. It collects data from human sensors through their own devices, such as cell phone accelerometers, cameras, and GPS devices. This benefit comes with a drawback: human sensors are arbitrary and inherently uncertain because there is no quality guarantee. Moreover, participatory sensing data are time series that exhibit not only highly irregular dependencies on time but also high variance between sensors. To overcome these limitations, we formulate the problem of validating uncertain time series collected by participatory sensors. In this article, we approach the problem with an iterative validation process on top of a probabilistic time series model. First, we generate a series of probability distributions from raw data by tailoring a state-of-the-art dynamical model, namely Generalised AutoRegressive Conditional Heteroskedasticity (GARCH), to our joint time series setting. Second, we design a feedback process that consists of an adaptive aggregation model to unify the joint probabilistic time series and an efficient user guidance model to validate aggregated data with minimal effort. Through extensive experimentation, we demonstrate the efficiency and effectiveness of our approach on both real and synthetic data. Highlights from our experiments include the fast running time of the probabilistic model, the robustness of the aggregation model to outliers, and the significant effort savings of the guidance model.
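To make the first step more concrete, the following is a minimal sketch of how a GARCH model turns a raw series into one probability distribution per time step. It uses the textbook GARCH(1,1) recursion for a single series, not the tailored joint model of the article, which the abstract does not specify. The function names, the parameter values (omega, alpha, beta), the Gaussian assumption, and the sample residuals are all illustrative assumptions.

```python
import math


def garch11_variances(residuals, omega, alpha, beta):
    """Textbook GARCH(1,1) conditional variances (illustrative, fixed parameters).

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1]
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Assumption: initialise with the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps * eps + beta * sigma2[-1])
    return sigma2


def gaussian_intervals(mean, variances, z=1.96):
    """Approximate 95% intervals, assuming a Gaussian at each time step."""
    return [(mean - z * math.sqrt(v), mean + z * math.sqrt(v)) for v in variances]


# Hypothetical zero-mean residuals from one sensor (made-up numbers).
residuals = [0.1, -0.2, 1.5, -0.3, 0.05]
variances = garch11_variances(residuals, omega=0.1, alpha=0.2, beta=0.7)
intervals = gaussian_intervals(0.0, variances)

for t, (v, (lo, hi)) in enumerate(zip(variances, intervals)):
    print(f"t={t} variance={v:.4f} interval=({lo:.3f}, {hi:.3f})")
```

In this sketch the large residual at the third time step widens the interval for the following step, which is the time-varying uncertainty that a GARCH-style model is meant to capture. In practice the parameters would be estimated from data rather than fixed by hand.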

Cited By

  • (2021) ODAR: A Lightweight Object Detection Framework for Autonomous Driving Robots. 2021 Digital Image Computing: Techniques and Applications (DICTA), 01-08. DOI: 10.1109/DICTA52665.2021.9647256. Online publication date: Nov-2021


Published In

ACM Transactions on Intelligent Systems and Technology, Volume 10, Issue 4
Survey Papers and Regular Papers
July 2019
327 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3344873
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 July 2019
Accepted: 01 April 2019
Revised: 01 February 2019
Received: 01 August 2018
Published in TIST Volume 10, Issue 4

Author Tags

  1. Participatory sensing
  2. probabilistic database
  3. trust management

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 17 Feb 2025
