Self-Identifying Data for Fair Use

Published: 02 March 2015

Abstract

Public-use earth science datasets are a useful resource with the unfortunate feature that their provenance is easily disconnected from their content. “Fair-use policies” typically associated with these datasets require appropriate attribution of providers by users, but sound and complete attribution is difficult if provenance information is lost. To address this, we introduce a technique to directly associate provenance information with sensor datasets. Our technique is similar to traditional watermarking but is intended for application to unstructured time-series datasets. Our approach is potentially imperceptible given sufficient margins of error in datasets and is robust to a number of benign but likely transformations including truncation, rounding, bit-flipping, sampling, and reordering. We provide algorithms for both one-bit and blind mark checking and show how our system can be adapted to various data representation types. Our algorithms are probabilistic in nature and are characterized by both combinatorial and empirical analyses. Mark embedding can be applied at any point in the data life cycle, allowing adaptation of our scheme to social or scientific concerns.
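The abstract does not spell out the embedding algorithm, so the following is only a minimal sketch of the general idea of one-bit mark checking, not the authors' method. It assumes integer-valued readings and swaps in a generic keyed least-significant-bit scheme of the kind used in relational database watermarking. The names `embed` and `check`, the constants `LOW_BITS` and `FRACTION`, and the choice of HMAC-SHA256 are all hypothetical.

```python
import hashlib
import hmac
from typing import List, Tuple

LOW_BITS = 2   # assumption: the lowest 2 bits lie within the margin of error
FRACTION = 4   # assumption: roughly 1 in 4 distinct readings carries a mark


def _keyed_digest(key: bytes, value: int) -> bytes:
    # Hash only the high-order bits, which embedding never modifies, so a
    # checker can recompute the same digest from the marked value alone.
    high = value >> LOW_BITS
    return hmac.new(key, str(high).encode("ascii"), hashlib.sha256).digest()


def embed(key: bytes, values: List[int]) -> List[int]:
    """Return a copy of values with keyed marks written into low-order bits."""
    marked = []
    for v in values:
        d = _keyed_digest(key, v)
        if d[0] % FRACTION == 0:           # keyed choice: mark this reading?
            bit_index = d[1] % LOW_BITS    # keyed choice: which low bit
            bit_value = d[2] & 1           # keyed choice: what to write
            v = (v & ~(1 << bit_index)) | (bit_value << bit_index)
        marked.append(v)
    return marked


def check(key: bytes, values: List[int]) -> Tuple[int, int]:
    """Return (matches, selected) for the readings the key selects."""
    matches = selected = 0
    for v in values:
        d = _keyed_digest(key, v)
        if d[0] % FRACTION == 0:
            selected += 1
            bit_index = d[1] % LOW_BITS
            if (v >> bit_index) & 1 == (d[2] & 1):
                matches += 1
    return matches, selected
```

In this sketch, each reading is selected using only its own high-order bits, so the match count is unaffected by reordering or by dropping readings. A checker holding the key would see every selected reading match on marked data and roughly half match on unmarked data, so the decision is statistical, in line with the probabilistic characterization in the abstract. The sketch makes no attempt to survive rounding of the marked bits.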

Cited By

  • (2021) Overview of Information Hiding Algorithms for Ensuring Security in IoT Based Cyber-Physical Systems. Security and Privacy Preserving for IoT and 5G Networks, 81-115. DOI: 10.1007/978-3-030-85428-7_5. Online publication date: 10-Oct-2021
  • (2020) The Use of the Blockchain Technology and Digital Watermarking to Provide Data Authenticity on a Mining Enterprise. Sensors 20:12 (3443). DOI: 10.3390/s20123443. Online publication date: 18-Jun-2020

Published In

Journal of Data and Information Quality, Volume 5, Issue 3
Special Issue on Provenance, Data and Information Quality
February 2015
105 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/2698232
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 March 2015
Accepted: 01 September 2014
Revised: 01 August 2014
Received: 01 January 2014
Published in JDIQ Volume 5, Issue 3

Author Tags

  1. Provenance
  2. self-identifying data

Qualifiers

  • Research-article
  • Research
  • Refereed
