
DOI: 10.1145/3150994.3151002

Towards preserving results confidentiality in cloud-based scientific workflows

Published: 12 November 2017

Abstract

Cloud computing has established itself as a solid computational model that allows scientists to deploy simulation-based experiments on distributed virtual resources. These experiments can be modeled as scientific workflows. Many of these workflows are data-intensive and produce a large volume of data, which Scientific Workflow Management Systems (SWfMS) also store in the cloud using storage services. One main issue with cloud storage services is the confidentiality of stored data: if unauthorized people access the data files, they can infer knowledge about the results or even about the workflow structure. Encryption is a possible solution, but it may not be sufficient, and a further level of security can be added to preserve data confidentiality: data dispersion. To reduce this risk, the generated data files should not all be stored in the same bucket, or at least the sensitive data files should be distributed across several cloud storage services. In this paper, we present IPConf, an approach to preserve the confidentiality of workflow results in cloud storage. IPConf generates a distribution plan for the data files produced during a workflow execution. This plan disperses data files across several cloud storage services to preserve confidentiality. The plan is then sent to the SWfMS, which stores the generated data in the specified buckets during workflow execution. Experiments performed using real data from SciPhy workflow executions indicate the potential of the proposed approach.
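The abstract does not spell out how IPConf builds its distribution plan, so the following is only a minimal sketch of the general idea of data dispersion, not the authors' algorithm. It assumes the input is a list of file names, a list of "conflict" pairs (files that should not sit in the same bucket because together they would reveal too much), and a list of bucket names. The function name `plan_distribution`, the file names, and the bucket names are all hypothetical.

```python
from collections import defaultdict


def plan_distribution(files, conflicts, buckets):
    """Greedily map each file to a bucket that holds none of its conflicting files.

    files:     list of file names (hypothetical workflow outputs)
    conflicts: iterable of (file_a, file_b) pairs that must not share a bucket
               (an assumed input; the abstract does not define this format)
    buckets:   list of bucket names (hypothetical)
    Returns a dict mapping file -> bucket.
    Raises ValueError if the greedy pass cannot place a file.
    """
    # Build an adjacency map of conflicting files.
    neighbours = defaultdict(set)
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    plan = {}
    load = {bucket: 0 for bucket in buckets}

    # Place the most constrained files (most conflicts) first.
    for name in sorted(files, key=lambda f: -len(neighbours[f])):
        # Buckets already used by a conflicting file are off limits.
        used = {plan[n] for n in neighbours[name] if n in plan}
        candidates = [bucket for bucket in buckets if bucket not in used]
        if not candidates:
            raise ValueError(f"no conflict-free bucket left for {name}")
        # Among the allowed buckets, pick the least loaded one to spread files out.
        target = min(candidates, key=lambda bucket: load[bucket])
        plan[name] = target
        load[target] += 1
    return plan


if __name__ == "__main__":
    files = ["alignment.fasta", "model.txt", "tree.nwk", "run.log"]
    conflicts = [("alignment.fasta", "tree.nwk"), ("model.txt", "tree.nwk")]
    for name, bucket in plan_distribution(files, conflicts, ["bucket-a", "bucket-b"]).items():
        print(name, "->", bucket)
```

A greedy pass like this is simple but can fail even when a valid assignment exists; a real planner would likely also weigh bucket capacity, cost, and how sensitive each file is.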



Published In

WORKS '17: Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science
November 2017
87 pages
ISBN: 9781450351294
DOI: 10.1145/3150994

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data distribution
  2. data provenance
  3. results confidentiality
  4. scientific workflows

Qualifiers

  • Research-article

Conference

SC '17

Acceptance Rates

WORKS '17 Paper Acceptance Rate 8 of 25 submissions, 32%;
Overall Acceptance Rate 30 of 54 submissions, 56%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 80
  • Downloads (Last 12 months): 3
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 22 Sep 2024
