DOI: 10.1145/2896387.2900335

Warehousing and Protecting Big Data: State-Of-The-Art-Analysis, Methodologies, Future Challenges

Published: 22 March 2016 Publication History

Abstract

This paper presents a comprehensive critical survey of the issues in warehousing and protecting big data, both of which are recognized as critical challenges in emerging big data research. Both aspects must be addressed in order to build truly high-performance and highly flexible big data management systems. We report on state-of-the-art approaches, methodologies, and trends, and conclude with open problems and challenging research directions for future efforts.



      Published In

      ACM Other conferences
      ICC '16: Proceedings of the International Conference on Internet of things and Cloud Computing
      March 2016
      535 pages
      ISBN: 9781450340632
      DOI: 10.1145/2896387
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Big Data
      2. Big Data Analytics
      3. Protecting Big Data
      4. Warehousing Big Data

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICC '16

      Acceptance Rates

      Overall Acceptance Rate 213 of 590 submissions, 36%


      Cited By

      • (2021) Dynamic ICT Modeling for Handling Student Data Using Big Data Technology and Hybrid Cloud Computing. Computer Communication, Networking and IoT, pp. 9-21. DOI: 10.1007/978-981-16-0980-0_2. Online publication date: 19-Jun-2021
      • (2021) A Framework for Computerization of Punjab Technical Education System for Financial Assistance to Underrepresented Students. Advances in Software Engineering, Education, and e-Learning, pp. 337-349. DOI: 10.1007/978-3-030-70873-3_24. Online publication date: 9-Sep-2021
      • (2020) Healthcare Decision-Making Over a Geographic, Socioeconomic, and Image Data Warehouse. ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium, pp. 85-97. DOI: 10.1007/978-3-030-55814-7_7. Online publication date: 18-Aug-2020
      • (2019) A Review of Polyglot Persistence in the Big Data World. Information 10:4 (141). DOI: 10.3390/info10040141. Online publication date: 16-Apr-2019
      • (2018) Medical Big Data Warehouse. Journal of Medical Systems 42:4, pp. 1-16. DOI: 10.1007/s10916-018-0894-9. Online publication date: 1-Apr-2018
      • (2017) Big Data Technologies to Improve Medical Data Warehousing. Proceedings of the 2nd international Conference on Big Data, Cloud and Applications, pp. 1-5. DOI: 10.1145/3090354.3090376. Online publication date: 29-Mar-2017
      • (2016) Big Data: The V's of the Game Changer Paradigm. 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 17-24. DOI: 10.1109/HPCC-SmartCity-DSS.2016.0014. Online publication date: Dec-2016
