Abstract
The rapid growth of medical devices and clinical applications that generate enormous volumes of data has made managing, processing, and mining these data a major challenge. Traditional data warehousing frameworks cannot cope effectively with the volume, variety, and velocity of data produced by current medical applications. As a result, many data warehouses struggle with medical data, and many challenges remain to be addressed. New solutions have emerged, and Hadoop is one of the best examples: it can be used to process these streams of medical data. However, without an efficient system design and architecture, its performance gains will not be significant or valuable for medical managers. In this paper, we provide a short review of the literature on research issues in traditional data warehouses and present some important Hadoop-based data warehouses. In addition, we give a Hadoop-based architecture and a conceptual data model for designing a medical Big Data warehouse. In our case study, we provide implementation details of a Big Data warehouse built on the proposed architecture and data model on the Apache Hadoop platform to ensure an optimal allocation of health resources.
Acknowledgements
This work was partially supported by the Ministry of Higher Education and Scientific Research of Algeria and the University of Bejaia, under the project CNEPRU (Ref. B*00620140066/2015-2018).
Ethics declarations
Conflict of Interest
Authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
This article is part of the Topical Collection on Transactional Processing Systems
About this article
Cite this article
Sebaa, A., Chikh, F., Nouicer, A. et al. Medical Big Data Warehouse: Architecture and System Design, a Case Study: Improving Healthcare Resources Distribution. J Med Syst 42, 59 (2018). https://doi.org/10.1007/s10916-018-0894-9
DOI: https://doi.org/10.1007/s10916-018-0894-9