Abstract
Virtual machine (VM) images (VMIs) often share common parts of significant size, yet they are stored individually. Applying existing de-duplication techniques to such images is non-trivial, poses serious technical challenges, and requires direct access to the clouds’ proprietary image storage, which is not always feasible. We propose an alternative approach that splits images into shared parts, called fragments, which are stored only once. Our solution requires a reasonably small set of base images to be available in the cloud; beyond these, only the increments are stored, without the contents of the base images, which provides significant storage space savings. Composite images, each consisting of a base image and one or more fragments, are assembled on demand at VM deployment. Our technique can be used in conjunction with practically any popular cloud solution, and the storage of fragments is independent of the cloud provider’s proprietary image storage.
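The idea of a composite image can be illustrated with a toy model. The sketch below is not the ENTICE implementation: it assumes an image can be represented as a mapping from file paths to contents, and the names `Fragment`, `assemble` and `stored_bytes` are hypothetical, introduced only for illustration. It shows a base image being combined with fragments in order, and compares the bytes stored when two images are kept whole against keeping the base image once plus the fragments.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical increment relative to the image it is applied on."""
    name: str
    writes: dict                      # path -> new or changed file content
    deletes: frozenset = frozenset()  # paths removed by this fragment


def assemble(base, fragments):
    """Build a composite image by applying fragments, in order, to a copy of the base."""
    image = dict(base)
    for frag in fragments:
        for path in frag.deletes:
            image.pop(path, None)
        image.update(frag.writes)
    return image


def stored_bytes(files):
    """Total size of the file contents in a path -> bytes mapping."""
    return sum(len(content) for content in files.values())


# Illustrative data only (assumed, not taken from the paper).
base = {"/bin/sh": b"shell-binary", "/etc/os-release": b"ubuntu-16.04"}
python_frag = Fragment("python", {"/usr/bin/python3": b"python-binary"})
app_frag = Fragment("app", {"/opt/app/main.py": b"print('hi')"},
                    frozenset({"/bin/sh"}))

image_a = assemble(base, [python_frag])
image_b = assemble(base, [python_frag, app_frag])

# Storing both images whole versus storing the base once plus each fragment once.
whole = stored_bytes(image_a) + stored_bytes(image_b)
fragmented = (stored_bytes(base)
              + stored_bytes(python_frag.writes)
              + stored_bytes(app_frag.writes))
```

In this toy model the shared base and the shared `python` fragment are counted once, so `fragmented` is smaller than `whole`. Real savings depend on how much content the images actually have in common.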
Funding
This research work has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 644179 (ENTICE).
Cite this article
Hajnal, A., Kecskemeti, G., Marosi, A.C. et al. ENTICE VM Image Analysis and Optimised Fragmentation. J Grid Computing 16, 247–263 (2018). https://doi.org/10.1007/s10723-018-9430-x
DOI: https://doi.org/10.1007/s10723-018-9430-x