Abstract
Compression is a process that is always carried out in terms of digitizing data, which is considered very important, especially in the development and growth of the Big Data era. Lossless compression is the process of reducing the size of the data but with the condition that it can be returned to its original source during the decompression process. One of the purposes of doing Lossless compression is to archive a file, usually the file is RAW and has a large file size with a minimum 16-Bit file system (65,536 possible differences in values). Huffman’s algorithm is currently still very effective in compressing 8-bit data, which can be grouped into Static, Dynamic, and Adaptive extensions, but its performance cannot be determined if it is performed on data with complex variables and probabilities such as WAV format audio data. Based on a literature review, the compression performance measurement for file archiving uses the Compression Ratio (CR) and Compression Time (CT) indicators. This research resulted in a new scheme which we named 4-ary/MQ, the architectural basis of which is based on entropy coding rooted in the static, dynamic and adaptive variants of the Huffman scheme. For the variable code length characteristics, it follows the Quad Tree dynamic branching (FGK rule), the node symbol setting adopts an adaptive method, namely adding a maximum of 2 variables with a value of ‘0’ to maintain the root of the branch after the root always has 4 branches. Based on descriptive analysis of compression results, deviation, average, ANOVA and DMRT, 4-ary/MQ produces optimal CR with fast CT when compared to various variants of the Huffman algorithm and other lossless compression applications such as (PKZIP, WinZip, 7-Zip, and Monkeys Audio). From the results of trial analysis based on manual mathematical and statistical calculations, it is certain that 4-ary/MQ provides high compression results with a very fast process, so it has many benefits if it is used to compress data on local storage media, hosting/cloud and bandwidth.
Similar content being viewed by others
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
References
Alakuijala J, Vandevenne L (2013) Data compression using Zopfli. Google Inc. Css-Ig.Net. Available at: http://css-ig.net/data/zopfli-deflate-in.pdf
Alakuijala J, Kliuchnikov E, Szabadka Z, Vandevenne L (2016) Comparison of Brotli, deflate, Zopfli, LZMA, LZHAM and Bzip2 compression algorithms. Google Inc. Css-Ig.Net
Alakuijala J et al (2019) Brotli. ACM Trans Inf Syst 37(1):1–30 Available at: http://dl.acm.org/citation.cfm?doid=3289475.3231935
Al-Bahadili H, Hussain SM (2010) A bit-level text compression scheme based on the ACW algorithm. Int J Autom Comput 7(1):123–131. https://doi.org/10.1007/s11633-010-0123-6
Ali AS, Hussein AS, Tolba MF, Yousef AH (2011) Large-scale vector data visualization using high performance computing. J Software 6(2):298–305. https://doi.org/10.4304/jsw.6.2.298-305
Arshad R, Saleem A, Khan D (2016) Performance comparison of Huffman Coding and Double Huffman Coding. In: 2016 Sixth International Conference on Innovative Computing Technology (INTECH). IEEE, pp 361–364
Baer MB (2006) A general framework for codes involving redundancy minimization. IEEE Trans Inf Theory 52(1):344–349 Available at: http://www.indianjournals.com/ijor.aspx?target=ijor:quest&volume=13&issue=1&article=005
Benjamin A (2010) Music compression algorithms and why you should care. "Whitepaper: http://ese.wustl.edu/ContentFiles/Research/UndergraduateResearch/CompletedProjects/Web-Pages/su10/AlexBenjamin_AudioCompression.pdf. Created 9 Dec 2010
Bentley JL, Sleator DD, Tarjan RE, Wei VK (1986) A locally adaptive data compression scheme. Commun ACM 29(4):320–330 Available at: http://portal.acm.org/citation.cfm?doid=5684.5688
Bosch JJ, Janer J, Fuhrmann F, Herrera P (2012) A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012, (Ismir), pp 559–564. Available at: http://mtg.upf.es/system/files/publications/Bosch-ISMIR2012.pdf
Bosch JJ, Marxer R, Gómez E (2016) Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music. J New Music Res 45(2):101–117. https://doi.org/10.1080/09298215.2016.1182191
Breaban MC, Graur A, Potorac AD, Balan DG (2018) Bandwidth management application in directory service environment. 2018 international conference on development and application systems (DAS). May 2018 IEEE, pp. 88–92
Bryukhanov YA, Lukashevich YA (2017) Nonlinear distortion modulating signals in quantization. 2017 IEEE East-West Design & Test Symposium (EWDTS). IEEE, pp 1–5
Calim AS, Kaya C, Yuksel H (2021) Big data based archiving management system. In: 2021 6th International Conference on Computer Science and Engineering (UBMK). IEEE, pp 21–26
Chambliss D, Pandey P, Thakur T, Fleshler A, Clark T, Ruddy JA, Gougherty KD, Kalos M, Merithew L, Thompson JG, Yudenfriend HM (2008) An architecture for storage-hosted application extensions. IBM J Res Dev 52(4–5):427–437. https://doi.org/10.1147/rd.524.0427
Chen HC, Wang YL, Lan YF (1999) A memory-efficient and fast Huffman decoding algorithm. Inf Process Lett 69(3):119–122
Chowdhury RA, Kaykobad M, King I (2002) An efficient decoding technique for Huffman codes. Inf Process Lett 81(6):305–308.https://doi.org/10.1016/S0020-0190(01)00243-5
Chung K (1997) Efficient Huffman decoding. Inf Process Lett 61:97–99
Djusdek DF, Studiawan H, Ahmad T (2016) Adaptive image compression using adaptive Huffman and LZW. In: 2016 international conference on Information & Communication Technology and systems (ICTS). IEEE, pp. 101–106
Elakkiya S, Thivya KS (2022) Comprehensive review on lossy and lossless compression techniques. Journal of The Institution of Engineers (India): Series B, 103(3):1003–1012. Available at: https://link.springer.com/10.1007/s40031-021-00686-3
Faller N (1973) An adaptive system for data compression. In record of the 7th Asilomar conference on circuits, systems, and computers. 1973 pp. 593–597
Fenwick PM (1995) Huffman code efficiencies for extensions of sources. IEEE transactions on communications 43(2/3/4):163–165 Available at: http://ieeexplore.ieee.org/document/380027/
Fonseca E et al. (2019) Learning sound event classifiers from web audio with Noisy labels. ICASSP, IEEE international conference on acoustics, speech and signal processing - proceedings, 2019-may(688382), pp.21–25. https://doi.org/10.1109/ICASSP.2019.8683158
Fruchtman A, Gross Y, Klein ST, Shapira D (2020) Weighted adaptive Huffman coding. 2020 data compression conference (DCC). March 2020 IEEE, pp. 368–368
Gallager R (1978) Variations on a theme by Huffman. IEEE Trans Inf Theory 24(6):668–674. https://doi.org/10.1109/TIT.1978.1055959
Guido RC (2016) ZCR-aided neurocomputing: a study with applications. Knowl-Based Syst 105:248–269. https://doi.org/10.1016/j.knosys.2016.05.011
Gupta A, Bansal A, Khanduja V (2017) Modern lossless compression techniques: review, comparison and analysis. In: 2017 second international conference on electrical, computer and communication technologies (ICECCT). IEEE, pp 1–8
Habib A, Rahman MS (2017) Balancing decoding speed and memory usage for Huffman codes using quaternary tree. Appl Inf 4:1–5 Available at: http://applied-informatics-j.springeropen.com/articles/10.1186/s40535-016-0032-z
Habib A, Hoque ASML, Hussain MR (2013) H-HIBASE: compression enhancement of HIBASE technique using Huffman coding. J Comput 8(5):1175–1183 Available at: http://ojs.academypublisher.com/index.php/jcp/article/view/8706
Habib A, Islam MJ, Rahman MS (2018) Huffman based code generation algorithms: data compression perspectives. J Comput Sci 14(12):1599–1610. https://doi.org/10.3844/jcssp.2018.1599.1610
Hashemian R (1995) Memory efficient and high-speed search Huffman coding. IEEE Trans Commun 43(10):2576–2581. https://doi.org/10.1109/26.469442
Hermassi H, Rhouma R, Belghith S (2010) Joint compression and encryption using chaotically mutated Huffman trees. Commun Nonlinear Sci Numer Simul 15(10):2987–2999 Available at: https://linkinghub.elsevier.com/retrieve/pii/S1007570409006108
Hidayat T (2019) Reformat the file uncompressed into lossy based on audio compression method using Huffman shift coding scheme. International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) 8(1.5):317–326. https://doi.org/10.30534/ijatcse/2019/5381.52019
Hidayat T, Zakaria MH, Pee ANC (2018) Lossless coding scheme for data audio 2 channel using huffman and shannon-Fano. J Theor Appl Inf Technol 96(11):3467–3477
Hidayat T, Zakaria MH, Che Pee N (2019a) A critical assessment of advanced coding standards for lossless audio compression. Int J Simul: Syst, Sci Technol 19(5):31.1–31.10. https://doi.org/10.5013/IJSSST.a.19.05.31
Hidayat T, Zakaria MH, Che Pee N (2019b) Comparison of lossless compression schemes for WAV audio data 16-bit between Huffman and coding arithmetic. Int J Simul: Syst, Sci Technol 19(6):36.1–36.7. https://doi.org/10.5013/IJSSST.a.19.06.36
Hidayat T, Zakaria MH, Pee ANC (2020) Survey of performance measurement indicators for lossless compression technique based on the objectives. 2020 3rd international conference on information and communications technology (ICOIACT). IEEE, pp 170–175
Huffman D (1952) A method for the construction of minimum-redundancy codes. Proc IRE 40(9):1098–1101. https://doi.org/10.1007/BF02839372
Hussain AJ, Al-Fayadh A, Radi N (2018) Image compression techniques: a survey in lossless and lossy algorithms. Neurocomputing 300:44–69. https://doi.org/10.1016/j.neucom.2018.02.094
Islam MT, Ema RR, Islam MR, Islam T (2019) A lossless bit isolation algorithm for data compression by using static dictionary. In: 2019 international conference on computer, communication, chemical, materials and electronic engineering (IC4ME2), IEEE, pp 1–5
Johnson PD Jr, Harris GA, Hankerson DC (2003) Introduction to Information Theory and Data Compression, Chapman and Hall/CRC
Katona G, Nemetz O (1976) Huffman codes and self-information. IEEE Trans Inf Theory 22(3):337–340. https://doi.org/10.1109/TIT.1976.1055554
Kaur M (2015) Lossless text data compression algorithm using modified Huffman coding - a review. Int J Advan Technol Eng Sci 4(4):1273–1276
Kavousianos X, Kalligeros E, Nikolos D (2008) Test data compression based on variable-to-variable Huffman encoding with Codeword reusability. IEEE Trans Comput-Aided Design Integra Circ Syst 27(7):1333–1338 Available at: http://ieeexplore.ieee.org/document/4544866/
Knuth DE (1985) Dynamic huffman coding. J Algorithms 6(2):163–180 Available at: https://linkinghub.elsevier.com/retrieve/pii/0196677485900367
Kodituwakku SR, Amarasinghe US (2010) Comparison of lossless data compression algorithms for text data. Indian J Comput Sci Eng 1(4):416–425
Li S, Lian G, Peng Z (2015) Research on the loss-less compression algorithm of the ultrasonic testing of rail. In: 2015 Symposium on Piezoelectricity, Acoustic Waves, and Device Applications (SPAWDA). IEEE, pp. 458–461
Li Q, Li H, Zhang K (2019) Storage performance prediction. In: 2019 IEEE 10th international conference on software engineering and service science (ICSESS). IEEE, pp. 1–4
Lin YK, Huang SC, Yang CH (2012) A fast algorithm for Huffman decoding based on a recursion Huffman tree. J Syst Softw 85(4):974–980. https://doi.org/10.1016/j.jss.2011.11.1019
McAnlis C, Haecky A (2016). Understanding compression: data compression for modern developers 1st ed. O’Reilly Media, Inc.
Mentzer F, Tschannen M (2020) Learning better lossless compression using lossy compression. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.6638--6647.
Metwally A, Agrawal D, El Abbadi A (2004) Efficient computation of frequent and top-k elements in data streams. In: Proceedings of the 10th international conference on database theory, pp 398–412
Mustafa R (2017) An improved decoding technique for efficient Huffman coding. J Comput Sci Appl Info Technol 2(1):1–5 Available at: http://symbiosisonlinepublishing.com/computer-science-technology/computerscience-information-technology10.php
Nam LH, Hung PD (2021) Building a big data oriented architecture for Enterprise integration. In: pp. 172–182
Nobre ICM, Frossard P (2019) Optimized quantization in distributed graph signal processing. ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 5376–5380
Oommen BJ, Rueda L (2004) A new family of weak estimators for training in non-stationary distributions, pp 644–652.https://doi.org/10.1007/978-3-540-27868-9_70
Paakkonen J et al (2019) File size distributions and caching for offloading. IEEE workshop on signal processing advances in wireless communications. SPAWC. pp.1–5
Pal R (2021) Speech compression with wavelet transform and Huffman coding. 2021 international conference on communication information and computing technology (ICCICT). 25 June 2021 IEEE, pp. 1–4
Ramya KA, Pushpa M (2016) A survey on lossless and Lossy data compression methods. Int J Comput Sci Eng Commun 4(1):1277–1280
Salomon D, Motta G (2010) Handbook of data compression. Springer London, London
Sayood K (2018) Huffman coding. Introduction to Data Compression. Elsevier, pp 41–88
Schack R The length of a typical Huffman codeword. IEEE Trans Inf Theory 40(4) 1246–1247. https://doi.org/10.1109/18.335944
Shannon CE (1948) A mathematical theory of communication. Bell Syst Techn J 27(3):379–423 Available at: http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
Shao Z et al (2022) A high-throughput VLSI architecture design of canonical Huffman encoder. IEEE Trans Circ Syst II: Express Briefs 69(1):209–213 (Available at: https://ieeexplore.ieee.org/document/9462403/)
Sharma M (2010) Compression using Huffman coding. IJCSNS Int J Comput Sci Network Sec 10(5):133 Available at: http://paper.ijcsns.org/07_book/201005/20100520.pdf
Singh R, Bikramjeet B (2013) A survey on different compression techniques and bit reduction algorithm for compression of text/lossless data. Int J Advanc Res Comput Sci Software Eng 3(3):2277–2128
Snyder D, Chen G, Povey D (2015) MUSAN: A Music, speech, and noise corpus (October). Available at: http://arxiv.org/abs/1510.08484, https://doi.org/10.48550/arXiv.1510.08484
Suri PR, Goel M (2011) Ternary tree and memory-efficient Huffman decoding algorithm. Int J Comput Sci Issues 8(1):483–489
Szpankowski W, Verdu S (2011) Minimum expected length of fixed-to-variable lossless compression without prefix constraints. IEEE Trans Inf Theory 57(7):4017–4025 Available at: https://ieeexplore.ieee.org/document/5895099/
Trum C (2020) Realizing the benefits of digitalization through standards. Nat Gas Electric 36(7):9–17. https://doi.org/10.1002/gas.22157
Uthayakumar J, Vengattaraman T, Dhavachelvan P (2018) A survey on data compression techniques: from the perspective of data quality, coding schemes, data type and applications. J King Saud Univ - Comput Info Sci 33:119–140. https://doi.org/10.1016/j.jksuci.2018.05.006
Vitter JS (1987) Algorithm 673: dynamic Huffman coding. ACM Trans Math Software (TOMS) 15(2):158–167
Welch (1984) A technique for high-performance data compression. Computer 17(6):8–19. https://doi.org/10.1109/MC.1984.1659158
Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inf Theory 23(3):337–343 Available at: http://ieeexplore.ieee.org/document/1055714/
Zuo L, Zhu MM (2020) Minimize cost of data transfers using bandwidth reservation on FPVB paths of dynamic HPNs. 2020 international conference on computing, networking and communications (ICNC). February 2020 IEEE. pp. 74–78
Acknowledgements
This paper is an ongoing research on more in-depth lossless compression technique for data 16-bit files. Gratitude to Universitas Amikom Yogyakarta and Universiti Teknikal Malaysia Melaka, and everyone who provided substantial support in this study.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest and certify NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hidayat, T., Zakaria, M.H. & Pee, A.N.C. Increasing the Huffman generation code algorithm to equalize compression ratio and time in lossless 16-bit data archiving. Multimed Tools Appl 82, 24031–24068 (2023). https://doi.org/10.1007/s11042-022-14130-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-022-14130-1