DOI: 10.1109/ISCA.2016.37
research-article

Bit-plane compression: transforming data for better compression in many-core architectures

Published: 18 June 2016

Abstract

As key applications become more data-intensive and the computational throughput of processors increases, the amount of data to be transferred in modern memory subsystems grows. Increasing physical bandwidth to keep up with this growing demand is challenging, however, due to strict area and energy limitations. This paper presents a novel and lightweight compression algorithm, Bit-Plane Compression (BPC), to increase the effective memory bandwidth. BPC targets homogeneously-typed memory blocks, which are prevalent in many-core architectures, and applies a smart data transformation both to improve the inherent data compressibility and to reduce the complexity of compression hardware. We demonstrate that BPC provides superior compression ratios of 4.1:1 for integer benchmarks and reduces memory bandwidth requirements significantly.

Published In

ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture
June 2016
756 pages
ISBN: 9781467389471

Publisher

IEEE Press

Conference

ISCA '16

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2024) HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 74-84. DOI: 10.1145/3650200.3656612. Online publication date: 30-May-2024
  • (2024) Atalanta: A Bit is Worth a “Thousand” Tensor Values. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 85-102. DOI: 10.1145/3620665.3640356. Online publication date: 27-Apr-2024
  • (2021) Byte-Select Compression. ACM Transactions on Architecture and Code Optimization 18:4, pp. 1-27. DOI: 10.1145/3462209. Online publication date: 3-Sep-2021
  • (2020) Safecracker: Leaking Secrets through Compressed Caches. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1125-1140. DOI: 10.1145/3373376.3378453. Online publication date: 9-Mar-2020
  • (2020) Accelerometer. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 733-750. DOI: 10.1145/3373376.3378450. Online publication date: 9-Mar-2020
  • (2019) Accelerating generalized linear models with MLWeaving. Proceedings of the VLDB Endowment 12:7, pp. 807-821. DOI: 10.14778/3317315.3317322. Online publication date: 1-Mar-2019
  • (2019) AVR. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337824. Online publication date: 5-Aug-2019
  • (2019) Linebacker. Proceedings of the 46th International Symposium on Computer Architecture, pp. 183-196. DOI: 10.1145/3307650.3322222. Online publication date: 22-Jun-2019
  • (2019) A Framework for Memory Oversubscription Management in Graphics Processing Units. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 49-63. DOI: 10.1145/3297858.3304044. Online publication date: 4-Apr-2019
  • (2019) Compress Objects, Not Cache Lines. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 229-242. DOI: 10.1145/3297858.3304006. Online publication date: 4-Apr-2019
