DOI: 10.5555/3049832.3049854

A Space- and Energy-Efficient Code Compression/Decompression Technique for Coarse-Grained Reconfigurable Architectures

Published: 04 February 2017

Abstract

We present an effective code compression technique to reduce the area and energy overhead of the configuration memory for coarse-grained reconfigurable architectures (CGRAs). Based on a statistical analysis of existing code, the proposed method reorders the storage locations of the reconfigurable entities and splits the wide configuration memory into a number of partitions. Code compression is achieved by removing consecutive duplicated lines in each partition. Compressibility is increased by an optimization phase in the compiler that minimizes the number of configuration changes for individual reconfigurable entities. Decompression is performed by simple hardware decoder logic that decodes lines with no additional latency and negligible area overhead. Experiments with over 190 loop kernels from different application domains show that the proposed method achieves a memory reduction of over 40% on average with four partitions. The compressibility of previously unseen code is only slightly lower, at 35% on average. In addition, executing compressed code reduces the energy consumption of the configuration logic by 22% to 47%.
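
The core idea of the abstract, splitting wide configuration lines into partitions and dropping consecutive duplicates within each partition, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: the per-cycle `advance` flag used to drive decompression, the function names, and the toy configuration values are all hypothetical, and the abstract does not say how the hardware decoder tracks repeated lines.

```python
from typing import List, Sequence, Tuple

# One configuration line: one integer field per reconfigurable entity (toy model).
Line = Tuple[int, ...]


def split(lines: Sequence[Line], partitions: Sequence[Sequence[int]]) -> List[List[Line]]:
    """Project every wide line onto each partition (a group of field indices)."""
    return [[tuple(line[i] for i in cols) for line in lines] for cols in partitions]


def compress(column: Sequence[Line]) -> Tuple[List[Line], List[bool]]:
    """Remove consecutive duplicate lines within one partition.

    Returns the stored lines plus one flag per original line. The flag is an
    assumption of this sketch: True means "read the next stored line",
    False means "repeat the previous one".
    """
    stored: List[Line] = []
    advance: List[bool] = []
    for line in column:
        is_new = not stored or stored[-1] != line
        if is_new:
            stored.append(line)
        advance.append(is_new)
    return stored, advance


def decompress(stored: Sequence[Line], advance: Sequence[bool]) -> List[Line]:
    """Rebuild the original partition column from stored lines and flags."""
    out: List[Line] = []
    index = -1
    for step in advance:
        if step:
            index += 1
        out.append(stored[index])
    return out


# Toy example: four wide lines, four fields, two partitions of two fields each.
lines = [(1, 0, 7, 7), (1, 0, 7, 8), (1, 0, 5, 8), (2, 0, 5, 8)]
partitions = [(0, 1), (2, 3)]

columns = split(lines, partitions)
compressed = [compress(col) for col in columns]
stored_counts = [len(stored) for stored, _ in compressed]
restored = [decompress(stored, flags) for stored, flags in compressed]
```

In this toy input the first partition changes only once and the second changes twice, so the two partitions store 2 and 3 lines instead of 4 each. Stored unpartitioned, all four wide lines differ and nothing could be removed, which is why partitioning helps. The flag storage itself is not counted here.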

Cited By

  • (2021) Thread-aware area-efficient high-level synthesis compiler for embedded devices. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, pages 327-339. DOI: 10.1109/CGO51591.2021.9370341. Online publication date: 27-Feb-2021
  • (2019) dMazeRunner. ACM Transactions on Embedded Computing Systems, 18:5s, pages 1-27. DOI: 10.1145/3358198. Online publication date: 8-Oct-2019
  • (2018) Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression. ACM Transactions on Architecture and Code Optimization, 15:1, pages 1-26. DOI: 10.1145/3162018. Online publication date: 22-Mar-2018

Published In

CGO '17: Proceedings of the 2017 International Symposium on Code Generation and Optimization
February 2017
317 pages
ISBN: 9781509049318

Publisher

IEEE Press

Author Tags

  1. Coarse-grained reconfigurable architecture
  2. code compression
  3. energy reduction

Acceptance Rates

CGO '17 Paper Acceptance Rate 26 of 116 submissions, 22%;
Overall Acceptance Rate 312 of 1,061 submissions, 29%
