
Integrated Exploration Methodology for Data Interleaving and Data-to-Memory Mapping on SIMD Architectures

Published: 23 May 2016

Abstract

This work presents a methodology for efficiently exploring data interleaving and data-to-memory mapping options for Single Instruction Multiple Data (SIMD) platform architectures. The system architecture consists of a reconfigurable clustered scratch-pad memory and a SIMD functional unit, which performs the same operation on multiple input data in parallel. Memory accesses contribute substantially to the overall energy consumption of an embedded system executing a data-intensive task. The goal of this work is to reduce the overall energy consumption by increasing the utilization of the functional units and decreasing the number of memory accesses. The methodology is tested on a number of benchmark applications with holes in their access scheme. Potential gains are calculated using energy models for both the processing and the memory parts of the system. For the studied benchmarks, efficient interleaving and mapping of data reduces the energy consumption of the complete system by between 40% and 80%.
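To make the idea of "holes in the access scheme" concrete, the following is a minimal toy sketch, not the methodology of the article. It assumes a hypothetical kernel that reads only every second element of two arrays, and a hypothetical memory line (and SIMD width) of 4 words; the names `WIDTH`, `N`, `lines_touched`, and `utilization` are illustrative only. It compares storing the two arrays separately with interleaving them so that the used elements of one array fill the holes of the other.

```python
WIDTH = 4   # assumed number of words per memory line / SIMD operand
N = 32      # assumed length of each array, in words


def lines_touched(addresses, width=WIDTH):
    """Number of distinct memory lines that must be read for these word addresses."""
    return len({addr // width for addr in addresses})


def utilization(addresses, width=WIDTH):
    """Fraction of the words fetched from memory that the kernel actually uses."""
    return len(set(addresses)) / (lines_touched(addresses, width) * width)


# Hypothetical access pattern: only even indices of arrays A and B are used.
used = range(0, N, 2)

# Layout 1: A stored at words 0..N-1, B stored at words N..2N-1.
separate = [i for i in used] + [N + i for i in used]

# Layout 2: the used element B[i] is placed in the hole right after A[i].
interleaved = [i for i in used] + [i + 1 for i in used]

print(lines_touched(separate), utilization(separate))
print(lines_touched(interleaved), utilization(interleaved))
```

In this toy model the interleaved layout fetches half as many memory lines for the same useful data, which is the kind of effect that fewer memory accesses and higher functional-unit utilization refer to. A real exploration would also have to account for the energy models and the memory organization, which this sketch ignores.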


Cited By

  • (2022) Compilation of Parallel Data Access for Vector Processor in Radio Base Stations. IEEE Embedded Systems Letters 14:1 (11-14). DOI: 10.1109/LES.2021.3085664. Online publication date: Mar-2022
  • (2019) System Scenario Methodology Flow. System-Scenario-based Design Principles and Applications (7-52). DOI: 10.1007/978-3-030-20343-6_2. Online publication date: 17-Sep-2019


Published In

ACM Transactions on Embedded Computing Systems, Volume 15, Issue 3
July 2016
520 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/2899033

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 23 May 2016
Accepted: 01 February 2016
Revised: 01 February 2016
Received: 01 July 2015


Author Tags

1. data interleaving
2. data layout
3. design space exploration
4. energy optimization
5. memory reconfiguration
6. single instruction multiple data (SIMD) architectures

Qualifiers

• Research-article
• Research
• Refereed
