research-article

Hybrid access-specific software cache techniques for the Cell BE architecture

Authors:

Marc Gonzàlez,

Xavier Martorell,

Eduard Ayguadé,

Alexandre E. Eichenberger,

Kathryn O'Brien

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Pages 292 - 302

https://doi.org/10.1145/1454115.1454156

Published: 25 October 2008

Abstract

Ease of programming is one of the main impediments to the broad acceptance of multi-core systems that lack hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture that classifies memory accesses at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two specific cache structures, each optimized for its respective access pattern. These cache structures are designed to let high-level compiler optimizations aggressively unroll loops, reorder cache references, and/or transform surrounding loops so as to practically eliminate the software cache overhead in the innermost loop. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed code optimizations, yield speedups of 3.5x to 8.4x over a traditional software cache approach. As a result, we demonstrate that the Cell BE processor can be a competitive alternative to a modern server-class multi-core such as the IBM Power5 processor for a set of parallel NAS applications.
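The cost difference between the two access classes can be illustrated with a small software model. The sketch below is a minimal, hypothetical Python model, not the authors' implementation: it assumes a direct-mapped cache, illustrative values for `LINE_WORDS` and `NUM_LINES`, and a Python list standing in for the global memory that a Cell BE SPE would reach through DMA. `sum_irregular` pays one tag check per reference, as a traditional software cache does; `sum_high_locality` checks the cache once per line and then runs a lookup-free inner loop over the local copy, which is the kind of innermost-loop overhead removal the abstract describes for high-locality references.

```python
# A minimal, hypothetical model of a two-path software cache.
# Assumptions (not from the paper): a direct-mapped cache, the line size and
# line count below, and a Python list standing in for global memory that an
# SPE would reach through DMA.

LINE_WORDS = 32   # elements per line (e.g. a 128-byte line of 4-byte floats)
NUM_LINES = 64    # number of line slots in the local store


class SoftwareCache:
    """Direct-mapped software cache: global line number % NUM_LINES picks the slot."""

    def __init__(self, global_mem):
        self.global_mem = global_mem     # stands in for main memory
        self.tags = [None] * NUM_LINES   # global line number held by each slot
        self.data = [None] * NUM_LINES   # local copy of each cached line
        self.lookups = 0                 # tag checks performed
        self.misses = 0                  # line transfers (a DMA get on real hardware)

    def get_line(self, line):
        """Return the local copy of a global line, fetching it on a miss."""
        self.lookups += 1
        slot = line % NUM_LINES
        if self.tags[slot] != line:
            start = line * LINE_WORDS
            self.data[slot] = self.global_mem[start:start + LINE_WORDS]
            self.tags[slot] = line
            self.misses += 1
        return self.data[slot]

    def load(self, index):
        """Single-element access: one tag check per reference."""
        return self.get_line(index // LINE_WORDS)[index % LINE_WORDS]


def sum_irregular(cache, indices):
    """Irregular path: every reference goes through the full lookup."""
    return sum(cache.load(i) for i in indices)


def sum_high_locality(cache, lo, hi):
    """High-locality path for a unit-stride loop over [lo, hi).

    The cache is checked once per line; the inner loop then reads the
    local copy directly, with no per-element lookup.
    """
    total, i = 0, lo
    while i < hi:
        line = i // LINE_WORDS
        local = cache.get_line(line)
        base = line * LINE_WORDS
        last = min(hi - base, LINE_WORDS)    # stop at the line end or at hi
        for k in range(i - base, last):      # innermost loop: lookup-free
            total += local[k]
        i = base + last
    return total


if __name__ == "__main__":
    mem = list(range(4096))
    a, b = SoftwareCache(mem), SoftwareCache(mem)
    print(sum_irregular(a, range(1024)), a.lookups, a.misses)   # 523776 1024 32
    print(sum_high_locality(b, 0, 1024), b.lookups, b.misses)   # 523776 32 32
```

On the same 1024-element sweep both paths return the same sum and transfer the same 32 lines, but the per-reference path performs 1024 tag checks against 32 for the per-line path. In this model, hoisting the check is only safe when the whole inner loop is known to stay within one line, which is why a scheme like this would reserve it for references classified as high-locality and send the rest through the per-reference lookup.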



Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers




Published In


PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

October 2008

328 pages

ISBN: 9781605582825

DOI: 10.1145/1454115

General Chair: Andreas Moshovos (University of Toronto, Canada)

Program Chairs: David Tarditi (Microsoft, USA), Kunle Olukotun (Stanford University, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States





Conference

PACT '08: International Conference on Parallel Architectures and Compilation Techniques

October 25-29, 2008

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Article Metrics

Total citations: 44

Total downloads: 471

Downloads (last 12 months): 6

Downloads (last 6 weeks): 2

Reflects downloads up to 18 Nov 2024.


Cited By

Li J, Deng Z, Du P, Lin J (2022). A new software cache structure on Sunway TaihuLight. The Journal of Supercomputing 78:4 (4779-4798). DOI: 10.1007/s11227-021-04056-0. Online publication date: 1-Mar-2022.
https://dl.acm.org/doi/10.1007/s11227-021-04056-0

Tagliavini G, Haugou G, Marongiu A, Benini L (2018). Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators. Journal of Real-Time Image Processing 15:1 (73-92). DOI: 10.1007/s11554-015-0544-0. Online publication date: 1-Jun-2018.
https://dl.acm.org/doi/10.1007/s11554-015-0544-0

Arandi S, Matheou G, Kyriacou C, Evripidou P (2018). Data-Driven Thread Execution on Heterogeneous Processors. International Journal of Parallel Programming 46:2 (198-224). DOI: 10.1007/s10766-016-0486-6. Online publication date: 1-Apr-2018.
https://dl.acm.org/doi/10.1007/s10766-016-0486-6

Chakraborty P, Panda P, Sen S (2016). Partitioning and Data Mapping in Reconfigurable Cache and Scratchpad Memory-Based Architectures. ACM Transactions on Design Automation of Electronic Systems 22:1 (1-25). DOI: 10.1145/2934680. Online publication date: 2-Sep-2016.
https://dl.acm.org/doi/10.1145/2934680

Alvarez L, Vilanova L, Moreto M, Casas M, Gonzàlez M, Martorell X, Navarro N, Ayguadé E, Valero M (2015). Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures. ACM SIGARCH Computer Architecture News 43:3S (720-732). DOI: 10.1145/2872887.2750411. Online publication date: 13-Jun-2015.
https://dl.acm.org/doi/10.1145/2872887.2750411

Alvarez L, Vilanova L, Moreto M, Casas M, Gonzàlez M, Martorell X, Navarro N, Ayguadé E, Valero M, Marr D, Albonesi D (2015). Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures. Proceedings of the 42nd Annual International Symposium on Computer Architecture (720-732). DOI: 10.1145/2749469.2750411. Online publication date: 13-Jun-2015.
https://dl.acm.org/doi/10.1145/2749469.2750411

Alvarez L, Vilanova L, Gonzalez M, Martorell X, Navarro N, Ayguade E (2015). Hardware–Software Coherence Protocol for the Coexistence of Caches and Local Memories. IEEE Transactions on Computers 64:1 (152-165). DOI: 10.1109/TC.2013.194. Online publication date: Jan-2015.
https://doi.org/10.1109/TC.2013.194

Ferguson M, Buettner D (2015). Caching Puts and Gets in a PGAS Language Runtime. Proceedings of the 2015 9th International Conference on Partitioned Global Address Space Programming Models (13-24). DOI: 10.1109/PGAS.2015.10. Online publication date: 16-Sep-2015.
https://dl.acm.org/doi/10.1109/PGAS.2015.10

Li C, Yang Y, Dai H, Yan S, Mueller F, Zhou H (2014). Understanding the tradeoffs between software-managed vs. hardware-managed caches in GPUs. 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (231-242). DOI: 10.1109/ISPASS.2014.6844487. Online publication date: Mar-2014.
https://doi.org/10.1109/ISPASS.2014.6844487

Tagliavini G, Haugou G, Benini L (2014). Optimizing memory bandwidth in OpenVX graph execution on embedded many-core accelerators. Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing (1-8). DOI: 10.1109/DASIP.2014.7115617. Online publication date: Oct-2014.
https://doi.org/10.1109/DASIP.2014.7115617
