DOI: 10.1145/3037697.3037715

Crossing Guard: Mediating Host-Accelerator Coherence Interactions

Published: 04 April 2017

Abstract

Specialized hardware accelerators have performance and energy-efficiency advantages over general-purpose processors. To fully realize these benefits and aid programmability, accelerators may share a physical and virtual address space and full cache coherence with the host system. However, allowing accelerators, particularly those designed by third parties, to communicate directly with host coherence protocols poses several problems. Host coherence protocols are complex, vary between companies, and may be proprietary, increasing the burden on accelerator designers. Bugs in the accelerator implementation may cause crashes and other serious consequences for the host system.
We propose Crossing Guard, a coherence interface between the host coherence system and accelerators. The Crossing Guard interface provides the accelerator designer with a standardized set of coherence messages that are simple enough to aid in the design of bug-free coherent caches. At the same time, the messages are sufficiently complex to allow customized and optimized accelerator caches with performance comparable to using the host protocol. The Crossing Guard hardware is implemented as part of the trusted host and provides complete safety to the host coherence system, even in the presence of a pathologically buggy accelerator cache.

Published In

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017
856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerators
  2. cache coherence
  3. coherence interfaces

Qualifiers

  • Research-article

Conference

ASPLOS '17

Acceptance Rates

ASPLOS '17 Paper Acceptance Rate 53 of 320 submissions, 17%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 150
  • Downloads (Last 6 weeks): 22
Reflects downloads up to 10 Nov 2024

Cited By

  • (2023) Duet: Creating Harmony between Processors and Embedded FPGAs. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 745-758. DOI: 10.1109/HPCA56546.2023.10070989. Online publication date: Feb-2023.
  • (2022) HeteroGen: Automatic Synthesis of Heterogeneous Cache Coherence Protocols. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 756-771. DOI: 10.1109/HPCA53966.2022.00061. Online publication date: Apr-2022.
  • (2020) BYOC. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 699-714. DOI: 10.1145/3373376.3378479. Online publication date: 9-Mar-2020.
  • (2020) HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 582-595. DOI: 10.1109/HPCA47549.2020.00054. Online publication date: Feb-2020.
  • (2019) CoNDA. Proceedings of the 46th International Symposium on Computer Architecture, pages 629-642. DOI: 10.1145/3307650.3322266. Online publication date: 22-Jun-2019.
  • (2018) Spandex. Proceedings of the 45th Annual International Symposium on Computer Architecture, pages 261-274. DOI: 10.1109/ISCA.2018.00031. Online publication date: 2-Jun-2018.
  • (2018) Interference from GPU System Service Requests. 2018 IEEE International Symposium on Workload Characterization (IISWC), pages 179-190. DOI: 10.1109/IISWC.2018.8573485. Online publication date: Sep-2018.
  • (2022) Consistency and Coherence for Heterogeneous Systems. A Primer on Memory Consistency and Cache Coherence, pages 211-251. DOI: 10.1007/978-3-031-01764-3_10. Online publication date: 28-Mar-2022.
  • (2020) A Primer on Memory Consistency and Cache Coherence, Second Edition. Synthesis Lectures on Computer Architecture, 15:1, pages 1-294. DOI: 10.2200/S00962ED2V01Y201910CAC049. Online publication date: 4-Feb-2020.
  • (2022) Accelerators & Security: The Socket Approach. IEEE Computer Architecture Letters, 21:2, pages 65-68. DOI: 10.1109/LCA.2022.3179947. Online publication date: 1-Jul-2022.
