DOI:10.1145/3180270.3180275

A Case for Scoped Persist Barriers in GPUs

Published: 24 February 2018

Abstract

Two key trends in computing are evident: the emergence of the GPU as a first-class compute element, and the emergence of byte-addressable nonvolatile memory technologies (NVRAM) as a supplement to DRAM. GPUs and NVRAMs are likely to coexist in future systems. However, previous work has focused on either GPUs or NVRAMs in isolation. In this work, we investigate the enhancements a GPU needs in order to manipulate NVRAM-resident persistent data structures efficiently and correctly.
Specifically, we find that previously proposed CPU-centric persist barriers fall short for GPUs. We therefore introduce scoped persist barriers, which align with the hierarchical programming framework of GPUs. A scoped persist barrier lets the GPU programmer express which execution group (a.k.a. scope) the barrier applies to. We demonstrate that (1) using a narrower scope than the algorithm requires can leave a persistent data structure inconsistent, and (2) using a wider scope than necessary leads to significant performance loss (e.g., 25% or more). Therefore, a future GPU can benefit from persist barriers with different scopes.
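
The two failure modes named in the abstract can be illustrated with a minimal Python sketch of a toy persistence model. This is an illustration under stated assumptions, not the paper's hardware design: the class `ToyGpuNvram`, the one-write-buffer-per-work-group structure, the scope names `"workgroup"` and `"device"`, and the count of drained buffers used as a crude cost proxy are all hypothetical. The sketch assumes that a barrier drains only the buffers inside its scope, and that on a crash any subset of still-buffered stores may have reached NVRAM.

```python
from itertools import combinations


class ToyGpuNvram:
    """Toy model (assumption): one volatile write buffer per work-group sits in front of NVRAM."""

    def __init__(self, num_workgroups):
        self.nvram = {}  # durable contents
        self.buffers = [dict() for _ in range(num_workgroups)]  # stores not yet durable
        self.buffers_drained = 0  # crude cost proxy: buffers a barrier had to drain

    def store(self, workgroup, addr, value):
        """Buffer a store issued by a thread of the given work-group."""
        self.buffers[workgroup][addr] = value

    def persist_barrier(self, workgroup, scope):
        """Make every buffered store inside the chosen scope durable."""
        if scope == "workgroup":
            targets = [workgroup]
        elif scope == "device":
            targets = range(len(self.buffers))
        else:
            raise ValueError(f"unknown scope: {scope}")
        for wg in targets:
            self.nvram.update(self.buffers[wg])
            self.buffers[wg].clear()
            self.buffers_drained += 1

    def crash_states(self):
        """Yield every NVRAM image a crash could leave behind.

        Assumption: any subset of the still-buffered stores may have drained.
        """
        pending = [(a, v) for buf in self.buffers for a, v in buf.items()]
        for k in range(len(pending) + 1):
            for subset in combinations(pending, k):
                image = dict(self.nvram)
                image.update(subset)
                yield image


def is_consistent(image):
    """Invariant: a durable head pointer must refer to a durable node."""
    return "head" not in image or image["head"] in image


def publish_node(scope, producer=0, publisher=1):
    """Producer writes a node; publisher issues a barrier, then publishes a pointer to it."""
    gpu = ToyGpuNvram(num_workgroups=4)
    gpu.store(producer, "node", 42)
    gpu.persist_barrier(publisher, scope)
    gpu.store(publisher, "head", "node")
    crash_safe = all(is_consistent(img) for img in gpu.crash_states())
    return crash_safe, gpu.buffers_drained


print(publish_node("workgroup"))              # (False, 1): scope too narrow
print(publish_node("device"))                 # (True, 4): safe, but drains every buffer
print(publish_node("workgroup", producer=1))  # (True, 1): narrow scope suffices
```

In this model, a work-group-scoped barrier is unsafe when the node and the pointer come from different work-groups, because a crash can leave the pointer durable while the node is still buffered. A device-scoped barrier removes that state but drains all four buffers. When the producer and the publisher share a work-group, the narrow scope is both safe and the cheapest option.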

Cited By

  • (2023) Scoped Buffered Persistency Model for GPUs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. DOI:10.1145/3575693.3575749 (688-701). Online publication date: 27-Jan-2023
  • (2021) COSPlay: Leveraging Task-Level Parallelism for High-Throughput Synchronous Persistence. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. DOI:10.1145/3466752.3480075 (86-99). Online publication date: 18-Oct-2021
  • (2019) Exploring Memory Persistency Models for GPUs. 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). DOI:10.1109/PACT.2019.00032 (311-323). Online publication date: Sep-2019

Information & Contributors

Information

Published In

GPGPU-11: Proceedings of the 11th Workshop on General Purpose GPUs
February 2018
64 pages
ISBN:9781450356473
DOI:10.1145/3180270

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PPoPP '18

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%

