Research article, DOI: 10.1145/2458523.2458529

Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems

Published: 16 March 2013

Abstract

Heterogeneous systems have grown in popularity within the commercial platform and application developer communities. We have seen a growing number of systems incorporating CPUs, Graphics Processors (GPUs), and Accelerated Processing Units (APUs), which combine a CPU and a GPU on the same chip. This emerging class of platforms is now being targeted to accelerate applications in which the host processor (typically a CPU) and the compute device (typically a GPU) cooperate on a computation. In this scenario, application performance depends not only on the processing power of the respective heterogeneous processors, but also on efficient interaction and communication between them.
To help architects and application developers quantify many of the key aspects of heterogeneous execution, this paper presents a new set of benchmarks called Valar. The Valar benchmarks are applications specifically chosen to study the dynamic behavior of OpenCL applications that benefit from host-device interaction. We describe the general characteristics of our benchmarks, focusing on the specific characteristics that can help characterize heterogeneous applications. For the purposes of this paper we focus on OpenCL as our programming environment, though we envision versions of Valar in additional heterogeneous programming languages.
We profile the Valar benchmarks based on their mapping and execution on different heterogeneous systems. Our evaluation examines optimizations for host-device communication and the effects of closely coupled execution of the benchmarks on the multiple OpenCL devices present in heterogeneous systems.

Published In

GPGPU-6: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
March 2013
156 pages
ISBN:9781450320177
DOI:10.1145/2458523
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GPGPU
  2. OpenCL
  3. benchmark suite
  4. benchmarking
  5. computer vision
  6. heterogeneous computing
  7. profiling

Qualifiers

  • Research-article

Conference

GPGPU-6

Acceptance Rates

GPGPU-6 Paper Acceptance Rate: 15 of 37 submissions, 41%
Overall Acceptance Rate: 57 of 129 submissions, 44%

Article Metrics

  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 1
Reflects downloads up to 20 Nov 2024

Cited By

  • (2020) Infrastructure-Aware TensorFlow for Heterogeneous Datacenters. 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 1-8. DOI: 10.1109/MASCOTS50786.2020.9285969. Online publication date: 17-Nov-2020
  • (2019) Design Space Exploration of Embedded Applications on Heterogeneous CPU-GPU Platforms. 2019 International Conference on High Performance Computing & Simulation (HPCS), pages 620-627. DOI: 10.1109/HPCS48598.2019.9188052. Online publication date: Jul-2019
  • (2018) Benchmarking Heterogeneous HPC Systems Including Reconfigurable Fabrics: Community Aspirations for Ideal Comparisons. 2018 IEEE High Performance extreme Computing Conference (HPEC), pages 1-6. DOI: 10.1109/HPEC.2018.8547635. Online publication date: Sep-2018
  • (2017) Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading. Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, pages 1-10. DOI: 10.1145/3148173.3148184. Online publication date: 12-Nov-2017
  • (2017) Chai: Collaborative heterogeneous applications for integrated-architectures. 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 43-54. DOI: 10.1109/ISPASS.2017.7975269. Online publication date: Apr-2017
  • (2016) Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications. 2016 IEEE International Symposium on Workload Characterization (IISWC), pages 1-10. DOI: 10.1109/IISWC.2016.7581277. Online publication date: Sep-2016
  • (2016) Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R). 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 206-215. DOI: 10.1109/HPCC-SmartCity-DSS.2016.0039. Online publication date: Dec-2016
  • (2015) CHO. Proceedings of the 3rd International Workshop on OpenCL, pages 1-10. DOI: 10.1145/2791321.2791331. Online publication date: 12-May-2015
  • (2015) A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Computing Surveys 47:4, pages 1-35. DOI: 10.1145/2788396. Online publication date: 21-Jul-2015
  • (2015) NUPAR. Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, pages 253-264. DOI: 10.1145/2668930.2688046. Online publication date: 28-Jan-2015
