Hera-JVM: a runtime system for heterogeneous multi-core architectures

Published: 17 October 2010

Abstract

Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores support different instruction set architectures, and the processor as a whole does not provide coherence between the different cores' local memories.
We present Hera-JVM, an implementation of the Java Virtual Machine which operates over the Cell processor, thereby making this platform more readily accessible to mainstream developers. Hera-JVM supports the full Java language; threads from an unmodified Java application can be simultaneously executed on both the main PowerPC-based core and on the additional SPE accelerator cores. Migration of threads between these cores is transparent from the point of view of the application, requiring no modification to Java source code or bytecode. Hera-JVM supports the existing Java Memory Model, even though the underlying hardware does not provide cache coherence between the different core types.
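
To make "unmodified Java application" concrete, the following is a minimal sketch of the kind of ordinary multi-threaded program the paragraph above refers to. It uses only standard `java.lang` threading and `synchronized`, with no Hera-JVM-specific API; the class name `SharedCounterDemo` and its methods are hypothetical and are not taken from the paper. Under the Java Memory Model, the `synchronized` methods are what guarantee that every increment is visible to every thread, which is the guarantee a runtime on non-coherent hardware has to preserve.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example program: plain Java, nothing Hera-JVM-specific.
public class SharedCounterDemo {
    private long total = 0;

    // synchronized provides the happens-before ordering required by the
    // Java Memory Model, so no increment is lost or left invisible.
    private synchronized void add(long value) {
        total += value;
    }

    private synchronized long get() {
        return total;
    }

    // Starts threadCount workers that each add 1 to the shared counter
    // perThread times, waits for them, and returns the final total.
    public static long run(int threadCount, final int perThread) {
        final SharedCounterDemo counter = new SharedCounterDemo();
        List<Thread> workers = new ArrayList<Thread>();
        for (int t = 0; t < threadCount; t++) {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        counter.add(1);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // Six workers, chosen here only to mirror the six SPE cores
        // mentioned in the abstract.
        System.out.println(run(6, 100000)); // prints 600000
    }
}
```

Where each thread actually runs is left entirely to the virtual machine; the program itself expresses no core placement, which is the sense in which migration is transparent to the application.
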
We examine Hera-JVM's performance under a series of real-world Java benchmarks from the SPECjvm, Java Grande and DaCapo benchmark suites. These benchmarks show a wide variation in relative performance on the different core types of the Cell processor, depending upon the nature of their workload. Execution of these benchmarks on Hera-JVM can achieve speedups of up to 2.25x by using one of the Cell processor's SPE accelerator cores, compared to execution on the main PowerPC-based core. When all six SPE cores are exploited, parallel workloads can achieve speedups of up to 13x compared to execution on the single PowerPC core.
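
As a rough sanity check on how the two reported figures relate, the sketch below compares the 13x six-core result with the product of the single-SPE speedup and the core count. It assumes, which the abstract does not state, that both "up to" figures could apply to the same workload and that scaling across SPE cores could be perfectly linear; the class and method names are hypothetical.

```java
// Hypothetical helper for relating the abstract's two speedup figures.
public class SpeedupCheck {
    // Speedup over the PowerPC core if perCoreSpeedup scaled linearly
    // across the given number of SPE cores (an idealised assumption).
    static double idealCombinedSpeedup(double perCoreSpeedup, int cores) {
        return perCoreSpeedup * cores;
    }

    // Fraction of the idealised figure that an observed speedup reaches.
    static double fractionOfIdeal(double observed, double ideal) {
        return observed / ideal;
    }

    public static void main(String[] args) {
        double ideal = idealCombinedSpeedup(2.25, 6);   // 13.5
        double fraction = fractionOfIdeal(13.0, ideal); // about 0.96
        System.out.println("ideal = " + ideal);
        System.out.printf("fraction of ideal = %.2f%n", fraction);
    }
}
```

Under those assumptions the reported 13x sits close to the 13.5x linear bound; if the two figures come from different benchmarks, the comparison is only indicative.
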

Published In

ACM SIGPLAN Notices, Volume 45, Issue 10
OOPSLA '10
October 2010
957 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1932682

OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
October 2010
984 pages
ISBN: 9781450302036
DOI: 10.1145/1869459
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. heterogeneous multi-core architecture
  2. java virtual machine
  3. software caching

Qualifiers

  • Research-article
