ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

Published: 01 June 2009

Abstract

Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating large multiprocessor systems with hundreds or thousands of processors or when instrumentation is introduced. We propose the ProtoFlex simulation architecture, which uses FPGAs to accelerate full-system multiprocessor simulation and to facilitate high-performance instrumentation. Prior FPGA approaches that prototype a complete system in hardware are either too complex when scaling to large-scale configurations or require significant effort to provide full-system support. In contrast, ProtoFlex virtualizes the execution of many logical processors onto a consolidated number of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be judiciously scaled, as needed, to deliver the necessary simulation performance at a large saving in complexity. Further, to achieve low-complexity full-system support, a hybrid simulation technique called transplanting allows implementing in the FPGA only the frequently encountered behaviors, while a software simulator preserves the abstraction of a complete system.
We have created a first instance of the ProtoFlex simulation architecture, which is an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server, hosted on a single Xilinx Virtex-II XCV2P70 FPGA. On average, the simulator achieves a 38× speedup (and as high as 49×) over comparable software simulation across a suite of applications, including OLTP on a commercial database server. We also demonstrate the advantages of minimal-overhead FPGA-accelerated instrumentation through a CMP cache simulation technique that runs orders of magnitude faster than software.
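
The two ideas in the abstract, virtualizing many logical processors onto a few multiple-context engines and transplanting rare behaviors to software, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the ProtoFlex implementation: the names `LogicalCPU`, `Engine`, `NATIVE_OPS`, `software_fallback`, and `simulate` are hypothetical, instructions are reduced to opcode strings, and each engine is assumed to interleave its contexts round-robin, one instruction per turn.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumption: the "frequently encountered" behaviors an engine handles natively.
NATIVE_OPS = {"add", "load", "store"}


@dataclass
class LogicalCPU:
    """One simulated (logical) processor with a toy instruction stream."""
    cpu_id: int
    program: list              # opcode strings, e.g. ["add", "load", "trap"]
    pc: int = 0
    native_count: int = 0      # instructions handled by the engine itself
    transplant_count: int = 0  # instructions handed to the software simulator

    def done(self):
        return self.pc >= len(self.program)


def software_fallback(cpu, op):
    """Hypothetical stand-in for the software simulator handling a rare behavior."""
    cpu.transplant_count += 1


@dataclass
class Engine:
    """One multiple-context engine; contexts take turns in round-robin order."""
    contexts: deque = field(default_factory=deque)

    def step(self):
        """Run one instruction from the next context, if any is waiting."""
        if not self.contexts:
            return
        cpu = self.contexts.popleft()
        op = cpu.program[cpu.pc]
        if op in NATIVE_OPS:
            cpu.native_count += 1
        else:
            software_fallback(cpu, op)  # "transplant" the rare behavior
        cpu.pc += 1
        if not cpu.done():
            self.contexts.append(cpu)   # rejoin the rotation


def simulate(cpus, num_engines):
    """Map logical CPUs onto engines and count rounds until all programs finish."""
    engines = [Engine() for _ in range(num_engines)]
    for i, cpu in enumerate(cpus):
        if not cpu.done():
            engines[i % num_engines].contexts.append(cpu)
    rounds = 0
    while any(e.contexts for e in engines):
        for e in engines:
            e.step()
        rounds += 1
    return rounds
```

In this toy model, 16 logical CPUs that each run four instructions finish in 16 rounds on 4 engines and in 4 rounds on 16 engines, which mirrors the trade-off the abstract describes: the number of engines can be scaled to match the required simulation throughput, while fewer engines mean less hardware to build.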



    Information

    Published In

    ACM Transactions on Reconfigurable Technology and Systems, Volume 2, Issue 2
    June 2009
    211 pages
    ISSN: 1936-7406
    EISSN: 1936-7414
    DOI: 10.1145/1534916
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2009
    Accepted: 01 November 2008
    Revised: 01 August 2008
    Received: 01 June 2008
    Published in TRETS Volume 2, Issue 2

    Author Tags

    1. FPGA
    2. emulator
    3. multicore
    4. multiprocessor
    5. prototype
    6. simulator

    Qualifiers

    • Research-article
    • Research
    • Refereed
