research-article

ASC: automatically scalable computation

Published: 24 February 2014

Abstract

We present an architecture designed to transparently and automatically scale the performance of sequential programs as a function of the hardware resources available. The architecture is predicated on a model of computation that views program execution as a walk through the enormous state space composed of the memory and registers of a single-threaded processor. Each instruction execution in this model moves the system from its current point in state space to a deterministic subsequent point. We can parallelize such execution by predictively partitioning the complete path and speculatively executing each partition in parallel. Accurately partitioning the path is a challenging prediction problem. We have implemented our system using a functional simulator that emulates the x86 instruction set, including a collection of state predictors and a mechanism for speculatively executing threads that explore potential states along the execution path. While the overhead of our simulation makes it impractical to measure speedup relative to native x86 execution, experiments on three benchmarks show scalability of up to a factor of 256 on a 1024-core machine when executing unmodified sequential programs.
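The idea of predictively partitioning an execution path and speculatively executing each partition can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy machine `step`, the linear-extrapolation predictor, and the names `run` and `speculative_run` are all assumptions made for illustration, and Python threads are used only to show the structure, not to obtain real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def step(state):
    """One deterministic 'instruction' of a hypothetical toy machine with state (i, acc)."""
    i, acc = state
    # The behaviour changes at i == 600, so a simple predictor will eventually be wrong.
    return (i + 1, acc + (3 if i < 600 else 5))


def run(state, n):
    """Execute n instructions sequentially starting from state."""
    for _ in range(n):
        state = step(state)
    return state


def speculative_run(start, chunk, n_chunks, workers=4):
    """Return (final_state, hits) after chunk * n_chunks instructions."""
    # Warm up: execute the first chunk for real to observe one boundary-to-boundary delta.
    s1 = run(start, chunk)
    delta = tuple(b - a for a, b in zip(start, s1))
    # Predict the start state of each remaining chunk by linear extrapolation (assumed predictor).
    guesses = [tuple(x + k * d for x, d in zip(s1, delta)) for k in range(n_chunks - 1)]
    # Speculatively execute every chunk in parallel from its guessed start state.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ends = list(pool.map(lambda g: run(g, chunk), guesses))
    cache = dict(zip(guesses, ends))
    # Commit: walk the true path, fast-forwarding on a correct guess and re-executing on a miss.
    state, hits = s1, 0
    for _ in range(n_chunks - 1):
        if state in cache:
            state = cache[state]
            hits += 1
        else:
            state = run(state, chunk)
    return state, hits


if __name__ == "__main__":
    final, hits = speculative_run((0, 0), chunk=100, n_chunks=10)
    print(final, hits, final == run((0, 0), 1000))
```

Because every transition is deterministic, a speculative result is safe to reuse whenever its guessed start state matches the state actually reached, so the final state always equals that of sequential execution regardless of predictor quality. Only the number of fast-forwarded chunks depends on how accurate the predictions are.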

Published In

ACM SIGPLAN Notices  Volume 49, Issue 4
ASPLOS '14
April 2014
729 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2644865
  • ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
    February 2014
    780 pages
    ISBN:9781450323055
    DOI:10.1145/2541940
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2014
Published in SIGPLAN Volume 49, Issue 4

Author Tags

  1. automatic parallelization
  2. machine learning

Qualifiers

  • Research-article
