research-article

Siblingrivalry: online autotuning through local competitions

Authors:

Marek Olszewski,

Una-May O'Reilly,

Saman AmarasingheAuthors Info & Claims

CASES '12: Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems

Pages 91 - 100

https://doi.org/10.1145/2380403.2380425

Published: 07 October 2012 Publication History

Abstract

Modern high performance libraries, such as ATLAS and FFTW, and programming languages, such as PetaBricks, have shown that autotuning computer programs can lead to significant speedups. However, autotuning can be burdensome to the deployment of a program, since the tuning process can take a long time and should be re-run whenever the program, microarchitecture, execution environment, or tool chain changes. Failure to re-autotune programs often leads to widespread use of sub-optimal algorithms. With the growth of cloud computing, where computations can run in environments with unknown load and migrate between different (possibly unknown) microarchitectures, the need for online autotuning has become increasingly important.

We present SiblingRivalry, a new model for always-on online autotuning that allows parallel programs to continuously adapt and optimize themselves to their environment. In our system, requests are processed by dividing the available cores in half, and processing two identical requests in parallel on each half. Half of the cores are devoted to a known safe program configuration, while the other half are used for an experimental program configuration chosen by our self-adapting evolutionary algorithm. When the faster configuration completes, its results are returned, and the slower configuration is terminated. Over time, this constant experimentation allows programs to adapt to changing dynamic environments and often outperform the original algorithm that uses the entire system.

References

[1]

F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. F. P. O'boyle, J. Thomson, M. Toussaint, and C. K. I. Williams. Using machine learning to focus iterative optimization. In International Symposium on Code Generation and Optimization, pages 295--305, 2006.

Digital Library

[2]

L. Almagor, Keith D. Cooper, Alexander Grosul, Timothy J. Harvey, Steven W. Reeves, Devika Subramanian, Linda Torczon, and Todd Waterman. Finding effective compilation sequences. In LCTES'04, pages 231--239, 2004.

Digital Library

[3]

Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, and Saman Amarasinghe. PetaBricks: A language and compiler for algorithmic choice. In PLDI, Dublin, Ireland, Jun 2009.

Digital Library

[4]

Jason Ansel, Yee Lok Wong, Cy Chan, Marek Olszewski, Alan Edelman, and Saman Amarasinghe. Language and compiler support for auto-tuning variable-accuracy algorithms. In CGO, Chamonix, France, Apr 2011.

Digital Library

[5]

Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 2003.

Digital Library

[6]

Joel Auslander, Matthai Philipose, Craig Chambers, Susan J. Eggers, and Brian N. Bershad. Fast, effective dynamic compilation. In PLDI, 1996.

Digital Library

[7]

Woongki Baek and Trishul Chilimbi. Green: A framework for supporting energy-conscious programming using controlled approximation. In PLDI, June 2010.

Digital Library

[8]

K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, J. Hiller, S. Karp, S. Keckler, D. Klein, R. Lucas, M. Richards, A. Scarpelli, S. Scott, A. Snavely, T. Sterling, S. Williams, and K. Yelick. Exascale computing study: Technology challenges in achieving exascale systems, 2008.

[9]

V. Bhat, M. Parashar, . Hua Liu, M. Khandekar, N. Kandasamy, and S. Abdelwahed. Enabling self-managing applications using model-based online control strategies. In International Conference on Autonomic Computing, Washington, DC, 2006.

Digital Library

[10]

Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, and Jim Demmel. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In Supercomputing, New York, NY, 1997.

Digital Library

[11]

Andrew P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1997.

Digital Library

[12]

Fangzhe Chang and Vijay Karamcheti. A framework for automatic adaptation of tunable distributed applications. Cluster Computing, 4, March 2001.

Digital Library

[13]

Dehao Chen, Neil Vachharajani, Robert Hundt, Shih-wei Liao, Vinodha Ramasamy, Paul Yuan, Wenguang Chen, and Weimin Zheng. Taming hardware event samples for FDO compilation. In CGO, New York, NY, 2010.

Digital Library

[14]

Luis DaCosta, Alvaro Fialho, Marc Schoenauer, and Michèle Sebag. Adaptive operator selection with dynamic multi-armed bandits. In GECCO, New York, NY, 2008.

Digital Library

[15]

Lawrence Davis. Adapting operator probabilities in genetic algorithms. In ICGA, San Francisco, CA, 1989.

Digital Library

[16]

Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In Marc Schoenauer, Kalyanmoy Deb, Günter Rudolph, Xin Yao, Evelyne Lutton, Juan Julian Merelo, and Hans-Paul Schwefel, editors, PPSN, Berlin, 2000.

Digital Library

[17]

Pedro C. Diniz and Martin C. Rinard. Dynamic feedback: an effective technique for adaptive computing. In PLDI, New York, NY, 1997.

Digital Library

[18]

Matteo Frigo and Steven G. Johnson. FFTW: An adaptive software architecture for the FFT. In IEEE International Conference on Acoustics Speech and Signal Processing, volume 3, 1998.

[19]

Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW3. IEEE, 93(2), February 2005. Invited paper, special issue on "Program Generation, Optimization, and Platform Adaptation".

[20]

Grigori Fursin, Cupertino Miranda, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Ayal Zaks, Bilha Mendelson, Edwin Bonilla, John Thomson, Hugh Leather, Chris Williams, Michael O'Boyle, Phil Barnard, Elton Ashton, Eric Courtois, and Francois Bodin. MILEPOST GCC: machine learning based research compiler. In Proceedings of the GCC Developers' Summit, Jul 2008.

[21]

Henry Hoffmann, Jonathan Eastep, Marco D. Santambrogio, Jason E. Miller, and Anant Agarwal. Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments. In ICAC, New York, NY, 2010.

Digital Library

[22]

Henry Hoffmann, Sasa Misailovic, Stelios Sidiroglou, Anant Agarwal, and Martin Rinard. Using code perforation to improve performance, reduce energy consumption, and respond to failures. Technical Report MIT-CSAIL-TR-2209-042, Massachusetts Institute of Technology, Sep 2009.

[23]

Henry Hoffmann, Stelios Sidiroglou, Michael Carbin, Sasa Misailovic, Anant Agarwal, and Martin Rinard. Power-aware computing with dynamic knobs. In ASPLOS, 2011.

Digital Library

[24]

Eun-jin Im and Katherine Yelick. Optimizing sparse matrix computations for register reuse in SPARSITY. In International Conference on Computational Science, 2001.

Digital Library

[25]

Gabor Karsai, Akos Ledeczi, Janos Sztipanovits, Gabor Peceli, Gyula Simon, and Tamas Kovacshazy. An approach to self-adaptive software based on supervisory control. In International Workshop in Self-adaptive software, 2001.

Digital Library

[26]

Eunjung Park, L.-N. Pouche, J. Cavazos, A. Cohen, and P. Sadayappan. Predictive modeling in a polyhedral optimization space. In IEEE/ACM International Symposium on Code Generation and Optimization, pages 119 --129, April 2011.

Digital Library

[27]

Kenneth Price, Rainer Storn, and Jouni Lampinen. Differential Evolution: A Practical Approach to Global Optimization. Natural Computing Series. Springer-Verlag, Berlin, Germany, 2005.

Digital Library

[28]

Markus Püschel, José M. F. Moura, Bryan Singer, Jianxin Xiong, Jeremy R. Johnson, David A. Padua, Manuela M. Veloso, and Robert W. Johnson. Spiral: A generator for platform-adapted libraries of signal processing alogorithms. IJHPCA, 18(1), 2004.

Digital Library

[29]

Robert Schaefer, Carlos Cotta, Joanna Kolodziej, and Günter Rudolph, editors. Parallel Problem Solving from Nature, volume 6238 of Lecture Notes in Computer Science, 2010.

Digital Library

[30]

Cristian Tapus, I-Hsin Chung, and Jeffrey K. Hollingsworth. Active harmony: Towards automated performance tuning. In In Proceedings from the Conference on High Performance Networking and Computing, pages 1--11, 2003.

Digital Library

[31]

Dirk Thierens. Adaptive strategies for operator allocation. In Fernando G. Lobo, Cláudio F. Lima, and Zbigniew Michalewicz, editors, Parameter Setting in Evolutionary Algorithms, volume 54 of Studies in Computational Intelligence. 2007.

[32]

Michael Voss and Rudolf Eigenmann. Adapt: Automated de-coupled adaptive program transformation. In International Conference on Parallel Processing, 2000.

Digital Library

[33]

Michael Voss and Rudolf Eigenmann. High-level adaptive program optimization with adapt. ACM SIGPLAN Notices, 36(7), 2001.

Digital Library

[34]

Richard Vuduc, James W. Demmel, and Katherine A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Scientific Discovery through Advanced Computing Conference, Journal of Physics: Conference Series, San Francisco, CA, June 2005.

[35]

Richard Clint Whaley and Jack J. Dongarra. Automatically tuned linear algebra software. In Supercomputing, Washington, DC, 1998.

Digital Library

[36]

S. Williams, K. Datta, J. Carter, L. Oliker, J. Shalf, K. Yelick, and D. Bailey. PERI - auto-tuning memory-intensive kernels for multicore. Journal of Physics Conference Series, 125(1), July 2008.

[37]

Eckart Zitzler, Marco Laumanns, and Lothar Thiele. SPEA2: Improving the strength pareto evolutionary algorithm for multiobjective optimization. In K. Giannakoglou, D. Tsahalis, J. Periaux, K. Papailiou, and T. Fogarty, editors, Evolutionary Methods for Design, Optimisation and Control. Barcelona, Spain, 2002.

Cited By

Randall TKoo JVideau BKruse MWu XHovland PHall MGe RBalaprakash PGallivan KNikolopoulos DBeivide RGallopoulos E(2023)Transfer-learning-based Autotuning using Gaussian CopulaProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593712(37-49)Online publication date: 21-Jun-2023
https://dl.acm.org/doi/10.1145/3577193.3593712
Pervaiz AYang YDuracz ABartha FSai RImes CCartwright RPalem KLu SHoffmann HScholliers CSinger J(2022)GOAL: Supporting General and Dynamic Adaptation in Computing SystemsProceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software10.1145/3563835.3567655(16-32)Online publication date: 29-Nov-2022
https://dl.acm.org/doi/10.1145/3563835.3567655
Kannan THoffmann HFalsafi BFerdman MLu SWenisch T(2022)Protecting adaptive sampling from information leakage on low-power sensorsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507775(240-254)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507775
Show More Cited By

Index Terms

Siblingrivalry: online autotuning through local competitions
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. Context specific languages

Recommendations

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
Highlights
- Generate parallel CUDA code from sequential C input code using a compiler-based tool for key operators in Geometric Multigrid.
Abstract
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model ...
A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops

In the era of multicores, many applications that require substantial computing power and data crunching can now run on desktop PCs. However, to achieve the best possible performance, developers must write applications in a way that exploits both ...
A Portable and High-Performance General Matrix-Multiply (GEMM) Library for GPUs and Single-Chip CPU/GPU Systems
PDP '14: Proceedings of the 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing

OpenCL is a vendor neutral and portable interface for programming parallel compute devices such as GPUs. Tuning OpenCL implementations of important library functions such as dense general matrix multiply (GEMM) for a particular device is a difficult ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CASES '12: Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems

October 2012

230 pages

ISBN:9781450314244

DOI:10.1145/2380403

General Chairs:
Ahmed Jerraya
CEA
,
Luca Carloni
Columbia University
,
Program Chairs:
Vincent Mooney
Georgia Institute of Technology, USA
,
Rodric Rabbah
IBM, USA

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2012

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

ESWEEK'12

Sponsor:

ESWEEK'12: Eighth Embedded System Week

October 7 - 12, 2012

Tampere, Finland

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

36
Total Citations
View Citations
204
Total Downloads

Downloads (Last 12 months)11
Downloads (Last 6 weeks)1

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Randall TKoo JVideau BKruse MWu XHovland PHall MGe RBalaprakash PGallivan KNikolopoulos DBeivide RGallopoulos E(2023)Transfer-learning-based Autotuning using Gaussian CopulaProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593712(37-49)Online publication date: 21-Jun-2023
https://dl.acm.org/doi/10.1145/3577193.3593712
Pervaiz AYang YDuracz ABartha FSai RImes CCartwright RPalem KLu SHoffmann HScholliers CSinger J(2022)GOAL: Supporting General and Dynamic Adaptation in Computing SystemsProceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software10.1145/3563835.3567655(16-32)Online publication date: 29-Nov-2022
https://dl.acm.org/doi/10.1145/3563835.3567655
Kannan THoffmann HFalsafi BFerdman MLu SWenisch T(2022)Protecting adaptive sampling from information leakage on low-power sensorsProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3503222.3507775(240-254)Online publication date: 28-Feb-2022
https://dl.acm.org/doi/10.1145/3503222.3507775
Chen WWang YXu YGao CHan YZhang L(2022)Amphis: Managing Reconfigurable Processor Architectures With Generative Adversarial LearningIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2022.319798041:11(3993-4003)Online publication date: Nov-2022
https://doi.org/10.1109/TCAD.2022.3197980
Wan CSantriaji MRogers EHoffmann HMaire MLu SGavrilovska AZadok E(2020)ALERTProceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference10.5555/3489146.3489170(353-369)Online publication date: 15-Jul-2020
https://dl.acm.org/doi/10.5555/3489146.3489170
Lee WLiu YLiu PMa SChoi HZhang XGupta RKandemir MJimborean AMoseley T(2019)White-box program tuningProceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization10.5555/3314872.3314889(122-135)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.5555/3314872.3314889
Lee WLiu PLiu YMa SZhang XMcKinley KFisher K(2019)Programming support for autonomizing softwareProceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3314221.3314593(702-716)Online publication date: 8-Jun-2019
https://dl.acm.org/doi/10.1145/3314221.3314593
Ding YMishra NHoffmann HManne SHunter HAltman E(2019)Generative and multi-phase learning for computer systems optimizationProceedings of the 46th International Symposium on Computer Architecture10.1145/3307650.3326633(39-52)Online publication date: 22-Jun-2019
https://dl.acm.org/doi/10.1145/3307650.3326633
Gadioli DVitali EPalermo GSilvano C(2019)mARGOt: A Dynamic Autotuning Framework for Self-Aware Approximate ComputingIEEE Transactions on Computers10.1109/TC.2018.288359768:5(713-728)Online publication date: 1-May-2019
https://doi.org/10.1109/TC.2018.2883597
Lee WLiu YLiu PMa SChoi HZhang XGupta R(2019)White-Box Program Tuning2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2019.8661177(122-135)Online publication date: Feb-2019
https://doi.org/10.1109/CGO.2019.8661177
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents