Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Paraprox: pattern-based approximation for data parallel applications

Published: 24 February 2014 Publication History

Abstract

Approximate computing is an approach where reduced accuracy of results is traded off for increased speed, throughput, or both. Loss of accuracy is not permissible in all computing domains, but there are a growing number of data-intensive domains where the output of programs need not be perfectly correct to provide useful results or even noticeable differences to the end user. These soft domains include multimedia processing, machine learning, and data mining/analysis. An important challenge with approximate computing is transparency to insulate both software and hardware developers from the time, cost, and difficulty of using approximation. This paper proposes a software-only system, Paraprox, for realizing transparent approximation of data-parallel programs that operates on commodity hardware systems. Paraprox starts with a data-parallel kernel implemented using OpenCL or CUDA and creates a parameterized approximate kernel that is tuned at runtime to maximize performance subject to a target output quality (TOQ) that is supplied by the user. Approximate kernels are created by recognizing common computation idioms found in data-parallel programs (e.g., Map, Scatter/Gather, Reduction, Scan, Stencil, and Partition) and substituting approximate implementations in their place. Across a set of 13 soft data-parallel applications with at most 10% quality degradation, Paraprox yields an average performance gain of 2.7x on a NVIDIA GTX 560 GPU and 2.5x on an Intel Core i7 quad-core processor compared to accurate execution on each platform.

References

[1]
The credit card equation, 2013. http://faculty.bennington.edu/~jzimba/CreditCardEquation Derivation.pdf.
[2]
A. Agarwal, M. Rinard, S. Sidiroglou, S. Misailovic, and H. Hoffmann. Using code perforation to improve performance, reduce energy consumption, and respond to failures. Technical Report MIT-CSAIL-TR-2009-042, MIT, Mar. 2009.
[3]
C. Álvarez, J. Corbal, and M. Valero. Fuzzy memoization for floating-point multimedia applications. IEEE Transactions on Computers, pages 922--927, 2005.
[4]
J. Ansel, Y. L. Wong, C. Chan, M. Olszewski, A. Edelman, and S. Amarasinghe. Language and compiler support for autotuning variable-accuracy algorithms. In Proc. of the 2011 International Symposium on Code Generation and Optimization, pages 85--96, 2011.
[5]
W. Baek and T. M. Chilimbi. Green: a framework for supporting energy-conscious programming using controlled approximation. In Proc. of the '10 Conference on Programming Language Design and Implementation, pages 198--209, 2010.
[6]
D. H. Bailey. Experimental mathematics in action. A.K. Peters, 2007.
[7]
F. Bass. A new product growth for model consumer durables. Management Science, pages 215--227, 1969.
[8]
S. Chaudhuri, S. Gulwani, R. L. Roberto, and S. Navidpour. Proving programs robust. In Proc. of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pages 102--112, 2011.
[9]
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron. Rodinia: A benchmark suite for heterogeneous computing. In Proc. of the IEEE Symposium on Workload Characterization, pages 44--54, 2009.
[10]
EMC Corporation. Extracting value from chaos, 2011. www.emc.com /collateral/analyst-reports/idc-extractingvalue-from-chaos-ar.pdf.
[11]
H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Architecture support for disciplined approximate programming. In 17th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 301--312, 2012.
[12]
H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. Neural acceleration for general-purpose approximate programs. In Proc. of the 45th Annual International Symposium on Microarchitecture, pages 449--460, 2012.
[13]
KHRONOS Group. OpenCL - the open standard for parallel programming of heterogeneous systems, 2013.
[14]
C. A. Martinez, J. C. S. Adrian, and M. Cortes. Dynamic tolerance region computing for multimedia. IEEE Transactions on Computers, pages 650--665, 2012.
[15]
M. McCool, J. Reinders, and A. Robison. Structured Parallel Programming: Patterns for Efficient Computation. Morgan Kaufmann, 2012.
[16]
P. D. Michailidis and K. G. Margaritis. Accelerating kerne stimation on the GPU using the CUDA framework. Journal of Applied Mathematical Science, 7(30):1447--1476, 2013.
[17]
S. Misailovic, D. M. Roy, and M. C. Rinard. Probabilistically accurate program transformations. In Proc. of the 18th Static Analysis Symposium, pages 316--333, 2011.
[18]
S. Misailovic, S. Sidiroglou, H. Hoffmann, and M. Rinard. Quality of service profiling. In Proc. of the 32nd ACM/IEEE conference on Software Engineering, pages 25--34, 2010.
[19]
NVIDIA. CUDA C Programming Guide, Oct. 2012.
[20]
NVIDIA. NVIDIA's next generation CUDA compute architecture: Kepler GK110, 2012. www.nvidia.com/content/PDF/NVIDIA Kepler GK110 Architecture Whitepaper.pdf.
[21]
K. Ohishi, H. Okamura, and T. Dohi. Gompertz software reliability model: Estimation algorithm and empirical validation. Journal of Systems and Software, pages 535--543, 2009.
[22]
L. Renganarayana, V. Srinivasan, R. Nair, and D. Prener. Programming with relaxed synchronization. In Proc. of the 2012 ACM Workshop on Relaxing Synchronization for Multicore and Manycore Scalability, pages 41--50, 2012.
[23]
M. Rinard. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks. In Proc. of the 2006 International Conference on Supercomputing, pages 324--334, 2006.
[24]
M. Rinard. Parallel synchronization-free approximate data structure construction. In Proc. of the 5th USENIX Workshop on Hot Topics in Parallelism, pages 1--8, 2013.
[25]
M. C. Rinard. Using early phase termination to eliminate load imbalances at barrier synchronization points. In Proc. of the 22nd annual ACM SIGPLAN conference on Object-Oriented Systems and applications, pages 369--386, 2007.
[26]
D. Roger, U. Assarsson, and N. Holzschuch. Efficient stream reduction on the GPU. In Proc. of the 1st Workshop on General Purpose Processing on Graphics Processing Units, pages 1--4, 2007.
[27]
M. Samadi, J. Lee, D. A. Jamshidi, A. Hormati, and S.Mahlke. SAGE: Self-tuning approximation for graphics engines. In Proc. of the 46th Annual International Symposium on Microarchitecture, pages 13--24, 2013.
[28]
A. Sampson, W. Dietl, E. Fortuna, D. Gnanapragasam, L. Ceze, and D. Grossman. EnerJ: approximate data types for safe and general low-power computation. Proc. of the '11 Conference on Programming Language Design and Implementation, 46(6):164--174, June 2011.
[29]
A. Sampson, J. Nelson, K. Strauss, and L. Ceze. Approximate storage in solid-state memories. In Proc. of the 46th Annual International Symposium on Microarchitecture, pages 25--36, 2013.
[30]
J. Sartori and R. Kumar. Branch and data herding: Reducing control and memory divergence for error-tolerant GPU applications. In IEEE Transactions on on Multimedia, pages 427--428, 2012.
[31]
H. Sheikh, M. Sabir, and A. Bovik. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Transactions on Image Processing, 15(11):3440--3451, 2006.
[32]
A. K. Sujeeth, H. Lee, K. J. Brown, H. Chafi, M. Wu, A. R. Atreya, K. Olukotun, T. Rompf, and M. Odersky. OptiML: an implicitly parallel domain specific language for machine learning. In Proc. of the 28th International Conference on Machine learning, pages 609--616, 2011.
[33]
O. Temam. A defect-tolerant accelerator for emerging highperformance applications. In Proc. of the 39th Annual International Symposium on Computer Architecture, pages 356--367, 2012.
[34]
S. Venkataramani, V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan. Quality programmable vector processors for approximate computing. In Proc. of the 46th Annual International Symposium on Microarchitecture, pages 1--12, 2013.
[35]
H. Wong, M. M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos. Demystifying GPU microarchitecture through microbenchmarking. In Proc. of the 2010 IEEE Symposium on Performance Analysis of Systems and Software, pages 235--246, 2010.
[36]
X.-L. Wu, N. Obeid, and W.-M. Hwu. Exploiting more parallelism from applications having generalized reductions on GPU architectures. In Proc. of the 2010 10th International Conference on Computers and Information Technology, pages 1175--1180, 2010.

Cited By

View all
  • (2023)Approximate High-Performance Computing: A Fast and Energy-Efficient Computing Paradigm in the Post-Moore EraIT Professional10.1109/MITP.2023.325464225:2(7-15)Online publication date: 1-Mar-2023
  • (2023)Machine Learning and Data Mining Algorithms for Geospatial Big DataRemote Sensing Big Data10.1007/978-3-031-33932-5_12(207-226)Online publication date: 23-Jul-2023
  • (2023)SMSE: A serverless platform for multimedia cloud systemsConcurrency and Computation: Practice and Experience10.1002/cpe.792236:4Online publication date: 4-Dec-2023
  • Show More Cited By

Index Terms

  1. Paraprox: pattern-based approximation for data parallel applications

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 42, Issue 1
    ASPLOS '14
    March 2014
    729 pages
    ISSN:0163-5964
    DOI:10.1145/2654822
    Issue’s Table of Contents
    • cover image ACM Conferences
      ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
      February 2014
      780 pages
      ISBN:9781450323055
      DOI:10.1145/2541940
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 February 2014
    Published in SIGARCH Volume 42, Issue 1

    Check for updates

    Author Tags

    1. accuracy-aware computing
    2. approximation
    3. data parallel
    4. gpu

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)73
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Approximate High-Performance Computing: A Fast and Energy-Efficient Computing Paradigm in the Post-Moore EraIT Professional10.1109/MITP.2023.325464225:2(7-15)Online publication date: 1-Mar-2023
    • (2023)Machine Learning and Data Mining Algorithms for Geospatial Big DataRemote Sensing Big Data10.1007/978-3-031-33932-5_12(207-226)Online publication date: 23-Jul-2023
    • (2023)SMSE: A serverless platform for multimedia cloud systemsConcurrency and Computation: Practice and Experience10.1002/cpe.792236:4Online publication date: 4-Dec-2023
    • (2022)Harnessing the Potential of Function-Reuse in Multimedia Cloud SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309791133:3(617-629)Online publication date: 1-Mar-2022
    • (2021)Energy Efficient Error Resilient Multiplier Using Low-power CompressorsACM Transactions on Design Automation of Electronic Systems10.1145/348883727:3(1-26)Online publication date: 17-Nov-2021
    • (2021)Towards Fine-Grained Online Adaptive Approximation Control for Dense SLAM on Embedded GPUsACM Transactions on Design Automation of Electronic Systems10.1145/348661227:2(1-19)Online publication date: 2-Nov-2021
    • (2019)High-Level Synthesis of Approximate Designs under Real-Time ConstraintsACM Transactions on Embedded Computing Systems10.1145/335818218:5s(1-21)Online publication date: 7-Oct-2019
    • (2018)Uncertainty Propagation in Data Processing SystemsProceedings of the ACM Symposium on Cloud Computing10.1145/3267809.3267833(95-106)Online publication date: 11-Oct-2018
    • (2018)FoggyCacheProceedings of the 24th Annual International Conference on Mobile Computing and Networking10.1145/3241539.3241557(19-34)Online publication date: 15-Oct-2018
    • (2018)Invocation-driven Neural Approximate Computing with a Multiclass-Classifier and Multiple Approximators2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)10.1145/3240765.3240819(1-8)Online publication date: 5-Nov-2018
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media