10.5555/3195638.3195688 · ACM Conferences · MICRO Conference Proceedings
research-article

Concise loads and stores: the case for an asymmetric compute-memory architecture for approximation

Published: 15 October 2016

Abstract

Cache capacity and memory bandwidth play critical roles in application performance, particularly for data-intensive applications from domains that include machine learning, numerical analysis, and data mining. Many of these applications also tolerate imprecise inputs and have loose constraints on output quality, making them ideal candidates for approximate computing. This paper introduces a novel approximate computing technique that decouples the format of data in the memory hierarchy from the format of data in the compute subsystem, significantly reducing the cost of storing and moving bits throughout the memory hierarchy and improving application performance. This asymmetric compute-memory extension to conventional architectures, ACME, adds two new instruction classes to the ISA (load-concise and store-concise), along with three small functional units in the microarchitecture to support these instructions. ACME does not affect exact execution of applications and comes into play only when concise memory operations are used. Through detailed experimentation, we find that ACME is very effective at trading result accuracy for improved application performance. Our results show that ACME achieves a 1.3X speedup (up to 1.8X) while maintaining 99% accuracy, or a 1.1X speedup while maintaining 99.999% accuracy. Moreover, our approach incurs negligible area and power overheads, adding just 0.005% area and 0.1% power to a conventional modern architecture.
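
The abstract does not say how concise data are encoded, so the sketch below only illustrates the asymmetric idea; it is not ACME's design. It assumes a hypothetical concise format that keeps the upper 16 bits of each IEEE-754 float32 value (sign, 8-bit exponent, top 7 mantissa bits). The functions `store_concise` and `load_concise` are illustrative Python stand-ins for the store-concise and load-concise instruction classes, and they ignore the functional units, cache behavior, and ISA encoding entirely. What the sketch shows is the trade the abstract describes: each value occupies half as many bytes in the memory hierarchy, while the compute side still operates on ordinary 32-bit floats with a small, bounded loss of precision.

```python
import struct
from typing import List

# ASSUMPTION: a hypothetical "concise" format that keeps only the upper 16 bits
# of each IEEE-754 binary32 value (1 sign bit, 8 exponent bits, the 7 most
# significant mantissa bits). The abstract does not specify ACME's encoding.


def store_concise(values: List[float]) -> bytes:
    """Narrow compute-format floats to 16-bit concise words for memory."""
    out = bytearray()
    for v in values:
        # Reinterpret the float32 encoding of v as an unsigned 32-bit integer.
        (bits,) = struct.unpack("<I", struct.pack("<f", v))
        # Truncate: drop the 16 low-order mantissa bits.
        out += struct.pack("<H", bits >> 16)
    return bytes(out)


def load_concise(buf: bytes) -> List[float]:
    """Widen 16-bit concise words back to float32 values for the compute units."""
    values = []
    for (word,) in struct.iter_unpack("<H", buf):
        # Restore the word to the high half; the dropped mantissa bits become 0.
        (v,) = struct.unpack("<f", struct.pack("<I", word << 16))
        values.append(v)
    return values


if __name__ == "__main__":
    data = [3.14159, -0.001234, 1000.5, 2.0]
    memory_image = store_concise(data)  # 2 bytes per value instead of 4
    approx = load_concise(memory_image)
    for exact, concise in zip(data, approx):
        print(f"{exact:>12.6f} -> {concise:>12.6f}")
```

Under this assumed format, truncation keeps the relative error of any normal float32 value below 2^-7 (about 0.8%) and never increases its magnitude; for example, 1000.5 reads back as 1000.0. Rounding to nearest instead of truncating would roughly halve the worst-case error; which choice, if either, ACME makes is not stated in the abstract.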




Information

Published In

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture
October 2016
816 pages


Publisher

IEEE Press


Conference

MICRO-49

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 28 Sep 2024.

Cited By

  • (2022) L2C: Combining Lossy and Lossless Compression on Memory and I/O. ACM Transactions on Embedded Computing Systems 21:1, pages 1-27. DOI: 10.1145/3481641. Online publication date: 14-Jan-2022.
  • (2019) AVR. Proceedings of the 48th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3337821.3337824. Online publication date: 5-Aug-2019.
  • (2018) Architectural support for convolutional neural networks on modern CPUs. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pages 1-13. DOI: 10.1145/3243176.3243177. Online publication date: 1-Nov-2018.
  • (2018) Sculptor. Proceedings of the 2018 International Conference on Supercomputing, pages 341-351. DOI: 10.1145/3205289.3205317. Online publication date: 12-Jun-2018.
  • (2018) The EH model. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pages 600-612. DOI: 10.1109/MICRO.2018.00055. Online publication date: 20-Oct-2018.
  • (2018) Gist. Proceedings of the 45th Annual International Symposium on Computer Architecture, pages 776-789. DOI: 10.1109/ISCA.2018.00070. Online publication date: 2-Jun-2018.
