IDEAL: image denoising accelerator

Authors: Mostafa Mahmoud, Alberto Delmás Lascorz, Jonathan Assouline, Emmanuel Onzon, Andreas Moshovos

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 82 - 95

https://doi.org/10.1145/3123939.3123941

Published: 14 October 2017

Abstract

Computational imaging pipelines (CIPs) convert the raw output of imaging sensors into the high-quality images that are used for further processing. This work studies how Block-Matching and 3D filtering (BM3D), a state-of-the-art denoising algorithm, can be implemented to meet the demands of user-interactive (UI) applications. Denoising is the most computationally demanding stage of a CIP, taking more than 95% of the time in a highly optimized software implementation [29]. We analyze the performance and energy consumption of optimized software implementations on three commodity platforms and find that their performance is inadequate.

Accordingly, we consider two alternatives: a dedicated accelerator, and running recently proposed Neural Network (NN) based approximations of BM3D [9, 27] on an NN accelerator. We develop Image DEnoising AcceLerator (IDEAL), a hardware BM3D accelerator which incorporates the following techniques: 1) a novel software-hardware optimization, Matches Reuse (MR), that exploits typical image content to reduce the computations needed by BM3D, 2) prefetching and judicious use of on-chip buffering to minimize execution stalls and off-chip bandwidth consumption, 3) a careful arrangement of specialized computing blocks, and 4) data type precision tuning. Over a dataset of images with resolutions ranging from 8 megapixels (MP) up to 42 MP, IDEAL is 11,352× and 591× faster than high-end general-purpose (CPU) and graphics processor (GPU) software implementations, respectively, with orders of magnitude better energy efficiency. Even when the NN approximations of BM3D are run on the DaDianNao [14] high-end hardware NN accelerator, IDEAL is 5.4× faster and 3.95× more energy efficient.
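To make the block-matching step concrete, the following is a minimal, simplified sketch in plain Python of the patch search that gives BM3D its name: for a reference patch, it scans a search window and keeps the most similar patches, ranked here by sum of squared differences in the pixel domain. The second function illustrates one way matches might be reused for a neighboring reference patch, by shifting the previous patch's matches. That part is only an assumption about the general idea behind Matches Reuse, since the abstract does not describe the mechanism. All function names, parameters, and the toy image are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Pos = Tuple[int, int]  # (row, column) of a patch's top-left corner


def patch_distance(img: Image, a: Pos, b: Pos, size: int) -> float:
    """Sum of squared differences between two size x size patches."""
    (ay, ax), (by, bx) = a, b
    return sum(
        (img[ay + dy][ax + dx] - img[by + dy][bx + dx]) ** 2
        for dy in range(size)
        for dx in range(size)
    )


def block_match(img: Image, ref: Pos, size: int, radius: int,
                max_matches: int) -> List[Pos]:
    """Full search: the max_matches patches closest to ref within the window."""
    h, w = len(img), len(img[0])
    ry, rx = ref
    candidates = []
    for y in range(max(0, ry - radius), min(h - size, ry + radius) + 1):
        for x in range(max(0, rx - radius), min(w - size, rx + radius) + 1):
            candidates.append((patch_distance(img, ref, (y, x), size), (y, x)))
    candidates.sort()  # smallest distance first; ties broken by position
    return [pos for _, pos in candidates[:max_matches]]


def reuse_matches(img: Image, prev_matches: List[Pos], shift: Pos,
                  size: int) -> List[Pos]:
    """Hypothetical reuse step (an assumption, not the paper's definition):
    shift the previous reference patch's matches by the same offset as the
    new reference patch and keep those that still lie inside the image."""
    h, w = len(img), len(img[0])
    dy, dx = shift
    shifted = [(y + dy, x + dx) for y, x in prev_matches]
    return [(y, x) for y, x in shifted
            if 0 <= y <= h - size and 0 <= x <= w - size]


# Toy 8x8 image whose columns repeat with period 4 (highly self-similar).
toy = [[x % 4 for x in range(8)] for _ in range(8)]
matches = block_match(toy, ref=(0, 0), size=2, radius=4, max_matches=4)
reused = reuse_matches(toy, matches, shift=(0, 1), size=2)
```

On self-similar content such as the toy image, the shifted matches remain exact matches for the neighboring reference patch, so the full window search could be skipped for that patch. Real images would need a quality check before accepting reused matches.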

References

[1]

2010. BM3D assembly device designed on basis of ASIC. (July 28 2010). http://www.google.us/patents/CN101789043A?cl=en CN Patent App. CN 201,010,102,701.

[2]

2016. Intel 64 and IA-32 Architectures Software Developer's Manual. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf. (September 2016).

[3]

2017. Bosch's Driver assistance systems - Predictive pedestrian protection. http://products.bosch-mobility-solutions.com/en/de/_technik/component/SF_PC_DA_Predictive-Pedestrian-Protection_SF_PC_Driver-Assistance-Systems_5251.html?compld=2880. (2017).

[4]

2017. NVIDIA Visual Profiler. https://developer.nvidia.com/nvidia-visual-profiler. (2017).

[5]

2017. Photography Blog. (2017). http://www.photographyblog.com

[6]

Bernardo Manuel Aguiar Silva Teixeira Cardoso. 2015. Algorithm and Hardware Design for Image Restoration. Master's thesis. Faculty of Engineering, the University of Porto, Porto, Portugal. https://repositorio-aberto.up.pt/bitstream/10216/84329/2/35861.pdf

[7]

Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. 2005. A Non-Local Algorithm for Image Denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2 (CVPR '05). IEEE Computer Society, Washington, DC, USA, 60--65.

[8]

Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. 2011. Non-Local Means Denoising. Image Processing On Line 1 (2011), 208--212.

[9]

Harold C. Burger, Christian J. Schuler, and Stefan Harmeling. 2012. Image denoising: Can plain neural networks compete with BM3D?. In 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2392--2399.

[10]

Harold C. Burger, Christian J. Schuler, and Stefan Harmeling. 2013. Learning how to combine internal and external denoising methods. In Proceedings of the 35th German Conference on Pattern Recognition (GCPR 2013).

[11]

Frank Cabello, Julio León, Yuzo Iano, and Rangel Arthur. 2015. Implementation of a fixed-point 2D Gaussian Filter for Image Processing based on FPGA. In 2015 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA). 28--33.

[12]

Stuart K. Card, George G. Robertson, and Jock D. Mackinlay. 1991. The Information Visualizer, an Information Workspace. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '91). ACM, New York, NY, USA, 181--186.

[13]

S. Grace Chang, Bin Yu, and Martin Vetterli. 2000. Adaptive Wavelet Thresholding for Image Denoising and Compression. IEEE Transactions on Image Processing 9, 9 (2000), 1532--1546.

[14]

Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. 609--622.

[15]

Jason Clemons, Chih C. Cheng, Iuri Frosio, Daniel Johnson, and Stephen W. Keckler. 2016. A patch memory system for image processing and computer vision. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--13.

[16]

Kostadin Dabov, Alessandro Foi, and Karen Egiazarian. 2007. Video denoising by sparse 3D transform-domain collaborative filtering. In 2007 15th European Signal Processing Conference. 145--149.

[17]

Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2006. Image denoising with block-matching and 3D filtering. In Electronic Imaging 2006. International Society for Optics and Photonics, 606414--606414.

[18]

Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing 16, 8 (Aug 2007), 2080--2095.

[19]

Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Joint image sharpening and denoising by 3D transform-domain collaborative filtering. In Proc. 2007 Int. TICSP Workshop Spectral Meth. Multirate Signal Process., SMMSP, Vol. 2007. Citeseer.

[20]

Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2008. Image restoration by sparse 3D transform-domain collaborative filtering. In Electronic Imaging 2008. International Society for Optics and Photonics, 681207--681207.

[21]

Aram Danielyan, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2008. Image and video super-resolution via spatially adaptive block-matching filtering. In Proceedings of International Workshop on Local and non-Local Approximation in Image Processing (LNLA).

[22]

Aram Danielyan, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2008. Image upsampling via spatially adaptive block-matching filtering. In 2008 16th European Signal Processing Conference. 1--5.

[23]

David Honzátko. 2015. GPU Acceleration of Advanced Image Denoising. Ph.D. Dissertation. Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic. https://is.cuni.cz/webapps/zzp/download/130165253/?lang=en

[24]

Karen Egiazarian, Jaakko Astola, Mika Helsingius, and Pauli Kuosmanen. 1999. Adaptive denoising and lossy compression of images in transform domain. Journal of Electronic Imaging 8, 3 (1999), 233--245.

[25]

Michael Elad and Michal Aharon. 2006. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Transactions on Image Processing 15, 12 (Dec 2006), 3736--3745.

[26]

Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Pointwise Shape-Adaptive DCT for High-Quality Denoising and Deblocking of Grayscale and Color Images. IEEE Transactions on Image Processing 16, 5 (May 2007), 1395--1411.

[27]

Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. 2016. Deep Joint Demosaicking and Denoising. ACM Trans. Graph. 35, 6, Article 191 (Nov. 2016), 12 pages.

[28]

Jose A. Guerrero-Colon and Javier Portilla. 2005. Two-level adaptive denoising using Gaussian scale mixtures in overcomplete oriented pyramids. In IEEE International Conference on Image Processing 2005, Vol. 1. I-105--8.

[29]

Felix Heide, Markus Steinberger, Yun-Ta Tsai, Mushfiqur Rouf, Dawid Pająk, Dikpal Reddy, Orazio Gallo, Jing Liu, Wolfgang Heidrich, Karen Egiazarian, Jan Kautz, and Kari Pulli. 2014. FlexISP: A Flexible Camera Image Processing Framework. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia 2014) 33, 6 (December 2014).

[30]

Viren Jain and Sebastian Seung. 2009. Natural Image Denoising with Convolutional Networks. In Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou (Eds.). Curran Associates, Inc., 769--776. http://papers.nips.cc/paper/3506-natural-image-denoising-with-convolutional-networks.pdf

[31]

Lynn Jenner. 2015. Hubble's High-Definition Panoramic View of the Andromeda Galaxy. https://www.nasa.gov/content/goddard/hubble-s-high-definition-panoramic-view-of-the-andromeda-galaxy. (Jan. 5 2015).

[32]

Gerald C. Kane and Alexandra Pear. 2016. The Rise of Visual Content Online. http://sloanreview.mit.edu/article/the-rise-of-visual-content-online/. (Jan. 4 2016).

[33]

Charles Kervrann and Jérôme Boulanger. 2006. Optimal Spatial Adaptation for Patch-Based Image Denoising. IEEE Transactions on Image Processing 15, 10 (Oct 2006), 2866--2878.

[34]

John E. Krist. 1992. Deconvolution of hubble space telescope images using simulated point spread functions. In Astronomical Data Analysis Software and Systems I, Vol. 25. 226.

[35]

Sheng Li, Jung H. Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 469--480.

[36]

Markku Mäkitalo and Alessandro Foi. 2011. Spatially adaptive alpha-rooting in BM3D sharpening. In Image Processing: Algorithms and Systems IX, San Francisco, California, USA, January 24--25, 2011. 787012.

[37]

Robert B. Miller. 1968. Response Time in Man-computer Conversational Transactions. In Proceedings of the December 9--11, 1968, Fall Joint Computer Conference, Part I (AFIPS '68 (Fall, part I)). ACM, New York, NY, USA, 267--277.

[38]

Mihir Mody. 2016. ADAS Front Camera: Demystifying Resolution and Frame-Rate. http://www.eetimes.com/author.asp?section_id=36&doc_id=1329109. (March 7 2016).

[39]

Junichi Nakamura. 2005. Image Sensors and Signal Processing for Digital Still Cameras. CRC Press, Inc., Boca Raton, FL, USA.

[40]

Jakob Nielsen. 2009. Powers of 10: Time Scales in User Experience. https://www.nngroup.com/articles/powers-of-10-time-scales-in-ux/. (Oct. 5 2009).

[41]

Wayne T. Padgett and David V. Anderson. 2009. Fixed-Point Signal Processing. Morgan & Claypool. https://books.google.ca/books?id=h590cd_BagMC

[42]

Matt Poremba, Sparsh Mittal, Dong Li, Jeffrey S. Vetter, and Yuan Xie. 2015. DESTINY: A tool for modeling emerging 3D NVM and eDRAM caches. In 2015 Design, Automation Test in Europe Conference Exhibition (DATE). 1543--1546.

[43]

Javier Portilla, Vasily Strela, Martin J. Wainwright, and Eero P. Simoncelli. 2003. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing 12, 11 (Nov 2003), 1338--1351.

[44]

Rajeev Ramanath, Wesley E. Snyder, Youngjun Yoo, and Mark S. Drew. 2005. Color image processing pipeline. IEEE Signal Processing Magazine 22, 1 (Jan 2005), 34--43.

[45]

Marc'Aurelio Ranzato, Y-lan Boureau, Sumit Chopra, and Yann Lecun. 2007. A Unified Energy-Based Framework for Unsupervised Learning. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS-07), Marina Meila and Xiaotong Shen (Eds.), Vol. 2. Journal of Machine Learning Research - Proceedings Track, 371--379. http://jmlr.csail.mit.edu/proceedings/papers/v2/ranzato07a/ranzato07a.pdf

[46]

Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. 2011. DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters 10, 1 (Jan 2011), 16--19.

[47]

Sampsa Sarjanoja, Jani Boutellier, and Jari Hannuksela. 2015. BM3D image denoising using heterogeneous computing platforms. In 2015 Conference on Design and Architectures for Signal and Image Processing (DASIP). 1--8.

[48]

Yun-Ta Tsai, Markus Steinberger, Dawid Pająk, and Kari Pulli. 2014. Fast ANN for High-quality Collaborative Filtering. In Proceedings of High Performance Graphics (HPG '14). Eurographics Association, Aire-la-Ville, Switzerland, 61--70. http://dl.acm.org/citation.cfm?id=2980009.2980016

[49]

Gerd Waloszek and Ulrich Kreichgauer. 2009. User-Centered Evaluation of the Responsiveness of Applications. In Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I (INTERACT '09). Springer-Verlag, Berlin, Heidelberg, 239--242.

[50]

Vincent M. Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore. 2012. Measuring Energy and Power with PAPI. In 2012 41st International Conference on Parallel Processing Workshops. 262--268.

[51]

Paul Worthington. 2014. One Trillion Photos in 2015. http://mylio.com/true-stories/tech-today/one-trillion-photos-in-2015-2. (Dec. 11 2014).

[52]

Ahmad Yasin. 2014. A Top-Down method for performance analysis and counters architecture. In 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 35--44.

[53]

Reza Yazdani, Albert Segura, Jose-Maria Arnau, and Antonio Gonzalez. 2016. An ultra low-power hardware accelerator for automatic speech recognition. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--12.

[54]

Hao Zhang, Wenjiang Liu, Ruolin Wang, Tao Liu, and Mengtian Rong. 2016. Hardware architecture design of block-matching and 3D-filtering denoising algorithm. Journal of Shanghai Jiaotong University (Science) 21, 2 (2016), 173--183.

[55]

S. Zhang and E. Salari. 2005. Image denoising using a neural network based nonlinear filter in wavelet domain. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), Vol. 2. ii/989--ii/992.

Index Terms

IDEAL: image denoising accelerator
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
      2. Special purpose systems
    2. Parallel architectures
      1. Single instruction, multiple data
  2. Real-time systems
    1. Real-time system architecture

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 2017, 850 pages

ISBN: 9781450349529

DOI: 10.1145/3123939

General Chairs: Hillery Hunter (IBM Research), Jaime Moreno (IBM Research)

Program Chairs: Joel Emer (NVIDIA and MIT), Daniel Sanchez (MIT)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSERC

Conference

MICRO-50

Sponsor:

SIGMICRO
IEEE-CS\DATC

MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 14 - 18, 2017

Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

Total Citations: 11
Total Downloads: 685
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 3

Reflects downloads up to 22 Nov 2024

Cited By

Ujjainkar N, Leng J, Zhu Y, Solihin Y, Heinrich M. (2023). ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators. Proceedings of the 50th Annual International Symposium on Computer Architecture, 1-13. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3579371.3589076
Xiao H, Zhou Y, Gao T, Duan S, Chen G, Hu X. (2023). Memristor-Based Light-Weight Transformer Circuit Implementation for Speech Recognizing. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 13:1, 344-356. Online publication date: Mar-2023.
https://doi.org/10.1109/JETCAS.2023.3237582
Krestinskaya O, Salama K, James A. (2022). Analog Image Denoising with an Adaptive Memristive Crossbar Network. 2022 IEEE International Symposium on Circuits and Systems (ISCAS), 3453-3457. Online publication date: 28-May-2022.
https://doi.org/10.1109/ISCAS48785.2022.9937269
Nematollahi N, Sadrosadati M, Falahati H, Barkhordar M, Drumond M, Sarbazi-Azad H, Falsafi B. (2020). Efficient Nearest-Neighbor Data Sharing in GPUs. ACM Transactions on Architecture and Code Optimization 18:1, 1-26. Online publication date: 30-Dec-2020.
https://dl.acm.org/doi/10.1145/3429981
Feng Y, Tian B, Xu T, Whatmough P, Zhu Y. (2020). Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 1037-1050. Online publication date: Oct-2020.
https://doi.org/10.1109/MICRO50266.2020.00087
Huang C, Ding Y, Wang H, Weng C, Lin K, Wang L, Chen L. (2019). eCNN. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 182-195. Online publication date: 12-Oct-2019.
https://dl.acm.org/doi/10.1145/3352460.3358263
Boroumand A, Ghose S, Patel M, Hassan H, Lucia B, Ausavarungnirun R, Hsieh K, Hajinazar N, Malladi K, Zheng H, Mutlu O, Manne S, Hunter H, Altman E. (2019). CoNDA. Proceedings of the 46th International Symposium on Computer Architecture, 629-642. Online publication date: 22-Jun-2019.
https://dl.acm.org/doi/10.1145/3307650.3322266
Mahmoud M, Siu K, Moshovos A, Oskin M, Inoue K. (2018). Diffy. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 134-147. Online publication date: 20-Oct-2018.
https://dl.acm.org/doi/10.1109/MICRO.2018.00020
Nematollahi N, Sadrosadati M, Falahati H, Barkhordar M, Sarbazi-Azad H. (2018). Neda: Supporting direct inter-core neighbor data exchange in GPUs. IEEE Computer Architecture Letters, 1-1. Online publication date: 2018.
https://doi.org/10.1109/LCA.2018.2873679
Zhu Y, Samajdar A, Mattina M, Whatmough P. (2018). Euphrates. Proceedings of the 45th Annual International Symposium on Computer Architecture, 547-560. Online publication date: 2-Jun-2018.
https://dl.acm.org/doi/10.1109/ISCA.2018.00052
