DOI: 10.1145/3123939.3123941
research-article

IDEAL: image denoising accelerator

Published: 14 October 2017

Abstract

Computational imaging pipelines (CIPs) convert the raw output of imaging sensors into the high-quality images that are used for further processing. This work studies how Block-Matching and 3D filtering (BM3D), a state-of-the-art denoising algorithm, can be implemented to meet the demands of user-interactive (UI) applications. Denoising is the most computationally demanding stage of a CIP, taking more than 95% of the time on a highly optimized software implementation [29]. We analyze the performance and energy consumption of optimized software implementations on three commodity platforms and find that their performance is inadequate.
Accordingly, we consider two alternatives: a dedicated accelerator, and running recently proposed Neural Network (NN) based approximations of BM3D [9, 27] on an NN accelerator. We develop Image DEnoising AcceLerator (IDEAL), a hardware BM3D accelerator that incorporates the following techniques: 1) a novel software-hardware optimization, Matches Reuse (MR), that exploits typical image content to reduce the computations needed by BM3D, 2) prefetching and judicious use of on-chip buffering to minimize execution stalls and off-chip bandwidth consumption, 3) a careful arrangement of specialized computing blocks, and 4) data type precision tuning. Over a dataset of images with resolutions ranging from 8 megapixels (MP) up to 42 MP, IDEAL is 11,352× and 591× faster than high-end general-purpose (CPU) and graphics processor (GPU) software implementations, respectively, with orders of magnitude better energy efficiency. Even when the NN approximations of BM3D are run on the DaDianNao [14] high-end hardware NN accelerator, IDEAL is 5.4× faster and 3.95× more energy efficient.
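For readers unfamiliar with BM3D, the block-matching step that gives the algorithm its name can be sketched as follows. This is a minimal illustration and not the IDEAL design or the authors' implementation: it compares patches by sum of squared differences in the pixel domain, and the function names, patch size, search radius, and threshold `tau` are all assumptions chosen for the example.

```python
def patch_ssd(img, ay, ax, by, bx, size):
    """Sum of squared differences between two size x size patches."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            d = img[ay + dy][ax + dx] - img[by + dy][bx + dx]
            total += d * d
    return total


def block_match(img, ry, rx, size=4, radius=2, tau=50, max_matches=8):
    """Return up to max_matches (distance, y, x) patches similar to the reference.

    The search is limited to a window of +/- radius pixels around the
    reference patch whose top-left corner is (ry, rx). All parameter values
    are illustrative assumptions, not settings from the paper.
    """
    height, width = len(img), len(img[0])
    matches = []
    for y in range(max(0, ry - radius), min(height - size, ry + radius) + 1):
        for x in range(max(0, rx - radius), min(width - size, rx + radius) + 1):
            dist = patch_ssd(img, ry, rx, y, x, size)
            if dist <= tau:
                matches.append((dist, y, x))
    matches.sort()  # closest patches first; the reference itself has distance 0
    return matches[:max_matches]


# Tiny synthetic image made of vertical stripes (columns alternate 10, 20).
image = [[10 if x % 2 == 0 else 20 for x in range(8)] for y in range(8)]
group = block_match(image, ry=2, rx=2, size=2, radius=2, tau=0)
```

In BM3D the matched patches are stacked into a 3D group and filtered jointly. Every reference patch triggers a search like the one above, which gives a sense of why the search cost grows quickly with image resolution.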

References

[1]
2010. BM3D assembly device designed on basis of ASIC. (July 28 2010). http://www.google.us/patents/CN101789043A?cl=en CN Patent App. CN 201,010,102,701.
[2]
2016. Intel 64 and IA-32 Architectures Software Developer's Manual. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf. (September 2016).
[3]
2017. Bosch's Driver assistance systems - Predictive pedestrian protection. http://products.bosch-mobility-solutions.com/en/de/_technik/component/SF_PC_DA_Predictive-Pedestrian-Protection_SF_PC_Driver-Assistance-Systems_5251.html?compld=2880. (2017).
[4]
2017. NVIDIA Visual Profiler, https://developer.nvidia.com/nvidia-visual-profiler. (2017).
[5]
2017. Photography Blog. (2017). http://www.photographyblog.com
[6]
Bernardo Manuel Aguiar Silva Teixeira Cardoso. 2015. Algorithm and Hardware Design for Image Restoration. Master's thesis. Faculty of Engineering, the University of Porto, Porto, Portugal, https://repositorio-aberto.up.pt/bitstream/10216/84329/2/35861.pdf
[7]
Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. 2005. A Non-Local Algorithm for Image Denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2 (CVPR '05). IEEE Computer Society, Washington, DC, USA, 60--65.
[8]
Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. 2011. Non-Local Means Denoising. Image Processing On Line 1 (2011), 208--212.
[9]
Harold C. Burger, Christian J. Schuler, and Stefan Harmeling. 2012. Image denoising: Can plain neural networks compete with BM3D? In 2012 IEEE Conference on Computer Vision and Pattern Recognition. 2392--2399.
[10]
Harold C. Burger, Christian J. Schuler, and Stefan Harmeling. 2013. Learning how to combine internal and external denoising methods. In Proceedings of the 35th German Conference on Pattern Recognition (GCPR 2013).
[11]
Frank Cabello, Julio León, Yuzo Iano, and Rangel Arthur. 2015. Implementation of a fixed-point 2D Gaussian Filter for Image Processing based on FPGA. In 2015 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA). 28--33.
[12]
Stuart K. Card, George G. Robertson, and Jock D. Mackinlay. 1991. The Information Visualizer, an Information Workspace. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '91). ACM, New York, NY, USA, 181--186.
[13]
S. Grace Chang, Bin Yu, and Martin Vetterli. 2000. Adaptive Wavelet Thresholding for Image Denoising and Compression. IEEE Transactions on Image Processing 9, 9 (2000), 1532--1546.
[14]
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. 609--622.
[15]
Jason Clemons, Chih C. Cheng, Iuri Frosio, Daniel Johnson, and Stephen W. Keckler. 2016. A patch memory system for image processing and computer vision. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--13.
[16]
Kostadin Dabov, Alessandro Foi, and Karen Egiazarian. 2007. Video denoising by sparse 3D transform-domain collaborative filtering. In 2007 15th European Signal Processing Conference. 145--149.
[17]
Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2006. Image denoising with block-matching and 3D filtering. In Electronic Imaging 2006. International Society for Optics and Photonics, 606414--606414.
[18]
Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing 16, 8 (Aug 2007), 2080--2095.
[19]
Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Joint image sharpening and denoising by 3D transform-domain collaborative filtering. In Proc. 2007 Int. TICSP Workshop Spectral Meth. Multirate Signal Process., SMMSP, Vol. 2007. Citeseer.
[20]
Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2008. Image restoration by sparse 3D transform-domain collaborative filtering. In Electronic Imaging 2008. International Society for Optics and Photonics, 681207--681207.
[21]
Aram Danielyan, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2008. Image and video super-resolution via spatially adaptive block-matching filtering. In Proceedings of International Workshop on Local and non-Local Approximation in Image Processing (LNLA).
[22]
Aram Danielyan, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2008. Image upsampling via spatially adaptive block-matching filtering. In 2008 16th European Signal Processing Conference. 1--5.
[23]
David Honzátko. 2015. GPU Acceleration of Advanced Image Denoising. Ph.D. Dissertation. Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic, https://is.cuni.cz/webapps/zzp/download/130165253/?lang=en
[24]
Karen Egiazarian, Jaakko Astola, Mika Helsingius, and Pauli Kuosmanen. 1999. Adaptive denoising and lossy compression of images in transform domain. Journal of Electronic Imaging 8, 3 (1999), 233--245.
[25]
Michael Elad and Michal Aharon. 2006. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Transactions on Image Processing 15, 12 (Dec 2006), 3736--3745.
[26]
Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. 2007. Pointwise Shape-Adaptive DCT for High-Quality Denoising and Deblocking of Grayscale and Color Images. IEEE Transactions on Image Processing 16, 5 (May 2007), 1395--1411.
[27]
Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. 2016. Deep Joint Demosaicking and Denoising. ACM Trans. Graph. 35, 6, Article 191 (Nov. 2016), 12 pages
[28]
Jose A. Guerrero-Colon and Javier Portilla. 2005. Two-level adaptive denoising using Gaussian scale mixtures in overcomplete oriented pyramids. In IEEE International Conference on Image Processing 2005, Vol. 1. I-105--8.
[29]
Felix Heide, Markus Steinberger, Yun-Ta Tsai, Mushfiqur Rouf, Dawid Pająk, Dikpal Reddy, Orazio Gallo, Jing Liu, Wolfgang Heidrich, Karen Egiazarian, Jan Kautz, and Kari Pulli. 2014. FlexISP: A Flexible Camera Image Processing Framework. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia 2014) 33, 6 (December 2014).
[30]
Viren Jain and Sebastian Seung. 2009. Natural Image Denoising with Convolutional Networks. In Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou (Eds.). Curran Associates, Inc., 769--776. http://papers.nips.cc/paper/3506-natural-image-denoising-with-convolutional-networks.pdf
[31]
Lynn Jenner. 2015. Hubble's High-Definition Panoramic View of the Andromeda Galaxy. https://www.nasa.gov/content/goddard/hubble-s-high-definition-panoramic-view-of-the-andromeda-galaxy. (Jan. 5 2015).
[32]
Gerald C. Kane and Alexandra Pear. 2016. The Rise of Visual Content Online, http://sloanreview.mit.edu/article/the-rise-of-visual-content-online/. (Jan. 4 2016).
[33]
Charles Kervrann and Jérôme Boulanger. 2006. Optimal Spatial Adaptation for Patch-Based Image Denoising. IEEE Transactions on Image Processing 15, 10 (Oct 2006), 2866--2878.
[34]
John E. Krist. 1992. Deconvolution of hubble space telescope images using simulated point spread functions. In Astronomical Data Analysis Software and Systems I, Vol. 25. 226.
[35]
Sheng Li, Jung H. Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 469--480.
[36]
Markku Mäkitalo and Alessandro Foi. 2011. Spatially adaptive alpha-rooting in BM3D sharpening. In Image Processing: Algorithms and Systems IX, San Francisco, California, USA, January 24--25, 2011. 787012.
[37]
Robert B. Miller. 1968. Response Time in Man-computer Conversational Transactions. In Proceedings of the December 9--11, 1968, Fall Joint Computer Conference, Part I (AFIPS '68 (Fall, part I)). ACM, New York, NY, USA, 267--277.
[38]
Mihir Mody. 2016. ADAS Front Camera: Demystifying Resolution and Frame-Rate. http://www.eetimes.com/author.asp?section_id=36&doc_id=1329109. (March 7 2016).
[39]
Junichi Nakamura. 2005. Image Sensors and Signal Processing for Digital Still Cameras. CRC Press, Inc., Boca Raton, FL, USA.
[40]
Jakob Nielsen. 2009. Powers of 10: Time Scales in User Experience, https://www.nngroup.com/articles/powers-of-10-time-scales-in-ux/. (Oct. 5 2009).
[41]
Wayne T. Padgett and David V. Anderson. 2009. Fixed-Point Signal Processing. Morgan & Claypool. https://books.google.ca/books?id=h590cd_BagMC
[42]
Matt Poremba, Sparsh Mittal, Dong Li, Jeffrey S. Vetter, and Yuan Xie. 2015. DESTINY: A tool for modeling emerging 3D NVM and eDRAM caches. In 2015 Design, Automation Test in Europe Conference Exhibition (DATE). 1543--1546.
[43]
Javier Portilla, Vasily Strela, Martin J. Wainwright, and Eero P. Simoncelli. 2003. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Transactions on Image Processing 12, 11 (Nov 2003), 1338--1351.
[44]
Rajeev Ramanath, Wesley E. Snyder, Youngjun Yoo, and Mark S. Drew. 2005. Color image processing pipeline. IEEE Signal Processing Magazine 22, 1 (Jan 2005), 34--43.
[45]
Marc'Aurelio Ranzato, Y-lan Boureau, Sumit Chopra, and Yann Lecun. 2007. A Unified Energy-Based Framework for Unsupervised Learning. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS-07), Marina Meila and Xiaotong Shen (Eds.), Vol. 2. Journal of Machine Learning Research - Proceedings Track, 371--379. http://jmlr.csail.mit.edu/proceedings/papers/v2/ranzato07a/ranzato07a.pdf
[46]
Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. 2011. DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters 10, 1 (Jan 2011), 16--19.
[47]
Sampsa Sarjanoja, Jani Boutellier, and Jari Hannuksela. 2015. BM3D image denoising using heterogeneous computing platforms. In 2015 Conference on Design and Architectures for Signal and Image Processing (DASIP). 1--8.
[48]
Yun-Ta Tsai, Markus Steinberger, Dawid Pająk, and Kari Pulli. 2014. Fast ANN for High-quality Collaborative Filtering. In Proceedings of High Performance Graphics (HPG '14). Eurographics Association, Aire-la-Ville, Switzerland, 61--70. http://dl.acm.org/citation.cfm?id=2980009.2980016
[49]
Gerd Waloszek and Ulrich Kreichgauer. 2009. User-Centered Evaluation of the Responsiveness of Applications. In Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I (INTERACT '09). Springer-Verlag, Berlin, Heidelberg, 239--242.
[50]
Vincent M. Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore. 2012. Measuring Energy and Power with PAPI. In 2012 41st International Conference on Parallel Processing Workshops. 262--268.
[51]
Paul Worthington. 2014. One Trillion Photos in 2015. http://mylio.com/true-stories/tech-today/one-trillion-photos-in-2015-2. (Dec. 11 2014).
[52]
Ahmad Yasin. 2014. A Top-Down method for performance analysis and counters architecture. In 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 35--44.
[53]
Reza Yazdani, Albert Segura, Jose-Maria Arnau, and Antonio Gonzalez. 2016. An ultra low-power hardware accelerator for automatic speech recognition. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--12.
[54]
Hao Zhang, Wenjiang Liu, Ruolin Wang, Tao Liu, and Mengtian Rong. 2016. Hardware architecture design of block-matching and 3D-filtering denoising algorithm. Journal of Shanghai Jiaotong University (Science) 21, 2 (2016), 173--183.
[55]
S. Zhang and E. Salari. 2005. Image denoising using a neural network based nonlinear filter in wavelet domain. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), Vol. 2. ii/989--ii/992.


Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
October 2017
850 pages
ISBN: 9781450349529
DOI: 10.1145/3123939
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerator
  2. computational imaging
  3. image denoising
  4. neural networks

Qualifiers

  • Research-article

Funding Sources

  • NSERC

Conference

MICRO-50

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 20
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 22 Nov 2024

Cited By

  • (2023) ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators. Proceedings of the 50th Annual International Symposium on Computer Architecture. 10.1145/3579371.3589076 (1-13). Online publication date: 17-Jun-2023
  • (2023) Memristor-Based Light-Weight Transformer Circuit Implementation for Speech Recognizing. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 13:1. 10.1109/JETCAS.2023.3237582 (344-356). Online publication date: Mar-2023
  • (2022) Analog Image Denoising with an Adaptive Memristive Crossbar Network. 2022 IEEE International Symposium on Circuits and Systems (ISCAS). 10.1109/ISCAS48785.2022.9937269 (3453-3457). Online publication date: 28-May-2022
  • (2020) Efficient Nearest-Neighbor Data Sharing in GPUs. ACM Transactions on Architecture and Code Optimization 18:1. 10.1145/3429981 (1-26). Online publication date: 30-Dec-2020
  • (2020) Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 10.1109/MICRO50266.2020.00087 (1037-1050). Online publication date: Oct-2020
  • (2019) eCNN. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 10.1145/3352460.3358263 (182-195). Online publication date: 12-Oct-2019
  • (2019) CoNDA. Proceedings of the 46th International Symposium on Computer Architecture. 10.1145/3307650.3322266 (629-642). Online publication date: 22-Jun-2019
  • (2018) Diffy. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture. 10.1109/MICRO.2018.00020 (134-147). Online publication date: 20-Oct-2018
  • (2018) Neda: Supporting direct inter-core neighbor data exchange in GPUs. IEEE Computer Architecture Letters. 10.1109/LCA.2018.2873679 (1-1). Online publication date: 2018
  • (2018) Euphrates. Proceedings of the 45th Annual International Symposium on Computer Architecture. 10.1109/ISCA.2018.00052 (547-560). Online publication date: 2-Jun-2018
