The Impact of 3D Stacking and Technology Scaling on the Power and Area of Stereo Matching Processors
<p>Figure 1. (<b>a</b>) Left image (R<sub>win</sub>: reference window); (<b>b</b>) right image (d<sub>x</sub>: disparity range, C<sub>win</sub>: candidate window); (<b>c</b>) dissimilarity between R<sub>win</sub> and C<sub>win</sub>; and (<b>d</b>) a depth map.</p>
<p>Figure 2. Flow diagram of the stereo matching processor.</p>
<p>Figure 3. Illustration of the multiple-read, single-write operation of the stereo matching algorithm.</p>
<p>Figure 4. Pipelined hardware architecture of our stereo matching processor.</p>
<p>Figure 5. Via-first bonding technology used in this paper: (<b>a</b>) Side view of via-first TSVs; and (<b>b</b>) top-down view of TSVs.</p>
<p>Figure 6. 2D and 3D IC design flow.</p>
<p>Figure 7. (<b>a</b>) The conventional macro-level partitioning method; and (<b>b</b>) the proposed pipeline-level partitioning method.</p>
<p>Figure 8. An illustration of the proposed pipeline-level partitioning method: (<b>a</b>) Split the pipeline stages into two tiers, and (<b>b</b>) adjust the number of SRAMs in each tier.</p>
<p>Figure 9. Overall flow of the power and timing analyses for a 3D IC.</p>
<p>Figure 10. Comparisons between the normalized designs of 2D and 3D ICs: (<b>a</b>) 2D and 3D ICs designed in 130-nm process technology; and (<b>b</b>) 2D and 3D ICs designed in 45-nm process technology.</p>
<p>Figure 11. Layout snapshots of 2D and 3D ICs designed in 130-nm process technology: (<b>a</b>) 2D IC (2D-130); (<b>b</b>) the top and bottom tiers of a 3D IC using macro-level partitioning (3D-MP-130); and (<b>c</b>) the top and bottom tiers of a 3D IC using pipeline-level partitioning (3D-PP-130).</p>
<p>Figure 12. Layout snapshots of 2D and 3D ICs designed in 45-nm process technology: (<b>a</b>) 2D IC (2D-45); (<b>b</b>) the top and bottom tiers of a 3D IC using macro-level partitioning (3D-MP-45); and (<b>c</b>) the top and bottom tiers of a 3D IC using pipeline-level partitioning (3D-PP-45).</p>
<p>Figure 13. Normalized power comparisons of 2D and 3D ICs: (<b>a</b>) 130-nm process technology and (<b>b</b>) 45-nm process technology.</p>
<p>Figure 14. Normalized power comparisons of 2D and 3D ICs: (<b>a</b>) 130-nm process technology and (<b>b</b>) 45-nm process technology.</p>
<p>Figure 15. Comparisons of the normalized power of 2D and 3D ICs as a function of switching activity: (<b>a</b>) Total power; (<b>b</b>) net switching power; (<b>c</b>) cell internal power; (<b>d</b>) cell leakage power. Note that the power consumption of 2D-130 actually increases as the switching activity increases.</p>
<p>Figure 16. Comparisons of the normalized power of 2D and 3D ICs as a function of switching activity: (<b>a</b>) Total power; (<b>b</b>) net switching power; (<b>c</b>) cell internal power; (<b>d</b>) cell leakage power. Note that the power consumption of 2D-45 actually increases as the switching activity increases.</p>
Abstract
1. Introduction
2. Stereo Matching Processor
2.1. Matching Algorithm
2.2. Hardware Architecture
3. Design and Analysis Flow
3.1. Design Environments
3.2. Design Flow
3.2.1. Macro-Level Partitioning (MP) Method
3.2.2. Pipeline-Level Partitioning (PP) Method
3.2.3. Comparison of the Partitioning Methods
3.3. Timing and Power Analysis Flow
4. Experimental Results
4.1. Overall Layout Comparisons
4.2. Detailed Power Analysis
4.3. Impact of Switching Activity
4.4. Comparisons of the Results with the Related Studies
5. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Szeliski, R. Computer Vision: Algorithms and Applications, 1st ed.; Springer: New York, NY, USA, 2010; pp. 533–576.
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42.
- Van der Mark, W.; Gavrila, D.M. Real-time dense stereo for intelligent vehicles. IEEE Trans. Intell. Transp. Syst. 2006, 7, 38–50.
- DeSouza, G.N.; Kak, A.C. Vision for mobile robot navigation: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 237–267.
- Howard, A. Real-time stereo visual odometry for autonomous ground vehicles. In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), Nice, France, 22–26 September 2008; pp. 3946–3952.
- Zhou, L.; Sun, T.; Zhan, Y.; Wang, J. Software and hardware implementations of stereo matching. Int. J. Signal Proc. Image Proc. Pattern. Recogn. 2014, 7, 37–56.
- Ouyang, J.; Sun, G.; Chen, Y.; Duan, L.; Zhang, T.; Xie, Y.; Irwin, M.J. Arithmetic unit design using 180 nm TSV-based 3D stacking technology. In Proceedings of the IEEE International Conference on 3D System Integration, San Francisco, CA, USA, 28–30 September 2009; pp. 1–4.
- Thorolfsson, T.; Gonsalves, K.; Franzon, P. Design automation for a 3DIC FFT processor for synthetic aperture radar: A case study. In Proceedings of the Design Automation Conference (DAC), San Francisco, CA, USA, 26–31 July 2009; pp. 51–56.
- Neela, G.; Draper, J. Challenges in 3DIC implementation of a design using current CAD tools. In Proceedings of the 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS), Boise, ID, USA, 5–8 August 2012; pp. 478–481.
- Kim, D.H.; Athikulwongse, K.; Healy, M.; Hossain, M.; Jung, M.; Khorosh, I.; Kumar, G.; Lee, Y.-J.; Lewis, D.; Lin, T.-W.; et al. Design and analysis of 3D-MAPS (3D Massively Parallel Processor with Stacked Memory). IEEE Trans. Comput. 2015, 64, 112–125.
- Zhang, T.; Wang, K.; Feng, Y.; Chen, Y.; Li, Q.; Shao, B.; Xie, J.; Song, X.; Duan, L.; Xie, Y.; et al. A 3D SoC design for H.264 application with on-chip DRAM stacking. In Proceedings of the 3D Systems Integration Conference (3DIC), Munich, Germany, 16–18 November 2010; pp. 1–6.
- Saito, H.; Nakajima, M.; Okamoto, T.; Yamada, Y.; Ohuchi, A.; Iguchi, N.; Sakamoto, T.; Yamaguchi, K.; Mizuno, M. A chip-stacked memory for on-chip SRAM-rich SoCs and processors. IEEE J. Solid State Circ. 2010, 45, 15–22.
- Franzon, P.D.; Davis, W.R.; Thorolfsson, T.; Melamed, S. 3D specific systems: Design and CAD. In Proceedings of the Asian Test Symposium, New Delhi, India, 20–23 November 2011; pp. 470–473.
- Oh, E.C.; Franzon, P.D. Design considerations and benefits of three-dimensional ternary content addressable memory. In Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, USA, 16–19 September 2007; pp. 591–594.
- Ok, S.-H.; Bae, K.-R.; Lim, S.K.; Moon, B. Design and analysis of 3D IC-based low power stereo matching processors. In Proceedings of the 2013 IEEE International Symposium on Low Power Electronics and Design (ISLPED), Beijing, China, 4–6 September 2013; pp. 15–20.
- Zhang, X.; Chen, Z. SAD-based stereo vision machine on a system-on-programmable-chip (SoPC). Sensors 2013, 13, 3014–3027.
- Perri, S.; Colonna, D.; Zicari, P.; Corsonello, P. SAD-based stereo matching circuit for FPGAs. In Proceedings of the 13th IEEE International Conference on Electronics, Circuits and Systems, Nice, France, 10–13 December 2006; pp. 846–849.
- Zabih, R.; Woodfill, J.W. Non-parametric local transforms for computing visual correspondence. In Proceedings of the Third European Conference on Computer Vision, Stockholm, Sweden, 2–6 May 1994; pp. 151–158.
- Xiaoyan, H.; Mordohai, P. A quantitative evaluation of confidence measures for stereo vision. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2121–2133.
- Banks, J.; Bennamoun, M.; Corke, P. Non-parametric techniques for fast and robust stereo matching. In Proceedings of the Speech and Image Technologies for Computing and Telecommunications (TENCON ‘97), Brisbane, Australia, 2–4 December 1997; pp. 365–368.
- Hirschmuller, H.; Scharstein, D. Evaluation of cost functions for stereo matching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8.
- Bae, K.R.; Son, H.S.; Hyun, J.; Moon, B. A census-based stereo matching algorithm with multiple sparse windows. In Proceedings of the 2015 Seventh International Conference on Ubiquitous and Future Networks, Sapporo, Japan, 7–10 July 2015; pp. 240–245.
Authors | 3D IC Design | Process Technology | Key Features |
---|---|---|---|
Ouyang et al. [7] | Arithmetic units with three logic tiers | Massachusetts Institute of Technology (MIT) Lincoln Lab’s 180 nm | 11.0% to 46.1% reduction in power |
Thorolfsson et al. [8] | FFT processor with two logic tiers and one static random access memory (SRAM) tier | MIT Lincoln Lab’s 180 nm | 56.9% reduction in wire length, 4.4% reduction in logic power |
Neela et al. [9] | Single precision floating-point unit with two logic tiers | GlobalFoundries 130 nm | 41.5% reduction in footprint, 3% increase in frequency |
Kim et al. [10] | 64 processors with one logic tier and one SRAM tier | GlobalFoundries 130 nm | 63.8 GB/s memory bandwidth, power consumption up to 4.0 W |
Zhang et al. [11] | System-on-Chip with two logic tiers and three DRAM tiers | GlobalFoundries 130 nm | 12.57 mW power consumption, 8.5 GB/s bandwidth |
Saito et al. [12] | Dynamic-reconfigurable memory with one logic tier and one SRAM tier | 90-nm process technology | 63% reduction in area and 43% reduction in latency |
Franzon et al. [13] | Digital signal processor with two logic tiers and one SRAM tier | MIT Lincoln Lab’s 180 nm | 25% reduction in total power (logic and memory) |
Oh et al. [14] | Ternary content-addressable memory with three tiers | MIT Lincoln Lab’s 180 nm | 21% reduction in total power |
Feature Type | 130-nm Process Technology | 45-nm Process Technology |
---|---|---|
Maximum frequency | 312 MHz | 556 MHz |
Image size (pixel) | 752 × 480 | 752 × 480 |
Window size (pixel) | 15 × 15 | 15 × 15 |
Disparity range (pixel) | 64 | 64 |
Maximum frame rate | 108 frames/s | 192 frames/s |
Maximum bandwidth | 12.8 GB/s | 22.8 GB/s |
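The frame-rate and bandwidth rows in the table above can be related to the clock frequency with simple arithmetic. The following is only a consistency sketch on the reported figures: the helper names are hypothetical, 1 GB is assumed to mean 10^9 bytes, and the derived cycles-per-pixel and bytes-per-cycle ratios are inferred from the table values rather than stated by the source.

```python
# Figures reported in the specification table above.
IMAGE_WIDTH, IMAGE_HEIGHT = 752, 480
SPECS = {
    "130-nm": {"freq_hz": 312e6, "frames_per_s": 108, "bandwidth_bytes_per_s": 12.8e9},
    "45-nm": {"freq_hz": 556e6, "frames_per_s": 192, "bandwidth_bytes_per_s": 22.8e9},
}


def cycles_per_pixel(freq_hz, frames_per_s):
    """Clock cycles available per output pixel at the maximum frame rate."""
    pixels_per_second = frames_per_s * IMAGE_WIDTH * IMAGE_HEIGHT
    return freq_hz / pixels_per_second


def bytes_per_cycle(bandwidth_bytes_per_s, freq_hz):
    """Bytes moved per clock cycle at the maximum bandwidth (assumes 1 GB = 1e9 bytes)."""
    return bandwidth_bytes_per_s / freq_hz


def derived_ratios():
    """Return {node: (cycles per pixel, bytes per cycle)} for both process nodes."""
    return {
        node: (
            cycles_per_pixel(spec["freq_hz"], spec["frames_per_s"]),
            bytes_per_cycle(spec["bandwidth_bytes_per_s"], spec["freq_hz"]),
        )
        for node, spec in SPECS.items()
    }


if __name__ == "__main__":
    for node, (cpp, bpc) in derived_ratios().items():
        print(f"{node}: {cpp:.2f} cycles/pixel, {bpc:.1f} bytes/cycle")
```

Under these assumptions both process nodes come out at roughly 8 clock cycles per pixel and about 41 bytes per cycle, which is consistent with the frame rate and bandwidth scaling with clock frequency alone.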
Process Technology | Area (μm²) | Clock Period | # of SRAMs | SRAM Capacity |
---|---|---|---|---|
130-nm | 2,977,542 | 3.2 ns | 44 | 44 × 752 bytes = 31.3 kB |
45-nm | 451,797 | 1.8 ns | 44 | 44 × 752 bytes = 31.3 kB |
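Partition Method | Type of Signal | # of TSVs |
---|---|---|
Macro-level Partition (MP) | SRAM control signals | 29 |
Macro-level Partition (MP) | SRAM address signals | 20 |
Macro-level Partition (MP) | SRAM data signals | 376 |
Macro-level Partition (MP) | Total number of TSVs | 425 |
Pipeline-level Partition (PP) | Logic signals between pipeline stages | 56 |
Pipeline-level Partition (PP) | SRAM control signals | 41 |
Pipeline-level Partition (PP) | SRAM address signals | 20 |
Pipeline-level Partition (PP) | SRAM data signals | 104 |
Pipeline-level Partition (PP) | Total number of TSVs | 221 |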
Pipeline Stage | Name | 130-nm Cell Area (μm²) | 130-nm Share (%) | 45-nm Cell Area (μm²) | 45-nm Share (%) |
---|---|---|---|---|---|
1 | 32 SRAM macros | 1,294,065 | 43.5 | 194,109 | 43.0 |
2 | Hamming weight | 255,783 | 8.6 | 42,660 | 9.4 |
3 | Hamming distance | 459,072 | 15.4 | 68,576 | 15.2 |
4 | 12 SRAM macros | 485,274 | 16.3 | 72,791 | 16.1 |
5 | Median filter | 241,725 | 8.1 | 32,831 | 7.3 |
6 | Disparity diffusion | 241,623 | 8.1 | 40,831 | 9.0 |
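Stages 2 and 3 in the table above are named for Hamming weight and Hamming distance, and the reference list includes census-based matching work. As a rough illustration only, the sketch below assumes a plain census transform with a Hamming-distance cost and a winner-takes-all search over the disparity range; this section does not confirm that this is the authors' exact cost function. All function names are hypothetical, and the 3 × 3 default window is chosen for readability rather than taken from the 15 × 15 window in the specification table.

```python
def census_transform(image, row, col, radius=1):
    """Encode a window as bits: 1 where a neighbor is darker than the center pixel."""
    center = image[row][col]
    bits = 0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # the center pixel is not compared with itself
            darker = 1 if image[row + dr][col + dc] < center else 0
            bits = (bits << 1) | darker
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two census codes differ."""
    return bin(a ^ b).count("1")


def best_disparity(left, right, row, col, max_disparity, radius=1):
    """Winner-takes-all search: return (disparity, cost) with the lowest cost.

    The reference window is centered at (row, col) in the left image; candidate
    windows are centered at (row, col - d) in the right image. The caller must
    keep the reference window inside the image.
    """
    reference = census_transform(left, row, col, radius)
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        candidate_col = col - d
        if candidate_col - radius < 0:
            break  # candidate window would leave the image
        candidate = census_transform(right, row, candidate_col, radius)
        cost = hamming_distance(reference, candidate)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

A pipelined hardware implementation would evaluate the candidate windows in parallel rather than in a loop; the loop here only shows the data dependence between the census codes and the cost.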
Metric | 2D-130 | 3D-MP-130 Top Tier | 3D-MP-130 Bottom Tier | 3D-PP-130 Top Tier | 3D-PP-130 Bottom Tier |
---|---|---|---|---|---|
Clock period (ns) | 3.2 | 3.2 | 3.2 | 3.2 | 3.2 |
Footprint (μm × μm) | 2350 × 2350 | 2350 × 1350 | 2350 × 1350 | 2350 × 1350 | 2350 × 1350 |
# of gates | 97,630 | 87,245 | 205 | 50,113 | 39,528 |
Total wire lengths (μm) | 5,488,514 | 4,507,365 | 220,485 | 2,662,651 | 2,582,875 |
Clock net wire lengths (μm) | 231,232 | 184,683 | 18,293 | 119,704 | 95,636 |
Total # of buffers | 18,968 | 15,242 | 161 | 8575 | 7051 |
# of clock tree buffers | 871 | 740 | 61 | 508 | 399 |
Power (mW) | 1006.31 | 871.04 (both tiers) | | 931.30 (both tiers) | |
Metric | 2D-45 | 3D-MP-45 Top Tier | 3D-MP-45 Bottom Tier | 3D-PP-45 Top Tier | 3D-PP-45 Bottom Tier |
---|---|---|---|---|---|
Clock period (ns) | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 |
Footprint (μm × μm) | 830 × 830 | 830 × 485 | 830 × 485 | 830 × 485 | 830 × 485 |
# of gates | 123,659 | 101,640 | 165 | 58,745 | 40,859 |
Total wire lengths (μm) | 1,918,625 | 1,629,674 | 75,494 | 988,637 | 881,191 |
Clock net wire lengths (μm) | 86,758 | 75,249 | 5950 | 47,981 | 38,455 |
Total # of buffers | 34,520 | 24,198 | 121 | 12,704 | 9863 |
# of clock tree buffers | 1451 | 1415 | 52 | 909 | 692 |
Power (mW) | 273.79 | 252.56 (both tiers) | | 253.44 (both tiers) | |
Design Type | Power Group | Cell Internal (mW) | Net Switching (mW) | Cell Leakage (mW) | Total (mW) | Percentage (%) |
---|---|---|---|---|---|---|
2D-130 | Memory | 353.70 | 1.33 | 0.08 | 355.11 | 35.29 |
2D-130 | Clock Network | 246.10 | 71.0 | 0.00 | 317.10 | 31.51 |
2D-130 | Register | 41.0 | 14.70 | 0.00 | 55.70 | 5.54 |
2D-130 | Combinational Logic | 135.80 | 142.60 | 0.00 | 278.40 | 27.67 |
2D-130 | Total | 776.60 | 229.63 | 0.08 | 1006.31 | 100.0 |
2D-130 | Percentage (%) | 77.17 | 22.82 | 0.01 | 100.00 | n/a |
3D-MP-130 | Memory | 353.80 | 1.46 | 0.08 | 355.34 | 40.79 |
3D-MP-130 | Clock Network | 209.70 | 68.20 | 0.00 | 277.90 | 31.90 |
3D-MP-130 | Register | 31.80 | 12.40 | 0.00 | 44.20 | 5.07 |
3D-MP-130 | Combinational Logic | 89.80 | 103.80 | 0.00 | 193.60 | 22.23 |
3D-MP-130 | Total | 685.10 | 185.86 | 0.08 | 871.04 | 100.00 |
3D-MP-130 | Percentage (%) | 78.65 | 21.34 | 0.01 | 100.00 | n/a |
3D-PP-130 | Memory | 353.70 | 2.02 | 0.08 | 355.80 | 38.20 |
3D-PP-130 | Clock Network | 219.50 | 68.80 | 0.00 | 288.30 | 30.96 |
3D-PP-130 | Register | 36.40 | 13.70 | 0.00 | 50.10 | 5.38 |
3D-PP-130 | Combinational Logic | 113.80 | 123.30 | 0.00 | 237.10 | 25.46 |
3D-PP-130 | Total | 723.40 | 207.82 | 0.08 | 931.30 | 100.00 |
3D-PP-130 | Percentage (%) | 77.68 | 22.32 | 0.01 | 100.00 | n/a |
Design Type | Power Group | Cell Internal (mW) | Net Switching (mW) | Cell Leakage (mW) | Total (mW) | Percentage (%) |
---|---|---|---|---|---|---|
2D-45 | Memory | 79.20 | 0.39 | 5.05 | 84.64 | 30.9 |
2D-45 | Clock Network | 37.30 | 28.10 | 0.13 | 65.53 | 23.9 |
2D-45 | Register | 15.20 | 5.44 | 0.72 | 21.35 | 7.8 |
2D-45 | Combinational Logic | 49.90 | 50.40 | 1.97 | 102.27 | 37.4 |
2D-45 | Total | 181.60 | 84.32 | 7.87 | 273.79 | 100.0 |
2D-45 | Percentage (%) | 66.33 | 30.80 | 2.87 | 100.00 | n/a |
3D-MP-45 | Memory | 79.2 | 0.4548 | 5.051 | 84.71 | 33.5 |
3D-MP-45 | Clock Network | 38.4 | 28 | 0.1299 | 66.53 | 26.3 |
3D-MP-45 | Register | 15.2 | 4.687 | 0.7167 | 20.60 | 8.2 |
3D-MP-45 | Combinational Logic | 39.6 | 39.7 | 1.419 | 80.72 | 32.0 |
3D-MP-45 | Total | 172.40 | 72.84 | 7.32 | 252.56 | 100 |
3D-MP-45 | Percentage (%) | 68.26 | 28.84 | 2.90 | 100.00 | n/a |
3D-PP-45 | Memory | 79.2 | 0.6416 | 5.051 | 84.89 | 33.5 |
3D-PP-45 | Clock Network | 36.5 | 28.4 | 0.1349 | 65.03 | 25.7 |
3D-PP-45 | Register | 15.2 | 5.296 | 0.7156 | 21.21 | 8.4 |
3D-PP-45 | Combinational Logic | 41.1 | 39.7 | 1.505 | 82.31 | 32.5 |
3D-PP-45 | Total | 172.00 | 74.04 | 7.41 | 253.44 | 100 |
3D-PP-45 | Percentage (%) | 67.87 | 29.21 | 2.92 | 100.00 | n/a |
Category | Design or Study | Power, 2D IC (mW) | Power, 3D IC (mW) | Power ∆ (%) | Wire Length, 2D IC (m) | Wire Length, 3D IC (m) | Wire Length ∆ (%) |
---|---|---|---|---|---|---|---|
Proposed | 2D-130 vs. 3D-MP-130 | 1006.3 | 871 | −13.4 | 5.489 | 4.728 | −13.9 |
Proposed | 2D-130 vs. 3D-PP-130 | 1006.3 | 931.3 | −7.5 | 5.489 | 5.246 | −4.4 |
Proposed | 2D-45 vs. 3D-MP-45 | 273.8 | 252.6 | −7.7 | 1.919 | 1.705 | −11.1 |
Proposed | 2D-45 vs. 3D-PP-45 | 273.8 | 253.4 | −7.5 | 1.919 | 1.870 | −2.5 |
Related Studies | Ouyang et al. [7] | n/a | n/a | n/a | n/a | n/a | n/a |
Related Studies | Thorolfsson et al. [8] | 340.0 | 324.9 | −4.4 | 19.107 | 8.238 | −56.9 |
Related Studies | Neela et al. [9] | 9.95 | 10.72 | 7.7 | 10.37 | 10.96 | 5.7 |
Related Studies | Kim et al. [10] | n/a | 4032.0 | n/a | n/a | n/a | n/a |
Related Studies | Zhang et al. [11] | n/a | 12.57 | n/a | n/a | n/a | n/a |
Related Studies | Saito et al. [12] | n/a | 120.0 | n/a | n/a | n/a | n/a |
Related Studies | Franzon et al. [13] | 340.0 | 324.9 | −4.4 | 19.107 | 8.238 | −56.9 |
Related Studies | Oh et al. [14] | 0.042 | 0.033 | −21.5 | n/a | n/a | n/a |
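The ∆ (%) columns for the proposed designs in the table above can be recomputed as the relative change of the 3D value with respect to the 2D value. The sketch below is a consistency check under that assumption; the function and variable names are hypothetical, and the inputs are the rounded values printed in the table, so small rounding differences from the reported percentages are expected.

```python
def relative_change_percent(value_2d, value_3d):
    """Relative change of the 3D value with respect to the 2D value, in percent."""
    return (value_3d - value_2d) / value_2d * 100.0


# (design, power 2D, power 3D, reported power delta,
#  wire 2D, wire 3D, reported wire delta); power in mW, wire length in m.
PROPOSED_ROWS = [
    ("3D-MP-130", 1006.3, 871.0, -13.4, 5.489, 4.728, -13.9),
    ("3D-PP-130", 1006.3, 931.3, -7.5, 5.489, 5.246, -4.4),
    ("3D-MP-45", 273.8, 252.6, -7.7, 1.919, 1.705, -11.1),
    ("3D-PP-45", 273.8, 253.4, -7.5, 1.919, 1.870, -2.5),
]


def max_deviation(rows=PROPOSED_ROWS):
    """Largest gap, in percentage points, between recomputed and reported deltas."""
    worst = 0.0
    for _, p2d, p3d, dp, w2d, w3d, dw in rows:
        worst = max(worst, abs(relative_change_percent(p2d, p3d) - dp))
        worst = max(worst, abs(relative_change_percent(w2d, w3d) - dw))
    return worst


if __name__ == "__main__":
    for name, p2d, p3d, _, w2d, w3d, _ in PROPOSED_ROWS:
        power = relative_change_percent(p2d, p3d)
        wire = relative_change_percent(w2d, w3d)
        print(f"{name}: power {power:+.1f}%, wire length {wire:+.1f}%")
```

A negative value means the 3D design uses less power or wire than its 2D counterpart.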
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).
Ok, S.-H.; Lee, Y.-H.; Shim, J.H.; Lim, S.K.; Moon, B. The Impact of 3D Stacking and Technology Scaling on the Power and Area of Stereo Matching Processors. Sensors 2017, 17, 426. https://doi.org/10.3390/s17020426