DOI: 10.1145/1168857.1168873

PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor

Published: 20 October 2006

Abstract

In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. Our proposed architecture, PicoServer, employs 3D technology to bond one die containing several simple, slow processing cores to multiple DRAM dies with enough capacity to serve as primary memory. The 3D technology also enables wide, low-latency buses between processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. The lower clock frequency in turn reduces power, so the thermal constraints that are a concern with 3D stacking are easily satisfied.

The PicoServer architecture specifically targets Tier 1 server applications, which exhibit a high degree of thread-level parallelism. An architecture targeted at efficient throughput is ideal for this application domain. We find that, for a similar logic die area, a 12-CPU system with 3D stacking and no L2 cache outperforms an 8-CPU system with a large on-chip L2 cache by about 14% while consuming 55% less power. In addition, we show that a PicoServer performs comparably to a Pentium 4-class machine while consuming only about 1/10 of the power, even when conservative assumptions are made about the power consumption of the PicoServer.
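The headline comparisons in the abstract can be restated as performance per watt. The sketch below is a back-of-the-envelope calculation, not the paper's evaluation method. It assumes that "about 14%" means 1.14x the baseline throughput, "55% less power" means 0.45x the baseline power, and "performs comparably" means 1.0x the baseline throughput. The helper names are hypothetical, and `equal_throughput_frequency` uses an idealized model in which throughput scales with cores times clock frequency, which ignores any change in per-core performance from removing the L2 cache.

```python
"""Back-of-the-envelope arithmetic on the relative figures quoted in the abstract.

Assumptions (not from the paper's evaluation): "about 14%" faster is read as 1.14x
baseline throughput, "55% less power" as 0.45x baseline power, and "performs
comparably" as 1.0x baseline throughput.
"""


def perf_per_watt_ratio(rel_perf: float, rel_power: float) -> float:
    """Return performance per watt relative to a baseline whose ratio is 1.0."""
    if rel_power <= 0:
        raise ValueError("relative power must be positive")
    return rel_perf / rel_power


def equal_throughput_frequency(base_cores: int, base_freq: float, new_cores: int) -> float:
    """Clock frequency at which new_cores match the aggregate cycles of the baseline.

    Hypothetical idealized model: throughput ~ cores * frequency, i.e. perfect
    thread-level parallelism and unchanged per-core IPC.
    """
    if new_cores <= 0:
        raise ValueError("new_cores must be positive")
    return base_freq * base_cores / new_cores


# 12-CPU 3D-stacked system (no L2) vs. 8-CPU system with a large on-chip L2.
pico_vs_l2 = perf_per_watt_ratio(rel_perf=1.14, rel_power=1.0 - 0.55)

# PicoServer vs. a Pentium 4-class machine: comparable performance at ~1/10 the power.
pico_vs_p4 = perf_per_watt_ratio(rel_perf=1.0, rel_power=1.0 / 10)

# Normalized clock (baseline = 1.0) that lets 12 cores match 8 cores under the ideal model.
f_12 = equal_throughput_frequency(base_cores=8, base_freq=1.0, new_cores=12)

print(f"12-CPU PicoServer vs. 8-CPU + L2: {pico_vs_l2:.2f}x perf/W")
print(f"PicoServer vs. Pentium 4-class:   {pico_vs_p4:.1f}x perf/W")
print(f"Normalized clock for 12 cores:    {f_12:.3f}")
```

Under these assumptions, the 12-CPU configuration delivers roughly 2.5x the performance per watt of the 8-CPU L2 design and roughly 10x that of the Pentium 4-class machine, while the idealized model lets 12 cores match 8 cores at about two thirds of the baseline clock.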



Published In

ACM Conferences
ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
October 2006
440 pages
ISBN:1595934510
DOI:10.1145/1168857
  • ACM SIGPLAN Notices, Volume 41, Issue 11
    Proceedings of the 2006 ASPLOS Conference
    November 2006
    425 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1168918
  • ACM SIGOPS Operating Systems Review, Volume 40, Issue 5
    Proceedings of the 2006 ASPLOS Conference
    December 2006
    425 pages
    ISSN:0163-5980
    DOI:10.1145/1168917
  • ACM SIGARCH Computer Architecture News, Volume 34, Issue 5
    Proceedings of the 2006 ASPLOS Conference
    December 2006
    425 pages
    ISSN:0163-5964
    DOI:10.1145/1168919
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. 3D stacking technology
  2. chip multiprocessor
  3. full-system simulation
  4. low power
  5. tier 1 server
  6. web/file/streaming server


Conference

ASPLOS06

Acceptance Rates

ASPLOS XII Paper Acceptance Rate 38 of 158 submissions, 24%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%
