Computer Architecture with Associative Processor Replacing Last-Level Cache and SIMD Accelerator

Published: 01 February 2015 in IEEE Transactions on Computers, vol. 64, no. 2 (IEEE Computer Society)

Abstract

This study presents a computer architecture in which the last-level cache and the SIMD accelerator are replaced by an associative processor. An associative processor combines data storage with data processing, functioning simultaneously as a massively parallel SIMD processor and as a memory. An analytic performance model of this architecture is introduced. Comparative analysis, supported by cycle-accurate simulation and emulation, shows that the proposed architecture may outperform a conventional computer architecture comprising a SIMD coprocessor and a shared last-level cache while consuming less power.
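To make the principle concrete, the following minimal Python sketch (an illustration only, not the authors' model or simulator) shows how an associative processor can carry out a word-parallel, bit-serial vector addition purely through parallel compare passes and masked write passes over its rows. The field layout, operand width, pass ordering, and helper names (compare, write, ap_add) are assumptions made for this example.

    import random

    WIDTH = 8  # bits per operand field (illustrative choice)

    def compare(rows, cols, pattern):
        """Parallel search: indices of rows whose bits at `cols` equal `pattern`."""
        return {i for i, row in enumerate(rows)
                if all(row[c] == p for c, p in zip(cols, pattern))}

    def write(rows, matched, cols, values):
        """Parallel masked write: set `cols` to `values` in every matched row."""
        for i in matched:
            for c, v in zip(cols, values):
                rows[i][c] = v

    def ap_add(rows, a0, b0, c_col):
        """Word-parallel, bit-serial add: B <- A + B, carry kept in c_col."""
        # Full-adder passes for the input states whose (b, carry) bits actually
        # change, ordered so a row rewritten within a bit slice cannot match a
        # later pass of the same slice.
        passes = [((0, 0, 1), (1, 0)),   # a=0, b=0, c=1 -> sum=1, carry=0
                  ((0, 1, 1), (0, 1)),   # a=0, b=1, c=1 -> sum=0, carry=1
                  ((1, 1, 0), (0, 1)),   # a=1, b=1, c=0 -> sum=0, carry=1
                  ((1, 0, 0), (1, 0))]   # a=1, b=0, c=0 -> sum=1, carry=0
        for k in range(WIDTH):                       # one bit slice per step
            cols = (a0 + k, b0 + k, c_col)
            for key, new_bits in passes:
                matched = compare(rows, cols, key)               # compare pass
                write(rows, matched, (b0 + k, c_col), new_bits)  # write pass

    if __name__ == "__main__":
        random.seed(0)
        vals = [(random.randrange(256), random.randrange(256)) for _ in range(4)]
        # Row layout (assumed): A bits (LSB first), then B bits, then one carry bit.
        rows = [[(a >> i) & 1 for i in range(WIDTH)] +
                [(b >> i) & 1 for i in range(WIDTH)] + [0]
                for a, b in vals]
        ap_add(rows, a0=0, b0=WIDTH, c_col=2 * WIDTH)
        for (a, b), row in zip(vals, rows):
            s = sum(row[WIDTH + i] << i for i in range(WIDTH))
            print(f"{a} + {b} -> {s} (carry {row[2 * WIDTH]})")

The run time of such an operation grows with the operand width rather than with the number of rows, which is what lets the same array serve both as a memory and as a massively parallel SIMD engine, as described in the abstract.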
