SIGARCH: Vol 34, No 2

Packet-based on-chip networks are increasingly being adopted in complex System-on-Chip (SoC) designs supporting numerous homogeneous and heterogeneous functional blocks. These Network-on-Chip (NoC) architectures are required to not only provide ultra-...

article

The BlackWidow High-Radix Clos Network

Pages 16–28 · https://doi.org/10.1145/1150019.1136488

This paper describes the radix-64 folded-Clos network of the Cray BlackWidow scalable vector multiprocessor. We describe the BlackWidow network, which scales to 32K processors with a worst-case diameter of seven hops, and the underlying high-radix router ...
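
The scaling claim can be sanity-checked with textbook folded-Clos (fat-tree) arithmetic. This is a minimal sketch that assumes an idealized network in which every router below the top level splits its ports evenly between down-links and up-links, while top-level routers point all ports down. It does not model BlackWidow's actual packaging or hop-counting convention, which the truncated abstract does not give.

```python
def folded_clos_capacity(radix: int, levels: int) -> int:
    """Endpoints supported by an idealized folded Clos with `levels` router levels.

    Assumption: non-top routers use radix/2 ports down and radix/2 ports up;
    top-level routers use all `radix` ports down, which doubles the count.
    """
    half = radix // 2
    return 2 * half ** levels


def worst_case_router_traversals(levels: int) -> int:
    """Routers on the longest minimal path: up to the top level, then back down."""
    return 2 * levels - 1


# A single radix-64 router serves 64 endpoints; each extra level multiplies
# capacity by 32 while adding two router traversals to the worst-case path.
for levels in range(1, 4):
    print(levels, folded_clos_capacity(64, levels), worst_case_router_traversals(levels))
```

High radix is what keeps the level count, and therefore the diameter, small: capacity grows as (radix/2) raised to the number of levels.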

article

Memory Model = Instruction Reordering + Store Atomicity

Pages 29–40 · https://doi.org/10.1145/1150019.1136489

We present a novel framework for defining memory models in terms of two properties: thread-local Instruction Reordering axioms and Store Atomicity, which describes inter-thread communication via memory. Most memory models have the store atomicity ...
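
The effect of relaxing instruction reordering can be seen on the classic store-buffering (SB) litmus test. This sketch is a generic illustration, not the paper's framework: it enumerates every program-order-preserving interleaving (sequential consistency with atomic stores), then repeats the enumeration with each thread's load allowed to move ahead of its store, which approximates the store-to-load reordering a store buffer introduces.

```python
# Store-buffering litmus test: each thread stores 1 to one variable, then loads the other.
T0 = [("store", "x"), ("load", "y", "r0")]
T1 = [("store", "y"), ("load", "x", "r1")]


def interleavings(a, b):
    """All merges of a and b that preserve each thread's program order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(trace):
    """Execute one interleaving against a single shared memory; return (r0, r1)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op in trace:
        if op[0] == "store":
            mem[op[1]] = 1
        else:
            regs[op[2]] = mem[op[1]]
    return regs["r0"], regs["r1"]


def outcomes(t0, t1):
    return {run(trace) for trace in interleavings(t0, t1)}


sc = outcomes(T0, T1)
# Also allow either thread's load to be reordered before its own store.
relaxed = sc | outcomes(T0[::-1], T1) | outcomes(T0, T1[::-1]) | outcomes(T0[::-1], T1[::-1])
print(sorted(sc), sorted(relaxed))
```

Under sequential consistency the outcome (0, 0) never appears; once loads may bypass earlier stores, it does.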

article

Conditional Memory Ordering

Pages 41–52 · https://doi.org/10.1145/1150019.1136490

Conventional relaxed memory ordering techniques follow a proactive model: at a synchronization point, a processor makes its own updates to memory available to other processors by executing a memory barrier instruction, ensuring that recent writes have ...

article

Architectural Semantics for Practical Transactional Memory

Pages 53–65 · https://doi.org/10.1145/1150019.1136491

Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software ...

article

Ensemble-level Power Management for Dense Blade Servers

Pages 66–77 · https://doi.org/10.1145/1150019.1136492

One of the key challenges for high-density servers (e.g., blades) is the increased costs in addressing the power and heat density associated with compaction. Prior approaches have mainly focused on reducing the heat generated at the level of an ...

article

Techniques for Multicore Thermal Management: Classification and New Exploration

Pages 78–88 · https://doi.org/10.1145/1150019.1136493

Power density continues to increase exponentially with each new technology generation, posing a major challenge for thermal management in modern processors. Much past work has examined microarchitectural policies for reducing total chip power, but these ...

article

SODA: A Low-power Architecture For Software Radio

Pages 89–101 · https://doi.org/10.1145/1150019.1136494

The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time consuming to design and difficult ...

article

An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors

Pages 102–113 · https://doi.org/10.1145/1150019.1136520

This paper presents a high-availability system architecture called INDRA, an INtegrated framework for Dependable and Revivable Architecture, that enhances a multicore processor (or CMP) with novel security and fault recovery mechanisms. INDRA represents ...

article

Multiple Instruction Stream Processor

Pages 114–127 · https://doi.org/10.1145/1150019.1136495

Microprocessor design is undergoing a major paradigm shift towards multi-core designs, in anticipation that future performance gains will come from exploiting thread-level parallelism in the software. To support this trend, we present a novel processor ...

article

The End of Scaling? Revolutions in Technology and Microarchitecture as We Pass the 90 Nanometer Node

Philip Emma

Page 128 · https://doi.org/10.1145/1150019.1136496

article

Design and Management of 3D Chip Multiprocessors Using Network-in-Memory

Pages 130–141 · https://doi.org/10.1145/1150019.1136497

Long interconnects are becoming an increasingly important problem from both power and performance perspectives. This motivates designers to adopt on-chip network-based communication infrastructures and three-dimensional (3D) designs where multiple ...

article

Slackened Memory Dependence Enforcement: Combining Opportunistic Forwarding with Decoupled Verification

Pages 142–154 · https://doi.org/10.1145/1150019.1136498

An efficient mechanism to track and enforce memory dependences is crucial to an out-of-order microprocessor. The conventional approach of using cross-checked load queue and store queue, while very effective in earlier processor incarnations, suffers ...

article

Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches

Chuanjun Zhang

Pages 155–166 · https://doi.org/10.1145/1150019.1136499

Level one cache normally resides on a processor's critical path, which determines the clock frequency. Direct-mapped caches exhibit fast access time but poor hit rates compared with same-sized set-associative caches due to nonuniform accesses to the ...
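
The conflict-miss problem can be reproduced with a tiny LRU cache simulator. This is a generic sketch, not the Balanced Cache design: two blocks that map to the same set thrash a direct-mapped cache but coexist in a 2-way cache of the same total capacity. The 64-byte block size and 8 KiB capacity are assumptions chosen for illustration.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, block_size=64):
    """Simulate an LRU set-associative cache; ways=1 models a direct-mapped cache."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)           # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)    # evict the least recently used tag
            lines[tag] = True
    return misses


# Two addresses 8 KiB apart, accessed alternately 50 times each.
trace = [0, 8192] * 50
dm = count_misses(trace, num_sets=128, ways=1)  # 8 KiB direct-mapped
sa = count_misses(trace, num_sets=64, ways=2)   # 8 KiB 2-way set-associative
print(dm, sa)
```

Every access misses in the direct-mapped cache because both blocks index set 0, whereas the 2-way cache takes only the two compulsory misses.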

article

A Case for MLP-Aware Cache Replacement

Pages 167–178 · https://doi.org/10.1145/1150019.1136501

Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory accesses concurrently. The notion of generating and servicing long-latency cache misses in parallel is called Memory Level Parallelism (MLP). MLP is not ...
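
One way to quantify why isolated misses hurt more than parallel ones is to split each stall cycle evenly among the misses outstanding in that cycle. The cost model and the cycle numbers below are assumptions for illustration; the abstract is truncated, so this is not claimed to be the paper's exact metric.

```python
from fractions import Fraction


def mlp_costs(misses):
    """misses: name -> (start_cycle, end_cycle), end exclusive.

    Assumed model: in every cycle, each outstanding miss is charged 1/N,
    where N is the number of misses outstanding in that cycle.
    """
    costs = {name: Fraction(0) for name in misses}
    first = min(start for start, _ in misses.values())
    last = max(end for _, end in misses.values())
    for cycle in range(first, last):
        active = [n for n, (s, e) in misses.items() if s <= cycle < e]
        for n in active:
            costs[n] += Fraction(1, len(active))
    return costs


# One isolated 200-cycle miss, then four fully overlapped 200-cycle misses.
costs = mlp_costs({
    "A": (0, 200),
    "B": (300, 500), "C": (300, 500), "D": (300, 500), "E": (300, 500),
})
print({name: int(cost) for name, cost in costs.items()})
```

Under this model the isolated miss costs four times as much as each overlapped one, which is the kind of difference a cost-aware replacement policy could exploit.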

article

Improving Cost, Performance, and Security of Memory Encryption and Authentication

Pages 179–190 · https://doi.org/10.1145/1150019.1136502

Protection from hardware attacks such as snoopers and mod chips has been receiving increasing attention in computer architecture. This paper presents a new combined memory encryption/authentication scheme. Our new split counters for counter-mode ...
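
For readers unfamiliar with counter-mode memory encryption, the sketch below shows the basic idea: a one-time pad is derived from the block's address and a per-block counter, then XORed with the data. HMAC-SHA256 from the Python standard library stands in for the AES cipher a hardware design would normally use; the 64-byte block size and the 8-byte address and counter encodings are assumptions. It does not implement the paper's split-counter scheme.

```python
import hashlib
import hmac

BLOCK = 64  # bytes per memory block (assumed)


def keystream(key: bytes, address: int, counter: int, length: int = BLOCK) -> bytes:
    """Pad derived only from (address, counter); HMAC-SHA256 stands in for AES."""
    out = b""
    chunk = 0
    while len(out) < length:
        msg = (address.to_bytes(8, "little") + counter.to_bytes(8, "little")
               + chunk.to_bytes(4, "little"))
        out += hmac.new(key, msg, hashlib.sha256).digest()
        chunk += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, address: int, counter: int, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, address, counter, len(plaintext)))


# XOR with the same pad undoes itself, so decryption is the same operation.
decrypt_block = encrypt_block
```

Because the pad depends only on the address and counter, not on the data, it can be computed while the encrypted block is still being fetched. The counter must change on every write-back so that a pad is never reused.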

article

A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching

Pages 191–202 · https://doi.org/10.1145/1150019.1136500

We present and evaluate an architecture for high-throughput pattern matching of regular expressions. Our approach matches multiple patterns concurrently, responds rapidly to changes in the pattern set, and is well suited for synthesis in an ASIC or FPGA. ...
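
Matching many patterns concurrently can be illustrated with the bit-parallel Shift-And algorithm, which packs the NFA states of all patterns into one bit vector and advances them with a shift, an OR and an AND per input character. This is a swapped-in textbook technique restricted to non-empty literal strings, not the architecture described in the paper.

```python
def build(patterns):
    """Pack every pattern's NFA states into one integer, pattern after pattern."""
    masks, init, final, owner = {}, 0, 0, {}
    bit = 0
    for p in patterns:
        init |= 1 << bit                      # first state of each pattern
        for ch in p:
            masks[ch] = masks.get(ch, 0) | (1 << bit)
            bit += 1
        final |= 1 << (bit - 1)               # last state of each pattern
        owner[bit - 1] = p
    return masks, init, final, owner


def scan(text, patterns):
    """Return (end_index, pattern) for every occurrence of every pattern."""
    masks, init, final, owner = build(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        # Advance all active states, restart every pattern, keep states matching ch.
        state = ((state << 1) | init) & masks.get(ch, 0)
        done = state & final
        while done:
            low = done & -done                # isolate the lowest set bit
            hits.append((i, owner[low.bit_length() - 1]))
            done ^= low
    return hits


print(scan("ushers", ["he", "she", "hers"]))
```

Bits that shift out of one pattern's last state land on the next pattern's first state, which `init` sets on every step anyway, so the packing does not create false matches.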

article

Chisel: A Storage-efficient, Collision-free Hash-based Network Processing Architecture

Pages 203–215 · https://doi.org/10.1145/1150019.1136503

Longest Prefix Matching (LPM) is a fundamental part of various network processing tasks. Previously proposed approaches for LPM result in prohibitive cost and power dissipation (TCAMs) or in large memory requirements and long lookup latencies (tries), ...
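
A baseline hash-based LPM keeps one exact-match table per prefix length and probes from the longest length down. This sketch is that generic baseline for IPv4, not Chisel's collision-free structure; the class name and next-hop labels are hypothetical.

```python
import ipaddress


class PrefixTable:
    """Hypothetical baseline: one hash table per IPv4 prefix length."""

    def __init__(self):
        self.tables = {}  # prefix length -> {top `length` address bits: next hop}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        key = int(net.network_address) >> (32 - net.prefixlen)
        self.tables.setdefault(net.prefixlen, {})[key] = next_hop

    def lookup(self, address: str):
        addr = int(ipaddress.ip_address(address))
        # Longest prefix first; the first hit is the longest match.
        for length in sorted(self.tables, reverse=True):
            hop = self.tables[length].get(addr >> (32 - length))
            if hop is not None:
                return hop
        return None


t = PrefixTable()
t.insert("0.0.0.0/0", "default")
t.insert("10.0.0.0/8", "A")
t.insert("10.1.0.0/16", "B")
print(t.lookup("10.1.2.3"), t.lookup("10.2.0.1"), t.lookup("192.168.1.1"))
```

In the worst case this probes one table per distinct prefix length (up to 33 for IPv4), and a software dictionary hides collision handling; those are the costs hardware hash-based designs try to bound.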

article

Tolerating Dependences Between Large Speculative Threads Via Sub-Threads

Pages 216–226 · https://doi.org/10.1145/1150019.1136504

Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range in size from hundreds to several thousand dynamic instructions and have ...

article

Bulk Disambiguation of Speculative Threads in Multiprocessors

Pages 227–238 · https://doi.org/10.1145/1150019.1136506

Transactional Memory (TM), Thread-Level Speculation (TLS), and Checkpointed multiprocessors are three popular architectural techniques based on the execution of multiple, cooperating speculative threads. In these environments, correctly maintaining data ...
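
Conflict detection between speculative threads can be done conservatively by summarizing each thread's read and write addresses as hashed bit vectors and intersecting them. The sketch below uses a Bloom-filter-style signature with SHA-256 hashing; the 1024-bit size, two hash bits per address and function names are assumptions, not the encoding used in the paper.

```python
import hashlib

SIG_BITS = 1024  # signature width (assumed)


def signature(addresses, k=2):
    """Superimpose k hashed bits per address into one integer bit vector."""
    sig = 0
    for a in addresses:
        digest = hashlib.sha256(a.to_bytes(8, "little")).digest()
        for i in range(k):
            bit = int.from_bytes(digest[4 * i:4 * i + 4], "little") % SIG_BITS
            sig |= 1 << bit
    return sig


def may_conflict(write_sig: int, read_sig: int) -> bool:
    """Non-empty intersection means a possible conflict.

    False positives are possible (hash aliasing); false negatives are not,
    because a shared address always sets the same bits in both signatures.
    """
    return (write_sig & read_sig) != 0


w = signature([0x100, 0x200])
r = signature([0x200, 0x300])
print(may_conflict(w, r))
```

The appeal of the approach is that a whole thread's footprint is compared with a single bitwise AND, at the price of occasional spurious conflicts.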

article

Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Pages 239–251 · https://doi.org/10.1145/1150019.1136507

The key to high performance in Simultaneous Multithreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential ...
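
Learning-based distribution can be pictured as greedy hill climbing on a measured metric. In this minimal sketch, `evaluate` is a hypothetical stand-in for running one epoch with a given resource split and measuring throughput. Each epoch tries moving `step` entries toward or away from one thread, keeps the best measured setting, and stops at a local optimum.

```python
def hill_climb(evaluate, start, step, lo, hi, max_epochs=100):
    """Greedy hill climbing over one thread's share of a partitioned resource.

    `evaluate(share)` is a hypothetical callback returning measured performance.
    """
    share = start
    best = evaluate(share)
    for _ in range(max_epochs):
        candidates = [s for s in (share + step, share - step) if lo <= s <= hi]
        scored = [(evaluate(s), s) for s in candidates]
        top_score, top_share = max(scored, default=(best, share))
        if top_score <= best:
            break  # neither neighbor improves: local optimum reached
        share, best = top_share, top_score
    return share


# Synthetic performance curve that peaks when thread 0 holds 40 of 100 entries.
print(hill_climb(lambda s: -(s - 40) ** 2, start=50, step=5, lo=0, hi=100))
```

Because it optimizes the measured metric directly, the climber needs no model of why a given split performs well; the trade-off is that it can settle on a local rather than global optimum.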

article

Spatial Memory Streaming

Pages 252–263 · https://doi.org/10.1145/1150019.1136508

Prior research indicates that there is much spatial variation in applications' memory access patterns. Modern memory systems, however, use small fixed-size cache blocks and as such cannot exploit the variation. Increasing the block size would not only ...
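
The idea of exploiting spatial variation can be sketched as learning a per-region footprint (a bit vector of blocks touched) and replaying it as prefetch candidates the next time a similar first access occurs. The 2 KiB region, 64-byte block, the (PC, block offset) trigger key and the explicit `end_region` call are all illustrative assumptions, not details taken from the truncated abstract.

```python
REGION = 2048  # bytes per spatial region (assumed)
BLOCK = 64     # bytes per cache block (assumed)


class FootprintPredictor:
    """Hypothetical sketch: learn which blocks of a region get touched."""

    def __init__(self):
        self.history = {}  # (trigger pc, trigger block) -> footprint bitmask
        self.active = {}   # region base -> (trigger key, footprint so far)

    def access(self, pc, addr):
        """Record an access; on a region's first access, return predicted addresses."""
        base, block = addr - addr % REGION, (addr % REGION) // BLOCK
        if base in self.active:
            key, mask = self.active[base]
            self.active[base] = (key, mask | (1 << block))
            return []
        key = (pc, block)
        self.active[base] = (key, 1 << block)
        predicted = self.history.get(key, 0)
        return [base + b * BLOCK for b in range(REGION // BLOCK)
                if (predicted >> b) & 1 and b != block]

    def end_region(self, base):
        """Store the observed footprint, e.g. when the region's blocks are evicted."""
        key, mask = self.active.pop(base)
        self.history[key] = mask


p = FootprintPredictor()
for pc, addr in [(0x400, 0), (0x500, 128), (0x500, 192)]:
    p.access(pc, addr)
p.end_region(0)
print(p.access(0x400, 4096))  # same trigger in a new region replays the footprint
```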

article

Cooperative Caching for Chip Multiprocessors

Pages 264–276 · https://doi.org/10.1145/1150019.1136509

This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through ...

article

Reducing Startup Time in Co-Designed Virtual Machines

Pages 277–288 · https://doi.org/10.1145/1150019.1136510

A Co-Designed Virtual Machine allows designers to implement a processor via a combination of hardware and software. Dynamic binary translation converts code written for a conventional (legacy) ISA into optimized code for an underlying implementation-...

article

TRAP-Array: A Disk Array Architecture Providing Timely Recovery to Any Point-in-time

Pages 289–301 · https://doi.org/10.1145/1150019.1136511

RAID architectures have been used for more than two decades to recover data upon disk failures. Disk failure is just one of the many causes of damaged data. Data can be damaged by virus attacks, user errors, defective software/firmware, hardware faults, ...
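
Recovery to an arbitrary earlier time can be illustrated with a log of XOR deltas: each write records old XOR new, so a past version is rebuilt by XORing deltas back in, newest first. This is a generic sketch under stated assumptions (fixed-size whole-block writes, monotonically increasing timestamps, a hypothetical `DeltaLog` class), not the TRAP-Array design, whose details the truncated abstract does not give.

```python
class DeltaLog:
    """Hypothetical per-block log of XOR deltas supporting point-in-time reads."""

    def __init__(self, size: int):
        self.block = bytearray(size)
        self.log = []  # (timestamp, delta) in write order; timestamps assumed increasing

    def write(self, t: int, data: bytes) -> None:
        # Assumes len(data) == block size; the delta is old XOR new.
        delta = bytes(o ^ n for o, n in zip(self.block, data))
        self.log.append((t, delta))
        self.block[:] = data

    def as_of(self, t: int) -> bytes:
        """Undo, newest first, every delta logged after time t."""
        img = bytearray(self.block)
        for ts, delta in reversed(self.log):
            if ts <= t:
                break
            for i, d in enumerate(delta):
                img[i] ^= d
        return bytes(img)


d = DeltaLog(4)
d.write(1, b"aaaa")
d.write(2, b"abca")
d.write(3, b"zzzz")
print(d.as_of(2), d.as_of(0))
```

Deltas from small updates are mostly zero bytes, which is one reason XOR-based logs can be stored compactly.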
