A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks
- Jongman Kim,
- Chrysostomos Nicopoulos,
- Dongkook Park,
- Vijaykrishnan Narayanan,
- Mazin S. Yousif,
- Chita R. Das
Packet-based on-chip networks are increasingly being adopted in complex System-on-Chip (SoC) designs supporting numerous homogeneous and heterogeneous functional blocks. These Network-on-Chip (NoC) architectures are required to not only provide ultra-...
The BlackWidow High-Radix Clos Network
This paper describes the radix-64 folded-Clos network of the Cray BlackWidow scalable vector multiprocessor. We describe the BlackWidow network which scales to 32K processors with a worst-case diameter of seven hops, and the underlying high-radix router ...
Memory Model = Instruction Reordering + Store Atomicity
We present a novel framework for defining memory models in terms of two properties: thread-local Instruction Reordering axioms and Store Atomicity, which describes inter-thread communication via memory. Most memory models have the store atomicity ...
Conditional Memory Ordering
Conventional relaxed memory ordering techniques follow a proactive model: at a synchronization point, a processor makes its own updates to memory available to other processors by executing a memory barrier instruction, ensuring that recent writes have ...
Architectural Semantics for Practical Transactional Memory
- Austen McDonald,
- JaeWoong Chung,
- Brian D. Carlstrom,
- Chi Cao Minh,
- Hassan Chafi,
- Christos Kozyrakis,
- Kunle Olukotun
Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software ...
Ensemble-level Power Management for Dense Blade Servers
One of the key challenges for high-density servers (e.g., blades) is the increased costs in addressing the power and heat density associated with compaction. Prior approaches have mainly focused on reducing the heat generated at the level of an ...
Techniques for Multicore Thermal Management: Classification and New Exploration
Power density continues to increase exponentially with each new technology generation, posing a major challenge for thermal management in modern processors. Much past work has examined microarchitectural policies for reducing total chip power, but these ...
SODA: A Low-power Architecture For Software Radio
- Yuan Lin,
- Hyunseok Lee,
- Mark Woh,
- Yoav Harel,
- Scott Mahlke,
- Trevor Mudge,
- Chaitali Chakrabarti,
- Krisztian Flautner
The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time consuming to design and difficult ...
An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors
This paper presents a high-availability system architecture called INDRA (an INtegrated framework for Dependable and Revivable Architecture) that enhances a multicore processor (or CMP) with novel security and fault recovery mechanisms. INDRA represents ...
Multiple Instruction Stream Processor
- Richard A. Hankins,
- Gautham N. Chinya,
- Jamison D. Collins,
- Perry H. Wang,
- Ryan Rakvic,
- Hong Wang,
- John P. Shen
Microprocessor design is undergoing a major paradigm shift towards multi-core designs, in anticipation that future performance gains will come from exploiting thread-level parallelism in the software. To support this trend, we present a novel processor ...
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
- Feihui Li,
- Chrysostomos Nicopoulos,
- Thomas Richardson,
- Yuan Xie,
- Vijaykrishnan Narayanan,
- Mahmut Kandemir
Long interconnects are becoming an increasingly important problem from both power and performance perspectives. This motivates designers to adopt on-chip network-based communication infrastructures and three-dimensional (3D) designs where multiple ...
Slackened Memory Dependence Enforcement: Combining Opportunistic Forwarding with Decoupled Verification
An efficient mechanism to track and enforce memory dependences is crucial to an out-of-order microprocessor. The conventional approach of using cross-checked load queue and store queue, while very effective in earlier processor incarnations, suffers ...
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Level one cache normally resides on a processor's critical path, which determines the clock frequency. Direct-mapped caches exhibit fast access time but poor hit rates compared with same sized set-associative caches due to nonuniform accesses to the ...
A Case for MLP-Aware Cache Replacement
Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory accesses concurrently. The notion of generating and servicing long-latency cache misses in parallel is called Memory Level Parallelism (MLP). MLP is not ...
Improving Cost, Performance, and Security of Memory Encryption and Authentication
Protection from hardware attacks such as snoopers and mod chips has been receiving increasing attention in computer architecture. This paper presents a new combined memory encryption/authentication scheme. Our new split counters for counter-mode ...
A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching
We present and evaluate an architecture for high-throughput pattern matching of regular expressions. Our approach matches multiple patterns concurrently, responds rapidly to changes in the pattern set, and is well suited for synthesis in an ASIC or FPGA. ...
Chisel: A Storage-efficient, Collision-free Hash-based Network Processing Architecture
Longest Prefix Matching (LPM) is a fundamental part of various network processing tasks. Previously proposed approaches for LPM result in prohibitive cost and power dissipation (TCAMs) or in large memory requirements and long lookup latencies (tries), ...
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads
Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range in size from hundreds to several thousand dynamic instructions and have ...
Bulk Disambiguation of Speculative Threads in Multiprocessors
Transactional Memory (TM), Thread-Level Speculation (TLS), and Checkpointed multiprocessors are three popular architectural techniques based on the execution of multiple, cooperating speculative threads. In these environments, correctly maintaining data ...
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
The key to high performance in Simultaneous Multithreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential ...
Spatial Memory Streaming
Prior research indicates that there is much spatial variation in applications' memory access patterns. Modern memory systems, however, use small fixed-size cache blocks and as such cannot exploit the variation. Increasing the block size would not only ...
Cooperative Caching for Chip Multiprocessors
This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through ...
Reducing Startup Time in Co-Designed Virtual Machines
A Co-Designed Virtual Machine allows designers to implement a processor via a combination of hardware and software. Dynamic binary translation converts code written for a conventional (legacy) ISA into optimized code for an underlying implementation-...
TRAP-Array: A Disk Array Architecture Providing Timely Recovery to Any Point-in-time
RAID architectures have been used for more than two decades to recover data upon disk failures. Disk failure is just one of the many causes of damaged data. Data can be damaged by virus attacks, user errors, defective software/firmware, hardware faults, ...