- Sponsor: SIGARCH
General Co-Chair's Message
Program Chair's Message
Committees
Reviewers
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
- Michael Bedford Taylor
- Walter Lee
- Jason Miller
- David Wentzlaff
- Ian Bratt
- Ben Greenwald
- Henry Hoffmann
- Paul Johnson
- Jason Kim
- James Psota
- Arvind Saraf
- Nathan Shnidman
- Volker Strumpen
- Matt Frank
- Saman Amarasinghe
- Anant Agarwal
This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running ...
Evaluating the Imagine Stream Architecture
This paper describes an experimental evaluation of the prototype Imagine stream processor. Imagine [Imagine: Media processing with streams] is a stream processor that employs a two-level register hierarchy with 9.7 Kbytes of local register file capacity ...
Field-testing IMPACT EPIC research results in Itanium 2
Explicitly-Parallel Instruction Computing (EPIC) provides architectural features, including predication and explicit control speculation, intended to enhance the compiler's ability to expose instruction-level parallelism (ILP) in control-intensive programs. ...
Wire Delay is Not a Problem for SMT (In the Near Future)
Previous papers have shown that the slow scaling of wire delays compared to logic delays will prevent superscalar performance from scaling with technology. In this paper we show that the optimal pipeline for superscalar becomes shallower with technology, ...
The Vector-Thread Architecture
- Ronny Krashinsky
- Christopher Batten
- Mark Hampton
- Steve Gerding
- Brian Pharris
- Jared Casper
- Krste Asanovic
The vector-thread (VT) architectural paradigm unifies the vector and multithreaded compute models. The VT abstraction provides the programmer with a control processor and a vector of virtual processors (VPs). The control processor can use vector-fetch ...
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
A single-ISA heterogeneous multi-core architecture is a chip multiprocessor composed of cores of varying size, performance, and complexity. This paper demonstrates that this architecture can provide significantly higher performance in the same area than a ...
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
The performance of memory-bound commercial applications such as databases is limited by increasing memory latencies. In this paper, we show that exploiting memory-level parallelism (MLP) is an effective approach for improving the performance of these ...
Memory Ordering: A Value-Based Approach
Conventional out-of-order processors employ a multi-ported, fully-associative load queue to guarantee correct memory reference order both within a single thread of execution and across threads in a multiprocessor system. As improvements in process ...
Transactional Memory Coherence and Consistency
- Lance Hammond
- Vicky Wong
- Mike Chen
- Brian D. Carlstrom
- John D. Davis
- Ben Hertzberg
- Manohar K. Prabhu
- Honggo Wijaya
- Christos Kozyrakis
- Kunle Olukotun
In this paper, we propose a new shared memory model: Transactional Memory Coherence and Consistency (TCC). TCC provides a model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference ...
TSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model
In this paper, we describe TSOtool, a program to check the behavior of the memory subsystem in a shared memory multiprocessor. TSOtool runs pseudo-randomly generated programs with data races on a system compliant with the Total Store Order (TSO) memory ...
SMTp: An Architecture for Next-generation Scalable Multi-threading
We introduce the SMTp architecture, an SMT processor augmented with a coherence protocol thread context, that together with a standard integrated memory controller can enable the design of (among other possibilities) scalable cache-coherent hardware ...
A Formal Approach to Frequent Energy Adaptations for Multimedia Applications
Much research has recently been done on adapting architectural resources of general-purpose processors to save energy at the cost of increased execution time. This work examines adaptation control algorithms for such processors running real-time multimedia ...
Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor
- John Oliver
- Ravishankar Rao
- Paul Sultana
- Jedidiah Crandall
- Erik Czernikowski
- Leslie W. Jones IV
- Diana Franklin
- Venkatesh Akella
- Frederic T. Chong
We present Synchroscalar, a tile-based architecture for embedded processing that is designed to provide the flexibility of DSPs while approaching the power efficiency of ASICs. We achieve this goal by providing high parallelism and voltage scaling while ...
Power Awareness through Selective Dynamically Optimized Traces
We present the PARROT concept that seeks to achieve higher performance with reduced energy consumption through gradual optimization of frequently executed code traces. The PARROT microarchitectural framework integrates trace caching, dynamic optimizations ...
X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs
RAID storage arrays often possess gigabytes of RAM for caching disk blocks. Currently, most RAID systems use LRU or LRU-like policies to manage these caches. Since these array caches do not recognize the presence of file system buffer caches, they ...
Low-Latency Virtual-Channel Routers for On-Chip Networks
The on-chip communication requirements of many systems are best served through the deployment of a regular chip-wide network. This paper presents the design of a low-latency on-chip network router for such applications. We remove control overheads (routing ...
Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism
A new and efficient mechanism to tolerate failures in interconnection networks for parallel and distributed computers, denoted as Immunet, is presented in this work. In the presence of failures, Immunet automatically reacts with a hardware reconfiguration ...
Adaptive Cache Compression for High-Performance Processors
Modern processors use two or more levels of cache memories to bridge the rising disparity between processor and memory speeds. Compression can improve cache performance by increasing effective cache capacity and eliminating misses. However, decompressing ...
iWatcher: Efficient Architectural Support for Software Debugging
Recent impressive performance improvements in computer architecture have not led to significant gains in ease of debugging. Software debugging often relies on inserting run-time software checks. In many cases, however, it is hard to find the root cause of a ...
From Sequences of Dependent Instructions to Functions: An Approach for Improving Performance without ILP or Speculation
In this article, we present an approach for improving the performance of sequences of dependent instructions. We observe that many sequences of instructions can be interpreted as functions. Unlike sequences of instructions, functions can be translated ...
Prophet/Critic Hybrid Branch Prediction
This paper introduces the prophet/critic hybrid conditional branch predictor, which has two component predictors that play the role of either prophet or critic. The prophet is a conventional predictor that uses branch history to predict the direction of the ...
Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor
Transient faults due to neutron and alpha particle strikes pose a significant obstacle to increasing processor transistor counts in future technologies. Although fault rates of individual transistors may not rise significantly, incorporating more ...
The Case for Lifetime Reliability-Aware Microprocessors
Ensuring long processor lifetimes by limiting failures due to wear-out related hard errors is a critical requirement for all microprocessor manufacturers. We observe that continuous device scaling and increasing temperatures are making lifetime reliability ...
Exploiting Resonant Behavior to Reduce Inductive Noise
Inductive noise in high-performance microprocessors is a reliability issue caused by variations in processor current (di/dt) which are converted to supply-voltage glitches by impedances in the power-supply network. Inductive noise has been addressed ...
Use-Based Register Caching with Decoupled Indexing
Wide, deep pipelines need many physical registers to hold the results of in-flight instructions. Simultaneously, high clock frequencies prohibit using large register files and bypass networks without a significant performance penalty. Previously proposed ...
A Content Aware Integer Register File Organization
A register file is a critical component of a modern superscalar processor. It has a large number of entries and read/write ports in order to enable high levels of instruction parallelism. As a result, the register file's area, access time, and energy ...