DOI: 10.5555/1099547
MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
2005 Proceeding
Publisher:
  • IEEE Computer Society
  • 1730 Massachusetts Ave., NW Washington, DC
  • United States
Conference:
Micro-38: The 38th Annual IEEE/ACM International Symposium on Microarchitecture Barcelona Spain November 12 - 16, 2005
ISBN:
978-0-7695-2440-5
Published:
12 November 2005
Abstract

No abstract available.

Article
38th Annual IEEE/ACM International Symposium on Microarchitecture - Title Page
Article
Message from the General Chairs
Article
Message from the Program Co-Chairs
Article
The Cell Processor Architecture

This talk will present the Cell processor, jointly developed by the STI (Sony-Toshiba-IBM) partnership. Cell is a non-homogeneous chip multiprocessor intended for general-purpose applications but with a particular emphasis on multimedia performance. The ...

Article
How to Fake 1000 Registers

Large numbers of logical registers can improve performance by allowing fast access to multiple subroutine contexts (register windows) and multiple thread contexts (multithreading). Support for both of these together requires a multiplicative number of ...

Article
Reducing Instruction Fetch Cost by Packing Instructions into Register Windows

Instruction packing is a combination compiler/architectural approach that allows for decreased code size, reduced power consumption and improved performance. The packing is obtained by placing frequently occurring instructions into an Instruction ...

Article
Efficient Use of Invisible Registers in Thumb Code

The ARM processor is a dual width ISA processor that provides a 16-bit Thumb instruction set in addition to the 32-bit ARM instruction set. The compromises made in designing the Thumb instruction set lead to significantly increased instruction counts. ...

Article
Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution

Predicated execution has been used to reduce the number of branch mispredictions by eliminating hard-to-predict branches. However, the additional instruction overhead and additional data dependencies due to predicated execution sometimes offset the ...

Article
A Criticality Analysis of Clustering in Superscalar Processors

Clustered machines partition hardware resources to circumvent the cycle time penalties incurred by large, monolithic structures. This partitioning introduces a long inter-cluster forwarding latency and the potential for load imbalance, both of which ...

Article
Incremental Commit Groups for Non-Atomic Trace Processing

We introduce techniques to support efficient non-atomic execution of very long traces on a new binary translation based, x86-64 compatible VLIW microprocessor. Incrementally committed long traces significantly reduce wasted computations on exception ...

Article
Pinot: Speculative Multi-threading Processor Architecture Exploiting Parallelism over a Wide Range of Granularities

We propose a speculative multi-threading processor architecture called Pinot. Pinot exploits parallelism over a wide range of granularities without modifying program sources. Since exploitation of fine-grain parallelism suffers from limits of ...

Article
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor

Data prefetching via helper threading has been extensively investigated on Simultaneous Multi-Threading (SMT) or Virtual Multi-Threading (VMT) architectures. Although reportedly large cache latency can be hidden by helper threads at runtime, most ...

Article
Automatic Thread Extraction with Decoupled Software Pipelining

Until recently, a steadily rising clock rate and other uniprocessor microarchitectural improvements could be relied upon to consistently deliver increasing performance for a wide range of applications. Current difficulties in maintaining this trend have ...

Article
Exploiting Vector Parallelism in Software Pipelined Loops

An emerging trend in processor design is the addition of short vector instructions to general-purpose and embedded ISAs. Frequently, these extensions are employed using traditional vectorization technology first developed for supercomputers. In contrast, ...

Article
Continuous Path and Edge Profiling

Microarchitectures increasingly rely on dynamic optimization to improve performance in ways that are difficult or impossible for ahead-of-time compilers. Dynamic optimizers in turn require continuous, portable, low cost, and accurate control-flow ...

Article
Improving Region Selection in Dynamic Optimization Systems

The performance of a dynamic optimization system depends heavily on the code it selects to optimize. Many current systems follow the design of HP Dynamo and select a single interprocedural path, or trace, as the unit of code optimization and code ...

Article
The Future Evolution of High-Performance Microprocessors

The evolution of high-performance microprocessors has reached several significant inflection points. First, the marginal utility of additional single-core complexity is now rapidly diminishing due to a number of factors. The increase in instructions per ...

Article
Scalable Store-Load Forwarding via Store Queue Index Prediction

Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths required by wide-issue, large window processors. In this work, we improve SQ ...

Article
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding

This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculatively and out-of-order prior to resolving their dependences. Whereas the ...

Article
Store Memory-Level Parallelism Optimizations for Commercial Applications

This paper studies the impact of off-chip store misses on processor performance for modern commercial applications. The performance impact of off-chip store misses is largely determined by the extent of their overlap with other off-chip cache misses. ...

Article
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors

We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must: detect and correct errors, diagnose hard faults at the field ...

Article
uComplexity: Estimating Processor Design Effort

Microprocessor design complexity is growing rapidly. As a result, current development costs for top of the line processors are staggering, and are doubling every 4 years. As we design ever larger and more complex processors, it is becoming increasingly ...

Article
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System

Scheduling algorithms used in compilers traditionally focus on goals such as reducing schedule length and register pressure or producing compact code. In the context of a hardware synthesis system where the schedule is used to determine various ...

Article
Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns

While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel technique, address-value delta (AVD) ...

Article
Cherry-MP: Correctly Integrating Checkpointed Early Resource Recycling in Chip Multiprocessors

Checkpointed Early Resource Recycling (Cherry) is a recently-proposed micro-architectural technique that aims at improving critical resource utilization by performing aggressive resource recycling decoupled from instruction retirement, using a ...

Article
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing

As more data value speculation mechanisms are being proposed to speed up processors, there is growing pressure on the critical processor structures that must buffer the state of the speculative instructions. A scalable solution is to checkpoint the ...

Article
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance

Dynamic voltage and frequency scaling (DVFS) is an effective technique for controlling microprocessor energy and performance. Existing DVFS techniques are primarily based on hardware, OS time interrupts, or static-compiler techniques. However, ...

Article
Thermal Management of On-Chip Caches Through Power Density Minimization

Various architectural power reduction techniques have been proposed for on-chip caches in the last decade. However, these techniques mostly ignore the effects of temperature on the power consumption. In this paper, first we show that these power ...

Article
Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines

Power density is a growing problem in high-performance processors in which small, high-activity resources overheat. Two categories of techniques, temporal and spatial, can address power density in a processor. Temporal solutions slow computation and ...

Acceptance Rates

MICRO 38 Paper Acceptance Rate 29 of 147 submissions, 20%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%
Year       Submitted  Accepted  Rate
MICRO-48         283        61   22%
MICRO-47         279        53   19%
MICRO-46         239        39   16%
MICRO 41         210        40   19%
MICRO 40         166        35   21%
MICRO 39         174        42   24%
MICRO 38         147        29   20%
MICRO 37         158        29   18%
MICRO 36         134        35   26%
MICRO 33         110        31   28%
MICRO 32         131        27   21%
MICRO 31         108        28   26%
MICRO 30         103        35   34%
Overall        2,242       484   22%