Proceedings of the 28th annual international symposium on Computer architecture

A relatively small set of static instructions has significant leverage on program execution performance. These problem instructions contribute a disproportionate number of cache misses and branch mispredictions because their behavior cannot be ...

Total Citations: 192
Total Downloads: 838
Last 12 Months: 33
Last 6 weeks: 2

Opening Remarks

Page 4, https://doi.org/10.1145/379240.881395

Total Citations: 0

Speculative precomputation: long-range prefetching of delinquent loads

Jamison D. Collins, Hong Wang, Dean M. Tullsen, Christopher Hughes, Yong-Fong Lee, Dan Lavery, John P. Shen

Pages 14–25, https://doi.org/10.1145/379240.379248

This paper explores Speculative Precomputation, a technique that uses idle thread context in a multithreaded architecture to improve performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future ...

Total Citations: 208
Total Downloads: 1,062
Last 12 Months: 31
Last 6 weeks: 1

Dynamically allocating processor resources between nearby and distant ILP

Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi

Pages 26–37, https://doi.org/10.1145/379240.379249

Modern superscalar processors use wide instruction issue widths and out-of-order execution in order to increase instruction-level parallelism (ILP). Because instructions must be committed in order so as to guarantee precise exceptions, increasing ILP ...

Total Citations: 70
Total Downloads: 607
Last 12 Months: 6
Last 6 weeks: 0

Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

Chi-Keung Luk

Pages 40–51, https://doi.org/10.1145/379240.379250

Hardly predictable data addresses in many irregular applications have rendered prefetching ineffective. In many cases, the only accurate way to predict these addresses is to directly execute the code that generates them. As multithreaded architectures ...

Total Citations: 235
Total Downloads: 837
Last 12 Months: 18
Last 6 weeks: 2

Data prefetching by dependence graph precomputation

Murali Annavaram, Jignesh M. Patel, Edward S. Davidson

Pages 52–61, https://doi.org/10.1145/379240.379251

Data cache misses reduce the performance of wide-issue processors by stalling the data supply to the processor. Prefetching data by predicting the miss address is one way to tolerate the cache miss latencies. But current applications with irregular ...

Total Citations: 141
Total Downloads: 738
Last 12 Months: 26
Last 6 weeks: 2

Concurrency, latency, or system overhead: which has the largest impact on uniprocessor DRAM-system performance?

Vinodh Cuppu, Bruce Jacob

Pages 62–71, https://doi.org/10.1145/379240.379252

Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design space for a DRAM system organization. Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and ...

Total Citations: 53
Total Downloads: 751
Last 12 Months: 21
Last 6 weeks: 0

Focusing processor policies via critical-path prediction

Brian Fields, Shai Rubin, Rastislav Bodík

Pages 74–85, https://doi.org/10.1145/379240.379253

Although some instructions hurt performance more than others, current processors typically apply scheduling and speculation as if each instruction was equally costly. Instruction cost can be naturally expressed through the critical path: if we could ...

Total Citations: 227
Total Downloads: 1,750
Last 12 Months: 60
Last 6 weeks: 3

Automated design of finite state machine predictors for customized processors

Timothy Sherwood, Brad Calder

Pages 86–97, https://doi.org/10.1145/379240.379254

Customized processors use compiler analysis and design automation techniques to take a generalized architectural model and create a specific instance of it which is optimized to a given application or set of applications. These processors offer the ...

Total Citations: 28
Total Downloads: 608
Last 12 Months: 7
Last 6 weeks: 0

Better exploration of region-level value locality with integrated computation reuse and value prediction

Youfeng Wu, Dong-Yuan Chen, Jesse Fang

Pages 98–108, https://doi.org/10.1145/379240.379255

Computation-reuse and value-prediction are two recent techniques for improving microprocessor performance by exploiting value localities. They both aim at breaking the data dependence limit in traditional processors. In this paper, we propose a ...

Total Citations: 31
Total Downloads: 783
Last 12 Months: 4
Last 6 weeks: 1

CryptoManiac: a fast flexible architecture for secure communication

Lisa Wu, Chris Weaver, Todd Austin

Pages 110–119, https://doi.org/10.1145/379240.379256

The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic processing performance to the forefront of high throughput system design. This trend will be further underscored with the widespread ...

Total Citations: 100
Total Downloads: 1,280
Last 12 Months: 21
Last 6 weeks: 4

QoS provisioning in clusters: an investigation of Router and NIC design

Ki Hwan Yum, Eun Jung Kim, Chita R. Das

Pages 120–129, https://doi.org/10.1145/379240.379257

Design of high performance cluster networks (routers) with Quality-of-Service (QoS) guarantees is becoming increasingly important to support a variety of multimedia applications, many of which have real-time constraints. Most commercial routers, which ...

Total Citations: 35
Total Downloads: 545
Last 12 Months: 5
Last 6 weeks: 0

Locality vs. criticality

Roy Dz-ching Ju, Alvin R. Lebeck, Chris Wilkerson, Srikanth T. Srinivasan

Pages 132–143, https://doi.org/10.1145/379240.379258

Current memory hierarchies exploit locality of references to reduce load latency and thereby improve processor performance. Locality based schemes aim at reducing the number of cache misses and tend to ignore the nature of misses. This leads to a ...

Total Citations: 46
Total Downloads: 856
Last 12 Months: 13
Last 6 weeks: 2

Dead-block prediction & dead-block correlating prefetchers

An-Chow Lai, Cem Fide, Babak Falsafi

Pages 144–154, https://doi.org/10.1145/379240.379259

Effective data prefetching requires accurate mechanisms to predict both “which” cache blocks to prefetch and “when” to prefetch them. This paper proposes the Dead-Block Predictors (DBPs), trace-based predictors that accurately identify “when” an L1 data ...

Total Citations: 297
Total Downloads: 1,477
Last 12 Months: 48
Last 6 weeks: 9

Code layout optimizations for transaction processing workloads

Alex Ramirez, Luiz André Barroso, Kourosh Gharachorloo, Robert Cohn, Josep Larriba-Pey, P. Geoffrey Lowney, Mateo Valero

Pages 155–164, https://doi.org/10.1145/379240.379260

Commercial applications such as databases and Web servers constitute the most important market segment for high-performance servers. Among these applications, on-line transaction processing (OLTP) workloads provide a challenging set of requirements for ...

Total Citations: 56
Total Downloads: 644
Last 12 Months: 12
Last 6 weeks: 1

Exploring and exploiting wire-level pipelining in emerging technologies

Michael Thaddeus Niemier, Peter M. Kogge

Pages 166–177, https://doi.org/10.1145/379240.379261

Pipelining is a technique that has long since been considered fundamental by computer architects. However, the world of nanoelectronics is pushing the idea of pipelining to new and lower levels — particularly the device level. How this affects circuits ...

Total Citations: 55
Total Downloads: 463
Last 12 Months: 5
Last 6 weeks: 0

NanoFabrics: spatial computing using molecular electronics

Seth Copen Goldstein, Mihai Budiu

Pages 178–191, https://doi.org/10.1145/379240.379262

The continuation of the remarkable exponential increases in processing power over the recent past faces imminent challenges due in part to the physics of deep-submicron CMOS devices and the costs of both chip masks and future fabrication plants. A ...

Total Citations: 185
Total Downloads: 1,109
Last 12 Months: 46
Last 6 weeks: 5

A simple method for extracting models for protocol code

David Lie, Andy Chou, Dawson Engler, David L. Dill

Pages 192–203, https://doi.org/10.1145/379240.379263

The use of model checking for validation requires that models of the underlying system be created. Creating such models is both difficult and error prone and as a result, verification is rarely used despite its advantages. In this paper, we present a ...

Total Citations: 20
Total Downloads: 498
Last 12 Months: 8
Last 6 weeks: 1

Removing architectural bottlenecks to the scalability of speculative parallelization

Milos Prvulovic, María Jesús Garzarán, Lawrence Rauchwerger, Josep Torrellas

Pages 204–215, https://doi.org/10.1145/379240.379264

Speculative thread-level parallelization is a promising way to speed up codes that compilers fail to parallelize. While several speculative parallelization schemes have been proposed for different machine sizes and types of codes, the results so far ...

Total Citations: 72
Total Downloads: 542
Last 12 Months: 3
Last 6 weeks: 0

Power and energy reduction via pipeline balancing

R. Iris Bahar, Srilatha Manne

Pages 218–229, https://doi.org/10.1145/379240.379265

Minimizing power dissipation is an important design requirement for both portable and non-portable systems. In this work, we propose an architectural solution to the power problem that retains performance while reducing power. The technique, known as ...

Total Citations: 116
Total Downloads: 1,078
Last 12 Months: 21
Last 6 weeks: 3

Energy-effective issue logic

Daniele Folegnani, Antonio González

Pages 230–239, https://doi.org/10.1145/379240.379266

The issue logic of a dynamically-scheduled superscalar processor is a complex mechanism devoted to start the execution of multiple instructions every cycle. Due to its complexity, it is responsible for a significant percentage of the energy consumed by ...

Total Citations: 187
Total Downloads: 1,262
Last 12 Months: 26
Last 6 weeks: 5

Cache decay: exploiting generational behavior to reduce cache leakage power

Stefanos Kaxiras, Zhigang Hu, Margaret Martonosi

Pages 240–251, https://doi.org/10.1145/379240.379268

Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for high-end servers. While the bulk of the power dissipated is dynamic switching power, leakage power is also ...

Total Citations: 362
Total Downloads: 1,219
Last 12 Months: 53
Last 6 weeks: 6

Variability in the execution of multimedia applications and implications for architecture

Christopher J. Hughes, Praful Kaul, Sarita V. Adve, Rohit Jain, Chanik Park, Jayanth Srinivasan

Pages 254–265, https://doi.org/10.1145/379240.379270

Multimedia applications are an increasingly important workload for general-purpose processors. This paper analyzes frame-level execution time variability for several multimedia applications on general-purpose architectures. There are two reasons for ...

Total Citations: 86
Total Downloads: 650
Last 12 Months: 7
Last 6 weeks: 0

Measuring Experimental Error in Microprocessor Simulation

Rajagopalan Desikan, Doug Burger, Stephen W. Keckler

Pages 266–277, https://doi.org/10.1145/379240.565338

We measure the experimental error that arises from the use of non-validated simulators in computer architecture research, with the goal of increasing the rigor of simulation-based studies. We describe the methodology that we used to validate ...

Total Citations: 138
Total Downloads: 690
Last 12 Months: 34
Last 6 weeks: 1

Rapid profiling via stratified sampling

S. Subramanya Sastry, Rastislav Bodík, James E. Smith

Pages 278–289, https://doi.org/10.1145/379240.379273

Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the ...

Total Citations: 31
Total Downloads: 613
Last 12 Months: 11
Last 6 weeks: 1

Author Index

Page 291, https://doi.org/10.1145/379240.881394

Total Citations: 0

Contributors

Per Stenström
Chalmers University of Technology

Acceptance Rates

ISCA '01 Paper Acceptance Rate: 24 of 163 submissions, 15%

Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Year	Submitted	Accepted	Rate
ISCA '22	400	67	17%
ISCA '19	365	62	17%
ISCA '17	322	54	17%
ISCA '13	288	56	19%
ISCA '12	262	47	18%
ISCA '08	259	37	14%
ISCA '06	234	31	13%
ISCA '05	194	45	23%
ISCA '04	217	31	14%
ISCA '03	184	36	20%
ISCA '02	180	27	15%
ISCA '01	163	24	15%
ISCA '99	135	26	19%
Overall	3,203	543	17%
