Proceedings of the 22nd annual international symposium on Computer architecture

ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture

July 1995

1995 Proceeding

Chairman:
David A. Patterson
Univ. of California, Berkeley

Publisher:

Association for Computing Machinery, New York, NY, United States

Conference:

ISCA95: International Conference on Computer Architecture, S. Margherita Ligure, Italy, June 22 - 24, 1995

ISBN:

978-0-89791-698-1

Published:

01 July 1995

Sponsors:

SIGARCH, IEEE-CS\TCCA



Bibliometrics (reflects downloads up to 26 Nov 2024)

Citation Count

5,613

Downloads (6 weeks)

1,100

Downloads (12 months)

5,785

Downloads (cumulative)

37,309


Article

Free

The MIT Alewife machine: architecture and performance

Anant Agarwal,
Ricardo Bianchini,
David Chaiken,
Kirk L. Johnson,
David Kranz,
John Kubiatowicz,
Beng-Hong Lim,
Kenneth Mackenzie,
Donald Yeung

Pages 2–13
https://doi.org/10.1145/223982.223985

Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a prototype implementation of the architecture, ...

Metrics
Total Citations: 238
Total Downloads: 1,281
Last 12 Months: 186
Last 6 weeks: 46

Article

Free

The EM-X parallel computer: architecture and basic performance

Yuetsu Kodama,
Hirohumi Sakane,
Mitsuhisa Sato,
Hayato Yamana,
Shuichi Sakai,
Yoshinori Yamaguchi

Pages 14–23
https://doi.org/10.1145/223982.223987

Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor communication on an execution pipeline with small and simple packets. It ...

Metrics
Total Citations: 35
Total Downloads: 542
Last 12 Months: 95
Last 6 weeks: 6

Article

Free

The SPLASH-2 programs: characterization and methodological considerations

Steven Cameron Woo,
Moriyoshi Ohara,
Evan Torrie,
Jaswinder Pal Singh,
Anoop Gupta

Pages 24–36
https://doi.org/10.1145/223982.223990

The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the ...

Metrics
Total Citations: 2,239
Total Downloads: 5,330
Last 12 Months: 481
Last 6 weeks: 81

Article

Free

Efficient strategies for software-only protocols in shared-memory multiprocessors

Håkan Grahn,
Per Stenström

Pages 38–47
https://doi.org/10.1145/223982.225958

The cost, complexity, and inflexibility of hardware-based directory protocols motivate us to study the performance implications of protocols that emulate directory management using software handlers executed on the compute processors. An important ...

Metrics
Total Citations: 16
Total Downloads: 406
Last 12 Months: 83
Last 6 weeks: 10

Article

Free

Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

Alvin R. Lebeck,
David A. Wood

Pages 48–59
https://doi.org/10.1145/223982.223995

This paper introduces dynamic self-invalidation (DSI), a new technique for reducing cache coherence overhead in shared-memory multiprocessors. DSI eliminates invalidation messages by having a processor automatically invalidate its local copy of a cache ...

Metrics
Total Citations: 114
Total Downloads: 910
Last 12 Months: 146
Last 6 weeks: 19

Article

Free

Boosting the performance of hybrid snooping cache protocols

Fredrik Dahlgren

Pages 60–69
https://doi.org/10.1145/223982.223998

Previous studies of bus-based shared-memory multiprocessors have shown hybrid write-invalidate/write-update snooping protocols to be incapable of providing consistent performance improvements over write-invalidate protocols. In this paper, we analyze ...

Metrics
Total Citations: 26
Total Downloads: 569
Last 12 Months: 78
Last 6 weeks: 18

Article

Free

S-connect: from networks of workstations to supercomputer performance

Andreas G. Nowatzyk,
Michael C. Browne,
Edmund J. Kelly,
Michael Parkin

Pages 71–82
https://doi.org/10.1145/223982.224004

S-Connect is a new high speed, scalable interconnect system that has been developed to support networks of workstations to efficiently share computing resources. It uses off-the-shelf CMOS technology to directly drive fiber-optic systems at speeds ...

Metrics
Total Citations: 17
Total Downloads: 406
Last 12 Months: 100
Last 6 weeks: 17

Article

Free

Destage algorithms for disk arrays with non-volatile caches

Anujan Varma,
Quinn Jacobson

Pages 83–95
https://doi.org/10.1145/223982.224042

In a disk array with a nonvolatile write cache, destages from the cache to the disk are performed in the background asynchronously while read requests from the host system are serviced in the foreground. In this paper, we study a number of algorithms ...

Metrics
Total Citations: 17
Total Downloads: 535
Last 12 Months: 56
Last 6 weeks: 11

Article

Free

Evaluating multi-port frame buffer designs for a mesh-connected multicomputer

Gordon Stoll,
Bin Wei,
Douglas Clark,
Edward W. Felten,
Kai Li,
Patrick Hanrahan

Pages 96–105
https://doi.org/10.1145/223982.224043

Multicomputers can be effectively used for interactive graphics rendering only if there are mechanisms available to rapidly composite and transfer images to an external display device. One method for achieving the necessary bandwidth for this operation ...

Metrics
Total Citations: 5
Total Downloads: 365
Last 12 Months: 68
Last 6 weeks: 15

Article

Free

Are crossbars really dead?: the case for optical multiprocessor interconnect systems

Andreas G. Nowatzyk,
Paul R. Prucnal

Pages 106–115
https://doi.org/10.1145/223982.224364

Crossbar switches are rarely considered for large, scalable multiprocessor interconnect systems because they require O(n²) switching elements, are difficult to control efficiently and are hard to implement once their size becomes too large to fit on one ...

Metrics
Total Citations: 12
Total Downloads: 540
Last 12 Months: 79
Last 6 weeks: 13

Article

Free

Exploring configurations of functional units in an out-of-order superscalar processor

Stéphan Jourdan,
Pascal Sainrat,
Daniel Litaize

Pages 117–125
https://doi.org/10.1145/223982.224366

This study has been carried out in order to determine cost-effective configurations of functional units for multiple-issue out-of-order superscalar processors. The trace-driven simulations were performed on the six integer and the fourteen floating-...

Metrics
Total Citations: 18
Total Downloads: 528
Last 12 Months: 61
Last 6 weeks: 8

Article

Free

Unconstrained speculative execution with predicated state buffering

Hideki Ando,
Chikako Nakanishi,
Tetsuya Hara,
Masao Nakaya

Pages 126–137
https://doi.org/10.1145/223982.224367

Speculative execution is execution of instructions before it is known whether these instructions should be executed. Compiler-based speculative execution has the potential to achieve both a high instruction per cycle rate and high clock rate. Pure ...

Metrics
Total Citations: 17
Total Downloads: 405
Last 12 Months: 65
Last 6 weeks: 14

Article

Free

A comparison of full and partial predicated execution support for ILP processors

Scott A. Mahlke,
Richard E. Hank,
James E. McCormick,
David I. August,
Wen-Mei W. Hwu

Pages 138–150
https://doi.org/10.1145/223982.225965

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support ...

Metrics
Total Citations: 135
Total Downloads: 1,042
Last 12 Months: 225
Last 6 weeks: 77

Article

Free

Implementation trade-offs in using a restricted data flow architecture in a high performance RISC microprocessor

M. Simone,
A. Essen,
A. Ike,
A. Krishnamoorthy,
T. Maruyama,
N. Patkar,
M. Ramaswami,
M. Shebanow,
V. Thirumalaiswamy,
D. Tovey

Pages 151–162
https://doi.org/10.1145/223982.224411

The implementation of a superscalar, speculative execution SPARC-V9 microprocessor incorporating Restricted Data Flow principles required many design trade-offs. Consideration was given to both performance and cost. Performance is largely a function of ...

Metrics
Total Citations: 11
Total Downloads: 556
Last 12 Months: 120
Last 6 weeks: 29

Article

Free

Performance evaluation of the PowerPC 620 microarchitecture

Trung A. Diep,
Christopher Nelson,
John Paul Shen

Pages 163–174
https://doi.org/10.1145/223982.224417

The PowerPC 620™ microprocessor is the most recent and performance leading member of the PowerPC™ family. The 64-bit PowerPC 620 microprocessor employs a two-phase branch prediction scheme, dynamic renaming for all the register files, ...

Metrics
Total Citations: 50
Total Downloads: 975
Last 12 Months: 154
Last 6 weeks: 32

Article

Free

Reducing TLB and memory overhead using online superpage promotion

Theodore H. Romer,
Wayne H. Ohlrich,
Anna R. Karlin,
Brian N. Bershad

Pages 176–187
https://doi.org/10.1145/223982.224419

Modern microprocessors contain small TLBs that maintain a cache of recently used translations. A TLB's coverage is the sum of the number of bytes mapped by each entry. Applications with working sets larger than the TLB coverage will perform poorly due ...

Metrics
Total Citations: 100
Total Downloads: 905
Last 12 Months: 177
Last 6 weeks: 30

Article

Free

Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching

Zheng Zhang,
Josep Torrellas

Pages 188–199
https://doi.org/10.1145/223982.224423

While many parallel applications exhibit good spatial locality, other important codes in areas like graph problem-solving or CAD do not. Often, these irregular codes contain small records accessed via pointers. Consequently, while the former ...

Metrics
Total Citations: 46
Total Downloads: 1,198
Last 12 Months: 77
Last 6 weeks: 6

Article

Free

An efficient, fully adaptive deadlock recovery scheme: DISHA

K. V. Anjan,
Timothy Mark Pinkston

Pages 201–210
https://doi.org/10.1145/223982.224431

This paper presents a simple, efficient and cost effective routing strategy that considers deadlock recovery as opposed to prevention. Performance is optimized in the absence of deadlocks by allowing maximum flexibility in routing. Disha supports true ...

Metrics
Total Citations: 102
Total Downloads: 873
Last 12 Months: 141
Last 6 weeks: 30

Article

Free

Analysis and implementation of hybrid switching

Kang G. Shin,
Stuart W. Daniel

Pages 211–219
https://doi.org/10.1145/223982.224432

The switching scheme of a point-to-point network determines how packets flow through each node, and is a primary element in determining the network's performance. In this paper, we present and evaluate a new switching scheme called hybrid switching. ...

Metrics
Total Citations: 9
Total Downloads: 413
Last 12 Months: 85
Last 6 weeks: 26

Article

Free

Configurable flow control mechanisms for fault-tolerant routing

Binh Vien Dao,
Jose Duato,
Sudhakar Yalamanchili

Pages 220–229
https://doi.org/10.1145/223982.224433

Fault-tolerant routing protocols in modern interconnection networks rely heavily on the network flow control mechanisms used. Optimistic flow control mechanisms such as wormhole routing (WR) realize very good performance, but are prone to deadlock in ...

Metrics
Total Citations: 17
Total Downloads: 460
Last 12 Months: 67
Last 6 weeks: 13

Article

Free

NIFDY: a low overhead, high throughput network interface

Timothy Callahan,
Seth Copen Goldstein

Pages 230–241
https://doi.org/10.1145/223982.224434

In this paper we present NIFDY, a network interface that uses admission control to reduce congestion and ensures that packets are received by a processor in the order in which they were sent, even if the underlying network delivers the packets out of ...

Metrics
Total Citations: 13
Total Downloads: 388
Last 12 Months: 46
Last 6 weeks: 3

Article

Free

Vector multiprocessors with arbitrated memory access

Montse Peiron,
Mateo Valero,
Eduard Ayguadé,
Tomás Lang

Pages 243–252
https://doi.org/10.1145/223982.224435

The high latency of memory accesses is one of the factors that most contribute to reduce the performance of current vector supercomputers. The conflicts that can occur in the memory modules plus the collisions in the interconnection network in the case ...

Metrics
Total Citations: 10
Total Downloads: 393
Last 12 Months: 60
Last 6 weeks: 8

Article

Free

Design of cache memories for multi-threaded dataflow architecture

Krishna M. Kavi,
A. R. Hurson,
Phenil Patadia,
Elizabeth Abraham,
Ponnarasu Shanmugam

Pages 253–264
https://doi.org/10.1145/223982.224436

Cache memories have proven their effectiveness in the von Neumann architecture when localities of reference govern the execution loci of programs. A pure dataflow program, in contrast, contains no locality of reference since the execution sequence is ...

Metrics
Total Citations: 19
Total Downloads: 796
Last 12 Months: 119
Last 6 weeks: 7

Article

Free

Skewed associativity enhances performance predictability

François Bodin,
André Seznec

Pages 265–274
https://doi.org/10.1145/223982.224437

Performance tuning becomes harder as computer technology advances. One of the factors is the increasing complexity of memory hierarchies. Most modern machines now use at least one level of cache memory. To reduce execution stalls, cache misses must be ...

Metrics
Total Citations: 37
Total Downloads: 714
Last 12 Months: 90
Last 6 weeks: 27

Article

Free

A comparative analysis of schemes for correlated branch prediction

Cliff Young,
Nicolas Gloy,
Michael D. Smith

Pages 276–286
https://doi.org/10.1145/223982.224438

Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that categorizes branch prediction schemes by the way in which they partition ...

Metrics
Total Citations: 145
Total Downloads: 1,005
Last 12 Months: 154
Last 6 weeks: 30

Article

Free

Next cache line and set prediction

Brad Calder,
Dirk Grunwald

Pages 287–296
https://doi.org/10.1145/223982.224439

Accurate instruction fetch and branch prediction is increasingly important on today's wide-issue architectures. Fetch prediction is the process of determining the next instruction to request from the memory subsystem. Branch prediction is the process of ...

Metrics
Total Citations: 65
Total Downloads: 1,005
Last 12 Months: 119
Last 6 weeks: 23

Article

Free

A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D

Vijay Karamcheti,
Andrew A. Chien

Pages 298–307
https://doi.org/10.1145/223982.224440

Programming models based on messaging continue to be an important programming model for parallel machines. Messaging costs are strongly influenced by a machine's network interface architecture. We examine the impact of architectural support for ...

Metrics
Total Citations: 46
Total Downloads: 440
Last 12 Months: 69
Last 6 weeks: 15

Article

Free

Optimizing memory system performance for communication in parallel computers

T. Stricker,
T. Gross

Pages 308–319
https://doi.org/10.1145/223982.224442

Communication in a parallel system frequently involves moving data from the memory of one node to the memory of another; this is the standard communication model employed in message passing systems. Depending on the application, we observe a variety of ...

Metrics
Total Citations: 19
Total Downloads: 612
Last 12 Months: 164
Last 6 weeks: 53

Article

Free

Empirical evaluation of the CRAY-T3D: a compiler perspective

Remzi H. Arpaci,
David E. Culler,
Arvind Krishnamurthy,
Steve G. Steinberg,
Katherine Yelick

Pages 320–331
https://doi.org/10.1145/223982.224443

Most recent MPP systems employ a fast microprocessor surrounded by a shell of communication and synchronization logic. The CRAY-T3D provides an elaborate shell to support global-memory access, prefetch, atomic operations, barriers, and block transfers. ...

Metrics
Total Citations: 59
Total Downloads: 586
Last 12 Months: 197
Last 6 weeks: 23

Article

Free

Optimization of instruction fetch mechanisms for high issue rates

Thomas M. Conte,
Kishore N. Menezes,
Patrick M. Mills,
Burzin A. Patel

Pages 333–344
https://doi.org/10.1145/223982.224444

Recent superscalar processors issue four instructions per cycle. These processors are also powered by highly-parallel superscalar cores. The potential performance can only be exploited when fed by high instruction bandwidth. This task is the ...

Metrics
Total Citations: 142
Total Downloads: 1,196
Last 12 Months: 407
Last 6 weeks: 84

Contributors

David A Patterson
Google LLC
- Publication Years: 1975 - 2024
- Publication counts: 298
- Citation count: 36,129
- Available for Download: 153
- Downloads (cumulative): 1,622,757
- Downloads (12 months): 136,381
- Downloads (6 weeks): 40,979
- Average Downloads per Article: 10,606
- Average Citation per Article: 121

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Year	Submitted	Accepted	Rate
ISCA '22	400	67	17%
ISCA '19	365	62	17%
ISCA '17	322	54	17%
ISCA '13	288	56	19%
ISCA '12	262	47	18%
ISCA '08	259	37	14%
ISCA '06	234	31	13%
ISCA '05	194	45	23%
ISCA '04	217	31	14%
ISCA '03	184	36	20%
ISCA '02	180	27	15%
ISCA '01	163	24	15%
ISCA '99	135	26	19%
Overall	3,203	543	17%
