SIGARCH: Vol 18, No 2SI

Volume 18, Issue 2SI, June 1990. Special Issue: Proceedings of the 17th Annual International Symposium on Computer Architecture

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 0163-5964

Weak ordering—a new definition

Pages 2–14, https://doi.org/10.1145/325096.325100

A memory model for a shared-memory multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An ...
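The definition of sequential consistency quoted above (all accesses appear atomic and in program order) can be checked mechanically on a small litmus test by enumerating every interleaving that preserves each processor's program order. The Python sketch below uses the common store-buffering test. The test, the variable names, and the helper functions are illustrative assumptions, not material from the paper, and the sketch models only sequential consistency, not the paper's weak-ordering definition.

```python
from itertools import combinations

# Store-buffering litmus test (illustrative, not from the paper):
#   P1: x = 1; r1 = y
#   P2: y = 1; r2 = x
P1 = [("store", "x", None), ("load", "y", "r1")]
P2 = [("store", "y", None), ("load", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib, merged = iter(a), iter(b), []
        for i in range(n):
            merged.append(next(ia) if i in slots else next(ib))
        yield merged


def sc_outcomes(a, b):
    """Collect (r1, r2) for every sequentially consistent execution."""
    results = set()
    for order in interleavings(a, b):
        mem = {"x": 0, "y": 0}
        regs = {}
        for op, loc, reg in order:  # each access executes atomically, one at a time
            if op == "store":
                mem[loc] = 1
            else:
                regs[reg] = mem[loc]
        results.add((regs["r1"], regs["r2"]))
    return results


print(sorted(sc_outcomes(P1, P2)))
```

Under sequential consistency the outcome r1 = r2 = 0 never appears: it would require each load to precede the other processor's store while each store precedes its own processor's load, which is a cycle. Hardware that lets loads bypass buffered stores can produce that outcome, which is one reason weaker memory models are defined.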

Memory consistency and event ordering in scalable shared-memory multiprocessors

Pages 15–26, https://doi.org/10.1145/325096.325102

Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge ...

Synchronization with multiprocessor caches

Pages 27–37, https://doi.org/10.1145/325096.325107

Introducing private caches in bus-based shared memory multiprocessors leads to the cache consistency problem since there may be multiple copies of shared data. However, the ability to snoop on the bus coupled with the fast broadcast capability allows ...

Dynamic processor allocation in hypercube computers

Pages 40–49, https://doi.org/10.1145/325096.325110

Fully recognizing various subcubes in a hypercube computer efficiently is nontrivial due to the specific structure of the hypercube. We propose a method with much less complexity than the multiple-GC strategy in generating the search space, while ...

A new approach to fast control of r² × r² 3-stage Benes networks of r × r crossbar switches

Pages 50–59, https://doi.org/10.1145/325096.325113

The routing control of Benes networks has proven to be costly. This paper introduces a new approach to fast control of N × N 3-stage Benes networks of r × r crossbar switches as building blocks, where N = r² and r ≥ 2. The new approach consists of ...
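The size of the network follows directly from N = r²: each of the three stages needs N/r = r switches of size r × r. The arithmetic below is a standard count for this configuration, not a result from the paper, and it says nothing about the control algorithm the paper proposes.

```latex
\text{switches per stage} = \frac{N}{r} = r, \qquad
\text{total switches} = 3r, \qquad
\text{crosspoints} = 3 \cdot r \cdot r^{2} = 3r^{3} = 3N^{3/2}
```

For example, r = 4 gives N = 16 with 12 switches and 192 crosspoints, compared with N² = 256 for a single 16 × 16 crossbar.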

Virtual-channel flow control

William J. Dally

Pages 60–68, https://doi.org/10.1145/325096.325115

Network throughput can be increased by dividing the buffer storage associated with each network channel into several virtual channels [DalSei]. Each physical channel is associated with several small queues, virtual channels, rather than a single deep ...
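The idea of splitting one channel's buffer into several small queues can be sketched as a link that, each cycle, forwards a flit from any virtual channel whose downstream buffer has room. A packet blocked on one virtual channel then no longer holds the link idle, as it would at the head of a single deep queue. The class, its fields, the round-robin arbiter, and the credit counters are hypothetical modeling choices, not the design in the paper. Credit return from the downstream router is omitted for brevity.

```python
from collections import deque


class PhysicalChannel:
    """Hypothetical model of one link whose buffer is split into virtual channels.

    Each virtual channel (VC) is a small FIFO. A flit at the head of a VC may
    cross the link only if that VC still has downstream buffer space (a credit).
    """

    def __init__(self, num_vcs, depth):
        self.depth = depth
        self.vcs = [deque() for _ in range(num_vcs)]
        self.downstream_free = [depth] * num_vcs  # credits per VC (never returned here)
        self.next_vc = 0  # round-robin arbitration pointer

    def enqueue(self, vc, flit):
        if len(self.vcs[vc]) >= self.depth:
            raise OverflowError(f"VC {vc} is full")
        self.vcs[vc].append(flit)

    def cycle(self):
        """Send at most one flit over the link this cycle; return (vc, flit) or None."""
        n = len(self.vcs)
        for k in range(n):
            vc = (self.next_vc + k) % n
            if self.vcs[vc] and self.downstream_free[vc] > 0:
                self.downstream_free[vc] -= 1
                self.next_vc = (vc + 1) % n
                return vc, self.vcs[vc].popleft()
        return None


# VC 0 is blocked downstream (no credits), but VC 1 keeps the link busy.
link = PhysicalChannel(num_vcs=2, depth=2)
link.downstream_free[0] = 0
link.enqueue(0, "A0")
link.enqueue(1, "B0")
link.enqueue(1, "B1")
sent = [link.cycle() for _ in range(3)]
print(sent)
```

With a single FIFO per link, flit A0 would sit at the head and block B0 and B1 behind it. Here both B flits cross while A0 waits.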

Supporting systolic and memory communication in iWarp

Pages 70–81, https://doi.org/10.1145/325096.325116

iWarp is a parallel architecture developed jointly by Carnegie Mellon University and Intel Corporation. The iWarp communication system supports two widely used interprocessor communication styles: memory communication and systolic communication. This ...

Monsoon: an explicit token-store architecture

Pages 82–91, https://doi.org/10.1145/325096.325117

Dataflow architectures tolerate long unpredictable communication delays and support generation and coordination of parallel activities directly in hardware, rather than assuming that program mapping will cause these issues to disappear. However, the ...

The K2 parallel processor: architecture and hardware implementation

Pages 92–101, https://doi.org/10.1145/325096.325118

K2 is a distributed-memory parallel processor designed to support a multi-user, multi-tasking, time-sharing operating system and an automatically parallelizing FORTRAN compiler. This paper presents the architecture and the hardware implementation of K2, ...

APRIL: a processor architecture for multiprocessing

Pages 104–114, https://doi.org/10.1145/325096.325119

Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching processor called APRIL with support for fine-grain threads ...

PLUS: a distributed shared-memory system

Pages 115–124, https://doi.org/10.1145/325096.325121

PLUS is a multiprocessor architecture tailored to the fast execution of a single multithreaded process; its goal is to accelerate the execution of CPU-bound applications. PLUS supports shared memory and efficient synchronization. Memory access latency ...

Adaptive software cache management for distributed shared memory architectures

Pages 125–134, https://doi.org/10.1145/325096.325124

An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects. We contend that, in distributed shared memory systems, adaptive cache coherence mechanisms will outperform ...

Big science versus little science—do you have to build it? (panel session)

Page 136, https://doi.org/10.1145/325096.325126

Research can be called big science if projects have numerous researchers, large funding, significant infrastructure, and plans to build complex tools or prototypes. Most experimental physicists practice big science, as do computer architects who build ...

An empirical evaluation of two memory-efficient directory methods

Pages 138–147, https://doi.org/10.1145/325096.325130

This paper presents an empirical evaluation of two memory-efficient directory methods for maintaining coherent caches in large shared memory multiprocessors. Both directory methods are modifications of a scheme proposed by Censier and Feautrier [5] that ...
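The scheme of Censier and Feautrier is usually described as a full-map directory: each memory block keeps one presence bit per cache plus a dirty bit. The sketch below is an illustration under that assumption, not either of the methods evaluated in the paper. It computes how the directory storage grows relative to the data it tracks, which is presumably the cost that memory-efficient variants aim to reduce. The function name and the 16-byte block size are hypothetical.

```python
def full_map_overhead(num_processors, block_bytes):
    """Directory bits per memory block divided by data bits per block.

    Assumes a full-map entry of one presence bit per processor cache
    plus one dirty bit for every memory block.
    """
    entry_bits = num_processors + 1
    return entry_bits / (8 * block_bytes)


for p in (16, 64, 256):
    print(p, f"{full_map_overhead(p, 16):.1%}")
```

Because the entry grows linearly with the processor count while the block size stays fixed, the overhead in this model passes the size of the data itself somewhere between 64 and 256 processors for 16-byte blocks.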

The directory-based cache coherence protocol for the DASH multiprocessor

Pages 148–159, https://doi.org/10.1145/325096.325132

DASH is a scalable shared-memory multiprocessor currently being developed at Stanford's Computer Systems Laboratory. The architecture consists of powerful processing nodes, each with a portion of the shared-memory, connected to a scalable ...

The performance impact of block sizes and fetch strategies

Steven Przybylski

Pages 160–169, https://doi.org/10.1145/325096.325135

This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that given a simple fetch strategy the performance optimal block ...

Performance comparison of load/store and symmetric instruction set architectures

Pages 172–181, https://doi.org/10.1145/325096.325137

Is it true that a Load/Store architecture is both simpler and faster than a Symmetric architecture, or does the Symmetric architecture offer a potential performance advantage that can be realized by the use of additional hardware?

In order to answer it ...

Reducing the cost of branches by using registers

Pages 182–191, https://doi.org/10.1145/325096.325138

In an attempt to reduce the number of operand memory references, many RISC machines have thirty-two or more general-purpose registers (e.g., MIPS, ARM, Spectrum, 88K). Without special compiler optimizations, such as inlining or interprocedural register ...

An investigation of static versus dynamic scheduling

Pages 192–201, https://doi.org/10.1145/325096.325140

VAX vector architecture

Pages 204–215, https://doi.org/10.1145/325096.325145

The VAX Architecture has been extended to include an integrated, register-based vector processor. This extension allows both high-end and low-end implementations and can be supported with only small changes by VAX/VMS and VAX/ULTRIX operating systems. ...

Multiple instruction issue in the NonStop Cyclone processor

Pages 216–226, https://doi.org/10.1145/325096.325147

This paper describes the architecture for issuing multiple instructions per clock in the NonStop Cyclone Processor. Pairs of instructions are fetched and decoded by a dual two-stage prefetch pipeline and passed to a dual six-stage pipeline for ...

Performance of an OLTP application on Symmetry multiprocessor system

Pages 228–238, https://doi.org/10.1145/325096.325149

Sequent's Symmetry Series is a bus-based shared-memory multiprocessor. System performance in an OLTP relational database application was investigated using the TP1 benchmark. System performance was tested with fully-cached benchmarks and with scaled ...

The impact of synchronization and granularity on parallel systems

Pages 239–248, https://doi.org/10.1145/325096.325150

In this paper, we study the impact of synchronization and granularity on the performance of parallel systems using an execution-driven simulation technique. We find that even though there can be a lot of parallelism at the fine grain level, ...

Trace-driven simulations for a two-level cache design in open bus systems

Pages 250–259, https://doi.org/10.1145/325096.325151

Two-level cache hierarchies will be a design issue in future high-performance CPUs. In this paper we evaluate various metrics for data cache designs. We discuss both one- and two-level cache hierarchies. Our target is a new 100+ MIPS CPU, but the ...

Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer

Pages 260–269, https://doi.org/10.1145/325096.325152

This paper presents the performance evaluation, workload characterization and trace driven simulation of a hypercube multicomputer running realistic workloads. Six representative parallel applications were selected as benchmarks. Software monitoring ...

Generation and analysis of very long address traces

Pages 270–279, https://doi.org/10.1145/325096.325153

Existing methods of generating and analyzing traces suffer from a variety of limitations including complexity, inaccuracy, short length, inflexibility, or applicability only to CISC machines. We use a trace generation mechanism based on link-time code ...

Fast Prolog with an extended general purpose architecture

Pages 282–291, https://doi.org/10.1145/325096.325154

Most Prolog machines have been based on specialized architectures. Our goal is to start with a general purpose architecture and determine a minimal set of extensions for high performance Prolog execution. We have developed both the architecture and ...

Architectural support for the management of tightly-coupled fine-grain goals in flat concurrent Prolog

Pages 292–301, https://doi.org/10.1145/325096.325155

We propose architectural support for goal management as part of a special-purpose processor architecture for the efficient execution of Flat Concurrent Prolog. Goal management operations: halt, spawn, suspend and commit are decoupled from goal reduction,...

Balance in architectural design

Pages 302–310, https://doi.org/10.1145/325096.325156

We introduce a performance metric, normalized time, which is closely related to such measures as the area-time product of VLSI theory and the price/performance ratio of advertising literature. This metric captures the idea of a piece of hardware “...

A study of I/O behavior of Perfect Benchmarks on a multiprocessor

Pages 312–321, https://doi.org/10.1145/325096.325157

The I/O behavior of some scientific applications, a subset of Perfect benchmarks, executing on a multiprocessor is studied. The aim of this study is to explore the various patterns of I/O access of large scientific applications and to understand the ...