- Sponsor: SIGARCH
A model and an architecture for a relational knowledge base
A relational knowledge base model and an architecture which manipulates the model are presented. An item stored in the relational knowledge base is called a term. A unification operation on terms in the relational knowledge base is used as the retrieval ...
Implementation and evaluation of a list-processing-oriented data flow machine
The architecture of a data flow machine, called DFM, is developed for parallel list processing. The DFM can maximally exploit parallelism inherent in list processing, due to its ultra-multi-processing mechanism, packet communication-based parallel and ...
A new string search hardware architecture for VLSI
This paper presents a new architecture for practical string search hardware design. This architecture is based on the finite state automaton design concept using a character control charge transfer model. The resultant hardware is a set of programmable ...
Parallel algorithms and architectures for rule-based systems
Rule-based systems, on the surface, appear to be capable of exploiting large amounts of parallelism—it is possible to match each rule to the data memory in parallel. In practice, however, we show that the speed-up from parallelism is quite limited, less ...
Concert: design of a multiprocessor development system
Concert is a shared-memory multiprocessor testbed intended to facilitate experimentation with parallel programs and programming languages. It consists of up to eight clusters, with 4-8 processors in each cluster. The processors in each cluster ...
Memory requirements for balanced computer architectures
One particular result is that to balance an array of p linearly connected PEs for performing matrix computations such as matrix multiplication and matrix triangularization, the size of each PE's local memory must grow linearly with p. Thus, the larger ...
Graph allocation in static dataflow systems
One of the most important considerations for a dataflow multiprocessor is the algorithm by which the nodes of a program graph are allocated for execution to its processors. In the case of the static type of architecture one must consider pipelining as ...
Software implementation of a recursive fault tolerance algorithm on a network of computers
RAFT is a recursive algorithm for fault tolerance that uses a combination of dynamic space and time redundancy techniques for detecting faulty processors and recovering from errors. U* is a multicomputer testbed consisting of a network of AT&T 3B2 ...
Microprogrammable processor for object-oriented architecture
An advanced microprocessor has been developed for the high performance execution of object oriented language programs. In object oriented languages, improvement of frequent or complex operations such as dynamic type checking, procedure calls, and ...
An instruction fetch unit for a graph reduction machine
The G-machine provides architecture support for the evaluation of functional programming languages by graph reduction. This paper describes an instruction fetch unit for such an architecture that provides a high throughput of instructions, low latency ...
Fast object-oriented procedure calls: lessons from the Intel 432
As modular programming grows in importance, the efficiency of procedure calls assumes an ever more critical role in system performance. Meanwhile, software designers are becoming more aware of the benefits of object-oriented programming in structuring ...
On coupling many small systems for transaction processing
The prospect of coupling a large number of small inexpensive microprocessor based systems to deliver the performance of a large transaction processing system at lower cost has not been realized, to date. Inter-system interference, multi-system coupling ...
Performance measurement of paging behavior in multiprogramming systems
This paper presents empirical results on the performance of CD, a compiler directed memory management policy, and the Working Set policy in a multiprogramming system. A description of the multiprogramming model used in the experiments is also presented. ...
ATUM: a new technique for capturing address traces using microcode
Trace-driven simulation is often used in the design of computer systems, especially caches and translation lookaside buffers. Capturing address traces to drive such simulations has been problematic, often involving 1000:1 software overhead to trace a ...
Experimenting with EPILOG: some results and preliminary conclusions
The EPILOG language and model of computation are briefly described, together with four multiprocessor architectures on which it is proposed to run the model: a form of chordal network and a slight variant of Wu and Feng's Baseline and Reverse Baseline ...
A unification processor based on a uniformly structured cellular hardware
In this paper, an implementation of unification using a systolic-like method is presented for a VLSI-oriented Prolog machine. Not pointers but a line of symbols and the arity of each symbol are used to express the structure of terms on a uniformly ...
The architecture and preliminary evaluation results of the experimental parallel inference machine PIM-D
A parallel inference machine based on the dataflow model and the mechanisms to support two types of logic programming languages are presented. The machine is constructed from multiple processing elements and structure memories interconnected through a ...
An efficient routing control for the SIGMA network Σ(4)
When processing vectors on SIMD computers, the interconnection network may become the performance bottleneck if it lacks an efficient routing control unit. In the past, many multistage networks have been designed, but general algorithms to control ...
REYSM, a high performance, low power multi-processor bus
In order to build lower cost multimicroprocessor systems, a narrow synchronous bus (15 active lines) is proposed. It multiplexes address and data on 8 bits, and arbitrates in two pipelined cycles on four lines. Due to the 20 to 40 MHz bus clock, and ...
The extra stage gamma network
The augmented data manipulator (ADM), inverse augmented data manipulator (IADM), and the gamma network are based on the Plus-Minus-2^i connection patterns. In such a network there exist multiple paths to connect a source S to a destination D except when ...
Evaluation of the FACOM ALPHA Lisp machine
The FACOM ALPHA is the first and only commercially dedicated processor for Lisp and Prolog manufactured in Japan. This paper discusses the evaluation of the FACOM ALPHA for Lisp execution when compared with a general-purpose computer. The CPU use rate ...
An architecture for efficient Lisp list access
In this paper, we present a Lisp machine architecture that supports efficient list manipulation. This Lisp architecture is organized as two processing units: a List Processor (LP), that performs all list related operations and manages the list memory, ...
A functional level simulation engine of MAN-YO: a special purpose parallel machine for logic design automation
The architecture of a prototype functional level simulator element of a massively parallel machine (MAN-YO) designed for logic design automation is presented. At the functional level, hardware systems are described in a hardware description language, FDL. ...
Exploiting parallelism in a switch-level simulation machine
The parallelism inherent in actual circuits suggests that this parallelism might be exploited in a switch-level simulation machine, in order to reduce total simulation time. This paper explores the extent to which this parallelism exists and the extent ...
A hardware accelerator for speech recognition algorithms
This paper describes two custom architectures tailored to a speech recognition beam search algorithm. Both architectures have been simulated using real data and the results of the simulation are presented. The paper also describes the design process of ...
Evaluation of a prototype data flow processor of the SIGMA-1 for scientific computations
A processing element and a structure element of the data flow computer SIGMA-1 for scientific computations are now operational. The elements are evaluated for several benchmark programs. For efficient execution of loop constructs, the sticky token mechanism ...
Stored data structures on the Manchester dataflow machine
Experience with the Manchester Dataflow Machine has highlighted the importance of efficient handling of stored data structures in a practical parallel machine. It has proved necessary to add a special-purpose structure store to the machine, and this ...
A scalable dataflow structure store
A design for a highly parallel data structure store for the prototype Manchester Dataflow Computer is presented. The main design objective is to allow all storage functions to be performed concurrently. The functions include space allocation and garbage ...
AT^2 = O(N log^4 N), T = O(log N) fast Fourier transform in a light connected 3-dimensional VLSI
We can perform an N-point FFT with time performance T = O(log N) and area-time performance AT^2 = O(N log^4 N), by using the 3-dimensional VLSI system which is optically interconnected. This performance exceeds the theoretical lower bound of the area-time ...
Modular architecture for high performance implementation of FFT algorithm
The paper presents two new versions of the FFT algorithm. Based on these versions, a new VLSI-oriented architecture for implementing the FFT algorithm is introduced. It consists of a homogeneous structure of processing elements. The structure has a ...
Index Terms
- Proceedings of the 13th annual international symposium on Computer architecture