Performance analysis of a broadcast communications protocol
The Trans protocol is a communications protocol that exploits the broadcast capability of local area networks. Classical Markov models and queueing theory are used to analyze the performance of components of this protocol, but cannot be applied directly ...
An analytical model of operating system protocol processing including effects of multiprogramming
We model the limited buffer queueing process that occurs within the UNIX operating system's protocol processing layers. Our model accounts for the effects of user process multiprogramming and preemptive, priority scheduling of interrupt, operating ...
Load sharing in limited access distributed systems
In this paper we examine dynamic load sharing in limited access distributed systems. In this class of distributed systems not all servers are accessible to all sources, and there exist many different accessibility topologies. We focus our attention on ...
Scheduling periodic and aperiodic tasks in hard real-time computing systems
Scheduling periodic and aperiodic tasks to meet their time constraints has been an important issue in the design of real-time computing systems. Usually, the task scheduling algorithms in such systems must satisfy the deadlines of periodic tasks and ...
An approach to detecting changes in the factors affecting the performance of computer systems
Resolving intermittent performance problems in computer systems is made easier by pinpointing when a change occurs in the system's performance-determining factors (e.g., workload composition, configuration). Since we often lack direct measurements of ...
A synthetic workload model for a distributed system file server
The accuracy of the results of any performance study depends largely on the quality of the workload model driving it. Not surprisingly then, workload modelling is an area of great interest to those involved in the study of computer system performance. ...
A Markov chain approximation for the analysis of banyan networks
This paper analyzes the delay suffered by messages in a clocked, packet-switched, square Banyan network with k x k output-buffered switches by approximating the flow processes in the network with Markov chains. We recursively approximate the departure ...
Performance analysis of finite-buffered multistage interconnection networks with a general traffic pattern
We present an analytical model for evaluating the performance of finite-buffered packet switching multistage interconnection networks using blocking switches under any general traffic pattern. Most of the previous research work has assumed unbuffered, ...
A model for estimating trace-sample miss ratios
Unknown references, also known as cold-start misses, arise during trace-driven simulation of uniprocessor caches because of the unknown initial conditions. Accurately estimating the miss ratio of unknown references, denoted by μ, is particularly ...
Experience with mean value analysis model for evaluating shared bus, throughput-oriented multiprocessors
We report on our experience with the accuracy of mean value analysis analytical models for evaluating shared bus multiprocessors operating in a throughput-oriented environment. Having developed separate models for multiprocessors with circuit switched ...
Performance analysis of Time Warp with homogeneous processors and exponential task times
The behavior of n interacting processors synchronized by the "Time Warp" protocol is analyzed using a discrete state continuous time Markov chain model. The performance and dynamics of the processes are analyzed under the following assumptions: ...
On subcube dependability in a hypercube
In this paper, we present an analytical model for computing the dependability of hypercube systems. The model, referred to as task-based dependability (TBD), is developed under the assumption that a task needs at least an m-cube (m < n) in an n-cube ...
The impact of operating system scheduling policies and synchronization methods on the performance of parallel applications
Shared-memory multiprocessors are frequently used as compute servers with multiple parallel applications executing at the same time. In such environments, the efficiency of a parallel application can be significantly affected by the operating system ...
Processor-pool-based scheduling for large-scale NUMA multiprocessors
Large-scale Non-Uniform Memory Access (NUMA) multiprocessors are gaining increased attention due to their potential for achieving high performance through the replication of relatively simple components. Because of the complexity of such systems, ...
Analysis of task migration in shared-memory multiprocessor scheduling
In shared-memory multiprocessor systems it may be more efficient to schedule a task on one processor than on another. Due to the inevitability of idle processors in these environments, there exists an important tradeoff between keeping the workload ...
Analytical modelling of a hierarchical buffer for a data sharing environment
In a data sharing environment, where a number of loosely coupled computing nodes share a common storage subsystem, the effectiveness of a private buffer at each node is limited due to the multi-system invalidation effect, particularly under a non-...
Performance analysis of concurrent-read exclusive-write
We analyze the concurrent-read exclusive-write protocol for access to a shared resource, such as occurs in database and distributed operating systems. Readers arrive according to a Poisson process and acquire shareable, i.e., non-exclusive, locks which, ...
Performance measurement of a parallel Input/Output system for the Intel iPSC/2 Hypercube
The Intel Concurrent File System (CFS) for the iPSC/2 hypercube is one of the first production file systems to utilize the declustering of large files across numbers of disks to improve I/O performance. The CFS also makes use of dedicated I/O nodes, ...
Performance of a disk array prototype
The RAID group at U.C. Berkeley recently built a prototype disk array. This paper examines the performance limits of each component of the array using SCSI bus traces, Sprite operating system traces and user programs. The array performs successfully for ...
Performance of a mirrored disk in a real-time transaction system
Disk mirroring has found widespread use in computer systems as a method for providing fault tolerance. In addition to increasing reliability, a mirrored disk can also reduce I/O response time by supporting the execution of parallel I/O requests. The ...
Instrumentation for a massively parallel MIMD application
This paper describes an application implemented on a simulated machine called Horizon. One purpose of this study is to investigate some of the features of a possible future machine (or class of machines) with a view toward deciding, early on in the ...
MTOOL: a method for detecting memory bottlenecks
This paper presents a new, relatively inexpensive method for detecting regions (e.g. loops and procedures) in a program where the memory hierarchy is performing poorly. By observing where actual measured execution time differs from the time predicted ...
Implementing stack simulation for highly-associative memories
Prior to this work, all implementations of stack simulation [MGS70] required more than linear time to process an address trace. In particular these implementations are often slow for highly-associative memories and traces with poor locality, as can be ...
Performance analysis case study (abstract): application of experimental design & statistical data analysis techniques
A common requirement of computer vendors' competitive performance analysis departments is to measure and report on the performance characteristics of another vendor's system. In many cases the amount of prior knowledge concerning the competitor's system ...
Measurements of the paging behavior of UNIX
This paper analyzes measurements of paging activity from several different versions of UNIX. We set out to characterize paging activity by first taking measurements of it, and then writing programs to analyze it. In doing so, we were interested in ...
A static and dynamic workload characterization study of the San Diego Supercomputer Center Cray X-MP
The San Diego Supercomputer Center is one of four NSF sponsored national supercomputer centers. Up until January of 1990, its workhorse was a Cray X-MP, which served 2700 researchers from 170 institutions, spanning 44 states. In order to better ...
An experiment on measuring application performance over the Internet
The use of wide area networks (WANs) such as the Internet is growing at a tremendous rate. Such networks hold great promise for new types of distributed applications, which will be widely distributed, highly replicated, intensely interactive, and ...
A parallel branch-and-bound algorithm for MIN-based multiprocessors
A parallel "Decomposite Best-First" search Branch-and-Bound algorithm (pdbsbb) for MIN-based multiprocessor systems is proposed in this paper. A conflict free mapping scheme, known as step-by-step spread, is used to map the algorithm efficiently onto a ...