Downsampling Algorithms for Large Sparse Matrices
Mapping of sparse matrices to processors of a parallel system may have a significant impact on the development of sparse-matrix algorithms and, in effect, on their efficiency. We present and empirically compare two downsampling algorithms for sparse ...
A Comparative Study of Parallel RANSAC Implementations in 3D Space
RANSAC (RAndom SAmple Consensus) is an iterative method for estimating the parameters of a certain mathematical model from a set of data which may contain a large number of outliers (noisy points). The main problem of the RANSAC algorithm is that it is ...
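The core RANSAC loop (draw a minimal random sample, fit a model to it, count the points that agree with it, keep the model with the largest consensus) can be illustrated in 2D. The sketch below is a minimal, sequential Python example that assumes a line model y = m·x + b and a vertical-distance inlier threshold. It is not taken from the paper, which concerns parallel implementations in 3D, and the names `fit_line` and `ransac_line` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fit_line(p: Point, q: Point) -> Optional[Tuple[float, float]]:
    """Return (slope, intercept) of the line through p and q, or None if it is vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def ransac_line(points: List[Point], iterations: int = 100,
                threshold: float = 0.1, seed: int = 0):
    """Hypothetical RANSAC sketch: robustly fit y = m*x + b.

    Returns (m, b, inlier_count), or None if no usable model was found.
    """
    if len(points) < 2:
        return None
    rng = random.Random(seed)  # seeded so runs are reproducible
    best: List[Point] = []
    for _ in range(iterations):
        # 1. Minimal sample: two distinct points determine a line.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        m, b = model
        # 2. Consensus set: points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= threshold]
        # 3. Keep the model supported by the most points.
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        return None
    # 4. Refit on the consensus set with ordinary least squares.
    n = len(best)
    sx = sum(x for x, _ in best)
    sy = sum(y for _, y in best)
    sxx = sum(x * x for x, _ in best)
    sxy = sum(x * y for x, y in best)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b, n
```

With 20 points on y = 2x + 1 mixed with a few far-off outliers, the sampled lines that pass through two outliers gather little support, so the final refit recovers a slope of 2 and an intercept of 1.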
Queue-Based and Adaptive Lock Algorithms for Scalable Resource Allocation on Shared-Memory Multiprocessors
We present a scalable lock algorithm and an adaptive scheme for shared-memory multiprocessors addressing the resource allocation problem, which is also known as the $h$-out-of-$k$ mutual exclusion problem. In this problem, threads compete for $k$...
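One way to make the $h$-out-of-$k$ requirement concrete is a simple blocking allocator. The sketch below assumes that the $k$ resources are interchangeable units and that a thread asks for $h$ of them at once ($1 \le h \le k$). This is an assumption, since the abstract above is cut off before it defines requests. It uses a single condition variable, so it is a centralized, non-scalable baseline, not the queue-based or adaptive lock the paper proposes. `UnitAllocator` is a hypothetical name.

```python
import threading


class UnitAllocator:
    """Hypothetical baseline: k interchangeable units; acquire(h) blocks until h are free."""

    def __init__(self, k: int):
        self._k = k
        self._free = k
        self._cond = threading.Condition()

    @property
    def free(self) -> int:
        with self._cond:
            return self._free

    def acquire(self, h: int) -> None:
        if not 1 <= h <= self._k:
            raise ValueError("request size must satisfy 1 <= h <= k")
        with self._cond:
            # Sleep until at least h units are available, then take them atomically.
            self._cond.wait_for(lambda: self._free >= h)
            self._free -= h

    def release(self, h: int) -> None:
        with self._cond:
            self._free += h
            # Wake every waiter; each re-checks whether its own request now fits.
            self._cond.notify_all()
```

Total usage never exceeds $k$, but this sketch makes no fairness guarantee: a large request may keep waiting while smaller requests continue to succeed.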
pocl: A Performance-Portable OpenCL Implementation
OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the ...
A Comparative Analysis of Adaptive Solutions for Grid Environments
- María Botón-Fernández,
- Manuel Rodríguez-Pascual,
- Miguel A. Vega-Rodríguez,
- Francisco Prieto-Castrillo,
- Rafael Mayo-García
Grid computing environments are distributed systems composed of heterogeneous and geographically distributed resources. This type of system emerged mainly to satisfy the increasing demand for computing power within the scientific community. Despite the ...
Co-operation in the Parallel Memetic Algorithm
Evolutionary algorithms (EAs) have been attracting research attention for the last few decades. They have been shown to be very efficient in solving various complex optimization problems in most fields of science and engineering. In EAs, the population of solutions ...
A Fast Parallel Implementation of a PTAS for Fractional Packing and Covering Linear Programs
We present a parallel implementation of the randomized $(1+\varepsilon)$-approximation algorithm for packing and covering linear programs presented by Koufogiannakis and Young (2007). Their approach builds on ideas of the sublinear time ...
Efficient 3D Transpositions in Graphics Processing Units
Matrix transposition is a basic operation for several computing tasks. Hence, transposing a matrix in a computer's main memory has been well studied for many years. More recently, the out-of-place matrix transposition has been performed ...
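An out-of-place transposition of a 3D array is a permutation of its axes: each element is read from its position in the input buffer and written to its permuted position in a separate output buffer. The sequential Python sketch below assumes a flattened row-major layout and a permutation `perm` with the same meaning as NumPy's `transpose` axes argument (output axis `j` is input axis `perm[j]`). It shows only the index arithmetic and nothing of the GPU memory-access optimizations the paper studies. `transpose3d` is a hypothetical name.

```python
from itertools import product
from typing import List, Sequence, Tuple


def transpose3d(data: Sequence, dims: Tuple[int, int, int],
                perm: Tuple[int, int, int]) -> Tuple[List, Tuple[int, int, int]]:
    """Out-of-place axis permutation of a row-major 3D array stored as a flat sequence."""
    if sorted(perm) != [0, 1, 2]:
        raise ValueError("perm must be a permutation of (0, 1, 2)")
    n0, n1, n2 = dims
    if len(data) != n0 * n1 * n2:
        raise ValueError("data length does not match dims")
    out_dims = tuple(dims[p] for p in perm)
    out = [None] * len(data)  # separate output buffer (out-of-place)
    for a in product(range(n0), range(n1), range(n2)):
        src = (a[0] * n1 + a[1]) * n2 + a[2]  # row-major input offset
        b = tuple(a[p] for p in perm)  # output coordinate j is input coordinate perm[j]
        dst = (b[0] * out_dims[1] + b[1]) * out_dims[2] + b[2]  # row-major output offset
        out[dst] = data[src]
    return out, out_dims
```

For example, with `dims=(2, 3, 4)` and `perm=(2, 0, 1)` the output shape is `(4, 2, 3)`, and applying `perm=(1, 2, 0)` to that result restores the original layout.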
Steal Locally, Share Globally
In a general-purpose computing system, several parallel applications run simultaneously on the same platform. Even if each application is highly tuned for that specific platform, additional performance issues arise in such a dynamic environment in ...
Comprehensive Evaluation of a New GPU-based Approach to the Shortest Path Problem
The single-source shortest path (SSSP) problem arises in many different fields. In this paper, we present a GPU SSSP algorithm implementation. Our work significantly speeds up the computation of the SSSP, not only with respect to a CPU-based version, ...
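For reference, SSSP on a graph with non-negative edge weights is classically solved sequentially with Dijkstra's algorithm. The sketch below is a minimal CPU version in Python using a binary heap. It is one common sequential formulation, not the paper's GPU algorithm, and the adjacency-dictionary input format and the `dijkstra` signature are assumptions.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def dijkstra(adj: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Shortest distances from source; assumes all edge weights are non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:  # stale heap entry: u was already settled with a shorter distance
            continue
        settled.add(u)
        for v, w in adj.get(u, []):
            if w < 0:
                raise ValueError("Dijkstra requires non-negative edge weights")
            nd = d + w
            if nd < dist.get(v, float("inf")):  # relax edge (u, v)
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Vertices unreachable from the source are simply absent from the returned dictionary.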
TuCCompi: A Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities
During the last decade, parallel processing architectures have become a powerful tool for dealing with massively parallel problems that require high performance computing (HPC). The latest trend in HPC is the use of heterogeneous environments that combine ...