IJPP: Vol 43, No 6

Volume 43, Issue 6, December 2015

Publisher:

Kluwer Academic Publishers
101 Philip Drive, Assinippi Park, Norwell, MA
United States

ISSN: 0885-7458

article

Guest Editorial: SBAC-PAD 2013

Pages 961–964 https://doi.org/10.1007/s10766-015-0377-2

article

A Decomposition-Based Approach for Scalable Many-Field Packet Classification on Multi-core Processors

Pages 965–987 https://doi.org/10.1007/s10766-014-0325-6

As a kernel function in network routers, packet classification requires the incoming packet headers to be checked against a set of predefined rules. There are two trends for packet classification: (1) to examine a large number of packet header fields, ...

article

Fully Optimized Code Block Segmentation Algorithm for LTE-Advanced

Pages 988–1003 https://doi.org/10.1007/s10766-014-0324-7

In our previous work, we presented a brief analysis of the performance of the code block segmentation procedure adopted by the 3GPP LTE Advanced (LTE-A) Standard as part of its physical layer channel coding scheme. Here, a detailed analysis of its ...

article

Invasive Compute Balancing for Applications with Shared and Hybrid Parallelization

Pages 1004–1027 https://doi.org/10.1007/s10766-014-0336-3

Achieving high scalability with dynamically adaptive algorithms in high-performance computing (HPC) is a non-trivial task. The invasive paradigm using compute migration represents an efficient alternative to classical data migration approaches for such ...

article

PageRank Computation Using a Multiple Implicitly Restarted Arnoldi Method for Modeling Epidemic Spread

Pages 1028–1053 https://doi.org/10.1007/s10766-014-0344-3

A parallel implementation based on the multiple implicitly restarted Arnoldi method (MIRAM) is proposed for calculating the dominant eigenpair of stochastic matrices derived from very large real networks. Their high damping factor makes many existing algorithms less ...

article

Cluster Cache Monitor: Leveraging the Proximity Data in CMP

Pages 1054–1077 https://doi.org/10.1007/s10766-014-0339-0

As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they also induce higher overall L1 miss latencies ...

article

BPLG: A Tuned Butterfly Processing Library for GPU Architectures

Pages 1078–1102 https://doi.org/10.1007/s10766-014-0323-8

In order to increase the efficiency of existing software, many works are incorporating GPU processing. However, despite the current advances in GPU languages and tools, taking advantage of their parallel architecture is still far more complex than ...

article

List Scheduling in Embedded Systems Under Memory Constraints

Pages 1103–1128 https://doi.org/10.1007/s10766-014-0338-1

Video decoding and image processing in embedded systems are subject to strong resource constraints, particularly in terms of memory. List-scheduling heuristics with static priorities (HEFT, SDC, etc.) being the oft-cited solutions due to both their good ...

article

A Hardware/Software Approach for Database Query Acceleration with FPGAs

Pages 1129–1159 https://doi.org/10.1007/s10766-014-0327-4

Complex analytics queries often involve expensive operations that may require large computational runtimes leading to slow query responsiveness and hampering real-time performance. Moreover, running these expensive analytics queries inside traditional ...

article

An Autotuning Engine for the 3D Fast Wavelet Transform on Clusters with Hybrid CPU + GPU Platforms

Pages 1160–1191 https://doi.org/10.1007/s10766-014-0328-3

This work presents an optimization method to run the 3D-fast wavelet transform (3D-FWT) on a CPU + GPU system. The optimization engine detects the different computing components in the system, and executes the appropriate kernel implemented in both CUDA ...

article

The Scalability of Disjoint Data Structures on a New Hardware Transactional Memory System

Pages 1192–1217 https://doi.org/10.1007/s10766-014-0322-9

In this paper we present our experiences constructing and testing in-memory data structures designed to be disjoint enough for transactional memory to be profitable as a serialization mechanism with no fallback to traditional locking. Our goal was to ...

article

Extending Summation Precision for Network Reduction Operations

Pages 1218–1243 https://doi.org/10.1007/s10766-014-0326-5

Double precision summation is at the core of numerous important algorithms such as Newton–Krylov methods and other operations involving inner products, such as matrix multiplication and dot products. However, the effectiveness of summation is limited ...

International Journal of Parallel Programming