Keyword: shared-memory : Search

research-article

Open Access

Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systems

Wojciech Kwedlo

ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing, Pages 1–10, https://doi.org/10.1145/3673038.3673057

This paper addresses the problem of deriving explanations in the form of compact decision trees for cluster assignments made by the well-known K-means method. It introduces two versions of the Iterative Mistake Minimization (IMM) algorithm, both ...

research-article

Open Access

Scalable High-Quality Hypergraph Partitioning

ACM Transactions on Algorithms (TALG), Volume 20, Issue 1, Article No.: 9, Pages 1–54, https://doi.org/10.1145/3626527

Balanced hypergraph partitioning is an NP-hard problem with many applications, e.g., optimizing communication in distributed data placement problems. The goal is to place all nodes across k different blocks of bounded size, such that hyperedges span as ...

research-article

Free

JUST ACCEPTED

Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines

ACM Transactions on Modeling and Computer Simulation (TOMACS), Just Accepted, https://doi.org/10.1145/3639703

Shared-memory multi-processor/multi-core machines have become a reference for many application contexts. In particular, the recent literature on speculative parallel discrete event simulation has reshuffled the architectural organization of simulation ...

Article

Transactional-Turn Causal Consistency

Euro-Par 2023: Parallel Processing, Pages 578–591, https://doi.org/10.1007/978-3-031-39698-4_39

Function-as-a-Service (FaaS, serverless) computing systems use an actor-like model that executes a function asynchronously, atomically and in an isolated context. However, a function must often also access state, e.g., memory or a database. This ...

research-article

Effective Access to the Committed Global State in Speculative Parallel Discrete Event Simulation on Multi-core Machines

SIGSIM-PADS '23: Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, Pages 107–117, https://doi.org/10.1145/3573900.3591117

Output production and predicate detection are critical in speculative parallel discrete event simulation, since they need to take place accessing past state values—which have become committed—rather than the current state of the simulation objects, ...

research-article

Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines

SIGSIM-PADS '22: Proceedings of the 2022 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, Pages 81–92, https://doi.org/10.1145/3518997.3531026

The recent literature has reshuffled the architectural organization of speculative parallel discrete event simulation systems for shared-memory multi-core machines. A core aspect has been the full sharing of the workload at the level of individual ...

research-article

Open Access

What’s Decidable About Causally Consistent Shared Memory?

ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 44, Issue 2, Article No.: 8, Pages 1–55, https://doi.org/10.1145/3505273

While causal consistency is one of the most fundamental consistency models weaker than sequential consistency, the decidability of safety verification for (finite-state) concurrent programs running under causally consistent shared memories is still ...

research-article

Open Access

Efficient computation of Hash Hirschberg protein alignment utilizing hyper threading multi‐core sharing technology

CAAI Transactions on Intelligence Technology (CIT2), Volume 7, Issue 2, Pages 278–291, https://doi.org/10.1049/cit2.12070

Owing to technological advances, molecular databases have grown exponentially, demanding faster and more efficient methods that can handle such huge amounts of data. Therefore, multi‐processing CPU technology can be used, including physical and ...

research-article

Public Access

SB-Fetch: synchronization aware hardware prefetching for chip multiprocessors

ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing, Article No.: 15, Pages 1–12, https://doi.org/10.1145/3392717.3392735

Shared-memory, multi-threaded applications often require programmers to insert thread synchronization primitives (i.e. locks, barriers, and condition variables) in critical sections to synchronize data access between processes. Scaling performance ...

research-article

Open Access

Decidable verification under a causally consistent shared memory

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, Pages 211–226, https://doi.org/10.1145/3385412.3385966

Causal consistency is one of the most fundamental and widely used consistency models weaker than sequential consistency. In this paper, we study the verification of safety properties for finite-state concurrent programs running under a causally ...

research-article

Scalable Kernelization for Maximum Independent Sets

ACM Journal of Experimental Algorithmics (JEA), Volume 24, Article No.: 1.16, Pages 1–22, https://doi.org/10.1145/3355502

The most efficient algorithms for finding maximum independent sets in both theory and practice use reduction rules to obtain a much smaller problem instance called a kernel. The kernel can then be solved quickly using exact or heuristic algorithms—or by ...

research-article

Persistent Non-Blocking Binary Search Trees Supporting Wait-Free Range Queries

SPAA '19: The 31st ACM Symposium on Parallelism in Algorithms and Architectures, Pages 275–286, https://doi.org/10.1145/3323165.3323197

This paper presents the first implementation of a search tree data structure in an asynchronous shared-memory system that provides a wait-free algorithm for executing range queries on the tree, in addition to non-blocking algorithms for Insert, Delete ...

announcement

Brief Announcement: 2D-Stack -- A Scalable Lock-Free Stack Design that Continuously Relaxes Semantics for Better Performance

PODC '18: Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, Pages 407–409, https://doi.org/10.1145/3212734.3212794

We briefly describe an efficient lock-free concurrent stack design with tunable and tenable relaxed semantics to allow for better performance. The design is tunable and allows for a continuous monotonic trade of weaker semantics for better throughput ...

research-article

Public Access

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

SPAA '18: Proceedings of the 30th Symposium on Parallelism in Algorithms and Architectures, Pages 393–404, https://doi.org/10.1145/3210377.3210414

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest ...

research-article

Free

Shared-memory parallelization of the fast marching method using an overlapping domain-decomposition approach

HPC '16: Proceedings of the 24th High Performance Computing Symposium, Article No.: 18, Pages 1–8, https://doi.org/10.22360/SpringSim.2016.HPC.052

The fast marching method is used to compute a monotone front propagation of anisotropic nature by solving the eikonal equation. Due to the sequential nature of the original algorithm, parallel approaches presented so far were unconvincing. In this work, ...

abstract

Teaching Parallel Computing Concepts with OpenMP (Abstract Only)

SIGCSE '16: Proceedings of the 47th ACM Technical Symposium on Computing Science Education, Pages 712–713, https://doi.org/10.1145/2839509.2844681

OpenMP is an industry-standard, platform-independent parallel programming library built into all modern C and C++ compilers. Unlike complex parallel platforms, OpenMP is designed to make it relatively easy to add parallelism to existing sequential ...

research-article

The Price of being Adaptive

PODC '15: Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, Pages 183–192, https://doi.org/10.1145/2767386.2767428

Mutual exclusion is a fundamental distributed coordination problem. Shared-memory mutual exclusion research focuses on local-spin algorithms and uses the remote memory references (RMRs) metric. To ensure the correctness of concurrent algorithms in ...

research-article

Cache-Efficient Aggregation: Hashing Is Sorting

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Pages 1123–1136, https://doi.org/10.1145/2723372.2747644

For decades researchers have studied the duality of hashing and sorting for the implementation of the relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specifically ...

research-article

Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies

ACM Transactions on Mathematical Software (TOMS), Volume 40, Issue 4, Article No.: 27, Pages 1–22, https://doi.org/10.1145/2560421

In many scientific and engineering applications, one has to solve not one but multiple instances of the same problem. Often times, these problems are linked in a way that allows intermediate results to be reused. A characteristic example for this class ...

research-article

Large-scale network simulation: leveraging the strengths of modern SMP-based compute clusters

SIMUTools '14: Proceedings of the 7th International ICST Conference on Simulation Tools and Techniques, Pages 31–40, https://doi.org/10.4108/icst.simutools.2014.254622

Parallelization is crucial for efficient execution of large-scale network simulation. Today's computing clusters commonly used for that purpose are built from a large amount of multi-processor machines. The traditional approach to utilize all CPU cores ...
