- research-article, February 2021
Working Set Analytics
ACM Computing Surveys (CSUR), Volume 53, Issue 6, Article No.: 113, Pages 1–36, https://doi.org/10.1145/3399709
The working set model for program behavior was invented in 1965. It has stood the test of time in virtual memory management for over 50 years. It is considered the ideal for managing memory in operating systems and caches. Its superior performance was ...
- research-article, June 2016
Warped-slicer: efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming
ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture, Pages 230–242, https://doi.org/10.1109/ISCA.2016.29
As technology scales, GPUs are forecasted to incorporate an ever-increasing amount of computing resources to support thread-level parallelism. But even with the best effort, exposing massive thread-level parallelism from a single GPU kernel, ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 44 Issue 3
- research-article, June 2011
Waste not, want not: resource-based garbage collection in a shared environment
ISMM '11: Proceedings of the international symposium on Memory management, Pages 65–76, https://doi.org/10.1145/1993478.1993487
To achieve optimal performance, garbage-collected applications must balance the sizes of their heaps dynamically. Sizing the heap too small can reduce throughput by increasing the number of garbage collections that must be performed. Too large a heap, ...
Also Published in:
ACM SIGPLAN Notices: Volume 46 Issue 11
- Article, July 2010
Calculation of the acceleration of parallel programs as a function of the number of threads
The purpose of this study is to determine analytically what the acceleration from parallel execution of a task depends on, and how. It is reasonable to expect that as the level of parallelism increases, the costs of synchronization also increase, and upon reaching a ...
- research-article, October 2008
Execution context optimization for disk energy
CASES '08: Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, Pages 255–264, https://doi.org/10.1145/1450095.1450132
Power, energy, and thermal concerns have constrained embedded systems designs. Computing capability and storage density have increased dramatically, enabling the emergence of handheld devices from special to general purpose computing. In many mobile ...
- research-article, September 2008
Adaptive work-stealing with parallelism feedback
ACM Transactions on Computer Systems (TOCS), Volume 26, Issue 3, Article No.: 7, Pages 1–32, https://doi.org/10.1145/1394441.1394443
Multiprocessor scheduling in a shared multiprogramming environment can be structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler schedules the work of a job on its allotted ...
- research-article, May 2008
Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems
CPUs consume too much power. Modern complex cores sometimes waste power on functions that are not useful for the code they run. In particular, operating system kernels do not benefit from many power-consuming features intended to improve application ...
- article, April 2008
Prototyping Concurrent Systems with Agents and Artifacts: Framework and Core Calculus
Electronic Notes in Theoretical Computer Science (ENTCS), Volume 194, Issue 4, Pages 111–132, https://doi.org/10.1016/j.entcs.2008.03.102
More and more aspects of concurrency and concurrent programming are becoming part of mainstream programming and software engineering, due to several factors such as the widespread availability of multi-core / parallel architectures and Internet-based ...
- article, August 2007
A Possible Connection Between Two Theories: Grammar Systems and Concurrent Programming
Fundamenta Informaticae (FUNI), Volume 76, Issue 3, Pages 325–336
The aim of this note is to show how parallel communicating grammar systems and concurrent programs might be viewed as related models for distributed and cooperating computation. We argue that a grammar system can be translated into a concurrent program, ...
- Article, March 2007
Adaptive work stealing with parallelism feedback
PPoPP '07: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, Pages 112–120, https://doi.org/10.1145/1229428.1229448
We present an adaptive work-stealing thread scheduler, A-Steal, for fork-join multithreaded jobs, like those written using the Cilk multithreaded language or the Hood work-stealing library. The A-Steal algorithm is appropriate for large parallel servers ...
- article, March 2007
A Possible Connection Between Two Theories: Grammar Systems and Concurrent Programming
Fundamenta Informaticae (FUNI), Volume 76, Issue 3, Pages 325–336
The aim of this note is to show how parallel communicating grammar systems and concurrent programs might be viewed as related models for distributed and cooperating computation. We argue that a grammar system can be translated into a concurrent program, ...
- Article, July 2006
Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs
ICPADS '06: Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1, Pages 345–354, https://doi.org/10.1109/ICPADS.2006.59
Symmetric Multiprocessors (SMPs), combined with modern interconnection technologies are commonly used to build cost-effective compute clusters. However, contention among processors for access to shared resources, as is the main memory bus and the NIC can ...
- Article, March 2006
Adaptive scheduling with parallelism feedback
PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, Pages 100–109, https://doi.org/10.1145/1122971.1122988
Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted ...
- research-article, June 2004
Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 15, Issue 6, Pages 491–504, https://doi.org/10.1109/TPDS.2004.8
Lock-free objects offer significant performance and reliability advantages over conventional lock-based objects. However, the lack of an efficient portable lock-free method for the reclamation of the memory occupied by dynamic nodes removed ...
- research-article, June 2002
Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors
Dimitrios S. Nikolopoulos, Constantine D. Polychronopoulos, Theodore S. Papatheodorou, Jesús Labarta, Eduard Ayguadé
Journal of Parallel and Distributed Computing (JPDC), Volume 62, Issue 6, Pages 1069–1103, https://doi.org/10.1006/jpdc.2001.1817
The performance of multiprogrammed shared-memory multiprocessors suffers often from scheduler interventions that neglect data locality. On cache-coherent distributed shared-memory (DSM) multiprocessors, such scheduler interventions tend to increase the ...
- article, January 2002
A Simple, Object-Based View of Multiprogramming
Formal Methods in System Design (FMSD), Volume 20, Issue 1, Pages 23–45, https://doi.org/10.1023/A:1012904412467
Object-based sequential programming has had a major impact on software engineering. However, object-based concurrent programming remains elusive as an effective programming tool. The class of applications that will be implemented on future high-...
- Article, December 2000
Holistic schedulability analysis of a fault-tolerant real-time distributed run-time support
RTCSA '00: Proceedings of the Seventh International Conference on Real-Time Systems and Applications, Page 355
The feasibility test of a hard real time system must not only take into account the temporal behavior of the application tasks but also the behavior of the run-time support in charge of executing applications. The paper is devoted to the schedulability ...
- Article, August 2000
Sector Cache Design and Performance
The first commercially available CPU cache memory used a sector design, by which the cache consisted of sectors (address tags) and sub-sectors (or blocks, with valid bits). It rapidly became clear that superior performance could be obtained with the now ...