research-article

Open access

Enabling Real-Time Irregular Data-Flow Pipelines on SIMD Devices

Authors:

Jeremy BuhlerAuthors Info & Claims

ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop

Article No.: 9, Pages 1 - 8

https://doi.org/10.1145/3458744.3473367

Published: 23 September 2021 Publication History

All formats PDF

Abstract

Streaming data-flow applications arise in many contexts where each item in a data stream must be processed within a bounded latency, or deadline, following its arrival. We consider applications whose behavior is irregular, in the sense that the application may reduce or amplify data volumes dynamically at various stages of its computation. Our implementation target for these applications is SIMD-capable processors such as GPUs. For such devices, organizing the computation so that a full-width SIMD vector of inputs can be processed at once makes efficient use of the processor. However, having parts of the computation wait while full vectors of input accumulate may cause the application to miss deadlines.

We present a novel approach to scheduling irregular streaming applications with latency constraints on SIMD devices. After describing a model for executing such applications, we formalize the objective of efficient processor utilization and the constraints associated with bounded latency and sufficient throughput to handle a stream of items arriving at a fixed rate. We introduce a strategy, enforced waits, to optimize the objective subject to the constraints. We demonstrate empirically that, for a test application from bioinformatics, our strategy can lower processor utilization relative to a baseline approach that cannot introduce waits inside the application pipeline. Finally, we characterize the region of parameter space in which the new approach is likely to outperform the baseline.

References

[1]

Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and David J Lipman. 1990. Basic local alignment search tool. Journal of Molecular Biology 215, 3 (1990), 403–410.

[2]

Norman T. J. Bailey. 1954. On queueing processes with bulk service. J. Royal Statistical Society, Series B (Methodological) 16 (1954), 80–87.

[3]

Pierre Bonami, Lorenz T. Biegler, Andrew R. Conn, Gérard Cornuéjols, Ignacio E. Grossmann, Carl D. Laird, Jon Lee, Andrea Lodi, François Margot, Nicolas Sawaya, and Andreas Wächter. 2008. An algorithmic framework for convex mixed integer nonlinear programs. Discrete Optimization 5, 2 (2008), 186–204.

Digital Library

[4]

Pierre Bonami, Mustafa Kilinç, and Jeff Linderoth. 2012. Algorithms and software for convex mixed integer nonlinear programs. In Mixed Integer Nonlinear Programming. Springer, 1–39.

[5]

G. Briére and M. L. Chaudhry. 1989. Computational analysis of single-server bulk service queues, M/GY/1. Advances in Applied Probability 21 (1989), 207–225.

[6]

James Buckley, Lars Bergstrom, Bob Binns, Jeremy Buhler, Wenlei Chen, Michael Cherry, Stefan Funk, Dan Hooper, John Mitchell, Georgia De Nolfo, 2019. The Advanced Particle-astrophysics Telescope (APT). Bulletin of the American Astronomical Society 51, 7 (2019), 78.

[7]

M. Burtscher, R. Nasre, and K. Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proc. 2012 IEEE Int’l Symp. on Workload Characterization. 141–151.

[8]

Guoyang Chen, Yue Zhao, Xipeng Shen, and Huiyang Zhou. 2017. EffiSha: A Software Framework for Enabling Effficient Preemptive Scheduling of GPU. SIGPLAN Not. 52, 8 (Jan. 2017), 3–16.

Digital Library

[9]

Stephen V. Cole and Jeremy Buhler. 2017. MERCATOR: A GPGPU Framework for Irregular Streaming Applications. In 2017 International Conference on High Performance Computing Simulation (HPCS). 727–736.

[10]

Glenn A. Elliott, Bryan C. Ward, and James H. Anderson. 2013. GPUSync: A Framework for Real-Time GPU Management. In 2013 IEEE 34th Real-Time Systems Symposium. 33–44.

[11]

Kshitij Gupta, Jeff A. Stuart, and John D. Owens. 2012. A study of Persistent Threads style GPU programming for GPGPU workloads. In 2012 Innovative Parallel Computing (InPar). 1–14.

[12]

K. Gupta, J. A. Stuart, and J. D. Owens. 2012. A study of persistent threads style GPU programming for GPGPU workloads. In Innovative Parallel Computing (IEEE InPar). 1–14.

[13]

W. Henderson and P. G. Taylor. 1990. Product form in networks of queues with batch arrivals and batch services. Queueing Systems 6(1990), 71–88.

Digital Library

[14]

W. Henderson and P. G. Taylor. 1991. Some new results on queueing networks with batch movement. J. Applied Probability 28 (1991), 409–421.

[15]

Shinpei Kato, Karthik Lakshmanan, Raj Rajkumar, and Yutaka Ishikawa. 2011. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proc. USENIX ATC. 17–30.

[16]

E.A. Lee and D.G. Messerschmitt. 1987. Synchronous data flow. Proc. IEEE 75, 9 (1987), 1235–1245.

[17]

Robin Lougee-Heimer. 2003. The common optimization interface for operations research: Promoting open-source software in the operations research community. IBM Journal of Research and Development 47, 1 (2003), 57–66.

Digital Library

[18]

Mihaela Mitici, Jasper Goseling, Jan-Kees van Ommeren, Maurits de Graaf, and Richard J. Boucherie. 2017. Tandem queue with batch service and its applications in wireless sensor networks. Queueing Systems 87(2017), 81–93.

Digital Library

[19]

Nathan Otterness and James H Anderson. 2021. Exploring AMD GPU Scheduling Details by Experimenting With “Worst Practices”. In Proceedings of the 29th International Conference on Real-Time Networks and Systems.

Digital Library

[20]

Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke. 2015. Chimera: Collaborative Preemption for Multitasking on a Shared GPU. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (Istanbul, Turkey) (ASPLOS ’15). Association for Computing Machinery, New York, NY, USA, 593–606.

Digital Library

[21]

Tom Plano and Jeremy Buhler. 2020. Scheduling irregular dataflow pipelines on SIMD architectures. In Proceedings of the 2020 Sixth Workshop on Programming Models for SIMD/Vector Processing. 1–9.

Digital Library

[22]

Martin Roesch 1999. Snort: Lightweight intrusion detection for networks. In Lisa, Vol. 99. 229–238.

[23]

Ivan Tanasic, Isaac Gelado, Javier Cabezas, Alex Ramirez, Nacho Navarro, and Mateo Valero. 2014. Enabling Preemptive Multiprogramming on GPUs. In Proceeding of the 41st Annual International Symposium on Computer Architecuture (Minneapolis, Minnesota, USA) (ISCA ’14). IEEE Press, 193–204.

Digital Library

[24]

Stanley Tzeng, Anjul Patney, and John D. Owens. 2010. Task Management for Irregular-Parallel Workloads on the GPU. In High Performance Graphics, Michael Doggett, Samuli Laine, and Warren Hunt (Eds.). The Eurographics Association, 9 pages.

[25]

Uri Verner, Assaf Schuster, Mark Silberstein, and Avi Mendelson. 2012. Scheduling Processing of Real-Time Data Streams on Heterogeneous Multi-GPU Systems. In Proceedings of the 5th Annual International Systems and Storage Conference (Haifa, Israel) (SYSTOR ’12). Association for Computing Machinery, New York, NY, USA, Article 8.

Digital Library

[26]

Paul Viola and Michael Jones. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, Vol. 1. IEEE, I–I.

[27]

Bo Wu, Xu Liu, Xiaobo Zhou, and Changjun Jiang. 2017. FLEP: Enabling Flexible and Efficient Preemption on GPUs. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (Xi’an, China) (ASPLOS ’17). Association for Computing Machinery, New York, NY, USA, 483–496.

Digital Library

Index Terms

Enabling Real-Time Irregular Data-Flow Pipelines on SIMD Devices

Index terms have been assigned to the content through auto-classification.

Recommendations

Scheduling Irregular Dataflow Pipelines on SIMD Architectures
WPMVP'20: Proceedings of the 2020 Sixth Workshop on Programming Models for SIMD/Vector Processing

Streaming computations often exhibit substantial data parallelism that makes them well-suited to SIMD architectures. However, many such computations also exhibit irregularity, in the form of data-dependent, dynamic data rates, that makes efficient SIMD ...
Continual flow pipelines
ASPLOS '04

Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects. How to build a processor that provides ...
Continual flow pipelines
ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems

Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects. How to build a processor that provides ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop

August 2021

314 pages

ISBN:9781450384414

DOI:10.1145/3458744

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 September 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

NSF

Conference

ICPP 2021

ICPP 2021: 50th International Conference on Parallel Processing

August 9 - 12, 2021

IL, Lemont, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
182
Total Downloads

Downloads (Last 12 months)69
Downloads (Last 6 weeks)21

Reflects downloads up to 18 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents