Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3458744.3473367acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article
Open access

Enabling Real-Time Irregular Data-Flow Pipelines on SIMD Devices

Published: 23 September 2021 Publication History

Abstract

Streaming data-flow applications arise in many contexts where each item in a data stream must be processed within a bounded latency, or deadline, following its arrival. We consider applications whose behavior is irregular, in the sense that the application may reduce or amplify data volumes dynamically at various stages of its computation. Our implementation target for these applications is SIMD-capable processors such as GPUs. For such devices, organizing the computation so that a full-width SIMD vector of inputs can be processed at once makes efficient use of the processor. However, having parts of the computation wait while full vectors of input accumulate may cause the application to miss deadlines.
We present a novel approach to scheduling irregular streaming applications with latency constraints on SIMD devices. After describing a model for executing such applications, we formalize the objective of efficient processor utilization and the constraints associated with bounded latency and sufficient throughput to handle a stream of items arriving at a fixed rate. We introduce a strategy, enforced waits, to optimize the objective subject to the constraints. We demonstrate empirically that, for a test application from bioinformatics, our strategy can lower processor utilization relative to a baseline approach that cannot introduce waits inside the application pipeline. Finally, we characterize the region of parameter space in which the new approach is likely to outperform the baseline.

References

[1]
Stephen F Altschul, Warren Gish, Webb Miller, Eugene W Myers, and David J Lipman. 1990. Basic local alignment search tool. Journal of Molecular Biology 215, 3 (1990), 403–410.
[2]
Norman T. J. Bailey. 1954. On queueing processes with bulk service. J. Royal Statistical Society, Series B (Methodological) 16 (1954), 80–87.
[3]
Pierre Bonami, Lorenz T. Biegler, Andrew R. Conn, Gérard Cornuéjols, Ignacio E. Grossmann, Carl D. Laird, Jon Lee, Andrea Lodi, François Margot, Nicolas Sawaya, and Andreas Wächter. 2008. An algorithmic framework for convex mixed integer nonlinear programs. Discrete Optimization 5, 2 (2008), 186–204.
[4]
Pierre Bonami, Mustafa Kilinç, and Jeff Linderoth. 2012. Algorithms and software for convex mixed integer nonlinear programs. In Mixed Integer Nonlinear Programming. Springer, 1–39.
[5]
G. Briére and M. L. Chaudhry. 1989. Computational analysis of single-server bulk service queues, M/GY/1. Advances in Applied Probability 21 (1989), 207–225.
[6]
James Buckley, Lars Bergstrom, Bob Binns, Jeremy Buhler, Wenlei Chen, Michael Cherry, Stefan Funk, Dan Hooper, John Mitchell, Georgia De Nolfo, 2019. The Advanced Particle-astrophysics Telescope (APT). Bulletin of the American Astronomical Society 51, 7 (2019), 78.
[7]
M. Burtscher, R. Nasre, and K. Pingali. 2012. A quantitative study of irregular programs on GPUs. In Proc. 2012 IEEE Int’l Symp. on Workload Characterization. 141–151.
[8]
Guoyang Chen, Yue Zhao, Xipeng Shen, and Huiyang Zhou. 2017. EffiSha: A Software Framework for Enabling Effficient Preemptive Scheduling of GPU. SIGPLAN Not. 52, 8 (Jan. 2017), 3–16.
[9]
Stephen V. Cole and Jeremy Buhler. 2017. MERCATOR: A GPGPU Framework for Irregular Streaming Applications. In 2017 International Conference on High Performance Computing Simulation (HPCS). 727–736.
[10]
Glenn A. Elliott, Bryan C. Ward, and James H. Anderson. 2013. GPUSync: A Framework for Real-Time GPU Management. In 2013 IEEE 34th Real-Time Systems Symposium. 33–44.
[11]
Kshitij Gupta, Jeff A. Stuart, and John D. Owens. 2012. A study of Persistent Threads style GPU programming for GPGPU workloads. In 2012 Innovative Parallel Computing (InPar). 1–14.
[12]
K. Gupta, J. A. Stuart, and J. D. Owens. 2012. A study of persistent threads style GPU programming for GPGPU workloads. In Innovative Parallel Computing (IEEE InPar). 1–14.
[13]
W. Henderson and P. G. Taylor. 1990. Product form in networks of queues with batch arrivals and batch services. Queueing Systems 6(1990), 71–88.
[14]
W. Henderson and P. G. Taylor. 1991. Some new results on queueing networks with batch movement. J. Applied Probability 28 (1991), 409–421.
[15]
Shinpei Kato, Karthik Lakshmanan, Raj Rajkumar, and Yutaka Ishikawa. 2011. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proc. USENIX ATC. 17–30.
[16]
E.A. Lee and D.G. Messerschmitt. 1987. Synchronous data flow. Proc. IEEE 75, 9 (1987), 1235–1245.
[17]
Robin Lougee-Heimer. 2003. The common optimization interface for operations research: Promoting open-source software in the operations research community. IBM Journal of Research and Development 47, 1 (2003), 57–66.
[18]
Mihaela Mitici, Jasper Goseling, Jan-Kees van Ommeren, Maurits de Graaf, and Richard J. Boucherie. 2017. Tandem queue with batch service and its applications in wireless sensor networks. Queueing Systems 87(2017), 81–93.
[19]
Nathan Otterness and James H Anderson. 2021. Exploring AMD GPU Scheduling Details by Experimenting With “Worst Practices”. In Proceedings of the 29th International Conference on Real-Time Networks and Systems.
[20]
Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke. 2015. Chimera: Collaborative Preemption for Multitasking on a Shared GPU. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (Istanbul, Turkey) (ASPLOS ’15). Association for Computing Machinery, New York, NY, USA, 593–606.
[21]
Tom Plano and Jeremy Buhler. 2020. Scheduling irregular dataflow pipelines on SIMD architectures. In Proceedings of the 2020 Sixth Workshop on Programming Models for SIMD/Vector Processing. 1–9.
[22]
Martin Roesch 1999. Snort: Lightweight intrusion detection for networks. In Lisa, Vol. 99. 229–238.
[23]
Ivan Tanasic, Isaac Gelado, Javier Cabezas, Alex Ramirez, Nacho Navarro, and Mateo Valero. 2014. Enabling Preemptive Multiprogramming on GPUs. In Proceeding of the 41st Annual International Symposium on Computer Architecuture (Minneapolis, Minnesota, USA) (ISCA ’14). IEEE Press, 193–204.
[24]
Stanley Tzeng, Anjul Patney, and John D. Owens. 2010. Task Management for Irregular-Parallel Workloads on the GPU. In High Performance Graphics, Michael Doggett, Samuli Laine, and Warren Hunt (Eds.). The Eurographics Association, 9 pages.
[25]
Uri Verner, Assaf Schuster, Mark Silberstein, and Avi Mendelson. 2012. Scheduling Processing of Real-Time Data Streams on Heterogeneous Multi-GPU Systems. In Proceedings of the 5th Annual International Systems and Storage Conference (Haifa, Israel) (SYSTOR ’12). Association for Computing Machinery, New York, NY, USA, Article 8.
[26]
Paul Viola and Michael Jones. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, Vol. 1. IEEE, I–I.
[27]
Bo Wu, Xu Liu, Xiaobo Zhou, and Changjun Jiang. 2017. FLEP: Enabling Flexible and Efficient Preemption on GPUs. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (Xi’an, China) (ASPLOS ’17). Association for Computing Machinery, New York, NY, USA, 483–496.

Index Terms

  1. Enabling Real-Time Irregular Data-Flow Pipelines on SIMD Devices
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop
    August 2021
    314 pages
    ISBN:9781450384414
    DOI:10.1145/3458744
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 September 2021

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. GPGPU
    2. bounded latency
    3. irregular data-flow
    4. scheduling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • NSF

    Conference

    ICPP 2021

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 182
      Total Downloads
    • Downloads (Last 12 months)69
    • Downloads (Last 6 weeks)21
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media