Research article · Open access
DOI: 10.1145/3613424.3623792

TileFlow: A Framework for Modeling Fusion Dataflow via Tree-based Analysis

Published: 08 December 2023

Abstract

With the increasing size of DNN models and the growing gap between compute performance and memory bandwidth, fusing multiple layers to reduce off-chip memory access has become a popular approach in dataflow design. However, designing such dataflows requires flexible and accurate performance models for evaluation, architecture analysis, and design space exploration. Unfortunately, current state-of-the-art performance models are limited to dataflows for single-operator acceleration, making them inapplicable to operator-fusion dataflows.
In this paper, we propose TileFlow, a framework that models dataflows for operator fusion. We first characterize the design space of fusion dataflows as a 3D space encompassing compute ordering, resource binding, and loop tiling. We then introduce a tile-centric notation to express dataflow designs within this space. Inspired by the tiling structure of fusion dataflows, we present a tree-based approach to analyze two critical performance metrics: data-movement volume within the accelerator memory hierarchy and accelerator compute/memory resource usage. Finally, we leverage these metrics to calculate latency and energy consumption. Our evaluation validates TileFlow’s modeling accuracy against both real hardware and state-of-the-art performance models. We use TileFlow to aid fusion dataflow design and analysis; it helps us discover fusion dataflows that achieve average runtime speedups of 1.85× for self-attention and 1.28× for convolution chains over state-of-the-art dataflows.
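To make the tree-based analysis described above concrete, here is a minimal, hypothetical Python sketch. It is not TileFlow's actual notation or API; all class names, fields, and tile sizes are invented for illustration. It models the tiling structure of a fused two-operator chain as a tree over the memory hierarchy and tallies data-movement volume per level with a post-order traversal:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TileNode:
    """One tiling level: a memory level holding one tile of each tensor."""
    level: str                   # e.g. "DRAM", "L2", "L1" (illustrative names)
    tile_bytes: Dict[str, int]   # bytes of each tensor's tile at this level
    trips: int                   # times this tile is (re)filled per parent visit
    children: List["TileNode"] = field(default_factory=list)


def data_movement(node: TileNode) -> Dict[str, int]:
    """Post-order traversal: bytes moved into each memory level of the subtree."""
    volume = {node.level: node.trips * sum(node.tile_bytes.values())}
    for child in node.children:
        for level, moved in data_movement(child).items():
            # The whole child subtree re-executes once per fill of this tile.
            volume[level] = volume.get(level, 0) + node.trips * moved
    return volume


# Toy fused chain of two matmuls, C = A @ B then E = C @ D. The intermediate
# tensor C lives only in L2 and never appears in the DRAM tile set -- exactly
# the off-chip traffic that fusion is meant to save.
tree = TileNode(
    "DRAM", {"A": 1 << 20, "B": 1 << 20, "D": 1 << 20, "E": 1 << 20}, trips=1,
    children=[
        TileNode("L2", {"A": 1 << 16, "B": 1 << 16, "C": 1 << 16, "D": 1 << 16},
                 trips=4,
                 children=[
                     TileNode("L1", {"A": 4096, "B": 4096}, trips=16),  # op1 tiles
                     TileNode("L1", {"C": 4096, "D": 4096}, trips=16),  # op2 tiles
                 ]),
    ])

print(data_movement(tree))  # {'DRAM': 4194304, 'L2': 1048576, 'L1': 1048576}
```

From per-level volumes like these, latency could then be estimated roofline-style (the larger of compute time and volume divided by bandwidth at each level) and energy from per-byte and per-operation costs, which is the role the abstract assigns to its derived metrics.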


Cited By

  • (2024) POPA: Expressing High and Portable Performance across Spatial and Vector Architectures for Tensor Computations. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 199–210. https://doi.org/10.1145/3626202.3637566
  • (2024) MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 607–621. https://doi.org/10.1145/3620666.3651330
  • (2024) Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 150–166. https://doi.org/10.1109/ISCA59077.2024.00021
  • (2024) SAVector: Vectored Systolic Arrays. IEEE Access 12, 44446–44461. https://doi.org/10.1109/ACCESS.2024.3380433


Information

Published In

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023
1528 pages
ISBN:9798400703294
DOI:10.1145/3613424

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Accelerator
  2. Fusion
  3. Simulation and modeling
  4. Tensor Programs

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MICRO '23

Acceptance Rates

Overall acceptance rate: 484 of 2,242 submissions (22%)
