research-article

GLite: a fast and efficient automatic graph-level optimizer for large-scale DNNs

Authors:

Mengting Yuan

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

Pages 199 - 204

https://doi.org/10.1145/3489517.3530418

Published: 23 August 2022

Abstract

We propose a scalable graph-level optimizer named GLite to speed up search-based optimizations on large neural networks. GLite leverages a potential-based partitioning strategy to partition large computation graphs into small subgraphs without losing profitable substitution patterns. To avoid redundant subgraph matching, we propose a dynamic programming algorithm that reuses previously explored matching patterns. Experimental results show that GLite reduces the running time of search-based optimizations from hours to milliseconds without compromising inference performance.
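The two ideas named in the abstract, partitioning a large graph into small subgraphs and reusing the results of earlier matching, can be illustrated with a toy sketch. This is not GLite's algorithm: the chain-shaped "graph", the fixed-size `partition` helper, the `RULES` table, and the operator names are all hypothetical, and memoization via `functools.lru_cache` stands in for the paper's dynamic programming scheme, whose details the abstract does not give.

```python
from functools import lru_cache

# Hypothetical rewrite rules: operator pattern -> fused replacement.
RULES = {
    ("conv", "relu"): ("conv_relu",),
    ("matmul", "add"): ("gemm",),
}


def partition(ops, max_size):
    """Split a topologically ordered operator chain into chunks of at most max_size.

    A fixed-size split is an assumption made for brevity; it is not the
    potential-based strategy described in the abstract.
    """
    return [tuple(ops[i:i + max_size]) for i in range(0, len(ops), max_size)]


@lru_cache(maxsize=None)
def optimize_subgraph(sub):
    """Greedily apply RULES left to right; results are cached per distinct subgraph."""
    out, i = [], 0
    while i < len(sub):
        for pattern, replacement in RULES.items():
            if sub[i:i + len(pattern)] == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched at position i: keep the operator unchanged.
            out.append(sub[i])
            i += 1
    return tuple(out)


def optimize(ops, max_size=4):
    """Optimize each chunk independently and concatenate the results."""
    result = []
    for sub in partition(ops, max_size):
        result.extend(optimize_subgraph(sub))
    return result


if __name__ == "__main__":
    # Three identical blocks, as in a network built from repeated layers.
    ops = ["conv", "relu", "matmul", "add"] * 3
    print(optimize(ops))
    print(optimize_subgraph.cache_info())
```

Because the three chunks are identical, the rule search runs once and the other two chunks are served from the cache. Note that a fixed-size split can cut a pattern at a chunk boundary (for example with `max_size=3`), which is the kind of lost substitution the abstract says potential-based partitioning avoids; this sketch makes no attempt to prevent it.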

Cited By

Yan, G., Liu, X., Wang, H., and Ha, Y. (2023). Fast FPGA Accelerator of Graph Cut Algorithm with Out-of-order Parallel Execution in Folding Grid Architecture. 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6. Online publication date: 9-Jul-2023.
https://doi.org/10.1109/DAC56929.2023.10247784

Index Terms

GLite: a fast and efficient automatic graph-level optimizer for large-scale DNNs
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information & Contributors

Information

Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

July 2022

1462 pages

ISBN:9781450391429

DOI:10.1145/3489517

General Chair:
Rob Oshana
NXP

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

2030 National Key AI Program of China
Natural Science Foundation of China (NSFC)
Application Foundation Frontier Project of Wuhan
Key R&D Project of Hubei Province

Conference

DAC '22

Sponsor:

SIGDA

DAC '22: 59th ACM/IEEE Design Automation Conference

July 10 - 14, 2022

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics

Total Citations: 1
Total Downloads: 312
Downloads (Last 12 months): 68
Downloads (Last 6 weeks): 7

Reflects downloads up to 18 Nov 2024
