Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3519941.3535072acmconferencesArticle/Chapter ViewAbstractPublication PagescpsweekConference Proceedingsconference-collections
research-article

RollBin: reducing code-size via loop rerolling at binary level

Published: 14 June 2022 Publication History

Abstract

Code size is an increasing concern on resource constrained systems, ranging from embedded devices to cloud servers. To address the issue, lowering memory occupancy has become a priority in developing and deploying applications, and accordingly compiler-based optimizations have been proposed to reduce program footprint. However, prior arts are generally dealing with source codes or intermediate representations, and thus are very limited in scope in real scenarios where only binary files are commonly provided. To fill the gap, this paper presents a novel code-size optimization RollBin to reroll loops at binary level. RollBin first locates the unrolled loops in binary files, and then probes to decide the unrolling factor by identifying regular memory address patterns. To reconstruct the iterations, we propose a customized data dependency analysis that tackles the challenges brought by shuffled instructions and loop-carry dependencies. Next, the recognized iterations are rolled up through instruction removal and update, which are generally reverting the normal unrolling procedure. The evaluations on standard SPEC2006/2017 and MiBench demonstrate that RollBin effectively shrinks code size by 1.7% and 2.2% on average (up to 7.8%), which respectively outperforms the state-of-the-arts by 31% and 38%. In addition, the use cases of representative realistic applications manifest that RollBin can be applicable in practices.

References

[1]
David Callahan, Jack J. Dongarra, and David Levine. 1988. Vectorizing Compilers: A Test Suite and Results. In Proceedings Supercomputing ’88. IEEE Computer Society, 98–105. https://doi.org/10.1109/SUPERC.1988.44642
[2]
Milind Chabbi, Jin Lin, and Raj Barik. 2021. An Experience with Code-Size Optimization for Production iOS Mobile Applications. In Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 363–377. https://doi.org/10.1109/CGO51591.2021.9370306
[3]
John Cocke. 1970. Global Common Subexpression Elimination. In Proceedings of a Symposium on Compiler Optimization. ACM, 20–24. https://doi.org/10.1145/800028.808480
[4]
Keith D. Cooper, Philip J. Schielke, and Devika Subramanian. 1999. Optimizing for Reduced Code Space using Genetic Algorithms. In Proceedings of the ACM SIGPLAN 1999 Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES). ACM, 1–9. https://doi.org/10.1145/314403.314414
[5]
Standard Performance Evaluation Corporation. 2021. SPEC CPU® 2017. https://www.spec.org/cpu2017/
[6]
Joe Curley and Sanjiv Shah. 2022. oneAPI – The Cross-Architecture, Multi-Vendor Path to Accelerated Computing. https://www.intel.com/content/www/us/en/developer/articles/technical/cross-architecture-multi-vendor-path-computing.html
[7]
Anderson Faustino da Silva, Bernardo N. B. de Lima, and Fernando Magno Quintão Pereira. 2021. Exploring the Space of Optimization Sequences For Code-Size Reduction: Insights and Tools. In Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction (CC). ACM, 47–58. https://doi.org/10.1145/3446804.3446849
[8]
Anderson Faustino da Silva, Bruno Conde Kind, José Wesley de Souza Magalhães, Jerônimo Nunes Rocha, Breno Campos Ferreira Guimarães, and Fernando Magno Quintão Pereira. 2021. ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction. In Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 378–390. https://doi.org/10.1109/CGO51591.2021.9370322
[9]
Thaís Damásio, Vinícius Pacheco, Fabrício Goes, Fernando Pereira, and Rodrigo Rocha. 2021. Inlining for Code Size Reduction. In Proceedings of the 25th Brazilian Symposium on Programming Languages (SBLP). ACM, 17–24. https://doi.org/10.1145/3475061.3475081
[10]
Jack J. Dongarra and A. R. Hinds. 1979. Unrolling Loops in FORTRAN. Softw. Pract. Exp., 9, 3 (1979), 219–226. https://doi.org/10.1002/spe.4380090307
[11]
Kavon Farvardin. 2019. Halo: Wholly Adaptive LLVM Optimizer. https://github.com/halo-project/halo/blob/master/docs/proposal.pdf
[12]
Gianluca Frison. 2022. BLASFEO - BLAS For Embedded Optimization. https://github.com/giaf/blasfeo/blob/master/CMakeLists.txt
[13]
Gianluca Frison, Dimitris Kouzoupis, Tommaso Sartor, Andrea Zanelli, and Moritz Diehl. 2018. BLASFEO: Basic Linear Algebra Subroutines for Embedded Optimization. ACM Trans. Math. Softw., 44, 4 (2018), 42:1–42:30. https://doi.org/10.1145/3210754
[14]
Google. 2022. Android Developers – Reduce Your App Size. https://developer.android.com/topic/performance/reduce-apk-size
[15]
Matthew R. Guthaus, Jeff Ringenberg, Dan Ernst, Todd M. Austin, Trevor N. Mudge, and Richard B. Brown. 2001. MiBench: A Free, Commercially Representative Embedded Benchmark Suite. In Proceedings of the fourth annual IEEE international workshop on workload characterization (WWC-4). IEEE Computer Society, USA. 3–14. isbn:0780373154 https://doi.org/10.1109/WWC.2001.990739
[16]
Todd T. Hahn, Eric Stotzer, Dineel Sule, and Mike Asal. 2008. Compilation Strategies for Reducing Code Size on a VLIW Processor with Variable Length Instructions. In Proceedings of the High Performance Embedded Architectures and Compilers, Third International Conference (HiPEAC). 4917, Springer, 147–160. https://doi.org/10.1007/978-3-540-77560-7_11
[17]
John L. Henning. 2006. SPEC CPU2006 Benchmark Descriptions. SIGARCH Comput. Archit. News, 34, 4 (2006), 1–17. https://doi.org/10.1145/1186736.1186737
[18]
Erh-Wen Hu, Bogong Su, and Jian Wang. 2016. Instruction Level Loop De-optimization. In Computer and Information Science 2015. Springer International Publishing, Cham. 221–234. isbn:978-3-319-23467-0
[19]
Wen-mei W. Hwu and Pohua P. Chang. 1989. Inline Function Expansion for Compiling C Programs. In Proceedings of the ACM SIGPLAN’89 Conference on Programming Language Design and Implementation (PLDI). ACM, 246–257. https://doi.org/10.1145/73141.74840
[20]
Apple Inc. 2022. Building a Universal macOS Binary. https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary
[21]
Jens Knoop, Oliver Rüthing, and Bernhard Steffen. 1994. Partial Dead Code Elimination. In Proceedings of the ACM SIGPLAN’94 Conference on Programming Language Design and Implementation (PLDI). ACM, 147–158. https://doi.org/10.1145/178243.178256
[22]
Chris Lattner. 2002. LLVM: An Infrastructure for Multi-Stage Optimization. Master’s thesis. Computer Science Dept., University of Illinois at Urbana-Champaign. Urbana, IL. See http://llvm.cs.uiuc.edu.
[23]
LLVM. 2020. MergeFunctions pass, how it works. https://llvm.org/docs/MergeFunctions.html
[24]
David Monniaux and Cyril Six. 2021. Simple, Light, Yet Formally Verified, Global Common Subexpression Elimination and Loop-Invariant Code Motion. In Proceedings of the 22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES). ACM, 85–96. https://doi.org/10.1145/3461648.3463850
[25]
Kateryna Muts, Arno Luppold, and Heiko Falk. 2019. Compiler-Based Code Compression for Hard Real-Time Systems. In Proceedings of the 22nd International Workshop on Software and Compilers for Embedded Systems (SCOPES). ACM, 72–81. https://doi.org/10.1145/3323439.3323976
[26]
Niels Groot Obbink, Ivano Malavolta, Gian Luca Scoccia, and Patricia Lago. 2018. An Extensible Approach for Taming the Challenges of JavaScript Dead Code Elimination. In Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE Computer Society, 391–401. https://doi.org/10.1109/SANER.2018.8330226
[27]
Maksim Panchenko, Rafael Auler, Bill Nell, and Guilherme Ottoni. 2019. BOLT: A Practical Binary Optimizer for Data Centers and Beyond. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2–14. https://doi.org/10.1109/CGO.2019.8661201
[28]
Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis. 2014. Dynamic Reconstruction of Relocation Information for Stripped Binaries. In International Workshop on Recent Advances in Intrusion Detection. 8688, Springer, 68–87. https://doi.org/10.1007/978-3-319-11379-1_4
[29]
Daejin Park, Min-Woo Jung, and Jeonghun Cho. 2017. Area Efficient Remote Code Execution Platform With On-Demand Instruction Manager For Cloud-Connected Code Executable IoT Devices. Simul. Model. Pract. Theory, 77 (2017), 379–389. https://doi.org/10.1016/j.simpat.2016.08.010
[30]
Matteo Perotti, Pasquale D Schiavone, Giuseppe Tagliavini, Davide Rossi, Tariq Kurd, Mark Hill, Liu Yingying, and Luca Benini. 2020. HW/SW Approaches for RISC-V Code Size Reduction. In Workshop on Computer Architecture Research with RISC-V (CARRV). 1–7. https://doi.org/10.3929/ethz-b-000461404
[31]
Rodrigo C. O. Rocha, Pavlos Petoumenos, Björn Franke, Pramod Bhatotia, and Michael F. P. O’Boyle. 2022. Loop Rolling for Code Size Reduction. In Proceedings of the 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 217–229. https://doi.org/10.1109/CGO53902.2022.9741256
[32]
Rodrigo C. O. Rocha, Pavlos Petoumenos, Zheng Wang, Murray Cole, Kim M. Hazelwood, and Hugh Leather. 2021. HyFM: Function Merging for Free. In Proceedings of the 22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES). ACM, 110–121. https://doi.org/10.1145/3461648.3463852
[33]
Rodrigo C. O. Rocha, Pavlos Petoumenos, Zheng Wang, Murray Cole, and Hugh Leather. 2019. Function Merging by Sequence Alignment. In IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 149–163. https://doi.org/10.1109/CGO.2019.8661174
[34]
Rodrigo C. O. Rocha, Pavlos Petoumenos, Zheng Wang, Murray Cole, and Hugh Leather. 2020. Effective Function Merging in the SSA Form. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI). ACM, 854–868. https://doi.org/10.1145/3385412.3386030
[35]
Greg Stiff and Frank Vahid. 2005. New Decompilation Techniques for Binary-Level Co-Processor Generation. In Proceedings of the 2005 International Conference on Computer-Aided Design (ICCAD). IEEE Computer Society, 547–554. https://doi.org/10.1109/ICCAD.2005.1560127
[36]
Sean Stirling, Rocha Rodrigo C. O., Kim M. Hazelwood, Hugh Leather, Michael F. P. O’Boyle, and Pavlos Petoumenos. 2022. F3M: Fast Focused Function Merging. In Proceedings of the 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 242–253. https://doi.org/10.1109/CGO53902.2022.9741269
[37]
Sriraman Tallam. 2019. Propeller: Profile Guided Optimizing Large Dcale LLVM-Based Relinker. https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf
[38]
Sriraman Tallam, Cary Coutant, Ian Lance Taylor, Xinliang David Li, and Chris Demetriou. 2010. Safe ICF: Pointer Safe and Unwinding Aware Identical Code Folding in Gold. In GCC Developers Summit. http://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=view&target=tallam.pdf
[39]
TensorFlow. 2021. Build TensorFlow Lite with CMake. https://www.tensorflow.org/lite/guide/build_cmake
[40]
TensorFlow. 2022. ML for Mobile and Edge Devices - TensorFlow Lite. https://www.tensorflow.org/lite
[41]
Theodoros Theodoridis, Tobias Grosser, and Zhendong Su. 2022. Understanding and Exploiting Optimal Function Inlining. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 977–989. https://doi.org/10.1145/3503222.3507744
[42]
Tobias J. K. Edler von Koch, Igor Böhm, and Björn Franke. 2010. Integrated Instruction Selection and Register Allocation For Compact Code Generation Exploiting Freeform Mixing of 16- and 32-bit Instructions. In Proceedings of the 8th International Symposium on Code Generation and Optimization (CGO). ACM, 180–189. https://doi.org/10.1145/1772954.1772980
[43]
Tobias J. K. Edler von Koch, Björn Franke, Pranav Bhandarkar, and Anshuman Dasgupta. 2014. Exploiting Function Similarity For Code Size Reduction. In Proceedings of the 2014 SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES). ACM, 85–94. https://doi.org/10.1145/2597809.2597811
[44]
Andrew Wolfe and Alex Chanin. 1992. Executing Compressed Programs on an Embedded RISC Architecture. In Proceedings of the 25th Annual International Symposium on Microarchitecture. ACM / IEEE Computer Society, 81–91. https://doi.org/10.1109/MICRO.1992.697002
[45]
Xianhong Xu, Simon Jones, and Christopher T. Clarke. 2003. ARM/THUMB Code Compression for Embedded Systems. In Proceedings of the 12th IEEE International Conference on Fuzzy Systems. 32–35. https://doi.org/10.1109/ICM.2003.238300
[46]
Ruoyu Zhou and Timothy M. Jones. 2019. Janus: Statically-Driven and Profile-Guided Automatic Dynamic Binary Parallelisation. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 15–25. https://doi.org/10.1109/CGO.2019.8661196

Cited By

View all
  • (2024)Programming-by-Demonstration for Long-Horizon Robot TasksProceedings of the ACM on Programming Languages10.1145/36328608:POPL(512-545)Online publication date: 5-Jan-2024
  • (2024)EAtuner: Comparative Study of Evolutionary Algorithms for Compiler Auto-tuning2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD)10.1109/CSCWD61410.2024.10580120(419-426)Online publication date: 8-May-2024
  • (2023)Loop Rerolling for Hardware DecompilationProceedings of the ACM on Programming Languages10.1145/35912377:PLDI(420-442)Online publication date: 6-Jun-2023

Index Terms

  1. RollBin: reducing code-size via loop rerolling at binary level

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    LCTES 2022: Proceedings of the 23rd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
    June 2022
    161 pages
    ISBN:9781450392662
    DOI:10.1145/3519941
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 June 2022

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Binary Optimization
    2. Code-Size Reduction
    3. Loop Rerolling

    Qualifiers

    • Research-article

    Conference

    LCTES '22

    Acceptance Rates

    Overall Acceptance Rate 116 of 438 submissions, 26%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)69
    • Downloads (Last 6 weeks)9
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Programming-by-Demonstration for Long-Horizon Robot TasksProceedings of the ACM on Programming Languages10.1145/36328608:POPL(512-545)Online publication date: 5-Jan-2024
    • (2024)EAtuner: Comparative Study of Evolutionary Algorithms for Compiler Auto-tuning2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD)10.1109/CSCWD61410.2024.10580120(419-426)Online publication date: 8-May-2024
    • (2023)Loop Rerolling for Hardware DecompilationProceedings of the ACM on Programming Languages10.1145/35912377:PLDI(420-442)Online publication date: 6-Jun-2023

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media