PROMPT: A Fast and Extensible Memory Profiling Framework

Published: 29 April 2024

Abstract

Memory profiling captures programs’ dynamic memory behavior, assisting programmers in debugging and tuning, and enabling advanced compiler optimizations such as speculation-based automatic parallelization. Because each use case demands its own summary of the program trace, many types of memory profilers have been developed. Yet designing practical memory profilers often requires extensive compiler expertise, skill in program optimization, and significant implementation effort. As a result, the need for fast and robust profilers often goes unmet. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and of intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported to PROMPT with all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was reduced by 5.3× and 7.1×, respectively. To further underscore PROMPT’s impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving a more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
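The division of labor described in the abstract (declare the events of interest, then write only the core profiling logic) can be illustrated with a minimal sketch. Everything named below (`MemEvent`, `DependenceProfiler`, `replay`, `example_dependences`) is hypothetical and chosen for illustration; it is not PROMPT's actual interface, and the `replay` function merely stands in for whatever instrumentation and event delivery a framework would provide.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical event description: the profiler author states which events
// matter (here, loads and stores). Names are illustrative, not PROMPT's API.
enum class EventKind { Load, Store };

struct MemEvent {
    EventKind kind;
    std::uint32_t inst_id;  // static ID of the instrumented instruction
    std::uintptr_t addr;    // address touched by the access
};

using DepSet = std::set<std::pair<std::uint32_t, std::uint32_t>>;

// Core profiling logic only: record flow (store -> load) dependences
// between instruction IDs that touch the same address.
class DependenceProfiler {
public:
    void on_event(const MemEvent& e) {
        if (e.kind == EventKind::Store) {
            last_writer_[e.addr] = e.inst_id;  // remember the latest writer
        } else {
            auto it = last_writer_.find(e.addr);
            if (it != last_writer_.end())
                deps_.emplace(it->second, e.inst_id);  // (writer, reader)
        }
    }
    const DepSet& deps() const { return deps_; }

private:
    std::map<std::uintptr_t, std::uint32_t> last_writer_;
    DepSet deps_;
};

// Stand-in for the framework side (assumption): deliver a recorded trace
// to the profiler one event at a time.
template <typename Profiler, std::size_t N>
void replay(Profiler& p, const MemEvent (&trace)[N]) {
    for (const MemEvent& e : trace) p.on_event(e);
}

// Small usage example on a hand-written trace.
DepSet example_dependences() {
    const MemEvent trace[] = {
        {EventKind::Store, 1, 0x1000},
        {EventKind::Load,  2, 0x1000},  // reads what instruction 1 wrote
        {EventKind::Load,  3, 0x2000},  // no earlier writer: no dependence
        {EventKind::Store, 4, 0x2000},
        {EventKind::Load,  5, 0x2000},  // reads what instruction 4 wrote
    };
    DependenceProfiler profiler;
    replay(profiler, trace);
    return profiler.deps();
}
```

In this sketch the profiler class contains no instrumentation code at all; swapping in a different analysis would mean writing a different `on_event` while the event delivery stays unchanged.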


Published In

Proceedings of the ACM on Programming Languages, Volume 8, Issue OOPSLA1
April 2024
1492 pages
EISSN:2475-1421
DOI:10.1145/3554316
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compiler optimizations
  2. memory profiling
  3. profiler framework

Qualifiers

  • Research-article

Funding Sources

  • NSF (National Science Foundation)
  • DOE (U.S. Department of Energy)
