A software-only scheme for managing heap data on limited local memory (LLM) multicore processors

Published: 05 September 2013

Abstract

This article presents a scheme for managing heap data in the local memory of each core of a limited local memory (LLM) multicore architecture. Although managing heap data semi-automatically with a software cache is feasible, it may require modifying the code of other threads. Cross-thread modifications are very difficult to code and debug, and they become more complex and challenging as the number of cores increases. In this article, we propose an intuitive programming interface that provides automatic and scalable heap data management. In addition, for embedded applications, where the maximum heap size can be profiled, we propose several optimizations to our heap management that significantly decrease the library overheads. Our experiments on several benchmarks from MiBench executing on the Sony PlayStation 3 show that our scheme is natural to use and that, when the maximum size of heap data is known, our optimizations improve application performance by an average of 14%.
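
To make the software-cache baseline mentioned in the abstract concrete, the following is a minimal Python sketch of a direct-mapped, write-back software cache that stages heap blocks from a large global memory into a small local store. It is a toy model under stated assumptions, not the scheme proposed in the article: the class `HeapCache`, its method names, and the block and local-store sizes are all hypothetical, and DMA transfers are modelled as plain slice copies.

```python
class HeapCache:
    """Direct-mapped, write-back software cache over a simulated global memory.

    Hypothetical illustration only; not the interface from the article.
    """

    def __init__(self, global_mem, local_size=64, block_size=16):
        # Sizes are illustrative assumptions. Whole blocks only, so slice
        # copies never change the length of either buffer.
        if local_size % block_size or len(global_mem) % block_size:
            raise ValueError("sizes must be multiples of block_size")
        self.global_mem = global_mem
        self.block_size = block_size
        self.num_slots = local_size // block_size
        self.local = bytearray(local_size)      # the limited local memory
        self.tags = [None] * self.num_slots     # global block held by each slot
        self.dirty = [False] * self.num_slots
        self.transfers = 0                      # simulated DMA transfers

    def _evict(self, slot):
        # Write a modified block back to global memory before reusing its slot.
        bs = self.block_size
        if self.tags[slot] is not None and self.dirty[slot]:
            start = self.tags[slot] * bs
            self.global_mem[start:start + bs] = self.local[slot * bs:(slot + 1) * bs]
            self.transfers += 1
        self.dirty[slot] = False

    def _local_offset(self, addr):
        # Translate a global heap address into an offset in local memory,
        # fetching the enclosing block on a miss.
        if not 0 <= addr < len(self.global_mem):
            raise IndexError("global address out of range")
        bs = self.block_size
        block = addr // bs
        slot = block % self.num_slots
        if self.tags[slot] != block:
            self._evict(slot)
            start = block * bs
            self.local[slot * bs:(slot + 1) * bs] = self.global_mem[start:start + bs]
            self.tags[slot] = block
            self.transfers += 1
        return slot * bs + addr % bs

    def read(self, addr):
        return self.local[self._local_offset(addr)]

    def write(self, addr, value):
        offset = self._local_offset(addr)
        self.local[offset] = value
        self.dirty[offset // self.block_size] = True

    def flush(self):
        # Write every modified block back to global memory.
        for slot in range(self.num_slots):
            self._evict(slot)
```

In this sketch every heap access goes through `read` or `write`, so the program never needs more than `local_size` bytes of local memory regardless of how large the heap in global memory grows. The cost is an address translation on each access plus a transfer on each miss, which is the kind of library overhead the abstract says the proposed optimizations aim to reduce.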

Cited By

  • (2015) Efficient Code Assignment Techniques for Local Memory on Software Managed Multicores. ACM Transactions on Embedded Computing Systems 14:4 (1-24). DOI: 10.1145/2738039. Online publication date: 8-Dec-2015
  • (2014) Construction of GCCFG for inter-procedural optimizations in software managed manycore (SMM) architectures. Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (1-10). DOI: 10.1145/2656106.2656122. Online publication date: 12-Oct-2014

    Published In

    ACM Transactions on Embedded Computing Systems  Volume 13, Issue 1
    August 2013
    332 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2501626

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 September 2013
    Accepted: 01 January 2012
    Revised: 01 July 2011
    Received: 01 November 2010
    Published in TECS Volume 13, Issue 1

    Author Tags

    1. Heap data
    2. IBM Cell BE
    3. MPI
    4. embedded systems
    5. local memory
    6. multicore processor
    7. scratch pad memory

    Qualifiers

    • Research-article
    • Research
    • Refereed
