Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/996841.996876acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article

Balancing register allocation across threads for a multithreaded network processor

Published: 09 June 2004 Publication History

Abstract

Modern network processors employ multi-threading to allow concurrency amongst multiple packet processing tasks. We studied the properties of applications running on the network processors and observed that their imbalanced register requirements across different threads at different program points could lead to poor performance. Many times application needs demand some threads to be more performance critical than others and thus by controlling the register allocation across threads one could impact the performance of the threads and get the desired performance properties for concurrent threads. This prompts our work.Our register allocator aims to distribute available registers to different threads according to their needs. The compiler analyzes the register needs of each thread both at the point of a context switch as well as internally. Compiler then designates some registers as shared and some as private to each thread. Shared registers are allocated across all threads explicitly by the compiler. Values that are live across a context switch can not be kept in shared registers due to safety reasons; thus, only those live ranges that are internal to the context switch can be safely allocated to shared registers. Spill can cause a context switch. and thus, the problems of context switch and allocation are closely coupled and we propose a solution to this problem. The proposed interference graphs (GIG,BIG,IIG) distinguish variables that must use a thread's private registers from those that can use shared registers. We first estimate the register requirement bounds, then reduce from the upper bound gradually to achieve a good register balance among threads. To reduce the register needs, move insertions are inserted at program points that split the live ranges or the nodes on the interference graph. We show that the lower bound is reachable via live range splitting and is adequate for our benchmark programs for simultaneously assigning them on different threads. As our objective, the number of move instructions is minimized.Empirical results show that the compiler is able to effectively control the register allocation across threads by maximizing the number of shared registers. Speed-up for performance critical threads ranges from 18 to 24% whereas degradation for performance of non-critical threads ranges only from 1 to 4%.

References

[1]
Feliks J. Welfeld "Network processing in content inspection applications," In Proceedings of International Symposium on Systems Synthesis, Sep. 2001.
[2]
T.Spalink, S.Karlin, L.Peterson, Y.Gottlieb, "Building a Robust Software-Based Router Using Network Processors," In Proceedings of ACM Symposium on Operating Systems Principles, Oct. 2001.
[3]
J. Wagner and R. Leupers, "C Compiler Design for an Industrial Network Processor," In Proceedings of ACM SIGPLAN Conference on Languages, Compiler, and Tools for Embedded Systems, Jun. 2001.
[4]
J.Kim, S.Jung, Y.Park, "Experience with a Retargetable Compiler for a Commercial Network Processor," In Proceedings of International Conference on Compilers, Architecture and Synthesis for Embedded Systems, Oct. 2002.
[5]
J. Liu, T. Kong, and F. Chow. "Effective compilation support for variable instruction set architecture", In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Sep. 2002.
[6]
T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to algorithms, MIT Press, 1989
[7]
C.H. Papadimitriou, K. Steiglitz, Combinatorial optimization Algorithms and Complexity, Dover Publications INC, 1998.
[8]
A.V. Aho, R. Sethi, J.D. Ullman, Compilers Principles, Techniques and Tools, Addison-Wesley, Reading, MA, 1986
[9]
S.S. Muchnick, Advanced Compiler Design and Imple-mentation, Morgan Kaufman, 1997.
[10]
J. Ferrante, K.J. Ottenstein, J.D. Warren, "The program dependence graph and its use in optimization", ACM Transactions on Programming Languages and Systems, Jul. 1987.
[11]
Fred C. Chow and John L. Hennessy, "The priority-based coloring approach to register allocation", ACM Transactions on Programming Languages and Systems, Oct. 1990.
[12]
L. George and A. Appel. "Iterated Register Coalescing," ACM Transactions on Programming Languages and Systems, May 1996.
[13]
G.J. Chaitin, "Register allocation and spilling via graph coloring", In Proceedings of International Conference on Compiler Construction, Jun. 1982.
[14]
G.J. Chaitin, M.A. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, P. Markstein, "Register allocation via coloring", Computer Language. Vol.6, pp.47--57, Jan. 1981.
[15]
T. Wolf and M. Franklin, "CommBench - A Telecommunication Benchmark for Network Processors", In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software, Apr. 2000.
[16]
G.Memik, W.H.Mangione-Smith, W. Hu., "NetBench: A Benchmarking Suite for Network Processors", In Proceedings of the International Conference on Computer Aided Design, Nov. 2001.
[17]
Rajiv Gupta, Mary Lou Soffa, and Tim Steele, "Register allocation via clique separators", In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Jun. 1989.
[18]
Xiaotong Zhuang, Jian Liu, "WRAPS Scheduling and Its Efficient Implementation on Network Processors", In Proceedings of the 9th International Conference on High Performance Computing, Dec. 2002.
[19]
L. George, M. Blume, "Taming the IXP Network Processor", In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2003.
[20]
V. Barthelmann, "Inter-Task Register-Allocation for Static Operating Systems", In Proceedings of ACM SIGPLAN Conference on Languages, Compiler, and Tools for Embedded Systems, Jun. 2002.
[21]
http://www.cs.purdue.edu/np/npc.html
[22]
Timothy Sherwood, George Varghese, and Brad Calder, "A Pipelined Memory Architecture for High Throughput Network Processors," In Proceedings of the 30th Annual Intl. Symposium on Computer Architecture, Jun. 2003.
[23]
David Callahan, Brian Koblenz, "Register allocation via hierarchical graph coloring," In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Jun. 1991.
[24]
Dirk Grunwald, Rich Neves, "Whole-Program Optimization for Time and Space Efficient Threads," In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 1996.

Cited By

View all
  • (2011)Compiler-Supported Thread Management for Multithreaded Network ProcessorsACM Transactions on Embedded Computing Systems10.1145/2043662.204366810:4(1-31)Online publication date: 1-Nov-2011
  • (2010)Balanced bipartite graph based register allocation for network processors in mobile and wireless networksMobile Information Systems10.1155/2010/9861926:1(65-83)Online publication date: 1-Jan-2010
  • (2009)A Register Framework for Network Processors with Banked Register File2009 International Conference on Complex, Intelligent and Software Intensive Systems10.1109/CISIS.2009.120(607-613)Online publication date: Mar-2009
  • Show More Cited By

Index Terms

  1. Balancing register allocation across threads for a multithreaded network processor

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      PLDI '04: Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
      June 2004
      310 pages
      ISBN:1581138075
      DOI:10.1145/996841
      • cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 39, Issue 6
        PLDI '04
        May 2004
        299 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/996893
        Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 09 June 2004

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. multithreaded processor
      2. network processor
      3. register allocation

      Qualifiers

      • Article

      Conference

      PLDI04
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 406 of 2,067 submissions, 20%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)5
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 05 Jan 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2011)Compiler-Supported Thread Management for Multithreaded Network ProcessorsACM Transactions on Embedded Computing Systems10.1145/2043662.204366810:4(1-31)Online publication date: 1-Nov-2011
      • (2010)Balanced bipartite graph based register allocation for network processors in mobile and wireless networksMobile Information Systems10.1155/2010/9861926:1(65-83)Online publication date: 1-Jan-2010
      • (2009)A Register Framework for Network Processors with Banked Register File2009 International Conference on Complex, Intelligent and Software Intensive Systems10.1109/CISIS.2009.120(607-613)Online publication date: Mar-2009
      • (2007)Code Compilation for an Explicitly Parallel Register-Sharing ArchitectureProceedings of the 2007 International Conference on Parallel Processing10.1109/ICPP.2007.24Online publication date: 10-Sep-2007
      • (2006)Compiler assisted dynamic management of registers for network processorsProceedings of the 20th international conference on Parallel and distributed processing10.5555/1898953.1898987(53-53)Online publication date: 25-Apr-2006
      • (2006)An interprocedural code optimization technique for network processors using hardware multi-threading supportProceedings of the conference on Design, automation and test in Europe: Proceedings10.5555/1131481.1131739(919-924)Online publication date: 6-Mar-2006
      • (2006)Effective thread management on network processors with compiler analysisACM SIGPLAN Notices10.1145/1159974.113466241:7(72-82)Online publication date: 14-Jun-2006
      • (2006)Effective thread management on network processors with compiler analysisProceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems10.1145/1134650.1134662(72-82)Online publication date: 14-Jun-2006
      • (2006)Compiler assisted dynamic management of registers for network processorsProceedings 20th IEEE International Parallel & Distributed Processing Symposium10.1109/IPDPS.2006.1639291(10 pp.)Online publication date: 2006
      • (2006)An Interprocedural Code Optimization Technique for Network Processors Using Hardware Multi-Threading SupportProceedings of the Design Automation & Test in Europe Conference10.1109/DATE.2006.243808(1-6)Online publication date: 2006
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media