Abstract
Modern and future server-class processors will incorporate many cores. Some studies have suggested that it may be worthwhile to dedicate some of these cores to specific tasks such as operating system execution. OS off-loading has two main benefits: improved performance due to better cache utilization, and improved power efficiency due to smarter use of heterogeneous cores. However, OS off-loading is a complex process that requires balancing the overhead of off-loading against its potential benefit, which is unknown at the time the off-loading decision is made. In prior work, OS off-loading has been implemented by first profiling system call behavior and then manually instrumenting some OS routines (out of hundreds) to support off-loading. We propose a hardware-based mechanism that helps automate the off-load decision-making process and provides high-quality dynamic decisions via performance feedback. Our mechanism dynamically estimates the off-load requirements of the application and relies on a run-length predictor for the upcoming system call invocation. The resulting hardware-based off-loading policy yields a throughput improvement of up to 18% over a baseline without off-loading, 13% over a static software-based policy, and 23% over a dynamic software-based policy.
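The abstract combines two ideas: predicting how long an upcoming OS invocation will run, and off-loading it only when that prediction exceeds a threshold that is tuned by performance feedback. The sketch below is a minimal software model of that combination under stated assumptions, not the authors' hardware design. Everything in it is hypothetical: the predictor is a direct-mapped, last-value table keyed by a caller-supplied integer (for example, a system call number); the names `RunLengthPredictor` and `OffloadController`, the starting threshold, the step size, and the hill-climbing feedback rule are all illustrative choices.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunLengthPredictor:
    """Hypothetical last-value predictor of OS run length (in instructions).

    Direct-mapped: distinct keys that map to the same slot alias each other.
    """
    table_size: int = 256
    default_prediction: int = 0  # assumed prediction for never-seen keys
    table: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.table = [self.default_prediction] * self.table_size

    def _index(self, key: int) -> int:
        return key % self.table_size

    def predict(self, key: int) -> int:
        return self.table[self._index(key)]

    def update(self, key: int, actual_run_length: int) -> None:
        # Remember the most recently observed run length for this slot.
        self.table[self._index(key)] = actual_run_length


@dataclass
class OffloadController:
    """Hypothetical threshold policy tuned by simple hill climbing."""
    threshold: int = 1000        # instructions; assumed starting value
    step: int = 500              # assumed adjustment per epoch
    min_threshold: int = 0
    max_threshold: int = 100_000
    direction: int = 1           # +1 raises the threshold, -1 lowers it
    last_throughput: Optional[float] = None

    def should_offload(self, predicted_run_length: int) -> bool:
        # Off-load only invocations predicted to be long enough to
        # amortize the (assumed) migration overhead.
        return predicted_run_length >= self.threshold

    def end_epoch(self, throughput: float) -> None:
        # If the last adjustment lowered throughput, reverse direction.
        if self.last_throughput is not None and throughput < self.last_throughput:
            self.direction = -self.direction
        new_threshold = self.threshold + self.direction * self.step
        self.threshold = min(self.max_threshold, max(self.min_threshold, new_threshold))
        self.last_throughput = throughput


if __name__ == "__main__":
    predictor = RunLengthPredictor()
    controller = OffloadController()
    predictor.update(3, 5000)  # key 3 last ran for 5000 instructions
    print(controller.should_offload(predictor.predict(3)))  # long run: off-load
    print(controller.should_offload(predictor.predict(7)))  # unseen key: stay local
```

A last-value table is the simplest predictor that fits the description; a real design might index by richer processor state or add confidence counters. The hill-climbing rule is one plausible reading of "performance feedback": it keeps moving the threshold while measured throughput improves and reverses direction when it degrades.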
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nellans, D., Sudan, K., Brunvand, E., Balasubramonian, R. (2011). Improving Server Performance on Multi-cores via Selective Off-Loading of OS Functionality. In: Varbanescu, A.L., Molnos, A., van Nieuwpoort, R. (eds) Computer Architecture. ISCA 2010. Lecture Notes in Computer Science, vol 6161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24322-6_23
DOI: https://doi.org/10.1007/978-3-642-24322-6_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24321-9
Online ISBN: 978-3-642-24322-6
eBook Packages: Computer Science, Computer Science (R0)