Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture

Peter A. Rounce¹ & Alberto F. De Souza²

Abstract

Simulation results are presented for an extension of our single-process DTSVLIW architecture, in which its hardware-implemented, trace-based dynamic instruction scheduler schedules instructions from several processes into multiple streams of VLIW instructions for execution by a wide-issue, simultaneous multi-threading (SMT) execution engine. Each process is first executed one instruction at a time, and the executed instructions are dynamically scheduled into blocks of VLIW instructions, which are cached for subsequent SMT execution. SMT provides a mechanism for reducing the impact of the horizontal and vertical waste, and of the variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. Results show PE utilization of up to 87% on a 4-thread, 1-scalar, 8-PE design, with speed-ups of up to 6.3 over a single processor. Notably, only a single scalar process needs to be scheduled at any time, and main memory fetches amount to only 1–4% of those of a single processor.
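
The two-stage flow described above (execute instructions one at a time, schedule the executed instructions into cached VLIW blocks, then let an SMT engine interleave blocks from several threads) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the DTSVLIW hardware scheduler: instructions use registers only, each instruction is placed greedily into the earliest VLIW instruction its register dependences allow, every operation takes one cycle, and the SMT engine issues whole VLIW instructions from threads in round-robin order. The names Instr, schedule_trace and smt_execute are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """One executed instruction from a thread's scalar trace (register operands only)."""
    name: str
    dest: Optional[str]
    srcs: tuple[str, ...]


def schedule_trace(trace: list[Instr], width: int) -> list[list[Instr]]:
    """Greedily pack executed instructions, in trace order, into VLIW instructions.

    Each instruction goes into the earliest VLIW instruction ("row") that
    respects true (RAW), anti (WAR) and output (WAW) register dependences
    and still has a free slot. Operands are assumed to be read before
    results are written within one row, so a WAR pair may share a row.
    Memory and control dependences are not modelled (simplifying assumption).
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    rows: list[list[Instr]] = []
    ready: dict[str, int] = {}      # register -> first row that sees its latest value
    last_read: dict[str, int] = {}  # register -> latest row that reads it
    for ins in trace:
        # RAW: wait until every source register has been produced.
        earliest = max((ready.get(r, 0) for r in ins.srcs), default=0)
        if ins.dest is not None:
            # WAW: after the previous writer; WAR: not before any earlier reader.
            earliest = max(earliest, ready.get(ins.dest, 0), last_read.get(ins.dest, 0))
        row = earliest
        while row < len(rows) and len(rows[row]) >= width:
            row += 1  # this row has no free slot; try the next one
        while row >= len(rows):
            rows.append([])
        rows[row].append(ins)
        for r in ins.srcs:
            last_read[r] = max(last_read.get(r, 0), row)
        if ins.dest is not None:
            ready[ins.dest] = row + 1
    return rows


def smt_execute(threads: list[list[int]], pes: int) -> tuple[int, float]:
    """Issue whole VLIW instructions from several threads onto `pes` PEs per cycle.

    threads[t] holds the op count of each of thread t's VLIW instructions, in
    order. Each cycle, threads are visited round-robin from a rotating start;
    a thread's next VLIW instruction issues only if all its ops fit in the PEs
    still free. No memory stalls are modelled. Returns (cycles, PE utilization).
    """
    if any(n > pes for ops in threads for n in ops):
        raise ValueError("a VLIW instruction is wider than the machine")
    pc = [0] * len(threads)  # index of each thread's next VLIW instruction
    cycles = issued = 0
    while any(pc[t] < len(ops) for t, ops in enumerate(threads)):
        free = pes
        for k in range(len(threads)):
            t = (cycles + k) % len(threads)
            if pc[t] < len(threads[t]) and threads[t][pc[t]] <= free:
                free -= threads[t][pc[t]]
                issued += threads[t][pc[t]]
                pc[t] += 1
        cycles += 1
    return cycles, (issued / (cycles * pes) if cycles else 0.0)


# Usage: a five-instruction trace ("d" overwrites r2, which "a" and "c" read).
trace = [
    Instr("a", "r1", ("r2", "r3")),
    Instr("b", "r4", ("r1",)),
    Instr("c", "r5", ("r2",)),
    Instr("d", "r2", ("r6",)),
    Instr("e", "r7", ("r4", "r5")),
]
rows = schedule_trace(trace, width=4)
print([[i.name for i in row] for row in rows])  # [['a', 'c', 'd'], ['b'], ['e']]
ops = [len(row) for row in rows]                # [3, 1, 1]
print(smt_execute([ops], pes=4))                # (3, 0.4166666666666667)
print(smt_execute([ops, ops], pes=4))           # (4, 0.625)
```

With one thread, the three VLIW instructions leave 7 of the 12 PE slots empty (horizontal waste); interleaving a second copy of the thread fills most of them, raising utilization from about 42% to 62.5%. Vertical waste from memory stalls is not modelled here, and these toy figures are unrelated to the utilization reported in the abstract.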

Author information

Authors and Affiliations

Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
Peter A. Rounce
Departamento de Informática, Universidade Federal do Espírito Santo, Av. Fernando Ferrari, 514, Vitoria, 29075-910, ES, Brazil
Alberto F. De Souza

Corresponding author

Correspondence to Peter A. Rounce.

About this article

Cite this article

Rounce, P., De Souza, A. Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture. Int J Parallel Prog 36, 184–205 (2008). https://doi.org/10.1007/s10766-007-0062-1

Received: 01 November 2006
Accepted: 02 April 2007
Published: 24 January 2008
Issue Date: April 2008
DOI: https://doi.org/10.1007/s10766-007-0062-1
