Abstract
Simulation results are presented for a design that uses the hardware-implemented, trace-based dynamic instruction scheduler of our single-process DTSVLIW architecture to schedule instructions from several processes into multiple streams of VLIW instructions, which are executed by a wide-issue, simultaneous multi-threading (SMT) execution engine. Scheduling proceeds by single-instruction execution of each process: the executed instructions are dynamically scheduled into blocks of VLIW instructions, and these blocks are cached for subsequent SMT execution. SMT provides a mechanism to reduce the impact of the horizontal and vertical waste, and of the variable memory latencies, seen in the DTSVLIW. Preliminary experiments explore this extended model. The results show PE utilization of up to 87% on a 4-thread, 1-scalar, 8-PE design, with speed-ups of up to 6.3 times over a single processor. Notably, only a single scalar process needs to be scheduled at any time, and main memory fetches are 1–4% of those of a single processor.
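The utilization and waste figures quoted above can be made concrete with a small calculation. The sketch below is illustrative only and is not the simulator used for the reported results. It assumes the usual SMT accounting, in which vertical waste counts the issue slots of cycles where no PE does useful work, and horizontal waste counts the unused slots of cycles where at least one PE does. The function name `slot_waste` and the sample trace are hypothetical.

```python
from typing import Dict, List


def slot_waste(busy_per_cycle: List[int], width: int) -> Dict[str, float]:
    """Split issue slots into used, vertically wasted, and horizontally wasted.

    busy_per_cycle[i] is the number of PEs doing useful work in cycle i
    (a hypothetical trace format); width is the number of PEs per cycle.
    """
    if width <= 0 or not busy_per_cycle:
        raise ValueError("need a positive width and at least one cycle")
    if any(n < 0 or n > width for n in busy_per_cycle):
        raise ValueError("busy PE count must lie between 0 and width")

    total_slots = width * len(busy_per_cycle)
    used = sum(busy_per_cycle)
    # Vertical waste: every slot of a cycle in which nothing issued.
    vertical = sum(width for n in busy_per_cycle if n == 0)
    # Horizontal waste: empty slots in cycles where something did issue.
    horizontal = sum(width - n for n in busy_per_cycle if n > 0)

    return {
        "utilization": used / total_slots,
        "vertical_waste": vertical / total_slots,
        "horizontal_waste": horizontal / total_slots,
    }


# Hypothetical 8-cycle trace on an 8-PE engine (not data from the paper).
trace = [8, 0, 4, 8, 6, 0, 8, 6]
stats = slot_waste(trace, width=8)
print(stats)
```

The three fractions always sum to 1, so any utilization figure implies that the remaining share of slots is split between the two kinds of waste.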
About this article
Cite this article
Rounce, P., De Souza, A. Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture. Int J Parallel Prog 36, 184–205 (2008). https://doi.org/10.1007/s10766-007-0062-1