Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

In here we consider a technique to automatically extract three-argument instructions from sequential arithmetic code. The instructions include: multiply and add, three argument additions and three argument multiplications (MUL3). The proposed solution combines a height reduction technique that generates three-argument instructions and a VLIW scheduling that can benefit from these instructions. The proposed height reduction technique is based on a known theoretical algorithm (MRK) that in some cases can evaluate an algebraic circuit faster than its depth. We modified the MRK algorithm to generate less instructions and emit VLIW instructions. The modified MRK algorithm was implemented in the LLVM compiler and the potential usefulness was measured. Our results show that for arithmetic benchmarks the proposed technique can improve the VLIW scheduling while emitting three-argument instructions. The contribution of this work includes: the modified MRK algorithm as a new technique for height reduction optimizations and the study of the potential usefulness of three-argument instructions. Though our results are for a non existing hardware they show the usefulness of adding such instructions to VLIW CPUs. Note that a previous research showed that MUL3 can be executed as fast as MUL2.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Arm instruction set, arm7tdmi data sheet. Technical report(2009)

  2. Ben Asher, Y., Stein, E.: Extending booth algorithm to multiplications of three numbers on the fpga. In: IEEE International Conference on Field-Programmable Technology, Taiwan (2008)

  3. Cortadella, J.: Bi-decomposition and tree-height reduction for timing optimization. In: IEEE/ACM International Workshop on Logic and Synthesis (2002)

  4. De Souza A.F., Rounce P.: Dynamically scheduling VLIW instructions. J. Parallel Distrib. Comput. 60(12), 1480–1511 (2000)

    Article  MATH  Google Scholar 

  5. Freescale Semiconductor, Inc., SC140 DSP Core Reference Manual (2005)

  6. Miller G.L., Ramachandran V., Kaltofen E.: Efficient parallel evaluation of straight-line code and arithmetic circuits. SIAM J. Comput. 17(4), 687–695 (1988)

    Article  MathSciNet  MATH  Google Scholar 

  7. Ghuloum, A.M., Fisher, A.L.: Parallelizing complex scans and reductions, pp. 135–146 (1994)

  8. Haney, R., Meuse, T., Kepner, J., Lebak, J.: Hpec challenge overview. In: Ninth Anuual High-Performance Embedded Computing Workshop (2005)

  9. Intel 64 and ia-32 architectures, software developers manual, instruction set reference. Technical report, Intel (2011)

  10. Jaja J.: An Introduction to Parallel Algorithms. Addison-Wesley, Boston (1992)

    MATH  Google Scholar 

  11. Lattner, C., Adve, V. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04), Palo Alto, California (2004)

  12. Lee, C., Lee, J.K., Hwang, T.T., Tsai, S.C.: Compiler optimization on instruction scheduling for low power. In: International Symposium on System Synthesis (ISSS), p 55, IEEE Computer Society (2000)

  13. Mangalam, G.N., Narayan, S., van Besouw, P., Avra, L., Mathur, A., Saluja, S.: Graph transformations for improved tree height reduction. In: VLSID ’03: Proceedings of the 16th International Conference on VLSI Design (2003)

  14. McMahon, F.H.: The livermore Fortran Kernels Test of the Numerical Performance Range. Performance Evaluation of Supercomputers. Elsevier, New York, pp. 143–186 (1988)

  15. Muchnick S.: Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, Waltham (1997)

    Google Scholar 

  16. Nakatani, T., Ebcioglu, K.: Combining as a compilation technique for vliw architectures. In: Proceedings of MICRO-22, pp. 43–55 (1989)

  17. Nicolau, A., Potasmann, R.: Incremental tree height reduction for high level synthesis. In: DAC ’91: Proceedings of the 28th conference on ACM/IEEE design automation, pp. 770–774 (1991)

  18. Reif J.: Synthesis of Parallel Algorithms. Morgan Kaufmann, Waltham (1993)

    Google Scholar 

  19. Revol N., Roch J.: Parallel evaluation of arithmetic circuits. Theor. Comput. Sci. 162, 133–150 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  20. Santos R., Azevedo R., Araujo G.: Instruction scheduling based on subgraph isomorphism for a high performance computer processor. J. Univers. Comput. Sci. 14(21), 3465–3480 (2008)

    Google Scholar 

  21. Schlansker, M., Kathail, V., Anik, S.: Height reduction of control recurrences for ilp processors. In: MICRO 27: Proceedings of the 27th Annual International Symposium on Microarchitecture (1994)

  22. Sethi, R., Aho, A.V., Lam, M.S., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Boston, 1986 (2006)

  23. Shobaki G., Wilken K., Heffernan M.: Optimal trace scheduling using enumeration. ACM Trans. Architect. Code Optim. (TACO) 5(4), 1–32 (2009)

    Article  Google Scholar 

  24. Steven, F.L., Potter, R.D., Steven, G.B., Vintan, L., Romania, S.: Static data dependence collapsing in a high performance superscalar architecture. In: The 3-rd International Conference on Massively Parallel Computing Systems (MPCS ’98) (1998)

  25. Stpiczynski, P.: Parallel algorithms for solving linear recurrence systems. Parallel Processing: CONPAR 92-VAPP V. In: Second Joint International Conference on Vector and Parallel Processing, pp. 343–348 (1992)

  26. Texas Instruments, Inc., TMS320C67x/C67x CPU and Instruction Set Reference Guide (2006)

  27. Zory, J., Coelho, F.: Using algebraic transformations to optimize expression evaluation in scientific code. In: PACT ’98: Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (1998)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yosi Ben-Asher.

Additional information

This research was supported by THE ISRAEL SCIENCE FOUNDATION grant No. 585/09 and Israel Ministry of Science and Technology (MOST) grant No. 3-6496.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Abboud, F., Ben-Asher, Y., Shajrawi, Y. et al. Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations. Int J Parallel Prog 40, 488–513 (2012). https://doi.org/10.1007/s10766-012-0196-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10766-012-0196-7

Keywords

Navigation