Abstract
In here we consider a technique to automatically extract three-argument instructions from sequential arithmetic code. The instructions include: multiply and add, three argument additions and three argument multiplications (MUL3). The proposed solution combines a height reduction technique that generates three-argument instructions and a VLIW scheduling that can benefit from these instructions. The proposed height reduction technique is based on a known theoretical algorithm (MRK) that in some cases can evaluate an algebraic circuit faster than its depth. We modified the MRK algorithm to generate less instructions and emit VLIW instructions. The modified MRK algorithm was implemented in the LLVM compiler and the potential usefulness was measured. Our results show that for arithmetic benchmarks the proposed technique can improve the VLIW scheduling while emitting three-argument instructions. The contribution of this work includes: the modified MRK algorithm as a new technique for height reduction optimizations and the study of the potential usefulness of three-argument instructions. Though our results are for a non existing hardware they show the usefulness of adding such instructions to VLIW CPUs. Note that a previous research showed that MUL3 can be executed as fast as MUL2.
Similar content being viewed by others
References
Arm instruction set, arm7tdmi data sheet. Technical report(2009)
Ben Asher, Y., Stein, E.: Extending booth algorithm to multiplications of three numbers on the fpga. In: IEEE International Conference on Field-Programmable Technology, Taiwan (2008)
Cortadella, J.: Bi-decomposition and tree-height reduction for timing optimization. In: IEEE/ACM International Workshop on Logic and Synthesis (2002)
De Souza A.F., Rounce P.: Dynamically scheduling VLIW instructions. J. Parallel Distrib. Comput. 60(12), 1480–1511 (2000)
Freescale Semiconductor, Inc., SC140 DSP Core Reference Manual (2005)
Miller G.L., Ramachandran V., Kaltofen E.: Efficient parallel evaluation of straight-line code and arithmetic circuits. SIAM J. Comput. 17(4), 687–695 (1988)
Ghuloum, A.M., Fisher, A.L.: Parallelizing complex scans and reductions, pp. 135–146 (1994)
Haney, R., Meuse, T., Kepner, J., Lebak, J.: Hpec challenge overview. In: Ninth Anuual High-Performance Embedded Computing Workshop (2005)
Intel 64 and ia-32 architectures, software developers manual, instruction set reference. Technical report, Intel (2011)
Jaja J.: An Introduction to Parallel Algorithms. Addison-Wesley, Boston (1992)
Lattner, C., Adve, V. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04), Palo Alto, California (2004)
Lee, C., Lee, J.K., Hwang, T.T., Tsai, S.C.: Compiler optimization on instruction scheduling for low power. In: International Symposium on System Synthesis (ISSS), p 55, IEEE Computer Society (2000)
Mangalam, G.N., Narayan, S., van Besouw, P., Avra, L., Mathur, A., Saluja, S.: Graph transformations for improved tree height reduction. In: VLSID ’03: Proceedings of the 16th International Conference on VLSI Design (2003)
McMahon, F.H.: The livermore Fortran Kernels Test of the Numerical Performance Range. Performance Evaluation of Supercomputers. Elsevier, New York, pp. 143–186 (1988)
Muchnick S.: Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, Waltham (1997)
Nakatani, T., Ebcioglu, K.: Combining as a compilation technique for vliw architectures. In: Proceedings of MICRO-22, pp. 43–55 (1989)
Nicolau, A., Potasmann, R.: Incremental tree height reduction for high level synthesis. In: DAC ’91: Proceedings of the 28th conference on ACM/IEEE design automation, pp. 770–774 (1991)
Reif J.: Synthesis of Parallel Algorithms. Morgan Kaufmann, Waltham (1993)
Revol N., Roch J.: Parallel evaluation of arithmetic circuits. Theor. Comput. Sci. 162, 133–150 (1996)
Santos R., Azevedo R., Araujo G.: Instruction scheduling based on subgraph isomorphism for a high performance computer processor. J. Univers. Comput. Sci. 14(21), 3465–3480 (2008)
Schlansker, M., Kathail, V., Anik, S.: Height reduction of control recurrences for ilp processors. In: MICRO 27: Proceedings of the 27th Annual International Symposium on Microarchitecture (1994)
Sethi, R., Aho, A.V., Lam, M.S., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Boston, 1986 (2006)
Shobaki G., Wilken K., Heffernan M.: Optimal trace scheduling using enumeration. ACM Trans. Architect. Code Optim. (TACO) 5(4), 1–32 (2009)
Steven, F.L., Potter, R.D., Steven, G.B., Vintan, L., Romania, S.: Static data dependence collapsing in a high performance superscalar architecture. In: The 3-rd International Conference on Massively Parallel Computing Systems (MPCS ’98) (1998)
Stpiczynski, P.: Parallel algorithms for solving linear recurrence systems. Parallel Processing: CONPAR 92-VAPP V. In: Second Joint International Conference on Vector and Parallel Processing, pp. 343–348 (1992)
Texas Instruments, Inc., TMS320C67x/C67x CPU and Instruction Set Reference Guide (2006)
Zory, J., Coelho, F.: Using algebraic transformations to optimize expression evaluation in scientific code. In: PACT ’98: Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (1998)
Author information
Authors and Affiliations
Corresponding author
Additional information
This research was supported by THE ISRAEL SCIENCE FOUNDATION grant No. 585/09 and Israel Ministry of Science and Technology (MOST) grant No. 3-6496.
Rights and permissions
About this article
Cite this article
Abboud, F., Ben-Asher, Y., Shajrawi, Y. et al. Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations. Int J Parallel Prog 40, 488–513 (2012). https://doi.org/10.1007/s10766-012-0196-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-012-0196-7