Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

An approximate method for filtering out data dependencies with a sufficiently large distance between memory references

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

In this paper we present an approximate algorithm for detecting and filtering data dependencies with a sufficiently large distance between memory references. A sequence of the same operations (typically enclosed in a ‘for’ loop) can be replaced with a single SIMD operation if the distance between memory references is greater than or equal to the number of data processed in the SIMD register. Some loops that could not be vectorized on traditional vector processors, can still be parallelized for short SIMD execution. There are a number of approximate data-dependence tests that have been proposed in the literature but in all of them data dependency will be assumed when actually there is no such a dependence that could restrict parallelization related to the short SIMD execution model. By examining the properties of linear subscript expressions of possibly conflicting data references, our algorithm gives the green light to the parallelization process if some sufficient conditions regarding the dependence distance are met. Our method is based on the Banerjee test and checks the minimum and maximum distances between memory references within the iteration space rather than searching for the existence of an integer solution to the dependence equation. The proposed method extends the accuracy and applicability of the classical Banerjee test.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Aliaga JI et al (2009) Toward the parallelization of GSL. J Supercomput 48:88–114. doi:10.1007/s11227-008-0207-z

    Article  Google Scholar 

  2. Allen R, Kennedy K (2001) Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann, San Mateo

    Google Scholar 

  3. Arabnia HR, Oliver MA (1987) A transputer network for the arbitrary rotation of digitized images. Comput J 30(5):425–432

    Google Scholar 

  4. Banerjee U, Eigenman R, Nicolau A, Padua DA (1993) Automatic program parallelization. Proc IEEE 81(2):211–243

    Article  Google Scholar 

  5. Banerjee U (1997) Dependence analysis: a book series on loop transformations for restructuring compilers. Kluwer Academic, Dordrecht

    MATH  Google Scholar 

  6. Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114

    Article  Google Scholar 

  7. Bhandarkar SM, Arabnia HR, Smith JW (1995) A reconfigurable architecture for image-processing and computer vision. Int J Pattern Recognit Artif Intell 9(2):201–229

    Article  Google Scholar 

  8. Bik AJC, Girkar M, Grey PM, Tian XM (2002) Automatic intra-register vectorization for the Intel (R) architecture. Int J Parallel Program 30(2):65–98

    Article  MATH  Google Scholar 

  9. Blume W, Eigenman R, Hoeflinger J, Padua DA, Petersen P, Rauchwerger L, Tu P (1994) Automatic detection of parallelism: a great challenge for high-performance computing. IEEE Parallel Distrib Process Technol 2(3):37–47

    Article  Google Scholar 

  10. Blume W, Eigenman R (1998) Nonlinear and symbolic data dependence testing. IEEE Parallel Distrib Process Technol 9(12):1180–1194

    Article  Google Scholar 

  11. Chuang YJ, Wu JL (2005) An efficient matrix-based 2-D DCT splitter and merger for SIMD instructions. IEICE Trans Inf Syst E88D(7):1569–1577. doi:0.1093/ietisy/e88-d.7.1569

    Article  Google Scholar 

  12. Di Blas A, Jagota A, Hughey R (2005) Optimizing neural networks on SIMD parallel computers. Parallel Comput 31(1):97–115. doi:10.1016/j.parco.2004.11.002

    Article  Google Scholar 

  13. Guo M et al (2009) Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems. J Supercomput. doi:10.1007/s11227-009-0280-y

    Google Scholar 

  14. Hassaballah M et al (2008) A review of SIMD multimedia extensions and their usage in scientific and engineering applications. Comput J 51(6):630–649. doi:10.1093/comjnl/bxm099

    Article  Google Scholar 

  15. Huang TC, Yang CM (2000) Data dependence analysis for array references. J Syst Softw 52:55–65

    Article  Google Scholar 

  16. Kim S et al (2009) Region-based parallelization of irregular reductions on explicitly managed memory hierarchies. J Supercomput. doi:10.1007/s11227-009-0340-3

    Google Scholar 

  17. Krall A, Lelait S (2000) Compilation techniques for multimedia processors. Int J Parallel Program 28(4):347–361

    Article  Google Scholar 

  18. Kurzak J et al (2009) Optimizing matrix multiplication for a short-vector SIMD architecture—CELL processor. Parallel Comput 35(3):138–150. doi:10.1016/j.parco.2008.12.010

    Article  MathSciNet  Google Scholar 

  19. Lee WC et al (2008) Algorithm and software optimization of variable block size motion estimation for H.264/AVC on a VLIW-SIMD DSP. J Signal Process Syst Signal Image Video Technol 51(3):289–302. doi:10.1007/s11265-007-0151-9

    Article  Google Scholar 

  20. Lotfi S, Parsa S (2009) Parallel loop generation and scheduling. J Supercomput doi:10.1007/s11227-008-0262-5

    Google Scholar 

  21. Paver NC, Khan MH, Aldrich BC (2006) Optimizing mobile multimedia using SIMD techniques. Multimed Tools Appl 28(2):393–416

    Article  Google Scholar 

  22. Petersen PM, Padua DA (1996) Static and dynamic evaluation of data dependence analysis techniques. IEEE Trans Parallel Distrib Syst 7(11):1121–1132

    Article  Google Scholar 

  23. Pough W (1992) A practical algorithm for exact array dependence analysis. Commun ACM 35(8):102–114

    Article  Google Scholar 

  24. Pough W (1992) Definitions of dependence distance. ACM Lett Program Lang Syst 1(3):261–265

    Article  Google Scholar 

  25. Psarris K (2002) Program analysis techniques for transforming programs for parallel execution. Parallel Comput 28(3):455–469

    Article  MATH  Google Scholar 

  26. Psarris K, Klappholz D, Kong X (2004) An experimental evaluation of data dependence analysis techniques. IEEE Trans Parallel Distrib Syst 15(3):196–213

    Article  Google Scholar 

  27. Shahbahrami A et al (2008) Implementing the 2-D wavelet transform on SIMD-enhanced general-purpose processors. IEEE Trans Multimed 10(1):43–51. doi:10.1109/TMM.2007.911195

    Article  Google Scholar 

  28. Sreraman N, Govindarajan R (2000) A vectorizing compiler for multimedia extensions. Int J Parallel Program 28(4):363–400

    Article  Google Scholar 

  29. Wani MA, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable MultiRing network. J Supercomput 25(1):43–62

    Article  MATH  Google Scholar 

  30. Wolfe MJ, Banerjee U (1987) Data dependence and its application to parallel processing. Int J Parallel Program 16(2):137–178

    Article  MathSciNet  MATH  Google Scholar 

  31. Wolfe MJ, Tseng CW (1992) The power test for data dependence. IEEE Trans Parallel Distrib Syst 3(5):591–601

    Article  Google Scholar 

  32. Xue J, Guo M, Wei D (2008) Improving the parallelism of iterative methods by aggressive loop fusion. J Supercomput 43:147–164. doi:10.1007/s11227-007-0124-6

    Article  Google Scholar 

  33. Yang CT, Lai KC (2009) A directive-based MPI code generator for Linux PC clusters. J Supercomput. doi:10.1007/s11227-008-0258-1

    Google Scholar 

  34. Yang X et al (2009) Matrix-based streamization approach for improving locality and parallelism on FT64 stream processor. J Supercomput 47:171–197. doi:10.1007/s11227-008-0186-0

    Article  Google Scholar 

  35. Yuan FN et al (2006) An interactive 3D visualization system based on PC using Intel SIMD, 3D texturing and thinning techniques. Int J Pattern Recognit Artif Intell 20(3):393–416

    Article  Google Scholar 

  36. Zhou J, Zeng G (2008) A general data dependence analysis for parallelizing compilers. J Supercomput 45:236–252. doi:10.1007/s11227-007-0168-7

    Article  Google Scholar 

  37. Zhou J, Zeng G (2008) A general data dependence analysis for parallelizing compilers. ERRATUM. J Supercomput 45:253. doi:10.1007/s11227-008-0197-x

    Article  Google Scholar 

  38. Zhou JP, Shi C (2008) Efficient SIMD optimization for media processors. J Zhejiang Univ-Sci A 9(4):524–530. doi:10.1631/jzus.A071203

    Article  MATH  Google Scholar 

  39. Zima HP, Chapman BM (1990) Supercompilers for parallel and vector computers. Addison-Wesley, Reading

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Patricio Bulić.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bulić, P., Dobravec, T. An approximate method for filtering out data dependencies with a sufficiently large distance between memory references. J Supercomput 56, 226–244 (2011). https://doi.org/10.1007/s11227-009-0364-8

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-009-0364-8

Navigation