Abstract
In several digital signal processing algorithms, computational nodes are organized in consecutive stages and data is reordered between these stages. Parallel computation of such algorithms with reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, a systematic design methodology for stride permutation networks is derived. These permutations are represented with Boolean matrices, which are decomposed and mapped directly onto register-based networks. The resulting networks are regular and scalable and they support any stride of power-of-two. In addition, the networks reach the lower bound in the number of registers indicating area-efficiency. Since the proposed methodology is systematic, it can be exploited in automated design generation.
Similar content being viewed by others
References
J. H. Takala, T. S. Järvinen, and H. T. Sorokin, “Conflict-free Parallel Memory Access Scheme for FFT Processors,” Proc. IEEE ISCAS, Bangkok, Thailand, May 25–28, 2003.
J. A. Hidalgo, J. Lopez, F. Argüello, and E. L. Zapata, “Area-efficient Architecture for Fast Fourier Transform,” IEEE Trans. Circuits Syst. II, vol. 46, no. 2, 1999, pp. 187–193 (Feb).
S. Y. Kim, H. Kim, and I. C. Park, “Path Metric Memory Management for Minimising Interconnections in Viterbi Decoders,” IEEE Elect. Lett., vol. 37, no. 14, 2001, pp. 925–926 (July).
H. S. Stone, “Parallel Processing with the Perfect Shuffle,” IEEE Trans. Comput., vol. C-20, no. 2, 1971, pp. 153–161 (Feb.).
J. Granata, M. Conner, and R. Tolimieri, “Recursive Fast Algorithms and the Role of the Tensor Product,” IEEE Trans. Signal Process., vol. 40, no. 12, 1992, pp. 2921–2930 (Dec.).
J. Astola and D. Akopian, “Architecture-oriented Regular Algorithms for Discrete Sine and Cosine Transforms,” IEEE Trans. Signal Process., vol. 47, no. 4, 1999, pp. 1109–1124 (Apr.).
J. Kwak and J. You, “One- and Two-dimensional Constant Geometry Fast Cosine Transform Algorithms and Architectures,” IEEE Trans. Signal Process., vol. 47, no. 7, 1999, pp. 2023–2034 (July).
J. Takala, T. Järvinen, P. Salmela, and D. Akopian, “Multi-port Interconnection Networks for Radix-r Algorithms,” in Proc. IEEE ICASSP, Salt Lake City, UT, U.S.A., May 7–11 2001, pp. 1177–1180.
J. Takala and T. Järvinen, “Multi-port Interconnection Networks for Matrix Transpose,” in Proc. IEEE ICASSP, Phoenix, AZ, U.S.A., May 26–29 2002, pp. 874–877.
T. Järvinen and J. Takala, “Register-based Permutation Networks for Stride Permutations,” in Computer Systems: Architectures, Modeling, and Simulation (LNCS 3133), A. D. Pimentel and S. Vassiliadis (Eds.), Springer, Berlin Heidelberg New York, 2004, pp. 108–117.
T. Järvinen, P. Salmela, H. Sorokin, and J. Takala, “Stride Permutation Networks for Array Processors,” in Proc. IEEE 15th International Conference on Application-Specific Systems, Architectures, and Processors, Galveston, TX, U.S.A., Sept. 27–29 2004.
T. Järvinen, “Systematic Methods for Designing Stride Permutation Interconnections,” Dr. Tech. dissertation, Tampere University of Technology, Tampere, Finland, Nov. 2004. Available online at http://webhotel.tut.fi/library/tutdiss/pdf/jarvinen.pdf.
E. Bidet, D. Castelain, C. Joanblang, and P. Senn, “A Fast Single-chip Implementation of 8192 Complex Point FFT,” IEEE J. Solid-State Circuits, vol. 30, no. 3, 1995, pp. 300–305 (Mar.).
M. Bóo, F. Argüello, J. Bruguera, R. Doallo, and E. Zapata, “High-performance VLSI Architecture for the Viterbi Algorithm,” IEEE Trans. Commun., vol. 45, no. 2, 1997, pp. 168–176 (Feb.).
C. B. Shung, H.-D. Lin, R. Cypher, P. H. Siegel, and H. K. Thapar, “Area-efficient Architectures for the Viterbi Algorithm Part I: Theory,” IEEE Trans. Commun., vol. 41, no. 4, 1993, pp. 636–644 (Apr.).
K. K. Parhi, “Systematic Synthesis of DSP Data Format Converters Using Life-time Analysis and Forward–backward Register Allocation,” IEEE Trans. Circuits Syst. II, vol. 39, no. 7, 1992, pp. 423–440 (Jul.).
M. Majumdar and K. K. Parhi, “Design of Data Format Converters Using Two-dimensional Register Allocation,” IEEE Trans. Circuits Syst. II, vol. 45, no. 4, 1998, pp. 504–508 (Apr.).
J. Bae and V. K. Prasanna, “Synthesis of Area-efficient and High-throughput Rate Data Format Converters,” IEEE Trans. VLSI Syst., vol. 6, no. 4, 1998, pp. 697–706 (Dec.).
K. Srivatsan, C. Chakrabarti, and L. Lucke, “A New Register Allocation Scheme for Low-power Data Format Converters,” IEEE Trans. Circuits Syst. II, vol. 46, no. 9, 1999, pp. 1250–1253 (Sept.).
M. Davio, “Kronecker Products and Shuffle Algebra,” IEEE Trans. Comput., vol. 30, no. 2, 1981, pp. 116–125 (Feb.).
J. C. Carlach, P. Penard, and J. L. Sicre, “TCAD: a 27 MHz 8 × 8 Discrete Cosine Transform Chip,” in Proc. IEEE ICASSP, Glasgow, UK, May 23–26 1989, pp. 2429–2432.
M. Kovac and P. Ranganathan, “JAGUAR: A High Speed VLSI Chip for JPEG Image Compression Standard,” in Proc. IEEE Int. Conf. on VLSI Design, New Delhi, India, Jan. 4–7 1995, pp. 220–224.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Järvinen, T., Salmela, P., Sorokin, H. et al. Stride Permutation Networks for Array Processors. J VLSI Sign Process Syst Sign Im 49, 51–71 (2007). https://doi.org/10.1007/s11265-006-0031-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11265-006-0031-8