Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Worst Case <italic>O(N)</italic> Comparison-Free Hardware Sorting Engine

Published: 01 October 2022 Publication History

Abstract

This article proposes a novel comparison-free hardware sorting engine that sorts <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> unique <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula>-bit elements (irrespective of signed and unsigned) consuming linear sorting latency of <inline-formula> <tex-math notation="LaTeX">$O(N)$ </tex-math></inline-formula> clock cycles. It can even efficiently sort <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> data elements with a nonzero duplicity rate in less than <inline-formula> <tex-math notation="LaTeX">$O(N)$ </tex-math></inline-formula> clock cycles. This sorting engine is designed using <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula>-symmetric cascaded blocks utilizing few fundamental logic components. The entire design is synthesized for several data sets from pseudorandomly generated data elements to unique elements, and also from random to completely sorted elements with various duplicity rates. The architecture appears impartial with respect to ordering of elements. Synthesis results indicate that the proposed approach consumes reasonably lower field programmable gate array resources than existing approaches. The architecture takes per-element sorting latency in sorting 512 unique signed elements as 22.56 ns (48 bit) and takes 26.80 ns (64 bit) to sort 256 unique signed elements. The engine achieves sorting throughput rates as <inline-formula> <tex-math notation="LaTeX">$\approx 117$ </tex-math></inline-formula>-to-142 Million-Elements-per-second (MEps) (16 bit), 79-to-97 MEps (24 bit) for sorting 256-to-1K, whereas 66-to-73 MEps (32 bit) and 44-to-49 MEps (48 bit) for sorting 256-to-512 elements. However, it is 37 MEps (64 bit) in sorting 256 signed elements. This architecture consumes <inline-formula> <tex-math notation="LaTeX">$\approx 1.52~\mu \text{W}$ </tex-math></inline-formula> for the unique signed numbers (SNs) as per-byte processing power and <inline-formula> <tex-math notation="LaTeX">$\approx 1.55~\mu \text{W}$ </tex-math></inline-formula> for the SNs with nonzero duplicity rates.

References

[1]
S. Ghosh, S. Dasgupta, and S. S. Ray, “A comparison-free hardware sorting engine,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI), 2019, pp. 586–591.
[2]
M. H. Najafi, D. J. Lilja, M. D. Riedel, and K. Bazargan, “Low-cost sorting network circuits using unary processing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 8, pp. 1471–1480, Aug. 2018.
[3]
P. Li, D. J. Lilja, W. Qian, K. Bazargan, and M. D. Riedel, “Computation on stochastic bit streams digital image processing case studies,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 3, pp. 449–462, Mar. 2014.
[4]
K. Ratnayake and A. Amer, “An FPGA architecture of stable-sorting on a large data volume: Application to video signals,” in Proc. 41st Annu. Conf. Inf. Sci. Syst., 2007, pp. 431–436.
[5]
G. Graefe, “Implementing sorting in database systems,” ACM Comput. Surveys, vol. 38, no. 3, p. 10, Sep. 2006.
[6]
N. Govindaraju, J. Gray, R. Kumar, and D. Manocha, “GPUTeraSort: High performance graphics co-processor sorting for large database management,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2006, pp. 325–336.
[7]
Y. Tang and N. W. Bergmann, “A hardware scheduler based on task queues for FPGA-based embedded real-time systems,” IEEE Trans. Comput., vol. 64, no. 5, pp. 1254–1267, May 2015.
[8]
A. A. Colavita, A. Cicuttin, F. Fratnik, and G. Capello, “SORTCHIP: A VLSI implementation of a hardware algorithm for continuous data sorting,” IEEE J. Solid-State Circuits, vol. 38, no. 6, pp. 1076–1079, Jun. 2003.
[9]
R. C.-H. Changet al., “Implementation of a high-throughput modified merge sort in MIMO detection systems,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 9, pp. 2730–2737, Sep. 2014.
[10]
M. Mahdavi and M. Shabany, “Novel MIMO detection algorithm for high-order constellations in the complex domain,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 5, pp. 834–847, May 2013.
[11]
N. Faujdar and S. P. Ghrera, “A practical approach of GPU bubble sort with CUDA hardware,” in Proc. 7th Int. Conf. Cloud Comput. Data Sci. Eng. Confluence, 2017, pp. 7–12.
[12]
D. S. Banerjee, P. Sakurikar, and K. Kothapalli, “Fast, scalable parallel comparison sort on hybrid multicore architectures,” in Proc. IEEE Int. Symp. Parallel Distrib. Process. Workshops Phd Forum, 2013, pp. 1060–1069.
[13]
L. Kohútka and V. Stopjaková, “Rocket queue: New data sorting architecture for real-time systems,” in Proc. IEEE 20th Int. Symp. Design Diagnost. Electron. Circuits Syst. (DDECS), 2017, pp. 207–212.
[14]
T. Usui, T. V. Chu, and K. Kise, “A cost-effective and scalable merge sorter tree on FPGAs,” in Proc. 4th Int. Symp. Comput. Netw. (CANDAR), Nov. 2016, pp. 47–56.
[15]
N. Matsumoto, K. Nakano, and Y. Ito, “Optimal parallel hardware K-sorter and top K-sorter, with FPGA implementations,” in Proc. 14th Int. Symp. Parallel Distrib. Comput., Jun. 2015, pp. 138–147.
[16]
A. Farmahini-Farahani, H. J. Duwe, III, M. J. Schulte, and K. Compton, “Modular design of high-throughput, low-latency sorting units,” IEEE Trans. Comput., vol. 62, no. 7, pp. 1389–1402, Jul. 2013.
[17]
S.-H. Lin, P.-Y. Chen, and Y.-N. Lin, “Hardware design of low-power high-throughput sorting unit,” IEEE Trans. Comput., vol. 66, no. 8, pp. 1383–1395, Aug. 2017.
[18]
S. Olarlu, M. C. Pinotti, and S. Q. Zheng, “An optimal hardware-algorithm for sorting using a fixed-size parallel sorting device,” IEEE Trans. Comput., vol. 49, no. 12, pp. 1310–1324, Dec. 2000.
[19]
A. Rjabov, “Hardware-based systems for partial sorting of streaming data,” in Proc. 15th Biennial Baltic Electron. Conf. (BEC), Oct. 2016, pp. 59–62.
[20]
S. Mashimo, T. V. Chu, and K. Kise, “High-performance hardware merge sorter,” in Proc. IEEE 25th Annu. Int. Symp. Field-Programmable Custom Comput. Mach. (FCCM), Apr. 2017, pp. 1–8.
[21]
W. Song, D. Koch, M. Luján, and J. Garside, “Parallel hardware merge sorter,” in Proc. IEEE 24th Annu. Int. Symp. Field-Programmable Custom Comput. Mach. (FCCM), May 2016, pp. 95–102.
[22]
S. Abdel-Hafeez and A. Gordon-Ross, “An efficient $O(N)$ comparison-free sorting algorithm,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 6, pp. 1930–1942, Jun. 2017.
[23]
S. Abdel-Hafeez, A. Gordon-Ross, and S. Abubaker, “A comparison-free sorting algorithm on CPUs and GPUs,” J. Supercomput., vol. 74, no. 11, pp. 6369–6400, 2018.
[24]
E. A. Elsayed and K. Kise, “Towards an efficient hardware architecture for odd-even based merge sorter,” in Proc. IEEE 13th Int. Symp. Embedded Multicore/Many-Core Syst.-on-Chip (MCSoC), 2019, pp. 249–256.
[25]
A. Norollah, D. Derafshi, H. Beitollahi, and M. Fazeli, “RTHS: A low-cost high-performance real-time hardware sorter, using a multidimensional sorting algorithm,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 7, pp. 1601–1613, Jul. 2019.
[26]
J. Casper and K. Olukotun, “Hardware acceleration of database operations,” in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, 2014, pp. 151–160.
[27]
V. A. Pedroni, R. P. Jasinski, and R. U. Pedroni, “Panning sorter: A minimal-size architecture for hardware implementation of 2D data sorting coprocessors,” in Proc. IEEE Asia Pac. Conf. Circuits Syst., Dec. 2010, pp. 923–926.
[28]
V. A. Pedroni, R. P. Jasinski, and R. U. Pedroni, “Panning sorter: An approach to the design of minimal-hardware parallel-input data sorters,” Electron. Lett., vol. 46, no. 18, pp. 1262–1263, 2010.
[29]
W. Chen, W. Li, and F. Yu, “A hybrid pipelined architecture for high performance top-K sorting on FPGA,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 8, pp. 1449–1453, Aug. 2020.
[30]
D. Yan, W.-X. Wang, L. Zuo, and X.-W. Zhang, “A novel scheme for real-time max/min-set-selection sorters on FPGA,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 7, pp. 2665–2669, Jul. 2021.

Cited By

View all
  • (2023)k-Degree Parallel Comparison-Free Hardware Sorter for Complete SortingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2022.320768942:5(1438-1449)Online publication date: 1-May-2023
  • (2023)${O(N)}$O(N) Memory-Free Hardware Architecture for Burrows-Wheeler TransformIEEE Transactions on Computers10.1109/TC.2022.322629572:7(2080-2093)Online publication date: 1-Jul-2023

Index Terms

  1. Worst Case <italic>O(N)</italic> Comparison-Free Hardware Sorting Engine
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
      IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  Volume 41, Issue 10
      Oct. 2022
      401 pages

      Publisher

      IEEE Press

      Publication History

      Published: 01 October 2022

      Qualifiers

      • Research-article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)0
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 28 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2023)k-Degree Parallel Comparison-Free Hardware Sorter for Complete SortingIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2022.320768942:5(1438-1449)Online publication date: 1-May-2023
      • (2023)${O(N)}$O(N) Memory-Free Hardware Architecture for Burrows-Wheeler TransformIEEE Transactions on Computers10.1109/TC.2022.322629572:7(2080-2093)Online publication date: 1-Jul-2023

      View Options

      View options

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media