Worst Case <italic>O(N)</italic> Comparison-Free Hardware Sorting Engine

Authors: Sanchita Saha Ray, Dulal Adak, Surajeet Ghosh

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 41, Issue 10

Pages 3332–3345

https://doi.org/10.1109/TCAD.2021.3131554

Published: 01 October 2022

Abstract

This article proposes a novel comparison-free hardware sorting engine that sorts <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> unique <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula>-bit elements, signed or unsigned, with a linear sorting latency of <inline-formula> <tex-math notation="LaTeX">$O(N)$ </tex-math></inline-formula> clock cycles. When the data contain duplicates (a nonzero duplicity rate), it sorts <inline-formula> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> elements in less than <inline-formula> <tex-math notation="LaTeX">$O(N)$ </tex-math></inline-formula> clock cycles. The sorting engine is built from <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula>-symmetric cascaded blocks that use only a few fundamental logic components. The design is synthesized for several data sets, ranging from pseudorandomly generated to unique elements and from random to completely sorted elements, with various duplicity rates. The architecture appears insensitive to the initial ordering of the elements. Synthesis results indicate that the proposed approach consumes reasonably fewer field-programmable gate array (FPGA) resources than existing approaches. The per-element sorting latency is 22.56 ns when sorting 512 unique signed 48-bit elements and 26.80 ns when sorting 256 unique signed 64-bit elements. The engine achieves sorting throughputs of <inline-formula> <tex-math notation="LaTeX">$\approx 117$ </tex-math></inline-formula> to 142 million elements per second (MEps) for 16-bit elements and 79 to 97 MEps for 24-bit elements when sorting 256 to 1K elements, and 66 to 73 MEps for 32-bit elements and 44 to 49 MEps for 48-bit elements when sorting 256 to 512 elements. For 64-bit elements, the throughput is 37 MEps when sorting 256 signed elements.
The per-byte processing power is <inline-formula> <tex-math notation="LaTeX">$\approx 1.52~\mu \text{W}$ </tex-math></inline-formula> for unique signed numbers (SNs) and <inline-formula> <tex-math notation="LaTeX">$\approx 1.55~\mu \text{W}$ </tex-math></inline-formula> for SNs with nonzero duplicity rates.
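
The latency and throughput figures quoted above can be cross-checked with a short calculation. The sketch below assumes that throughput is simply the reciprocal of the per-element latency, which the abstract does not state explicitly. The helper names `throughput_meps` and `total_sort_time_us` are illustrative and do not come from the article.

```python
def throughput_meps(latency_ns_per_element: float) -> float:
    """Convert a per-element latency in ns to millions of elements per second.

    Assumption: throughput = 1 / per-element latency.
    """
    elements_per_second = 1e9 / latency_ns_per_element
    return elements_per_second / 1e6


def total_sort_time_us(num_elements: int, latency_ns_per_element: float) -> float:
    """Total time to sort num_elements, in microseconds, under the same assumption."""
    return num_elements * latency_ns_per_element / 1e3


# Figures taken from the abstract: (label, element count, per-element latency in ns).
cases = [
    ("48-bit, 512 unique signed elements", 512, 22.56),
    ("64-bit, 256 unique signed elements", 256, 26.80),
]

for label, count, latency_ns in cases:
    print(
        f"{label}: {throughput_meps(latency_ns):.1f} MEps, "
        f"{total_sort_time_us(count, latency_ns):.2f} us in total"
    )
```

Under this assumption, the two latencies correspond to about 44.3 MEps and 37.3 MEps, in line with the 44-to-49 MEps (48 bit) range and the 37 MEps (64 bit) figure reported in the abstract.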
