Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2502323.2502325acmconferencesArticle/Chapter ViewAbstractPublication PagesicfpConference Proceedingsconference-collections
research-article

Counting and occurrence sort for GPUs using an embedded language

Published: 23 September 2013 Publication History

Abstract

This paper investigates two sorting algorithms: counting sort and a variation, occurrence sort, which also removes duplicate elements, and examines their suitability for running on the GPU. The duplicate removing variation turns out to have a natural functional, data-parallel implementation which makes it particularly interesting for GPUs.
The algorithms are implemented in Obsidian, a high-level domain specific language for GPU programming.
Measurements show that our implementations in many cases outperform the sorting algorithm provided by the library Thrust. Furthermore, occurrence sort is another factor of two faster than ordinary counting sort. We conclude that counting sort is an important contender when considering sorting algorithms for the GPU, and that occurrence sort is highly preferable when applicable. We also show that Obsidian can produce very competitive code.

References

[1]
Wikipedia article: Counting sort. http://en.wikipedia.org/wiki/Counting_sort.
[2]
M. M. Chakravarty, G. Keller, S. Lee, T. L. McDonell, and V. Grover. Accelerating Haskell array codes with multicore GPUs. In Proc. of the sixth workshop on Declarative aspects of multicore programming, DAMP '11. ACM, 2011.
[3]
K. Claessen, M. Sheeran, and B. J. Svensson. Expressive Array Constructs in an Embedded GPU Kernel Programming Language. In Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming, DAMP '12, 2012.
[4]
C. Elliott, S. Finne, and O. de Moor. Compiling embedded languages. Journal of Functional Programming, 13(2), 2003. URL http://conal.net/papers/jfp-saig/.
[5]
M. Harris, S. Sengupta, and J. D. Owens. Parallel Prefix Sum (Scan) with CUDA. In H. Nguyen, editor, GPU Gems 3. Addison Wesley, 2007.
[6]
Joel Svensson. Obsidian: GPU Kernel Programming in Haskell. Technical Report 77L, Computer Science and Enginering, Chalmers University of Technology, Gothenburg, 2011. Thesis for the degree of Licentiate of Philosophy.
[7]
T. Karras. Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees. In Proc. of the Fourth ACM SIGGRAPH / Eurographics conference on High-Performance Graphics, EGGH-HPG'12. Eurographics Association, 2012.
[8]
D. E. Knuth. The art of computer programming, volume 3: (2nd ed.) sorting and searching. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 1998. ISBN 0-201-89685-0.
[9]
V. Kolonias, A. G. Voyiatzis, G. Goulas, and E. Housos. Design and implementation of an efficient integer count sort in CUDA GPUs. Concurrency And Computing: Practice And Experiance, 23(18), Dec. 2011.
[10]
J. Krueger, M. Grund, I. Jaeckel, A. Zeier, and H. Plattner. Applicability of GPU Computing for Efficient Merge in In-Memory Databases. In The Second International Workshop on Accelerating Data Management Systems using Modern Processor and Storage Architectures (ADMS 11), 2011. http://www.adms-conf.org/p19-KRUEGER.pdf.
[11]
A. Kulkarni and R. R. Newton. Embrace, Defend, Extend: A Methodology for Embedding Preexisting DSLs, 2013. Functional Programming Concepts in Domain-Specific Languages (FPCDSL'13).
[12]
G. Mainland and G. Morrisett. Nikola: embedding compiled GPU functions in Haskell. SIGPLAN Not., 45(11), 2010.
[13]
NVIDIA. CUDA C Programming Guide, . URL http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
[14]
NVIDIA. NVIDIAs Next Generation CUDA Compute Architecture: Kepler GK110, . URL http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf.
[15]
NVIDIA. NVIDIA Thrust Library, . URL https://developer.nvidia.com/thrust.
[16]
O. Olsson, M. Billeter, and U. Assarsson. Clustered deferred and forward shading. In Proc. of the Fourth ACM SIGGRAPH / Eurographics conference on High-Performance Graphics, EGGH-HPG'12. Eurographics Association, 2012.
[17]
E. Sintorn and U. Assarsson. Fast parallel GPU-sorting using a hybrid algorithm. Journal of Parallel and Distributed Computing, 68(10), 2008.
[18]
J. Sklansky. Conditional Sum Addition Logic. Trans. IRE, EC-9(2), June 1960.

Cited By

View all
  • (2017)A roadmap of parallel sorting algorithms using GPU computing2017 International Conference on Computing, Communication and Automation (ICCCA)10.1109/CCAA.2017.8229919(736-741)Online publication date: May-2017

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
FHPC '13: Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
September 2013
104 pages
ISBN:9781450323819
DOI:10.1145/2502323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 September 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. embedded language
  2. sorting

Qualifiers

  • Research-article

Conference

ICFP'13
Sponsor:

Acceptance Rates

FHPC '13 Paper Acceptance Rate 8 of 14 submissions, 57%;
Overall Acceptance Rate 18 of 25 submissions, 72%

Upcoming Conference

ICFP '25
ACM SIGPLAN International Conference on Functional Programming
October 12 - 18, 2025
Singapore , Singapore

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2017)A roadmap of parallel sorting algorithms using GPU computing2017 International Conference on Computing, Communication and Automation (ICCCA)10.1109/CCAA.2017.8229919(736-741)Online publication date: May-2017

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media