research-article
Open access

Cooperative Software-hardware Acceleration of K-means on a Tightly Coupled CPU-FPGA System

Published: 17 August 2020

Abstract

We consider software-hardware acceleration of K-means clustering on the Intel Xeon+FPGA platform. We design a pipelined accelerator for K-means and combine it with CPU threads to assess the performance benefits of (1) FPGA acceleration when data are accessed only from system memory and (2) cooperative CPU-FPGA acceleration. Our evaluation shows that the accelerator is up to 12.7× faster than a single CPU thread for the assignment step of K-means and up to 2.4× faster for the update step. The cooperative use of CPU threads and the FPGA is roughly 1.9× faster than either the CPU threads alone or the FPGA by itself. Our approach delivers 4×–5× higher throughput than existing offload processing approaches.
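
For readers unfamiliar with the two steps named in the abstract, the following is a minimal software sketch of one standard K-means (Lloyd) iteration. It is not the paper's accelerator or its CPU code. The function names (`assign`, `update`, `kmeans`), the squared Euclidean distance, the convergence test, and the handling of empty clusters are all assumptions chosen for illustration.

```python
# Illustrative sketch of the K-means assignment and update steps.
# All names and conventions here are assumptions, not taken from the paper.

def assign(points, centroids):
    """Assignment step: label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        # Squared Euclidean distance from p to every centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, k in zip(points, labels):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    # An empty cluster keeps its previous centroid (one common convention).
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(points, centroids, max_iters=100):
    """Alternate the two steps until the labels stop changing."""
    labels = None
    for _ in range(max_iters):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
```

In this sketch the assignment step does work proportional to the number of points times the number of centroids times the dimensionality, while the update step makes a single pass over the points.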



Published In

ACM Transactions on Architecture and Code Optimization  Volume 17, Issue 3
September 2020
200 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3415154
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 August 2020
Accepted: 01 June 2020
Revised: 01 March 2020
Received: 01 February 2020
Published in TACO Volume 17, Issue 3


Author Tags

  1. FPGA-based acceleration
  2. Heterogeneous acceleration
  3. K-Means clustering
  4. Performance evaluation
  5. shared-memory CPU-FPGA systems

Qualifiers

  • Research-article
  • Research
  • Refereed
