Abstract
The K-Means algorithm is one of the most popular and effective clustering algorithms for many practical applications. However, direct K-Means methods, which take individual objects as the processing unit, are computationally expensive, especially in the object-assignment phase, on Single-Instruction Single-Data (SISD) processors such as CPUs. In this paper, we propose a vectorized K-Means algorithm for the Intel Many Integrated Core (MIC) coprocessor, a newly released Intel product for highly parallel workloads. The new algorithm achieves fine-grained Single-Instruction Multiple-Data (SIMD) parallelism by treating each dimension of all objects as one long vector. This vectorized algorithm is suitable for objects of any dimensionality, a case rarely considered in preceding work. We also parallelize the vectorized K-Means algorithm on the MIC coprocessor to achieve coarse-grained thread-level parallelism. Finally, we implement and evaluate the vectorized method on the first generation of Intel MIC products. Measurements show that the algorithm on the MIC coprocessor achieves the desired speedup over the sequential algorithm on a CPU, and demonstrate that the MIC coprocessor offers highly parallel computational power as well as scalability.
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, F., Wu, Q., Tan, Y., Wei, L., Shao, L., Gao, L. (2013). A Vectorized K-Means Algorithm for Intel Many Integrated Core Architecture. In: Wu, C., Cohen, A. (eds) Advanced Parallel Processing Technologies. APPT 2013. Lecture Notes in Computer Science, vol 8299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45293-2_21
DOI: https://doi.org/10.1007/978-3-642-45293-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45292-5
Online ISBN: 978-3-642-45293-2
eBook Packages: Computer Science, Computer Science (R0)