

A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPU

Published: 11 November 2019

Abstract

Tensors have drawn growing attention in many applications, such as physics, engineering science, social networks, and recommender systems. Tensor decomposition is the key to exploring the intrinsic data relationships of a tensor, and many sparse tensor-vector multiplications (SpTV) arise in the process. We analyze a variety of storage formats for sparse tensors and develop a piecewise compression strategy to improve the storage efficiency of large sparse tensors. This strategy avoids storing the large number of empty slices and empty fibers in a sparse tensor, so the storage space is significantly reduced. We design a parallel SpTV algorithm based on a slice-based high-order compressed format to greatly improve its computing performance on the graphics processing unit. Each tensor is cut into multiple slices to form a series of sparse matrix-vector multiplications, which constitute the pipelined parallelism; the transmission time of the slices can be hidden through this pipelining to further optimize the performance of the SpTV.
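The slice-wise computation described above can be sketched in a few lines: a third-order sparse tensor X of shape I×J×K in COO form is cut into horizontal slices X[i, :, :], and each nonempty slice contributes one sparse matrix-vector product Y[i, :] = X[i, :, :]·v, while empty slices are skipped entirely (mirroring the piecewise-compression idea of not storing them). This is a minimal illustrative sketch using `scipy.sparse`, not the paper's GPU implementation; the function name `sptv_by_slices` is an assumption for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sptv_by_slices(coords, vals, shape, v):
    """Compute Y[i, j] = sum_k X[i, j, k] * v[k] for a sparse
    third-order tensor X given in COO form (coords: n x 3 indices,
    vals: n nonzero values). The tensor is cut into horizontal
    slices X[i, :, :]; each nonempty slice becomes one sparse
    matrix-vector product (SpMV), and empty slices are skipped."""
    I, J, K = shape
    Y = np.zeros((I, J))
    i_idx = coords[:, 0]
    for i in np.unique(i_idx):      # visit only slices holding nonzeros
        sel = i_idx == i
        slice_i = csr_matrix(
            (vals[sel], (coords[sel, 1], coords[sel, 2])), shape=(J, K))
        Y[i, :] = slice_i @ v       # one SpMV per nonempty slice
    return Y
```

On a GPU, each per-slice SpMV becomes an independent kernel launch, so the host-to-device transfer of slice i+1 can overlap the computation of slice i (e.g., via CUDA streams), which is the pipelining the abstract refers to.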





Published In

ACM Transactions on Knowledge Discovery from Data  Volume 13, Issue 6
December 2019
282 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3366748

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2019
Accepted: 01 September 2019
Revised: 01 July 2019
Received: 01 May 2018
Published in TKDD Volume 13, Issue 6


Author Tags

  1. Pipeline
  2. slices
  3. tensor
  4. tensor and vector multiplication

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Key Program of National Natural Science Foundation of China
  • National Key R&D Program of China
  • Natural Science Foundation of Hunan Province, China
  • National Natural Science Foundation of China
  • National Outstanding Youth Science Program of National Natural Science Foundation of China


Cited By

  • (2024) Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 169--181. DOI: 10.1145/3626183.3659985. Online publication date: 17-Jun-2024.
  • (2024) Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction. IEEE Transactions on Parallel and Distributed Systems 35, 6, 1044--1055. DOI: 10.1109/TPDS.2024.3391254. Online publication date: 19-Apr-2024.
  • (2023) A Survey of Accelerating Parallel Sparse Linear Algebra. ACM Computing Surveys 56, 1, 1--38. DOI: 10.1145/3604606. Online publication date: 28-Aug-2023.
  • (2023) A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN. ACM Transactions on Parallel Computing 10, 2, 1--23. DOI: 10.1145/3584373. Online publication date: 20-Jun-2023.
  • (2023) IAP-SpTV: An input-aware adaptive pipeline SpTV via GCN on CPU-GPU. Journal of Parallel and Distributed Computing 181, 104741. DOI: 10.1016/j.jpdc.2023.104741. Online publication date: Nov-2023.
  • (2023) Reinforcement learning-based robust optimal tracking control for disturbed nonlinear systems. Neural Computing and Applications 35, 33, 23987--23996. DOI: 10.1007/s00521-023-08993-0. Online publication date: 12-Sep-2023.
  • (2022) An Experimental Study on the Lateral Stress of Composite Steel Wall Structure by Using Self-Compacting Concrete. Advances in Civil Engineering 2022, 1. DOI: 10.1155/2022/7772556. Online publication date: 29-Mar-2022.
  • (2022) GSpTC: High-Performance Sparse Tensor Contraction on CPU-GPU Heterogeneous Systems. 2022 IEEE 24th International Conference on High Performance Computing and Communications (HPCC/DSS/SmartCity/DependSys), 380--387. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00080. Online publication date: Dec-2022.
  • (2022) Implementation of Modified Multi-Objective Particle Swarm Optimization to multi-machine power system stability. Journal of Cleaner Production 365, 132664. DOI: 10.1016/j.jclepro.2022.132664. Online publication date: Sep-2022.
  • (2022) Exploiting multi-level parallel metaheuristics and heterogeneous computing to boost phylogenetics. Future Generation Computer Systems 127, C, 208--224. DOI: 10.1016/j.future.2021.09.011. Online publication date: 1-Feb-2022.
