

A Pipeline Computing Method of SpTV for Three-Order Tensors on CPU and GPU

Published: 11 November 2019

Abstract

Tensors have drawn growing attention in many applications, such as physics, engineering science, social networks, and recommender systems. Tensor decomposition is the key to exploring the intrinsic data relationships of a tensor, and many sparse tensor-vector multiplications (SpTV) arise in the process. We analyze a variety of storage formats for sparse tensors and develop a piecewise compression strategy to improve the storage efficiency of large sparse tensors. This strategy avoids storing the large number of empty slices and empty fibers in a sparse tensor, so the storage space is significantly reduced. We design a parallel SpTV algorithm based on a slice-based high-order compressed format to greatly improve its computing performance on the graphics processing unit. Each tensor is cut into multiple slices to form a series of sparse matrix-vector multiplications, which constitute the pipelined parallelism; the transmission time of the slices can be hidden through this pipelining to further optimize the performance of the SpTV.
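The slice-wise computation described above can be sketched in a few lines: a third-order sparse tensor X of shape I×J×K in COO form is cut into horizontal slices X[i, :, :], and each nonempty slice contributes one sparse matrix-vector product Y[i, :] = X[i, :, :]·v, while empty slices are skipped entirely (mirroring the piecewise-compression idea of not storing them). This is a minimal illustrative sketch using `scipy.sparse`, not the paper's GPU implementation; the function name `sptv_by_slices` is an assumption for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sptv_by_slices(coords, vals, shape, v):
    """Compute Y[i, j] = sum_k X[i, j, k] * v[k] for a sparse
    third-order tensor X given in COO form (coords: n x 3 indices,
    vals: n nonzero values). The tensor is cut into horizontal
    slices X[i, :, :]; each nonempty slice becomes one sparse
    matrix-vector product (SpMV), and empty slices are skipped."""
    I, J, K = shape
    Y = np.zeros((I, J))
    i_idx = coords[:, 0]
    for i in np.unique(i_idx):      # visit only slices holding nonzeros
        sel = i_idx == i
        slice_i = csr_matrix(
            (vals[sel], (coords[sel, 1], coords[sel, 2])), shape=(J, K))
        Y[i, :] = slice_i @ v       # one SpMV per nonempty slice
    return Y
```

On a GPU, each per-slice SpMV becomes an independent kernel launch, so the host-to-device transfer of slice i+1 can overlap the computation of slice i (e.g., via CUDA streams), which is the pipelining the abstract refers to.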





Published In

ACM Transactions on Knowledge Discovery from Data  Volume 13, Issue 6
December 2019
282 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3366748

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2019
Accepted: 01 September 2019
Revised: 01 July 2019
Received: 01 May 2018
Published in TKDD Volume 13, Issue 6


Author Tags

  1. Pipeline
  2. slices
  3. tensor
  4. tensor and vector multiplication

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Key Program of National Natural Science Foundation of China
  • National Key R&D Program of China
  • Natural Science Foundation of Hunan Province, China
  • National Natural Science Foundation of China
  • National Outstanding Youth Science Program of National Natural Science Foundation of China


Cited By

  • (2024) Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 169--181. DOI: 10.1145/3626183.3659985. Online publication date: 17-Jun-2024.
  • (2024) Efficient Utilization of Multi-Threading Parallelism on Heterogeneous Systems for Sparse Tensor Contraction. IEEE Transactions on Parallel and Distributed Systems 35, 6, 1044--1055. DOI: 10.1109/TPDS.2024.3391254. Online publication date: 19-Apr-2024.
  • (2023) A Survey of Accelerating Parallel Sparse Linear Algebra. ACM Computing Surveys 56, 1, 1--38. DOI: 10.1145/3604606. Online publication date: 28-Aug-2023.
  • (2023) A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN. ACM Transactions on Parallel Computing 10, 2, 1--23. DOI: 10.1145/3584373. Online publication date: 20-Jun-2023.
  • (2023) IAP-SpTV: An input-aware adaptive pipeline SpTV via GCN on CPU-GPU. Journal of Parallel and Distributed Computing 181, 104741. DOI: 10.1016/j.jpdc.2023.104741. Online publication date: Nov-2023.
  • (2023) Reinforcement learning-based robust optimal tracking control for disturbed nonlinear systems. Neural Computing and Applications 35, 33, 23987--23996. DOI: 10.1007/s00521-023-08993-0. Online publication date: 12-Sep-2023.
  • (2022) An Experimental Study on the Lateral Stress of Composite Steel Wall Structure by Using Self-Compacting Concrete. Advances in Civil Engineering 2022, 1. DOI: 10.1155/2022/7772556. Online publication date: 29-Mar-2022.
  • (2022) GSpTC: High-Performance Sparse Tensor Contraction on CPU-GPU Heterogeneous Systems. 2022 IEEE 24th International Conference on High Performance Computing and Communications (HPCC/DSS/SmartCity/DependSys), 380--387. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00080. Online publication date: Dec-2022.
  • (2022) Implementation of Modified Multi-Objective Particle Swarm Optimization to multi-machine power system stability. Journal of Cleaner Production 365, 132664. DOI: 10.1016/j.jclepro.2022.132664. Online publication date: Sep-2022.
  • (2022) Exploiting multi-level parallel metaheuristics and heterogeneous computing to boost phylogenetics. Future Generation Computer Systems 127, C, 208--224. DOI: 10.1016/j.future.2021.09.011. Online publication date: 1-Feb-2022.
