DOI: 10.5555/2048577.2048595

Implementing random indexing on GPU

Published: 03 April 2011

Abstract

Vector space models have received significant attention in recent years. They have been applied in a wide range of areas, including information filtering, information retrieval, document indexing and relevancy ranking. Random indexing is one of the methods that use distributional statistics of term co-occurrences to generate vector space models from a set of documents. If the document collection is large, significant computational power is required to compute the results.
This paper presents an efficient implementation of the random indexing method on GPU that allows training on large datasets. It is limited only by the amount of memory available on the GPU, and various ways to overcome this dependence on GPU memory are discussed. Speedups on the order of tens are achieved for training from random seed vectors, and considerably higher speedups for retraining. The implementation scales well with both the term vector dimension and the seed length.
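
To make the two parameters named in the abstract concrete, here is a minimal CPU-only sketch of random indexing in plain Python. It is not the paper's GPU implementation. The function names, the sliding-window context and the alternating +1/-1 seed layout are assumptions chosen for illustration. Each term gets a sparse random seed vector with `seed_length` nonzero entries in a space of size `dimension`, and a term's vector is the sum of the seed vectors of its neighbours.

```python
import math
import random


def make_seed_vector(dimension, seed_length, rng):
    """Sparse ternary seed vector stored as {position: +1 or -1} (illustrative layout)."""
    positions = rng.sample(range(dimension), seed_length)
    # Alternate signs so an even seed_length gives equally many +1 and -1 entries.
    return {pos: (1 if i % 2 == 0 else -1) for i, pos in enumerate(positions)}


def train_random_indexing(documents, dimension=64, seed_length=4, window=2, rng_seed=0):
    """Accumulate term vectors from co-occurrences inside a sliding window."""
    rng = random.Random(rng_seed)
    seeds = {}
    term_vectors = {}
    for doc in documents:
        tokens = doc.lower().split()
        # Assign a seed vector and a zeroed term vector to every new term.
        for tok in tokens:
            if tok not in seeds:
                seeds[tok] = make_seed_vector(dimension, seed_length, rng)
                term_vectors[tok] = [0.0] * dimension
        # Add the seed vector of each neighbour to the focus term's vector.
        for i, tok in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in seeds[tokens[j]].items():
                    term_vectors[tok][pos] += sign
    return term_vectors


def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    vectors = train_random_indexing(
        ["the cat sat", "the dog sat"], dimension=32, seed_length=4, window=1
    )
    # "cat" and "dog" occur in identical contexts, so their vectors coincide.
    print(vectors["cat"] == vectors["dog"])
```

In this sketch each co-occurrence touches only `seed_length` positions of a `dimension`-sized vector, so the update work grows with the seed length while the storage grows with the dimension and the vocabulary size. That is one way to read the abstract's remarks on memory limits and scaling, though the paper's own data layout may differ.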


Published In

HPC '11: Proceedings of the 19th High Performance Computing Symposia
April 2011
193 pages

Sponsors

  • SCS: Society for Modeling and Simulation International

Publisher

Society for Computer Simulation International

San Diego, CA, United States

Author Tags

  1. GPGPU
  2. term co-occurrence
  3. word space models

Qualifiers

  • Research-article

Conference

SpringSim '11
Sponsor:
  • SCS
SpringSim '11: 2011 Spring Simulation Multi-conference
April 3 - 7, 2011
Boston, Massachusetts
