Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Public Access

Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis

Published: 26 January 2017 Publication History

Abstract

The four-index integral transform is a fundamental and computationally demanding calculation used in many computational chemistry suites such as NWChem. It transforms a four-dimensional tensor from one basis to another. This transformation is most efficiently implemented as a sequence of four tensor contractions that each contract a four- dimensional tensor with a two-dimensional transformation matrix. Differing degrees of permutation symmetry in the intermediate and final tensors in the sequence of contractions cause intermediate tensors to be much larger than the final tensor and limit the number of electronic states in the modeled systems.
Loop fusion, in conjunction with tiling, can be very effective in reducing the total space requirement, as well as data movement. However, the large number of possible choices for loop fusion and tiling, and data/computation distribution across a parallel system, make it challenging to develop an optimized parallel implementation for the four-index integral transform. We develop a novel approach to address this problem, using lower bounds modeling of data movement complexity. We establish relationships between available aggregate physical memory in a parallel computer system and ineffective fusion configurations, enabling their pruning and consequent identification of effective choices and a characterization of optimality criteria. This work has resulted in the development of a significantly improved implementation of the four-index transform that enables higher performance and the ability to model larger electronic systems than the current implementation in the NWChem quantum chemistry software suite.

References

[1]
ACES II, a program product of the quantum theory project. See http://www.qtp.ufl.edu/aces/, 1996.
[2]
The massively parallel quantum chemistry program (MPQC). http://www.mpqc.org/index.php, 2004.
[3]
MOLPRO, a package of ab initio programs. See http://www.molpro. net, 2006.
[4]
Nwchem: A comprehensive and scalable open-source solution for large scale molecular simulations. See http://www.nwchem-sw. org/index.php, 2010.
[5]
Psi4, an open-source ab initio electronic structure program. See http://www.psicode.org/, 2012.
[6]
M. Abe, T. Yanai, T. Nakajima, and K. Hirao. A four-index transformation in dirac's four-component relativistic theory. Chem. Phys. Letters, 388 (1-3): 68--73, 2004.
[7]
G. Bilardi and E. Peserico. A characterization of temporal locality and its portability across memory hierarchies. Automata, Languages and Programming, pages 128--139, 2001.
[8]
L. A. Covick and K. M. Sando. Four-index transformation on distributed-memory parallel computers. J. Comp. Chem., 11 (10): 1151--1159, 1990.
[9]
J. Dongarra, J.-F. Pineau, Y. Robert, and F. Vivien. Matrix product on heterogeneous master-worker platforms. In PPoPP, pages 53--62, 2008.
[10]
G. Fletcher, M. Schmidt, and M. Gordon. Developments in parallel electronic structure theory. Adv. Chem. Phys., 110: 267--294, 1999.
[11]
T. R. Furlani and H. F. King. Implementation of a parallel direct scf algorithm on distributed memory computers. J. Comp. Chem., 16 (1): 91--104, 1995.
[12]
X. Gao, S. Krishnamoorthy, S. K. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, and P. Sadayappan. Efficient search-space pruning for integrated fusion and tiling transformations. CCPE, 19 (18): 2425--2443, 2007.
[13]
J.-W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC, pages 326--333, 1981.
[14]
D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64 (9): 1017--1026, 2004.
[15]
C. Lam, T. Rauber, G. Baumgartner, D. Cociorva, and P. Sadayappan. Memory-optimal evaluation of expression trees involving large objects. Comp. Lang. Sys. Struc., 37 (2): 63--75.
[16]
A. C. Limaye and S. R. Gadre. A general parallel solution to the integral transformation and second-order Møller-Plesset energy evaluation on distributed memory parallel machines. J. Chem. Phys., 100 (2): 1303--1307, 1994.
[17]
W. Ma, S. Krishnamoorthy, and G. Agrawal. Practical loop transformations for tensor contraction expressions on multi-level memory hierarchies. In CC 2011, pages 266--285, 2011.
[18]
ga1J. Nieplocha, B. Palmer, V. Tipparaju, M. Krishnan, H. Trease, and E. Aprà. Advances, applications and performance of the global arrays shared memory programming toolkit. Int. J. High Perform. Comput. Appl., 20 (2): 203--231, May 2006.
[19]
M. Pernpointner, L. Visscher, W. A. de Jong, and R. Broer. Parallelization of four-component calculations. i. integral generation, SCF, and four-index transformation in the Dirac-Fock package MOLFDIR. J. Comp. Chem., 21 (13): 1176--1186.
[20]
G. Rauhut, P. Pulay, and H.-J. Werner. Integral transformation with low-order scaling for large local second-order Møller-Plesset calculations. J. Comp. Chem., 19 (11): 1241--1254.
[21]
S. Sæbø and J. Almlöf. Avoiding the integral storage bottleneck in LCAO calculations of electron correlation. Chem. Phys. Let., 154 (1): 83 -- 89, 1989.
[22]
S. K. Sahoo, S. Krishnamoorthy, R. Panuganti, and P. Sadayappan. Integrated loop optimizations for data locality enhancement of tensor contraction expressions. In SC 2005.
[23]
M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, et al. General atomic and molecular electronic structure system. J. Comp. Chem., 14 (11): 1347--1363, 1993.
[24]
R. A. Whiteside, J. S. Binkley, M. E. Colvin, and H. F. Schaefer III. Parallel algorithms for quantum chemistry. i. integral transformations on a hypercube multiprocessor. J. Chem. Phys., 86 (4): 2185--2193, 1987.
[25]
S. Wilson. Four-index transformations. In Methods in Computational Chemistry, pages 251--309. Springer, 1987.
[26]
T. L. Windus, M. W. Schmidt, and M. S. Gordon. Parallel algorithm for integral transformations and GUGA MCSCF. Theoretica chimica acta, 89 (1): 77--88, 1994.
[27]
A. T. Wong, R. J. Harrison, and A. P. Rendell. Parallel direct four-index transformations. Th. Chim. Acta, 93 (6): 317--331.

Index Terms

  1. Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 52, Issue 8
    PPoPP '17
    August 2017
    442 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3155284
    Issue’s Table of Contents
    • cover image ACM Conferences
      PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      January 2017
      476 pages
      ISBN:9781450344937
      DOI:10.1145/3018743
    © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 January 2017
    Published in SIGPLAN Volume 52, Issue 8

    Check for updates

    Author Tags

    1. 4-index
    2. communication optimization
    3. distributed algorithm
    4. four-index
    5. fusion
    6. lower bounds
    7. optimal schedule
    8. optimizing 4-index transform
    9. parallel algorithm
    10. processor mapping
    11. scheduling
    12. tensor contraction
    13. tensors

    Qualifiers

    • Research-article

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)101
    • Downloads (Last 6 weeks)19
    Reflects downloads up to 19 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media