Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Spatial hardware implementation for sparse graph algorithms in GraphStep

Published: 29 September 2011 Publication History

Abstract

How do we develop programs that are easy to express, easy to reason about, and able to achieve high performance on massively parallel machines? To address this problem, we introduce GraphStep, a domain-specific compute model that captures algorithms that act on static, irregular, sparse graphs. In GraphStep, algorithms are expressed directly without requiring the programmer to explicitly manage parallel synchronization, operation ordering, placement, or scheduling details. Problems in the sparse graph domain are usually highly concurrent and communicate along graph edges. Exposing concurrency and communication structure allows scheduling of parallel operations and management of communication that is necessary for performance on a spatial computer. We study the performance of a semantic network application, a shortest-path application, and a max-flow/min-cut application. We introduce a language syntax for GraphStep applications. The total speedup over sequential versions of the applications studied ranges from a factor of 19 to a factor of 15,000. Spatially-aware graph optimizations (e.g., node decomposition, placement and route scheduling) delivered speedups from 3 to 30 times over a spatially-oblivious mapping.

Supplementary Material

Delorimier (delorimier.zip)
Supplemental movie, appendix, image and software files for, Spatial Hardware Implementation for Sparse Graph Algorithms in GraphStep

References

[1]
Agha, G. 1998. ACTORS: A model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA.
[2]
Blelloch, G. E., Chatterjee, S., Hardwick, J. C., Sipelstein, J., and Zagha, M. 1993. Implementation of a portable nested data-parallel language. In Proceedings 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 102--111.
[3]
Boykov, Y., Veksler, O., and Zabih, R. 1998. Markov random fields with efficient approximations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 648--655.
[4]
Brook Project. 2004. Brook project web page. http://brook.sourceforge.net.
[5]
Caldwell, A., Kahng, A., and Markov, I. 2000. Improved algorithms for hypergraph bipartitioning. In Proceedings of the Asia and South Pacific Design Automation Conference. 661--666.
[6]
Caspi, E., Chu, M., Huang, R., Weaver, N., Yeh, J., Wawrzynek, J., and DeHon, A. 2000. Stream computations organized for reconfigurable execution (SCORE): Extended abstract. In Proceedings of the International Conference on Field-Programmable Logic and Applications. Lecture Notes in Computer Science. Springer, 605--614.
[7]
Castanos, J. and Savage, J. 2000. Repartitioning unstructured adaptive meshes. In Proceedings of the Parallel and Distributed Processing Symposium. IEEE, 823--832.
[8]
Chandy, K. M. and Misra, J. 1982. Distributed computation on graphs: Shortest path algorithms. Comm. ACM 25, 11, 833--837.
[9]
Cormen, T., Leiserson, C., and Rivest, R. 1990. Introduction to Algorithms. MIT Press, Cambridge, MA.
[10]
Dean, J. and Ghemawat, S. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the Symposium on Operating System Design and Implementation. 137--150.
[11]
DeHon, A. 2000. Compact, multilayer layout for butterfly fat-tree. In Proceedings of the 12th ACM Symposium on Parallel Algorithms and Architectures (SPAA'00). ACM, 206--215.
[12]
DeHon, A., Huang, R., and Wawrzynek, J. 2006. Stochastic spatial routing for reconfigurable networks. J. Microprocess. Microsyst. 30, 6, 301--318.
[13]
deLorimier, M. and DeHon, A. 2005. Floating-point sparse matrix-vector multiply for FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. 75--85.
[14]
deLorimier, M., Kapre, N., Mehta, N., Rizzo, D., Eslick, I., Rubin, R., Uribe, T. E., Knight, Jr., T. F., and DeHon, A. 2006. GraphStep: A system architecture for sparse-graph algorithms. In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines. IEEE, 143--151.
[15]
Fahlman, S. E. 1979. NETL: A System for Representing and Using Real-World Knowledge. MIT Press, Cambridge, MA.
[16]
Habata, S., Yokokawa, M., and Kitawaki, S. 2003. The earth simulator system. NEC Res. & Develop. 44, 1, 21--26.
[17]
Hestenes, M. R. and Stiefel, E. 1952. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. 49, 6, 409--436.
[18]
Hillis, W. D. 1985. The Connection Machine. MIT Press, Cambridge, MA.
[19]
Hillis, W. D. and Steele, G. L. 1986. Data parallel algorithms. Comm. ACM 29, 12, 1170--1183.
[20]
Kahn, G. 1974. The semantics of a simple language for parallel programming. In Proceedings of the IFIP CONGRESS 74. North-Holland Publishing Company, 471--475.
[21]
Kapre, N. and DeHon, A. 2009. Parallelizing sparse matrix solve for SPICE circuit simulation using FPGAs. In Proceedings of the International Conference on Field-Programmable Technology. IEEE, 190--198.
[22]
Kapre, N., Mehta, N., deLorimier, M., Rubin, R., Barnor, H., Wilson, M. J., Wrighton, M., and DeHon, A. 2006. Packet-Switched vs. time-multiplexed FPGA overlay networks. In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines. IEEE, 205--213.
[23]
Karypis, G. and Kumar, V. 1999. A fast and highly quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 1.
[24]
Kildall, G. A. 1973. A unified approach to global program optimization. In Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL'73). ACM Press, New York, 194--206.
[25]
Kim, J.-T. and Moldovan, D. I. 1993. Classification and retrieval of knowledge on a parallel marker-passing architecture. IEEE Trans. Knowl. Data Engin. 5, 5, 753--761.
[26]
Koelbel, C. H., Loveman, D. B., Schreiber, R. S., Guy L. Steele, J., and Zosel, M. E. 1994. The High Performance Fortran Handbook. MIT Press, Cambridge, MA.
[27]
Kolmogorov, V. and Zabih, R. 2001. Computing visual correspondence with occlusions using graph cuts. In Proceedings of the IEEE International Conference on Computer Vision. Vol. 2. 508--515.
[28]
Landman, B. S. and Russo, R. L. 1971. On pin versus block relationship for partitions of logic circuits. IEEE Trans. Comput. 20, 1469--1479.
[29]
Lee, E. 2005. UC Berkley ptolemy project. http://www.ptolemy.eecs.berkeley.edu/.
[30]
Lee, E. A. and Messerschmitt, D. G. 1987. Synchronous data flow. Proc. IEEE 75, 9, 1235--1245.
[31]
Leiserson, C., Rose, F., and Saxe, J. 1983. Optimizing synchronous circuitry by retiming. In Proceedings of the 3rd Caltech Conference On VLSI.
[32]
Leiserson, C. E. 1985. Fat-Trees: Universal networks for hardware efficient supercomputing. IEEE Trans. Comput. C-34, 10, 892--901.
[33]
Lieberman, H. 1987. Concurrent Object-Oriented Programming in Act 1. MIT Press, Cambridge, MA.
[34]
Lindholm, E., Nickolls, J., Oberman, S., and Montrym, J. 2008. Nvidia tesla: A unified graphics and computing architecture. IEEE Micro 28, 2, 39--55.
[35]
Liu, H. and Singh, P. 2004. Conceptnet -- A practical commonsense reasoning tool-kit. BT Tech. J. 22, 4, 211.
[36]
Logemann, G., Loveland, D., and Davis, M. 1962. A machine program for theorem proving. Comm. ACM 5, 7, 394--397.
[37]
Marques-Silva, J. P. and Sakallah, K. A. 1999. GRASP: A search algorithm for propositional satisfiability. IEEE Trans. Comput. 48, 5, 506--521.
[38]
Microsystems, S. 1995. The java language environment. White paper. http://java.sun.com/docs/white/langenv/.
[39]
Pierce, B. C. 2002. Types and Programming Languages. MIT Press, Cambridge, MA.
[40]
Shah, N., Plishker, W., Ravindran, K., and Keutzer, K. 2004. NP-Click: A productive software development approach for network processors. IEEE Micro 24, 5, 45--54.
[41]
Valiant, L. G. 1990. A bridging model for parallel computation. Comm. ACM 33, 8, 103--111.
[42]
Wrighton, M. and DeHon, A. 2003. Hardware-assisted simulated annealing with application for fast FPGA placement. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. 33--42.

Cited By

View all
  • (2022)Applications and Techniques for Fast Machine Learning in ScienceFrontiers in Big Data10.3389/fdata.2022.7874215Online publication date: 12-Apr-2022
  • (2019)GraVF-MACM Transactions on Reconfigurable Technology and Systems10.1145/335759612:4(1-28)Online publication date: 21-Nov-2019
  • (2019)On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-Based FPGAs2019 29th International Conference on Field Programmable Logic and Applications (FPL)10.1109/FPL.2019.00020(67-73)Online publication date: Sep-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Autonomous and Adaptive Systems
ACM Transactions on Autonomous and Adaptive Systems  Volume 6, Issue 3
September 2011
150 pages
ISSN:1556-4665
EISSN:1556-4703
DOI:10.1145/2019583
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 September 2011
Accepted: 01 June 2010
Revised: 01 April 2010
Received: 01 August 2009
Published in TAAS Volume 6, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Spatial computing
  2. compute model
  3. graph algorithm
  4. graphStep
  5. parallel programming

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Applications and Techniques for Fast Machine Learning in ScienceFrontiers in Big Data10.3389/fdata.2022.7874215Online publication date: 12-Apr-2022
  • (2019)GraVF-MACM Transactions on Reconfigurable Technology and Systems10.1145/335759612:4(1-28)Online publication date: 21-Nov-2019
  • (2019)On-The-Fly Parallel Data Shuffling for Graph Processing on OpenCL-Based FPGAs2019 29th International Conference on Field Programmable Logic and Applications (FPL)10.1109/FPL.2019.00020(67-73)Online publication date: Sep-2019
  • (2018)An FPGA framework for edge-centric graph processingProceedings of the 15th ACM International Conference on Computing Frontiers10.1145/3203217.3203233(69-77)Online publication date: 8-May-2018
  • (2017)Accelerating Graph Analytics on CPU-FPGA Heterogeneous Platform2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)10.1109/SBAC-PAD.2017.25(137-144)Online publication date: Oct-2017
  • (2016)Impact of Parallelism and Memory Architecture on FPGA Communication EnergyACM Transactions on Reconfigurable Technology and Systems10.1145/28570579:4(1-23)Online publication date: 22-Aug-2016
  • (2016)Survey of domain-specific languages for FPGA computing2016 26th International Conference on Field Programmable Logic and Applications (FPL)10.1109/FPL.2016.7577380(1-12)Online publication date: Aug-2016
  • (2016)High-Throughput and Energy-Efficient Graph Processing on FPGA2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)10.1109/FCCM.2016.35(103-110)Online publication date: May-2016
  • (2016)Optimizing large knowledge networks in spatial computersThe Knowledge Engineering Review10.1017/S026988891600018731:4(367-390)Online publication date: 7-Dec-2016
  • (2016)A Review of Image Interest Point Detectors: From Algorithms to FPGA Hardware ImplementationsImage Feature Detectors and Descriptors10.1007/978-3-319-28854-3_3(47-74)Online publication date: 23-Feb-2016
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media