Almási et al., 2004 - Google Patents
Implementing MPI on the BlueGene/L supercomputer (Almási et al., 2004)
- Document ID
- 1469916293074940355
- Author
- Almási G
- Archer C
- Castanos J
- Erway C
- Heidelberger P
- Martorell X
- Moreira J
- Pinnow K
- Ratterman J
- Smeds N
- Steinmacher-Burow B
- Gropp W
- Toonen B
- Publication year
- 2004
- Publication venue
- Euro-Par 2004 Parallel Processing: 10th International Euro-Par Conference, Pisa, Italy, August 31-September 3, 2004. Proceedings 10
Snippet
The BlueGene/L supercomputer will consist of 65,536 dual-processor compute nodes interconnected by two high-speed networks: a three-dimensional torus network and a tree topology network. Each compute node can only address its own local memory, making …
Classifications
- G06F15/17381—Two dimensional, e.g. mesh, torus
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/023—Free address space management
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G06F9/54—Interprogramme communication; Intertask communication
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F13/1642—Handling requests for interconnection or transfer for access to memory bus based on arbitration with request queuing
Similar Documents
Publication | Title
---|---
US10887238B2 (en) | High performance, scalable multi chip interconnect
Almási et al. | Design and implementation of message-passing services for the Blue Gene/L supercomputer
Derradji et al. | The BXI interconnect architecture
US8032892B2 (en) | Message passing with a limited number of DMA byte counters
Petrini et al. | The Quadrics network: High-performance clustering technology
Petrini et al. | Performance evaluation of the Quadrics interconnection network
US7788334B2 (en) | Multiple node remote messaging
US7886084B2 (en) | Optimized collectives using a DMA on a parallel computer
Almási et al. | Implementing MPI on the BlueGene/L supercomputer
EP1615138A2 (en) | Multiprocessor chip having bidirectional ring interconnect
US8756270B2 (en) | Collective acceleration unit tree structure
TWI547870B | Method and system for ordering I/O access in a multi-node environment
US20090006296A1 (en) | DMA engine for repeating communication patterns
Tipparaju et al. | Host-assisted zero-copy remote memory access communication on InfiniBand
Papadopoulou et al. | A performance study of UCX over InfiniBand
Muthukrishnan et al. | Finepack: Transparently improving the efficiency of fine-grained transfers in multi-GPU systems
US11552907B2 (en) | Efficient packet queueing for computer networks
Sack et al. | Collective algorithms for multiported torus networks
Gao et al. | Impact of reconfigurable hardware on accelerating MPI_Reduce
Suresh et al. | Network assisted non-contiguous transfers for GPU-aware MPI libraries
Afsahi et al. | Efficient communication using message prediction for cluster of multiprocessors
Thorson et al. | SGI® UV2: A fused computation and data analysis machine
Nüssle et al. | Accelerate communication, not computation!
Dhanraj | Enhancement of LiMIC-based collectives for multi-core clusters
Almási et al. | Architecture and performance of the BlueGene/L message layer