Almási et al., 2004 - Google Patents
Implementing MPI on the BlueGene/L supercomputer (Almási et al., 2004)
- Document ID
- 1469916293074940355
- Author
- Almási G
- Archer C
- Castanos J
- Erway C
- Heidelberger P
- Martorell X
- Moreira J
- Pinnow K
- Ratterman J
- Smeds N
- Steinmacher-Burow B
- Gropp W
- Toonen B
- Publication year
- 2004
- Publication venue
- Euro-Par 2004 Parallel Processing: 10th International Euro-Par Conference, Pisa, Italy, August 31-September 3, 2004. Proceedings 10
Snippet
The BlueGene/L supercomputer will consist of 65,536 dual-processor compute nodes interconnected by two high-speed networks: a three-dimensional torus network and a tree topology network. Each compute node can only address its own local memory, making …
Classifications
- G06F15/17381—Two dimensional, e.g. mesh, torus
- G06F15/17337—Direct connection machines, e.g. completely connected computers, point to point communication networks
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/023—Free address space management
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G06F9/54—Interprogramme communication; Intertask communication
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F13/1642—Handling requests for interconnection or transfer for access to memory bus based on arbitration with request queuing
Similar Documents
Publication | Title
---|---
US10887238B2 (en) | High performance, scalable multi chip interconnect
Almási et al. | Design and implementation of message-passing services for the Blue Gene/L supercomputer
Derradji et al. | The BXI interconnect architecture
US8032892B2 (en) | Message passing with a limited number of DMA byte counters
Petrini et al. | The Quadrics network: High-performance clustering technology
Petrini et al. | Performance evaluation of the Quadrics interconnection network
US7788334B2 (en) | Multiple node remote messaging
US7886084B2 (en) | Optimized collectives using a DMA on a parallel computer
Almási et al. | Implementing MPI on the BlueGene/L supercomputer
EP1615138A2 (en) | Multiprocessor chip having bidirectional ring interconnect
US8756270B2 (en) | Collective acceleration unit tree structure
TWI547870B | Method and system for ordering I/O access in a multi-node environment
US20090006296A1 (en) | DMA engine for repeating communication patterns
Tipparaju et al. | Host-assisted zero-copy remote memory access communication on InfiniBand
Papadopoulou et al. | A performance study of UCX over InfiniBand
Muthukrishnan et al. | Finepack: Transparently improving the efficiency of fine-grained transfers in multi-GPU systems
US11552907B2 (en) | Efficient packet queueing for computer networks
Sack et al. | Collective algorithms for multiported torus networks
Gao et al. | Impact of reconfigurable hardware on accelerating MPI_Reduce
Suresh et al. | Network assisted non-contiguous transfers for GPU-aware MPI libraries
Afsahi et al. | Efficient communication using message prediction for cluster of multiprocessors
Thorson et al. | SGI® UV2: A fused computation and data analysis machine
Nüssle et al. | Accelerate communication, not computation!
Dhanraj | Enhancement of LiMIC-based collectives for multi-core clusters
Almási et al. | Architecture and performance of the BlueGene/L message layer