Vector multiprocessors with arbitrated memory access

Authors:

Eduard Ayguadé,

Tomás Lang

ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture

Pages 243 - 252

https://doi.org/10.1145/223982.224435

Published: 01 May 1995

Abstract

The high latency of memory accesses is one of the main factors limiting the performance of current vector supercomputers. Conflicts in the memory modules, together with collisions in the interconnection network in the case of multiprocessors, significantly increase the execution time of applications. In this work we propose a memory access method that, for both vector uniprocessors and multiprocessors, performs stream accesses with the smallest possible latency in the majority of cases. The basic idea is to arbitrate memory access by defining the order in which the memory modules are visited. The stream elements are requested out of order. In addition, the access method reduces the cost of the interconnection network.
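The abstract does not spell out the arbitration scheme, so the following is only a rough sketch of how a fixed module visiting order might remove conflicts between processors. It is not the authors' method. Everything in it is an assumption made for illustration: low-order interleaving (`module_of`), a rotating time-slot rule in which processor `p` may access module `(cycle + p * busy_time) % num_modules`, and the names `arbitrated_schedule` and `has_conflict`.

```python
from collections import deque


def module_of(address, num_modules):
    # Assumed mapping: low-order interleaving of addresses over modules.
    return address % num_modules


def arbitrated_schedule(streams, num_modules, busy_time):
    """Schedule stream requests under a fixed, rotating module order.

    streams: one (base, stride, length) tuple per processor.
    Returns a list of (cycle, processor, element_index, module).
    """
    num_procs = len(streams)
    if num_procs * busy_time > num_modules:
        raise ValueError("this sketch needs num_procs * busy_time <= num_modules")
    # For each processor, queue its pending element indices per module.
    pending = []
    for base, stride, length in streams:
        queues = [deque() for _ in range(num_modules)]
        for i in range(length):
            queues[module_of(base + i * stride, num_modules)].append(i)
        pending.append(queues)
    remaining = sum(length for _, _, length in streams)
    schedule = []
    cycle = 0
    while remaining:
        for p in range(num_procs):
            # The module this processor is allowed to visit in this cycle.
            m = (cycle + p * busy_time) % num_modules
            if pending[p][m]:
                # Request whichever pending element lives in that module,
                # regardless of its position in the stream (out of order).
                schedule.append((cycle, p, pending[p][m].popleft(), m))
                remaining -= 1
        cycle += 1
    return schedule


def has_conflict(schedule, num_modules, busy_time):
    """True if any module receives two requests less than busy_time apart."""
    last_use = {}
    for cycle, _, _, m in sorted(schedule):
        if m in last_use and cycle - last_use[m] < busy_time:
            return True
        last_use[m] = cycle
    return False


# Two processors, each reading a 16-element stride-1 stream that starts
# in module 0 (8 modules, each busy for 4 cycles per access).
streams = [(0, 1, 16), (8, 1, 16)]
schedule = arbitrated_schedule(streams, num_modules=8, busy_time=4)
print(has_conflict(schedule, 8, 4))                  # False
print([i for _, p, i, _ in schedule if p == 1][:8])  # [4, 5, 6, 7, 0, 1, 2, 3]
```

In this toy model, two processors that issued their elements in natural order would collide in module 0 on the first cycle. With the rotating order, the second processor starts at module 4 and therefore requests elements 4 to 7 before elements 0 to 3.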

Index Terms

Vector multiprocessors with arbitrated memory access
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
      2. Systolic arrays
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published In

ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture

July 1995

426 pages

ISBN:0897916980

DOI:10.1145/223982

Chairman:
David A. Patterson
Univ. of California, Berkeley

ACM SIGARCH Computer Architecture News Volume 23, Issue 2
Special Issue: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
May 1995
412 pages
ISSN:0163-5964
DOI:10.1145/225830
Chairman:
David A. Patterson
Univ. of California, Berkeley

Copyright © 1995 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\TCCA: TC on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISCA95

Sponsor:

SIGARCH
IEEE-CS\TCCA

ISCA95: International Conference on Computer Architecture

June 22 - 24, 1995

S. Margherita Ligure, Italy

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 393
Downloads (Last 12 months): 60
Downloads (Last 6 weeks): 8

Reflects downloads up to 25 Nov 2024

Cited By

Dimić V, Moretó M, Casas M, Valero M (2021) PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy. Euro-Par 2021: Parallel Processing, pp. 599-615. Online publication date: 25-Aug-2021.
https://doi.org/10.1007/978-3-030-85665-6_37
Jorda J, M'zoughi A (2012) Isomorphic Recursive Splitting. Proceedings of the 2012 41st International Conference on Parallel Processing Workshops, pp. 574-580. Online publication date: 10-Sep-2012.
https://dl.acm.org/doi/10.1109/ICPPW.2012.78
Lee J, Lakshminarayana N, Kim H, Vuduc R (2010) Many-Thread Aware Prefetching Mechanisms for GPGPU Applications. Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 213-224. Online publication date: 4-Dec-2010.
https://dl.acm.org/doi/10.1109/MICRO.2010.44
Cabarcas F, Rico A, Etsion Y, Ramirez A (2010) Interleaving granularity on high bandwidth memory architecture for CMPs. 2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 250-257. Online publication date: Jul-2010.
https://doi.org/10.1109/ICSAMOS.2010.5642060
Seznec A, Espasa R (2005) Conflict-Free Accesses to Strided Vectors on a Banked Cache. IEEE Transactions on Computers 54:7, pp. 913-916. Online publication date: 1-Jul-2005.
https://dl.acm.org/doi/10.1109/TC.2005.110
Mathew B, McKee S, Carter J, Davis A, Miller G, Teng S (2000) Algorithmic foundations for a parallel vector access memory system. Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures, pp. 156-165. Online publication date: 9-Jul-2000.
https://dl.acm.org/doi/10.1145/341800.341819
Quintana F, Espasa R, Valero M (1998) A case for merging the ILP and DLP paradigms. Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing - PDP '98, pp. 217-224. Online publication date: 1998.
https://doi.org/10.1109/EMPDP.1998.647201
del Corral A, Llaberia J, Clayton B (1996) Increasing the effective bandwidth of complex memory systems in multivector processors. Proceedings of the 1996 ACM/IEEE conference on Supercomputing, pp. 26-es. Online publication date: 17-Nov-1996.
https://dl.acm.org/doi/10.1145/369028.369084
del Corral A, Llaberia J (1995) Increasing the effective memory bandwidth in multivector processors. Proceedings of EUROMICRO 96. 22nd Euromicro Conference. Beyond 2000: Hardware and Software Design Strategies, pp. 38-45. Online publication date: 1995.
https://doi.org/10.1109/EURMIC.1996.546363
Hong F, Shrivastava A, Lee J (2018) Return data interleaving for multi-channel embedded CMPs systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 20:7, pp. 1351-1354. Online publication date: 29-Dec-2018.
https://dl.acm.org/doi/10.1109/TVLSI.2011.2157368
