Article

EMC-Y: parallel processing element optimizing communication and computation

Authors:

Yoshinori Yamaguchi

ICS '93: Proceedings of the 7th international conference on Supercomputing

Pages 167 - 174

https://doi.org/10.1145/165939.165967

Published: 01 August 1993

Abstract

EMC-Y is a new processing element for highly parallel computers, designed to achieve high-performance parallel computation by fusing a dataflow mechanism with a von Neumann execution pipeline. We have already developed EMC-R, the processing element used in the EM-4 prototype. EMC-Y improves on EMC-R's packet communication performance, allowing it to tolerate more network traffic. This paper presents the architecture of EMC-Y, concentrating on the principles of packet communication. To improve the performance of sending and transferring packets, EMC-Y uses an output packet buffer and optimal packet routing. To improve the performance of receiving packets, EMC-Y changes the memory access priority given to input packet buffer operations. Because EMC-Y not only improves packet input and output performance but also balances the two, it can tolerate a large amount of traffic and improve execution performance. We evaluate the improvements of the EMC-Y architecture using a clock-level simulator. The results show that, at the same clock speed, EMC-Y outperforms EMC-R by 50% to 70% on several programs.
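The abstract credits part of EMC-Y's send-side improvement to an output packet buffer, but does not describe its microarchitecture. The following is only a hypothetical toy model in Python, not the design evaluated in the paper. It assumes a single-issue pipeline, a single network port that accepts at most one packet per cycle, and a FIFO output buffer of fixed capacity; the instruction kinds `C` (compute) and `S` (send), the function `simulate`, and the `net_ready` callback are illustrative names invented for this sketch. It shows the general idea: without a buffer, every send stalls the pipeline until the network can take the packet, while with a buffer the pipeline keeps issuing and packets drain later.

```python
from collections import deque
from typing import Callable, Sequence, Tuple


def simulate(program: Sequence[str],
             net_ready: Callable[[int], bool],
             buffer_size: int) -> Tuple[int, int]:
    """Hypothetical cycle-level model of packet sending (not EMC-Y's actual design).

    program     -- sequence of 'C' (compute, 1 cycle) and 'S' (send one packet)
    net_ready   -- net_ready(cycle) is True if the network accepts a packet that cycle
    buffer_size -- output packet buffer capacity; 0 means no buffer

    Returns (cycles until the pipeline issues its last instruction,
             cycles until the last packet has entered the network).
    Assumes net_ready is eventually True; otherwise a send never completes.
    """
    buf = deque()      # FIFO of packets waiting for the network port
    pc = 0             # index of the next instruction to issue
    cycle = 0
    pipeline_done = 0
    while pc < len(program) or buf:
        port_free = net_ready(cycle)
        # Network side: the oldest buffered packet leaves first and uses the port.
        if port_free and buf:
            buf.popleft()
            port_free = False
        # Pipeline side: issue at most one instruction per cycle.
        if pc < len(program):
            if program[pc] == "C":
                pc += 1
            elif buffer_size == 0:
                # Unbuffered: the send completes only if the port is free now.
                if port_free:
                    pc += 1
            elif len(buf) < buffer_size:
                # Buffered: deposit the packet and move on; it drains later.
                buf.append(pc)
                pc += 1
            # Otherwise the buffer is full and the pipeline stalls this cycle.
            if pc == len(program):
                pipeline_done = cycle + 1
        cycle += 1
    return pipeline_done, cycle


if __name__ == "__main__":
    # Network busy for the first four cycles, then free.
    print(simulate("SCCCSCCC", lambda c: c >= 4, 0))
    print(simulate("SCCCSCCC", lambda c: c >= 4, 2))
```

With a network that is busy for the first four cycles, the unbuffered run of `"SCCCSCCC"` takes 12 cycles, while a two-entry buffer lets the pipeline overlap the wait with computation and finish in 8. When the network is simply slow (`lambda c: c % 4 == 3`), a buffer frees the pipeline early (8 cycles for `"SC" * 4` with four entries, versus 17 unbuffered), but the last packet still leaves at cycle 16: a buffer hides latency and burstiness, and cannot raise the rate at which the network accepts packets.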

Information & Contributors

Information

Published In

ICS '93: Proceedings of the 7th international conference on Supercomputing

August 1993

425 pages

ISBN: 0-89791-600-X

DOI: 10.1145/165939

Chairman:
Yoichi Muraoka

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS93

Sponsor:

SIGARCH

ICS93: International Conference on Supercomputing

July 19 - 23, 1993

Tokyo, Japan

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total citations: 15
Total downloads: 368
Downloads (last 12 months): 100
Downloads (last 6 weeks): 13

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Sato M, Kodama Y, Sakane H, Sakai S, Yamaguchi Y, Sekiguchi S (2005). Programming with distributed data structure for EM-X multiprocessor. Theory and Practice of Parallel Programming, pp. 472-483. Online publication date: 15-Jun-2005. https://doi.org/10.1007/BFb0026585

Sohn A, Kodama Y, Ku J, Sato M, Yamaguchi Y (2001). Tolerating communication latency through dynamic thread invocation in a multithreaded architecture. Compiler optimizations for scalable parallel systems, pp. 525-549. Online publication date: 1-Jun-2001. https://dl.acm.org/doi/10.5555/380466.380481

Sohn A, Kodama Y, Ku J, Sato M, Yamaguchi Y (2001). Tolerating Communication Latency through Dynamic Thread Invocation in a Multithreaded Architecture. Compiler Optimizations for Scalable Parallel Systems, pp. 525-549. Online publication date: 18-May-2001. https://doi.org/10.1007/3-540-45403-9_15

Tatebe O, Kodama Y, Sekiguchi S, Yamaguchi Y, Egan G, Brent R, Gannon D (1998). Highly efficient implementation of MPI point-to-point communication using remote memory operations. Proceedings of the 12th international conference on Supercomputing, pp. 267-273. Online publication date: 13-Jul-1998. https://dl.acm.org/doi/10.1145/277830.277890

Sohn A, Kodama Y, Ku J, Sato M, Sakane H, Yamana H, Sakai S, Yamaguchi Y, Leiserson C, Culler D (1997). Fine-grain multithreading with the EM-X multiprocessor. Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, pp. 189-198. Online publication date: 1-Jun-1997. https://dl.acm.org/doi/10.1145/258492.258511

Sohn A, Kim C, Sato M, Bic L, Evripidou P, Böhm W, Gaudiot J (1995). Multithreading with the EM-4 distributed-memory multiprocessor. Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, pp. 27-36. Online publication date: 27-Jun-1995. https://dl.acm.org/doi/10.5555/224659.224676

Kodama Y, Sakane H, Sato M, Yamana H, Sakai S, Yamaguchi Y (1995). The EM-X parallel computer. ACM SIGARCH Computer Architecture News, 23(2), pp. 14-23. Online publication date: 1-May-1995. https://dl.acm.org/doi/10.1145/225830.223987

Kodama Y, Sakane H, Sato M, Yamana H, Sakai S, Yamaguchi Y, Patterson D (1995). The EM-X parallel computer. Proceedings of the 22nd annual international symposium on Computer architecture, pp. 14-23. Online publication date: 1-Jul-1995. https://dl.acm.org/doi/10.1145/223982.223987

Kasahara H, Honda H, Aida K, Okamoto M, Yoshida A, Ogata W (1995). OSCAR Fortran Multigrain Compiler. Parallel Language and Compiler Research in Japan, pp. 271-301. Online publication date: 1995. https://doi.org/10.1007/978-1-4615-2269-0_11

Sohn A, Sato M, Sakai S, Kodama Y, Yamaguchi Y, Johnson G (1994). Nonnumeric search results on the EM-4 distributed-memory multiprocessor. Proceedings of the 1994 ACM/IEEE conference on Supercomputing, pp. 301-310. Online publication date: 14-Nov-1994. https://dl.acm.org/doi/10.5555/602770.602828
