Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions

Authors:

Ding-Yong Hong,

Wei-Chung Hsu

ACM SIGPLAN Notices, Volume 52, Issue 5

Pages 31 - 40

https://doi.org/10.1145/3140582.3081029

Published: 21 June 2017

Abstract

A growing number of modern processors support non-contiguous SIMD data accesses, yet translating such instructions has been overlooked in Dynamic Binary Translation (DBT). For example, the popular QEMU dynamic binary translator emulates guest memory instructions with strides as a sequence of scalar instructions, leaving significant room for performance improvement when the host machine has SIMD instructions available. Structured loads/stores, such as VLDn/VSTn in ARM NEON, are one type of strided SIMD data access instruction. They are widely used in signal processing, multimedia, mathematical, and 2D matrix transposition applications. Efficient translation of structured loads/stores is therefore critical when migrating ARM executables to other ISAs. It is also challenging: the translation itself is nontrivial, and the differences between guest and host register configurations must be taken into account. In this work, we present the design and implementation of structured load/store translation in DBT, including target code generation and efficient SIMD register mapping. The proposed register mapping mechanisms are not limited to structured loads/stores; they can be extended to handle normal SIMD instructions. On a set of OpenCV benchmarks, our QEMU-based system achieves a maximum speedup of 5.41x and an average improvement of 2.93x. On a set of BLAS benchmarks, it achieves a maximum speedup of 2.19x and an average improvement of 1.63x.
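
To make the term "structured load/store" concrete, the following is a minimal scalar sketch of what a 2-way structured load and store in the style of NEON VLD2/VST2 compute: interleaved memory is split into two registers on load and re-interleaved on store. It is only an illustration of the instruction semantics and of the element-by-element style of emulation the abstract describes; it is not code from the paper or from QEMU, and the function names `vld2_u16_ref` and `vst2_u16_ref`, the 16-bit element type, and the use of plain arrays as "registers" are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for a 2-way structured load (VLD2 style).
 * Reads 2 * lanes interleaved elements from mem and de-interleaves them:
 * even-indexed elements go to d0, odd-indexed elements go to d1. */
static void vld2_u16_ref(const uint16_t *mem, uint16_t *d0, uint16_t *d1,
                         size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        d0[i] = mem[2 * i];     /* e.g. x0, x1, x2, ... */
        d1[i] = mem[2 * i + 1]; /* e.g. y0, y1, y2, ... */
    }
}

/* Hypothetical scalar reference for the matching structured store
 * (VST2 style): re-interleaves d0 and d1 back into mem. */
static void vst2_u16_ref(uint16_t *mem, const uint16_t *d0,
                         const uint16_t *d1, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        mem[2 * i] = d0[i];
        mem[2 * i + 1] = d1[i];
    }
}
```

Each loop iteration here touches one element at a time, which is roughly the cost profile of scalar emulation. A translator targeting a SIMD host would aim to replace such loops with a few wide loads plus data permutations.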

Index Terms

Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Dynamic compilers
      2. Retargetable compilers

Information & Contributors

Information

Published In

ACM SIGPLAN Notices Volume 52, Issue 5

LCTES '17

May 2017

120 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/3140582

Editor:
Matthew Fluet

LCTES 2017: Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems
June 2017
120 pages
ISBN:9781450350303
DOI:10.1145/3078633
General Chair: Vijay Nagarajan, University of Edinburgh, UK
Program Chair: Zili Shao, Hong Kong Polytechnic University, China

Copyright © 2017 ACM.

© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2017

Published in SIGPLAN Volume 52, Issue 5

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 6
Total Downloads: 181
Downloads (Last 12 months): 8
Downloads (Last 6 weeks): 1

Reflects downloads up to 20 Nov 2024

Citations

Cited By

Liu L (2023) Application of Speech Recognition Translator based on Evolutionary Multi-objective Optimization Algorithm. 2023 International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT), pp. 1-6. Online publication date: 20-Oct-2023
https://doi.org/10.1109/EASCT59475.2023.10392501
Wu J, Dong J, Fang R, Zhao Z, Gong X, Wang W, Zuo D, Titzer B, Xu H, Zhang I (2021) Effective exploitation of SIMD resources in cross-ISA virtualization. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pp. 84-97. Online publication date: 7-Apr-2021
https://dl.acm.org/doi/10.1145/3453933.3454016
Jiang J, Dong R, Zhou Z, Song C, Wang W, Yew P, Zhang W (2020) More with Less – Deriving More Translation Rules with Less Training Data for DBTs Using Parameterization. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 415-426. Online publication date: Oct-2020
https://doi.org/10.1109/MICRO50266.2020.00043
Song C, Wang W, Yew P, Zhai A, Zhang W, Dan T, Dahlia M (2019) Unleashing the power of learning. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pp. 77-89. Online publication date: 10-Jul-2019
https://dl.acm.org/doi/10.5555/3358807.3358815
Fu S, Hong D, Liu Y, Wu J, Hsu W (2019) Optimizing data permutations in structured loads/stores translation and SIMD register mapping for a cross-ISA dynamic binary translator. Journal of Systems Architecture: the EUROMICRO Journal, 98:C, pp. 173-190. Online publication date: 1-Sep-2019
https://dl.acm.org/doi/10.1016/j.sysarc.2019.07.008
