-
-
Notifications
You must be signed in to change notification settings - Fork 924
core: Add ARM64 FJCVTZS
instruction optimization for f64
to i32
#21780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Did you do any (reproducible) benchmarks to compare performance between using a generic conversion vs checking runtime feature and using the native conversion? |
There are some data from here Sorry I don't have a mac to test but there is a benchamrk setup. |
Why is the generic f64 to i32 implementation in oxc different than ours? Out of those 3 conversions:
Either some has to differ or Ruffle's and oxc's has to be equivalent, in which case it doesn't make sense why oxc's is so complex compared to ours. (Just quickly skimming though it, didn't analyze the differences) |
As a side note: Flash conversions don't always have to match the spec, plus the behavior might be different between 32 and 64 bit runtimes. Do we have extensive testing of this conversion on both runtimes? I'm not sure actually |
It's originally copied from boa engine, and since both implementations pass the test, this shouldn't be a problem. |
Since this a aarch64 instruction, 32 bit target shouldn't be affected. |
So if both impls are equivalent, we cannot use the benchmarks done on oxc's impl, because it's (seemingly) less efficient, so it's expected we'd see better performance of |
I'm talking about Flash's 32/64 bit runtimes, because there are slight differences in behavior between them, and I recall stumbling upon those differences in conversions. Ruffle tries to behave as the 32-bit runtime, but we're thinking about letting the user choose which runtime to emulate, so at least we have to be aware of those differences. |
We could just replace the implementation with our own, and someone with mac to test. Assembly reference: https://godbolt.org/z/Me1TaP79e (pure fjcvtzs function)
Runtime behaviour should depend on how this conversion is used. |
Also why doesn’t Rust’s |
Can't figure out why one scrolling test fails on firefox, need help here. |
That's flaky, nothing to do with this PR. |
261a837
to
8a28f71
Compare
8a28f71
to
8f75b24
Compare
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. [0] ruffle-rs/ruffle#21780
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely arbitrary and I guess it could be argued one way or another. I wouldn’t expect it to stay unstable for too long, so ultimately this shouldn’t matter much. [0] ruffle-rs/ruffle#21780
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely arbitrary and I guess it could be argued one way or another. I wouldn’t expect it to stay unstable for too long, so ultimately this shouldn’t matter much. This feature is now tracked in this issue[1]. [0] ruffle-rs/ruffle#21780 [1] rust-lang/rust#147555
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely arbitrary and I guess it could be argued one way or another. I wouldn’t expect it to stay unstable for too long, so ultimately this shouldn’t matter much. This feature is now tracked in this issue[1]. [0] ruffle-rs/ruffle#21780 [1] rust-lang/rust#147555
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reran those benchmarks with new code: https://github.com/kjarosh/jsconv-benchmark/actions/runs/18427684643
platform | generic | fjcvtzs | fallback |
---|---|---|---|
ubuntu | 29 ns | 10 ns | 33 ns |
macos | 17 ns | 4.8 ns | 21 ns |
windows | 29 ns | 25 ns | 34 ns |
It shows ~3x improvement for ubuntu and macos, 16% improvement for windows. For fallback (CPUs that don't support fjcvtzs
) it's 14%–24% slower.
Do you happen to know what's the popularity of jsconv in ARM processors? I guess if it's reasonable (>50% per platform) we can merge this PR. I think we're pretty safe with Apple hardware here, but what about ARM PCs? What about Android phones? |
I didn't find precise data on these, but as an early instruction set extension of ARM64, ARMv8.3 is supported as built-in functions by both GCC and Clang. Additionally, V8, JSC, and SpiderMonkey also use this instruction. I believe the adoption rate is likely quite high. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
---------------------------------------------------------------------------------- clk.mk 9c014001dac74bd3e443600cd3fbf727a6402e62 # Version: Commits on Oct 13, 2025 ---------------------------------------------------------------------------------- Merge pull request #1600 from TomHarte/ShorterOpenDialogue macOS: shorten prompt to File -> Open..., ------------------------------------------------------------------------------------------ dolphin-emu.mk 1067754d218edab5ece7a1878a987c5d61bdaa65 # Version: Commits on Oct 13, 2025 ------------------------------------------------------------------------------------------ Merge pull request #13983 from jordan-woyak/wmreal-android-fix WiimoteReal: Use GetEnvForThread within IORead/IOWrite calls to fix real Wii remotes on Android., ------------------------------------------------------------------------------------------ duckstation.mk a035c487218e7b5c00645977ed3a7cf4d4d3f783 # Version: Commits on Oct 13, 2025 ------------------------------------------------------------------------------------------ GameList: Fall back to achievement badges for game icons, ----------------------------------------------------------------------------------- eden.mk 56e2dbc6198125bbc26ff7aef11a0c0e435400d4 # Version: Commits on Oct 13, 2025 ----------------------------------------------------------------------------------- added barriers against zero valued overlayControlData.individualScale (#2721) -------------------------------------------------------------------------------------- flycast.mk 8edd8fc447fef631f5ac8f9d3a50e8e592f68aa3 # Version: Commits on Oct 13, 2025 -------------------------------------------------------------------------------------- Merge remote-tracking branch 'origin/dev', -------------------------------------------------------------------------------------- melonds.mk 91ab68090c0aa588aabaeaa5e2c62564fd661ccc # Version: Commits on Oct 13, 2025 -------------------------------------------------------------------------------------- AR: add fun D4 opcodes, ---------------------------------------------------- pcsx2.mk v2.5.222 # Version: Commits on Oct 13, 2025 ---------------------------------------------------- - [Debugger: Floating point display in memory view](PCSX2/pcsx2#13362) , ------------------------------------------------------------------------------------ rpcs3.mk 1cae72c87281525468a09f721d4781aff42d433d # Version: Commits on Oct 12, 2025 ------------------------------------------------------------------------------------ vk: Uniquely identify images using a monotonic incrementing counter, ------------------------------------------------------------------------------------- ikemen.mk 42a345be637db241b57bf056ffe3d349362d0aaf # Version: Commits on Oct 13, 2025 ------------------------------------------------------------------------------------- Merge pull request #2777 from rakieldev/fixes fix: random cycle restarting from the last used index, mirrored palettes in demo mode mirror match, ----------------------------------------------------------------------------------------- lightspark.mk c86d37119f745964b23bedf5b4dfe4939abfccbb # Version: Commits on Oct 13, 2025 ----------------------------------------------------------------------------------------- revert previous commit, --------------------------------------------------------------- ruffle.mk nightly-2025-10-13 # Version: Commits on Oct 13, 2025 --------------------------------------------------------------- ## What's Changed * avm1: Adjust clip removal logic for rewinds by @jarca0123 in ruffle-rs/ruffle#21336 * core: Add ARM64 `FJCVTZS` instruction optimization for `f64` to `i32` by @CrazyboyQCD in ruffle-rs/ruffle#21780 * avm1: Remove `base_clip_unloaded` flag in `Activation` by @moulins in ruffle-rs/ruffle#21866 * avm1: Some `Function` refactors by @moulins in ruffle-rs/ruffle#21849 * chore: fix formatting by @moulins in ruffle-rs/ruffle#21902 * chore: Update wgpu to 26, egui to match by @torokati44 in ruffle-rs/ruffle#21730 * chore: Update cpal to 0.16, dropping oboe by @torokati44 in ruffle-rs/ruffle#21903 * avm2: Refactor Array.splice to ArrayStorage by @kjarosh in ruffle-rs/ruffle#21690 * ci: Remove posting coverage comment by @kjarosh in ruffle-rs/ruffle#21838 ## New Contributors * @CrazyboyQCD made their first contribution in ruffle-rs/ruffle#21780 **Full Changelog**: ruffle-rs/ruffle@nightly-2025-10-12...nightly-2025-10-13, ------------------------------------------------------------------------------------ box64.mk 81c56d7155cdd7a4c49173a2fe4d7bdd87698683 # Version: Commits on Oct 13, 2025 ------------------------------------------------------------------------------------ Bumped version to v0.3.8, --------------------------------------------------------------------------------------- etlegacy.mk 5ac4099159ef2b6282709932101654c522024731 # Version: Commits on Oct 13, 2025 --------------------------------------------------------------------------------------- app: google changed requirements for apps to use only 16KB page size so make sure we are aligned, ------------------------------------------------------------------------------------------- jazz2-native.mk c0a761ebae78e9e964ed2e21a5fb90561b02dc60 # Version: Commits on Oct 13, 2025 ------------------------------------------------------------------------------------------- Fixed build, ----------------------------------------------------------------------------------------- retroarch.mk 8d16395de9d8a2534a00feb354163a68488ca90d # Version: Commits on Sept 01, 2025 ----------------------------------------------------------------------------------------- apple: fix display server resolution/refresh rates on macos/ios, ------------------------------------------------------------------- libserum.mk v2.3.0-concentrate.1 # Version: Commits on Oct 11, 2025 ------------------------------------------------------------------- use fixed unit64_t instead of size_t that varies between 32bit and 64bit, ----------------------------------------------------- libzedmd.mk v0.9.7 # Version: Commits on Oct 05, 2025 ----------------------------------------------------- removed osx build target, ---------------------------------------------------------------------------------------- doomretro.mk 6788a780439384f784b49737471e50011f73314d # Version: Commits on Oct 13, 2025 ---------------------------------------------------------------------------------------- Minor tweaks, ------------------------------------------------------------------------------------- gzdoom.mk e558d2be8a13f74ed16f70646bc442185e682db2 # Version: Commits on Oct 11, 2025 ------------------------------------------------------------------------------------- properly handle reading from embedded compressed archives. Fixes #2622, ----------------------------------------------------------------------------------- tr1x.mk cec5d916a64e9fc6bfc1ee31593644c0c2a381f5 # Version: Commits on Oct 12, 2025 ----------------------------------------------------------------------------------- objects: rename GAME_OBJECT_ID to OBJECT_ID This rename is necessary, as `OBJECT_ID` is supposed to refer to the internal TRX indexing, whereas \game object ID\s refer to the OG level data indexing, and are used strictly by the level reading facilities., ----------------------------------------------------------------------------------- tr2x.mk cec5d916a64e9fc6bfc1ee31593644c0c2a381f5 # Version: Commits on Oct 12, 2025 ----------------------------------------------------------------------------------- objects: rename GAME_OBJECT_ID to OBJECT_ID This rename is necessary, as `OBJECT_ID` is supposed to refer to the internal TRX indexing, whereas \game object ID\s refer to the OG level data indexing, and are used strictly by the level reading facilities., ----------------------------------------------------------------------------------------------- libretro-flycast.mk 8edd8fc447fef631f5ac8f9d3a50e8e592f68aa3 # Version: Commits on Oct 13, 2025 ----------------------------------------------------------------------------------------------- Merge remote-tracking branch 'origin/dev', --------------------------------------------------------------------------------------------- libretro-vba-m.mk 53bc1c173872fb8924488192124bed835f6bcfee # Version: Commits on Oct 13, 2025 --------------------------------------------------------------------------------------------- Default bilinear=false, hide_menu_bar=false Signed-off-by: Rafael Kitover <rkitover@gmail.com>, ------------------------------------------------------------------------------------------- glsl-shaders.mk 2eca39ad100cabf1ebe0323c7b67c321eba6c837 # Version: Commits on Oct 13, 2025 ------------------------------------------------------------------------------------------- Update ntsc_module (#544) * Update ntsc_module.glsl * Update crt-consumer-1w.glsl * Update ntsc_module.glsl,
Closes #21773.