Performance regression between 1.78 and 1.79 #133082

havasd · 2024-11-15T20:38:43Z

We were upgrading from rust 1.78 to 1.79 and later 1.81 and we discovered a performance regression in our applications in certain cases up to 10%.
In our benchmarks we were using iai-callgrind to measure instructions and we have seen a significant increase there.

Unfortunately, I can't provide exact code as they are proprietary but in the nature they have long callchains with lots of small functions.

The same issue was reported here.

I have bisected the 1.78 and 1.79 version and found that the following change causes performance regression: 3412f01#diff-deee82aaf9baf43ab05d939355f6249fdacf8959bc0e06c9574283453f838ee9R702

Release cargo profile

codegen-units = 1
opt-level = 3
lto = "thin"

Workaround

Adding debug = 1 to the profile in release mode essentially disables this change and the performance degradation is not there anymore as expected.

The text was updated successfully, but these errors were encountered:

lqd · 2024-11-15T21:59:47Z

Weird, but cc @scottmcm as the author of #123949.

saethlin · 2024-11-16T00:05:55Z

One side effect of the PR in question is that it changes how aggressive the MIR inliner is. Since you've also commented that you have long callchains with lots of small functions, I suspect that the performance of your code is highly sensitive to inlining.

If I'm right, you can emulate this effect on 1.78 by increasing the MIR inlining thresholds. The defaults are -Zinline-mir-forwarder-threshold=30 -Zinline-mir-hint-threshold=100 -Zinline-mir-threshold=50. Adding 10 or 20 to all those numbers is the kind of experiment I'd try.

Similarly, you could try turning the MIR inliner off in 1.79 with -Zinline-mir=no.

But if the relevant optimization difference is in the precompiled standard library, you'd need to do all this with -Zbuild-std which will itself be a whole mess because the precompiled one is built with debuginfo = 2, and you aren't using that for your benchmarks.

There is really not a lot we can do without a reproducer, and even with one if this boils down to "this code requires a specific MIR inliner behavior", we are likely to take action only if that can be extracted to an inliner heuristic that can stand on its own merits.

scottmcm · 2024-11-16T06:02:55Z

Huh, when I heard "long callchains with lots of small functions" I was expecting this to be caused by #127113, not by #123949.

I agree with Ben that it's really hard to do anything without a reproducer, though. All the complicated interactions between the MIR inliner and LLVM's inliner here make any change really hard to predict what it will actually do, and all the thresholds mean that tiny changes sometimes have no effect, and sometimes have knife-edge effects.

jieyouxu · 2024-11-16T09:17:36Z

(Sorry didn't notice this was open 12 hours ago)

havasd added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Nov 15, 2024

lolbinarycat added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 15, 2024

saethlin added A-mir-opt-inlining Area: MIR inlining and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 15, 2024

scottmcm added the S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. label Nov 16, 2024

This comment has been minimized.

Sign in to view

jieyouxu closed this as not planned Won't fix, can't repro, duplicate, stale Nov 16, 2024

jieyouxu removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 16, 2024

jieyouxu reopened this Nov 16, 2024

jieyouxu added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 16, 2024

jieyouxu removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression between 1.78 and 1.79 #133082

Performance regression between 1.78 and 1.79 #133082

This comment has been minimized.

Performance regression between 1.78 and 1.79 #133082

Performance regression between 1.78 and 1.79 #133082

Comments

Release cargo profile

Workaround

This comment has been minimized.