Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression between 1.78 and 1.79 #133082

Open
havasd opened this issue Nov 15, 2024 · 5 comments
Open

Performance regression between 1.78 and 1.79 #133082

havasd opened this issue Nov 15, 2024 · 5 comments
Labels
A-mir-opt-inlining Area: MIR inlining C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@havasd
Copy link
havasd commented Nov 15, 2024

We were upgrading from rust 1.78 to 1.79 and later 1.81 and we discovered a performance regression in our applications in certain cases up to 10%.
In our benchmarks we were using iai-callgrind to measure instructions and we have seen a significant increase there.

Unfortunately, I can't provide exact code as they are proprietary but in the nature they have long callchains with lots of small functions.

The same issue was reported here.

I have bisected the 1.78 and 1.79 version and found that the following change causes performance regression: 3412f01#diff-deee82aaf9baf43ab05d939355f6249fdacf8959bc0e06c9574283453f838ee9R702

Release cargo profile

codegen-units = 1
opt-level = 3
lto = "thin"

Workaround

Adding debug = 1 to the profile in release mode essentially disables this change and the performance degradation is not there anymore as expected.

@havasd havasd added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Nov 15, 2024
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-untriaged Untriaged performance or correctness regression. labels Nov 15, 2024
@lqd
Copy link
Member
lqd commented Nov 15, 2024

Weird, but cc @scottmcm as the author of #123949.

@lolbinarycat lolbinarycat added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 15, 2024
@saethlin saethlin added A-mir-opt-inlining Area: MIR inlining and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Nov 15, 2024
@saethlin
Copy link
Member

One side effect of the PR in question is that it changes how aggressive the MIR inliner is. Since you've also commented that you have long callchains with lots of small functions, I suspect that the performance of your code is highly sensitive to inlining.

If I'm right, you can emulate this effect on 1.78 by increasing the MIR inlining thresholds. The defaults are -Zinline-mir-forwarder-threshold=30 -Zinline-mir-hint-threshold=100 -Zinline-mir-threshold=50. Adding 10 or 20 to all those numbers is the kind of experiment I'd try.

Similarly, you could try turning the MIR inliner off in 1.79 with -Zinline-mir=no.

But if the relevant optimization difference is in the precompiled standard library, you'd need to do all this with -Zbuild-std which will itself be a whole mess because the precompiled one is built with debuginfo = 2, and you aren't using that for your benchmarks.

There is really not a lot we can do without a reproducer, and even with one if this boils down to "this code requires a specific MIR inliner behavior", we are likely to take action only if that can be extracted to an inliner heuristic that can stand on its own merits.

@scottmcm
Copy link
Member

Huh, when I heard "long callchains with lots of small functions" I was expecting this to be caused by #127113, not by #123949.

I agree with Ben that it's really hard to do anything without a reproducer, though. All the complicated interactions between the MIR inliner and LLVM's inliner here make any change really hard to predict what it will actually do, and all the thresholds mean that tiny changes sometimes have no effect, and sometimes have knife-edge effects.

@scottmcm scottmcm added the S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. label Nov 16, 2024
@jieyouxu

This comment has been minimized.

@jieyouxu jieyouxu closed this as not planned Won't fix, can't repro, duplicate, stale Nov 16, 2024
@jieyouxu jieyouxu removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 16, 2024
@jieyouxu jieyouxu reopened this Nov 16, 2024
@jieyouxu jieyouxu added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 16, 2024
@jieyouxu
Copy link
Member

(Sorry didn't notice this was open 12 hours ago)

@jieyouxu jieyouxu removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir-opt-inlining Area: MIR inlining C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants