-
Notifications
You must be signed in to change notification settings - Fork 23k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue re: DDP and create_graph=True #63831
Conversation
Closes #63812 `at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives. In this case, fall back to a mul + copy instead of mul_out. Differential Revision: [D30505573](https://our.internmc.facebook.com/intern/diff/D30505573/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D30505573/)! [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 074b89c (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group. |
Closes #63812 `at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives. In this case, fall back to a mul + copy instead of mul_out. Differential Revision: [D30505573](https://our.internmc.facebook.com/intern/diff/D30505573/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D30505573/)! ghstack-source-id: 136505319 Pull Request resolved: #63831
cc @mcarilli |
Codecov Report
@@ Coverage Diff @@
## gh/rohan-varma/388/base #63831 +/- ##
===========================================================
- Coverage 75.53% 75.53% -0.01%
===========================================================
Files 2122 2122
Lines 212881 212895 +14
===========================================================
+ Hits 160805 160807 +2
- Misses 52076 52088 +12 |
Closes #63812 `at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives. In this case, fall back to a mul + copy instead of mul_out. Differential Revision: [D30505573](https://our.internmc.facebook.com/intern/diff/D30505573/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D30505573/)! [ghstack-poisoned]
Pull Request resolved: #63831 Closes #63812 `at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives. In this case, fall back to a mul + copy instead of mul_out. ghstack-source-id: 136514879 Differential Revision: [D30505573](https://our.internmc.facebook.com/intern/diff/D30505573/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D30505573/)!
I thought DDP does not support grad's gradients, e.g., the grad's gradients are not synced in current DDP. possibly we should be careful to support this new feature |
Yes, grad of grads will not be synced and I don't think that is a supported feature of DDP. Although, it seems like we do not intend to crash here so possibly we should fix that. Maybe we can clarify in the documentation that grads resulting from higher-order differentiation won't be synced. |
// all the input parameters. | ||
at::mul_out(bucket_view, grad, wrapped); | ||
} else { | ||
// If DDP is running with create_graph=True, gradients require_grad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, you mean codes will crash here if grad.requires_grad()? if so, makes sense to fix that here, but would you please also add a warning here saying it is not supported by DDP to sync up its gradients
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will add it
Closes #63812 `at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives. In this case, fall back to a mul + copy instead of mul_out. Note that DDP doesn't really work well with higher order grads. Will add appropriate warning messaging. Differential Revision: [D30505573](https://our.internmc.facebook.com/intern/diff/D30505573/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D30505573/)! [ghstack-poisoned]
Pull Request resolved: #63831 Closes #63812 `at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives. In this case, fall back to a mul + copy instead of mul_out. ghstack-source-id: 136614644 Differential Revision: [D30505573](https://our.internmc.facebook.com/intern/diff/D30505573/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D30505573/)!
This pull request has been merged in a6f767e. |
Previous PR #63831 did not actually test the error in #63812. Introduce a test directly from the repro that simulates it. Differential Revision: [D30569719](https://our.internmc.facebook.com/intern/diff/D30569719/) [ghstack-poisoned]
Previous PR #63831 did not actually test the error in #63812. Introduce a test directly from the repro that simulates it. Differential Revision: [D30569719](https://our.internmc.facebook.com/intern/diff/D30569719/) ghstack-source-id: 136824065 Pull Request resolved: #64074
Previous PR #63831 did not actually test the error in #63812. Introduce a test directly from the repro that simulates it. Differential Revision: [D30569719](https://our.internmc.facebook.com/intern/diff/D30569719/) [ghstack-poisoned]
Previous PR #63831 did not actually test the error in #63812. Introduce a test directly from the repro that simulates it. Differential Revision: [D30569719](https://our.internmc.facebook.com/intern/diff/D30569719/) [ghstack-poisoned]
Pull Request resolved: #64074 Previous PR #63831 did not actually test the error in #63812. Introduce a test directly from the repro that simulates it. ghstack-source-id: 137171460 Differential Revision: [D30569719](https://our.internmc.facebook.com/intern/diff/D30569719/)
Summary: Pull Request resolved: #64074 Previous PR #63831 did not actually test the error in #63812. Introduce a test directly from the repro that simulates it. ghstack-source-id: 137171460 Test Plan: CI Reviewed By: SciPioneer Differential Revision: D30569719 fbshipit-source-id: fd61250ef6d291c093607663d91d6d2cb5574eb7
Stack from ghstack:
Closes #63812
at::mul_out
is not supported whengrad
itself requires grad, which is useful for computing higher order derivatives.In this case, fall back to a mul + copy instead of mul_out.
Note that DDP doesn't really work well with higher order grads. Will add appropriate warning messaging.
Differential Revision: D30505573
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23