[DDP] Log when errors happen #59281
Adds the ability to log when the reducer/DDP encounters an error. We add the fields `has_error` and `error` to indicate that an error has occurred in this iteration; when they are set, the other fields (performance stats) are not guaranteed to be updated. Errors encountered in Python-side DDP will be added in the next diff.

Differential Revision: [D28652717](https://our.internmc.facebook.com/intern/diff/D28652717/)
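A minimal Python sketch of the contract described above, assuming a simplified per-iteration log record. `IterationLog`, `HypotheticalLogger`, `run_iteration`, `usable_perf_records` and the timing field names are all hypothetical and are not PyTorch's actual DDP logging API; only the `has_error`/`error` semantics come from this PR. When an iteration fails, the error fields are set and the performance stats are left unset, so a consumer should filter on `has_error` before aggregating timings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IterationLog:
    """Hypothetical per-iteration record (not PyTorch's real logging class)."""
    iteration: int
    # Performance stats: only trustworthy when has_error is False.
    forward_compute_time_ns: Optional[int] = None
    backward_compute_time_ns: Optional[int] = None
    # Error fields mirroring the "has_error" / "error" fields from this PR.
    has_error: bool = False
    error: str = ""


class HypotheticalLogger:
    """Collects one IterationLog per iteration; a stand-in for the reducer's logger."""

    def __init__(self) -> None:
        self.records: List[IterationLog] = []

    def log_stats(self, iteration: int, fwd_ns: int, bwd_ns: int) -> None:
        self.records.append(
            IterationLog(iteration, forward_compute_time_ns=fwd_ns,
                         backward_compute_time_ns=bwd_ns)
        )

    def log_error(self, iteration: int, message: str) -> None:
        # Perf stats are deliberately left unset: they may be stale or
        # partial for an iteration that failed partway through.
        self.records.append(IterationLog(iteration, has_error=True, error=message))


def run_iteration(logger: HypotheticalLogger, iteration: int, fail: bool) -> None:
    """Simulated training step: log the error, then re-raise it to the caller."""
    try:
        if fail:
            raise RuntimeError("simulated reducer error")
        logger.log_stats(iteration, fwd_ns=1_000, bwd_ns=2_000)
    except RuntimeError as exc:
        logger.log_error(iteration, str(exc))
        raise


def usable_perf_records(records: List[IterationLog]) -> List[IterationLog]:
    """Keep only records whose performance stats can be trusted."""
    return [r for r in records if not r.has_error]


logger = HypotheticalLogger()
for it, fail in enumerate([False, True, False]):
    try:
        run_iteration(logger, it, fail)
    except RuntimeError:
        pass  # the training loop decides how to handle the failure

print([r.iteration for r in usable_perf_records(logger.records)])  # prints [0, 2]
```

Logging the error before re-raising keeps failure behavior unchanged for the caller while still leaving a record that explains why that iteration's stats are missing.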
💊 CI failures summary and remediations
As of commit 172aa56 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
pytorch_xla_linux_bionic_py3_6_clang9_test (1/1)
Step: "Run tests"
Looking into the test failures.
This pull request has been merged in 79aeca0.
Summary:
Pull Request resolved: pytorch#59281

Adds the ability to log when the reducer/DDP encounters an error. We add the fields `has_error` and `error` to indicate that an error has occurred in this iteration; when they are set, the other fields (performance stats) are not guaranteed to be updated. Errors encountered in Python-side DDP will be added in the next diff.

ghstack-source-id: 130412974

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28652717

fbshipit-source-id: 9772abc2647a92dac6a325da6976ef5eb877c589