Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: support removing hook in the hook #61250

Closed
wants to merge 9 commits into from

Conversation

kshitij12345
Copy link
Collaborator
@kshitij12345 kshitij12345 commented Jul 5, 2021

Fixes: #58354

Problem:
Once a hook is called

THPObjectPtr res(PyObject_CallFunctionObjArgs(hook, value.get(), nullptr));
if (!res) throw python_error();
if (res == Py_None) continue;
check_single_result(value.get(), res.get(), hook);

If the hook has handle.remove() while executing and if there are no references to the hook function object then python is free to garbage collect.

At the subsequent call to

check_single_result(value.get(), res.get(), hook);

we have hook pointing to invalid memory

Thus when we try to fetch the name for hook from check_single_result with

static std::string hook_name(PyObject* hook) {
if (PyObject_HasAttrString(hook, "__name__")) {
THPObjectPtr name(PyObject_GetAttrString(hook, "__name__"));

we get segfault.

Solution:
Temporarily increase the life-time of hook with Py_INCREF till we have verified the result.

@facebook-github-bot
Copy link
Contributor
facebook-github-bot commented Jul 5, 2021

💊 CI failures summary and remediations

As of commit e21aecb (more details on the Dr. CI page and at hud.pytorch.org/pr/61250):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

Preview docs built from this PR

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@kshitij12345 kshitij12345 marked this pull request as ready for review July 6, 2021 07:09
@kshitij12345
Copy link
Collaborator Author

@soulitzer have updated the code to use THPObjectPtr.
CI failures look irrelevant.
PTAL :)
Thanks!

@soulitzer
Copy link
Contributor

Looks pretty good overall, thanks for the fix! Two small comments:

@kshitij12345
Copy link
Collaborator Author

We should probably update the docs here with a note

Not sure what do we want to update? Should we mention that it is possible to remove hook from the call to hook? Isn't it assumed since the OP actually did it?
Apologies if I missed something :)
Thanks!

We should avoid modifying a dictionary as we iterate through it: https://docs.python.org/3/c-api/dict.html#c.PyDict_Next. We could work around this by copying the hooks into a separate tuple, and iterate through the tuple instead.

Using PyDict_Values which returns a new reference. Have verified the fix locally.

@iramazanli iramazanli added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module high priority and removed high priority labels Jul 8, 2021
Copy link
Contributor
@soulitzer soulitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The behavior is probably expected anyway, so current doc should be OK.

torch/csrc/autograd/python_hook.cpp Outdated Show resolved Hide resolved
test/test_torch.py Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@soulitzer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@soulitzer merged this pull request in 5e9bcf9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault when a Tensor backward hook removes itself
5 participants