
Do not modify saved variables in-place for spectral norm during power iteration #62293

Closed
wants to merge 4 commits

Conversation

soulitzer
Contributor

Interestingly enough, the original code did have a mechanism that aims to prevent this very issue, but it performs the clone AFTER modifying u and v in-place.
That doesn't work, because we can later use the cloned u and v in operations that save them for backward, and the next time we run forward we modify those same cloned u and v in-place again.
So if the idea is that we want to avoid modifying saved variables in-place, we should clone BEFORE the in-place operation.
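As a toy illustration of the autograd rule in play here (this is not the spectral norm code itself): once an op has saved a tensor for backward, that tensor must not be modified in-place, so any clone has to happen before the mutation, not after.

```python
import torch

x = torch.randn(3, requires_grad=True)
u = torch.randn(3)

# The multiplication saves `u` for backward (it is needed to compute dL/dx).
y = (x * u).sum()

# Cloning only AFTER mutating `u` would be too late: the in-place update
# would already have touched the tensor autograd saved, and backward would
# raise "one of the variables needed for gradient computation has been
# modified by an inplace operation".

# Cloning BEFORE the in-place update keeps the saved tensor intact:
u = u.clone()
u.add_(1.0)       # mutate the clone; the tensor saved for backward is untouched

y.backward()      # succeeds and uses the original, unmodified u
print(x.grad)
```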

@facebook-github-bot
Contributor
facebook-github-bot commented Jul 27, 2021

💊 CI failures summary and remediations

As of commit bb92cc1 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

1 failure not recognized by patterns:

Job | Step | Action
GitHub Actions linux-xenial-py3.6-gcc5.4 / calculate-docker-image | Checkout PyTorch | 🔁 rerun


@soulitzer soulitzer force-pushed the spectral-norm-remove-out branch from 3d63e9f to c0c61ad on July 27, 2021 21:26
@soulitzer soulitzer force-pushed the spectral-norm-remove-out branch from c0c61ad to 772c8b0 on July 28, 2021 02:26
@soulitzer soulitzer requested a review from albanD on July 28, 2021 14:58
@soulitzer soulitzer force-pushed the spectral-norm-remove-out branch from 29836b1 to a3eec40 on July 28, 2021 20:14
@soulitzer soulitzer force-pushed the spectral-norm-remove-out branch from ca15208 to c2cb316 on July 30, 2021 19:40
@soulitzer soulitzer requested review from albanD and lezcano on August 9, 2021 16:08
@soulitzer
Contributor Author
soulitzer commented Aug 19, 2021

As discussed with @albanD offline, since we cannot run gradcheck on the parametrized spectral norm module as-is, we try gradcheck again with the no_grad removed. See #63593. The CI should reveal whether it works with DP or not.

This was more complicated than initially thought, though, because removing the no_grad has many side effects. It's very possible we end up testing something completely different.

Anyway, the changes in the linked PR are:

  • (repeat changes from this PR)
  • remove the no_grad decorator
  • no longer modify u and v in-place via the out arg (because we are in grad mode now); instead we compute them out of place and copy_ the result back afterwards so that DP still works (though I can't test that locally). We need an extra clone before F.normalize(...) because it now saves u and v for backward. The same is done for weight_mat (see the sketch after this list).
  • change the clone from memory_format=torch.contiguous_format to torch.preserve_format (TODO: why does this break gradcheck??)
  • in test_nn, F.normalize does not support undefined gradients, so we need check_undefined_grad=False now
  • add retain_graph to the .backward and .grad calls, because u and v begin to require grad after the first forward and connect to the graph from the previous iteration
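A minimal sketch of the grad-mode update described in the second bullet (hypothetical function name and simplified structure; the real change is in the linked PR, and whether the write-back runs under no_grad there is an assumption here):

```python
import torch
import torch.nn.functional as F

def power_iteration_grad_mode(weight_mat, u, v):
    # Extra clone: the matrix-vector products and F.normalize save their
    # inputs for backward, so they must not be handed the buffer that the
    # copy_ calls below will overwrite.
    u_in = u.clone()

    # Compute the update out of place so autograd can track it.
    v_new = F.normalize(torch.mv(weight_mat.t(), u_in), dim=0)
    u_new = F.normalize(torch.mv(weight_mat, v_new), dim=0)

    # Write the result back into the persistent buffers with copy_, so that
    # DataParallel replicas (which receive broadcast versions of these
    # buffers) still observe the update.
    with torch.no_grad():
        u.copy_(u_new)
        v.copy_(v_new)
    return u_new, v_new
```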

@albanD
Collaborator
albanD commented Aug 19, 2021

Add retain_graph to the .backward and .grad calls, because u and v begin to require grad after the first forward and connect to the graph from the previous iteration

I think a wrapper function should be used to make sure we don't change the "original" u and v in any way.

In test_nn, F.normalize does not support undefined gradients, so we need check_undefined_grad=False now

That is surprising... especially ops with a single output should work out of the box.

@soulitzer
Contributor Author

I think a wrapper function should be used to make sure we don't change the "original" u and v in any way.

This fixed everything. The check_undefined_grad issue went away too with the new wrapper. So yeah, it's not F.normalize; I should've double-checked the first time.
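For reference, a rough sketch of the kind of wrapper being discussed (a hypothetical test helper; it assumes the parametrization-based spectral_norm, where the parametrization module holds _u/_v buffers). It restores cloned copies of _u and _v before every call, so gradcheck's repeated evaluations never see state mutated by a previous call:

```python
import torch

def make_spectral_norm_fn(module, weight_name="weight"):
    # Grab the spectral-norm parametrization and snapshot its buffers once.
    parametrization = module.parametrizations[weight_name][0]
    u0 = parametrization._u.clone()
    v0 = parametrization._v.clone()

    def fn(inp):
        # Reset the buffers so the "original" u and v are never changed
        # across the many evaluations gradcheck performs.
        with torch.no_grad():
            parametrization._u.copy_(u0)
            parametrization._v.copy_(v0)
        return module(inp)

    return fn

# Example usage (double precision, as gradcheck expects):
# m = torch.nn.utils.parametrizations.spectral_norm(torch.nn.Linear(3, 4)).double()
# x = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
# torch.autograd.gradcheck(make_spectral_norm_fn(m), (x,))
```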

@soulitzer
Contributor Author
soulitzer commented Aug 19, 2021

From the linked PR, the test is failing for DP. I'm not sure DP is possible when power iteration is done in grad mode, because self._u needs to be modified in-place, but self._u can apparently be a view. Its grad_fn is BroadcastBackward (which I could not grep), which seems to return multiple views as well.

https://app.circleci.com/pipelines/github/pytorch/pytorch/366855/workflows/b0ec5ba5-b366-43d0-8846-5cf6d675e456/jobs/15480216
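A toy analogue of that restriction, using chunk, which (like BroadcastBackward) produces multiple views of its input. This is not the DP code path itself, just an illustration of why an in-place write into self._u can be rejected when it is such a view:

```python
import torch

base = 2 * torch.randn(4, requires_grad=True)   # non-leaf tensor that requires grad
a, b = base.chunk(2)                            # chunk returns multiple views of `base`

try:
    a.copy_(torch.ones(2))                      # in-place write into one of those views
except RuntimeError as e:
    # Autograd refuses: views produced by functions that return multiple
    # views cannot be modified in-place.
    print(e)
```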

Collaborator
@albanD albanD left a comment


ok!

@facebook-github-bot
Contributor

@soulitzer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@soulitzer soulitzer force-pushed the spectral-norm-remove-out branch from c2cb316 to bb92cc1 on August 24, 2021 16:09
@facebook-github-bot
Contributor

@soulitzer has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@codecov
codecov bot commented Aug 24, 2021

Codecov Report

Merging #62293 (bb92cc1) into master (94d6215) will increase coverage by 0.08%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #62293      +/-   ##
==========================================
+ Coverage   67.05%   67.13%   +0.08%     
==========================================
  Files         691      691              
  Lines       90496    90496              
==========================================
+ Hits        60679    60752      +73     
+ Misses      29817    29744      -73     

@facebook-github-bot
Contributor

@soulitzer merged this pull request in 5be17ec.
