

add channels last for GroupNorm #49821

Closed
wants to merge 40 commits

Conversation

mingfeima
Collaborator
@mingfeima mingfeima commented Dec 24, 2020

Stack from ghstack:

Differential Revision: D26007053

@facebook-github-bot
Contributor
facebook-github-bot commented Dec 24, 2020

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 7759c07 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang7_asan_test2 (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Aug 18 05:37:14 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Aug 18 05:37:14     #4 0x55939858315f  (/opt/conda/bin/python3.6+0x13015f)
Aug 18 05:37:14     #5 0x5593985c58f2  (/opt/conda/bin/python3.6+0x1728f2)
Aug 18 05:37:14     #6 0x55939862dcd5  (/opt/conda/bin/python3.6+0x1dacd5)
Aug 18 05:37:14     #7 0x55939862fd5d  (/opt/conda/bin/python3.6+0x1dcd5d)
Aug 18 05:37:14     #8 0x55939862fdbb  (/opt/conda/bin/python3.6+0x1dcdbb)
Aug 18 05:37:14     #9 0x559398630926  (/opt/conda/bin/python3.6+0x1dd926)
Aug 18 05:37:14     #10 0x55939856a196  (/opt/conda/bin/python3.6+0x117196)
Aug 18 05:37:14     #11 0x7f56492fb83f  (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)
Aug 18 05:37:14     #12 0x5593985fa33d  (/opt/conda/bin/python3.6+0x1a733d)
Aug 18 05:37:14 
Aug 18 05:37:14 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Aug 18 05:37:14 + retcode=1
Aug 18 05:37:14 + set -e
Aug 18 05:37:14 + return 1
Aug 18 05:37:14 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX-* ]]
Aug 18 05:37:14 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X ]]
Aug 18 05:37:14 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX2-* ]]
Aug 18 05:37:14 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
Aug 18 05:37:14 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX512-* ]]
Aug 18 05:37:14 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X\5\1\2 ]]
Aug 18 05:37:14 + '[' -n https://github.com/pytorch/pytorch/pull/49821 ']'

See GitHub Actions build linux-bionic-py3.8-gcc9-coverage / test (default, 1, 2, linux.2xlarge) (2/2)

Step: "Test PyTorch" (full log | diagnosis details | 🔁 rerun)

2021-08-18T06:27:08.4574279Z Build left local git repository checkout dirty
2021-08-18T06:26:56.2063269Z real	72m19.773s
2021-08-18T06:26:56.2063772Z user	160m36.687s
2021-08-18T06:26:56.2064179Z sys	11m36.374s
2021-08-18T06:26:56.2064704Z + assert_git_not_dirty
2021-08-18T06:26:56.2066095Z + [[ linux-bionic-py3.8-gcc9-coverage-default != *rocm* ]]
2021-08-18T06:26:56.2067173Z + [[ linux-bionic-py3.8-gcc9-coverage-default != *xla* ]]
2021-08-18T06:26:56.2067921Z ++ git status --porcelain
2021-08-18T06:27:08.4572333Z + git_status='?? third_party/pocketfft/'
2021-08-18T06:27:08.4573117Z + [[ -n ?? third_party/pocketfft/ ]]
2021-08-18T06:27:08.4573736Z + echo 'Build left local git repository checkout dirty'
2021-08-18T06:27:08.4574279Z Build left local git repository checkout dirty
2021-08-18T06:27:08.4574826Z + echo 'git status --porcelain:'
2021-08-18T06:27:08.4575321Z git status --porcelain:
2021-08-18T06:27:08.4575805Z + echo '?? third_party/pocketfft/'
2021-08-18T06:27:08.4576216Z ?? third_party/pocketfft/
2021-08-18T06:27:08.4576553Z + exit 1
2021-08-18T06:27:08.4576825Z + cleanup
2021-08-18T06:27:08.4577123Z + retcode=1
2021-08-18T06:27:08.4577393Z + set +x
2021-08-18T06:27:08.4577756Z =================== sccache compilation log ===================
2021-08-18T06:27:08.4751376Z =========== If your build fails, please take a look at the log above for possible reasons ===========

1 job timed out:

  • pytorch_linux_xenial_py3_clang7_asan_test2

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Dec 24, 2020
ghstack-source-id: 7061ac75798ee55fa6ad82e35a84570e81566495
Pull Request resolved: #49821
@ppwwyyxx ppwwyyxx requested a review from xiaomengy December 24, 2020 07:40
@mingfeima
Collaborator Author

From a performance point of view, GroupNorm favors NCHW over NHWC: GroupNorm accumulates mean and rstd over the {D}HW dimensions (NCHW = N{GD}HW with C = {GD}), while under the channels-last memory format the physical layout is NHW{GD}, so the elements of each group are strided apart in memory.

This patch therefore performs the reduction in two stages, NHWC => NC first and then NC => NG, so that the inner loop can always vectorize over C. Even so, channels-last performance is still slightly worse than the contiguous format:
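As a rough sketch of that two-stage reduction using plain tensor ops (not the actual ATen kernel; the function name and the eps default are illustrative):

```python
import torch

def group_norm_stats_channels_last(x, num_groups, eps=1e-5):
    # x: (N, C, H, W), ideally in channels_last memory, where the
    # innermost contiguous dimension is C.
    n, c, h, w = x.shape
    # Stage 1: NHWC => NC. Reducing over the spatial dims keeps C as
    # the contiguous inner dimension, so this vectorizes over C.
    sum_c = x.sum(dim=(2, 3))            # (N, C)
    sqsum_c = (x * x).sum(dim=(2, 3))    # (N, C)
    # Stage 2: NC => NG. Fold channels into groups and finish the sum.
    sum_g = sum_c.view(n, num_groups, -1).sum(dim=2)      # (N, G)
    sqsum_g = sqsum_c.view(n, num_groups, -1).sum(dim=2)  # (N, G)
    count = (c // num_groups) * h * w
    mean = sum_g / count
    var = sqsum_g / count - mean * mean   # population variance
    rstd = (var + eps).rsqrt()
    return mean, rstd
```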

The GroupNorm paper replaces BN with GN in ResNet-50, so I am benchmarking the RN50 BN sizes here.
Single-core inference results on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (unit: ms per iteration):

| input size | nchw | nhwc |
| --- | --- | --- |
| [1, 64, 112, 112] | 0.290 | 0.430 |
| [1, 64, 56, 56] | 0.103 | 0.119 |
| [1, 256, 56, 56] | 0.293 | 0.410 |
| [1, 128, 56, 56] | 0.156 | 0.222 |
| [1, 128, 28, 28] | 0.050 | 0.058 |
| [1, 512, 28, 28] | 0.155 | 0.219 |
| [1, 256, 28, 28] | 0.091 | 0.112 |
| [1, 256, 14, 14] | 0.027 | 0.032 |
| [1, 1024, 14, 14] | 0.092 | 0.116 |
| [1, 256, 14, 14] | 0.026 | 0.031 |
| [1, 512, 14, 14] | 0.052 | 0.056 |
| [1, 512, 7, 7] | 0.021 | 0.022 |
| [1, 2048, 7, 7] | 0.056 | 0.064 |
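A minimal harness along these lines can reproduce such a comparison for one of the listed sizes (this is not the exact script used for the numbers above; the group count of 32 and the iteration count are assumptions):

```python
import torch
from torch.utils import benchmark

torch.set_num_threads(1)  # the table above reports single-core numbers

# 32 groups is an assumed setting (the GN paper's default), not taken
# from this PR's benchmark harness.
gn = torch.nn.GroupNorm(num_groups=32, num_channels=64)
x = torch.randn(1, 64, 112, 112)

for fmt, label in [(torch.contiguous_format, "nchw"),
                   (torch.channels_last, "nhwc")]:
    inp = x.to(memory_format=fmt)
    timer = benchmark.Timer(stmt="gn(inp)",
                            globals={"gn": gn, "inp": inp})
    print(label, timer.timeit(100))
```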

Contributor
@xiaomengy xiaomengy left a comment


LGTM, I will try to add the CUDA impl later to see what performance we can reach on CUDA for NHWC format.

@VitalyFedyunin
Contributor

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@VitalyFedyunin
Contributor

Please rebase; we had to revert the thcc_conv2 PR, so I need to know whether we can continue this stack in parallel with fixing that one.

@mingfeima
Collaborator Author

@VitalyFedyunin rebased, please check!

@seemethere seemethere removed the ci/all label Aug 2, 2021

@mingfeima
Collaborator Author

rebased!

@VitalyFedyunin
Contributor

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 5b7cdc5.

@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/8/head branch August 27, 2021 14:17