

add channels last for GroupNorm #49821

Closed
wants to merge 40 commits

Conversation

mingfeima
Collaborator
@mingfeima mingfeima commented Dec 24, 2020

Stack from ghstack:

Differential Revision: D26007053

@facebook-github-bot
Contributor
facebook-github-bot commented Dec 24, 2020

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 7759c07 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang7_asan_test2 (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Aug 18 05:37:14 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Aug 18 05:37:14     #4 0x55939858315f  (/opt/conda/bin/python3.6+0x13015f)
Aug 18 05:37:14     #5 0x5593985c58f2  (/opt/conda/bin/python3.6+0x1728f2)
Aug 18 05:37:14     #6 0x55939862dcd5  (/opt/conda/bin/python3.6+0x1dacd5)
Aug 18 05:37:14     #7 0x55939862fd5d  (/opt/conda/bin/python3.6+0x1dcd5d)
Aug 18 05:37:14     #8 0x55939862fdbb  (/opt/conda/bin/python3.6+0x1dcdbb)
Aug 18 05:37:14     #9 0x559398630926  (/opt/conda/bin/python3.6+0x1dd926)
Aug 18 05:37:14     #10 0x55939856a196  (/opt/conda/bin/python3.6+0x117196)
Aug 18 05:37:14     #11 0x7f56492fb83f  (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)
Aug 18 05:37:14     #12 0x5593985fa33d  (/opt/conda/bin/python3.6+0x1a733d)
Aug 18 05:37:14 
Aug 18 05:37:14 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Aug 18 05:37:14 + retcode=1
Aug 18 05:37:14 + set -e
Aug 18 05:37:14 + return 1
Aug 18 05:37:14 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX-* ]]
Aug 18 05:37:14 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X ]]
Aug 18 05:37:14 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX2-* ]]
Aug 18 05:37:14 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
Aug 18 05:37:14 + [[ pytorch-linux-xenial-py3-clang7-asan-test2 == *-NO_AVX512-* ]]
Aug 18 05:37:14 + [[ '' == \n\o\g\p\u\_\N\O\_\A\V\X\5\1\2 ]]
Aug 18 05:37:14 + '[' -n https://github.com/pytorch/pytorch/pull/49821 ']'

See GitHub Actions build linux-bionic-py3.8-gcc9-coverage / test (default, 1, 2, linux.2xlarge) (2/2)

Step: "Test PyTorch" (full log | diagnosis details | 🔁 rerun)

2021-08-18T06:27:08.4574279Z Build left local git repository checkout dirty
2021-08-18T06:26:56.2063269Z real	72m19.773s
2021-08-18T06:26:56.2063772Z user	160m36.687s
2021-08-18T06:26:56.2064179Z sys	11m36.374s
2021-08-18T06:26:56.2064704Z + assert_git_not_dirty
2021-08-18T06:26:56.2066095Z + [[ linux-bionic-py3.8-gcc9-coverage-default != *rocm* ]]
2021-08-18T06:26:56.2067173Z + [[ linux-bionic-py3.8-gcc9-coverage-default != *xla* ]]
2021-08-18T06:26:56.2067921Z ++ git status --porcelain
2021-08-18T06:27:08.4572333Z + git_status='?? third_party/pocketfft/'
2021-08-18T06:27:08.4573117Z + [[ -n ?? third_party/pocketfft/ ]]
2021-08-18T06:27:08.4573736Z + echo 'Build left local git repository checkout dirty'
2021-08-18T06:27:08.4574279Z Build left local git repository checkout dirty
2021-08-18T06:27:08.4574826Z + echo 'git status --porcelain:'
2021-08-18T06:27:08.4575321Z git status --porcelain:
2021-08-18T06:27:08.4575805Z + echo '?? third_party/pocketfft/'
2021-08-18T06:27:08.4576216Z ?? third_party/pocketfft/
2021-08-18T06:27:08.4576553Z + exit 1
2021-08-18T06:27:08.4576825Z + cleanup
2021-08-18T06:27:08.4577123Z + retcode=1
2021-08-18T06:27:08.4577393Z + set +x
2021-08-18T06:27:08.4577756Z =================== sccache compilation log ===================
2021-08-18T06:27:08.4751376Z =========== If your build fails, please take a look at the log above for possible reasons ===========

1 job timed out:

  • pytorch_linux_xenial_py3_clang7_asan_test2

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Dec 24, 2020
ghstack-source-id: 7061ac75798ee55fa6ad82e35a84570e81566495
Pull Request resolved: #49821
@ppwwyyxx ppwwyyxx requested a review from xiaomengy December 24, 2020 07:40
@mingfeima
Collaborator Author

From a performance point of view, GroupNorm favors NCHW over NHWC: GroupNorm accumulates mean and rstd over the {D}HW dimensions (NCHW = N{GD}HW with C = {GD}), while under the channels-last memory format the physical layout is NHW{GD}, so the elements of each group are strided apart in memory.

This patch therefore performs the reduction in two stages, NHWC => NC first and then NC => NG, so that the inner loop can always vectorize over C. Even so, channels-last performance is still slightly worse than the contiguous format:
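As a rough sketch of that two-stage reduction using plain tensor ops (not the actual ATen kernel; the function name and the eps default are illustrative):

```python
import torch

def group_norm_stats_channels_last(x, num_groups, eps=1e-5):
    # x: (N, C, H, W), ideally in channels_last memory, where the
    # innermost contiguous dimension is C.
    n, c, h, w = x.shape
    # Stage 1: NHWC => NC. Reducing over the spatial dims keeps C as
    # the contiguous inner dimension, so this vectorizes over C.
    sum_c = x.sum(dim=(2, 3))            # (N, C)
    sqsum_c = (x * x).sum(dim=(2, 3))    # (N, C)
    # Stage 2: NC => NG. Fold channels into groups and finish the sum.
    sum_g = sum_c.view(n, num_groups, -1).sum(dim=2)      # (N, G)
    sqsum_g = sqsum_c.view(n, num_groups, -1).sum(dim=2)  # (N, G)
    count = (c // num_groups) * h * w
    mean = sum_g / count
    var = sqsum_g / count - mean * mean   # population variance
    rstd = (var + eps).rsqrt()
    return mean, rstd
```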

The GroupNorm paper replaces BN with GN in ResNet-50, so I am benchmarking the RN50 BN sizes here.
Single-core inference results on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (unit: ms per iteration):

| input size | nchw | nhwc |
| --- | --- | --- |
| [1, 64, 112, 112] | 0.290 | 0.430 |
| [1, 64, 56, 56] | 0.103 | 0.119 |
| [1, 256, 56, 56] | 0.293 | 0.410 |
| [1, 128, 56, 56] | 0.156 | 0.222 |
| [1, 128, 28, 28] | 0.050 | 0.058 |
| [1, 512, 28, 28] | 0.155 | 0.219 |
| [1, 256, 28, 28] | 0.091 | 0.112 |
| [1, 256, 14, 14] | 0.027 | 0.032 |
| [1, 1024, 14, 14] | 0.092 | 0.116 |
| [1, 256, 14, 14] | 0.026 | 0.031 |
| [1, 512, 14, 14] | 0.052 | 0.056 |
| [1, 512, 7, 7] | 0.021 | 0.022 |
| [1, 2048, 7, 7] | 0.056 | 0.064 |
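A minimal harness along these lines can reproduce such a comparison for one of the listed sizes (this is not the exact script used for the numbers above; the group count of 32 and the iteration count are assumptions):

```python
import torch
from torch.utils import benchmark

torch.set_num_threads(1)  # the table above reports single-core numbers

# 32 groups is an assumed setting (the GN paper's default), not taken
# from this PR's benchmark harness.
gn = torch.nn.GroupNorm(num_groups=32, num_channels=64)
x = torch.randn(1, 64, 112, 112)

for fmt, label in [(torch.contiguous_format, "nchw"),
                   (torch.channels_last, "nhwc")]:
    inp = x.to(memory_format=fmt)
    timer = benchmark.Timer(stmt="gn(inp)",
                            globals={"gn": gn, "inp": inp})
    print(label, timer.timeit(100))
```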

Contributor
@xiaomengy xiaomengy left a comment


LGTM, I will try to add the CUDA impl later to see what performance we can reach on CUDA for NHWC format.

@VitalyFedyunin
Contributor

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@VitalyFedyunin
Contributor

Please rebase; we had to revert the thcc_conv2 PR, so I need to know whether we can continue this stack in parallel with fixing that one.

@mingfeima
Collaborator Author

@VitalyFedyunin rebased, please check!

@seemethere seemethere removed the ci/all label Aug 2, 2021

@mingfeima
Collaborator Author

rebased!

@VitalyFedyunin
Contributor

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 5b7cdc5.

@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/8/head branch August 27, 2021 14:17