Add PocketFFT support #60976

Closed

malfet (Contributor) commented Jun 29, 2021

Needed on platforms that do not have MKL, such as aarch64 and M1

  • Add AT_POCKETFFT_ENABLED() to Config.h.in
  • Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT
  • Modify spectral test to use @skipCPUIfNoFFT instead of @skipCPUIfNoMKL

Share implementation of _out functions as well as fft_fill_with_conjugate_symmetry_stub between MKL and PocketFFT implementations

Fixes #41592

facebook-github-bot (Contributor) commented Jun 29, 2021

💊 CI failures summary and remediations

As of commit 94b89e3 (more details on the Dr. CI page and at hud.pytorch.org/pr/60976):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build Lint / clang-tidy (1/1)

Step: "Run clang-tidy"

2021-06-30T14:53:19.4223992Z + python3 tools/linter/clang_tidy.py --parallel --verbose --paths torch/csrc/ --diff-file pr.diff --include-dir /usr/lib/llvm-11/include/openmp -g-torch/csrc/jit/passes/onnx/helper.cpp -g-torch/csrc/jit/passes/onnx/shape_type_inference.cpp -g-torch/csrc/jit/serialization/onnx.cpp -g-torch/csrc/jit/serialization/export.cpp -g-torch/csrc/jit/serialization/import.cpp -g-torch/csrc/jit/serialization/import_legacy.cpp -g-torch/csrc/onnx/init.cpp -g-torch/csrc/cuda/nccl.* -g-torch/csrc/cuda/python_nccl.cpp -g-torch/csrc/autograd/FunctionsManual.cpp -g-torch/csrc/generic/*.cpp -g-torch/csrc/jit/codegen/cuda/runtime/* -g-torch/csrc/deploy/interpreter/interpreter.cpp -g-torch/csrc/deploy/interpreter/interpreter.h -g-torch/csrc/deploy/interpreter/interpreter_impl.h -g-torch/csrc/deploy/interpreter/test_main.cpp
2021-06-30T14:53:33.2432151Z Traceback (most recent call last):
2021-06-30T14:53:33.2433066Z   File "tools/linter/clang_tidy.py", line 405, in <module>
2021-06-30T14:53:33.2433562Z     main()
2021-06-30T14:53:33.2433928Z   File "tools/linter/clang_tidy.py", line 387, in main
2021-06-30T14:53:33.2434460Z     clang_tidy_output = run_clang_tidy(options, line_filters, files)
2021-06-30T14:53:33.2435275Z   File "tools/linter/clang_tidy.py", line 229, in run_clang_tidy
2021-06-30T14:53:33.2436134Z     raise RuntimeError(message.format(output))
2021-06-30T14:53:33.2437450Z RuntimeError: Found clang-diagnostic-errors in clang-tidy output: >>>
2021-06-30T14:53:33.2437968Z stdout:
2021-06-30T14:53:33.2438724Z ../aten/src/ATen/ParallelOpenMP.h:11:10: error: 'omp.h' file not found [clang-diagnostic-error]
2021-06-30T14:53:33.2439314Z #include <omp.h>
2021-06-30T14:53:33.2439597Z          ^
2021-06-30T14:53:33.2439769Z 
2021-06-30T14:53:33.2440012Z stderr:
2021-06-30T14:53:33.2440365Z 6963 warnings and 1 error generated.
2021-06-30T14:53:33.2440894Z Error while processing /__w/pytorch/pytorch/torch/csrc/Module.cpp.
2021-06-30T14:53:33.2441425Z 13926 warnings and 2 errors generated.
2021-06-30T14:53:33.2441953Z Error while processing /__w/pytorch/pytorch/torch/csrc/Module.cpp.
2021-06-30T14:53:33.2443016Z Suppressed 14268 warnings (13784 in non-user code, 142 due to line filter, 342 NOLINT).
2021-06-30T14:53:33.2444026Z Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_ios_12_0_0_x86_64_full_jit_build (1/1)

Step: "Build" ❄️

Failed to recurse into submodule path 'third_party/kineto'
From ssh://github.com/facebook/zstd
 * branch            aec56a52fbab207fc639a1937d1e708a282edca8 -> FETCH_HEAD
Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8'
Failed to recurse into submodule path 'third_party/kineto'


Exited with code exit status 1



@facebook-github-bot
Contributor

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry mruberry requested a review from peterbell10 June 29, 2021 19:23
@malfet malfet force-pushed the malfet/add-pocketfft-support branch 2 times, most recently from e2efbc5 to 59ed84f Compare June 29, 2021 21:54
codecov bot commented Jun 30, 2021

Codecov Report

Merging #60976 (8a86f2d) into master (001ff3a) will decrease coverage by 4.87%.
The diff coverage is 89.25%.

❗ Current head 8a86f2d differs from pull request most recent head 94b89e3. Consider uploading reports for the commit 94b89e3 to get more accurate results

@@            Coverage Diff             @@
##           master   #60976      +/-   ##
==========================================
- Coverage   80.60%   75.72%   -4.88%     
==========================================
  Files        1879     2062     +183     
  Lines      202892   209076    +6184     
==========================================
- Hits       163543   158333    -5210     
- Misses      39349    50743   +11394     

malfet added 5 commits June 30, 2021 07:48
Needed on platforms that do not have MKL, such as aarch64 and M1
- Add `AT_POCKETFFT_ENABLED()` to Config.h.in
- Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT
- Modify spectral test to use @skipCPUIfNoFFT instead of @skipCPUIfNoMKL
@malfet malfet requested a review from peterbell10 June 30, 2021 14:50
@malfet malfet force-pushed the malfet/add-pocketfft-support branch from 72780df to 94b89e3 Compare June 30, 2021 14:50
malfet (Contributor, Author) commented Jun 30, 2021

Rebase PR on top of #60313

@facebook-github-bot
Contributor

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment on lines +222 to +225
find_path(POCKETFFT_INCLUDE_DIR NAMES pocketfft_hdronly.h
          PATHS /usr/local/include
          PATHS $ENV{POCKETFFT_HOME}
          )
Collaborator

Curious, why not just include in third_party?

Contributor Author

Because it is not hosted on GitHub, which might make it hard to keep in sync if it is used frequently.
But I might create a mirror on GitHub and add it to third_party later on.


} // anonymous namespace

Tensor _fft_c2r_mkl(const Tensor& self, IntArrayRef dim, int64_t normalization, int64_t last_dim_size) {
Collaborator

Is it worth changing these to _cpu instead of _mkl?

Contributor Author

I plan to do it in a follow-up PR (and actually move parts of the code out of mkl/SpectralOps.cpp).

REGISTER_NO_CPU_DISPATCH(fft_fill_with_conjugate_symmetry_stub, fft_fill_with_conjugate_symmetry_fn);

Tensor _fft_c2r_mkl(const Tensor& self, IntArrayRef dim, int64_t normalization, int64_t last_dim_size) {
  AT_ERROR("fft: ATen not compiled with FFT support");
Collaborator

Note for a follow-up: AT_ERROR is deprecated; use TORCH_CHECK(false, ...) instead.

@@ -1094,6 +1094,11 @@ def skipCPUIfNoLapack(fn):
    return skipCPUIf(not torch._C.has_lapack, "PyTorch compiled without Lapack")(fn)


# Skips a test on CPU if FFT is not available.
def skipCPUIfNoFFT(fn):
Collaborator

This name is a little odd because "NoFFT" suggests there's no fft available on any device type. It would more accurately be "skipCPUIfNoCPUFFT" or "skipCPUIfNoMKLorPocket" or something
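
To make the naming point concrete, here is a minimal sketch of a decorator that skips only on CPU and only when no CPU FFT backend is reported. It is not the implementation in this PR: `skip_cpu_if`, `has_cpu_fft` and the name `skipCPUIfNoCPUFFT` are hypothetical, and the sketch assumes the test receives the device as its first argument after `self`. Only `torch._C.has_spectral` comes from the PR description.

```python
import functools
import unittest


def has_cpu_fft() -> bool:
    """Return True if the installed PyTorch reports a CPU FFT backend."""
    try:
        import torch
    except ImportError:
        return False
    # has_spectral is the flag this PR introduces. Builds that predate it
    # lack the attribute; this sketch simply treats that case as "no FFT".
    return bool(getattr(torch._C, "has_spectral", False))


def skip_cpu_if(condition, reason):
    """Hypothetical helper: skip a device-parametrized test on CPU only."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, device, *args, **kwargs):
            # Accept either a plain bool or a zero-argument callable.
            skip = condition() if callable(condition) else condition
            if str(device).startswith("cpu") and skip:
                raise unittest.SkipTest(reason)
            return fn(self, device, *args, **kwargs)
        return wrapper
    return decorator


def skipCPUIfNoCPUFFT(fn):
    """Skip on CPU when neither MKL nor PocketFFT is compiled in."""
    return skip_cpu_if(lambda: not has_cpu_fft(),
                       "PyTorch compiled without MKL or PocketFFT")(fn)
```

With this shape, a test running on a CUDA device is never skipped by the decorator, which is the behaviour the longer name is meant to signal.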

mruberry (Collaborator) left a comment

Cool!

I made a few minor inline comments but nothing that should stop this PR

@facebook-github-bot
Contributor

@malfet merged this pull request in 4036820.

facebook-github-bot pushed a commit that referenced this pull request Jul 1, 2021
Summary:
The new tag should fix the "Missing <omp.h>" error message on clang-tidy runs.

Pull Request resolved: #61115

Test Plan:
Ran the clang-tidy job using the diff from #60976.

Expected Output:
There should be no clang diagnostic errors.

Reviewed By: walterddr

Differential Revision: D29516845

Pulled By: 1ntEgr8

fbshipit-source-id: 554229904db67eb7a7b93b3def434b30de6a43b0
erksch commented Jul 21, 2021

Hey there! I just tried to use this functionality by running a PyTorch lite model that uses torch.fft.fft on Android with the org.pytorch:pytorch_android:1.10.0-SNAPSHOT gradle dependency which I assumed contains this change.

However running the model gives:

    com.facebook.jni.CppException: fft: ATen not compiled with FFT support
      
      Debug info for handle, -1, not found.
      
    Exception raised from _fft_r2c_mkl at /var/lib/jenkins/workspace/aten/src/ATen/native/mkl/SpectralOps.cpp:570 (most recent call first):
    (no backtrace available)
        at org.pytorch.LiteNativePeer.forward(Native Method)
        at org.pytorch.Module.forward(Module.java:52)

Is PocketFFT available in the nightly builds? Or is some further configuration needed?
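
For anyone who wants to check a desktop build before packaging for mobile, a rough probe from Python might look like the sketch below. It assumes the same error surfaces in Python as a `RuntimeError`; `probe_cpu_fft` is a hypothetical helper name, not part of PyTorch.

```python
def probe_cpu_fft():
    """Return (ok, detail) describing whether torch.fft.fft runs on CPU."""
    try:
        import torch
        import torch.fft  # explicit import; harmless where it is automatic
    except ImportError as exc:
        return False, f"torch.fft unavailable: {exc}"
    try:
        # A tiny real-valued input is enough to reach the CPU FFT kernel.
        torch.fft.fft(torch.ones(4))
    except RuntimeError as exc:
        # Keep only the first line, e.g. the "not compiled with FFT support" text.
        lines = str(exc).splitlines() or [""]
        return False, lines[0]
    return True, "CPU FFT backend is available"
```

Calling `probe_cpu_fft()` on the build machine gives a quick yes/no before spending time on an Android build.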

erksch commented Jul 21, 2021

So it looks to me that I need to build from source with @AT_POCKETFFT_ENABLED@ set to 1 in BUILD.bazel and also need to get my hands on PocketFFT locally. I guess the pocketfft_hdronly.h can be taken from https://github.com/hayguen/pocketfft? Is there anything else that needs to be configured? Like disabling MKL?

erksch commented Jul 21, 2021

Update: got it working! A gamechanger for us, many thanks! Getting pocketfft_hdronly.h from https://github.com/hayguen/pocketfft worked fine.

@0xGuybrush

Hi @erksch, do you have any pointers on getting this working? I'm facing the exact same error at the moment. I have tried:

  1. setting @AT_POCKETFFT_ENABLED@ to 1 as you described
  2. copying pocketfft_hdronly.h headers into /usr/local/include/pocketfft.
  3. installing OpenBLAS

To generate the AAR after that I'm running:

USE_MLK=0 NO_MLK=1 BLAS=OpenBLAS python3 setup.py install
USE_MLK=0 NO_MLK=1 BLAS=OpenBLAS bash scripts/build_mobile.sh 
USE_MLK=0 NO_MLK=1 BLAS=OpenBLAS bash scripts/build_android.sh 

I'm still crashing in Android on:

Exception raised from _fft_r2c_mkl at ../aten/src/ATen/native/mkl/SpectralOps.cpp:569

So obviously I'm not disabling MKL correctly, but I'm not sure where I'm going wrong.

erksch commented Aug 5, 2021

@another-dave

What helped me was to put some print statements in cmake/Dependencies.cmake where it decides whether to use PocketFFT.

# --- [ PocketFFT
set(AT_POCKETFFT_ENABLED 0)
message(STATUS "Checking POCKETFFT")
if(NOT MKL_FOUND)
  message(STATUS "POCKETFFT No MKL")
  find_path(POCKETFFT_INCLUDE_DIR NAMES pocketfft_hdronly.h
            PATHS /usr/local/include
            PATHS $ENV{POCKETFFT_HOME}
            )
  set(POCKETFFT_INCLUDE_DIR /usr/local/include)
  message(STATUS "POCKETFFT (${POCKETFFT_INCLUDE_DIR})")
  if(POCKETFFT_INCLUDE_DIR)
    message(STATUS "Enabling POCKETFFT")
    set(AT_POCKETFFT_ENABLED 1)
  endif()
endif()

Then, when running CMake you should see

Checking POCKETFFT
POCKETFFT No MKL
POCKETFFT (/usr/local/include)
Enabling POCKETFFT

If not, something is still wrong.

Also, as you can see, I added an explicit

set(POCKETFFT_INCLUDE_DIR /usr/local/include)

pointing to the location where I had the header file. The find_path call did not seem to work.
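
To check where the header actually lives before touching the CMake file, a small script can mirror the two explicit search locations from the snippet above. This is only a sketch: `find_pocketfft_include_dir` is a hypothetical helper, and real `find_path` also consults CMake's default and cached paths, which this does not reproduce.

```python
import os
from pathlib import Path

HEADER = "pocketfft_hdronly.h"


def find_pocketfft_include_dir(search_dirs=("/usr/local/include",), env=None):
    """Return the first directory that directly contains the header, or None."""
    env = os.environ if env is None else env
    candidates = [Path(d) for d in search_dirs]
    # Mirror the second PATHS entry: $ENV{POCKETFFT_HOME}, if it is set.
    home = env.get("POCKETFFT_HOME")
    if home:
        candidates.append(Path(home))
    for directory in candidates:
        # Like find_path without PATH_SUFFIXES, subdirectories are not searched.
        if (directory / HEADER).is_file():
            return directory
    return None
```

If this returns `None` while the header sits in a subdirectory such as `/usr/local/include/pocketfft`, that would explain a failed lookup; pointing `POCKETFFT_HOME` at the directory that directly contains the header is one way to test that theory.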

@0xGuybrush

Hi @erksch, thanks very much for this!! Got it sorted now 🎉

Following what you said, I realised that it was still detecting MKL for some reason. I'm sure there's a more elegant way to flag that off, but if it helps anyone else, I just commented out lines 381-394 of ./bazel-pytorch/cmake/Modules/FindMKL.cmake.

And I needed to adjust my POCKETFFT_INCLUDE_DIR directly to /usr/local/include/pocketfft rather than just /usr/local/include.

After that I ran:

USE_BLAS=OpenBLAS USE_MKL=0 NO_MKL=1 python3 setup.py install
bash scripts/build_mobile.sh 
bash scripts/build_android.sh
bash scripts/build_pytorch_android.sh

And got the AAR built correctly. (As per this comment, I had to downgrade Android NDK to 21.3.6528147 when running build_android.sh to get it working.)

Thanks for the pointers & for the speedy reply!

best
Dave

erksch commented Aug 5, 2021

Awesome! @another-dave

erksch commented Aug 5, 2021

@malfet

Do you think the PocketFFT configuration should be standard for mobile? In that case, it would be really helpful if the nightly mobile builds already included this (as you already did for macOS M1).

Successfully merging this pull request may close these issues.

Add SpectralOps CPU implementation for ARM/PowerPC processors (where MKL is not available)
6 participants