Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add build support for kineto + rocm #58401

Closed
wants to merge 5 commits into from

Conversation

mwootton
Copy link
Contributor

Fixes #58399

CMake changes to allow kineto to build with rocm support.

@facebook-github-bot
Copy link
Contributor
facebook-github-bot commented May 17, 2021

💊 CI failures summary and remediations

As of commit 58e3366 (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .azure_pipelines/pytorch-tests-pipeline.yml
Auto-merging .azure_pipelines/pytorch-tests-pipeline.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/wheel-wait-template.yml
Auto-merging .azure_pipelines/job_templates/wheel-wait-template.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/wheel-wait-job-template.yml
Auto-merging .azure_pipelines/job_templates/wheel-wait-job-template.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/pytorch-template-win.yml
Auto-merging .azure_pipelines/job_templates/pytorch-template-win.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/pytorch-template-unix.yml
Auto-merging .azure_pipelines/job_templates/pytorch-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/3)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .azure_pipelines/pytorch-tests-pipeline.yml
Auto-merging .azure_pipelines/pytorch-tests-pipeline.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/wheel-wait-template.yml
Auto-merging .azure_pipelines/job_templates/wheel-wait-template.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/wheel-wait-job-template.yml
Auto-merging .azure_pipelines/job_templates/wheel-wait-job-template.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/pytorch-template-win.yml
Auto-merging .azure_pipelines/job_templates/pytorch-template-win.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/pytorch-template-unix.yml
Auto-merging .azure_pipelines/job_templates/pytorch-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See GitHub Actions build Lint / clang-tidy (3/3)

Step: "Run clang-tidy" (full log | diagnosis details | 🔁 rerun)

2021-05-27T17:04:08.9212740Z /__w/pytorch/pytor...nown type name 'PyConfig' [clang-diagnostic-error]
2021-05-27T17:04:08.9203507Z /__w/pytorch/pytorch/torch/csrc/deploy/interpreter/interpreter_impl.cpp:249:20: warning: variable '_save' is not initialized [cppcoreguidelines-init-variables]
2021-05-27T17:04:08.9204804Z     PyThreadState* _save;
2021-05-27T17:04:08.9205246Z                    ^
2021-05-27T17:04:08.9205646Z                          = nullptr
2021-05-27T17:04:08.9206952Z /__w/pytorch/pytorch/torch/csrc/deploy/interpreter/interpreter_impl.cpp:273:5: error: unknown type name 'PyPreConfig' [clang-diagnostic-error]
2021-05-27T17:04:08.9208062Z     PyPreConfig preconfig;
2021-05-27T17:04:08.9208525Z     ^
2021-05-27T17:04:08.9209848Z /__w/pytorch/pytorch/torch/csrc/deploy/interpreter/interpreter_impl.cpp:275:5: error: unknown type name 'PyStatus' [clang-diagnostic-error]
2021-05-27T17:04:08.9210937Z     PyStatus status = Py_PreInitialize(&preconfig);
2021-05-27T17:04:08.9211444Z     ^
2021-05-27T17:04:08.9212740Z /__w/pytorch/pytorch/torch/csrc/deploy/interpreter/interpreter_impl.cpp:278:5: error: unknown type name 'PyConfig' [clang-diagnostic-error]
2021-05-27T17:04:08.9213679Z     PyConfig config;
2021-05-27T17:04:08.9214058Z     ^
2021-05-27T17:04:08.9215979Z /__w/pytorch/pytorch/torch/csrc/deploy/test_deploy.cpp:151:1: warning: variable 'test_info_' provides global access to a non-const object; consider making the pointed-to data 'const' [cppcoreguidelines-avoid-non-const-global-variables]
2021-05-27T17:04:08.9218123Z TEST(TorchpyTest, AcquireMultipleSessionsInTheSamePackage) {
2021-05-27T17:04:08.9219005Z ^
2021-05-27T17:04:08.9220235Z /__w/pytorch/pytorch/cmake/../third_party/googletest/googletest/include/gtest/gtest.h:2287:42: note: expanded from macro 'TEST'
2021-05-27T17:04:08.9221238Z # define TEST(test_case_name, test_name) GTEST_TEST(test_case_name, test_name)
2021-05-27T17:04:08.9221823Z                                          ^
2021-05-27T17:04:08.9222934Z /__w/pytorch/pytorch/cmake/../third_party/googletest/googletest/include/gtest/gtest.h:2281:3: note: expanded from macro 'GTEST_TEST'
2021-05-27T17:04:08.9223851Z   GTEST_TEST_(test_case_name, test_name, \

1 failure not recognized by patterns:

Job Step Action
GitHub Actions clang-format / clang-format Run clang-format 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@github-actions github-actions bot added the module: rocm AMD GPU support for Pytorch label May 17, 2021
Comment on lines 1846 to 1849
if(USE_CUDA)
set(LIBKINETO_NOCUPTI OFF CACHE STRING "")
message(STATUS "Using Kineto with CUPTI support")
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part seems wrong if MSVC is set to true and USE_CUDA is set to true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it is a bit convoluted. I tried to make minimal changes to something that clearly only considers one tracing library.
There is also a hidden relation: USE_CUDA && USE_ROCM == False. These never coexist.

So the 1846 guard can only be reached if 1829 is not entered. If MSVC is set then LIBKINETO_NOCUPTI is ON.

At that point the only way into 1838 is if we are a ROCm build. So then the 1846 is always false.

Also, there is no rocm on windows (i.e. msvc).

Happy to refactor it. Suggestions welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I guess here it is a bit confusing to me because
both LIBKINETO_NOCUPTI and LIBKINETO_NOROCTRACER are cached strings. therefore users can override it. and the set statement here are noops if user has explicitly set cache.

Copy link
Contributor
@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if else logic here seems convoluted.
Can we separate the logic setting
LIBKINETO_NOROCTRACER
and
LIBKINETO_NOROCTRACER?

Seems the only part that they need to be used together is printing the message "cpu-only".

@facebook-github-bot
Copy link
Contributor

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mwootton
Copy link
Contributor Author

Indeed, can get rid of the "else" and replace with:

if(USE_ROCM AND NOT LIBKINETO_NOROCTRACER)
endif()
if(USE_CUDA AND NOT MSVC AND NOT LIBKINETO_NOCUPTI)
endif()

Embrace the double negatives. Seems easier to follow. Agree?

@mwootton
Copy link
Contributor Author

Or this:

  if((NOT USE_CUDA) OR MSVC)
    set(LIBKINETO_NOCUPTI ON CACHE STRING "")
  elseif(USE_CUDA)  # MSVC is FALSE
    set(LIBKINETO_NOCUPTI OFF CACHE STRING "")
    message(STATUS "Using Kineto with CUPTI support")
  endif()

  if(NOT USE_ROCM)
    set(LIBKINETO_NOROCTRACER ON CACHE STRING "")
  else()
    if(NOT ENV{ROCM_SOURCE_DIR})
      set(ENV{ROCM_SOURCE_DIR} "/opt/rocm")
    endif()
    set(LIBKINETO_NOROCTRACER OFF CACHE STRING "")
    message(STATUS "Using Kineto with Roctracer support")
  endif()

  if(LIBKINETO_NOCUPTI AND LIBKINETO_NOROCTRACER)
    message(STATUS "Using CPU-only version of Kineto")
  endif()

@codecov
Copy link
codecov bot commented May 17, 2021

Codecov Report

Merging #58401 (87a7508) into master (b1b9fb0) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head 87a7508 differs from pull request most recent head 58e3366. Consider uploading reports for the commit 58e3366 to get more accurate results

@@            Coverage Diff             @@
##           master   #58401      +/-   ##
==========================================
- Coverage   76.47%   76.46%   -0.01%     
==========================================
  Files        1992     1992              
  Lines      199860   199860              
==========================================
- Hits       152840   152832       -8     
- Misses      47020    47028       +8     

@heitorschueroff heitorschueroff added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 18, 2021
cmake/Dependencies.cmake Show resolved Hide resolved
cmake/Dependencies.cmake Show resolved Hide resolved
Comment on lines 1836 to 1849
if(LIBKINETO_NOCUPTI AND LIBKINETO_NOROCTRACER)
message(STATUS "Using CPU-only version of Kineto")
else()
set(LIBKINETO_NOCUPTI OFF CACHE STRING "")
message(STATUS "Using Kineto with CUPTI support")
if(USE_ROCM)
if(NOT ENV{ROCM_SOURCE_DIR})
set(ENV{ROCM_SOURCE_DIR} "/opt/rocm")
endif()
set(LIBKINETO_NOROCTRACER OFF CACHE STRING "")
message(STATUS "Using Kineto with Roctracer support")
endif()
if(USE_CUDA)
set(LIBKINETO_NOCUPTI OFF CACHE STRING "")
message(STATUS "Using Kineto with CUPTI support")
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(LIBKINETO_NOCUPTI AND LIBKINETO_NOROCTRACER)
message(STATUS "Using CPU-only version of Kineto")
else()
set(LIBKINETO_NOCUPTI OFF CACHE STRING "")
message(STATUS "Using Kineto with CUPTI support")
if(USE_ROCM)
if(NOT ENV{ROCM_SOURCE_DIR})
set(ENV{ROCM_SOURCE_DIR} "/opt/rocm")
endif()
set(LIBKINETO_NOROCTRACER OFF CACHE STRING "")
message(STATUS "Using Kineto with Roctracer support")
endif()
if(USE_CUDA)
set(LIBKINETO_NOCUPTI OFF CACHE STRING "")
message(STATUS "Using Kineto with CUPTI support")
endif()
# Only one tracing library can be used.
if(NOT LIBKINETO_NOCUPTI)
set(LIBKINETO_NOROCTRACER ON CACHE STRING "" FORCE)
if(NOT USE_CUDA)
message(WARNING "Kineto tracing requested CUPIT but USE_CUDA not set)
set(LIBKINETO_NOCUPTI ON CACHE STRING "" FORCE)
else()
message(STATUS "Using Kineto with CUPTI support")
endif()
elif(LIBKINETO_NOROCTRACER)
set(LIBKINETO_NOCUPTI ON CACHE STRING "" FORCE)
if(NOT USE_CUDA)
message(WARNING "Kineto tracing requested ROCTRACER but USE_ROCM not set)
set(LIBKINETO_NOROCTRACER ON CACHE STRING "" FORCE)
else()
message(STATUS "Using Kineto with Roctracer support")
endif()
else()
message(STATUS "Using CPU-only version of Kineto")
endif()

@mwootton
Copy link
Contributor Author
mwootton commented May 27, 2021

I think I have this adequately cleaned up now.

  1. There is already logic that prevents USE_CUDA and USE_ROCM in the same pytorch build, no need to duplicate it
  2. Change: If USE_CUDA is off we should FORCE disable cupti.
  3. I added cloned logic to set up KINETO_NOROCTRACER
  4. I added a cloned block to set up provisions for roctracer

@mwootton mwootton requested a review from walterddr June 2, 2021 12:47
@facebook-github-bot
Copy link
Contributor

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@walterddr merged this pull request in e66015d.

@mwootton
Copy link
Contributor Author
mwootton commented Jun 3, 2021

Thank you.

deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary:
Fixes pytorch#58399

CMake changes to allow kineto to build with rocm support.

Pull Request resolved: pytorch#58401

Reviewed By: mruberry

Differential Revision: D28479807

Pulled By: walterddr

fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7
mwootton added a commit to mwootton/pytorch that referenced this pull request Jun 14, 2021
Summary:
Fixes pytorch#58399

CMake changes to allow kineto to build with rocm support.

Pull Request resolved: pytorch#58401

Reviewed By: mruberry

Differential Revision: D28479807

Pulled By: walterddr

fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Jun 14, 2021
Summary:
Fixes pytorch#58399

CMake changes to allow kineto to build with rocm support.

Pull Request resolved: pytorch#58401

Reviewed By: mruberry

Differential Revision: D28479807

Pulled By: walterddr

fbshipit-source-id: fc01f05b2a5592ee1d1dbd71d2d4f7aec1bd74f7
facebook-github-bot pushed a commit to pytorch/kineto that referenced this pull request Nov 8, 2021
Summary:
Issue #426

This adds a RoctracerActivityInterface and hooks it into ActivityProfiler (when a rocm build is detected).
Uses GenericTraceActivity records to avoid muddying up the waters.
Build support in pytorch is already upstreamed - pytorch/pytorch#58401

Pull Request resolved: #427

Reviewed By: briancoutinho, robieta

Differential Revision: D31096888

Pulled By: gdankel

fbshipit-source-id: f6295f7c152e7a972511b657107da75d30a9e108
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Merged module: rocm AMD GPU support for Pytorch open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add cmake support for ROCm Kineto
5 participants