Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4/n] [c10d] Introduce the multi-tenancy feature in TCPStore #58331

Closed
wants to merge 9 commits into from

Conversation

cbalioglu
Copy link
Contributor
@cbalioglu cbalioglu commented May 14, 2021

Stack from ghstack:

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the TCPStore class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: D28453850

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor
facebook-github-bot commented May 14, 2021

💊 CI failures summary and remediations

As of commit 93e9882 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

2 failures not recognized by patterns:

Job Step Action
CircleCI pytorch_linux_bionic_py3_8_gcc9_coverage_test2 Run tests 🔁 rerun
CircleCI binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build Spin up environment 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label May 14, 2021
cbalioglu added a commit that referenced this pull request May 14, 2021
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

ghstack-source-id: 129044614
Pull Request resolved: #58331
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
cbalioglu added a commit that referenced this pull request May 16, 2021
Pull Request resolved: #58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 129087844

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)
Copy link
Contributor
@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully thought through the concurrency implications. After we allow multi-tenancy, do we need to worry about concurrent calls on TCPStore::* APIs?

torch/lib/c10d/test/TCPStoreTest.cpp Show resolved Hide resolved
torch/lib/c10d/TCPStore.cpp Outdated Show resolved Hide resolved
torch/lib/c10d/TCPStore.cpp Show resolved Hide resolved
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
cbalioglu added a commit that referenced this pull request May 26, 2021
Pull Request resolved: #58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 129973360

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
cbalioglu added a commit that referenced this pull request May 26, 2021
Pull Request resolved: #58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 129995626

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)
@cbalioglu cbalioglu requested a review from mrshenli May 26, 2021 23:12
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
cbalioglu added a commit that referenced this pull request Jun 3, 2021
Pull Request resolved: #58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 130522028

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)
Copy link
Member
@rohan-varma rohan-varma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! We will finally be able to initialize RPC and c10d without having to open an additional port, and make the user experience better.

# Need to store in an unused variable here to ensure the first
# object is not destroyed before the second object is created.
store1 = c10d.TCPStore(addr, port, 1, True, multi_tenant=True) # type: ignore[call-arg] # noqa: F841
store2 = c10d.TCPStore(addr, port, 1, True, multi_tenant=True) # type: ignore[call-arg] # noqa: F841
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add assertions such as a set by store 1 can be accessed via a get from store2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have an assertion for that in the C++ test, but I can add it here as well.


auto daemon = std::make_unique<TCPStoreMasterDaemon>(std::move(socket));
cachedServers_.emplace(server->port(), server);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this to be a map instead of a single cached server because we want to support TCPStore::start() with multiple ports? Curious if we use that feature anywhere or if there are any use cases for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need a map because there can be multiple TCPStore instances using different ports. For instance:

s1 = TCPStore("localhost", port=1234, multi_tenant=True)
s2 = TCPStore("localhost", port=1234, multi_tenant=True)

s3 = TCPStore("localhost", port=5678, multi_tenant=True)
s4 = TCPStore("localhost", port=5678, multi_tenant=True)

Besides being pedantically correct (since we are offering an API and should not make any assumptions about how our types are going to be used), one use case that comes to my mind where this can be useful is TorchElastic. There we can use two multi-tenant TCPStore instances, one for the rendezvous itself, and one for the worker. This way we can avoid unnecessarily allocating a second port.

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
cbalioglu added a commit that referenced this pull request Jun 4, 2021
Pull Request resolved: #58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 130607854

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
cbalioglu added a commit that referenced this pull request Jun 5, 2021
Pull Request resolved: #58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 130662972

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)
This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)

[ghstack-poisoned]
cbalioglu added a commit that referenced this pull request Jun 5, 2021
Pull Request resolved: #58331

This PR is the final part of a stack that addresses the GitHub issue #41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 130676394

Differential Revision: [D28453850](https://our.internmc.facebook.com/intern/diff/D28453850/)
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 1d9c1cc.

@facebook-github-bot facebook-github-bot deleted the gh/cbalioglu/36/head branch June 9, 2021 14:18
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
…#58331)

Summary:
Pull Request resolved: pytorch#58331

This PR is the final part of a stack that addresses the GitHub issue pytorch#41614; it introduces the multi-tenancy feature to the `TCPStore` class allowing two server stores to be instantiated with the same host:port pair.
ghstack-source-id: 130676394

Test Plan:
- Run the existing and newly-introduced tests.
- Run several smoke tests including the short code snippet referred in GitHub issue pytorch#41614.

Reviewed By: H-Huang

Differential Revision: D28453850

fbshipit-source-id: f9066b164305de0f8c257e9d5736e93fd7e21ec6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants