Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve docker.NewResolver race condition #8748

Merged
merged 2 commits into from
Jul 10, 2023

Conversation

djdongjin
Copy link
Member
@djdongjin djdongjin commented Jun 27, 2023

Fix #8742

It can be reproduced with the added test:

# without the change:
$ go test -race ./remotes/docker/...
?       github.com/containerd/containerd/remotes/docker/schema1 [no test files]
==================
WARNING: DATA RACE
Read at 0x00c0003e08a0 by goroutine 189:
  runtime.mapaccess2_faststr()
      /usr/local/go/src/runtime/map_faststr.go:108 +0x0
  github.com/containerd/containerd/remotes/docker.NewResolver()
      /home/azureuser/code/containerd/remotes/docker/resolver.go:157 +0x22c
  github.com/containerd/containerd/remotes/docker.runBasicTest()
...

# with the change
$ go test -race ./remotes/docker/...
?       github.com/containerd/containerd/remotes/docker/schema1 [no test files]
ok      github.com/containerd/containerd/remotes/docker 0.677s
ok      github.com/containerd/containerd/remotes/docker/auth    (cached)
ok      github.com/containerd/containerd/remotes/docker/config  0.046s

Copy link
Contributor
@shuaichang shuaichang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending out this quick PR for fixing it and greate go test with race detection!

remotes/docker/resolver.go Show resolved Hide resolved
@djdongjin djdongjin force-pushed the cri-header-race branch 2 times, most recently from 3f46047 to f11aa82 Compare June 28, 2023 16:36
@djdongjin
Copy link
Member Author
djdongjin commented Jun 28, 2023

Added another commit to replace a similar case to using the builtin Clone(). #4855 fixed a similar issue a while ago.

@thaJeztah @dcantah wonder if you guys feel the current fix looks good or any suggestions I can improve (either in this PR or as follow-up). 😁

Very orthogonal, but I was actually looking at this part of the code recently, as it's somewhat awkward to use

I've always found this design kind of odd (NewResolver), as for CRI if you look at image_pull, we need to make a

Totally agree more improvement/refactor can be made to this part of code. I'm relatively new to this part of code so haven't figure out a good way to make larger improvement 😁

@djdongjin
Copy link
Member Author
djdongjin commented Jul 3, 2023

/cc @thaJeztah @dcantah
kindly ping, really appreciate any comments/feedback 🙏

@k8s-ci-robot k8s-ci-robot requested review from thaJeztah and dcantah July 3, 2023 21:56
@k8s-ci-robot
Copy link

@djdongjin: GitHub didn't allow me to request PR reviews from the following users: kindly, ping.

Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @thaJeztah @dcantah kindly ping
really appreciate any comments/feedback 🙏

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

remotes/docker/resolver.go Outdated Show resolved Hide resolved
Copy link
Member
@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. @thaJeztah and I were both trying to make sense of the resolver code in general as we'd never dug in, but obviously it's a rabbit hole. I think this is fine given what it fixes. As a follow up (for anyone) we should see if there's anything we can do here to avoid a bunch of copies and needing to keep remaking resolver objects, whether that be API or otherwise

@djdongjin djdongjin force-pushed the cri-header-race branch 2 times, most recently from fd98e3e to 086d968 Compare July 4, 2023 00:54
@djdongjin
Copy link
Member Author

/retest

@djdongjin djdongjin requested a review from dcantah July 5, 2023 21:08
@djdongjin
Copy link
Member Author
djdongjin commented Jul 6, 2023

Thanks @dcantah @thaJeztah for reviewing my PR! 🙏
@thaJeztah appreciate any other comment/suggestion that I can improve in the PR. 😁

@djdongjin djdongjin force-pushed the cri-header-race branch 2 times, most recently from 12bd529 to d0fb31e Compare July 8, 2023 03:12
@thaJeztah
Copy link
Member

I think the changes in this PR are ok to mitigate against situations where a shared headers map was inadvertently passed, but shouldn't we also fix this at the call site? Perhaps these need to make a copy as well;

Headers: c.config.Registry.Headers,

djdongjin added 2 commits July 8, 2023 05:25
Signed-off-by: Jin Dong <djdongjin95@gmail.com>
Signed-off-by: Jin Dong <djdongjin95@gmail.com>
@djdongjin
Copy link
Member Author

but shouldn't we also fix this at the call site? Perhaps these need to make a copy as well;

Good point, yeah I think in general we should .Clone() a header if (1) it's passed from somewhere (e.g., as an arg or from a global config) (2) we make write operations.

That being said, I think the call sites should be fine (https://github.com/search?q=repo%3Acontainerd%2Fcontainerd+%22docker.NewResolver%28%22&type=code), given now NewResolver only reads (no writes) the passed http.Header. Calling NewResolver is similar to other read ops (e.g. read a header key, or Clone the header which just reads all key-value pairs).

Also around these call sites (or any use of http.Header), we should check if it satisfies the above (1) and (2). If so, we need to .Clone() the header, like we did in NewResolver. Checking the functions containing these NewResolver call sites, they didn't make writer operations (except in some case, creating a new header and pass to NewResolver).

Some cases I found in the codebase where we did need to .Clone, similar to the fix, include:

header := r.header.Clone()
if header == nil {
header = http.Header{}
}

(I changed this copy to using the builtin .Clone)

req.Header = http.Header{} // headers need to be copied to avoid concurrent map access
for k, v := range r.header {
req.Header[k] = v
}

@thaJeztah please let me know if this makes any sense or I missed anything 😬

Copy link
Member
@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fuweid fuweid added cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Jul 10, 2023
@estesp estesp merged commit 97f2e3b into containerd:main Jul 10, 2023
@fuweid fuweid added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch and removed cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch cherry-pick/1.7.x Change to be cherry picked to release/1.7 branch labels Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch cherry-picked/1.7.x PR commits are cherry-picked into release/1.7 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Containerd panics with fatal error: concurrent map writes when pull images concurrently
7 participants