runc doesn't work with go1.22 #4233
The glibc bug triggered by Go 1.22 breaks runc 1.2.0 in some cases. This does not directly affect Kubernetes, but downstream projects may rely on both Kubernetes and runc in their build environments. Forcing users to build Kubernetes with Go 1.22 may therefore break their build toolchain, especially for container runtimes. We now relax that restriction to allow building with Go 1.21 until the runc issue has been resolved.
Refers to: opencontainers/runc#4233
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
… nsexec
As described in opencontainers#4233, there is a bug in glibc: pthread_self() returns wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter, which prevents runc from working with `go 1.22.*`. So we use fork(2) instead of clone(2) in libct/nsenter. Because nsenter does a double fork, we also need `PR_SET_CHILD_SUBREAPER` so that runc can reap the grandchild process in libct/nsenter.
Signed-off-by: lifubang <lifubang@acmcoder.com>
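The role of `PR_SET_CHILD_SUBREAPER` in a double fork can be shown with a small Linux-only sketch. This is not the runc patch (which lives in the C code of libct/nsenter); the function name `reap_orphaned_grandchild` is hypothetical, and the constant value 36 is taken from `<linux/prctl.h>`. The intermediate child exits right away, so the grandchild is orphaned and reparented to the nearest subreaper, which lets the top-level process wait for it.

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def reap_orphaned_grandchild():
    """Double-fork and reap the grandchild from the top-level process.

    Returns (grandchild_pid, reaped_pid). The two match when the subreaper
    flag caused the orphaned grandchild to be reparented to this process.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Intermediate child: fork the grandchild, report its PID, then exit.
        try:
            os.close(read_fd)
            grandchild = os.fork()
            if grandchild == 0:
                os._exit(0)  # grandchild has nothing to do in this sketch
            os.write(write_fd, str(grandchild).encode())
        finally:
            os._exit(0)

    # Top-level process: learn the grandchild PID, then reap both processes.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        grandchild_pid = int(pipe.read())
    os.waitpid(child, 0)
    reaped_pid, _status = os.waitpid(grandchild_pid, 0)
    return grandchild_pid, reaped_pid


if __name__ == "__main__":
    print(reap_orphaned_grandchild())
```

Without the `prctl` call, the final `os.waitpid` would be expected to fail with `ChildProcessError`, because the orphan would be adopted by PID 1 (or by another subreaper further up the process tree) rather than by the caller. Note that the subreaper flag stays set on the calling process after the function returns.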
Copying my comment from #4234 (comment), which explains my current thinking on the issue.
Downgrade to avoid a known incompatibility with runc: opencontainers/runc#4233 Signed-off-by: Ben Cressey <bcressey@amazon.com>
Adapted from: opencontainers/runc#4247

Execution of a container using a PID namespace can fail on certain versions of glibc when Singularity is built with Go 1.22. This is due to Go 1.22 performing calls using pthread_self which, from glibc 2.25, is not updated for the current TID on clone.

Fixes sylabs#2677

-----

Original runc explanation:

Since glibc 2.25, the thread-local cache of the current TID is no longer updated in the child when calling clone(2). This results in very unfortunate behaviour when Go does pthread calls using pthread_self(), which has the wrong TID stored.

The "simple" solution is to forcefully overwrite this cached value. Unfortunately (and unsurprisingly), the layout of "struct pthread" is strictly private and can change without warning.

Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks (with the child_tid set to the cached &PTHREAD_SELF->tid), meaning that as long as runc is using glibc, when "runc init" is spawned the child process will have a pointer directly to the cached value we want to change. With CONFIG_CHECKPOINT_RESTORE=y kernels on Linux 3.5 and later, we can simply use prctl(PR_GET_TID_ADDRESS). For older kernels we need to memory scan the TLS structure (pthread_self() returns a pointer to the start of the structure so we can "just" scan it for a field containing the current TID and assume that it is the correct field).

Obviously this is all very horrific, and if you are reading this in the future, it almost certainly has caused some horrific bug that I did not foresee. Sorry about that.

As far as I can tell, there is no other workable solution that doesn't also depend on the CLONE_CHILD_CLEARTID behaviour of glibc in some way. We cannot "just" do a re-exec after clone(2) for security reasons.

Fixes opencontainers/runc#4233
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
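The memory-scan fallback described in this commit message can be pictured with a toy model. The sketch below works on a synthetic byte buffer, not on real glibc memory: the field layout of `fake_tls`, the TID values, and the helper names `find_tid_offsets` and `locate_cached_tid` are all invented for illustration. It also assumes the scan runs while the cached value still equals the calling thread's TID, so that the located slot can be overwritten later.

```python
import struct


def find_tid_offsets(tls_block, tid):
    """Return every 4-byte-aligned offset whose native-endian int32 equals tid."""
    offsets = []
    for offset in range(0, len(tls_block) - 3, 4):
        (value,) = struct.unpack_from("=i", tls_block, offset)
        if value == tid:
            offsets.append(offset)
    return offsets


def locate_cached_tid(tls_block, tid):
    """Return the single offset holding tid, refusing to guess when ambiguous."""
    offsets = find_tid_offsets(tls_block, tid)
    if len(offsets) != 1:
        raise LookupError(f"expected exactly one match, found {len(offsets)}")
    return offsets[0]


# Synthetic stand-in for a thread descriptor; this layout is made up.
fake_tls = bytearray(struct.pack("=QQiiQ", 0xDEADBEEF, 0, 4242, 7, 0))

# Scan while 4242 is still the current TID, then store the post-clone TID.
offset = locate_cached_tid(fake_tls, 4242)
struct.pack_into("=i", fake_tls, offset, 5151)
```

The `LookupError` branch reflects why the commit message calls the approach fragile: any other field that happens to hold the same integer is indistinguishable from the real cache slot, so a scan can only "assume that it is the correct field".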
Can't update to 1.22 because of opencontainers/runc#4233 Signed-off-by: Peter Morjan <pmorjan@gmail.com>
Made a proper cherry-pick to Go 1.22 (golang/go#67650). If approved, this will be included in Go 1.22.4.
Great news, so runc 1.2.0 is ready for prime time once Go 1.22.4 is released?
Go 1.23 includes a fix (https://go.dev/cl/587919) so it can be used. This …
v1.1.13 -- "There is no certainty in the world. This is the only certainty I have."

This is the thirteenth patch release in the 1.1.z release branch of runc. It brings in Go 1.22.x compatibility and fixes a few issues, including an occasional wrong nofile rlimit in runc exec, and a race between runc list and runc delete.

NOTE that if using Go 1.22.x to build runc, make sure to use 1.22.4 or a later version. For more details, see issue opencontainers#4233.

* Support go 1.22.4+. (opencontainers#4313)
* runc list: fix race with runc delete. (opencontainers#4231)
* Fix set nofile rlimit error. (opencontainers#4277, opencontainers#4299)
* libct/cg/fs: fix setting rt_period vs rt_runtime. (opencontainers#4284)
* Fix a debug msg for user ns in nsexec. (opencontainers#4315)
* script/*: fix gpg usage wrt keyboxd. (opencontainers#4316)
* CI fixes and misc backports. (opencontainers#4241)
* Fix codespell warnings. (opencontainers#4300)
* Silence security false positives from golang/net. (opencontainers#4244)
* libcontainer: allow containers to make apps think fips is enabled/disabled for testing. (opencontainers#4257)
* allow overriding VERSION value in Makefile. (opencontainers#4270)
* Vagrantfile.fedora: bump Fedora to 39. (opencontainers#4261)
* ci/cirrus: rm centos stream 8. (opencontainers#4305, opencontainers#4308)

Thanks to all of the contributors who made this release possible:

* Akhil Mohan <akhilerm@gmail.com>
* Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
* Aleksa Sarai <cyphar@cyphar.com>
* Kir Kolyshkin <kolyshkin@gmail.com>
* Sohan Kunkerkar <sohank2602@gmail.com>
* TTFISH <jiongchiyu@gmail.com>
* kychen <kychen@alauda.io>
* lifubang <lifubang@acmcoder.com>
* ls-ggg <335814617@qq.com>

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

# gpg: Signature made Thu Jun 13 08:46:46 2024 PDT
# gpg: using RSA key C2428CD75720FACDCF76B6EA17DE5ECB75A1100E
# gpg: Can't check signature: No public key
# Conflicts:
# CHANGELOG.md
# VERSION
# go.mod
# go.sum
# vendor/golang.org/x/sys/unix/mmap_nomremap.go
# vendor/golang.org/x/sys/windows/syscall_windows.go
# vendor/modules.txt
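The build requirement in the NOTE above can be written as a small version check. This is only a sketch of the rule as stated in this thread (Go 1.22.0 through 1.22.3 are affected, while 1.22.4 and later, Go 1.23, and Go 1.21 are not); the function names are hypothetical, and pre-release strings such as release candidates are deliberately rejected rather than guessed at.

```python
import re


def parse_go_version(version):
    """Parse strings like 'go1.22.3' or '1.23' into (major, minor, patch)."""
    match = re.fullmatch(r"(?:go)?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unsupported Go version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected_go_version(version):
    """Return True for Go 1.22.x releases older than 1.22.4."""
    major, minor, patch = parse_go_version(version)
    return (major, minor) == (1, 22) and patch < 4


if __name__ == "__main__":
    for candidate in ("go1.21.9", "go1.22.3", "go1.22.4", "go1.23"):
        print(candidate, is_affected_go_version(candidate))
```

A check like this only covers the toolchain side. Per the issue description, whether a build made with an affected Go version actually fails also depends on the glibc in use.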
I'm opening this to act as a tracking issue for the go1.22 issue.

I suspect the only bullet-proof solution is going to be adding another re-exec after the C code finishes to re-exec the Go side of `runc init` (which will make runc even slower...). It really is quite unfortunate that Go's stdlib doesn't provide a lot of knobs we need to remove `nsenter`, and I suspect we will never be able to fully switch away from `nsenter`...

So, to summarize the investigation done there -- it's a glibc bug, in fact, two bugs:

1. `pthread_self()` returns wrong info after we do what we do in libct/nsenter.
2. `pthread_getattr_np(pthread_self(), &attr)` (which Go 1.22 calls internally) does a NULL pointer dereference, so the app gets SIGABRT.

These two bugs are apparently specific to glibc used by Ubuntu 20.04 (libc6 2.31-0ubuntu9.14) and maybe also Debian 10 (libc6 2.28-10+deb10u2), as I was able to reproduce on both. With Debian 10, it even prints an error from free: `free(): invalid pointer`, maybe due to some extra Debian-specific patches, but still gets SIGABRT.

For some reason I was unable to repro on older Fedora (F32, glibc-2.31-2.fc32; F33, glibc-2.32-10.fc33) and Debian 11 (libc6 2.31-7).

The bad news is, every version of glibc has bug 1 above, and https://go-review.googlesource.com/c/go/+/563379 may make it so go 1.22.x will fail runc init on every version of glibc.

Meaning, we need a workaround for that. Perhaps changing runc libct/nsenter logic in some radical way, so that `pthread_self` works.

Originally posted by @kolyshkin in #4193 (comment)