Kubernetes and talos services crashing under memory pressure #8123

Closed
maxheyer opened this issue Jan 1, 2024 · 3 comments

Comments

maxheyer commented Jan 1, 2024

Bug Report

Description

One of our worker nodes occasionally crashes: both kubelet and apid go down. Because apid also crashes, we have not yet been able to collect any logs.
Restarting the node resolves the problem.

Logs

We have not been able to retrieve any yet, but the node comes under DiskPressure and MemoryPressure.
We are in the process of setting up log collection and will provide logs as soon as possible.

Environment

  • Talos version: v1.5.4
  • Kubernetes version: v1.28.3
  • Platform: QEMU KVM / Proxmox
Author
maxheyer commented Jan 9, 2024

Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Wed, 06 Dec 2023 02:47:10 +0100   Wed, 06 Dec 2023 02:47:10 +0100   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 09 Jan 2024 14:38:57 +0100   Tue, 09 Jan 2024 14:33:51 +0100   KubeletReady                 kubelet is posting ready status



Node conditions at the time of the crash.
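For anyone who wants to record these conditions automatically before the node becomes unreachable, here is a minimal sketch of a check over the node object. It assumes the input is the JSON produced by `kubectl get node <name> -o json`, parsed into a dict. The function name `unhealthy_conditions` and the `SAMPLE` data are hypothetical and only for illustration.

```python
# Condition types that signal a problem when their status is anything other than "False".
PROBLEM_WHEN_NOT_FALSE = {
    "NetworkUnavailable",
    "MemoryPressure",
    "DiskPressure",
    "PIDPressure",
}


def unhealthy_conditions(node: dict) -> list[str]:
    """Return the condition types on a node object that indicate a problem.

    `node` is assumed to be the parsed output of `kubectl get node <name> -o json`.
    A status of "Unknown" is treated as a problem as well.
    """
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype = cond.get("type")
        status = cond.get("status")
        if ctype in PROBLEM_WHEN_NOT_FALSE and status != "False":
            problems.append(ctype)
        elif ctype == "Ready" and status != "True":
            problems.append(ctype)
    return problems


# Hypothetical sample mirroring the conditions pasted above.
SAMPLE = {
    "status": {
        "conditions": [
            {"type": "NetworkUnavailable", "status": "False"},
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "False"},
            {"type": "PIDPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    }
}
```

Running such a check on a schedule from another machine and logging its result with a timestamp would show whether the pressure conditions flip before kubelet and apid go down.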

Member
smira commented Jan 11, 2024

Talos services have some cgroup reservation, so it would be good to see the logs from around the crash, as it might be something else.

Btw, the conditions look good.

Member
rothgar commented Oct 2, 2024

Andrey recently did additional testing and added protections for this in Talos 1.8.0 (or maybe it's coming in 1.9.0). If you are still running into this problem with newer versions of Talos, please let us know.

rothgar closed this as completed Oct 2, 2024