
Error unsealing vault due to key index out of range #1085

Open
gitdr opened this issue Aug 26, 2020 · 22 comments

@gitdr
gitdr commented Aug 26, 2020

Describe the bug:
bank-vaults tries to retrieve an unseal key with an index that is out of range.

Expected behaviour:
Vault gets unsealed using the keys in the k8s secret.

Steps to reproduce the bug:
helm upgrade --install vault-operator banzaicloud-stable/vault-operator
kubectl apply -f https://github.com/banzaicloud/bank-vaults/blob/master/operator/deploy/cr.yaml

Additional context:
none

Environment details:

  • Kubernetes version: v1.18.6
  • Cloud-provider/provisioner: minikube --driver=none on Centos7
  • bank-vaults version: 1.4.1
  • Install method: helm
  • Logs from the misbehaving component (and any other relevant logs):

$ kubectl logs vault-0 bank-vaults
time="2020-08-26T14:28:24Z" level=error msg="error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-unseal-keys"
time="2020-08-26T14:28:29Z" level=info msg="vault is sealed, unsealing"
time="2020-08-26T14:28:33Z" level=error msg="error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-unseal-keys"
time="2020-08-26T14:28:38Z" level=info msg="vault is sealed, unsealing"
time="2020-08-26T14:28:40Z" level=error msg="error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-unseal-keys"
time="2020-08-26T14:28:45Z" level=info msg="vault is sealed, unsealing"

  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data:

Secret created by vault operator
$ kubectl get secret vault-unseal-keys -o yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: vault
    vault_cr: vault
  name: vault-unseal-keys
  namespace: default
type: Opaque
data:
  vault-root: cy5TWXRybHUzZDE5VGM0UktrdmRNWEVHNDU=
  vault-test: dmF1bHQtdGVzdA==
  vault-unseal-0: MGEzNWQxMjVmYzc1YTA0MGIxMmI3YmY5ZDdmMDY4Mzk5MDMzY2NlMjhjMjFlMzJkMTUzODc2NzUwMGZjNDc1MjZl
  vault-unseal-1: ZmNmZTg3NjNhYjMzYTgxMTdkMzA0ZjhlNmIzOGZmOGNmNTUyM2YzZjY4MjAzYjMxZjk5ZDI2MzY3YTliZDllYjFk
  vault-unseal-2: YWYwMzJiYmJlNmYxNjlkODNlOGFhN2Q0NGUwNzc5ODc2YmM2MzAzOWRhMTI5NGVlMjRhOGQzMDkxZTFkMjc0YTY1
  vault-unseal-3: NDRmNWQxYjViMTYyNGZiY2EwNDA2NWYyNmZmYjZmM2IwMGNlN2Y3YWIyZWUzOWJhNDg2NzVkYjA1YjVjMDdlYzJk
  vault-unseal-4: YzY5ZGJlZmZiZjZmMjg1NmM3M2EzNThiNDIzMmJiYWI2Mzg2ZmY1NzY3OTQ1NTQ1ZTA0ODc2ZTRmMzMwOTk1Yzlj

/kind bug

@bonifaido bonifaido self-assigned this Aug 27, 2020
@bonifaido
Member
bonifaido commented Aug 27, 2020

Hi @gitdr, are there any leftover PVCs with previous instance data? That is usually what causes this issue.

@GlorifiedTypist

I am getting the exact same issue, but with the Azure Key Vault backend. I deleted the PVCs but never scaled out to more than 1 vault instance. All the other unseal keys are there.

bank-vaults version: 1.4.2

time="2020-09-10T10:05:30Z" level=error msg="error unsealing vault: unable to get key 'vault-unseal-5': error getting secret for key 'vault-unseal-5': keyvault.BaseClient#GetSecret: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="SecretNotFound" Message="A secret with (name/id) vault-unseal-5 was not found in this key vault. If you recently deleted this secret you may be able to recover it using the correct recovery command. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125182\""

@bonifaido
Member

Always make sure that if you delete the Vault storage backend (PVC, Object Storage, DB, etc.), you also delete the unseal keys from the unseal-key storage (in this case Azure Key Vault). The Vault data and the unseal keys share the same lifecycle and live together.
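That pairing can be scripted; a minimal sketch, assuming the resource names seen in this thread (the vault-file PVC and the vault-unseal-keys secret), with the Azure Key Vault variant shown as a comment:

```shell
# Helper mirroring the bank-vaults key naming (vault-unseal-<index>).
unseal_key_name() {
  echo "vault-unseal-$1"
}

# Delete the Vault storage backend and its unseal keys together.
# Review before running; names are taken from this thread.
wipe_vault_state() {
  # 1. The storage backend (here: the file-backend PVC)
  kubectl delete pvc vault-file
  # 2. The unseal keys that share its lifecycle
  kubectl delete secret vault-unseal-keys
  # Azure Key Vault backend equivalent (delete, and purge if
  # soft-delete is enabled), per key share:
  #   az keyvault secret delete --vault-name <kv> --name "$(unseal_key_name 0)"
}

# wipe_vault_state   # uncomment to actually run
```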

@GlorifiedTypist

I am still getting this issue after performing the below:

  1. Deleted all PVCs, RBAC and secrets in "vault" namespace
  2. Deleted "vault" namespace for good measure
  3. Created a new "vault-infra" namespace
  4. Created a new (empty) azure vault
  5. Deployed operator into new "vault-infra" namespace.

Getting the same result.

$ kubectl logs -f vault-1 bank-vaults
time="2020-09-10T10:45:04Z" level=info msg="joining leader vault..."
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

time="2020-09-10T10:45:04Z" level=info msg="vault metrics exporter enabled: :9091/metrics"
[GIN-debug] GET /metrics --> github.com/gin-gonic/gin.WrapH.func1 (3 handlers)
[GIN-debug] Listening and serving HTTP on :9091
time="2020-09-10T10:45:05Z" level=info msg="joining raft cluster..."
time="2020-09-10T10:45:05Z" level=info msg="vault joined raft cluster"
time="2020-09-10T10:45:05Z" level=info msg="vault is sealed, unsealing"
time="2020-09-10T10:45:08Z" level=error msg="error unsealing vault: unable to get key 'vault-unseal-5': error getting secret for key 'vault-unseal-5': keyvault.BaseClient#GetSecret: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="SecretNotFound" Message="A secret with (name/id) vault-unseal-5 was not found in this key vault. If you recently deleted this secret you may be able to recover it using the correct recovery command. For help resolving this issue, please see https://go.microsoft.com/fwlink/?linkid=2125182\""
time="2020-09-10T10:45:13Z" level=info msg="vault is sealed, unsealing"

Anything else I need to consider?

@bonifaido
Member

Took this to Slack, will post the outcome here.

@bonifaido
Member

@gitdr could you please post your CR? It turned out that with the Raft backend, if the serviceType is LoadBalancer, the HA setup doesn't work in a single-cluster setup (at least on Azure for sure).

@gitdr
Author
gitdr commented Sep 11, 2020
apiVersion: "vault.banzaicloud.com/v1alpha1"
kind: "Vault"
metadata:
  name: "vault"
spec:
  size: 1
  image: vault:1.5.0
  bankVaultsImage: banzaicloud/bank-vaults:latest

  # Common annotations for all created resources
  annotations:
    common/annotation: "true"

  # Vault Pods , Services and TLS Secret annotations
  vaultAnnotations:
    type/instance: "vault"

  # Vault Configurer Pods and Services annotations
  vaultConfigurerAnnotations:
    type/instance: "vaultconfigurer"

  # Vault Pods , Services and TLS Secret labels
  vaultLabels:
    example.com/log-format: "json"

  # Vault Configurer Pods and Services labels
  vaultConfigurerLabels:
    example.com/log-format: "string"

  # Support for nodeAffinity Rules
  # nodeAffinity:
  #   requiredDuringSchedulingIgnoredDuringExecution:
  #     nodeSelectorTerms:
  #     - matchExpressions:
  #       - key : "node-role.kubernetes.io/your_role"
  #         operator: In
  #         values: ["true"]

  # Support for pod nodeSelector rules to control which nodes can be chosen to run
  # the given pods
  # nodeSelector:
  #   "node-role.kubernetes.io/your_role": "true"

  # Support for node tolerations that work together with node taints to control
  # the pods that can live on a node
  # tolerations:
  # - effect: NoSchedule
  #   key: node-role.kubernetes.io/your_role
  #   operator: Equal
  #   value: "true"

  # Specify the ServiceAccount where the Vault Pod and the Bank-Vaults configurer/unsealer is running
  serviceAccount: vault

  # Specify the Service's type where the Vault Service is exposed
  # Please note that some Ingress controllers like https://github.com/kubernetes/ingress-gce
  # forces you to expose your Service on a NodePort
  serviceType: ClusterIP

  # Specify existing secret contains TLS certificate (accepted secret type: kubernetes.io/tls)
  # If it is set, generating certificate will be disabled
  existingTlsSecretName: vault-tls

  # Specify threshold for renewing certificates. Valid time units are "ns", "us", "ms", "s", "m", "h".
  # tlsExpiryThreshold: 168h

  # Request an Ingress controller with the default configuration
  ingress:
    # Specify Ingress object annotations here, if TLS is enabled (which is by default)
    # the operator will add NGINX, Traefik and HAProxy Ingress compatible annotations
    # to support TLS backends
    annotations:
    # Override the default Ingress specification here
    # This follows the same format as the standard Kubernetes Ingress
    # See: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.13/#ingressspec-v1beta1-extensions
    spec: {}

  # Use local disk to store Vault file data, see config section.
  volumes:
    - name: vault-file
      persistentVolumeClaim:
        claimName: vault-file

  volumeMounts:
    - name: vault-file
      mountPath: /vault/file

  # Support for distributing the generated CA certificate Secret to other namespaces.
  # Define a list of namespaces or use ["*"] for all namespaces.
  caNamespaces:
    - "vswh"

  # Describe where you would like to store the Vault unseal keys and root token.
  unsealConfig:
    options:
      # The preFlightChecks flag enables unseal and root token storage tests
      # This is true by default
      preFlightChecks: true
    kubernetes:
      secretNamespace: default

  # A YAML representation of a final vault config file.
  # See https://www.vaultproject.io/docs/configuration/ for more information.
  config:
    storage:
      file:
        path: "${ .Env.VAULT_STORAGE_FILE }" # An example how Vault config environment interpolation can be used
    listener:
      tcp:
        address: "0.0.0.0:8200"
        # Uncommenting the following line and deleting tls_cert_file and tls_key_file disables TLS
        # tls_disable: true
        tls_cert_file: /vault/tls/server.crt
        tls_key_file: /vault/tls/server.key
    telemetry:
      statsd_address: localhost:9125
    ui: true

  # See: https://github.com/banzaicloud/bank-vaults#example-external-vault-configuration for more details.
  externalConfig:
    policies:
      - name: allow_secrets
        rules: path "secret/*" {
          capabilities = ["create", "read", "update", "delete", "list"]
          }
      - name: allow_pki
        rules: path "pki/*" {
          capabilities = ["create", "read", "update", "delete", "list"]
          }
    auth:
      - type: kubernetes
        roles:
          # Allow every pod in the default namespace to use the secret kv store
          - name: default
            bound_service_account_names: ["default", "vault-secrets-webhook", "vault"]
            bound_service_account_namespaces: ["default", "vswh"]
            policies: ["allow_secrets", "allow_pki"]
            ttl: 1h

    secrets:
      - path: secret
        type: kv
        description: General secrets.
        options:
          version: 2

      - type: pki
        description: Vault PKI Backend
        config:
          default_lease_ttl: 168h
          max_lease_ttl: 720h
        configuration:
          config:
          - name: urls
            issuing_certificates: https://vault.test.test:8200/v1/pki/ca
            crl_distribution_points: https://vault.test.test:8200/v1/pki/crl
          root/generate:
          - name: internal
            common_name: vault.default
          roles:
          - name: default
            # allowed_domains: "*"
            allow_any_name: true
            # allowed_domains: localhost,pod,svc,default,local,test.test
            allow_subdomains: true
            generate_lease: true
            ttl: 720h

    # Allows writing some secrets to Vault (useful for development purposes).
    # See https://www.vaultproject.io/docs/secrets/kv/index.html for more information.
    startupSecrets:
      - type: kv
        path: secret/data/accounts/aws
        data:
          data:
            AWS_ACCESS_KEY_ID: secretId
            AWS_SECRET_ACCESS_KEY: s3cr3t
      - type: kv
        path: secret/data/dockerrepo
        data:
          data:
            DOCKER_REPO_USER: dockerrepouser
            DOCKER_REPO_PASSWORD: dockerrepopassword
      - type: kv
        path: secret/data/mysql
        data:
          data:
            MYSQL_ROOT_PASSWORD: s3cr3t
            MYSQL_PASSWORD: 3xtr3ms3cr3t

  vaultEnvsConfig:
    - name: VAULT_LOG_LEVEL
      value: debug
    - name: VAULT_STORAGE_FILE
      value: "/vault/file"

  # If you are using a custom certificate and are setting the hostname in a custom way
  # sidecarEnvsConfig:
  #   - name: VAULT_ADDR
  #     value: https://vault.local:8200

  # # https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/
  # vaultPodSpec:
  #   hostAliases:
  #   - ip: "127.0.0.1"
  #     hostnames:
  #     - "vault.local"

  # Marks presence of Istio, which influences things like port namings
  istioEnabled: false

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vault-file
spec:
  # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
  # storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

# ---
# apiVersion: v1
# kind: PersistentVolume
# metadata:
#   name: vault-file
# spec:
#   capacity:
#     storage: 1Gi
#   accessModes:
#   - ReadWriteOnce
#   persistentVolumeReclaimPolicy: Recycle
#   hostPath:
#     path: /vault/file

@bonifaido
Member

These are probably two unrelated issues in the same thread.

@gitdr
Author
gitdr commented Sep 22, 2020

This vault deployment is the only one on this test node, so there cannot be any stale secrets, volumes, etc.; the host is rebuilt from scratch every time via an automated process.

It turned out that the main vault container gets OOM-killed at regular intervals.

[root@netcup ~]# kubectl get pods 
NAME                                READY   STATUS    RESTARTS   AGE
vault-0                             3/3     Running   5345       28d

At some point the error logs stopped, and vault gets unsealed again after every OOM kill.

# kubectl logs vault-0 -c bank-vaults
time="2020-08-25T15:15:12Z" level=info msg="successfully unsealed vault"
time="2020-08-25T15:15:17Z" level=info msg="vault is sealed, unsealing"
time="2020-08-25T15:15:20Z" level=error msg="error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-unseal-keys"

(error messages continue) ...

time="2020-08-26T20:19:49Z" level=info msg="vault is sealed, unsealing"
time="2020-08-26T20:19:52Z" level=error msg="error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-unseal-keys"
time="2020-08-26T20:19:57Z" level=info msg="vault is sealed, unsealing"
time="2020-08-26T20:20:01Z" level=info msg="successfully unsealed vault"

(this is where it gets killed again)

time="2020-08-26T20:20:20Z" level=error msg="error checking if vault is sealed: error checking status: Get \"https://127.0.0.1:8200/v1/sys/seal-status\": dial tcp 127.0.0.1:8200: connect: connection refused"
time="2020-08-26T20:20:29Z" level=error msg="error checking if vault is sealed: error checking status: Get \"https://127.0.0.1:8200/v1/sys/seal-status\": dial tcp 127.0.0.1:8200: connect: connection refused"
time="2020-08-26T20:20:34Z" level=info msg="vault is sealed, unsealing"
time="2020-08-26T20:20:35Z" level=info msg="successfully unsealed vault"

@Centro1993

I have the same issue as the OP: my vault-2 pod doesn't start because key vault-unseal-5 is missing.
If I manually add a key with this name, it asks for vault-unseal-6, and so on. Tried deleting the pod's PVC & PV to no avail.

bank-vaults {"level":"info","msg":"vault joined raft cluster","time":"2022-02-10T14:34:38Z"}
bank-vaults {"level":"info","msg":"vault is sealed, unsealing","time":"2022-02-10T14:34:38Z"}
bank-vaults {"level":"error","msg":"error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-unseal-keys","time":"2022-02-10T14:34:41Z"}
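Before adding keys by hand, it can help to list which keys the secret actually holds and compare that against what an init with the default 5 shares should have produced; a sketch assuming the secret name from this thread:

```shell
# List the data keys present in the unseal secret (vault-root plus
# vault-unseal-0..N-1 after a successful init).
list_unseal_keys() {
  kubectl get secret "$1" -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Pure helper: the key names an init with N shares should produce.
expected_keys() {
  i=0
  while [ "$i" -lt "$1" ]; do
    echo "vault-unseal-$i"
    i=$((i + 1))
  done
}

# Example comparison (bash process substitution):
#   diff <(expected_keys 5) <(list_unseal_keys vault-unseal-keys | grep '^vault-unseal-')
```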

@kannamr
kannamr commented Jul 28, 2022

@bonifaido Do we have a solution or fix for this issue? I am facing the same issue as @Centro1993.

@dmolik
dmolik commented Aug 31, 2022

Is there a fix for this?

I'm getting the off-by-one error in my raft follower cluster. I feel like there is a silent error going on.

@dmolik
dmolik commented Sep 8, 2022

So I was able to solve this issue in follower mode by flattening my network and ensuring remote nodes were able to get bi-directional communication with the main cluster.

@smark88
smark88 commented Sep 30, 2022

+1. I am also facing this: the first replica creates unseal keys 0-4, but the second one always wants key 5 when unsealing from replica 1. I am testing this on EKS 1.22 in a multi-AZ cluster.

I originally deployed 1 node as a PoC and redeployed via the following process:

  1. Deleted all PVCs and unseal secrets
  2. Deleted the vault config: k delete vault global-vault
  3. Let ArgoCD re-sync everything

CR Config:

---
apiVersion: vault.banzaicloud.com/v1alpha1
kind: Vault
metadata:
  name: vault-global
  namespace: vault-global
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - vault
        topologyKey: kubernetes.io/hostname
  bankVaultsImage: ghcr.io/banzaicloud/bank-vaults:1.16.0
  caNamespaces:
  - '*'
  config:
    api_addr: https://vault-global.vault-global:8200
    cluster_addr: http://$(POD_IP):8201
    listener:
      tcp:
        address: 0.0.0.0:8200
        tls_cert_file: /vault/tls/server.crt
        tls_key_file: /vault/tls/server.key
    storage:
      raft:
        path: /vault/file
    ui: true
  existingTlsSecretName: vault-global-secrets-tls
  image: vault:1.8.12
  ingress:
    annotations:
      kubernetes.io/ingress.class: private
      kubernetes.io/tls-acme: "true"
    spec:
      rules:
      - host: vault-global.stag.aws.io
        http:
          paths:
          - backend:
              service:
                name: vault-global
                port:
                  number: 8200
            path: /
            pathType: Prefix
      tls:
      - hosts:
        - vault-global.stag.aws.io
        secretName: external-tls
  resources:
    vault:
      limits:
        cpu: 1250m
        memory: 3Gi
      requests:
        cpu: 500m
        memory: 3Gi
  serviceAccount: vault-global
  serviceRegistrationEnabled: true
  serviceType: ClusterIP
  size: 3
  statsdDisabled: true
  tlsExpiryThreshold: 43800h0m0s
  vaultConfigurerPodSpec:
    imagePullSecrets:
    - name: quay-docker-secret
    - name: dockerhub-docker-secret
    - name: gar-docker-secret
    priorityClassName: aws-infra-high
  vaultContainerSpec:
    name: vault-global
    securityContext:
      capabilities:
        add:
        - IPC_LOCK
  vaultEnvsConfig:
  - name: VAULT_LOG_LEVEL
    value: info
  - name: POD_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP
  - name: VAULT_ADDR
    value: http://$(POD_IP):8200
  vaultLabels:
    app: vault-global-bank-vault
  vaultPodSpec:
    priorityClassName: aws-infra-high
    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          app: vault-global-bank-vault
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
  volumeClaimTemplates:
  - metadata:
      name: vault-raft
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ebs-1
      volumeMode: Filesystem
  volumeMounts:
  - mountPath: /vault/file
    name: vault-raft

Unseal replica 0

❯ klo vault-global-0 bank-vaults
{"level":"info","msg":"joining leader vault...","time":"2022-09-30T15:19:48Z"}
{"level":"info","msg":"vault metrics exporter enabled: :9091/metrics","time":"2022-09-30T15:19:48Z"}
{"level":"info","msg":"initializing vault...","time":"2022-09-30T15:19:48Z"}
{"level":"info","msg":"initializing vault","time":"2022-09-30T15:19:48Z"}
{"key":"vault-unseal-0","level":"info","msg":"unseal key stored in key store","time":"2022-09-30T15:19:58Z"}
{"key":"vault-unseal-1","level":"info","msg":"unseal key stored in key store","time":"2022-09-30T15:19:58Z"}
{"key":"vault-unseal-2","level":"info","msg":"unseal key stored in key store","time":"2022-09-30T15:19:58Z"}
{"key":"vault-unseal-3","level":"info","msg":"unseal key stored in key store","time":"2022-09-30T15:19:58Z"}
{"key":"vault-unseal-4","level":"info","msg":"unseal key stored in key store","time":"2022-09-30T15:19:59Z"}
{"key":"vault-root","level":"info","msg":"root token stored in key store","time":"2022-09-30T15:19:59Z"}
{"level":"info","msg":"vault is sealed, unsealing","time":"2022-09-30T15:19:59Z"}
{"level":"info","msg":"successfully unsealed vault","time":"2022-09-30T15:20:00Z"}

Error replica 1:

❯ klo vault-global-1 bank-vaults
{"level":"info","msg":"joining leader vault...","time":"2022-09-30T15:20:10Z"}
{"level":"info","msg":"vault metrics exporter enabled: :9091/metrics","time":"2022-09-30T15:20:10Z"}
{"level":"info","msg":"joining raft cluster...","time":"2022-09-30T15:20:10Z"}
{"level":"info","msg":"vault joined raft cluster","time":"2022-09-30T15:20:10Z"}
{"level":"info","msg":"vault is sealed, unsealing","time":"2022-09-30T15:20:10Z"}
{"level":"error","msg":"error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-global-unseal-keys","time":"2022-09-30T15:20:12Z"}
{"level":"info","msg":"vault is sealed, unsealing","time":"2022-09-30T15:20:17Z"}
{"level":"error","msg":"error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-global-unseal-keys","time":"2022-09-30T15:20:20Z"}
{"level":"info","msg":"vault is sealed, unsealing","time":"2022-09-30T15:20:25Z"}
{"level":"error","msg":"error unsealing vault: unable to get key 'vault-unseal-5': key 'vault-unseal-5' is not present in secret: vault-global-unseal-keys","time":"2022-09-30T15:20:29Z"}

@primeroz
Collaborator
primeroz commented Oct 3, 2022

Looking at the Unseal code https://github.com/banzaicloud/bank-vaults/blob/main/internal/vault/operator_client.go#L188

It seems we just iterate over all keys from 0 to X until either the Vault container is unsealed or we get an error (by hitting a key index that does not exist).

I will add some debug logging to my instance to confirm that is what is happening, but it looks like for some reason the first 5 keys (0-4) are not managing to unseal the vault. In my case we are testing starting up a cluster with Raft, so I guess our follower vaults don't have access to the right data and therefore can't unseal using the stored unseal keys.

It might be good to emit a different error when the key index is > 4, to prevent people getting confused.
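For reference, the loop described above can be mimicked from outside the operator; a hedged shell sketch (the real implementation is Go, in operator_client.go; the pod, container, and secret names are assumed from this thread, and jq is required):

```shell
# Fetch and decode one key share from the Kubernetes secret; prints
# nothing when the index does not exist.
get_key() {
  kubectl get secret vault-unseal-keys -o jsonpath="{.data.vault-unseal-$1}" | base64 -d
}

# Pure helper: the loop stops once Vault reports sealed=false.
should_stop() {
  [ "$1" = "false" ]
}

# Feed stored keys one by one, like the unsealer does: succeed when Vault
# unseals, fail when the next key index is missing. With the default
# 5 shares / threshold 3, reaching vault-unseal-5 means keys 0-4 were all
# consumed without unsealing.
unseal_loop() {
  i=0
  while :; do
    key="$(get_key "$i")"
    if [ -z "$key" ]; then
      echo "key vault-unseal-$i is not present in secret: vault-unseal-keys" >&2
      return 1
    fi
    sealed="$(kubectl exec vault-0 -c vault -- vault operator unseal -format=json "$key" | jq -r .sealed)"
    if should_stop "$sealed"; then
      echo "successfully unsealed vault after $((i + 1)) keys"
      return 0
    fi
    i=$((i + 1))
  done
}

# unseal_loop
```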

@smark88
smark88 commented Oct 3, 2022

So, as @primeroz stated, it is a Raft issue. The first Raft node would come up, and the second one would not join; it kept retrying to unseal itself while in reality the first one had already unsealed the Vault server. This means the second pod was provisioning its own Raft cluster instead of joining the first one. I got mine working by adding the following to my config above, so that Raft can be aware of what nodes there are.

config:

 cluster_addr: http://${.Env.POD_NAME}:8201    ## previously $(POD_IP)
 disable_mlock: true

env var:

 - name: VAULT_RAFT_NODE_ID
   valueFrom:
     fieldRef:
       fieldPath: metadata.name

You can see in the Vault docs that disable_mlock = true and the node ID need to be set so Raft knows who is who. As for cluster_addr, the docs note it can be set at run time via a go-sockaddr template.

The examples in this repo are very old and some are incorrect: the one using more than 3 replicas sets cluster_addr via http://${.Env.POD_NAME}:8201, whereas the single-replica example uses POD_IP. I suspect Raft needs the hostname instead of the IP, since the IP changes on every restart. This post tipped me off about it: hashicorp/vault#8489 (comment)

Links:
https://learn.hashicorp.com/tutorials/vault/raft-storage
https://www.vaultproject.io/docs/configuration/storage/raft#node_id
https://www.vaultproject.io/docs/configuration#cluster_addr

@kannamr
kannamr commented Feb 15, 2023

Can someone please tell me how you solved this issue?

@muspelheim

can someone please tell how you solved this issue.

Did you find a way to solve it?

@camaeel
camaeel commented Jan 16, 2024

I switched to my own solution: https://github.com/camaeel/vault-k8s-helper/

For bank-vaults, I noticed that first scaling to 1 pod and then to 3 helped it initialize properly.
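The scale-down/scale-up workaround can be sketched with kubectl patch against the Vault CR (the CR and StatefulSet name "vault" are assumptions; adapt to your setup, and wait for the first node to unseal between steps):

```shell
# Pure helper: build the merge patch for a given replica count.
size_patch() {
  printf '{"spec":{"size":%d}}' "$1"
}

# Patch the Vault custom resource's spec.size.
scale_vault() {
  kubectl patch vault vault --type=merge -p "$(size_patch "$1")"
}

# scale_vault 1
# kubectl rollout status statefulset/vault --timeout=5m  # wait for node 0 to unseal
# scale_vault 3
```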

@ServerNinja
ServerNinja commented Jun 10, 2024

Is there a solution for this yet? I'm running into this problem and am blocked on a project. I'm curious what people do to get around it. For some reason, it's trying to get unseal key 5 when only keys 0-4 are generated.

This is my config:

apiVersion: "vault.banzaicloud.com/v1alpha1"
kind: "Vault"
metadata:
  name: "vault"
  namespace: "vault"
  labels:
    app.kubernetes.io/name: vault
    vault_cr: vault
spec:
  size: 3
  image: hashicorp/vault:1.14.1

  # Common annotations for all created resources
  annotations:
    common/annotation: "true"

  # Vault Pods , Services and TLS Secret annotations
  vaultAnnotations:
    type/instance: "vault"

  # Vault Configurer Pods and Services annotations
  vaultConfigurerAnnotations:
    type/instance: "vaultconfigurer"

  vaultConfigurerPodSpec:
    imagePullSecrets: 
    - name: ghcr-login-secret

  vaultPodSpec:
    imagePullSecrets: 
    - name: ghcr-login-secret

  # Vault Pods , Services and TLS Secret labels
  vaultLabels:
    example.com/log-format: "json"

  # Vault Configurer Pods and Services labels
  vaultConfigurerLabels:
    example.com/log-format: "string"

  # Support for affinity Rules
  # affinity:
  #   nodeAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       nodeSelectorTerms:
  #       - matchExpressions:
  #         - key : "node-role.kubernetes.io/your_role"
  #           operator: In
  #           values: ["true"]

  # Support for pod nodeSelector rules to control which nodes can be chosen to run
  # the given pods
  # nodeSelector:
  #   "node-role.kubernetes.io/your_role": "true"

  # Support for node tolerations that work together with node taints to control
  # the pods that can like on a node
  # tolerations:
  # - effect: NoSchedule
  #   key: node-role.kubernetes.io/your_role
  #   operator: Equal
  #   value: "true"

  # Specify the ServiceAccount where the Vault Pod and the Bank-Vaults configurer/unsealer is running
  serviceAccount: vault

  # Specify the Service's type where the Vault Service is exposed
  # Please note that some Ingress controllers like https://github.com/kubernetes/ingress-gce
  # forces you to expose your Service on a NodePort
  serviceType: ClusterIP

  # Request an Ingress controller with the default configuration
  ingress:
    # Specify Ingress object annotations here, if TLS is enabled (which is by default)
    # the operator will add NGINX, Traefik and HAProxy Ingress compatible annotations
    # to support TLS backends
    annotations: {}
    # Override the default Ingress specification here
    # This follows the same format as the standard Kubernetes Ingress
    # See: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.13/#ingressspec-v1beta1-extensions
    spec: {}

  # In some cases, you have to set permissions for the raft directory.
  # For example in the case of using a local kind cluster, uncomment the lines below.
  vaultInitContainers:
    - name: raft-permission
      image: busybox
      command:
        - /bin/sh
        - -c
        - |
          chown -R 100:1000 /vault/file
      volumeMounts:
        - name: vault-raft
          mountPath: /vault/file

  # Use local disk to store Vault raft data, see config section.
  volumeClaimTemplates:
    - metadata:
        name: vault-raft
      spec:
        # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
        storageClassName: "standard"
        accessModes:
          - ReadWriteOnce
        volumeMode: Filesystem
        resources:
          requests:
            storage: 1Gi

  volumeMounts:
    - name: vault-raft
      mountPath: /vault/file

  # Add Velero fsfreeze sidecar container and supporting hook annotations to Vault Pods:
  # https://velero.io/docs/v1.2.0/hooks/
  veleroEnabled: false

  # Support for distributing the generated CA certificate Secret to other namespaces.
  # Define a list of namespaces or use ["*"] for all namespaces.
  caNamespaces:
    - "*"

  # Describe where you would like to store the Vault unseal keys and root token.
  unsealConfig:
    options:
      # The preFlightChecks flag enables unseal and root token storage tests
      # This is true by default
      preFlightChecks: true
      # The storeRootToken flag enables storing of root token in chosen storage
      # This is true by default
      storeRootToken: true
      # The secretShares represents the total number of unseal key shares
      # This is 5 by default
      secretShares: 5
      # The secretThreshold represents the minimum number of shares required to reconstruct the unseal key
      # This is 3 by default
      secretThreshold: 3
    kubernetes:
      secretNamespace: vault

  # A YAML representation of a final vault config file.
  # See https://www.vaultproject.io/docs/configuration/ for more information.
  config:
    storage:
      raft:
        path: "/vault/file"
    listener:
      tcp:
        address: "0.0.0.0:8200"
        tls_cert_file: /vault/tls/server.crt
        tls_key_file: /vault/tls/server.key
    api_addr: https://[::]:8200
    cluster_addr: "https://[::]:8201"
    disable_mlock: true
    ui: true

  statsdDisabled: true

  serviceRegistrationEnabled: true

  resources:
    # A YAML representation of resource ResourceRequirements for vault container
    # Detail can reference: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container
    bankVaults:
      limits:
        memory: "512Mi"
        cpu: "200m"
      requests:
        memory: "256Mi"
        cpu: "100m"
    vault:
      limits:
        memory: "512Mi"
        cpu: "200m"
      requests:
        memory: "256Mi"
        cpu: "100m"

  # See: https://banzaicloud.com/docs/bank-vaults/cli-tool/#example-external-vault-configuration
  # The repository also contains a lot examples in the test/deploy and operator/deploy directories.
  externalConfig:
    policies:
      - name: allow_secrets
        rules: path "secret/*" {
          capabilities = ["create", "read", "update", "delete", "list"]
          }
    auth:
      - type: kubernetes
        roles:
          # Allow every pod in the default namespace to use the secret kv store
          - name: default
            bound_service_account_names: ["*"]
            bound_service_account_namespaces: ["*"]
            policies: allow_secrets
            ttl: 1h

    secrets:
      - path: secret
        type: kv
        description: General secrets.
        options:
          version: 2

  vaultEnvsConfig:
    - name: SKIP_SETCAP
      value: "true"
    - name: VAULT_DISABLE_MLOCK
      value: "true"
    - name: VAULT_LOG_LEVEL
      value: debug
    - name: VAULT_RAFT_NODE_ID
      valueFrom:
        fieldRef:
          fieldPath: metadata.name

@ServerNinja

I fixed the problem. It appears my config block was incorrect. This fixed it:

  config:
    storage:
      raft:
        path: "/vault/file"
    listener:
      tcp:
        address: "0.0.0.0:8200"
        tls_cert_file: /vault/tls/server.crt
        tls_disable: false
        tls_key_file: /vault/tls/server.key
    api_addr: https://vault.vault.svc.cluster.local:8200
    cluster_addr: "https://${.Env.POD_NAME}:8201"
    disable_mlock: true
    ui: true

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR that has become stale and will be auto-closed. label Nov 10, 2024
@bank-vaults bank-vaults deleted a comment from github-actions bot Nov 10, 2024
@csatib02 csatib02 removed the lifecycle/stale Denotes an issue or PR that has become stale and will be auto-closed. label Nov 10, 2024
@csatib02
Member

Does anyone still require assistance regarding this?
