Pod cannot mount Persistent Volume created by ozone CSI provisioner - kubernetes

I am using Kubernetes to deploy Ozone (a substitute for HDFS), and basically followed the instructions from here and here (just a few steps).
First I created a few PVs with hostPath pointing to a local directory, then I slightly edited the YAMLs from ozone/kubernetes/example/ozone by changing the NFS claim to a hostPath claim:
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: manual
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
      selector:
        matchLabels:
          type: local
and I commented out the nodeAffinity settings in datanode-stateful.yaml, since my Kubernetes cluster only has a master node.
The deployment was successful.
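For reference, each of the hostPath PVs I created earlier looks roughly like this (a minimal sketch; the PV name and local path are placeholders, while storageClassName: manual and the type: local label are what the claim template above selects on):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ozone-pv-0            # placeholder name, one PV per claim
  labels:
    type: local               # matched by the claim's selector
spec:
  storageClassName: manual    # matched by the claim's storageClassName
  capacity:
    storage: 5Gi
  accessModes: ["ReadWriteOnce"]
  hostPath:
    path: /mnt/ozone/pv-0     # placeholder local directory on the node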
Then I applied the csi and pv-test manifests as described in the CSI instructions; the PV (a bucket in s3v) was created automatically and the PVC did bind to it, but the test pod got stuck in ContainerCreating.
Attaching the description of the pv-test pod:
Name: ozone-csi-test-webserver-778c8c87b7-rngfk
Namespace: default
Priority: 0
Node: k8s-master/192.168.100.202
Start Time: Fri, 18 Jun 2021 14:23:54 +0800
Labels: app=ozone-csi-test-webserver
pod-template-hash=778c8c87b7
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/ozone-csi-test-webserver-778c8c87b7
Containers:
web:
Container ID:
Image: python:3.7.3-alpine3.8
Image ID:
Port: <none>
Host Port: <none>
Args:
python
-m
http.server
--directory
/www
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gqknv (ro)
/www from webroot (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
webroot:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: ozone-csi-test-webserver
ReadOnly: false
default-token-gqknv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gqknv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 7m7s (x58 over 122m) kubelet, k8s-master MountVolume.SetUp failed for volume "pvc-1913bd70-09fd-4eba-a459-73fe3bd397b8" : rpc error: code = Unknown desc =
Warning FailedMount 31s (x54 over 120m) kubelet, k8s-master Unable to mount volumes for pod "ozone-csi-test-webserver-778c8c87b7-rngfk_default(b1a59143-00b9-47f6-94fe-1845c29aee93)": timeout expired waiting for volumes to attach or mount for pod "default"/"ozone-csi-test-webserver-778c8c87b7-rngfk". list of unmounted volumes=[webroot]. list of unattached volumes=[webroot default-token-gqknv]
Attaching the events for the whole process:
7m51s Normal SuccessfulCreate statefulset/s3g create Claim data-s3g-0 Pod s3g-0 in StatefulSet s3g success
7m51s Warning FailedScheduling pod/s3g-0 pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
7m51s Normal SuccessfulCreate statefulset/scm create Pod scm-0 in StatefulSet scm successful
7m51s Normal SuccessfulCreate statefulset/om create Pod om-0 in StatefulSet om successful
7m51s Warning FailedScheduling pod/om-0 pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
7m51s Warning FailedScheduling pod/datanode-0 pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
7m51s Normal SuccessfulCreate statefulset/datanode create Pod datanode-0 in StatefulSet datanode successful
7m51s Normal SuccessfulCreate statefulset/datanode create Claim data-datanode-0 Pod datanode-0 in StatefulSet datanode success
7m51s Normal SuccessfulCreate statefulset/scm create Claim data-scm-0 Pod scm-0 in StatefulSet scm success
7m51s Normal SuccessfulCreate statefulset/s3g create Pod s3g-0 in StatefulSet s3g successful
7m51s Normal SuccessfulCreate statefulset/om create Claim data-om-0 Pod om-0 in StatefulSet om success
7m51s Warning FailedScheduling pod/scm-0 pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
7m50s Normal Scheduled pod/s3g-0 Successfully assigned default/s3g-0 to hadoop104
7m50s Normal Scheduled pod/datanode-0 Successfully assigned default/datanode-0 to hadoop103
7m50s Normal Scheduled pod/scm-0 Successfully assigned default/scm-0 to hadoop104
7m50s Normal Scheduled pod/om-0 Successfully assigned default/om-0 to hadoop103
7m49s Normal Created pod/datanode-0 Created container datanode
7m49s Normal Started pod/datanode-0 Started container datanode
7m49s Normal Pulled pod/datanode-0 Container image "apache/ozone:1.1.0" already present on machine
7m48s Normal SuccessfulCreate statefulset/datanode create Claim data-datanode-1 Pod datanode-1 in StatefulSet datanode success
7m48s Warning FailedScheduling pod/datanode-1 pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
7m48s Normal Pulled pod/scm-0 Container image "apache/ozone:1.1.0" already present on machine
7m48s Normal Created pod/scm-0 Created container init
7m48s Normal Started pod/scm-0 Started container init
7m48s Normal Pulled pod/s3g-0 Container image "apache/ozone:1.1.0" already present on machine
7m48s Normal Created pod/s3g-0 Created container s3g
7m48s Normal Started pod/s3g-0 Started container s3g
7m48s Normal SuccessfulCreate statefulset/datanode create Pod datanode-1 in StatefulSet datanode successful
7m46s Normal Scheduled pod/datanode-1 Successfully assigned default/datanode-1 to hadoop104
7m45s Normal Created pod/datanode-1 Created container datanode
7m45s Normal Pulled pod/datanode-1 Container image "apache/ozone:1.1.0" already present on machine
7m44s Normal Created pod/scm-0 Created container scm
7m44s Normal Started pod/scm-0 Started container scm
7m44s Normal Started pod/datanode-1 Started container datanode
7m44s Normal Pulled pod/scm-0 Container image "apache/ozone:1.1.0" already present on machine
7m43s Warning FailedScheduling pod/datanode-2 pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
7m43s Normal SuccessfulCreate statefulset/datanode create Pod datanode-2 in StatefulSet datanode successful
7m43s Normal SuccessfulCreate statefulset/datanode create Claim data-datanode-2 Pod datanode-2 in StatefulSet datanode success
7m42s Normal Scheduled pod/datanode-2 Successfully assigned default/datanode-2 to hadoop103
7m38s Normal Pulled pod/datanode-2 Container image "apache/ozone:1.1.0" already present on machine
7m38s Normal Created pod/datanode-2 Created container datanode
7m38s Normal Started pod/datanode-2 Started container datanode
7m23s Normal ScalingReplicaSet deployment/csi-provisioner Scaled up replica set csi-provisioner-5649bc9474 to 1
7m23s Warning FailedCreate daemonset/csi-node Error creating: pods "csi-node-" is forbidden: error looking up service account default/csi-ozone: serviceaccount "csi-ozone" not found
7m22s Normal Scheduled pod/csi-node-nbfnw Successfully assigned default/csi-node-nbfnw to hadoop104
7m22s Normal Scheduled pod/csi-provisioner-5649bc9474-n5jf2 Successfully assigned default/csi-provisioner-5649bc9474-n5jf2 to hadoop103
7m22s Normal SuccessfulCreate replicaset/csi-provisioner-5649bc9474 Created pod: csi-provisioner-5649bc9474-n5jf2
7m22s Normal Scheduled pod/csi-node-c97fz Successfully assigned default/csi-node-c97fz to hadoop103
7m22s Normal SuccessfulCreate daemonset/csi-node Created pod: csi-node-c97fz
7m22s Normal SuccessfulCreate daemonset/csi-node Created pod: csi-node-nbfnw
7m14s Normal Pulling pod/csi-node-c97fz Pulling image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.2"
7m14s Normal Pulling pod/csi-provisioner-5649bc9474-n5jf2 Pulling image "quay.io/k8scsi/csi-provisioner:v1.0.1"
7m13s Normal Pulling pod/csi-node-nbfnw Pulling image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.2"
6m56s Warning Unhealthy pod/om-0 Liveness probe failed: dial tcp 10.244.1.7:9862: connect: connection refused
6m56s Normal Killing pod/om-0 Container om failed liveness probe, will be restarted
6m55s Normal Created pod/om-0 Created container om
6m55s Normal Started pod/om-0 Started container om
6m55s Normal Pulled pod/om-0 Container image "apache/ozone:1.1.0" already present on machine
6m48s Normal Pulled pod/csi-provisioner-5649bc9474-n5jf2 Successfully pulled image "quay.io/k8scsi/csi-provisioner:v1.0.1"
6m48s Normal Started pod/csi-provisioner-5649bc9474-n5jf2 Started container ozone-csi
6m48s Normal Created pod/csi-provisioner-5649bc9474-n5jf2 Created container ozone-csi
6m48s Normal Pulled pod/csi-provisioner-5649bc9474-n5jf2 Container image "apache/ozone:1.1.0" already present on machine
6m48s Normal Started pod/csi-provisioner-5649bc9474-n5jf2 Started container csi-provisioner
6m48s Normal Created pod/csi-provisioner-5649bc9474-n5jf2 Created container csi-provisioner
6m45s Normal Pulled pod/csi-node-nbfnw Successfully pulled image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.2"
6m44s Normal Started pod/csi-node-nbfnw Started container driver-registrar
6m44s Normal Started pod/csi-node-nbfnw Started container csi-node
6m44s Normal Created pod/csi-node-nbfnw Created container csi-node
6m44s Normal Created pod/csi-node-nbfnw Created container driver-registrar
6m44s Normal Pulled pod/csi-node-nbfnw Container image "apache/ozone:1.1.0" already present on machine
6m25s Normal Pulled pod/csi-node-c97fz Successfully pulled image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.2"
6m25s Normal Pulled pod/csi-node-c97fz Container image "apache/ozone:1.1.0" already present on machine
6m25s Normal Started pod/csi-node-c97fz Started container csi-node
6m25s Normal Created pod/csi-node-c97fz Created container csi-node
6m17s Normal Created pod/csi-node-c97fz Created container driver-registrar
6m17s Normal Pulled pod/csi-node-c97fz Container image "quay.io/k8scsi/csi-node-driver-registrar:v1.0.2" already present on machine
6m17s Normal Started pod/csi-node-c97fz Started container driver-registrar
6m3s Normal Provisioning persistentvolumeclaim/ozone-csi-test-webserver External provisioner is provisioning volume for claim "default/ozone-csi-test-webserver"
6m3s Normal ScalingReplicaSet deployment/ozone-csi-test-webserver Scaled up replica set ozone-csi-test-webserver-7cbdc5d65c to 1
6m3s Normal SuccessfulCreate replicaset/ozone-csi-test-webserver-7cbdc5d65c Created pod: ozone-csi-test-webserver-7cbdc5d65c-dpzhc
6m3s Normal ExternalProvisioning persistentvolumeclaim/ozone-csi-test-webserver waiting for a volume to be created, either by external provisioner "org.apache.hadoop.ozone" or manually created by system administrator
6m2s Warning FailedScheduling pod/ozone-csi-test-webserver-7cbdc5d65c-dpzhc pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
6m1s Normal ProvisioningSucceeded persistentvolumeclaim/ozone-csi-test-webserver Successfully provisioned volume pvc-cd01c58d-793f-41ce-9e12-057ade02e07c
5m59s Normal Scheduled pod/ozone-csi-test-webserver-7cbdc5d65c-dpzhc Successfully assigned default/ozone-csi-test-webserver-7cbdc5d65c-dpzhc to hadoop104
97s Warning FailedMount pod/ozone-csi-test-webserver-7cbdc5d65c-dpzhc Unable to attach or mount volumes: unmounted volumes=[webroot], unattached volumes=[webroot default-token-l9lng]: timed out waiting for the condition
94s Warning FailedMount pod/ozone-csi-test-webserver-7cbdc5d65c-dpzhc MountVolume.SetUp failed for volume "pvc-cd01c58d-793f-41ce-9e12-057ade02e07c" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc =

Related

Hashicorp vault on k8s: getting error 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity

I'm deploying HA Vault on k8s (EKS) and getting this error on one of the Vault pods, which I think is causing the other pods to fail as well.
This is the output of kubectl get events
(search for: nodes are available: 1 Insufficient memory):
26m Normal Created pod/vault-1 Created container vault
26m Normal Started pod/vault-1 Started container vault
26m Normal Pulled pod/vault-1 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m40s Warning BackOff pod/vault-1 Back-off restarting failed container
2m38s Normal Scheduled pod/vault-1 Successfully assigned vault-foo/vault-1 to ip-10-101-0-103.ec2.internal
2m35s Normal SuccessfulAttachVolume pod/vault-1 AttachVolume.Attach succeeded for volume "pvc-acfc7e26-3616-4075-ab79-0c3f7b0f6470"
2m35s Normal SuccessfulAttachVolume pod/vault-1 AttachVolume.Attach succeeded for volume "pvc-19d03d48-1de2-41f8-aadf-02d0a9f4bfbd"
48s Normal Pulled pod/vault-1 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
48s Normal Created pod/vault-1 Created container vault
99s Normal Started pod/vault-1 Started container vault
60s Warning BackOff pod/vault-1 Back-off restarting failed container
27m Normal TaintManagerEviction pod/vault-2 Cancelling deletion of Pod vault-foo/vault-2
28m Warning FailedScheduling pod/vault-2 0/4 nodes are available: 1 Insufficient memory, 4 Insufficient cpu.
28m Warning FailedScheduling pod/vault-2 0/5 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
27m Normal Scheduled pod/vault-2 Successfully assigned vault-foo/vault-2 to ip-10-101-0-103.ec2.internal
27m Normal SuccessfulAttachVolume pod/vault-2 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
27m Normal SuccessfulAttachVolume pod/vault-2 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
27m Normal Pulling pod/vault-2 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
27m Normal Pulled pod/vault-2 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
26m Normal Created pod/vault-2 Created container vault
26m Normal Started pod/vault-2 Started container vault
26m Normal Pulled pod/vault-2 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
7m26s Warning BackOff pod/vault-2 Back-off restarting failed container
2m36s Warning FailedScheduling pod/vault-2 0/7 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
114s Warning FailedScheduling pod/vault-2 0/8 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 4 Insufficient cpu.
104s Warning FailedScheduling pod/vault-2 0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
93s Normal Scheduled pod/vault-2 Successfully assigned vault-foo/vault-2 to ip-10-101-0-82.ec2.internal
88s Normal SuccessfulAttachVolume pod/vault-2 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
88s Normal SuccessfulAttachVolume pod/vault-2 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
83s Normal Pulling pod/vault-2 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
81s Normal Pulled pod/vault-2 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
38s Normal Created pod/vault-2 Created container vault
37s Normal Started pod/vault-2 Started container vault
38s Normal Pulled pod/vault-2 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
4s Warning BackOff pod/vault-2 Back-off restarting failed container
2m38s Normal Scheduled pod/vault-agent-injector-d54bdc675-qwsmz Successfully assigned vault-foo/vault-agent-injector-d54bdc675-qwsmz to ip-10-101-2-91.ec2.internal
2m37s Normal Pulling pod/vault-agent-injector-d54bdc675-qwsmz Pulling image "hashicorp/vault-k8s:latest"
2m36s Normal Pulled pod/vault-agent-injector-d54bdc675-qwsmz Successfully pulled image "hashicorp/vault-k8s:latest"
2m36s Normal Created pod/vault-agent-injector-d54bdc675-qwsmz Created container sidecar-injector
2m35s Normal Started pod/vault-agent-injector-d54bdc675-qwsmz Started container sidecar-injector
28m Normal Scheduled pod/vault-agent-injector-d54bdc675-wz9ws Successfully assigned vault-foo/vault-agent-injector-d54bdc675-wz9ws to ip-10-101-0-87.ec2.internal
28m Normal Pulled pod/vault-agent-injector-d54bdc675-wz9ws Container image "hashicorp/vault-k8s:latest" already present on machine
28m Normal Created pod/vault-agent-injector-d54bdc675-wz9ws Created container sidecar-injector
28m Normal Started pod/vault-agent-injector-d54bdc675-wz9ws Started container sidecar-injector
3m22s Normal Killing pod/vault-agent-injector-d54bdc675-wz9ws Stopping container sidecar-injector
3m22s Warning Unhealthy pod/vault-agent-injector-d54bdc675-wz9ws Readiness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: connection refused
3m18s Warning Unhealthy pod/vault-agent-injector-d54bdc675-wz9ws Liveness probe failed: Get https://10.101.0.73:8080/health/ready: dial tcp 10.101.0.73:8080: connect: no route to host
28m Normal SuccessfulCreate replicaset/vault-agent-injector-d54bdc675 Created pod: vault-agent-injector-d54bdc675-wz9ws
2m38s Normal SuccessfulCreate replicaset/vault-agent-injector-d54bdc675 Created pod: vault-agent-injector-d54bdc675-qwsmz
28m Normal ScalingReplicaSet deployment/vault-agent-injector Scaled up replica set vault-agent-injector-d54bdc675 to 1
2m38s Normal ScalingReplicaSet deployment/vault-agent-injector Scaled up replica set vault-agent-injector-d54bdc675 to 1
28m Normal EnsuringLoadBalancer service/vault-ui Ensuring load balancer
28m Normal EnsuredLoadBalancer service/vault-ui Ensured load balancer
26m Normal UpdatedLoadBalancer service/vault-ui Updated load balancer with new hosts
3m24s Normal DeletingLoadBalancer service/vault-ui Deleting load balancer
3m23s Warning PortNotAllocated service/vault-ui Port 32476 is not allocated; repairing
3m23s Warning ClusterIPNotAllocated service/vault-ui Cluster IP 172.20.216.143 is not allocated; repairing
3m22s Warning FailedToUpdateEndpointSlices service/vault-ui Error updating Endpoint Slices for Service vault-foo/vault-ui: failed to update vault-ui-crtg4 EndpointSlice for Service vault-foo/vault-ui: Operation cannot be fulfilled on endpointslices.discovery.k8s.io "vault-ui-crtg4": the object has been modified; please apply your changes to the latest version and try again
3m16s Warning FailedToUpdateEndpoint endpoints/vault-ui Failed to update endpoint vault-foo/vault-ui: Operation cannot be fulfilled on endpoints "vault-ui": the object has been modified; please apply your changes to the latest version and try again
2m52s Normal DeletedLoadBalancer service/vault-ui Deleted load balancer
2m39s Normal EnsuringLoadBalancer service/vault-ui Ensuring load balancer
2m36s Normal EnsuredLoadBalancer service/vault-ui Ensured load balancer
96s Normal UpdatedLoadBalancer service/vault-ui Updated load balancer with new hosts
28m Normal NoPods poddisruptionbudget/vault No matching pods found
28m Normal SuccessfulCreate statefulset/vault create Pod vault-0 in StatefulSet vault successful
28m Normal SuccessfulCreate statefulset/vault create Pod vault-1 in StatefulSet vault successful
28m Normal SuccessfulCreate statefulset/vault create Pod vault-2 in StatefulSet vault successful
2m40s Normal NoPods poddisruptionbudget/vault No matching pods found
2m38s Normal SuccessfulCreate statefulset/vault create Pod vault-0 in StatefulSet vault successful
2m38s Normal SuccessfulCreate statefulset/vault create Pod vault-1 in StatefulSet vault successful
2m38s Normal SuccessfulCreate statefulset/vault create Pod vault-2 in StatefulSet vault successful
And this is my Helm values file:
# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "latest"
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m

server:
  # Use the Enterprise Image
  image:
    repository: "hashicorp/vault-enterprise"
    tag: "1.5.0_ent"
  # These Resource Limits are in line with node requirements in the
  # Vault Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 8Gi
      cpu: 2000m
    limits:
      memory: 16Gi
      cpu: 2000m
  # For HA configuration and because we need to manually init the vault,
  # we need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60
  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca
  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path .
  #extraVolumes:
  #  - type: secret
  #    name: tls-server
  #  - type: secret
  #    name: tls-ca
  #  - type: secret
  #    name: kms-creds
  extraVolumes:
    - type: secret
      name: vault-server-tls
  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: true
  standalone:
    enabled: false
  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          tls_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }
        storage "raft" {
          path = "/vault/data"
          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
        }
        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200
  # For Added Security, edit the below
  #loadBalancerSourceRanges:
  #  - < Your IP RANGE Ex. 10.0.0.0/16 >
  #  - < YOUR SINGLE IP Ex. 1.78.23.3/32 >
What did I not configure correctly?
There are several issues here, and they are all represented by error messages like:
0/9 nodes are available: 1 Insufficient memory, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had volume node affinity conflict, 1 node(s) were unschedulable, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 4 Insufficient cpu.
You have 9 nodes, but none of them is available for scheduling due to a different set of conditions. Note that each node can be affected by multiple issues, so the numbers can add up to more than your total number of nodes.
Let's break them down one by one:
Insufficient memory: Execute kubectl describe node <node-name> to check how much free memory is available there, and check the requests and limits of your pods. Note that Kubernetes reserves the full amount of memory a pod requests, regardless of how much the pod actually uses.
Insufficient cpu: Analogous to the above.
node(s) didn't match pod affinity/anti-affinity: Check your affinity/anti-affinity rules.
node(s) didn't satisfy existing pods anti-affinity rules: Same as above.
node(s) had volume node affinity conflict: Happens when a pod cannot be scheduled because it cannot reach its volume from another availability zone. You can fix this by creating a StorageClass for a single zone and then using that StorageClass in your PVC.
node(s) were unschedulable: This is because the node is marked as unschedulable, which leads us to the next issue below:
node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate: This corresponds to the NodeCondition Ready = False. You can use kubectl describe node to check taints and kubectl taint nodes <node-name> <taint-name>- to remove them (see the sketch at the end of this answer). Check Taints and Tolerations for more details.
Also, there is a GitHub thread with a similar issue that you may find useful.
Try checking/eliminating those issues one by one (starting from the first listed above), as they can cause a "chain reaction" in some scenarios.
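As a rough sketch (the node name, zone and StorageClass name below are placeholders, not values from your cluster), the checks and fixes above look like this:
# Check allocatable resources and current requests/limits on a node
kubectl describe node <node-name>
# List the taints on every node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
# Remove a taint by key (the trailing "-" removes it)
kubectl taint nodes <node-name> node.kubernetes.io/not-ready-
# Example single-zone StorageClass (EBS) for the volume node affinity conflict;
# reference it from the PVC via storageClassName
# (newer clusters label zones with topology.kubernetes.io/zone instead)
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-single-zone
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
allowedTopologies:
  - matchLabelExpressions:
      - key: failure-domain.beta.kubernetes.io/zone
        values:
          - us-east-1a
EOF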

Pods starting but not working in Kubernetes

I created a Kubernetes Deployment with 3 pods and all of them are running fine, but when I try to reach them I cannot: curling the internal IP of the pods fails, and in the describe section I can see this error: MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache:
Errors below:
5m51s Normal RegisteredNode node/ip-10-1-1-4 Node ip-10-1-1-4 event: Registered Node ip-10-1-1-4 in Controller
57m Normal Scheduled pod/nginx-deployment-585449566-9bqp7 Successfully assigned default/nginx-deployment-585449566-9bqp7 to ip-10-1-1-4
57m Warning FailedMount pod/nginx-deployment-585449566-9bqp7 MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache: timed out waiting for the condition
57m Normal Pulling pod/nginx-deployment-585449566-9bqp7 Pulling image "nginx:latest"
56m Normal Pulled pod/nginx-deployment-585449566-9bqp7 Successfully pulled image "nginx:latest" in 12.092210534s
56m Normal Created pod/nginx-deployment-585449566-9bqp7 Created container nginx
56m Normal Started pod/nginx-deployment-585449566-9bqp7 Started container nginx
57m Normal Scheduled pod/nginx-deployment-585449566-9hlhz Successfully assigned default/nginx-deployment-585449566-9hlhz to ip-10-1-1-4
57m Warning FailedMount pod/nginx-deployment-585449566-9hlhz MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache: timed out waiting for the condition
57m Normal Pulling pod/nginx-deployment-585449566-9hlhz Pulling image "nginx:latest"
56m Normal Pulled pod/nginx-deployment-585449566-9hlhz Successfully pulled image "nginx:latest" in 15.127984291s
56m Normal Created pod/nginx-deployment-585449566-9hlhz Created container nginx
56m Normal Started pod/nginx-deployment-585449566-9hlhz Started container nginx
57m Normal Scheduled pod/nginx-deployment-585449566-ffkwf Successfully assigned default/nginx-deployment-585449566-ffkwf to ip-10-1-1-4
57m Warning FailedMount pod/nginx-deployment-585449566-ffkwf MountVolume.SetUp failed for volume "default-token-twhht" : failed to sync secret cache: timed out waiting for the condition
57m Normal Pulling pod/nginx-deployment-585449566-ffkwf Pulling image "nginx:latest"
56m Normal Pulled pod/nginx-deployment-585449566-ffkwf Successfully pulled image "nginx:latest" in 9.459864756s
56m Normal Created pod/nginx-deployment-585449566-ffkwf Created container nginx
You can add an additional RBAC role permission to your Pod's service account; see references 1, 2 and 3.
Also make sure that you have workload identity set up; see reference 4.
This can also happen when the apiserver is under high load; you could add more, smaller nodes to spread your pods, and increase your resource requests.
This error message is a bit misleading, since it suggests a cluster-internal connectivity problem. In reality it is an RBAC permission problem:
the default service account in the namespace you are deploying to is not authorized to mount the secret that you are trying to mount into your Pod.
To solve this, you have to add the missing RBAC role permission to your Pod's service account, roughly as sketched below.
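A minimal sketch of what that can look like (the names and namespace below are placeholders; bind it to the service account your Pod actually uses, and optionally restrict resourceNames to the specific secret):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader          # placeholder name
  namespace: default           # the namespace you deploy to
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-secret-reader  # placeholder name
  namespace: default
subjects:
  - kind: ServiceAccount
    name: default              # the Pod's service account
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: secret-reader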

Configuring the Crunchy Data PostgreSQL operator in a Digital Ocean managed kubernetes cluster

I am having trouble configuring the Crunchy Data PostgreSQL operator in my Digital Ocean managed kubernetes cluster. Per their official installation/troubleshooting guide, I changed the default storage classes in the provided manifest file to do-block-storage and I've tried toggling the disable_fsgroup value, all to no avail. I'm getting the following output from running kubectl describe... on the operator pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned pgo/postgres-operator-697fd6dbb6-n764r to test-dev-pool-35jcv
Normal Started 69s kubelet, test-dev-pool-35jcv Started container event
Normal Created 69s kubelet, test-dev-pool-35jcv Created container event
Normal Pulled 69s kubelet, test-dev-pool-35jcv Container image "registry.developers.crunchydata.com/crunchydata/pgo-event:centos7-4.5.0" already present on machine
Normal Started 68s (x2 over 69s) kubelet, test-dev-pool-35jcv Started container scheduler
Normal Created 68s (x2 over 69s) kubelet, test-dev-pool-35jcv Created container scheduler
Normal Pulled 68s (x2 over 69s) kubelet, test-dev-pool-35jcv Container image "registry.developers.crunchydata.com/crunchydata/pgo-scheduler:centos7-4.5.0" already present on machine
Normal Started 64s (x2 over 69s) kubelet, test-dev-pool-35jcv Started container operator
Normal Created 64s (x2 over 70s) kubelet, test-dev-pool-35jcv Created container operator
Normal Pulled 64s (x2 over 70s) kubelet, test-dev-pool-35jcv Container image "registry.developers.crunchydata.com/crunchydata/postgres-operator:centos7-4.5.0" already present on machine
Normal Started 64s (x2 over 70s) kubelet, test-dev-pool-35jcv Started container apiserver
Normal Created 64s (x2 over 70s) kubelet, test-dev-pool-35jcv Created container apiserver
Normal Pulled 64s (x2 over 70s) kubelet, test-dev-pool-35jcv Container image "registry.developers.crunchydata.com/crunchydata/pgo-apiserver:centos7-4.5.0" already present on machine
Warning BackOff 63s (x4 over 67s) kubelet, test-dev-pool-35jcv Back-off restarting failed container
Any ideas?
Edit: Solved! I was specifying the default storage incorrectly. The proper edits are
- name: BACKREST_STORAGE
  value: "digitalocean"
- name: BACKUP_STORAGE
  value: "digitalocean"
- name: PRIMARY_STORAGE
  value: "digitalocean"
- name: REPLICA_STORAGE
  value: "digitalocean"
- name: STORAGE5_NAME
  value: "digitalocean"
- name: STORAGE5_ACCESS_MODE
  value: "ReadWriteOnce"
- name: STORAGE5_SIZE
  value: "1Gi"
- name: STORAGE5_TYPE
  value: "dynamic"
- name: STORAGE5_CLASS
  value: "do-block-storage"
See this GitHub issue for how to correctly format the file for DO.

Ceph Rook deployment issue when mounting a PVC

I am Warren, trying to set up Ceph with Rook in my k8s environment. I followed the official document
https://rook.io/docs/rook/v1.4/ceph-quickstart.html. Almost everything looked fine during the Ceph setup. I also verified it with
ceph status
cluster:
id: 356efdf1-a1a7-4365-9ee6-b65ecf8481f9
health: HEALTH_OK
But it failed at the examples in https://rook.io/docs/rook/v1.4/ceph-block.html, trying to use block storage in the k8s environment. My k8s version is v1.18.2.
After deploying MySQL and WordPress I found the error below on the pod. I also checked the PV and PVC; all of them were created successfully and are bound, so I think something is wrong with mount compatibility. Please help.
-----------------------------------------------------
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler running "VolumeBinding" filter plugin for pod "wordpress-mysql-764fc64f97-qwtjd": pod has unbound immediate PersistentVolumeClaims
Warning FailedScheduling <unknown> default-scheduler running "VolumeBinding" filter plugin for pod "wordpress-mysql-764fc64f97-qwtjd": pod has unbound immediate PersistentVolumeClaims
Normal Scheduled <unknown> default-scheduler Successfully assigned default/wordpress-mysql-764fc64f97-qwtjd to master1
Normal SuccessfulAttachVolume 7m14s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-dc8567bb-c2e3-44a4-a56a-c74616059db4"
Warning FailedMount 5m11s kubelet, master1 Unable to attach or mount volumes: unmounted volumes=[mysql-persistent-storage], unattached volumes=[default-token-czg9j mysql-persistent-storage]: timed out waiting for the condition
Warning FailedMount 40s (x2 over 2m54s) kubelet, master1 Unable to attach or mount volumes: unmounted volumes=[mysql-persistent-storage], unattached volumes=[mysql-persistent-storage default-token-czg9j]: timed out waiting for the condition
Warning FailedMount 6s (x4 over 6m6s) kubelet, master1 MountVolume.MountDevice failed for volume "pvc-dc8567bb-c2e3-44a4-a56a-c74616059db4" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 110) occurred while running rbd args: [--id csi-rbd-node -m 10.109.63.94:6789,10.96.135.241:6789,10.110.131.193:6789 --keyfile=***stripped*** map replicapool/csi-vol-5ccc546b-0914-11eb-9135-62dece6c0d98 --device-type krbd], rbd error output: rbd: sysfs write failed
-------------------------------------------------

Pull an Image from a Private Registry fails - ImagePullBackOff

On our K8s worker node, I created a "secret" with the command below to pull images from our private (Nexus) registry.
kubectl create secret docker-registry regcred --docker-server=https://nexus-server/nexus/ --docker-username=admin --docker-password=password --docker-email=user@company.com
Then I created my-private-reg-pod.yaml on the K8s worker node; it contains the following:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: private-reg-container
      image: nexus-server:4546/ubuntu-16:version-1
  imagePullSecrets:
    - name: regcred
I created the pod with the command below:
kubectl create -f my-private-reg-pod.yaml
kubectl get pods
NAME READY STATUS RESTARTS AGE
test-pod 0/1 ImagePullBackOff 0 27m
kubectl describe pod test-pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/test-pod to k8s-worker01
Warning Failed 26m (x6 over 28m) kubelet, k8s-worker01 Error: ImagePullBackOff
Normal Pulling 26m (x4 over 28m) kubelet, k8s-worker01 Pulling image "sonatype:4546/ubuntu-16:version-1"
Warning Failed 26m (x4 over 28m) kubelet, k8s-worker01 Failed to pull image "nexus-server:4546/ubuntu-16:version-1": rpc error: code = Unknown desc = Error response from daemon: Get https://nexus-server.domain.com/nexus/v2/ubuntu-16/manifests/ver-1: no basic auth credentials
Warning Failed 26m (x4 over 28m) kubelet, k8s-worker01 Error: ErrImagePull
Normal BackOff 3m9s (x111 over 28m) kubelet, k8s-worker01 Back-off pulling image "nexus-server:4546/ubuntu-16:version-1"
On the terminal, the Nexus login works:
docker login nexus-server:4546
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
Is there something I am missing here?
Since my docker login to Nexus succeeded on the terminal, I deleted my secret and recreated it with kubectl create secret generic regcred --from-file=.dockerconfigjson=<path/to/.docker/config.json> --type=kubernetes.io/dockerconfigjson, and it worked.
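For reference, the sequence was roughly as follows; the old secret has to be deleted first, and the config.json path is the one docker login reported in the warning above:
kubectl delete secret regcred
kubectl create secret generic regcred \
  --from-file=.dockerconfigjson=/root/.docker/config.json \
  --type=kubernetes.io/dockerconfigjson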