ITNOA
I try to creating redis enterprise cluster with redis operator.
For declaration of my cluster I write something like below
apiVersion: "app.redislabs.com/v1"
kind: "RedisEnterpriseCluster"
metadata:
name: "harbor-cluster"
spec:
nodes: 3
persistentSpec:
enabled: false
redisEnterpriseNodeResources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 1000m
memory: 1Gi
But my problem is even I set presistentSpec to false, I see kubectl describe pvc redis-enterprise-storage-harbor-cluster-0 show redis try to claim pv and my bootstrapping of my pods is failed.
Name: redis-enterprise-storage-harbor-cluster-0
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app=redis-enterprise
redis.io/cluster=harbor-cluster
redis.io/role=node
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: harbor-cluster-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 108s (x1321 over 5h31m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
If I run kubectl get pods you can see harbor-cluster-0 does not ready (because bootstrapping of redis pod is failed)
NAME READY STATUS RESTARTS AGE
harbor-cluster-0 1/2 Running 0 72s
harbor-cluster-services-rigger-557b6f75c8-hgfzj 1/1 Running 0 73s
redis-enterprise-operator-7f8d8548c5-qvd48 2/2 Running 0 6h16m
my question is how to resolve it?
Posting comment as the community wiki answer for better visibility
Is it possible that you had previously created a Redis Enterprise Cluster with the same name before? I am thinking the PVC could be from a previous run. Can you check if the PVC is older than the REC by comparing their creation timestamp?
Related
I am simply trying to deploy this Grafana app as-is, no changes to the YAML have been made: https://grafana.com/docs/grafana/latest/setup-grafana/installation/kubernetes/
VMs are Ubuntu 20.04 LTS. The Kubernetes cluster is made up of the Control-Plane/Mstr & 3x Worker nodes:
root#k8s-master:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 35d v1.24.2
k8s-worker1 Ready worker 4h24m v1.24.2
k8s-worker2 Ready worker 4h24m v1.24.2
k8s-worker3 Ready worker 4h24m v1.24.2v
Other K8s Pods such as NGINX run without issue.
However, the Grafana pod cannot start and is stuck in a Pending state:
root#k8s-master:~# kubectl create -f grafana.yaml
persistentvolumeclaim/grafana-pvc created
deployment.apps/grafana created
service/grafana created
# time passed here...
root#k8s-master:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
grafana-9bd5bbd6b-k7ljz 0/1 Pending 0 3h39m
Troubleshooting this, I found there is an issue with the storage PersistentVolumeClaim (the pvc):
root#k8s-master:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
grafana-pvc Pending 2m22s
root#k8s-master:~#
root#k8s-master:~# kubectl describe pvc grafana-pvc
Name: grafana-pvc
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: <none>
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: grafana-9bd5bbd6b-k7ljz
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 6s (x11 over 2m30s) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
UPDATE:
I created a StorageClass and set it as default:
root#k8s-master:~# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
generic (default) no-provisioner Delete Immediate false 19m
I also created a PersistentVolume:
root#k8s-master:~# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
task-pv-volume 10Gi RWO Retain Released default/task-pv-claim manual 12m
However, now when I try to deploy the Grafana PVC it is still stuck - why?
root#k8s-master:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
grafana-pvc Pending generic 4m16s
root#k8s-master:~# kubectl describe pvc grafana-pvc
Name: grafana-pvc
Namespace: default
StorageClass: generic
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: no-provisioner
volume.kubernetes.io/storage-provisioner: no-provisioner
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: grafana-9bd5bbd6b-mmqs6
grafana-9bd5bbd6b-pvhtm
grafana-9bd5bbd6b-rtwgj
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 12s (x19 over 4m27s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "no-provisioner" or manually created by system administrator
I have tried creating a Grafana configuration file from the documentation, and was able to create successfully. The pod has a Running state, also the PVC(PersistentVolumeClaim) shows the Storage class as standard.
The below is the output of PVC:
$ kubectl describe pvc grafana-pvc
Name: grafana-pvc
Namespace: default
StorageClass: standard
Status: Bound
Volume: pvc-ee20cc5d-6ca5-4075-b5f3-d1a6323a5241
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: grafana-75789d79d4-wbgtv
Events: <none>
But in your use case the StorageClass field is showing as empty. So, try deleting the existing one and recreate the Grafana configuration file. If you were not able to create and are still facing the same error message which is “no persistent volumes available for this claim and no storage class is set” then you will have to create PV(PersistentVolume).
Because, your error says, "Your PVC hasn't found a matching PV and you also haven't mentioned any storageClass name". After you create the PersistentVolumeClaim, the Kubernetes control plane looks for a PersistentVolume that satisfies the claim's requirements. If the control plane finds a suitable PersistentVolume with the same StorageClass, it binds the claim to the volume.
In order to resolve your issue you will need to create a StorageClass with no-provisioner and then create a PV(PersistentVolume) by defining this storageClassName. Then you have to create PVC and Pod/Deployment .
Refer to stackpost1 and stackpost2 for more information.
We deployed OpenSearch using Kubernetes according documentation instructions on 3 nodes cluster (https://opensearch.org/docs/latest/opensearch/install/helm/) , after deployment pods are on Pending state and when checking it, we see following msg:
"
persistentvolume-controller no persistent volumes available for this claim and no storage class is set
"
Can you please advise what could be wrong in our OpenSearch/Kubernetes deployment or what can be missing from configuration perspective?
sharing some info:
Cluster nodes:
[root#I***-M1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
ir***-m1 Ready control-plane,master 4h34m v1.23.4
ir***-w1 Ready 3h41m v1.23.4
ir***-w2 Ready 3h19m v1.23.4
Pods State:
[root#I****1 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
opensearch-cluster-master-0 0/1 Pending 0 80m
opensearch-cluster-master-1 0/1 Pending 0 80m
opensearch-cluster-master-2 0/1 Pending 0 80m
[root#I****M1 ~]# kubectl describe pvc
Name: opensearch-cluster-master-opensearch-cluster-master-0
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app.kubernetes.io/instance=my-deployment
app.kubernetes.io/name=opensearch
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: opensearch-cluster-master-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 2m24s (x18125 over 3d3h) persistentvolume-controller **no persistent
volumes available for this claim and no storage class is set**
.....
[root#IR****M1 ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM
POLICY STATUS CLAIM STORAGECLASS REASON AGE
opensearch-cluster-master-opensearch-cluster-master-0 30Gi RWO Retain Available manual 6h24m
opensearch-cluster-master-opensearch-cluster-master-1 30Gi RWO Retain Available manual 6h22m
opensearch-cluster-master-opensearch-cluster-master-2 30Gi RWO Retain Available manual 6h23m
task-pv-volume 60Gi RWO Retain Available manual 7h48m
[root#I****M1 ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
opensearch-cluster-master-opensearch-cluster-master-0 Pending 3d3h
opensearch-cluster-master-opensearch-cluster-master-1 Pending 3d3h
opensearch-cluster-master-opensearch-cluster-master-2 Pending 3d3h
...no storage class is set...
Try upgrade your deployment with storage class, presumed you run on AWS EKS: helm upgrade my-deployment opensearch/opensearch --set persistence.storageClass=gp2
If you are running on GKE, change gp2 to standard. On AKS change to default.
i've looked through several solutions but couldnt find an answer, so i'm trying to run a stateful set on the cluster, but the pod fails to run because of unbound claim. I'm running t2.large machines with Bottlerocket host types.
kubectl get events
28m Warning FailedScheduling pod/carabbitmq-0 pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
28m Normal Scheduled pod/carabbitmq-0 Successfully assigned default/carabbitmq-0 to ip-x.compute.internal
28m Normal SuccessfulAttachVolume pod/carabbitmq-0 AttachVolume.Attach succeeded for volume "pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7"
28m Normal Pulled pod/carabbitmq-0 Container image "busybox:1.30.1" already present on machine
kubectl get pv,pvc + describe
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-carabbitmq-0 Bound pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7 30Gi RWO gp2 12m
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7 30Gi RWO Retain Bound rabbitmq/data-carabbitmq-0 gp2 12m
describe pv:
Name: pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7
Labels: failure-domain.beta.kubernetes.io/region=eu-west-1
failure-domain.beta.kubernetes.io/zone=eu-west-1b
Annotations: kubernetes.io/createdby: aws-ebs-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pv-protection]
StorageClass: gp2
Status: Bound
Claim: rabbitmq/data-carabbitmq-0
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 30Gi
Node Affinity:
Required Terms:
Term 0: failure-domain.beta.kubernetes.io/zone in [eu-west-1b]
failure-domain.beta.kubernetes.io/region in [eu-west-1]
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: aws://eu-west-1b/vol-xx
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
describe pvc:
Name: data-carabbitmq-0
Namespace: rabbitmq
StorageClass: gp2
Status: Bound
Volume: pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7
Labels: app=rabbitmq-ha
release=rabbit-mq
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 30Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: carabbitmq-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ProvisioningSucceeded 36m persistentvolume-controller Successfully provisioned volume pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7 using kubernetes.io/aws-ebs
The storage type is gp2.
Name: gp2
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/aws-ebs
Parameters: encrypted=true,type=gp2
AllowVolumeExpansion: <unset>
MountOptions:
debug
ReclaimPolicy: Retain
VolumeBindingMode: Immediate
Events: <none>
I'm not sure what i'm missing, same configuration used to work until i switched to "t" type of EC2s
So, it was weird but i had some readiness probe that failed its healthchecks, i thought that it was because the volume was not mounted well.
The healthcheck basically did some request to localhost, which it had issues on (not sure why) - changing to 127.0.0.1 made the check pass, and then the volume error disappeard.
So - if you have this weird issue (volumes were mounted, but you still get that error) - check the pod's probes.
I deployed the Datadog agent using the Datadog Helm chart which deploys a Daemonset in Kubernetes. However when checking the state of the Daemonset I saw it was not creating all pods:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
datadog-agent-datadog 5 2 2 2 2 <none> 1h
When describing the Daemonset to figure out what was going wrong I saw it did not have enough resources:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedPlacement 42s (x6 over 42s) daemonset-controller failed to place pod on "ip-10-0-1-124.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
Warning FailedPlacement 42s (x6 over 42s) daemonset-controller failed to place pod on "<ip>": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
Warning FailedPlacement 42s (x5 over 42s) daemonset-controller failed to place pod on "<ip>": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000
Warning FailedPlacement 42s (x7 over 42s) daemonset-controller failed to place pod on "<ip>": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000
Normal SuccessfulCreate 42s daemonset-controller Created pod: datadog-agent-7b2kp
However, I have the Cluster-autoscaler installed in the cluster and configured properly (It does trigger on regular Pod deployments that do not have enough resources to schedule), but it does not seem to trigger on the Daemonset:
I0424 14:14:48.545689 1 static_autoscaler.go:273] No schedulable pods
I0424 14:14:48.545700 1 static_autoscaler.go:280] No unschedulable pods
The AutoScalingGroup has enough nodes left:
Did I miss something in the configuration of the Cluster-autoscaler? What can I do to make sure it triggers on Daemonset resources as well?
Edit:
Describe of the Daemonset
Name: datadog-agent
Selector: app=datadog-agent
Node-Selector: <none>
Labels: app=datadog-agent
chart=datadog-1.27.2
heritage=Tiller
release=datadog-agent
Annotations: deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 5
Current Number of Nodes Scheduled: 2
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status: 2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=datadog-agent
Annotations: checksum/autoconf-config: 38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
checksum/checksd-config: 38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
checksum/confd-config: 38e0b9de817f645c4bec37c0d4a3e58baecccb040f5718dc069a72c7385a0bed
Service Account: datadog-agent
Containers:
datadog:
Image: datadog/agent:6.10.1
Port: 8125/UDP
Host Port: 0/UDP
Limits:
cpu: 200m
memory: 256Mi
Requests:
cpu: 200m
memory: 256Mi
Liveness: http-get http://:5555/health delay=15s timeout=5s period=15s #success=1 #failure=6
Environment:
DD_API_KEY: <set to the key 'api-key' in secret 'datadog-secret'> Optional: false
DD_LOG_LEVEL: INFO
KUBERNETES: yes
DD_KUBERNETES_KUBELET_HOST: (v1:status.hostIP)
DD_HEALTH_PORT: 5555
Mounts:
/host/proc from procdir (ro)
/host/sys/fs/cgroup from cgroups (ro)
/var/run/docker.sock from runtimesocket (ro)
/var/run/s6 from s6-run (rw)
Volumes:
runtimesocket:
Type: HostPath (bare host directory volume)
Path: /var/run/docker.sock
HostPathType:
procdir:
Type: HostPath (bare host directory volume)
Path: /proc
HostPathType:
cgroups:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
s6-run:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedPlacement 33m (x6 over 33m) daemonset-controller failed to place pod on "ip-10-0-2-144.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
Normal SuccessfulCreate 33m daemonset-controller Created pod: datadog-agent-7b2kp
Warning FailedPlacement 16m (x25 over 33m) daemonset-controller failed to place pod on "ip-10-0-1-124.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1810, capacity: 2000
Warning FailedPlacement 16m (x25 over 33m) daemonset-controller failed to place pod on "ip-10-0-2-174.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000
Warning FailedPlacement 16m (x25 over 33m) daemonset-controller failed to place pod on "ip-10-0-3-250.eu-west-1.compute.internal": Node didn't have enough resource: cpu, requested: 200, used: 1860, capacity: 2000
You can add priorityClassName to point to a high priority PriorityClass to your DaemonSet. Kubernetes will then remove other pods in order to run the DaemonSet's pods. If that results in unschedulable pods, cluster-autoscaler should add a node to schedule them on.
See the docs (Most examples based on that) (For some pre-1.14 versions, the apiVersion is likely a beta (1.11-1.13) or alpha version (1.8 - 1.10) instead)
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000
globalDefault: false
description: "High priority class for essential pods"
Apply it to your workload
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: datadog-agent
spec:
template:
metadata:
labels:
app: datadog-agent
name: datadog-agent
spec:
priorityClassName: high-priority
serviceAccountName: datadog-agent
containers:
- image: datadog/agent:latest
############ Rest of template goes here
You should understand how cluster autoscaler works. It is responsible only for adding or removing nodes. It is not responsible for creating or destroying pods. So in your case cluster autoscaler is not doing anything because it's useless. Even if you add one more node - there will be still a requirement to run DaemonSet pods on nodes where is not enough CPU. That's why it is not adding nodes.
What you should do is to manually remove some pods from occupied nodes. Then it will be able to schedule DaemonSet pods.
Alternatively you can reduce CPU requests of Datadog to, for example, 100m or 50m. This should be enough to start those pods.
I'm experiencing a strange problem using k8s 1.3.2 on GCE. I have a 100GB disk set up, and a valid (and Bound) PersistentVolume. However, my PersistentVolumeClaim is showing up with a capacity of 0, even though its status is Bound, and the pod that is trying to use it is stuck in ContainerCreating.
Hopefully the outputs from kubectl below summarise the problem:
$ gcloud compute disks list
NAME ZONE SIZE_GB TYPE STATUS
disk100-001 europe-west1-d 100 pd-standard READY
gke-unrest-micro-pool-199acc6c-3p31 europe-west1-d 100 pd-standard READY
gke-unrest-micro-pool-199acc6c-4q55 europe-west1-d 100 pd-standard READY
$ kubectl get pv
NAME CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
pv-disk100-001 100Gi RWO Bound default/graphite-statsd-claim 2m
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
graphite-statsd-claim Bound pv-disk100-001 0 3m
$ kubectl describe pvc
Name: graphite-statsd-claim
Namespace: default
Status: Bound
Volume: pv-disk100-001
Labels: <none>
Capacity: 0
Access Modes:
$ kubectl describe pv
Name: pv-disk100-001
Labels: <none>
Status: Bound
Claim: default/graphite-statsd-claim
Reclaim Policy: Recycle
Access Modes: RWO
Capacity: 100Gi
Message:
Source:
Type: GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
PDName: disk100-001
FSType: ext4
Partition: 0
ReadOnly: false
# Events for pod that is supposed to mount this volume:
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
6h 1m 183 {kubelet gke-unrest-micro-pool-199acc6c-4q55} Warning FailedMount Unable to mount volumes for pod "graphite-statsd-1873928417-i05ef_default(bf9fa0e5-4d8e-11e6-881c-42010af001fe)": timeout expired waiting for volumes to attach/mount for pod "graphite-statsd-1873928417-i05ef"/"default". list of unattached/unmounted volumes=[graphite-data]
6h 1m 183 {kubelet gke-unrest-micro-pool-199acc6c-4q55} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "graphite-statsd-1873928417-i05ef"/"default". list of unattached/unmounted volumes=[graphite-data]
# Extract from deploy yaml file:
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-disk100-001
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
gcePersistentDisk:
pdName: disk100-001
fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: graphite-statsd-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
---
Any help gratefully received!
Dan, the first issue "PVC capacity 0" looks like a bug. I opened https://github.com/kubernetes/kubernetes/issues/29425 you can track it there.
The second issue sounds like https://github.com/kubernetes/kubernetes/issues/29166 which is currently under investigation. Feel free to add your repro information on there with your logs, and I'll take a look.