Prometheus server in pending state after installation using Helm - kubernetes

I am new to k8s and trying to setup prometheus monitoring for k8s. I used
"helm install" to setup prometheus. Now:
two pods are still in pending state:
prometheus-server
prometheus-alertmanager
I manually created persistent volume for both
Can anyone help me with how to map these PV with PVC created by helm chart?
[centos#k8smaster1 ~]$ kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
prometheus-alertmanager-7757d759b8-x6bd7 0/2 Pending 0 44m
prometheus-kube-state-metrics-7f85b5d86c-cq9kr 1/1 Running 0 44m
prometheus-node-exporter-5rz2k 1/1 Running 0 44m
prometheus-pushgateway-5b8465d455-672d2 1/1 Running 0 44m
prometheus-server-7f8b5fc64b-w626v 0/2 Pending 0 44m
[centos#k8smaster1 ~]$ kubectl get pv
prometheus-alertmanager 3Gi RWX Retain Available 22m
prometheus-server 12Gi RWX Retain Available 30m
[centos#k8smaster1 ~]$ kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-alertmanager Pending 20m
prometheus-server Pending 20m
[centos#k8smaster1 ~]$ kubectl describe pvc prometheus-alertmanager -n monitoring
Name: prometheus-alertmanager
Namespace: monitoring
StorageClass:
Status: Pending
Volume:
Labels: app=prometheus
chart=prometheus-8.15.0
component=alertmanager
heritage=Tiller
release=prometheus
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 116s (x83 over 22m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
Mounted By: prometheus-alertmanager-7757d759b8-x6bd7
I am expecting the pods to get into running state
!!!UPDATE!!!
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-alertmanager Pending local-storage 4m29s
prometheus-server Pending local-storage 4m29s
[centos#k8smaster1 prometheus_pv_storage]$ kubectl describe pvc prometheus-server -n monitoring
Name: prometheus-server
Namespace: monitoring
StorageClass: local-storage
Status: Pending
Volume:
Labels: app=prometheus
chart=prometheus-8.15.0
component=server
heritage=Tiller
release=prometheus
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 11s (x22 over 4m59s) persistentvolume-controller waiting for first consumer to be created before binding
Mounted By: prometheus-server-7f8b5fc64b-bqf42
!!UPDATE-2!!
[centos#k8smaster1 ~]$ kubectl get pods prometheus-server-7f8b5fc64b-bqf42 -n monitoring -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2019-08-18T16:10:54Z"
generateName: prometheus-server-7f8b5fc64b-
labels:
app: prometheus
chart: prometheus-8.15.0
component: server
heritage: Tiller
pod-template-hash: 7f8b5fc64b
release: prometheus
name: prometheus-server-7f8b5fc64b-bqf42
namespace: monitoring
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: prometheus-server-7f8b5fc64b
uid: c1979bcb-c1d2-11e9-819d-fa163ebb8452
resourceVersion: "2461054"
selfLink: /api/v1/namespaces/monitoring/pods/prometheus-server-7f8b5fc64b-bqf42
uid: c19890d1-c1d2-11e9-819d-fa163ebb8452
spec:
containers:
- args:
- --volume-dir=/etc/config
- --webhook-url=http://127.0.0.1:9090/-/reload
image: jimmidyson/configmap-reload:v0.2.2
imagePullPolicy: IfNotPresent
name: prometheus-server-configmap-reload
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-server-token-7h2df
readOnly: true
- args:
- --storage.tsdb.retention.time=15d
- --config.file=/etc/config/prometheus.yml
- --storage.tsdb.path=/data
- --web.console.libraries=/etc/prometheus/console_libraries
- --web.console.templates=/etc/prometheus/consoles
- --web.enable-lifecycle
image: prom/prometheus:v2.11.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /-/healthy
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
name: prometheus-server
ports:
- containerPort: 9090
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /-/ready
port: 9090
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 30
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: config-volume
- mountPath: /data
name: storage-volume
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: prometheus-server-token-7h2df
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 65534
runAsGroup: 65534
runAsNonRoot: true
runAsUser: 65534
serviceAccount: prometheus-server
serviceAccountName: prometheus-server
terminationGracePeriodSeconds: 300
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- configMap:
defaultMode: 420
name: prometheus-server
name: config-volume
- name: storage-volume
persistentVolumeClaim:
claimName: prometheus-server
- name: prometheus-server-token-7h2df
secret:
defaultMode: 420
secretName: prometheus-server-token-7h2df
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-08-18T16:10:54Z"
message: '0/2 nodes are available: 1 node(s) didn''t find available persistent
volumes to bind, 1 node(s) had taints that the pod didn''t tolerate.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: BestEffort
Also I have the volumes created and assigned to local storage
[centos#k8smaster1 prometheus_pv]$ kubectl get pv -n monitoring
prometheus-alertmanager 3Gi RWX Retain Available local-storage 2d19h
prometheus-server 12Gi RWX Retain Available local-storage 2d19h

If you are in EKS, your node need to have the next permission
arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
and the Amazon EBS CSI Driver Add-on

Prometheus will try to create PersiatentVolumeClaims with accessModes as ReadWriteOnce, PVC will get matched to PersistentVolume only if accessmodes are same. Change your accessmode of PV to ReadWriteOnce, it should work.

Related

Kubernetes Pod is unable to mount volumes to GCP Filestore

I am new to Kubernetes, and as a part of tutorial I have spun up a GKE cluster and a GCP Filestore instance.
Now I am trying to mount Grafana's volume to this Filestore instance. However, it is getting timed out. I am unable to decipher where the mistake lies. I need your help in addressing the issue.
PFB the output.
C:\Users\ak>kubectl describe pod/grafana-7c666cff94-vkgh4
Name: grafana-7c666cff94-vkgh4
Namespace: bc
Priority: 0
Node: gke-bc-gke-cluster-bc-nodepool-9496e187-zsnw/10.51.0.5
Start Time: Fri, 02 Sep 2022 16:21:28 +0530
Labels: app=grafana
pod-template-hash=7c666cff94
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/grafana-7c666cff94
Containers:
grafana:
Container ID:
Image: grafana/grafana:8.4.4
Image ID:
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 250m
memory: 750Mi
Liveness: tcp-socket :3000 delay=30s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:3000/robots.txt delay=10s timeout=2s period=30s #success=1 #failure=3
Environment: <none>
Mounts:
/var/lib/grafana from fileserver (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v7qjd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
fileserver:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: fileserver-claim
ReadOnly: false
kube-api-access-v7qjd:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned bluecopa/grafana-7c666cff94-vkgh4 to gke-bc-gke-cluster-bc-nodepool-9496e187-zsnw
Warning FailedMount 4m15s (x11 over 40m) kubelet MountVolume.SetUp failed for volume "fileserver" : mount failed: exit status 1
Mounting command: /home/kubernetes/containerized_mounter/mounter
Mounting arguments: mount -t nfs 10.168.189.130:/bc_fs /var/lib/kubelet/pods/cf44b980-7461-4c0e-a32f-673588160692/volumes/kubernetes.io~nfs/fileserver
Output: Mount failed: mount failed: exit status 32
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs xx.xx.xx.xx:/bc_fs /var/lib/kubelet/pods/cf44b980-7461-4c0e-a32f-673588160692/volumes/kubernetes.io~nfs/fileserver]
Output: mount.nfs: Connection timed out
Warning FailedMount 3m16s (x12 over 37m) kubelet Unable to attach or mount volumes: unmounted volumes=[fileserver], unattached volumes=[fileserver kube-api-access-v7qjd]: timed out waiting for the condition
Warning FailedMount 59s (x7 over 41m) kubelet Unable to attach or mount volumes: unmounted volumes=[fileserver], unattached volumes=[kube-api-access-v7qjd fileserver]: timed out waiting for the condition
PV.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: fileserver
namespace: bluecopa
spec:
capacity:
storage: 200Gi
accessModes:
- ReadWriteMany
nfs:
path: /bc_fs
server: xx.xx.xx.xx
PVC.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: fileserver-claim
namespace: bluecopa
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
volumeName: fileserver
resources:
requests:
storage: 100Gi
Deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: grafana
name: grafana
namespace: bluecopa
spec:
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
securityContext:
fsGroup: 472
supplementalGroups:
- 0
containers:
- name: grafana
image: grafana/grafana:8.4.4
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3000
name: http-grafana
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /robots.txt
port: 3000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 2
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 3000
timeoutSeconds: 1
resources:
requests:
cpu: 250m
memory: 750Mi
volumeMounts:
- mountPath: /var/lib/grafana
name: fileserver
volumes:
- name: fileserver
persistentVolumeClaim:
claimName: fileserver-claim
While using the volume mounts in pods we need to watch out for security context
Use the securitycontext as follows in deployment file
securityContext:
runAsUser: 0
Use the following security context in the deployment file
This will help you out to mount the volume without any issues.
For more information check out this documents
Doc1 &
Doc2
Here is the output of deployment pod

"No appropriate python interpreter found. command terminated with exit code 1"

I am trying to perform replication in Cassandra. The cassandra nodes are setup in kubernetes. I am running it locally. I am just a beginner. I am unable to run cqlsh on the cassandra pods.
The configuration file for the pods is:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: cassandra
labels:
app: cassandra
spec:
serviceName: cassandra
replicas: 3
selector:
matchLabels:
app: cassandra
template:
metadata:
labels:
app: cassandra
spec:
terminationGracePeriodSeconds: 1800
containers:
- name: cassandra
image: gcr.io/google-samples/cassandra:v14
imagePullPolicy: Always
ports:
- containerPort: 7000
name: intra-node
- containerPort: 7001
name: tls-intra-node
- containerPort: 7199
name: jmx
- containerPort: 9042
name: cql
resources:
limits:
cpu: "100m"
memory: 1Gi
requests:
cpu: "100m"
memory: 1Gi
securityContext:
capabilities:
add:
- IPC_LOCK
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- nodetool drain
env:
- name: MAX_HEAP_SIZE
value: 512M
- name: HEAP_NEWSIZE
value: 100M
- name: CASSANDRA_SEEDS
value: "cassandra-0.cassandra.default.svc.cluster.local"
- name: CASSANDRA_CLUSTER_NAME
value: "K8Demo"
- name: CASSANDRA_DC
value: "DC1-K8Demo"
- name: CASSANDRA_RACK
value: "Rack1-K8Demo"
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
# readinessProbe:
# exec:
# command:
# - /bin/bash
# - -c
# - /ready-probe.sh
# initialDelaySeconds: 20
# timeoutSeconds: 10
# These volume mounts are persistent. They are like inline claims,
# but not exactly because the names need to match exactly one of
# the stateful pod volumes.
volumeMounts:
- name: cassandra-data-storage
mountPath: "/mnt/data"
volumes:
- name: cassandra-data-storage
persistentVolumeClaim:
claimName: cassandra-claim
# These are converted to volume claims by the controller
# and mounted at the paths mentioned above.
# do not use these in production until ssd GCEPersistentDisk or other ssd pd
# volumeClaimTemplates:
# - metadata:
# name: cassandra-claim
# spec:
# storageClassName: localstorage
# accessModes: ["ReadWriteOnce"]
# resources:
# requests:
# storage: 1Gi
# selector:
# matchLabels:
# type: local
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: cassandra-data-volume
labels:
type: local
spec:
storageClassName: local-storage
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: cassandra-claim
spec:
storageClassName: local-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
Details of pods running & the error:
ubuntu#ubuntu-VirtualBox:~/cassandra$ k get all
NAME READY STATUS RESTARTS AGE
pod/nginx-sample-9456bbbf9-9v78w 1/1 Running 2 (40m ago) 18h
pod/nginx-sample-9456bbbf9-4kdq5 1/1 Running 2 (40m ago) 18h
pod/nginx-sample-9456bbbf9-4nc8d 1/1 Running 2 (40m ago) 18h
pod/cassandra-1 1/1 Running 0 12m
pod/cassandra-2 1/1 Running 0 12m
pod/cassandra-0 1/1 Running 0 12m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 25d
service/cassandra ClusterIP None <none> 9042/TCP 170m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-sample 3/3 3 3 18h
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-sample-9456bbbf9 3 3 3 18h
NAME READY AGE
statefulset.apps/cassandra 3/3 12m
ubuntu#ubuntu-VirtualBox:~/cassandra$ kubectl exec --stdin --tty cassandra-0 -- cqlsh
No appropriate python interpreter found.
command terminated with exit code 1
ubuntu#ubuntu-VirtualBox:~/cassandra$ kubectl exec --stdin --tty cassandra-0 -- /bin/bash
root#cassandra-0:/# cqlsh
No appropriate python interpreter found.
root#cassandra-0:/#

Kubernetes use the same volumeMount in initContainer and Container

I am trying to get a volume mounted as a non-root user in one of my containers. I'm trying an approach from this SO post using an initContainer to set the correct user, but when I try to start the configuration I get an "unbound immediate PersistentVolumneClaims" error. I suspect it's because the volume is mounted in both my initContainer and container, but I'm not sure why that would be the issue: I can see the initContainer taking the claim, but I would have thought when it exited that it would release it, letting the normal container take the claim. Any ideas or alternatives to getting the directory mounted as a non-root user? I did try using securityContext/fsGroup, but that seemed to have no effect. The /var/rdf4j directory below is the one that is being mounted as root.
Configuration:
apiVersion: v1
kind: PersistentVolume
metadata:
name: triplestore-data-storage-dir
labels:
type: local
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
storageClassName: local-storage
volumeMode: Filesystem
persistentVolumeReclaimPolicy: Delete
hostPath:
path: /run/desktop/mnt/host/d/workdir/k8s-data/triplestore
type: DirectoryOrCreate
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: triplestore-data-storage
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 1Gi
storageClassName: local-storage
volumeName: "triplestore-data-storage-dir"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: triplestore
labels:
app: demo
role: triplestore
spec:
selector:
matchLabels:
app: demo
role: triplestore
replicas: 1
template:
metadata:
labels:
app: demo
role: triplestore
spec:
containers:
- name: triplestore
image: eclipse/rdf4j-workbench:amd64-3.5.0
imagePullPolicy: Always
ports:
- name: http
protocol: TCP
containerPort: 8080
resources:
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: storage
mountPath: /var/rdf4j
initContainers:
- name: take-data-dir-ownership
image: eclipse/rdf4j-workbench:amd64-3.5.0
command:
- chown
- -R
- 100:65533
- /var/rdf4j
volumeMounts:
- name: storage
mountPath: /var/rdf4j
volumes:
- name: storage
persistentVolumeClaim:
claimName: "triplestore-data-storage"
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
triplestore-data-storage Bound triplestore-data-storage-dir 10Gi RWX local-storage 13s
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
triplestore-data-storage-dir 10Gi RWX Delete Bound default/triplestore-data-storage local-storage 17s
kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
21s Warning FailedScheduling pod/triplestore-6d6876f49-2s84c 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
19s Normal Scheduled pod/triplestore-6d6876f49-2s84c Successfully assigned default/triplestore-6d6876f49-2s84c to docker-desktop
3s Normal Pulled pod/triplestore-6d6876f49-2s84c Container image "eclipse/rdf4j-workbench:amd64-3.5.0" already present on machine
3s Normal Created pod/triplestore-6d6876f49-2s84c Created container take-data-dir-ownership
3s Normal Started pod/triplestore-6d6876f49-2s84c Started container take-data-dir-ownership
2s Warning BackOff pod/triplestore-6d6876f49-2s84c Back-off restarting failed container
46m Normal Pulled pod/triplestore-6d6876f49-9n5kt Container image "eclipse/rdf4j-workbench:amd64-3.5.0" already present on machine
79s Warning BackOff pod/triplestore-6d6876f49-9n5kt Back-off restarting failed container
21s Normal SuccessfulCreate replicaset/triplestore-6d6876f49 Created pod: triplestore-6d6876f49-2s84c
21s Normal ScalingReplicaSet deployment/triplestore Scaled up replica set triplestore-6d6876f49 to 1
kubectl describe pods/triplestore-6d6876f49-tw8r8
Name: triplestore-6d6876f49-tw8r8
Namespace: default
Priority: 0
Node: docker-desktop/192.168.65.4
Start Time: Mon, 17 Jan 2022 10:17:20 -0500
Labels: app=demo
pod-template-hash=6d6876f49
role=triplestore
Annotations: <none>
Status: Pending
IP: 10.1.2.133
IPs:
IP: 10.1.2.133
Controlled By: ReplicaSet/triplestore-6d6876f49
Init Containers:
take-data-dir-ownership:
Container ID: docker://89e7b1e3ae76c30180ee5083624e1bf5f30b55fd95bf1c24422fabe41ae74408
Image: eclipse/rdf4j-workbench:amd64-3.5.0
Image ID: docker-pullable://registry.com/publicrepos/docker_cache/eclipse/rdf4j-workbench#sha256:14621ad610b0d0269dedd9939ea535348cc6c147f9bd47ba2039488b456118ed
Port: <none>
Host Port: <none>
Command:
chown
-R
100:65533
/var/rdf4j
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 17 Jan 2022 10:22:59 -0500
Finished: Mon, 17 Jan 2022 10:22:59 -0500
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/rdf4j from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s8wdv (ro)
Containers:
triplestore:
Container ID:
Image: eclipse/rdf4j-workbench:amd64-3.5.0
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Environment: <none>
Mounts:
/var/rdf4j from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s8wdv (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: triplestore-data-storage
ReadOnly: false
kube-api-access-s8wdv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m24s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 6m13s default-scheduler Successfully assigned default/triplestore-6d6876f49-tw8r8 to docker-desktop
Normal Pulled 4m42s (x5 over 6m12s) kubelet Container image "eclipse/rdf4j-workbench:amd64-3.5.0" already present on machine
Normal Created 4m42s (x5 over 6m12s) kubelet Created container take-data-dir-ownership
Normal Started 4m42s (x5 over 6m12s) kubelet Started container take-data-dir-ownership
Warning BackOff 70s (x26 over 6m10s) kubelet Back-off restarting failed container
Solution
As it turns out the problem was that the initContainer wasn't running as root, it was running as the default user of the container, and so didn't have the permissions to run the chown command. In the linked SO comment, this was the first comment to the answer, with the response being that the initContainer ran as root - this has apparently changed in newer versions of kubernetes. There is a solution though, you can set the securityContext on the container to run as root, giving it permission to run the chown command, and that successfully allows the volume to be mounted as a non-root user. Here's the final configuration of the initContainer.
initContainers:
- name: take-data-dir-ownership
image: eclipse/rdf4j-workbench:amd64-3.5.0
securityContext:
runAsUser: 0
command:
- chown
- -R
- 100:65533
- /var/rdf4j
volumeMounts:
- name: storage
mountPath: /var/rdf4j
1 pod has unbound immediate PersistentVolumeClaims. - this error means the pod cannot bound to the PVC on the node where it has been scheduled to run on. This can happen when the PVC bounded to a PV that refers to a location that is not valid on the node that the pod is scheduled to run on. It will be helpful if you can post the complete output of kubectl get nodes -o wide, kubectl describe pvc triplestore-data-storage, kubectl describe pv triplestore-data-storage-dir to the question.
The mean time, PVC/PV is optional when using hostPath, can you try the following spec and see if the pod can come online:
apiVersion: apps/v1
kind: Deployment
metadata:
name: triplestore
labels:
app: demo
role: triplestore
spec:
selector:
matchLabels:
app: demo
role: triplestore
replicas: 1
template:
metadata:
labels:
app: demo
role: triplestore
spec:
containers:
- name: triplestore
image: eclipse/rdf4j-workbench:amd64-3.5.0
imagePullPolicy: IfNotPresent
ports:
- name: http
protocol: TCP
containerPort: 8080
resources:
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: storage
mountPath: /var/rdf4j
initContainers:
- name: take-data-dir-ownership
image: eclipse/rdf4j-workbench:amd64-3.5.0
imagePullPolicy: IfNotPresent
securityContext:
runAsUser: 0
command:
- chown
- -R
- 100:65533
- /var/rdf4j
volumeMounts:
- name: storage
mountPath: /var/rdf4j
volumes:
- name: storage
hostPath:
path: /run/desktop/mnt/host/d/workdir/k8s-data/triplestore
type: DirectoryOrCreate

GKE Pod not scheduled in different namespace

I want to deploy a monstache deployment in my already existing namespace "test-namespace". When I deploy it in "default" namespace it works but when I deploy it in "test-namespace" the pod does not schedule.
kubectl get pods -n test-namespace monstache-74466dc7-5tnrr -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
monstache-74466dc7-5tnrr 0/1 Pending 0 57m <none> <none> <none> <none>
and:
kubectl describe pods -n test-namespace monstache-74466dc7-5tnrr
Name: monstache-74466dc7-5tnrr
Namespace: test-namespace
Priority: 0
Node: <none>
Labels: app=monstache
pod-template-hash=74466dc7
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/monstache-74466dc7
Containers:
monstache:
Image: rwynn/monstache:latest
Port: <none>
Host Port: <none>
Command:
/bin/monstache
-f
/app/monstache.test.config.toml
Environment:
MONSTACHE_DIRECT_READ_NS: xxx.XXX
MONSTACHE_CHANGE_STREAM_NS: xxx.XXX
MONSTACHE_MONGO_URL: mongodb://xxx?replicaSet=rs0
MONSTACHE_ES_USER: elastic
MONSTACHE_ES_PASS: XXX
Mounts:
/app from monstache-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qmcwm (ro)
Volumes:
monstache-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: monstache-config
Optional: false
default-token-qmcwm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qmcwm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
and:
kubectl get events -n test-namespace
LAST SEEN TYPE REASON OBJECT MESSAGE
55m Normal SuccessfulCreate replicaset/monstache-74466dc7 Created pod: monstache-74466dc7-snrdb
55m Normal SuccessfulCreate replicaset/monstache-74466dc7 Created pod: monstache-74466dc7-5tnrr
55m Normal ScalingReplicaSet deployment/monstache Scaled up replica set monstache-74466dc7 to 1
55m Normal ScalingReplicaSet deployment/monstache Scaled up replica set monstache-74466dc7 to 1
and:
This is my monstache deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: monstache
namespace: test-namespace
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: monstache
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: monstache
spec:
containers:
- command:
- /bin/monstache
- -f
- /app/monstache.test.config.toml
env:
- name: MONSTACHE_DIRECT_READ_NS
value: xxx.xxx
- name: MONSTACHE_CHANGE_STREAM_NS
value: xxx.xxx
- name: MONSTACHE_MONGO_URL
value: mongodb://mongodb-service:27017/xxx?replicaSet=rs0
- name: MONSTACHE_ES_USER
value: elastic
- name: MONSTACHE_ES_PASS
value: XXXX
image: rwynn/monstache:latest
imagePullPolicy: Always
name: monstache
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /app
name: monstache-config
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- configMap:
defaultMode: 420
name: monstache-config
name: monstache-config
---
apiVersion: v1
data:
monstache.test.config.toml: |
resume = true
gzip = true
elasticsearch-urls = ["https://elasticsearch:9200"]
elasticsearch-max-conns = 10
elasticsearch-max-seconds = 1
elasticsearch-max-docs = 1
#namespace-regex = '*'
verbose = false
enable-http-server = true
elasticsearch-validate-pem-file = false
[[mapping]]
namespace = "XXX.XXX"
index = "XXX"
kind: ConfigMap
metadata:
name: monstache-config
namespace: test-namespace
Few more Things to know:
I already have pods scheduled in that namespace
I tried to delete the deployment an re-create
I even created a new nodepool and tried to schedule the deployment there - also didn't work.
I searched for a pod count limit and pod quota, and it does not conflict.
I have 12 namespaces in that GKE cluster
I have total 113 pods in that GKE cluster
I have some successfully scheduled monstache deployments in other namespaces in that cluster.
It happens in the 2 most recent namespaces I've created.
Any clues?
It was a bug. Re-deploying it with GKE version 1.17.14-gke.1200 solved the problem.

kubernetes Pod's readinessProbe errored but endpoint not removed from Service

I'm running Spinnaker on Kubernetes 1.10.111. One of the Spinnaker services is a Pod running a service called Clouddriver. This Pod was running fine, but then the readinessProbe started erroring continuously. Kubernetes docs say
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod.
— https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
But this Pod's IP is still in the Service's endpoints. Why?
Clouddriver Pod YAML
kubectl -n spinnaker-test get pods spin-clouddriver-5559d44484-mp8q9 -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/psp: spotify.backend-service
creationTimestamp: 2019-02-15T20:46:38Z
generateName: spin-clouddriver-5559d44484-
labels:
app: spin
app.kubernetes.io/managed-by: halyard
app.kubernetes.io/name: clouddriver
app.kubernetes.io/part-of: spinnaker
app.kubernetes.io/version: 1.12.1
cluster: spin-clouddriver
pod-template-hash: "1115800040"
name: spin-clouddriver-5559d44484-mp8q9
namespace: spinnaker-test
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: spin-clouddriver-5559d44484
uid: ce79561c-3161-11e9-acdf-42010a800082
resourceVersion: "53541277"
selfLink: /api/v1/namespaces/spinnaker-test/pods/spin-clouddriver-5559d44484-mp8q9
uid: caa66d7c-3162-11e9-acdf-42010a800082
spec:
containers:
- env:
- name: JAVA_OPTS
value: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
- name: SPRING_PROFILES_ACTIVE
value: local
image: gcr.io/spinnaker-marketplace/clouddriver:4.3.1-20190130095322
imagePullPolicy: IfNotPresent
lifecycle: {}
name: clouddriver
ports:
- containerPort: 7002
protocol: TCP
readinessProbe:
exec:
command:
- wget
- --no-check-certificate
- --spider
- -q
- http://localhost:7002/health
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: "20"
memory: 5000Mi
requests:
cpu: "20"
memory: 5000Mi
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /opt/spinnaker/config
name: spin-clouddriver-files-1952526246
- mountPath: /home/halyard/.hal/k8s-spinnaker/staging/dependencies
name: spin-clouddriver-files-1757773194
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-w2lt5
readOnly: true
dnsPolicy: ClusterFirst
nodeName: gke-production-us-ce-terraform-201812-d63606d6-9vq9
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 720
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: spin-clouddriver-files-1952526246
secret:
defaultMode: 420
secretName: spin-clouddriver-files-1952526246
- name: spin-clouddriver-files-1757773194
secret:
defaultMode: 420
secretName: spin-clouddriver-files-1757773194
- name: default-token-w2lt5
secret:
defaultMode: 420
secretName: default-token-w2lt5
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-02-15T20:46:38Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-02-15T20:53:40Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-02-15T20:46:38Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://3509b48511b1ea7bc97812cb82831c559d9410cb9eaaa26b4f492d881603fb31
image: gcr.io/spinnaker-marketplace/clouddriver:4.3.1-20190130095322
imageID: docker-pullable://gcr.io/spinnaker-marketplace/clouddriver#sha256:466228b97b8c4a61a0270c53ae4c397eb04bc3661bc4f1ee9ef4d5fce70d187d
lastState: {}
name: clouddriver
ready: true
restartCount: 0
state:
running:
startedAt: 2019-02-15T20:47:26Z
hostIP: 10.178.32.98
phase: Running
podIP: 10.179.34.24
qosClass: Guaranteed
startTime: 2019-02-15T20:46:38Z
Describing the Pod shows the readinessProbe has been continuously erroring for over a day.
kubectl -n spinnaker-test describe pods spin-clouddriver-5559d44484-mp8q9
Name: spin-clouddriver-5559d44484-mp8q9
Namespace: spinnaker-test
Node: gke-production-us-ce-terraform-201812-d63606d6-9vq9/10.178.32.98
Start Time: Fri, 15 Feb 2019 15:46:38 -0500
Labels: app=spin
app.kubernetes.io/managed-by=halyard
app.kubernetes.io/name=clouddriver
app.kubernetes.io/part-of=spinnaker
app.kubernetes.io/version=1.12.1
cluster=spin-clouddriver
pod-template-hash=1115800040
Annotations: kubernetes.io/psp=spotify.backend-service
Status: Running
IP: 10.179.34.24
Controlled By: ReplicaSet/spin-clouddriver-5559d44484
Containers:
clouddriver:
Container ID: docker://3509b48511b1ea7bc97812cb82831c559d9410cb9eaaa26b4f492d881603fb31
Image: gcr.io/spinnaker-marketplace/clouddriver:4.3.1-20190130095322
Image ID: docker-pullable://gcr.io/spinnaker-marketplace/clouddriver#sha256:466228b97b8c4a61a0270c53ae4c397eb04bc3661bc4f1ee9ef4d5fce70d187d
Port: 7002/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 15 Feb 2019 15:47:26 -0500
Ready: True
Restart Count: 0
Limits:
cpu: 20
memory: 5000Mi
Requests:
cpu: 20
memory: 5000Mi
Readiness: exec [wget --no-check-certificate --spider -q http://localhost:7002/health] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
JAVA_OPTS: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2
SPRING_PROFILES_ACTIVE: local
Mounts:
/home/halyard/.hal/k8s-spinnaker/staging/dependencies from spin-clouddriver-files-1757773194 (rw)
/opt/spinnaker/config from spin-clouddriver-files-1952526246 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-w2lt5 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
spin-clouddriver-files-1952526246:
Type: Secret (a volume populated by a Secret)
SecretName: spin-clouddriver-files-1952526246
Optional: false
spin-clouddriver-files-1757773194:
Type: Secret (a volume populated by a Secret)
SecretName: spin-clouddriver-files-1757773194
Optional: false
default-token-w2lt5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-w2lt5
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 3m (x321 over 1d) kubelet, gke-production-us-ce-terraform-201812-d63606d6-9vq9 Readiness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
But Service still has the Pod's IP of 10.179.34.24 in its Endpoints.
kubectl -n spinnaker-test describe services spin-clouddriver
Name: spin-clouddriver
Namespace: spinnaker-test
Labels: app=spin
cluster=spin-clouddriver
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"spin","cluster":"spin-clouddriver"},"name":"spin-clouddriver","namesp...
Selector: app=spin,cluster=spin-clouddriver
Type: ClusterIP
IP: 10.178.65.100
Port: <unset> 7002/TCP
TargetPort: 7002/TCP
Endpoints: 10.179.34.24:7002
Session Affinity: None
Events: <none>
kubectl -n spinnaker-test describe endpoints spin-clouddriver
Name: spin-clouddriver
Namespace: spinnaker-test
Labels: app=spin
cluster=spin-clouddriver
Annotations: <none>
Subsets:
Addresses: 10.179.34.24
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
<unset> 7002 TCP
Events: <none>
footnotes
GKE 1.10.11-gke.1 to be exact, but the fact that it's GKE shouldn't matter.
A probe by the kubelet can end in one of three states:
successful
failed (command returned a non-0 exit code)
errored (command did not return before the timeout elapsed, the command does not exist inside the container, etc)
Here is the code (in 1.10.11) where the event probe errored is recorded. Note that err != nil.
Here is the code that calls the above function - when err != nil (the probe returned an error), the result is discarded.
Only probes that fail will actually cause the pod's ready state to be changed.