Pod restart very often after adding resources - kubernetes

Pod restart very often after adding resources.
Before that the resources were not added.
The pod doesn't restart at all, or maybe it only happens once or twice a day.
I'm not sure if resources affect health-check or not, so pod restarts very often.
apiVersion: apps/v1
kind: Deployment
metadata:
name: testservice-dpm
labels:
app: testservice-api
spec:
replicas: 1
selector:
matchLabels:
app: testservice-api
template:
metadata:
labels:
app: testservice-api
spec:
containers:
- name: testservice
image: testservice:v6.0.0
env:
- name: MSSQL_PORT
value: "1433"
resources:
limits:
cpu: 500m
memory: 1000Mi
requests:
cpu: 100m
memory: 100Mi
ports:
- containerPort: 80
name: test-p
volumeMounts:
- name: test-v
mountPath: /app/appsettings.json
subPath: appsettings.json
livenessProbe:
httpGet:
path: /api/ServiceHealth/CheckLiveness
port: 80
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 1
periodSeconds: 3
successThreshold: 1
failureThreshold: 1
readinessProbe:
httpGet:
path: /api/ServiceHealth/CheckReadiness
port: 80
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 1
periodSeconds: 3
successThreshold: 1
failureThreshold: 1
volumes:
- name: test-v
configMap:
name: testservice-config
Below are the results describe all of the testservice pods.
testservice-dpm-d7979cc69-rwxr4
(restart 7 times in 10 minutes and still Back-off restarting failed container now)
Name: testservice-dpm-d7979cc69-rwxr4
Namespace: testapi
Priority: 0
Node: workernode3/yyy.yyy.yy.yy
Start Time: Thu, 30 Dec 2021 12:48:50 +0700
Labels: app=testservice-api
pod-template-hash=d7979cc69
Annotations: kubectl.kubernetes.io/restartedAt: 2021-12-29T20:02:45Z
Status: Running
IP: xx.xxx.x.xxx
IPs:
IP: xx.xxx.x.xxx
Controlled By: ReplicaSet/testservice-dpm-d7979cc69
Containers:
testservice:
Container ID: docker://86a50f98b48bcf8bfa209a478c1127e998e36c1c7bcece71599f50feabb89834
Image: testservice:v6.0.0
Image ID: docker-pullable://testservice#sha256:57a3955d07febf4636eeda1bc6a18468aacf66e883d7f6d8d3fdcb5163724a84
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 30 Dec 2021 12:55:13 +0700
Finished: Thu, 30 Dec 2021 12:55:19 +0700
Ready: False
Restart Count: 7
Limits:
cpu: 500m
memory: 1000Mi
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://:80/api/ServiceHealth/CheckLiveness delay=3s timeout=1s period=3s #success=1 #failure=1
Readiness: http-get http://:80/api/ServiceHealth/CheckReadiness delay=3s timeout=1s period=3s #success=1 #failure=1
Environment:
MSSQL_PORT: 1433
Mounts:
/app/appsettings.json from authen-v (rw,path="appsettings.json")
/etc/localtime from tz-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fd9bt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
authen-v:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: testservice-config
Optional: false
tz-config:
Type: HostPath (bare host directory volume)
Path: /usr/share/zoneinfo/Asia/Bangkok
HostPathType: File
kube-api-access-fd9bt:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned testapi/testservice-dpm-d7979cc69-rwxr4 to workernode3
Warning Unhealthy 11m (x2 over 11m) kubelet Readiness probe failed: Get "http://xx.xxx.x.xxx:80/api/ServiceHealth/CheckReadiness": dial tcp xx.xxx.x.xxx:80: connect: connection refused
Warning Unhealthy 11m (x3 over 11m) kubelet Readiness probe failed: Get "http://xx.xxx.x.xxx:80/api/ServiceHealth/CheckReadiness": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 11m (x3 over 11m) kubelet Liveness probe failed: Get "http://xx.xxx.x.xxx:80/api/ServiceHealth/CheckLiveness": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal Killing 11m (x3 over 11m) kubelet Container testservice failed liveness probe, will be restarted
Normal Created 10m (x4 over 11m) kubelet Created container testservice
Normal Started 10m (x4 over 11m) kubelet Started container testservice
Normal Pulled 10m (x4 over 11m) kubelet Container image "testservice:v6.0.0" already present on machine
Warning BackOff 80s (x51 over 11m) kubelet Back-off restarting failed container
testservice-dpm-d7979cc69-7nq28
(restart 4 times in 10 minutes and running now)
Name: testservice-dpm-d7979cc69-7nq28
Namespace: testapi
Priority: 0
Node: workernode3/yyy.yyy.yy.yy
Start Time: Thu, 30 Dec 2021 12:47:37 +0700
Labels: app=testservice-api
pod-template-hash=d7979cc69
Annotations: kubectl.kubernetes.io/restartedAt: 2021-12-29T20:02:45Z
Status: Running
IP: xx.xxx.x.xxx
IPs:
IP: xx.xxx.x.xxx
Controlled By: ReplicaSet/testservice-dpm-d7979cc69
Containers:
testservice:
Container ID: docker://03739fc1694370abda202ba56928b46fb5f3ef7545f527c2dd73764e55f725cd
Image: testservice:v6.0.0
Image ID: docker-pullable://testservice#sha256:57a3955d07febf4636eeda1bc6a18468aacf66e883d7f6d8d3fdcb5163724a84
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Thu, 30 Dec 2021 12:48:44 +0700
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 30 Dec 2021 12:48:10 +0700
Finished: Thu, 30 Dec 2021 12:48:14 +0700
Ready: True
Restart Count: 4
Limits:
cpu: 500m
memory: 1000Mi
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://:80/api/ServiceHealth/CheckLiveness delay=3s timeout=1s period=3s #success=1 #failure=1
Readiness: http-get http://:80/api/ServiceHealth/CheckReadiness delay=3s timeout=1s period=3s #success=1 #failure=1
Environment:
MSSQL_PORT: 1433
Mounts:
/app/appsettings.json from authen-v (rw,path="appsettings.json")
/etc/localtime from tz-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-slz4b (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
authen-v:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: testservice-config
Optional: false
tz-config:
Type: HostPath (bare host directory volume)
Path: /usr/share/zoneinfo/Asia/Bangkok
HostPathType: File
kube-api-access-slz4b:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned testapi/testservice-dpm-d7979cc69-7nq28 to workernode3
Warning Unhealthy 14m (x2 over 14m) kubelet Readiness probe failed: Get "http://xx.xxx.x.xxx:80/api/ServiceHealth/CheckReadiness": dial tcp xx.xxx.x.xxx:80: connect: connection refused
Warning Unhealthy 14m (x3 over 14m) kubelet Readiness probe failed: Get "http://xx.xxx.x.xxx:80/api/ServiceHealth/CheckReadiness": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 14m (x3 over 14m) kubelet Liveness probe failed: Get "http://xx.xxx.x.xxx:80/api/ServiceHealth/CheckLiveness": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal Killing 14m (x3 over 14m) kubelet Container testservice failed liveness probe, will be restarted
Warning BackOff 14m (x2 over 14m) kubelet Back-off restarting failed container
Normal Started 14m (x4 over 14m) kubelet Started container testservice
Normal Pulled 14m (x4 over 14m) kubelet Container image "testservice:v6.0.0" already present on machine
Normal Created 14m (x4 over 14m) kubelet Created container testservice
testservice-dpm-d7979cc69-z566c
(no restart in 10 minutes and running now)
Name: testservice-dpm-d7979cc69-z566c
Namespace: testapi
Priority: 0
Node: workernode3/yyy.yyy.yy.yy
Start Time: Thu, 30 Dec 2021 12:47:30 +0700
Labels: app=testservice-api
pod-template-hash=d7979cc69
Annotations: kubectl.kubernetes.io/restartedAt: 2021-12-29T20:02:45Z
Status: Running
IP: xx.xxx.x.xxx
IPs:
IP: xx.xxx.x.xxx
Controlled By: ReplicaSet/testservice-dpm-d7979cc69
Containers:
testservice:
Container ID: docker://19c3a672cd8453e1c5526454ffb0fbdec67fa5b17d6d8166fae38930319ed247
Image: testservice:v6.0.0
Image ID: docker-pullable://testservice#sha256:57a3955d07febf4636eeda1bc6a18468aacf66e883d7f6d8d3fdcb5163724a84
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Thu, 30 Dec 2021 12:47:31 +0700
Ready: True
Restart Count: 0
Limits:
cpu: 500m
memory: 1000Mi
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get http://:80/api/ServiceHealth/CheckLiveness delay=3s timeout=1s period=3s #success=1 #failure=1
Readiness: http-get http://:80/api/ServiceHealth/CheckReadiness delay=3s timeout=1s period=3s #success=1 #failure=1
Environment:
MSSQL_PORT: 1433
Mounts:
/app/appsettings.json from authen-v (rw,path="appsettings.json")
/etc/localtime from tz-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cpdnc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
authen-v:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: testservice-config
Optional: false
tz-config:
Type: HostPath (bare host directory volume)
Path: /usr/share/zoneinfo/Asia/Bangkok
HostPathType: File
kube-api-access-cpdnc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned testapi/testservice-dpm-d7979cc69-z566c to workernode3
Normal Pulled 16m kubelet Container image "testservice:v6.0.0" already present on machine
Normal Created 16m kubelet Created container testservice
Normal Started 16m kubelet Started container testservice

Related

k8s readiness probes working in GKE, not in Microk8s (on MacOS)

I have a Kong deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
name: local-test-kong
labels:
app: local-test-kong
spec:
replicas: 1
selector:
matchLabels:
app: local-test-kong
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
labels:
app: local-test-kong
spec:
automountServiceAccountToken: false
containers:
- envFrom:
- configMapRef:
name: kong-env-vars
image: kong:2.6
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- /bin/sleep 15 && kong quit
livenessProbe:
failureThreshold: 3
httpGet:
path: /status
port: status
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: proxy
ports:
- containerPort: 8000
name: proxy
protocol: TCP
- containerPort: 8100
name: status
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /status
port: status
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources: # ToDo
limits:
cpu: 256m
memory: 256Mi
requests:
cpu: 256m
memory: 256Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /kong_prefix/
name: kong-prefix-dir
- mountPath: /tmp
name: tmp-dir
- mountPath: /kong_dbless/
name: kong-custom-dbless-config-volume
terminationGracePeriodSeconds: 30
volumes:
- name: kong-prefix-dir
- name: tmp-dir
- configMap:
defaultMode: 0555
name: kong-declarative
name: kong-custom-dbless-config-volume
I applied this YAML in GKE. Then i ran kubectl describe on its pod.
➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
local-test-kong-678598ffc6-ll9s8 1/1 Running 0 25m
➜ kubectl describe pod/local-test-kong-678598ffc6-ll9s8
Name: local-test-kong-678598ffc6-ll9s8
Namespace: local-test-kong
Priority: 0
Node: gke-paas-cluster-prd-tf9-default-pool-e7cb502a-ggxl/10.128.64.95
Start Time: Wed, 23 Nov 2022 00:12:56 +0800
Labels: app=local-test-kong
pod-template-hash=678598ffc6
Annotations: kubectl.kubernetes.io/restartedAt: 2022-11-23T00:12:56+08:00
Status: Running
IP: 10.128.96.104
IPs:
IP: 10.128.96.104
Controlled By: ReplicaSet/local-test-kong-678598ffc6
Containers:
proxy:
Container ID: containerd://1bd392488cfe33dcc62f717b3b8831349e8cf573326add846c9c843c7bf15e2a
Image: kong:2.6
Image ID: docker.io/library/kong#sha256:62eb6d17133b007cbf5831b39197c669b8700c55283270395b876d1ecfd69a70
Ports: 8000/TCP, 8100/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 23 Nov 2022 00:12:58 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 256m
memory: 256Mi
Requests:
cpu: 256m
memory: 256Mi
Liveness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment Variables from:
kong-env-vars ConfigMap Optional: false
Environment: <none>
Mounts:
/kong_dbless/ from kong-custom-dbless-config-volume (rw)
/kong_prefix/ from kong-prefix-dir (rw)
/tmp from tmp-dir (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kong-prefix-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kong-custom-dbless-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kong-declarative
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned local-test-kong/local-test-kong-678598ffc6-ll9s8 to gke-paas-cluster-prd-tf9-default-pool-e7cb502a-ggxl
Normal Pulled 25m kubelet Container image "kong:2.6" already present on machine
Normal Created 25m kubelet Created container proxy
Normal Started 25m kubelet Started container proxy
➜
I applied the same YAML in my localhost's MicroK8S (on MacOS) and then I ran kubectl describe on its pod.
➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
local-test-kong-54cfc585cb-7grj8 1/1 Running 0 86s
➜ kubectl describe pod/local-test-kong-54cfc585cb-7grj8
Name: local-test-kong-54cfc585cb-7grj8
Namespace: local-test-kong
Priority: 0
Node: microk8s-vm/192.168.64.5
Start Time: Wed, 23 Nov 2022 00:39:33 +0800
Labels: app=local-test-kong
pod-template-hash=54cfc585cb
Annotations: cni.projectcalico.org/podIP: 10.1.254.79/32
cni.projectcalico.org/podIPs: 10.1.254.79/32
kubectl.kubernetes.io/restartedAt: 2022-11-23T00:39:33+08:00
Status: Running
IP: 10.1.254.79
IPs:
IP: 10.1.254.79
Controlled By: ReplicaSet/local-test-kong-54cfc585cb
Containers:
proxy:
Container ID: containerd://d60d09ca8b77ee59c80ea060dcb651c3e346c3a5f0147b0d061790c52193d93d
Image: kong:2.6
Image ID: docker.io/library/kong#sha256:62eb6d17133b007cbf5831b39197c669b8700c55283270395b876d1ecfd69a70
Ports: 8000/TCP, 8100/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 23 Nov 2022 00:39:37 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 256m
memory: 256Mi
Requests:
cpu: 256m
memory: 256Mi
Liveness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment Variables from:
kong-env-vars ConfigMap Optional: false
Environment: <none>
Mounts:
/kong_dbless/ from kong-custom-dbless-config-volume (rw)
/kong_prefix/ from kong-prefix-dir (rw)
/tmp from tmp-dir (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kong-prefix-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kong-custom-dbless-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kong-declarative
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 92s default-scheduler Successfully assigned local-test-kong/local-test-kong-54cfc585cb-7grj8 to microk8s-vm
Normal Pulled 90s kubelet Container image "kong:2.6" already present on machine
Normal Created 90s kubelet Created container proxy
Normal Started 89s kubelet Started container proxy
Warning Unhealthy 68s kubelet Readiness probe failed: Get "http://10.1.254.79:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 68s kubelet Liveness probe failed: Get "http://10.1.254.79:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
➜
It's the exact same deployment YAML. However, the deployment created inside GKE cluster are running all fine with no complaints. But, the deployment created inside my localhost microk8s (on MacOS) is showing probe failures.
What could i be missing here while deploying to microk8s (on MacOS)?
Your readiness probes are failing on the local pod on port 8100. It looks like you have a firewall(s) rule preventing internal pod and/or pod to pod communication.
As per the docs:
You may need to configure your firewall to allow pod-to-pod and pod-to-internet communication:
sudo ufw allow in on cni0 && sudo ufw allow out on cni0
sudo ufw default allow routed

K3S multiple Kube-System pods not running with Unknown Status, including DNS pods

Environmental Info:
K3s Version:
k3s version v1.24.3+k3s1 (990ba0e8)
go version go1.18.1
Node(s) CPU architecture, OS, and Version:
Five RPI 4s Running Headless 64-bit Raspbian, each with following information
Linux 5.15.56-v8+ #1575 SMP PREEMPT Fri Jul 22 20:31:26 BST 2022 aarch64 GNU/Linux
Cluster Configuration:
3 Nodes configured as control plane, 2 Nodes as Worker Nodes
Describe the bug:
The Pods: coredns-b96499967-ktgtc, local-path-provisioner-7b7dc8d6f5-5cfds, metrics-server-668d979685-9szb9, traefik-7cd4fcff68-gfmhm, and svclb-traefik-aa9f6b38-j27sw are at status unknown, with 0/1 pods ready. What this means is that the Cluster DNS service does not work and therefore that pods not are not able to resolve internal or external names
Steps To Reproduce:
Installed K3s in HA mode using following instructions: https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
Expected behavior:
The Important pods should be running, wit known status. Additionally, DNS should work, which means that, among other things headless services should work, and pods should be able to resolve hostnames inside and outside the cluster
Actual behavior:
DNS Pods Should be running with a known state, Pods should be able to resolve hostnames inside and outside the cluster, and headless services should be able to work
Additional context / logs:
kubectl -n kube-system get configmap coredns -o go-template={{.data.Corefile}}
.:53 {
errors
health
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
}
hosts /etc/coredns/NodeHosts {
ttl 60
reload 15s
fallthrough
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
import /etc/coredns/custom/*.server
Description of Relevant Pods:
kubectl describe pods --namespace=kube-system
Name: coredns-b96499967-ktgtc
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=kube-dns
pod-template-hash=b96499967
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-b96499967
Containers:
coredns:
Container ID: containerd://1a83a59275abdb7b783aa06eb56cb1e5367c1ca196598851c2b7d5154c0a4bb9
Image: rancher/mirrored-coredns-coredns:1.9.1
Image ID: docker.io/rancher/mirrored-coredns-coredns#sha256:35e38f3165a19cb18c65d83334c13d61db6b24905f45640aa8c2d2a6f55ebcb0
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/etc/coredns/custom from custom-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zbbxf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
custom-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns-custom
Optional: true
kube-api-access-zbbxf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x419 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11421 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m24s (x139 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Name: metrics-server-668d979685-9szb9
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:09:38 +0100
Labels: k8s-app=metrics-server
pod-template-hash=668d979685
Annotations: <none>
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/metrics-server-668d979685
Containers:
metrics-server:
Container ID: containerd://cd02643f7d7bc78ea98abdec20558626cfac39f70e1127b2281342dd00905e44
Image: rancher/mirrored-metrics-server:v0.5.2
Image ID: docker.io/rancher/mirrored-metrics-server#sha256:48ecad4fe641a09fa4459f93c7ad29d4916f6b9cf7e934d548f1d8eff96e2f35
Port: 4443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-djqgk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-djqgk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x418 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11427 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m27s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
Name: traefik-7cd4fcff68-gfmhm
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master0/192.168.0.68
Start Time: Fri, 05 Aug 2022 16:10:43 +0100
Labels: app.kubernetes.io/instance=traefik
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=traefik
helm.sh/chart=traefik-10.19.300
pod-template-hash=7cd4fcff68
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 9100
prometheus.io/scrape: true
Status: Running
IP:
IPs: <none>
Controlled By: ReplicaSet/traefik-7cd4fcff68
Containers:
traefik:
Container ID: containerd://779a1596fb204a7577acda97e9fb3f4c5728cf1655071d8e5faad6a8d407d217
Image: rancher/mirrored-library-traefik:2.6.2
Image ID: docker.io/rancher/mirrored-library-traefik#sha256:ad2226527eea71b7591d5e9dcc0bffd0e71b2235420c34f358de6db6d529561f
Ports: 9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
--global.checknewversion
--global.sendanonymoususage
--entrypoints.metrics.address=:9100/tcp
--entrypoints.traefik.address=:9000/tcp
--entrypoints.web.address=:8000/tcp
--entrypoints.websecure.address=:8443/tcp
--api.dashboard=true
--ping=true
--metrics.prometheus=true
--metrics.prometheus.entrypoint=metrics
--providers.kubernetescrd
--providers.kubernetesingress
--providers.kubernetesingress.ingressendpoint.publishedservice=kube-system/traefik
--entrypoints.websecure.http.tls=true
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Fri, 05 Aug 2022 19:19:19 +0100
Finished: Fri, 05 Aug 2022 19:20:29 +0100
Ready: False
Restart Count: 8
Liveness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=3
Readiness: http-get http://:9000/ping delay=10s timeout=2s period=10s #success=1 #failure=1
Environment: <none>
Mounts:
/data from data (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jw4qc (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-jw4qc:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 41d (x415 over 41d) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 64m (x11418 over 42h) kubelet Pod sandbox changed, it will be killed and re-created.
Normal SandboxChanged 2m30s (x141 over 32m) kubelet Pod sandbox changed, it will be killed and re-created.
The Solution that I found to resolve the problem - at least for now, was to manually restart all of the kube-system deployments found using the command deployments
kubectl get deployments --namespace=kube-system
If all of them are similarly not ready they can be restarted using the command
kubectl -n kube-system rollout restart <deployment>
Specifically, coredns, local-path-provisioner, metrics-server, and traefik deployments all needed to be restarted

Kubernetes ingress controller - Error: ImagePullBackOff

I'm unable to get the controller working. Tried many times and still I get Error: ImagePullBackOff.
Is there a alternative that I can try or any idea why its failing?
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.27.0/deploy/static/mandatory.yaml
kubectl describe pod nginx-ingress-controller-7fcb6cffc5-m8m5c -n ingress-nginx
Name: nginx-ingress-controller-7fcb6cffc5-m8m5c
Namespace: ingress-nginx
Priority: 0
Node: ip-10-0-0-244.ap-south-1.compute.internal/10.0.0.244
Start Time: Mon, 07 Dec 2020 08:21:13 -0500
Labels: app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
pod-template-hash=7fcb6cffc5
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu, memory request for container nginx-ingress-controller
kubernetes.io/psp: eks.privileged
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Pending
IP: 10.0.0.231
IPs:
IP: 10.0.0.231
Controlled By: ReplicaSet/nginx-ingress-controller-7fcb6cffc5
Containers:
nginx-ingress-controller:
Container ID:
Image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master
Image ID:
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--publish-service=$(POD_NAMESPACE)/ingress-nginx
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=0s timeout=10s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-7fcb6cffc5-m8m5c (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-xtnz9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-xtnz9:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-xtnz9
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19s default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-7fcb6cffc5-m8m5c to ip-10-0-0-244.ap-south-1.compute.internal
Normal Pulling 18s kubelet Pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master"
Warning Failed 3s kubelet Failed to pull image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 3s kubelet Error: ErrImagePull
Normal BackOff 3s kubelet Back-off pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master"
Warning Failed 3s kubelet Error: ImagePullBackOff
I had the same problem, with the ingress-nginx installation.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/cloud/deploy.yaml
For some reason it couldn't get the ingress-nginx-controller.
$ kubectl get pods --namespace=ingress-nginx
NAME READY STATUS RE
ingress-nginx-admission-create-6q4wx 0/1 Completed 0
ingress-nginx-admission-patch-fr5ct 0/1 Completed 1
ingress-nginx-controller-686556747b-dg68h 0/1 ImagePullBackOff 0
What I did was, I ran $ kubectl describe pod ingress-nginx-controller-686556747b-dg68h --namespace ingress-nginx
and got the following output:
Name: ingress-nginx-controller-686556747b-dg68h
Namespace: ingress-nginx
Priority: 0
Node: docker-desktop/x.x.x.x
Start Time: Wed, 11 May 2022 20:11:55 +0430
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
pod-template-hash=686556747b
Annotations: <none>
Status: Pending
IP: x.x.x.x
IPs:
IP: x.x.x.x
Controlled By: ReplicaSet/ingress-nginx-controller-686556747b
Containers:
controller:
Container ID:
Image: k8s.gcr.io/ingress-nginx/controller:v1.2.0#sha256:d819
Image ID:
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
--election-id=ingress-controller-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s perio
Readiness: http-get http://:10254/healthz delay=10s timeout=1s perio
Environment:
POD_NAME: ingress-nginx-controller-686556747b-dg68h (v1:metad
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
From Containers.controller.Image, I got the image name that kubernetes is trying to download but is unsuccessful to do so and tried to docker pull that image myself like so:
docker pull k8s.gcr.io/ingress-nginx/controller:v1.2.0#sha256:d819
Docker could pull the image successfully and after that everything worked just fine.
It's failing because kubernetes cannot download the specified image. Check the events section
Warning Failed 3s kubelet Failed to pull image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Maybe you dont have internet connectivity or this image does not exist. You can try running docker pull quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master from your computer
As mentioned by John, creating a nat router and nat config allowed docker images to be pulled when I was facing the same issue. If you create a vpc native GKE cluster which is private it by default has no access to the internet. Unless you deploy a NAT router.
gcloud compute routers create nat-router \
--network my-vpc \
--region us-east4
gcloud compute routers nats create nat-config \
--router-region us-east4 \
--router nat-router \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-service
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
rules:
- host: your host name
http:
paths:
- backend:
service:
name: your service name
port:
number: 3000
path: /api/?(.*)
pathType: Prefix
I think this YAML file solves your problem. I faced the same issue.

Volume is not mounted even when pvc is created

I am using helm chart for the installation of the application, the volume is not mounted. I am doing something wrong but not sure what is it. I am new to devops
values.yaml
persistence:
enabled: true
existingClaim: grafana-persistent-storage
mountPath: "/dev/grafana/"
pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-block-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Block
storageClassName: grafana-persistent-storage
resources:
requests:
storage: 10Gi
storageClass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: grafana-persistent-storage
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
iopsPerGB: "10"
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
- debug
volumeBindingMode: Immediate
PVC is creaed
kubectl --kubeconfig=<configfile> get pvc -n grafana
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
grafana-block-pvc Bound pvc-6dc39e0d-471e-11ea-b432-0a505018290a 10Gi RWO grafana-persistent-storage 10m
PV created too
pvc-6dc39e0d-471e-11ea-b432-0a505018290a 10Gi RWO Retain Bound grafana/grafana-block-pvc grafana-persistent-storage 10m
Kubectl describe pod - the description of the pod created.
Name: grafana1-v1-79fb988995-lnnl6
Namespace: grafana
Priority: 0
Node: ip-10-10-108-165.ap-southeast-1.compute.internal/10.10.108.165
Start Time: Tue, 04 Feb 2020 13:15:17 +0530
Labels: app.kubernetes.io/instance=grafana1
app.kubernetes.io/name=grafana1
pod-template-hash=79fb988995
Annotations: kubernetes.io/psp: eks.privileged
sidecar.istio.io/status:
{"version":"761ebc53976754715f22fcf548f05270fb4b8db07324894aebdb31fa81d960","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.10.127.38
IPs: <none>
Controlled By: ReplicaSet/grafana1-v1-79fb988995
Init Containers:
istio-init:
Container ID: docker://a95db52c5b45c8147fb6c6d0ce4013bef6d495752dc820565188032bc36926
Image: docker.io/istio/proxy_init:1.2.5
Image ID: docker-pullable://istio/proxy_init#sha256:c9964a8c28b85cc631bbc90390eac238c90f82c8f929495d1e9f9a9135b724
Port: <none>
Host Port: <none>
Args:
-p
15001
-u
1337
-m
REDIRECT
-i
*
-x
-b
3000
-d
15020
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 04 Feb 2020 13:15:18 +0530
Finished: Tue, 04 Feb 2020 13:15:19 +0530
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 10m
memory: 10Mi
Environment: <none>
Mounts: <none>
Containers:
grafana1:
Container ID: docker://92338e43bbf69a2c0919e81f5ae16948e6f7966353a3db52274a5a14902599
Image: grafana/grafana:latest
Image ID: docker-pullable://grafana/grafana#sha256:4319ca3e5592ee408f5842ce5b5955312549d89dc1572d2543f2f6d67ca619
Port: 3000/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 04 Feb 2020 13:15:23 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Environment:
GF_SECURITY_ADMIN_PASSWORD: deskera#reports
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-99rfk (ro)
istio-proxy:
Container ID: docker://21b965ec954474b3bcb941a20782f99642f002bb0e9a212aed20e19838c2f0
Image: docker.io/istio/proxyv2:1.2.5
Image ID: docker-pullable://istio/proxyv2#sha256:8f210c3d09beb6b8658a55d9ac30e25549295834a44083ed67d652ad7453e4
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
istio-proxy.grafana
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15010
--zipkinAddress
zipkin.istio-system:9411
--dnsRefreshRate
300s
--connectTimeout
10s
--proxyAdminPort
15000
--concurrency
2
--controlPlaneAuthPolicy
NONE
--statusPort
15020
--applicationPorts
3000
State: Running
Started: Tue, 04 Feb 2020 13:15:23 +0530
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
POD_NAME: grafana1-v1-79fb988995-lnnl6 (v1:metadata.name)
POD_NAMESPACE: grafana (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
ISTIO_META_POD_NAME: grafana1-v1-79fb988995-lnnl6 (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: grafana (v1:metadata.namespace)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_META_INCLUDE_INBOUND_PORTS: 3000
ISTIO_METAJSON_ANNOTATIONS: {"kubernetes.io/psp":"eks.privileged"}
ISTIO_METAJSON_LABELS: {"app.kubernetes.io/instance":"grafana1","app.kubernetes.io/name":"grafana1","pod-template-hash":"79fb988995"}
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-99rfk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-99rfk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-99rfk
Optional: false
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13m default-scheduler Successfully assigned grafana/grafana1-v1-79fb988995-lnnl6 to ip-10-10-108-165.ap-southeast-1.compute.internal
Normal Pulled 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Container image "docker.io/istio/proxy_init:1.2.5" already present on machine
Normal Created 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Created container istio-init
Normal Started 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Started container istio-init
Normal Pulling 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Pulling image "grafana/grafana:latest"
Normal Pulled 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Successfully pulled image "grafana/grafana:latest"
Normal Created 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Created container grafana1
Normal Started 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Started container grafana1
Normal Pulled 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Container image "docker.io/istio/proxyv2:1.2.5" already present on machine
Normal Created 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Created container istio-proxy
Normal Started 13m kubelet, ip-10-10-108-165.ap-southeast-1.compute.internal Started container istio-proxy
Please refer the describe part of the pod. The volume is still not mounted even after changing the existing claim to pvc
persistence:
enabled: true
existingClaim: grafana-block-pvc
mountPath: "/dev/grafana/"
Claim name should be grafana-block-pvc rather than grafana-persistent-storage in your values.yaml

NEG says Pods are 'unhealthy', but actually the Pods are healthy

I'm trying to apply gRPC load balancing with Ingress on GCP, and for this I referenced this example. The example shows gRPC load balancing is working by 2 ways(one with envoy side-car and the other one is HTTP mux, handling both gRPC/HTTP-health-check on same Pod.) However, the envoy proxy example doesn't work.
What makes me confused is, the Pods are running/healthy(confirmed by kubectl describe, kubectl logs)
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
fe-deployment-757ffcbd57-4w446 2/2 Running 0 4m22s
fe-deployment-757ffcbd57-xrrm9 2/2 Running 0 4m22s
$ kubectl describe pod fe-deployment-757ffcbd57-4w446
Name: fe-deployment-757ffcbd57-4w446
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc/10.128.0.64
Start Time: Thu, 26 Sep 2019 16:15:18 +0900
Labels: app=fe
pod-template-hash=757ffcbd57
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container fe-envoy; cpu request for container fe-container
Status: Running
IP: 10.56.1.29
Controlled By: ReplicaSet/fe-deployment-757ffcbd57
Containers:
fe-envoy:
Container ID: docker://b4789909494f7eeb8d3af66cb59168e009c582d412d8ca683a7f435559989421
Image: envoyproxy/envoy:latest
Image ID: docker-pullable://envoyproxy/envoy#sha256:9ef9c4fd6189fdb903929dc5aa0492a51d6783777de65e567382ac7d9a28106b
Port: 8080/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/envoy
Args:
-c
/data/config/envoy.yaml
State: Running
Started: Thu, 26 Sep 2019 16:15:19 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Liveness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data/certs from certs-volume (rw)
/data/config from envoy-config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
fe-container:
Container ID: docker://a533224d3ea8b5e4d5e268a616d73762b37df69f434342459f35caa8fac32dab
Image: salrashid123/grpc_only_backend
Image ID: docker-pullable://salrashid123/grpc_only_backend#sha256:ebfac594116445dd67aff7c9e7a619d73222b60947e46ef65ee6d918db3e1f4b
Port: 50051/TCP
Host Port: 0/TCP
Command:
/grpc_server
Args:
--grpcport
:50051
--insecure
State: Running
Started: Thu, 26 Sep 2019 16:15:20 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
certs-volume:
Type: Secret (a volume populated by a Secret)
SecretName: fe-secret
Optional: false
envoy-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: envoy-configmap
Optional: false
default-token-c7nqc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c7nqc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m25s default-scheduler Successfully assigned default/fe-deployment-757ffcbd57-4w446 to gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc
Normal Pulled 4m25s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Container image "envoyproxy/envoy:latest" already present on machine
Normal Created 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Created container
Normal Started 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Started container
Normal Pulling 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc pulling image "salrashid123/grpc_only_backend"
Normal Pulled 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Successfully pulled image "salrashid123/grpc_only_backend"
Normal Created 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Created container
Normal Started 4m23s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Started container
Warning Unhealthy 4m10s (x2 over 4m20s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 4m9s (x2 over 4m19s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Liveness probe failed: HTTP probe failed with statuscode: 503
$ kubectl describe pod fe-deployment-757ffcbd57-xrrm9
Name: fe-deployment-757ffcbd57-xrrm9
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9/10.128.0.22
Start Time: Thu, 26 Sep 2019 16:15:18 +0900
Labels: app=fe
pod-template-hash=757ffcbd57
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container fe-envoy; cpu request for container fe-container
Status: Running
IP: 10.56.0.23
Controlled By: ReplicaSet/fe-deployment-757ffcbd57
Containers:
fe-envoy:
Container ID: docker://255dd6cab1e681e30ccfe158f7d72540576788dbf6be60b703982a7ecbb310b1
Image: envoyproxy/envoy:latest
Image ID: docker-pullable://envoyproxy/envoy#sha256:9ef9c4fd6189fdb903929dc5aa0492a51d6783777de65e567382ac7d9a28106b
Port: 8080/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/envoy
Args:
-c
/data/config/envoy.yaml
State: Running
Started: Thu, 26 Sep 2019 16:15:19 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Liveness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data/certs from certs-volume (rw)
/data/config from envoy-config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
fe-container:
Container ID: docker://f6a0246129cc89da846c473daaa1c1770d2b5419b6015098b0d4f35782b0a9da
Image: salrashid123/grpc_only_backend
Image ID: docker-pullable://salrashid123/grpc_only_backend#sha256:ebfac594116445dd67aff7c9e7a619d73222b60947e46ef65ee6d918db3e1f4b
Port: 50051/TCP
Host Port: 0/TCP
Command:
/grpc_server
Args:
--grpcport
:50051
--insecure
State: Running
Started: Thu, 26 Sep 2019 16:15:20 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
certs-volume:
Type: Secret (a volume populated by a Secret)
SecretName: fe-secret
Optional: false
envoy-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: envoy-configmap
Optional: false
default-token-c7nqc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c7nqc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m8s default-scheduler Successfully assigned default/fe-deployment-757ffcbd57-xrrm9 to gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9
Normal Pulled 5m8s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Container image "envoyproxy/envoy:latest" already present on machine
Normal Created 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Created container
Normal Started 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Started container
Normal Pulling 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 pulling image "salrashid123/grpc_only_backend"
Normal Pulled 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Successfully pulled image "salrashid123/grpc_only_backend"
Normal Created 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Created container
Normal Started 5m6s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Started container
Warning Unhealthy 4m53s (x2 over 5m3s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 4m52s (x2 over 5m2s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Liveness probe failed: HTTP probe failed with statuscode: 503
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fe-srv-ingress NodePort 10.123.5.165 <none> 8080:30816/TCP 6m43s
fe-srv-lb LoadBalancer 10.123.15.36 35.224.69.60 50051:30592/TCP 6m42s
kubernetes ClusterIP 10.123.0.1 <none> 443/TCP 2d2h
$ kubectl describe service fe-srv-ingress
Name: fe-srv-ingress
Namespace: default
Labels: type=fe-srv
Annotations: cloud.google.com/neg: {"ingress": true}
cloud.google.com/neg-status:
{"network_endpoint_groups":{"8080":"k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2"},"zones":["us-central1-a"]}
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"cloud.google.com/neg":"{\"ingress\": true}","service.alpha.kubernetes.io/a...
service.alpha.kubernetes.io/app-protocols: {"fe":"HTTP2"}
Selector: app=fe
Type: NodePort
IP: 10.123.5.165
Port: fe 8080/TCP
TargetPort: 8080/TCP
NodePort: fe 30816/TCP
Endpoints: 10.56.0.23:8080,10.56.1.29:8080
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Create 6m47s neg-controller Created NEG "k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2" for default/fe-srv-ingress-8080/8080 in "us-central1-a".
Normal Attach 6m40s neg-controller Attach 2 network endpoint(s) (NEG "k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2" in zone "us-central1-a")
but NEG says they are unhealthy(so Ingress also says backend is unhealthy).
I couldn't found what caused this. Does anyone know how to solve this?
Test environment:
GKE, 1.13.7-gke.8 (VPC enabled)
Default HTTP(s) load balancer on Ingress
YAML files I used(same with the example previously mentioned),
envoy-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: envoy-configmap
labels:
app: fe
data:
config: |-
---
admin:
access_log_path: /dev/null
address:
socket_address:
address: 127.0.0.1
port_value: 9000
node:
cluster: service_greeter
id: test-id
static_resources:
listeners:
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 8080 }
filter_chains:
- filters:
- name: envoy.http_connection_manager
config:
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
path: "/echo.EchoServer/SayHello"
route: { cluster: local_grpc_endpoint }
http_filters:
- name: envoy.lua
config:
inline_code: |
package.path = "/etc/envoy/lua/?.lua;/usr/share/lua/5.1/nginx/?.lua;/etc/envoy/lua/" .. package.path
function envoy_on_request(request_handle)
if request_handle:headers():get(":path") == "/_ah/health" then
local headers, body = request_handle:httpCall(
"local_admin",
{
[":method"] = "GET",
[":path"] = "/clusters",
[":authority"] = "local_admin"
},"", 50)
str = "local_grpc_endpoint::127.0.0.1:50051::health_flags::healthy"
if string.match(body, str) then
request_handle:respond({[":status"] = "200"},"ok")
else
request_handle:logWarn("Envoy healthcheck failed")
request_handle:respond({[":status"] = "503"},"unavailable")
end
end
end
- name: envoy.router
typed_config: {}
tls_context:
common_tls_context:
tls_certificates:
- certificate_chain:
filename: "/data/certs/tls.crt"
private_key:
filename: "/data/certs/tls.key"
clusters:
- name: local_grpc_endpoint
connect_timeout: 0.05s
type: STATIC
http2_protocol_options: {}
lb_policy: ROUND_ROBIN
common_lb_config:
healthy_panic_threshold:
value: 50.0
health_checks:
- timeout: 1s
interval: 5s
interval_jitter: 1s
no_traffic_interval: 5s
unhealthy_threshold: 1
healthy_threshold: 3
grpc_health_check:
service_name: "echo.EchoServer"
authority: "server.domain.com"
hosts:
- socket_address:
address: 127.0.0.1
port_value: 50051
- name: local_admin
connect_timeout: 0.05s
type: STATIC
lb_policy: ROUND_ROBIN
hosts:
- socket_address:
address: 127.0.0.1
port_value: 9000
fe-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: fe-deployment
labels:
app: fe
spec:
replicas: 2
template:
metadata:
labels:
app: fe
spec:
containers:
- name: fe-envoy
image: envoyproxy/envoy:latest
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /_ah/health
scheme: HTTPS
port: fe
readinessProbe:
httpGet:
path: /_ah/health
scheme: HTTPS
port: fe
ports:
- name: fe
containerPort: 8080
protocol: TCP
command: ["/usr/local/bin/envoy"]
args: ["-c", "/data/config/envoy.yaml"]
volumeMounts:
- name: certs-volume
mountPath: /data/certs
- name: envoy-config-volume
mountPath: /data/config
- name: fe-container
image: salrashid123/grpc_only_backend # This runs gRPC secure/insecure server using port argument(:50051). Port 50051 is also exposed on Dockerfile.
imagePullPolicy: Always
ports:
- containerPort: 50051
protocol: TCP
command: ["/grpc_server"]
args: ["--grpcport", ":50051", "--insecure"]
volumes:
- name: certs-volume
secret:
secretName: fe-secret
- name: envoy-config-volume
configMap:
name: envoy-configmap
items:
- key: config
path: envoy.yaml
fe-srv-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
name: fe-srv-ingress
labels:
type: fe-srv
annotations:
service.alpha.kubernetes.io/app-protocols: '{"fe":"HTTP2"}'
cloud.google.com/neg: '{"ingress": true}'
spec:
type: NodePort
ports:
- name: fe
port: 8080
protocol: TCP
targetPort: 8080
selector:
app: fe
fe-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: fe-ingress
annotations:
kubernetes.io/ingress.allow-http: "false"
spec:
tls:
- hosts:
- server.domain.com
secretName: fe-secret
rules:
- host: server.domain.com
http:
paths:
- path: /echo.EchoServer/*
backend:
serviceName: fe-srv-ingress
servicePort: 8080
I had to allow any traffic from IP range specified as health checks source in documentation pages - 130.211.0.0/22, 35.191.0.0/16 , seen it here: https://cloud.google.com/kubernetes-engine/docs/how-to/standalone-neg
And I had to allow it for default network and for the new network (regional) the cluster lives in.
When I added these firewall rules, health checks could reach the pods exposed in NEG used as a regional backend within a backend service of our Http(s) load balancer.
May be there is a more restrictive firewall setup, but I just cut the corners and allowed anything from IP range declared to be healthcheck source range from the page referenced above.
GCP committer says this is kind of bug, so there is no way to fix this at this time.
Related issue is this, and pull request is now progressing.