kube-dns keeps restarting with kubernetes on coreos - kubernetes

I have Kubernetes installed on Container Linux by CoreOS alpha (1353.1.0),
using hyperkube v1.5.5_coreos.0, installed via my fork of the coreos-kubernetes install scripts at https://github.com/kfirufk/coreos-kubernetes.
I have two Container Linux machines:
coreos-2.tux-in.com, resolving to 192.168.1.2, as controller
coreos-3.tux-in.com, resolving to 192.168.1.3, as worker
kubectl get pods --all-namespaces returns
NAMESPACE NAME READY STATUS RESTARTS AGE
ceph ceph-mds-2743106415-rkww4 0/1 Pending 0 1d
ceph ceph-mon-check-3856521781-bd6k5 1/1 Running 0 1d
kube-lego kube-lego-3323932148-g2tf4 1/1 Running 0 1d
kube-system calico-node-xq6j7 2/2 Running 0 1d
kube-system calico-node-xzpp2 2/2 Running 4560 1d
kube-system calico-policy-controller-610849172-b7xjr 1/1 Running 0 1d
kube-system heapster-v1.3.0-beta.0-2754576759-v1f50 2/2 Running 0 1d
kube-system kube-apiserver-192.168.1.2 1/1 Running 0 1d
kube-system kube-controller-manager-192.168.1.2 1/1 Running 1 1d
kube-system kube-dns-3675956729-r7hhf 3/4 Running 3924 1d
kube-system kube-dns-autoscaler-505723555-l2pph 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.2 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.3 1/1 Running 0 1d
kube-system kube-scheduler-192.168.1.2 1/1 Running 1 1d
kube-system kubernetes-dashboard-3697905830-vdz23 1/1 Running 1246 1d
kube-system monitoring-grafana-4013973156-m2r2v 1/1 Running 0 1d
kube-system monitoring-influxdb-651061958-2mdtf 1/1 Running 0 1d
nginx-ingress default-http-backend-150165654-s4z04 1/1 Running 2 1d
So I can see that kube-dns-3675956729-r7hhf keeps restarting.
kubectl describe pod kube-dns-3675956729-r7hhf --namespace=kube-system returns:
Name: kube-dns-3675956729-r7hhf
Namespace: kube-system
Node: 192.168.1.2/192.168.1.2
Start Time: Sat, 11 Mar 2017 17:54:14 +0000
Labels: k8s-app=kube-dns
pod-template-hash=3675956729
Status: Running
IP: 10.2.67.243
Controllers: ReplicaSet/kube-dns-3675956729
Containers:
kubedns:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:kubedns
Image: gcr.io/google_containers/kubedns-amd64:1.9
Image ID: rkt://sha512-c7b7c9c4393bea5f9dc5bcbe1acf1036c2aca36ac14b5e17fd3c675a396c4219
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-map=kube-dns
--v=2
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: False
Restart Count: 981
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables:
PROMETHEUS_PORT: 10055
dnsmasq:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
Image ID: rkt://sha512-8c5f8b40f6813bb676ce04cd545c55add0dc8af5a3be642320244b74ea03f872
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
Requests:
cpu: 150m
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
dnsmasq-metrics:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq-metrics
Image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
Image ID: rkt://sha512-ceb3b6af1cd67389358be14af36b5e8fb6925e78ca137b28b93e0d8af134585b
Port: 10054/TCP
Args:
--v=2
--logtostderr
Requests:
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
healthz:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:healthz
Image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
Image ID: rkt://sha512-3a85b0533dfba81b5083a93c7e091377123dac0942f46883a4c10c25cf0ad177
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-zqbdp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-zqbdp
QoS Class: Burstable
Tolerations: CriticalAddonsOnly=:Exists
No events.
which shows that the kubedns container (kubedns-amd64:1.9) is in Ready: False.
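To see why, I can also pull the failing container's logs straight from kubectl (pod and container names taken from the describe output above):
kubectl logs kube-dns-3675956729-r7hhf --namespace=kube-system -c kubedns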
This is my kube-dns-de.yaml file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
spec:
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
containers:
- name: kubedns
image: gcr.io/google_containers/kubedns-amd64:1.9
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthz-kubedns
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-map=kube-dns
# This should be set to v=2 only after the new image (cut from 1.5) has
# been released, otherwise we will flood the logs.
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
- name: dnsmasq
image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
livenessProbe:
httpGet:
path: /healthz-dnsmasq
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --cache-size=1000
- --no-resolv
- --server=127.0.0.1#10053
- --log-facility=-
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 10Mi
- name: dnsmasq-metrics
image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 10Mi
- name: healthz
image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
resources:
limits:
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
args:
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- --url=/healthz-dnsmasq
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
- --url=/healthz-kubedns
- --port=8080
- --quiet
ports:
- containerPort: 8080
protocol: TCP
dnsPolicy: Default
and this is my kube-dns-svc.yaml:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.3.0.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
Any information regarding the issue would be greatly appreciated!
Update
rkt list --full 2> /dev/null | grep kubedns shows:
744a4579-0849-4fae-b1f5-cb05d40f3734 kubedns gcr.io/google_containers/kubedns-amd64:1.9 sha512-c7b7c9c4393b running 2017-03-22 22:14:55.801 +0000 UTC 2017-03-22 22:14:56.814 +0000 UTC
journalctl -m _MACHINE_ID=744a45790849b1f5cb05d40f3734 provides:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
I tried to add - --proxy-mode=userspace to /etc/kubernetes/manifests/kube-proxy.yaml but the results are the same.
kubectl get svc --all-namespaces provides:
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ceph ceph-mon None <none> 6789/TCP 1h
default kubernetes 10.3.0.1 <none> 443/TCP 1h
kube-system heapster 10.3.0.2 <none> 80/TCP 1h
kube-system kube-dns 10.3.0.10 <none> 53/UDP,53/TCP 1h
kube-system kubernetes-dashboard 10.3.0.116 <none> 80/TCP 1h
kube-system monitoring-grafana 10.3.0.187 <none> 80/TCP 1h
kube-system monitoring-influxdb 10.3.0.214 <none> 8086/TCP 1h
nginx-ingress default-http-backend 10.3.0.233 <none> 80/TCP 1h
kubectl get cs provides:
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
my kube-proxy.yaml has the following content:
apiVersion: v1
kind: Pod
metadata:
name: kube-proxy
namespace: kube-system
annotations:
rkt.alpha.kubernetes.io/stage1-name-override: coreos.com/rkt/stage1-fly
spec:
hostNetwork: true
containers:
- name: kube-proxy
image: quay.io/coreos/hyperkube:v1.5.5_coreos.0
command:
- /hyperkube
- proxy
- --cluster-cidr=10.2.0.0/16
- --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml
securityContext:
privileged: true
volumeMounts:
- mountPath: /etc/ssl/certs
name: "ssl-certs"
- mountPath: /etc/kubernetes/controller-kubeconfig.yaml
name: "kubeconfig"
readOnly: true
- mountPath: /etc/kubernetes/ssl
name: "etc-kube-ssl"
readOnly: true
- mountPath: /var/run/dbus
name: dbus
readOnly: false
volumes:
- hostPath:
path: "/usr/share/ca-certificates"
name: "ssl-certs"
- hostPath:
path: "/etc/kubernetes/controller-kubeconfig.yaml"
name: "kubeconfig"
- hostPath:
path: "/etc/kubernetes/ssl"
name: "etc-kube-ssl"
- hostPath:
path: /var/run/dbus
name: dbus
This is all the valuable information I could find. Any ideas? :)
Update 2
Output of iptables-save on the controller Container Linux node is at http://pastebin.com/2GApCj0n
Update 3
I ran curl on the controller node:
# curl https://10.3.0.1 --insecure
Unauthorized
This means it can reach the API server; I just didn't pass enough parameters for the request to be authorized, right?
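For comparison, an authorized request would need the cluster CA plus a client certificate and key, something like this (the certificate paths are an assumption based on the usual coreos-kubernetes layout; adjust to wherever your admin certs actually live):
curl --cacert /etc/kubernetes/ssl/ca.pem \
  --cert /etc/kubernetes/ssl/admin.pem \
  --key /etc/kubernetes/ssl/admin-key.pem \
  https://10.3.0.1/version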
Update 4
Thanks to @jaxxstorm I removed the calico manifests, updated their quay/cni and quay/node versions, and reinstalled them.
Now kubedns keeps restarting, but I think calico is working now, because for the first time it tried to install kubedns on the worker node instead of the controller node, and also because when I rkt enter the kubedns pod and try to wget https://10.3.0.1 I get:
# wget https://10.3.0.1
Connecting to 10.3.0.1 (10.3.0.1:443)
wget: can't execute 'ssl_helper': No such file or directory
wget: error getting response: Connection reset by peer
which clearly shows that there is some kind of response; busybox's wget just can't do TLS without its ssl_helper binary, but the TCP connection itself succeeded. Which is good, right?
Now kubectl get pods --all-namespaces shows:
kube-system kube-dns-3675956729-ljz2w 4/4 Running 88 42m
So: 4/4 ready, but it keeps restarting.
kubectl describe pod kube-dns-3675956729-ljz2w --namespace=kube-system output at http://pastebin.com/Z70U331G
So it can't connect to http://10.2.47.19:8081/readiness. I'm guessing 10.2.47.19 is the pod IP of kubedns, since the readiness probe uses port 8081. I don't know how to investigate this issue further.
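The only other check I can think of is hitting the readiness endpoint directly from the worker node that hosts the pod (IP and port taken from the describe output):
curl -v http://10.2.47.19:8081/readiness
If that works from the node but the kubelet probe still fails, it's probably probe timing rather than networking.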
thanks for everything!

Lots of great debugging info here, thanks!
This is the clincher:
# curl https://10.3.0.1 --insecure
Unauthorized
You got an unauthorized response because you didn't pass a client cert, but that's fine, it's not what we're after. This proves that kube-proxy is working as expected and is accessible. Your rkt logs:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
are indicating that there are network connectivity issues inside the containers, which suggests to me that you haven't configured container networking/CNI.
Please have a read through this document: https://coreos.com/rkt/docs/latest/networking/overview.html
You may also have to reconfigure calico; there's some more information here: http://docs.projectcalico.org/master/getting-started/rkt/
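For reference, a worker's Calico CNI config is typically a small JSON file; here is a minimal sketch (the path, etcd endpoint, and kubeconfig location are assumptions based on the coreos-kubernetes scripts, so adjust to your setup), e.g. /etc/kubernetes/cni/net.d/10-calico.conf:
{
    "name": "calico",
    "type": "calico",
    "etcd_endpoints": "http://127.0.0.1:2379",
    "log_level": "info",
    "ipam": {
        "type": "calico-ipam"
    },
    "policy": {
        "type": "k8s"
    },
    "kubernetes": {
        "kubeconfig": "/etc/kubernetes/worker-kubeconfig.yaml"
    }
}
If that file is missing or malformed on a node, pods there get no working network, which matches the "network is unreachable" errors above.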

kube-dns has a readiness probe that tries resolving through the Service IP of kube-dns. Is it possible that there is a problem with your Service network?
Check out the answer and solution here:
kubernetes service IPs not reachable
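A quick way to test the Service network from inside the cluster is a throwaway pod with DNS tools (a minimal sketch; the image is just one example that ships nslookup):
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command: ["sleep", "3600"]
Then kubectl exec dnsutils -- nslookup kubernetes.default should get an answer via the kube-dns Service IP (10.3.0.10 here); if it times out, the Service network itself is the problem.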

Related

k8s readiness probes working in GKE, not in Microk8s (on MacOS)

I have a Kong deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
name: local-test-kong
labels:
app: local-test-kong
spec:
replicas: 1
selector:
matchLabels:
app: local-test-kong
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
labels:
app: local-test-kong
spec:
automountServiceAccountToken: false
containers:
- envFrom:
- configMapRef:
name: kong-env-vars
image: kong:2.6
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- /bin/sleep 15 && kong quit
livenessProbe:
failureThreshold: 3
httpGet:
path: /status
port: status
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: proxy
ports:
- containerPort: 8000
name: proxy
protocol: TCP
- containerPort: 8100
name: status
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /status
port: status
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources: # ToDo
limits:
cpu: 256m
memory: 256Mi
requests:
cpu: 256m
memory: 256Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /kong_prefix/
name: kong-prefix-dir
- mountPath: /tmp
name: tmp-dir
- mountPath: /kong_dbless/
name: kong-custom-dbless-config-volume
terminationGracePeriodSeconds: 30
volumes:
- name: kong-prefix-dir
- name: tmp-dir
- configMap:
defaultMode: 0555
name: kong-declarative
name: kong-custom-dbless-config-volume
I applied this YAML in GKE. Then I ran kubectl describe on its pod.
➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
local-test-kong-678598ffc6-ll9s8 1/1 Running 0 25m
➜ kubectl describe pod/local-test-kong-678598ffc6-ll9s8
Name: local-test-kong-678598ffc6-ll9s8
Namespace: local-test-kong
Priority: 0
Node: gke-paas-cluster-prd-tf9-default-pool-e7cb502a-ggxl/10.128.64.95
Start Time: Wed, 23 Nov 2022 00:12:56 +0800
Labels: app=local-test-kong
pod-template-hash=678598ffc6
Annotations: kubectl.kubernetes.io/restartedAt: 2022-11-23T00:12:56+08:00
Status: Running
IP: 10.128.96.104
IPs:
IP: 10.128.96.104
Controlled By: ReplicaSet/local-test-kong-678598ffc6
Containers:
proxy:
Container ID: containerd://1bd392488cfe33dcc62f717b3b8831349e8cf573326add846c9c843c7bf15e2a
Image: kong:2.6
Image ID: docker.io/library/kong@sha256:62eb6d17133b007cbf5831b39197c669b8700c55283270395b876d1ecfd69a70
Ports: 8000/TCP, 8100/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 23 Nov 2022 00:12:58 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 256m
memory: 256Mi
Requests:
cpu: 256m
memory: 256Mi
Liveness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment Variables from:
kong-env-vars ConfigMap Optional: false
Environment: <none>
Mounts:
/kong_dbless/ from kong-custom-dbless-config-volume (rw)
/kong_prefix/ from kong-prefix-dir (rw)
/tmp from tmp-dir (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kong-prefix-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kong-custom-dbless-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kong-declarative
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned local-test-kong/local-test-kong-678598ffc6-ll9s8 to gke-paas-cluster-prd-tf9-default-pool-e7cb502a-ggxl
Normal Pulled 25m kubelet Container image "kong:2.6" already present on machine
Normal Created 25m kubelet Created container proxy
Normal Started 25m kubelet Started container proxy
➜
I applied the same YAML in my localhost's MicroK8S (on MacOS) and then I ran kubectl describe on its pod.
➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
local-test-kong-54cfc585cb-7grj8 1/1 Running 0 86s
➜ kubectl describe pod/local-test-kong-54cfc585cb-7grj8
Name: local-test-kong-54cfc585cb-7grj8
Namespace: local-test-kong
Priority: 0
Node: microk8s-vm/192.168.64.5
Start Time: Wed, 23 Nov 2022 00:39:33 +0800
Labels: app=local-test-kong
pod-template-hash=54cfc585cb
Annotations: cni.projectcalico.org/podIP: 10.1.254.79/32
cni.projectcalico.org/podIPs: 10.1.254.79/32
kubectl.kubernetes.io/restartedAt: 2022-11-23T00:39:33+08:00
Status: Running
IP: 10.1.254.79
IPs:
IP: 10.1.254.79
Controlled By: ReplicaSet/local-test-kong-54cfc585cb
Containers:
proxy:
Container ID: containerd://d60d09ca8b77ee59c80ea060dcb651c3e346c3a5f0147b0d061790c52193d93d
Image: kong:2.6
Image ID: docker.io/library/kong@sha256:62eb6d17133b007cbf5831b39197c669b8700c55283270395b876d1ecfd69a70
Ports: 8000/TCP, 8100/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 23 Nov 2022 00:39:37 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 256m
memory: 256Mi
Requests:
cpu: 256m
memory: 256Mi
Liveness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:status/status delay=10s timeout=5s period=10s #success=1 #failure=3
Environment Variables from:
kong-env-vars ConfigMap Optional: false
Environment: <none>
Mounts:
/kong_dbless/ from kong-custom-dbless-config-volume (rw)
/kong_prefix/ from kong-prefix-dir (rw)
/tmp from tmp-dir (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kong-prefix-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kong-custom-dbless-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kong-declarative
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 92s default-scheduler Successfully assigned local-test-kong/local-test-kong-54cfc585cb-7grj8 to microk8s-vm
Normal Pulled 90s kubelet Container image "kong:2.6" already present on machine
Normal Created 90s kubelet Created container proxy
Normal Started 89s kubelet Started container proxy
Warning Unhealthy 68s kubelet Readiness probe failed: Get "http://10.1.254.79:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 68s kubelet Liveness probe failed: Get "http://10.1.254.79:8100/status": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
➜
It's the exact same deployment YAML. However, the deployment created in the GKE cluster is running fine with no complaints, while the deployment created in my localhost microk8s (on MacOS) is showing probe failures.
What could I be missing here while deploying to microk8s (on MacOS)?
Your readiness probes are failing on the local pod on port 8100. It looks like you have a firewall rule (or rules) preventing pod-to-pod and/or pod-to-internet communication.
As per the docs:
You may need to configure your firewall to allow pod-to-pod and pod-to-internet communication:
sudo ufw allow in on cni0 && sudo ufw allow out on cni0
sudo ufw default allow routed
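You can confirm the rules took effect with sudo ufw status verbose, then delete the pod so the Deployment recreates it and the probes start fresh:
sudo ufw status verbose
kubectl delete pod local-test-kong-54cfc585cb-7grj8 -n local-test-kong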

Kubernetes ingress controller - Error: ImagePullBackOff

I'm unable to get the controller working. I've tried many times and still get Error: ImagePullBackOff.
Is there an alternative that I can try, or any idea why it's failing?
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/nginx-0.27.0/deploy/static/mandatory.yaml
kubectl describe pod nginx-ingress-controller-7fcb6cffc5-m8m5c -n ingress-nginx
Name: nginx-ingress-controller-7fcb6cffc5-m8m5c
Namespace: ingress-nginx
Priority: 0
Node: ip-10-0-0-244.ap-south-1.compute.internal/10.0.0.244
Start Time: Mon, 07 Dec 2020 08:21:13 -0500
Labels: app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
pod-template-hash=7fcb6cffc5
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu, memory request for container nginx-ingress-controller
kubernetes.io/psp: eks.privileged
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Pending
IP: 10.0.0.231
IPs:
IP: 10.0.0.231
Controlled By: ReplicaSet/nginx-ingress-controller-7fcb6cffc5
Containers:
nginx-ingress-controller:
Container ID:
Image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master
Image ID:
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--publish-service=$(POD_NAMESPACE)/ingress-nginx
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=0s timeout=10s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-7fcb6cffc5-m8m5c (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-xtnz9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-xtnz9:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-xtnz9
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19s default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-7fcb6cffc5-m8m5c to ip-10-0-0-244.ap-south-1.compute.internal
Normal Pulling 18s kubelet Pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master"
Warning Failed 3s kubelet Failed to pull image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Failed 3s kubelet Error: ErrImagePull
Normal BackOff 3s kubelet Back-off pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master"
Warning Failed 3s kubelet Error: ImagePullBackOff
I had the same problem with the ingress-nginx installation.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/cloud/deploy.yaml
For some reason it couldn't pull the ingress-nginx-controller image.
$ kubectl get pods --namespace=ingress-nginx
NAME READY STATUS RE
ingress-nginx-admission-create-6q4wx 0/1 Completed 0
ingress-nginx-admission-patch-fr5ct 0/1 Completed 1
ingress-nginx-controller-686556747b-dg68h 0/1 ImagePullBackOff 0
What I did was, I ran $ kubectl describe pod ingress-nginx-controller-686556747b-dg68h --namespace ingress-nginx
and got the following output:
Name: ingress-nginx-controller-686556747b-dg68h
Namespace: ingress-nginx
Priority: 0
Node: docker-desktop/x.x.x.x
Start Time: Wed, 11 May 2022 20:11:55 +0430
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx
app.kubernetes.io/name=ingress-nginx
pod-template-hash=686556747b
Annotations: <none>
Status: Pending
IP: x.x.x.x
IPs:
IP: x.x.x.x
Controlled By: ReplicaSet/ingress-nginx-controller-686556747b
Containers:
controller:
Container ID:
Image: k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d819
Image ID:
Ports: 80/TCP, 443/TCP, 8443/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
/nginx-ingress-controller
--publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
--election-id=ingress-controller-leader
--controller-class=k8s.io/ingress-nginx
--ingress-class=nginx
--configmap=$(POD_NAMESPACE)/ingress-nginx-controller
--validating-webhook=:8443
--validating-webhook-certificate=/usr/local/certificates/cert
--validating-webhook-key=/usr/local/certificates/key
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 90Mi
Liveness: http-get http://:10254/healthz delay=10s timeout=1s perio
Readiness: http-get http://:10254/healthz delay=10s timeout=1s perio
Environment:
POD_NAME: ingress-nginx-controller-686556747b-dg68h (v1:metad
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
LD_PRELOAD: /usr/local/lib/libmimalloc.so
Mounts:
/usr/local/certificates/ from webhook-cert (ro)
From Containers.controller.Image, I got the name of the image that Kubernetes is trying and failing to download, and tried to docker pull that image myself, like so:
docker pull k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d819
Docker could pull the image successfully and after that everything worked just fine.
It's failing because Kubernetes cannot download the specified image. Check the events section:
Warning Failed 3s kubelet Failed to pull image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Maybe you don't have internet connectivity, or this image does not exist. You can try running docker pull quay.io/kubernetes-ingress-controller/nginx-ingress-controller:master from your computer.
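If pulling works from your machine but the node keeps timing out on the :master tag, another thing to try is pinning the controller to a released tag instead (the tag below is only an example; pick one that actually exists on quay.io):
kubectl -n ingress-nginx set image deployment/nginx-ingress-controller \
  nginx-ingress-controller=quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.27.0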
As mentioned by John, creating a NAT router and NAT config allowed docker images to be pulled when I was facing the same issue. If you create a private VPC-native GKE cluster, it has no access to the internet by default, unless you deploy a NAT router.
gcloud compute routers create nat-router \
--network my-vpc \
--region us-east4
gcloud compute routers nats create nat-config \
--router-region us-east4 \
--router nat-router \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips
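Afterwards you can sanity-check the NAT setup with (same router and region names as above):
gcloud compute routers nats describe nat-config \
  --router nat-router \
  --router-region us-east4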
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-service
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/use-regex: 'true'
spec:
rules:
- host: your host name
http:
paths:
- backend:
service:
name: your service name
port:
number: 3000
path: /api/?(.*)
pathType: Prefix
I think this YAML file solves your problem. I faced the same issue.

NEG says Pods are 'unhealthy', but actually the Pods are healthy

I'm trying to apply gRPC load balancing with Ingress on GCP, and for this I referenced this example. The example shows gRPC load balancing working in two ways (one with an envoy side-car, and the other an HTTP mux handling both gRPC and the HTTP health check on the same Pod). However, the envoy proxy example doesn't work.
What confuses me is that the Pods are running/healthy (confirmed by kubectl describe and kubectl logs):
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
fe-deployment-757ffcbd57-4w446 2/2 Running 0 4m22s
fe-deployment-757ffcbd57-xrrm9 2/2 Running 0 4m22s
$ kubectl describe pod fe-deployment-757ffcbd57-4w446
Name: fe-deployment-757ffcbd57-4w446
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc/10.128.0.64
Start Time: Thu, 26 Sep 2019 16:15:18 +0900
Labels: app=fe
pod-template-hash=757ffcbd57
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container fe-envoy; cpu request for container fe-container
Status: Running
IP: 10.56.1.29
Controlled By: ReplicaSet/fe-deployment-757ffcbd57
Containers:
fe-envoy:
Container ID: docker://b4789909494f7eeb8d3af66cb59168e009c582d412d8ca683a7f435559989421
Image: envoyproxy/envoy:latest
Image ID: docker-pullable://envoyproxy/envoy@sha256:9ef9c4fd6189fdb903929dc5aa0492a51d6783777de65e567382ac7d9a28106b
Port: 8080/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/envoy
Args:
-c
/data/config/envoy.yaml
State: Running
Started: Thu, 26 Sep 2019 16:15:19 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Liveness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data/certs from certs-volume (rw)
/data/config from envoy-config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
fe-container:
Container ID: docker://a533224d3ea8b5e4d5e268a616d73762b37df69f434342459f35caa8fac32dab
Image: salrashid123/grpc_only_backend
Image ID: docker-pullable://salrashid123/grpc_only_backend@sha256:ebfac594116445dd67aff7c9e7a619d73222b60947e46ef65ee6d918db3e1f4b
Port: 50051/TCP
Host Port: 0/TCP
Command:
/grpc_server
Args:
--grpcport
:50051
--insecure
State: Running
Started: Thu, 26 Sep 2019 16:15:20 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
certs-volume:
Type: Secret (a volume populated by a Secret)
SecretName: fe-secret
Optional: false
envoy-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: envoy-configmap
Optional: false
default-token-c7nqc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c7nqc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m25s default-scheduler Successfully assigned default/fe-deployment-757ffcbd57-4w446 to gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc
Normal Pulled 4m25s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Container image "envoyproxy/envoy:latest" already present on machine
Normal Created 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Created container
Normal Started 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Started container
Normal Pulling 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc pulling image "salrashid123/grpc_only_backend"
Normal Pulled 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Successfully pulled image "salrashid123/grpc_only_backend"
Normal Created 4m24s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Created container
Normal Started 4m23s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Started container
Warning Unhealthy 4m10s (x2 over 4m20s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 4m9s (x2 over 4m19s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-l7vc Liveness probe failed: HTTP probe failed with statuscode: 503
$ kubectl describe pod fe-deployment-757ffcbd57-xrrm9
Name: fe-deployment-757ffcbd57-xrrm9
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9/10.128.0.22
Start Time: Thu, 26 Sep 2019 16:15:18 +0900
Labels: app=fe
pod-template-hash=757ffcbd57
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container fe-envoy; cpu request for container fe-container
Status: Running
IP: 10.56.0.23
Controlled By: ReplicaSet/fe-deployment-757ffcbd57
Containers:
fe-envoy:
Container ID: docker://255dd6cab1e681e30ccfe158f7d72540576788dbf6be60b703982a7ecbb310b1
Image: envoyproxy/envoy:latest
Image ID: docker-pullable://envoyproxy/envoy@sha256:9ef9c4fd6189fdb903929dc5aa0492a51d6783777de65e567382ac7d9a28106b
Port: 8080/TCP
Host Port: 0/TCP
Command:
/usr/local/bin/envoy
Args:
-c
/data/config/envoy.yaml
State: Running
Started: Thu, 26 Sep 2019 16:15:19 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Liveness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:fe/_ah/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/data/certs from certs-volume (rw)
/data/config from envoy-config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
fe-container:
Container ID: docker://f6a0246129cc89da846c473daaa1c1770d2b5419b6015098b0d4f35782b0a9da
Image: salrashid123/grpc_only_backend
Image ID: docker-pullable://salrashid123/grpc_only_backend@sha256:ebfac594116445dd67aff7c9e7a619d73222b60947e46ef65ee6d918db3e1f4b
Port: 50051/TCP
Host Port: 0/TCP
Command:
/grpc_server
Args:
--grpcport
:50051
--insecure
State: Running
Started: Thu, 26 Sep 2019 16:15:20 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c7nqc (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
certs-volume:
Type: Secret (a volume populated by a Secret)
SecretName: fe-secret
Optional: false
envoy-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: envoy-configmap
Optional: false
default-token-c7nqc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c7nqc
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m8s default-scheduler Successfully assigned default/fe-deployment-757ffcbd57-xrrm9 to gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9
Normal Pulled 5m8s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Container image "envoyproxy/envoy:latest" already present on machine
Normal Created 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Created container
Normal Started 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Started container
Normal Pulling 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 pulling image "salrashid123/grpc_only_backend"
Normal Pulled 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Successfully pulled image "salrashid123/grpc_only_backend"
Normal Created 5m7s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Created container
Normal Started 5m6s kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Started container
Warning Unhealthy 4m53s (x2 over 5m3s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Readiness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 4m52s (x2 over 5m2s) kubelet, gke-ingress-grpc-loadbal-default-pool-92d3aed5-52l9 Liveness probe failed: HTTP probe failed with statuscode: 503
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
fe-srv-ingress NodePort 10.123.5.165 <none> 8080:30816/TCP 6m43s
fe-srv-lb LoadBalancer 10.123.15.36 35.224.69.60 50051:30592/TCP 6m42s
kubernetes ClusterIP 10.123.0.1 <none> 443/TCP 2d2h
$ kubectl describe service fe-srv-ingress
Name: fe-srv-ingress
Namespace: default
Labels: type=fe-srv
Annotations: cloud.google.com/neg: {"ingress": true}
cloud.google.com/neg-status:
{"network_endpoint_groups":{"8080":"k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2"},"zones":["us-central1-a"]}
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"cloud.google.com/neg":"{\"ingress\": true}","service.alpha.kubernetes.io/a...
service.alpha.kubernetes.io/app-protocols: {"fe":"HTTP2"}
Selector: app=fe
Type: NodePort
IP: 10.123.5.165
Port: fe 8080/TCP
TargetPort: 8080/TCP
NodePort: fe 30816/TCP
Endpoints: 10.56.0.23:8080,10.56.1.29:8080
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Create 6m47s neg-controller Created NEG "k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2" for default/fe-srv-ingress-8080/8080 in "us-central1-a".
Normal Attach 6m40s neg-controller Attach 2 network endpoint(s) (NEG "k8s1-963b7b91-default-fe-srv-ingress-8080-e459b0d2" in zone "us-central1-a")
but the NEG says they are unhealthy (so Ingress also says the backend is unhealthy).
I couldn't find what caused this. Does anyone know how to solve this?
Test environment:
GKE, 1.13.7-gke.8 (VPC enabled)
Default HTTP(s) load balancer on Ingress
YAML files I used(same with the example previously mentioned),
envoy-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: envoy-configmap
labels:
app: fe
data:
config: |-
---
admin:
access_log_path: /dev/null
address:
socket_address:
address: 127.0.0.1
port_value: 9000
node:
cluster: service_greeter
id: test-id
static_resources:
listeners:
- name: listener_0
address:
socket_address: { address: 0.0.0.0, port_value: 8080 }
filter_chains:
- filters:
- name: envoy.http_connection_manager
config:
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
path: "/echo.EchoServer/SayHello"
route: { cluster: local_grpc_endpoint }
http_filters:
- name: envoy.lua
config:
inline_code: |
package.path = "/etc/envoy/lua/?.lua;/usr/share/lua/5.1/nginx/?.lua;/etc/envoy/lua/" .. package.path
function envoy_on_request(request_handle)
if request_handle:headers():get(":path") == "/_ah/health" then
local headers, body = request_handle:httpCall(
"local_admin",
{
[":method"] = "GET",
[":path"] = "/clusters",
[":authority"] = "local_admin"
},"", 50)
str = "local_grpc_endpoint::127.0.0.1:50051::health_flags::healthy"
if string.match(body, str) then
request_handle:respond({[":status"] = "200"},"ok")
else
request_handle:logWarn("Envoy healthcheck failed")
request_handle:respond({[":status"] = "503"},"unavailable")
end
end
end
- name: envoy.router
typed_config: {}
tls_context:
common_tls_context:
tls_certificates:
- certificate_chain:
filename: "/data/certs/tls.crt"
private_key:
filename: "/data/certs/tls.key"
clusters:
- name: local_grpc_endpoint
connect_timeout: 0.05s
type: STATIC
http2_protocol_options: {}
lb_policy: ROUND_ROBIN
common_lb_config:
healthy_panic_threshold:
value: 50.0
health_checks:
- timeout: 1s
interval: 5s
interval_jitter: 1s
no_traffic_interval: 5s
unhealthy_threshold: 1
healthy_threshold: 3
grpc_health_check:
service_name: "echo.EchoServer"
authority: "server.domain.com"
hosts:
- socket_address:
address: 127.0.0.1
port_value: 50051
- name: local_admin
connect_timeout: 0.05s
type: STATIC
lb_policy: ROUND_ROBIN
hosts:
- socket_address:
address: 127.0.0.1
port_value: 9000
fe-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: fe-deployment
labels:
app: fe
spec:
replicas: 2
template:
metadata:
labels:
app: fe
spec:
containers:
- name: fe-envoy
image: envoyproxy/envoy:latest
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /_ah/health
scheme: HTTPS
port: fe
readinessProbe:
httpGet:
path: /_ah/health
scheme: HTTPS
port: fe
ports:
- name: fe
containerPort: 8080
protocol: TCP
command: ["/usr/local/bin/envoy"]
args: ["-c", "/data/config/envoy.yaml"]
volumeMounts:
- name: certs-volume
mountPath: /data/certs
- name: envoy-config-volume
mountPath: /data/config
- name: fe-container
image: salrashid123/grpc_only_backend # This runs a gRPC secure/insecure server using the port argument (:50051). Port 50051 is also exposed in the Dockerfile.
imagePullPolicy: Always
ports:
- containerPort: 50051
protocol: TCP
command: ["/grpc_server"]
args: ["--grpcport", ":50051", "--insecure"]
volumes:
- name: certs-volume
secret:
secretName: fe-secret
- name: envoy-config-volume
configMap:
name: envoy-configmap
items:
- key: config
path: envoy.yaml
fe-srv-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
name: fe-srv-ingress
labels:
type: fe-srv
annotations:
service.alpha.kubernetes.io/app-protocols: '{"fe":"HTTP2"}'
cloud.google.com/neg: '{"ingress": true}'
spec:
type: NodePort
ports:
- name: fe
port: 8080
protocol: TCP
targetPort: 8080
selector:
app: fe
fe-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: fe-ingress
annotations:
kubernetes.io/ingress.allow-http: "false"
spec:
tls:
- hosts:
- server.domain.com
secretName: fe-secret
rules:
- host: server.domain.com
http:
paths:
- path: /echo.EchoServer/*
backend:
serviceName: fe-srv-ingress
servicePort: 8080
I had to allow traffic from the IP ranges specified as the health-check source in the documentation (130.211.0.0/22 and 35.191.0.0/16); see: https://cloud.google.com/kubernetes-engine/docs/how-to/standalone-neg
And I had to allow it both for the default network and for the new (regional) network the cluster lives in.
When I added these firewall rules, health checks could reach the pods exposed in the NEG used as a regional backend within a backend service of our HTTP(S) load balancer.
There may be a more restrictive firewall setup possible, but I just cut corners and allowed anything from the IP ranges declared as the health-check source on the page referenced above.
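For reference, opening those ranges with a dedicated rule looks roughly like this (the rule name and network name are placeholders; the port matches the NEG's 8080 endpoints above):
gcloud compute firewall-rules create allow-gcp-health-checks \
  --network my-vpc \
  --source-ranges 130.211.0.0/22,35.191.0.0/16 \
  --allow tcp:8080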
A GCP committer says this is a kind of bug, so there is no way to fix it properly at this time.
The related issue is this one, and a pull request is now in progress.

Why do pods remain in 'pending' status?

I am very confused about why my pods are staying in Pending status.
Vitess seems to have a problem scheduling the vttablet pods onto nodes. I built a 2-worker-node Kubernetes cluster (nodes A & B) and started vttablets on the cluster, but only two vttablets start normally; the other three stay in Pending state.
When I allow the master node to schedule pods, the three pending vttablets all start on the master (erroring at first, then running normally), but when I create tables, two vttablets fail to execute.
When I add two new nodes (nodes C & D) to my Kubernetes cluster, tear down Vitess, and restart the vttablets, I find that the three vttablet pods still remain in Pending
state. Also, if I kick node A or node B out of the cluster, the vttablets on it are lost and do not restart on a new node. I tore down Vitess, tore down the k8s cluster, and rebuilt it, this time using nodes C & D for a 2-worker-node cluster; now all vttablets remain in Pending status.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default etcd-global-5zh4k77slf 1/1 Running 0 46m 192.168.2.3 t-searchredis-a2 <none>
default etcd-global-f7db9nnfq9 1/1 Running 0 45m 192.168.2.5 t-searchredis-a2 <none>
default etcd-global-ksh5r9k45l 1/1 Running 0 45m 192.168.1.4 t-searchredis-a1 <none>
default etcd-operator-6f44498865-t84l5 1/1 Running 0 50m 192.168.2.2 t-searchredis-a2 <none>
default etcd-test-5g5lmcrl2x 1/1 Running 0 46m 192.168.2.4 t-searchredis-a2 <none>
default etcd-test-g4xrkk7wgg 1/1 Running 0 45m 192.168.1.5 t-searchredis-a1 <none>
default etcd-test-jkq4rjrwm8 1/1 Running 0 45m 192.168.2.6 t-searchredis-a2 <none>
default vtctld-z5d46 1/1 Running 0 44m 192.168.1.6 t-searchredis-a1 <none>
default vttablet-100 0/2 Pending 0 40m <none> <none> <none>
default vttablet-101 0/2 Pending 0 40m <none> <none> <none>
default vttablet-102 0/2 Pending 0 40m <none> <none> <none>
default vttablet-103 0/2 Pending 0 40m <none> <none> <none>
default vttablet-104 0/2 Pending 0 40m <none> <none> <none>
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: 2018-11-27T07:25:19Z
labels:
app: vitess
component: vttablet
keyspace: test_keyspace
shard: "0"
tablet: test-0000000100
name: vttablet-100
namespace: default
resourceVersion: "22304"
selfLink: /api/v1/namespaces/default/pods/vttablet-100
uid: 98258046-f215-11e8-b6a1-fa163e0411d1
spec:
containers:
- command:
- bash
- -c
- |-
set -e
mkdir -p $VTDATAROOT/tmp
chown -R vitess /vt
su -p -s /bin/bash -c "/vt/bin/vttablet -binlog_use_v3_resharding_mode -topo_implementation etcd2 -topo_global_server_address http://etcd-global-client:2379 -topo_global_root /global -log_dir $VTDATAROOT/tmp -alsologtostderr -port 15002 -grpc_port 16002 -service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' -tablet-path test-0000000100 -tablet_hostname $(hostname -i) -init_keyspace test_keyspace -init_shard 0 -init_tablet_type replica -health_check_interval 5s -mysqlctl_socket $VTDATAROOT/mysqlctl.sock -enable_semi_sync -enable_replication_reporter -orc_api_url http://orchestrator/api -orc_discover_interval 5m -restore_from_backup -backup_storage_implementation file -file_backup_storage_root '/usr/local/MySQL_DB_Backup/test'" vitess
env:
- name: EXTRA_MY_CNF
value: /vt/config/mycnf/master_mysql56.cnf
image: vitess/lite
imagePullPolicy: Always
livenessProbe:
failureThreshold: 3
httpGet:
path: /debug/vars
port: 15002
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
name: vttablet
ports:
- containerPort: 15002
name: web
protocol: TCP
- containerPort: 16002
name: grpc
protocol: TCP
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /dev/log
name: syslog
- mountPath: /vt/vtdataroot
name: vtdataroot
- mountPath: /etc/ssl/certs/ca-certificates.crt
name: certs
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-7g2jb
readOnly: true
- command:
- sh
- -c
- |-
mkdir -p $VTDATAROOT/tmp && chown -R vitess /vt
su -p -c "/vt/bin/mysqlctld -log_dir $VTDATAROOT/tmp -alsologtostderr -tablet_uid 100 -socket_file $VTDATAROOT/mysqlctl.sock -init_db_sql_file $VTROOT/config/init_db.sql" vitess
env:
- name: EXTRA_MY_CNF
value: /vt/config/mycnf/master_mysql56.cnf
image: vitess/lite
imagePullPolicy: Always
name: mysql
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 500m
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /dev/log
name: syslog
- mountPath: /vt/vtdataroot
name: vtdataroot
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-7g2jb
readOnly: true
dnsPolicy: ClusterFirst
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- hostPath:
path: /dev/log
type: ""
name: syslog
- emptyDir: {}
name: vtdataroot
- hostPath:
path: /etc/ssl/certs/ca-certificates.crt
type: ""
name: certs
- name: default-token-7g2jb
secret:
defaultMode: 420
secretName: default-token-7g2jb
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-11-27T07:25:19Z
message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
2 Insufficient cpu.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Guaranteed
As you can see down at the bottom:
message: '0/3 nodes are available: 1 node(s) had taints that the pod didn''t tolerate,
2 Insufficient cpu.'
Meaning that your two worker nodes are out of CPU based on the requests you specified in the pod. You will need more workers, or smaller CPU requests.
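For example, lowering the requests in the vttablet pod spec (the values here are illustrative, not a tuning recommendation) would let the scheduler fit more tablets on small nodes, at the cost of the pods no longer being in the Guaranteed QoS class:
resources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 1Gi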

How to prevent Kubernetes probing https?

I'm trying to run a service exposed via ports 80 and 443. The SSL termination happens on the pod.
I specified only port 80 for the liveness probe, but for some reason Kubernetes is probing https (443) as well. Why is that, and how can I stop it probing 443?
Kubernetes config
apiVersion: v1
kind: Secret
metadata:
name: myregistrykey
namespace: default
data:
.dockerconfigjson: xxx==
type: kubernetes.io/dockerconfigjson
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: example-com
spec:
replicas: 0
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 50%
minReadySeconds: 30
template:
metadata:
labels:
app: example-com
spec:
imagePullSecrets:
- name: myregistrykey
containers:
- name: example-com
image: DOCKER_HOST/DOCKER_IMAGE_VERSION
imagePullPolicy: Always
ports:
- containerPort: 80
protocol: TCP
name: http
- containerPort: 443
protocol: TCP
name: https
livenessProbe:
httpGet:
scheme: "HTTP"
path: "/_ah/health"
port: 80
httpHeaders:
- name: Host
value: example.com
initialDelaySeconds: 35
periodSeconds: 35
readinessProbe:
httpGet:
scheme: "HTTP"
path: "/_ah/health"
port: 80
httpHeaders:
- name: Host
value: example.com
initialDelaySeconds: 35
periodSeconds: 35
resources:
requests:
cpu: 250m
limits:
cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
name: example-com
spec:
type: LoadBalancer
ports:
- port: 80
protocol: TCP
targetPort: 80
nodePort: 0
name: http
- port: 443
protocol: TCP
targetPort: 443
nodePort: 0
name: https
selector:
app: example-com
The errors/logs on the pods clearly indicate that Kubernetes is trying to access the service via https:
kubectl describe pod example-com-86876875c7-b75hr
Name: example-com-86876875c7-b75hr
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: aks-agentpool-37281605-0/10.240.0.4
Start Time: Sat, 17 Nov 2018 19:58:30 +0200
Labels: app=example-com
pod-template-hash=4243243173
Annotations: <none>
Status: Running
IP: 10.244.0.65
Controlled By: ReplicaSet/example-com-86876875c7
Containers:
example-com:
Container ID: docker://c5eeb03558adda435725a0df3cc2d15943966c3df53e9462e964108969c8317a
Image: example-com.azurecr.io/example-com:2018-11-17_19-58-05
Image ID: docker-pullable://example-com.azurecr.io/example-com@sha256:5d425187b8663ecfc5d6cc78f6c5dd29f1559d3687ba9d4c0421fd0ad109743e
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Sat, 17 Nov 2018 20:07:59 +0200
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sat, 17 Nov 2018 20:05:39 +0200
Finished: Sat, 17 Nov 2018 20:07:55 +0200
Ready: False
Restart Count: 3
Limits:
cpu: 500m
Requests:
cpu: 250m
Liveness: http-get http://:80/_ah/health delay=35s timeout=1s period=35s #success=1 #failure=3
Readiness: http-get http://:80/_ah/health delay=35s timeout=1s period=35s #success=1 #failure=3
Environment:
NABU: nabu
KUBERNETES_PORT_443_TCP_ADDR: agile-kube-b3e5753f.hcp.westeurope.azmk8s.io
KUBERNETES_PORT: tcp://agile-kube-b3e5753f.hcp.westeurope.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://agile-kube-b3e5753f.hcp.westeurope.azmk8s.io:443
KUBERNETES_SERVICE_HOST: agile-kube-b3e5753f.hcp.westeurope.azmk8s.io
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rcr7c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-rcr7c:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rcr7c
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned default/example-com-86876875c7-b75hr to aks-agentpool-37281605-0
Warning Unhealthy 3m46s (x6 over 7m16s) kubelet, aks-agentpool-37281605-0 Liveness probe failed: Get https://example.com/_ah/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal Pulling 3m45s (x3 over 10m) kubelet, aks-agentpool-37281605-0 pulling image "example-com.azurecr.io/example-com:2018-11-17_19-58-05"
Normal Killing 3m45s (x2 over 6m5s) kubelet, aks-agentpool-37281605-0 Killing container with id docker://example-com:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 3m44s (x3 over 10m) kubelet, aks-agentpool-37281605-0 Successfully pulled image "example-com.azurecr.io/example-com:2018-11-17_19-58-05"
Normal Created 3m42s (x3 over 10m) kubelet, aks-agentpool-37281605-0 Created container
Normal Started 3m42s (x3 over 10m) kubelet, aks-agentpool-37281605-0 Started container
Warning Unhealthy 39s (x9 over 7m4s) kubelet, aks-agentpool-37281605-0 Readiness probe failed: Get https://example.com/_ah/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
As per your comments, you are doing an HTTP-to-HTTPS redirect in the pod, so the probe cannot connect to port 80 successfully. If you still want to serve a probe on port 80, you should consider using a TCP probe. For example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: example-com
spec:
...
minReadySeconds: 30
template:
metadata:
labels:
app: example-com
spec:
imagePullSecrets:
- name: myregistrykey
containers:
- name: example-com
...
livenessProbe:
httpGet:
scheme: "HTTP"
path: "/_ah/health"
port: 80
httpHeaders:
- name: Host
value: example.com
initialDelaySeconds: 35
periodSeconds: 35
readinessProbe:
tcpSocket:
port: 80
initialDelaySeconds: 35
periodSeconds: 35
...
Or you can ignore some redirects in your application depending on the URL, as mentioned in @night-gold's answer.
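A third option, if you want to keep an HTTP-style check rather than a plain TCP one, is to point the probe at the HTTPS port directly; the kubelet skips certificate verification on HTTPS probes, so a self-signed certificate is fine (a sketch based on the probe config above):
livenessProbe:
  httpGet:
    scheme: "HTTPS"
    path: "/_ah/health"
    port: 443
    httpHeaders:
    - name: Host
      value: example.com
  initialDelaySeconds: 35
  periodSeconds: 35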
The problem doesn't come from Kubernetes but from your web server. Kubernetes is doing exactly what you are asking: it probes the http URL, but your server redirects it to https, and that is what causes the error.
If you are using apache, you should look here: Apache https block redirect, or here if you use nginx: nginx https block redirect