k8s pod readiness probe fails with connection refused, but pod is serving requests just fine - kubernetes

I'm having a hard time understanding why a pod's readiness probe is failing.
Warning Unhealthy 21m (x2 over 21m) kubelet, REDACTED Readiness probe failed: Get http://192.168.209.74:8081/actuator/health: dial tcp 192.168.209.74:8081: connect: connection refused
If I exec into this pod (or in fact into any other I have for that application), I can run a curl against that very URL without issue:
kubectl exec -it REDACTED-l2z5w /bin/bash
$ curl -v http://192.168.209.74:8081/actuator/health
* Expire in 0 ms for 6 (transfer 0x5611b949ff50)
* Trying 192.168.209.74...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5611b949ff50)
* Connected to 192.168.209.74 (192.168.209.74) port 8081 (#0)
> GET /actuator/health HTTP/1.1
> Host: 192.168.209.74:8081
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200
< Set-Cookie: CM_SESSIONID=E62390F0FF8C26D51C767835988AC690; Path=/; HttpOnly
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< Content-Type: application/vnd.spring-boot.actuator.v3+json
< Transfer-Encoding: chunked
< Date: Tue, 02 Jun 2020 15:07:21 GMT
<
* Connection #0 to host 192.168.209.74 left intact
{"status":"UP",...REDACTED..}
I'm getting this behavior from both a Docker-for-Desktop k8s cluster on my Mac as well as an OpenShift cluster.
The readiness probe is shown like this in kubectl describe:
Readiness: http-get http://:8081/actuator/health delay=20s timeout=3s period=5s #success=1 #failure=10
The helm chart has this to configure it:
readinessProbe:
  failureThreshold: 10
  httpGet:
    path: /actuator/health
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 20
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 3
I cannot fully rule out that HTTP proxy settings are to blame, but the k8s docs say that HTTP_PROXY is ignored for checks since v1.13, so it shouldn't happen locally.
The OpenShift k8s version is 1.11, my local one is 1.16.

Describing a resource always shows the most recent events recorded for it. The point here is that the last event logged was a readinessProbe failure, not that the probe is still failing.
I tested it in my lab with the following pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - sleep 30; touch /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
As can be seen, the file /tmp/healthy is created in the pod after 30 seconds, while the readinessProbe first checks for it after 5 seconds and then repeats the check every 5 seconds.
Describing this pod gives me this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m56s default-scheduler Successfully assigned default/readiness-exec to yaki-118-2
Normal Pulling 7m55s kubelet, yaki-118-2 Pulling image "k8s.gcr.io/busybox"
Normal Pulled 7m55s kubelet, yaki-118-2 Successfully pulled image "k8s.gcr.io/busybox"
Normal Created 7m55s kubelet, yaki-118-2 Created container readiness
Normal Started 7m55s kubelet, yaki-118-2 Started container readiness
Warning Unhealthy 7m25s (x6 over 7m50s) kubelet, yaki-118-2 Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory
The readinessProbe looked for the file 6 times without success, which is exactly right: I configured it to check every 5 seconds, and the file was only created after 30 seconds.
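The count of 6 also lines up with the configured timings: checks run at roughly t = 5, 10, 15, 20, 25, 30 s, and the file only appears around t = 30 s. A quick sanity check of that arithmetic (plain shell, values copied from the manifest above):

```shell
# initialDelaySeconds=5, periodSeconds=5; the container touches the file at ~30s
INITIAL_DELAY=5
PERIOD=5
FILE_CREATED_AT=30

# Every check at or before the moment the file appears can fail
# (the check at t=30s races the `touch`), i.e. (30-5)/5 + 1 = 6 checks.
FAILED=$(( (FILE_CREATED_AT - INITIAL_DELAY) / PERIOD + 1 ))
echo "$FAILED"   # prints 6
```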
What you think is a problem is actually the expected behavior. Your events are telling you that the readinessProbe last failed 21 minutes ago, which actually means your pod has been healthy for the past 21 minutes.

Related

Why am I seeing LoadBalancerNegNotReady when this Deployment has no services or ingresses connected to it?

The Pod in my Deployment takes 20+ minutes to be marked as healthy every time there is a new revision.
The logs show the ready and live endpoints responding with 200.
The application is not throwing errors.
Doing a describe on the pod I see this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal LoadBalancerNegNotReady 52s neg-readiness-reflector Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-mypod]
Normal Scheduled 52s default-scheduler Successfully assigned mypod to mycluster
Normal Pulling 51s kubelet Pulling image "gcr.io/myproject/myimage"
Normal Pulled 51s kubelet Successfully pulled image "gcr.io/myproject/myimage" in 196.170923ms
Normal Created 51s kubelet Created container custom-metrics
Normal Started 51s kubelet Started container custom-metrics
Warning Unhealthy 50s kubelet Readiness probe failed: Get "http://10.123.123.123:80/api/ready": dial tcp 10.123.123.123:80: connect: connection refused
I know there is an initial fail of the probe, but it starts responding healthy very quickly after that as I see in the logs.
I'm at a loss. Do I need to set initialDelaySeconds or something? I don't see how that would change anything.
These are my healthcheck settings:
livenessProbe:
  timeoutSeconds: 30
  httpGet:
    port: 80
    path: /api/live
readinessProbe:
  timeoutSeconds: 30
  httpGet:
    port: 80
    path: /api/ready
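For reference, since the probes above set no timings besides timeoutSeconds, the kubelet fills in its defaults, so the effective readiness probe behaves roughly like this (a sketch using the documented Kubernetes defaults, not the rendered manifest):

```yaml
readinessProbe:
  httpGet:
    port: 80
    path: /api/ready
  timeoutSeconds: 30
  initialDelaySeconds: 0   # default: probing starts as soon as the container does
  periodSeconds: 10        # default
  successThreshold: 1      # default
  failureThreshold: 3      # default
```

These defaults explain the single early connection-refused event (the first check fires before the server is listening), but not by themselves a 20-minute delay, which may point at the NEG readiness gate visible in the events rather than at the probe.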

Using HTTP basic authentication for readinessProbe when using GKE Ingress

I'm using a nosqlclient docker image with GKE. The image ships a default health-check URL at /healthcheck. However, when I enable authentication for the app, that also enables authentication for this URL. I need to use GKE Ingress along with the app, and GKE Ingress requires an HTTP readinessProbe that can return a 200. However, when I try to use this path for the readiness check, the check fails to work. The strange thing is, no readiness check logs come up when I run kubectl describe pods <pod_name>. This is part of my deployment yaml file:
...
spec:
  containers:
  - name: mongoclient
    image: mongoclient/mongoclient:2.2.0
    resources:
      requests:
        memory: "32Mi"
        cpu: "100m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    ports:
    - containerPort: 3000
    env:
    - name: MONGOCLIENT_AUTH
      value: "true"
    - name: MONGOCLIENT_USERNAME
      value: "admin"
    - name: MONGOCLIENT_PASSWORD
      value: "password"
    readinessProbe:
      httpGet:
        httpHeaders:
        - name: "Authorization"
          value: "Basic YWRtaW46cGFzc3dvcmQ="
        port: 3000
        path: /healthcheck
      initialDelaySeconds: 60
      timeoutSeconds: 5
...
When I try a curl with authorization from the pod, it returns 200 though:
node@mongoclient-deployment-7c6856d6f6-mkxqh:/opt/meteor/dist/bundle$ curl -i http://localhost:3000/healthcheck
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Authorization Required"
Date: Fri, 17 Jan 2020 18:02:20 GMT
Connection: keep-alive
Transfer-Encoding: chunked
Unauthorized
node@mongoclient-deployment-7c6856d6f6-mkxqh:/opt/meteor/dist/bundle$ curl -i http://admin:password@localhost:3000/healthcheck
HTTP/1.1 200 OK
Date: Fri, 17 Jan 2020 18:02:30 GMT
Connection: keep-alive
Transfer-Encoding: chunked
Server is up and running !
node@mongoclient-deployment-86bc77cc5b-9qg67:/opt/meteor/dist/bundle$ curl -i -H "Authorization: Basic YWRtaW46cGFzc3dvcmQ=" http://localhost:3000/healthcheck
HTTP/1.1 200 OK
Date: Sat, 18 Jan 2020 07:19:49 GMT
Connection: keep-alive
Transfer-Encoding: chunked
Some further information:
> kubectl get pods -l app=mongoclient-app -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mongoclient-deployment-7c6856d6f6-mkxqh 1/1 Running 0 5m25s 10.28.1.152 ************************************** <none> 0/1
> kubectl describe pods -l app=mongoclient-app
...
Liveness: http-get http://:3000/healthcheck delay=70s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:3000/healthcheck delay=60s timeout=5s period=10s #success=1 #failure=3
...
I can't find any information on passing such a custom header through Ingress by making use of a BackendConfig resource. Even if such a thing works, I'm using this same ingress for other services, and meddling with the Ingress in such a scenario doesn't seem like a good idea.
I'm new to GKE and Kubernetes, so I'm not sure where else to look. The pod logs didn't provide much insight about access patterns. How can I proceed in this situation?
Update 1: So, I upgraded a development cluster to 1.15.7-gke.2 as it supports custom headers for ingress, and added the following:
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: mongoclient-backendconfig
spec:
  timeoutSec: 300
  connectionDraining:
    drainingTimeoutSec: 400
  sessionAffinity:
    affinityType: "GENERATED_COOKIE"
    affinityCookieTtlSec: 86400
  customRequestHeaders:
    headers:
    - "Authorization: Basic YWRtaW46cGFzc3dvcmQ="
Though the headers are coming up in the load balancer backend, the readiness check times out:
Normal Scheduled 15m default-scheduler Successfully assigned default/mongoclient-deployment-86bc77cc5b-9qg67 to gke-kubernetes-default-pool-50ccdc3e-d608
Normal LoadBalancerNegNotReady 15m (x2 over 15m) neg-readiness-reflector Waiting for pod to become healthy in at least one of the NEG(s): [k8s1-00c7387d-default-mongoclient-mayamd-ai-service-80-292db9f4]
Normal Pulled 15m kubelet, gke-kubernetes-default-pool-50ccdc3e-d608 Container image "mongoclient/mongoclient:2.2.0" already present on machine
Normal Created 15m kubelet, gke-kubernetes-default-pool-50ccdc3e-d608 Created container mongoclient
Normal Started 15m kubelet, gke-kubernetes-default-pool-50ccdc3e-d608 Started container mongoclient
Normal LoadBalancerNegTimeout 5m43s neg-readiness-reflector Timeout waiting for pod to become healthy in at least one of the NEG(s): [k8s1-00c7387d-default-mongoclient-mayamd-ai-service-80-292db9f4]. Marking condition "cloud.google.com/load-balancer-neg-ready" to True.

Getting 403 Forbidden from envoy when attempting to curl between sidecar enabled pods

I'm using a Kubernetes/Istio setup and my list of pods and services are as below:
NAME READY STATUS RESTARTS AGE
hr--debug-deployment-86575cffb6-wl6rx 2/2 Running 0 33m
hr--hr-deployment-596946948d-jrd7g 2/2 Running 0 33m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hr--debug-service ClusterIP 10.104.160.61 <none> 80/TCP 33m
hr--hr-service ClusterIP 10.102.117.177 <none> 80/TCP 33m
I'm attempting to curl into hr--hr-service from hr--debug-deployment-86575cffb6-wl6rx
pasan@ubuntu:~/product-vick$ kubectl exec -it hr--debug-deployment-86575cffb6-wl6rx /bin/bash
Defaulting container name to debug.
Use 'kubectl describe pod/hr--debug-deployment-86575cffb6-wl6rx -n default' to see all of the containers in this pod.
root@hr--debug-deployment-86575cffb6-wl6rx:/# curl hr--hr-service -v
* Rebuilt URL to: hr--hr-service/
* Trying 10.102.117.177...
* Connected to hr--hr-service (10.102.117.177) port 80 (#0)
> GET / HTTP/1.1
> Host: hr--hr-service
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 403 Forbidden
< date: Thu, 03 Jan 2019 04:06:17 GMT
< server: envoy
< content-length: 0
<
* Connection #0 to host hr--hr-service left intact
Can you please explain why I'm getting a 403 Forbidden from envoy and how I can troubleshoot it?
If you have the envoy sidecar injected, it really depends on what type of authentication policy you have between your services. Are you using a MeshPolicy or a Policy?
You can also try disabling authentication between your services to debug. Something like this (if your policy is defined like this):
apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: "hr--hr-service"
spec:
  targets:
  - name: hr--hr-service
  peers:
  - mtls:
      mode: PERMISSIVE
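If the setting is mesh-wide rather than per-service, the equivalent knob lives in a MeshPolicy instead. A strict mesh-wide mTLS policy, one common cause of envoy returning 403 to plaintext callers, typically looks like this sketch (illustrative, using the same v1alpha1 authentication API):

```yaml
apiVersion: "authentication.istio.io/v1alpha1"
kind: "MeshPolicy"
metadata:
  name: "default"   # the mesh-wide policy must be named "default"
spec:
  peers:
  - mtls: {}        # strict mTLS when no mode is given
```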

Kubernetes v1.12 dashboard is running but timeout occurred while accessing it via api server proxy

Started: 2018-12-01
I have windows 10 home (1803 update) host machine, Virtual Box 5.22, 2 guest ubuntu 18.04.1 servers.
Each guest has 2 networks: NAT (host IP 10.0.2.15) and shared host-only with gateway IP 192.168.151.1.
I set IPs:
for k8s master(ubuk8sma) - 192.168.151.21
for worker1 (ubuk8swrk1) - 192.168.151.22
I kept docker as is, at version 18.09.0.
I installed k8s version stable-1.12 on master and worker. For master init is:
K8S_POD_CIDR='10.244.0.0/16'
K8S_IP_ADDR='192.168.151.21'
K8S_VER='stable-1.12' # or latest
sudo kubeadm init --pod-network-cidr=${K8S_POD_CIDR} --apiserver-advertise-address=${K8S_IP_ADDR} --kubernetes-version ${K8S_VER} --ignore-preflight-errors=all
Why I set the "ignore errors" flag:
[ERROR SystemVerification]: unsupported docker version: 18.09.0
I was reluctant to reinstall a fully compatible docker version (maybe not a very smart move; I'm usually eager to try the latest stuff).
For CNI I installed flannel network:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
After installing worker1, the nodes' state looks like:
u1@ubuk8sma:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuk8sma Ready master 6d v1.12.2
ubuk8swrk1 Ready <none> 4d1h v1.12.2
No big issues showed up. Next I wanted a visualization of this k8s ecosystem, so I headed towards installing the k8s dashboard.
I followed the "defaults" path, with zero intervention where possible. I used this yaml:
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
At a basic level it looks installed: deployed to a worker Pod and running. From the pod list info:
u1@ubuk8sma:~$ kubectl get all --namespace=kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-576cbf47c7-4tzm9 1/1 Running 5 6d
pod/coredns-576cbf47c7-tqtpw 1/1 Running 5 6d
pod/etcd-ubuk8sma 1/1 Running 7 6d
pod/kube-apiserver-ubuk8sma 1/1 Running 7 6d
pod/kube-controller-manager-ubuk8sma 1/1 Running 11 6d
pod/kube-flannel-ds-amd64-rt442 1/1 Running 3 4d1h
pod/kube-flannel-ds-amd64-zx78x 1/1 Running 5 6d
pod/kube-proxy-6b6mc 1/1 Running 6 6d
pod/kube-proxy-zcchn 1/1 Running 3 4d1h
pod/kube-scheduler-ubuk8sma 1/1 Running 10 6d
pod/kubernetes-dashboard-77fd78f978-crl7b 1/1 Running 1 2d1h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 6d
service/kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 2d1h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-flannel-ds-amd64 2 2 2 2 2 beta.kubernetes.io/arch=amd64 6d
...
daemonset.apps/kube-proxy 2 2 2 2 2 <none> 6d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 2 2 2 2 6d
deployment.apps/kubernetes-dashboard 1 1 1 1 2d1h
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-576cbf47c7 2 2 2 6d
replicaset.apps/kubernetes-dashboard-77fd78f978 1 1 1 2d1h
I started proxy for both API server and dashboard service validation:
kubectl proxy
Version check for API server:
u1@ubuk8sma:~$ curl http://localhost:8001/version
{
"major": "1",
"minor": "12",
"gitVersion": "v1.12.2",
"gitCommit": "17c77c7898218073f14c8d573582e8d2313dc740",
"gitTreeState": "clean",
"buildDate": "2018-10-24T06:43:59Z",
"goVersion": "go1.10.4",
"compiler": "gc",
"platform": "linux/amd64"
}
And here is the problem I'm writing this question about:
u1@ubuk8sma:~$ curl "http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/"
Error: 'dial tcp 10.244.1.8:8443: i/o timeout'
Trying to reach: 'https://10.244.1.8:8443/'
Fragment of Pod info:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://fb0937959c7680046130e670c483877e4c0f1854870cb0b20ed4fe066d72df18
    image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
    imageID: docker-pullable://k8s.gcr.io/kubernetes-dashboard-amd64@sha256:1d2e1229a918f4bc38b5a3f9f5f11302b3e71f8397b492afac7f273a0008776a
    lastState:
      terminated:
        containerID: docker://f85e1cc50f59adbd8a13d42694aef7c5e726c07b3d852a26288c4bfc1124c718
        exitCode: 2
        finishedAt: 2018-11-30T06:53:21Z
        reason: Error
        startedAt: 2018-11-29T07:16:07Z
    name: kubernetes-dashboard
    ready: true
    restartCount: 1
    state:
      running:
        startedAt: 2018-11-30T06:53:23Z
  hostIP: 10.0.2.15
  phase: Running
  podIP: 10.244.1.8
  qosClass: BestEffort
  startTime: 2018-11-29T07:16:04Z
Docker check on worker1 node:
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
fb0937959c... sha256:0dab2435c100... "/dashboard --insecure-bind-address=0.0.0.0 --bind-address=0.0.0.0 --auto-generate-certificates" 27 hours ago Up 27 hours k8s_kubernetes-dashboard_kube...
Tried to check Pod logs, no luck:
DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
kubectl -n kube-system logs $DASHBOARD_POD_NAME
Error from server (NotFound): the server could not find the requested resource ( pods/log kubernetes-dashboard-77fd78f978-crl7b)
Tried to wget from API server:
API_SRV_POD_NAME='kube-apiserver-ubuk8sma'
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME wget https://10.244.1.8:8443/
No response.
Tried to check dashboard service existence, no luck:
u1@ubuk8sma:~$ kubectl get svc $DASHBOARD_SVC_NAME
Error from server (NotFound): services "kubernetes-dashboard" not found
Checked IP route table on API server:
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME ip route show
default via 10.0.2.2 dev enp0s3 src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 scope link src 10.0.2.15
10.0.2.2 dev enp0s3 scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
192.168.151.0/24 dev enp0s8 scope link src 192.168.151.21
For reference, enp0s3 is NAT NIC adapter, enp0s8 - host-only one.
I see the flannel route 10.244.1.x, so the issue seems unlikely to be a network misconfiguration (but I could be wrong).
So, dashboard Pod looks like running, but has some errors and I cannot diagnose which ones. Could you help to find root cause and ideally make dashboard service run without errors?
Thanks in advance, folks!
Update1:
I see events on master:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 11h kubelet, ubuk8swrk1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "43191144d447d0e9da52c8b6600bd96a23fab1e96c79af8c8fedc4e4e50882c7" network for pod "kubernetes-dashboard-77fd78f978-crl7b": NetworkPlugin cni failed to set up pod "kubernetes-dashboard-77fd78f978-crl7b_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 11h (x4 over 11h) kubelet, ubuk8swrk1 Pod sandbox changed, it will be killed and re-created.
Normal Pulled 11h kubelet, ubuk8swrk1 Container image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0" already present on machine
Normal Created 11h kubelet, ubuk8swrk1 Created container
Normal Started 11h kubelet, ubuk8swrk1 Started container
The error about subnet.env being absent is a bit strange, as both master and worker have it (well, maybe it was created on the fly):
u1@ubuk8swrk1:~$ ls -la /run/flannel/subnet.env
-rw-r--r-- 1 root root 96 Dec 3 08:15 /run/flannel/subnet.env
This is dashboard service descriptor:
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: deployment.kubernetes.io/revision: 1
Selector: k8s-app=kubernetes-dashboard
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: k8s-app=kubernetes-dashboard
Service Account: kubernetes-dashboard
Containers:
kubernetes-dashboard:
Image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: kubernetes-dashboard-77fd78f978 (1/1 replicas created)
Events: <none>
This is a reduced description of the pods (the original yaml is 35K, too much to share):
Name: coredns-576cbf47c7-4tzm9
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Status: Running
IP: 10.244.0.14
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://0efcd043407d93fb9d052045828489f6b99bb59b4f0882ec89e1897071609b77
Image: k8s.gcr.io/coredns:1.2.2
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 6
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: etcd-ubuk8sma
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: ubuk8sma/10.0.2.15
Labels: component=etcd
tier=control-plane
Status: Running
IP: 10.0.2.15
Containers:
etcd:
Container ID: docker://ba2bdcf5fa558beabdd8578628d71480d595d5ee3bb5c4edf42407419010144b
Image: k8s.gcr.io/etcd:3.2.24
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:905d7ca17fd02bc24c0eba9a062753aba15db3e31422390bc3238eb762339b20
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://127.0.0.1:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://127.0.0.1:2380
--initial-cluster=ubuk8sma=https://127.0.0.1:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380
--name=ubuk8sma
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Mon, 03 Dec 2018 08:12:56 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 28 Nov 2018 09:31:46 +0000
Finished: Mon, 03 Dec 2018 08:12:35 +0000
Ready: True
Restart Count: 8
Liveness: exec [/bin/sh -ec ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo] delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: kube-apiserver-ubuk8sma
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
Containers:
kube-apiserver:
Container ID: docker://099b2a30772b969c3919b57fd377980673f03a820afba6034daa70f011271a52
Image: k8s.gcr.io/kube-apiserver:v1.12.2
Image ID: docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.151.21
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Running
Started: Mon, 03 Dec 2018 08:13:00 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 8
Liveness: http-get https://192.168.151.21:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Events: <none>
Name: kube-flannel-ds-amd64-rt442
Namespace: kube-system
Node: ubuk8swrk1/10.0.2.15
Status: Running
IP: 10.0.2.15
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID: docker://a6377b0fe1b040235c24e9ca19455c56e77daecf688b212cfea5553b6e59ff68
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Ready: True
Restart Count: 4
Containers:
kube-flannel:
Container ID: docker://f7029bc2144c1ab8654407d742c1079df0059d418b7ba86b886091b5ad8c34a3
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 4
Events: <none>
Name: kube-proxy-6b6mc
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
The biggest suspect is the node IP: I see 10.0.2.15 (the NAT IP) everywhere, but the host-only NIC should be used. I had a long story of setting up networking properly for my ubuntu VMs.
I edited /etc/netplan/01-netcfg.yaml before k8s setup (thanks https://askubuntu.com/questions/984445/netplan-configuration-on-ubuntu-17-04-virtual-machine?rq=1 for help). Example for master config:
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: yes
      dhcp6: yes
      routes:
      - to: 0.0.0.0/0
        via: 10.0.2.2
        metric: 0
    enp0s8:
      dhcp4: no
      dhcp6: no
      addresses: [192.168.151.21/24]
      routes:
      - to: 192.168.151.1/24
        via: 192.168.151.1
        metric: 100
Only after this and a few more changes did the NAT and host-only networks start working together. NAT remains the default net adapter, which is likely why its IP shows up everywhere. For the api server I set --advertise-address=192.168.151.21 explicitly, which at least reduced the use of the NAT IP there.
So maybe the root cause is different, but the current question is how to reconfigure the network so the host-only IP is used instead of the NAT IP. I already tried this in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.151.21"
Restarted kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
Didn't help. Restarted the VMs. Didn't help either (I only expected kubelet-related changes, but nothing changed). Explored a few configs (5+) for potential changes, no luck.
Update2:
I mentioned the NAT address config issue above. I resolved it by editing the /etc/default/kubelet config, an idea I found in the comments of this article:
https://medium.com/@joatmon08/playing-with-kubeadm-in-vagrant-machines-part-2-bac431095706
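For reference, the change amounts to pinning the node IP in the kubelet environment file (a sketch; I'm assuming the Debian/Ubuntu /etc/default/kubelet layout, with the worker's host-only address):

```shell
# /etc/default/kubelet -- read by the kubelet systemd unit
KUBELET_EXTRA_ARGS=--node-ip=192.168.151.22
```

followed by the same daemon-reload and kubelet restart as above.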
Dashboard config part now has proper IP:
hostIP: 192.168.151.22
phase: Running
podIP: 10.244.1.13
Then I went into the API server's docker container and tried to reach the podIP via wget, ping, and traceroute. Timeouts everywhere. Routes:
/ # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
10.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s3
10.0.2.2 0.0.0.0 255.255.255.255 UH 100 0 0 enp0s3
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.1.0 10.244.1.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.151.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s8
Attempt to perform curl call from master VM:
u1@ubuk8sma:~$ curl -v -i -kSs "https://192.168.151.21:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/" -H "$K8S_AUTH_HEADER"
...
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x565072b5a750)
> GET /api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ HTTP/2
> Host: 192.168.151.21:6443
> User-Agent: curl/7.58.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJSUzI1.....
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 503
HTTP/2 503
< content-type: text/plain; charset=utf-8
content-type: text/plain; charset=utf-8
< content-length: 92
content-length: 92
< date: Tue, 04 Dec 2018 08:44:25 GMT
date: Tue, 04 Dec 2018 08:44:25 GMT
<
Error: 'dial tcp 10.244.1.13:8443: i/o timeout'
* Connection #0 to host 192.168.151.21 left intact
Trying to reach: 'https://10.244.1.13:8443/'
Service info for dashboard:
u1@ubuk8sma:~$ kubectl -n kube-system get service kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 5d
A bit more details:
u1@ubuk8sma:~$ kubectl -n kube-system describe services kubernetes-dashboard
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: <none>
Selector: k8s-app=kubernetes-dashboard
Type: ClusterIP
IP: 10.103.36.134
Port: <unset> 443/TCP
TargetPort: 8443/TCP
Endpoints: 10.244.1.13:8443
Session Affinity: None
Events: <none>
I also tried to get a shell there, both via kubectl and docker. For any usual linux command I see this 'OCI runtime exec failed' issue:
u1@ubuk8sma:~$ DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
u1@ubuk8sma:~$ kubectl -v=9 -n kube-system exec "$DASHBOARD_POD_NAME" -- env
I1204 09:57:17.673345 23517 loader.go:359] Config loaded from file /home/u1/.kube/config
I1204 09:57:17.679526 23517 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b'
I1204 09:57:17.703924 23517 round_trippers.go:405] GET https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b 200 OK in 23 milliseconds
I1204 09:57:17.703957 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.703971 23517 round_trippers.go:414] Content-Length: 3435
I1204 09:57:17.703984 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
I1204 09:57:17.703997 23517 round_trippers.go:414] Content-Type: application/json
I1204 09:57:17.704866 23517 request.go:942] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"kubernetes-dashboard-77fd78f978-crl7b","generateName":"kubernetes-dashboard-77fd78f978-","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b","uid":"a1d005b8-f3a6-11e8-a2d0-08002783a80f"...
I1204 09:57:17.740811 23517 round_trippers.go:386] curl -k -v -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" -H "X-Stream-Protocol-Version: v3.channel.k8s.io" -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true'
I1204 09:57:17.805528 23517 round_trippers.go:405] POST https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true 101 Switching Protocols in 64 milliseconds
I1204 09:57:17.805565 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.805581 23517 round_trippers.go:414] Connection: Upgrade
I1204 09:57:17.805594 23517 round_trippers.go:414] Upgrade: SPDY/3.1
I1204 09:57:17.805607 23517 round_trippers.go:414] X-Stream-Protocol-Version: v4.channel.k8s.io
I1204 09:57:17.805620 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"env\": executable file not found in $PATH": unknown
F1204 09:57:18.088488 23517 helpers.go:119] command terminated with exit code 126
So I cannot reach the pod, and I cannot get a shell in it either — the exec failure suggests the dashboard image is minimal and ships no shell or env binary at all. But at least I can see some logs:
u1@ubuk8sma:~$ kubectl -n kube-system logs -p $DASHBOARD_POD_NAME
2018/12/03 08:15:16 Starting overwatch
2018/12/03 08:15:16 Using in-cluster config to connect to apiserver
2018/12/03 08:15:16 Using service account token for csrf signing
2018/12/03 08:15:16 No request provided. Skipping authorization
2018/12/03 08:15:16 Successful initial request to the apiserver, version: v1.12.2
2018/12/03 08:15:16 Generating JWE encryption key
2018/12/03 08:15:16 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2018/12/03 08:15:16 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2018/12/03 08:15:18 Initializing JWE encryption key from synchronized object
2018/12/03 08:15:18 Creating in-cluster Heapster client
2018/12/03 08:15:19 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/03 08:15:19 Auto-generating certificates
2018/12/03 08:15:19 Successfully created certificates
2018/12/03 08:15:19 Serving securely on HTTPS port: 8443
2018/12/03 08:15:49 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
For now I have no idea where to go next to fix this timeout.

Container is not accessible via Kubernetes readiness probe

I see this error in kubectl describe pod podName:
9m 2s 118 kubelet, gke-wordpress-default-pool-2e82c1f4-0zpw spec.containers{nginx} Warning Unhealthy Readiness probe failed: Get http://10.24.0.27:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
The container (nginx) logs show the following:
10.24.1.1 - - [22/Aug/2017:11:09:51 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
However, if I exec into the container via kubectl exec -it podName -c nginx sh and run wget http://localhost, I successfully get an HTTP 200 response. Likewise, if I SSH into the host (a GCP Compute instance), I successfully get an HTTP 200 response.
I believe this issue started shortly after I replaced a LoadBalancer service with a NodePort service. I wonder if it's some port conflict?
The service in question:
wordpress-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: wordpress
  name: wordpress
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    app: wordpress
The container is an Nginx container serving content on port 80.
What might be the cause of the readiness probe failing?
If I remove the readiness probe in my config:
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
Everything works fine, and the pods can be accessed via the LoadBalancer service.
ARGHUGIHIRHHHHHH.
I've been staring at this error for at least a day, and for some reason I didn't comprehend it.
Essentially, the error net/http: request canceled (Client.Timeout exceeded while awaiting headers) means the container took longer than the probe's timeout period (default of 1s) to respond.
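So rather than removing the probe, the usual fix is to raise timeoutSeconds above the default 1s. A sketch based on the probe config quoted above (the value 5 is an arbitrary assumption; set it above the app's worst-case response time):

```yaml
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 5   # default is 1; must exceed the app's slowest header time
```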
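To make the failure mode concrete, here is a self-contained local sketch (nothing cluster-specific; the port 18080 and the 2-second delay are made up for the demo): a server that takes longer to send its headers than a 1-second client timeout allows, with curl's -m (max-time) flag standing in for the kubelet's probe timeout.

```shell
# Start a server that waits 2s before sending response headers.
python3 - <<'EOF' &
import http.server, time

class Slow(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)  # slower than the kubelet's default 1s probe timeout
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        except BrokenPipeError:
            pass  # client (the "probe") already gave up
    def log_message(self, *args):
        pass  # keep the demo output quiet

http.server.HTTPServer(("127.0.0.1", 18080), Slow).serve_forever()
EOF
SRV=$!
sleep 1  # give the server a moment to start

out1=$(curl -s -m 1 http://127.0.0.1:18080/; echo "exit=$?")  # 1s timeout: fails
out2=$(curl -s -m 5 http://127.0.0.1:18080/; echo "exit=$?")  # 5s timeout: succeeds

echo "1s probe: $out1"   # exit=28 is curl's "operation timed out"
echo "5s probe: $out2"
kill $SRV
```

The 1-second probe aborts before the headers arrive, exactly like the kubelet does, even though the very same server would answer a more patient client.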