Kubernetes nginx ingress controller kills connections

I am a newbie to k8s and I am trying to deploy a private Docker registry in Kubernetes.
The problem is that whenever I push a heavy image (about 1 GB) via docker push, the command eventually returns EOF.
I believe the issue has to do with the Kubernetes nginx ingress controller.
I will provide some useful information below; in case you need more, do not hesitate to ask.
Docker push (to the internal k8s Docker registry) fails:
[root@bastion ~]# docker push docker-registry.apps.kube.lab/example:stable
The push refers to a repository [docker-registry.apps.kube.lab/example]
c0acde035881: Pushed
f6d2683cee8b: Pushed
00b1a6ab6acd: Retrying in 1 second
28c41b4dd660: Pushed
36957997ca7a: Pushed
5c4d527d6b3a: Pushed
a933681cf349: Pushing [==================================================>] 520.4 MB
f49d20b92dc8: Retrying in 20 seconds
fe342cfe5c83: Retrying in 15 seconds
630e4f1da707: Retrying in 13 seconds
9780f6d83e45: Waiting
EOF
Ingress definition:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: docker-registry
  namespace: docker-registry
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "86400"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "86400"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "86400"
spec:
  rules:
  - host: docker-registry.apps.kube.lab
    http:
      paths:
      - backend:
          serviceName: docker-registry
          servicePort: 5000
        path: /
Docker registry configuration (/etc/docker/registry/config.yml):
version: 0.1
log:
  level: info
  formatter: json
  fields:
    service: registry
storage:
  redirect:
    disable: true
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
  host: docker-registry.apps.kube.lab
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
Docker registry logs:
{"go.version":"go1.11.2","http.request.host":"docker-registry.apps.kube.lab","http.request.id":"c079b639-0e8a-4a27-96fa-44c4c0182ff7","http.request.method":"HEAD","http.request.remoteaddr":"10.233.70.0","http.request.uri":"/v2/example/blobs/sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","http.request.useragent":"docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))","level":"debug","msg":"authorizing request","time":"2020-11-07T14:43:22.893626513Z","vars.digest":"sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","vars.name":"example"}
{"go.version":"go1.11.2","http.request.host":"docker-registry.apps.kube.lab","http.request.id":"c079b639-0e8a-4a27-96fa-44c4c0182ff7","http.request.method":"HEAD","http.request.remoteaddr":"10.233.70.0","http.request.uri":"/v2/example/blobs/sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","http.request.useragent":"docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))","level":"debug","msg":"GetBlob","time":"2020-11-07T14:43:22.893751065Z","vars.digest":"sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","vars.name":"example"}
{"go.version":"go1.11.2","http.request.host":"docker-registry.apps.kube.lab","http.request.id":"c079b639-0e8a-4a27-96fa-44c4c0182ff7","http.request.method":"HEAD","http.request.remoteaddr":"10.233.70.0","http.request.uri":"/v2/example/blobs/sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","http.request.useragent":"docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))","level":"debug","msg":"filesystem.GetContent(\"/docker/registry/v2/repositories/example/_layers/sha256/751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee/link\")","time":"2020-11-07T14:43:22.893942372Z","trace.duration":74122,"trace.file":"/go/src/github.com/docker/distribution/registry/storage/driver/base/base.go","trace.func":"github.com/docker/distribution/registry/storage/driver/base.(*Base).GetContent","trace.id":"11e24830-7d16-404a-90bc-8a738cab84ea","trace.line":95,"vars.digest":"sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","vars.name":"example"}
{"err.code":"blob unknown","err.detail":"sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","err.message":"blob unknown to registry","go.version":"go1.11.2","http.request.host":"docker-registry.apps.kube.lab","http.request.id":"c079b639-0e8a-4a27-96fa-44c4c0182ff7","http.request.method":"HEAD","http.request.remoteaddr":"10.233.70.0","http.request.uri":"/v2/example/blobs/sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","http.request.useragent":"docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))","http.response.contenttype":"application/json; charset=utf-8","http.response.duration":"1.88607ms","http.response.status":404,"http.response.written":157,"level":"error","msg":"response completed with error","time":"2020-11-07T14:43:22.894147954Z","vars.digest":"sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee","vars.name":"example"}
10.233.105.66 - - [07/Nov/2020:14:43:22 +0000] "HEAD /v2/example/blobs/sha256:751620502a7a2905067c2f32d4982fb9b310b9808670ce82c0e2b40f5307a3ee HTTP/1.1" 404 157 "" "docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \\(linux\\))"
I believe the issue has to do with the ingress controller, because when the EOF error shows up there is something weird in the ingress-controller logs:
10.233.70.0 - - [07/Nov/2020:14:43:41 +0000] "PUT /v2/example/blobs/uploads/dab984a8-7e71-4481-91fb-af53c7790a20?_state=usMX2WH24Veunay0ozOF-RMZIUMNTFSC8MSPbMcxz-B7Ik5hbWUiOiJleGFtcGxlIiwiVVVJRCI6ImRhYjk4NGE4LTdlNzEtNDQ4MS05MWZiLWFmNTNjNzc5MGEyMCIsIk9mZnNldCI6NzgxMTczNywiU3RhcnRlZEF0IjoiMjAyMC0xMS0wN1QxNDo0MzoyOFoifQ%3D%3D&digest=sha256%3A101c41d0463bc77661fb3343235b16d536a92d2efb687046164d413e51bd4fc4 HTTP/1.1" 201 0 "-" "docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \x5C(linux\x5C))" 606 0.026 [docker-registry-docker-registry-5000] [] 10.233.70.84:5000 0 0.026 201 06304ff584d252812dff016374be73ae
172.16.1.123 - - [07/Nov/2020:14:43:42 +0000] "HEAD /v2/example/blobs/sha256:101c41d0463bc77661fb3343235b16d536a92d2efb687046164d413e51bd4fc4 HTTP/1.1" 200 0 "-" "docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \x5C(linux\x5C))" 299 0.006 [docker-registry-docker-registry-5000] [] 10.233.70.84:5000 0 0.006 200 a5a93c7b7f4644139fcb0697d3e5e43f
I1107 14:44:05.285478 6 main.go:184] "Received SIGTERM, shutting down"
I1107 14:44:05.285517 6 nginx.go:365] "Shutting down controller queues"
I1107 14:44:06.294533 6 status.go:132] "removing value from ingress status" address=[172.16.1.123]
I1107 14:44:06.306793 6 status.go:277] "updating Ingress status" namespace="kube-system" ingress="example-ingress" currentValue=[{IP:172.16.1.123 Hostname:}] newValue=[]
I1107 14:44:06.307650 6 status.go:277] "updating Ingress status" namespace="kubernetes-dashboard" ingress="dashboard" currentValue=[{IP:172.16.1.123 Hostname:}] newValue=[]
I1107 14:44:06.880987 6 status.go:277] "updating Ingress status" namespace="test-nfs" ingress="example-nginx" currentValue=[{IP:172.16.1.123 Hostname:}] newValue=[]
I1107 14:44:07.872659 6 status.go:277] "updating Ingress status" namespace="test-ingress" ingress="example-ingress" currentValue=[{IP:172.16.1.123 Hostname:}] newValue=[]
I1107 14:44:08.505295 6 queue.go:78] "queue has been shutdown, failed to enqueue" key="&ObjectMeta{Name:sync status,GenerateName:,Namespace:,SelfLink:,UID:,ResourceVersion:,Generation:0,CreationTimestamp:0001-01-01 00:00:00 +0000 UTC,DeletionTimestamp:<nil>,DeletionGracePeriodSeconds:nil,Labels:map[string]string{},Annotations:map[string]string{},OwnerReferences:[]OwnerReference{},Finalizers:[],ClusterName:,ManagedFields:[]ManagedFieldsEntry{},}"
I1107 14:44:08.713579 6 status.go:277] "updating Ingress status" namespace="docker-registry" ingress="docker-registry" currentValue=[{IP:172.16.1.123 Hostname:}] newValue=[]
I1107 14:44:09.772593 6 nginx.go:373] "Stopping admission controller"
I1107 14:44:09.772697 6 nginx.go:381] "Stopping NGINX process"
E1107 14:44:09.773208 6 nginx.go:314] "Error listening for TLS connections" err="http: Server closed"
2020/11/07 14:44:09 [notice] 114#114: signal process started
10.233.70.0 - - [07/Nov/2020:14:44:16 +0000] "PATCH /v2/example/blobs/uploads/adbe3173-9928-4eb5-97bb-7893970f032a?_state=nEr2ip9eoLNCTe8KQ6Ck7k3C8oS9IY7AnBOi1_f5mSl7Ik5hbWUiOiJleGFtcGxlIiwiVVVJRCI6ImFkYmUzMTczLTk5MjgtNGViNS05N2JiLTc4OTM5NzBmMDMyYSIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyMC0xMS0wN1QxNDo0MzoyOC45ODY3MTQwNTlaIn0%3D HTTP/1.1" 202 0 "-" "docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \x5C(linux\x5C))" 50408825 46.568 [docker-registry-docker-registry-5000] [] 10.233.70.84:5000 0 14.339 202 55d9cab4f915f54e5c130321db4dc8fc
10.233.70.0 - - [07/Nov/2020:14:44:19 +0000] "PATCH /v2/example/blobs/uploads/63d4a54a-cdfd-434b-ae63-dc434dcb15f9?_state=9UK7MRYJYST--u7BAUFTonCdPzt_EO2KyfJblVroBxd7Ik5hbWUiOiJleGFtcGxlIiwiVVVJRCI6IjYzZDRhNTRhLWNkZmQtNDM0Yi1hZTYzLWRjNDM0ZGNiMTVmOSIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyMC0xMS0wN1QxNDo0MzoyMy40MjIwMDI4NThaIn0%3D HTTP/1.1" 202 0 "-" "docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \x5C(linux\x5C))" 51842691 55.400 [docker-registry-docker-registry-5000] [] 10.233.70.84:5000 0 18.504 202 1f1de1ae89caa8540b6fd13ea5b165ab
10.233.70.0 - - [07/Nov/2020:14:44:50 +0000] "PATCH /v2/example/blobs/uploads/0c97923d-ed9f-4599-8a50-f2c21cfe85fe?_state=WmIRW_3owlin1zo4Ms98UwaMGf1D975vUuzbk1JWRuN7Ik5hbWUiOiJleGFtcGxlIiwiVVVJRCI6IjBjOTc5MjNkLWVkOWYtNDU5OS04YTUwLWYyYzIxY2ZlODVmZSIsIk9mZnNldCI6MCwiU3RhcnRlZEF0IjoiMjAyMC0xMS0wN1QxNDo0MzoyMC41ODA5MjUyNDlaIn0%3D HTTP/1.1" 202 0 "-" "docker/1.13.1 go/go1.10.3 kernel/3.10.0-1127.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/1.13.1 \x5C(linux\x5C))" 192310965 89.937 [docker-registry-docker-registry-5000] [] 10.233.70.84:5000 0 22.847 202 d8971d2f543e936c2f805d5b257f1130
I1107 14:44:50.832669 6 nginx.go:394] "NGINX process has stopped"
I1107 14:44:50.832703 6 main.go:192] "Handled quit, awaiting Pod deletion"
I1107 14:45:00.832892 6 main.go:195] "Exiting" code=0
[root@bastion registry]#
After that happens, the ingress-controller pod is not ready, and after some seconds it becomes ready again.
Is this to do with a config reload of the Kubernetes nginx ingress controller? In that case, do I have to add any special variable to nginx.conf?
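For reference, ingress-nginx does not read a hand-edited nginx.conf; global settings are supplied through the controller's ConfigMap, which the controller watches and turns into a reload. A minimal sketch of the two settings most relevant here (the ConfigMap name and namespace depend on how the controller was installed, so treat them as assumptions):
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed name; must match the controller's --configmap flag
  namespace: ingress-nginx         # assumed namespace
data:
  proxy-body-size: "0"             # disable the request-body size limit globally
  worker-shutdown-timeout: "240s"  # let old workers finish in-flight uploads after a reload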
Any help is welcome! Kind regards!
EDIT
The moment the EOF appears, ingress-nginx crashes and its pod becomes not ready.
[root@bastion ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-lbmd6 0/1 Completed 0 5d4h
ingress-nginx-admission-patch-btv27 0/1 Completed 0 5d4h
ingress-nginx-controller-7dcc8d6478-n8dkx 0/1 Running 3 15m
Warning Unhealthy 29s (x8 over 2m39s) kubelet Liveness probe failed: Get http://10.233.70.100:10254/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
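That failing liveness probe means the kubelet is killing and restarting the controller, which matches the SIGTERM in the controller logs above. The probe is defined on the controller Deployment; loosening it would look roughly like this (a sketch, with the values as assumptions rather than recommendations):
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
          timeoutSeconds: 5      # the default of 1s is easy to miss during a heavy reload
          periodSeconds: 10
          failureThreshold: 5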
As a consequence, none of my applications are reachable:
[root@bastion ~]# curl http://hello-worrld.apps.kube.lab
Hello, world!
Version: 1.0.0
Hostname: web-6785d44d5-4r5q5
[root@bastion ~]# date
sáb nov 7 18:58:16 -01 2020
[root@bastion ~]# curl http://hello-worrld.apps.kube.lab
curl: (52) Empty reply from server
[root@bastion ~]# date
sáb nov 7 18:58:53 -01 2020
Is the issue to do with nginx performance? If so, which ingress-nginx options would you recommend tweaking?

You should try another Docker registry to verify whether this is actually caused by the ingress. It does not make sense for an ingress to fail just because of image size.
You can try JFrog JCR, which is free; you can deploy JCR into your Kubernetes cluster and expose it via a LoadBalancer (external IP) or an ingress.
That gives you a way to verify it is really an ingress issue: push a Docker image via the LoadBalancer (external IP), and if that works while the push via ingress fails, you know the problem is specifically your ingress; a sketch of such a Service follows.
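A minimal sketch of such a LoadBalancer Service for the registry in the question (the name and pod selector are assumptions; the port matches the registry above):
apiVersion: v1
kind: Service
metadata:
  name: docker-registry-lb        # hypothetical name
  namespace: docker-registry
spec:
  type: LoadBalancer
  selector:
    app: docker-registry          # assumed pod label
  ports:
  - port: 5000
    targetPort: 5000
Pushing against the external IP this Service gets bypasses the ingress controller entirely, which is what makes the comparison meaningful.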
JFrog JCR is also free and available on ChartCenter.

Related

Docker registry created but not accessible inside cluster

I am using an EKS cluster on AWS.
I have created a Docker registry as a deployment, and then created a Service and an Ingress over it.
In the Ingress, I have placed TLS secrets for the ingress host:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.org/client-max-body-size: "0"
  creationTimestamp: "2021-06-18T05:10:02Z"
  generation: 1
  name: registry-ingress
  namespace: devops
  resourceVersion: "4126584"
  selfLink: /apis/extensions/v1beta1/namespaces/devops/ingresses/registry-ingress
  uid: d808177b-cb0b-4da2-82aa-5ab2f3c99109
spec:
  rules:
  - host: docker-registry.xxxx.com
    http:
      paths:
      - backend:
          serviceName: docker-registry
          servicePort: 5000
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - docker-registry.xxxx.com
    secretName: tls-registry
I have 4 worker nodes and a jump server.
The issue I am facing is that I can access the Docker registry at the ingress address from the jump host, but from the worker nodes it fails with the error below; as a result, any pods created with images from this registry also fail with the same error.
The workers and the jump host are in the same subnet.
request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I have tried placing the service IP and the registry ingress host in /etc/hosts, and copying the certs to /etc/docker.certs.d/registryname.
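For reference, dockerd looks for a registry CA under /etc/docker/certs.d/<registry-host>/ (note: /etc/docker/certs.d, not /etc/docker.certs.d), so the expected layout is roughly:
/etc/docker/certs.d/
└── docker-registry.xxxx.com/     # must match the host name used in docker pull/push
    └── ca.crt                    # the CA that signed the tls-registry certificate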
Any hint would be great
Cluster information:
Kubernetes version: v1.19.8-eks-96780e
kubectl version output:
Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.4-eks-6b7464", GitCommit:"6b746440c04cb81db4426842b4ae65c3f7035e53", GitTreeState:"clean", BuildDate:"2021-03-19T19:35:50Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.8-eks-96780e", GitCommit:"96780e1b30acbf0a52c38b6030d7853e575bcdf3", GitTreeState:"clean", BuildDate:"2021-03-10T21:32:29Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Cloud being used: AWS
Installation method: EKS
EDIT 1
I checked on one worker node to find the CRI; the kubelet process is as below, so I think the CRI is Docker:
/usr/bin/kubelet --cloud-provider aws --config /etc/kubernetes/kubelet/kubelet-config.json --kubeconfig /var/lib/kubelet/kubeconfig --container-runtime docker
However, I did see both dockerd and containerd processes running on the worker node.
Also, checking the docker service logs I got the same error:
Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-06-14 08:31:57 UTC; 4 days ago
Docs: https://docs.docker.com
Process: 12574 ExecStartPre=/usr/libexec/docker/docker-setup-runtimes.sh (code=exited, status=0/SUCCESS)
Process: 12571 ExecStartPre=/bin/mkdir -p /run/docker (code=exited, status=0/SUCCESS)
Main PID: 12579 (dockerd)
Tasks: 23
Memory: 116.5M
CGroup: /system.slice/docker.service
└─12579 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Jun 19 02:23:45 ip-xxxxx dockerd[12579]: time="2021-06-19T02:23:45.876987774Z" level=error msg="Handler for POST /v1.40/images/create returned error: Get https://xxxx: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
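A minimal check from a worker node would be to hit the registry's API root through the ingress directly; a sketch (host name and cert path are assumptions):
# HTTP 200 or 401 means the ingress path works; a hang reproduces the timeout
curl -v https://docker-registry.xxxx.com/v2/ \
  --cacert /etc/docker/certs.d/docker-registry.xxxx.com/ca.crt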

Kubernetes API timeout when deploying ingress-controller

I am using a Self-Managed Bare-Metal kubernetes cluster with the following three nodes:
NAME STATUS ROLES AGE VERSION
ruby0 Ready master 17h v1.18.5
ruby1 Ready <none> 17h v1.18.5
ruby2 Ready <none> 17h v1.18.5
They are hosted on three identical debian systems with the following kernel (uname -a):
Linux ruby0 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
To set up the nodes, I followed the basic installation guide of Imixs-Cloud. As mentioned in the tutorial, I deployed the flannel pod network, and finished the installation with the deployment of a traefik ingress controller, as explained in the same guide. I left out 011-persistencevolume.yaml for simplicity.
After that, I got the following traefik service and pod (as well as a deployment and replicaset):
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-system service/traefik LoadBalancer 10.110.149.28 192.168.10.127 80:31298/TCP,443:32383/TCP,8100:31372/TCP 68m
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/traefik-b9b444b89-w9b6h 1/1 Running 0 68m
When trying to access the traefik service on one of the given ports, either by a browser or curl, I get a timeout error with no response. The given traefik pod shows me the following errors:
Trace[1402685037]: [30.000537725s] [30.000537725s] END
E0707 08:08:49.147428 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0707 08:08:49.148535 1 trace.go:116] Trace[84373527]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105 (started: 2020-07-07 08:08:19.148006738 +0000 UTC m=+4340.222316899) (total time: 30.000486829s):
Trace[84373527]: [30.000486829s] [30.000486829s] END
E0707 08:08:49.148560 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1.Secret: Get "https://10.96.0.1:443/api/v1/secrets?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0707 08:08:49.149667 1 trace.go:116] Trace[1117097236]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105 (started: 2020-07-07 08:08:19.149071171 +0000 UTC m=+4340.223381353) (total time: 30.000553113s):
Trace[1117097236]: [30.000553113s] [30.000553113s] END
E0707 08:08:49.149693 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0707 08:08:49.150743 1 trace.go:116] Trace[1553411795]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105 (started: 2020-07-07 08:08:19.150285239 +0000 UTC m=+4340.224595418) (total time: 30.000416845s):
Trace[1553411795]: [30.000416845s] [30.000416845s] END
E0707 08:08:49.150770 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1alpha1.TraefikService: Get "https://10.96.0.1:443/apis/traefik.containo.us/v1alpha1/traefikservices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
10.244.1.1 - - [07/Jul/2020:08:08:53 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 873 "ping#internal" "-" 0ms
10.244.1.1 - - [07/Jul/2020:08:08:56 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 874 "ping#internal" "-" 0ms
10.244.1.1 - - [07/Jul/2020:08:09:03 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 875 "ping#internal" "-" 0ms
10.244.1.1 - - [07/Jul/2020:08:09:06 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 876 "ping#internal" "-" 0ms
For some reason, traefik cannot access the kubernetes API:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 18h
I've seen that behaviour on my cluster before, for example when I installed MetalLB and it tried to access its configmap over the Kubernetes API.
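A quick way to check whether pods on a given node can reach the API service IP at all is a throwaway pod; a sketch (the pod name and busybox image are assumptions):
apiVersion: v1
kind: Pod
metadata:
  name: api-reach-test            # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: test
    image: busybox:1.32           # assumed image
    # Plain HTTP against the TLS port: a fast error means TCP connectivity works,
    # while hanging for the full 5s reproduces the i/o timeout seen above.
    command: ["wget", "-qO-", "-T", "5", "http://10.96.0.1:443/version"]
Running it once with spec.nodeName pinned to each node helps tell a cluster-wide problem from a node-local flannel one.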

kube-proxy not able to list endpoints and services

I'm new to Kubernetes, and I'm trying to create a cluster. But after I configure the master with the kubeadm command, I see there are some errors with the pods, and this results in a master that is always in a NotReady state.
It all seems to originate from the fact that kube-proxy cannot list the endpoints and the services... and for this reason (or so I understand) it cannot update the iptables rules.
Here is my kubectl version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
And here are the logs from the kube-proxy pod:
$ kubectl logs -n kube-system kube-proxy-xjxck
W0430 12:33:28.887260 1 server_others.go:267] Flag proxy-mode="" unknown, assuming iptables proxy
W0430 12:33:28.913671 1 node.go:113] Failed to retrieve node info: Unauthorized
I0430 12:33:28.915780 1 server_others.go:147] Using iptables Proxier.
W0430 12:33:28.916065 1 proxier.go:314] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0430 12:33:28.916089 1 proxier.go:319] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0430 12:33:28.917555 1 server.go:555] Version: v1.14.1
I0430 12:33:28.959345 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0430 12:33:28.960392 1 config.go:202] Starting service config controller
I0430 12:33:28.960444 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I0430 12:33:28.960572 1 config.go:102] Starting endpoints config controller
I0430 12:33:28.960609 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
E0430 12:33:28.970720 1 event.go:191] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"fh-ubuntu01.159a40901fa85264", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"fh-ubuntu01", UID:"fh-ubuntu01", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kube-proxy.", Source:v1.EventSource{Component:"kube-proxy", Host:"fh-ubuntu01"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbf2a2e0639406264, ext:334442672, loc:(*time.Location)(0x2703080)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Unauthorized' (will not retry!)
E0430 12:33:28.970939 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:28.971106 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:29.977038 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
E0430 12:33:29.979890 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Service: Unauthorized
E0430 12:33:30.980098 1 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1.Endpoints: Unauthorized
Now, I've created a new ClusterRoleBinding this way:
$ kubectl create clusterrolebinding kube-proxy-binding --clusterrole=system:node-proxier --user=system:kube-proxy
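For reference, the YAML equivalent of that command is roughly:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-proxy-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-proxier
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:kube-proxy
Note, though, that the errors above say Unauthorized (authentication), which is checked before RBAC (authorization), so a binding alone may not change anything.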
If I describe the ClusterRole, I can see this:
$ kubectl describe clusterrole system:node-proxier
Name: system:node-proxier
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
events [] [] [create patch update]
nodes [] [] [get]
endpoints [] [] [list watch]
services [] [] [list watch]
so the user "system:kube-proxy" should be able to list the endpoints and the services, right? Now, if I print the YAML of the kube-proxy ConfigMap, I get this:
$ kubectl get configmap kube-proxy -n kube-system -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: ""
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    kind: KubeProxyConfiguration
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
    winkernel:
      enableDSR: false
      networkName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.0.1.1:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  creationTimestamp: "2019-03-21T10:34:03Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "4458115"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
  uid: d8a454fb-4bc4-11e9-b0b4-00155d044109
I can see that "user: default", which confuses me... which user is it trying to authenticate with? Is there an actual user named "default"?
Thank you very much!
Output from kubectl get po -n kube-system:
$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-27qck 0/1 Pending 0 7d15h
coredns-fb8b8dccf-dd6bh 0/1 Pending 0 7d15h
kube-apiserver-fh-ubuntu01 1/1 Running 1 7d15h
kube-controller-manager-fh-ubuntu01 1/1 Running 0 7d15h
kube-proxy-xjxck 1/1 Running 0 43h
kube-scheduler-fh-ubuntu01 1/1 Running 1 7d15h
weave-net-psqh5 1/2 CrashLoopBackOff 2144 7d15h
Cluster health looks healthy:
$ kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-2 Healthy {"health": "true"}
etcd-3 Healthy {"health": "true"}
etcd-0 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
Run the below command to check cluster health:
kubectl get cs
Then check the status of the control-plane services:
kubectl get po -n kube-system
The issue seems to be with the weave-net-psqh5 pod; find out why it is getting into CrashLoopBackOff status.
Share the logs from weave-net-psqh5, as sketched below.
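Pulling the logs of the crashing container would look like this (the weave-net pod runs two containers, weave and weave-npc; which one is crashing here is an assumption):
kubectl -n kube-system logs weave-net-psqh5 -c weave --previous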

404 page not found when exposing a service by ingress in a k8s cluster

I have a RESTful service running on a k8s cluster (1 master, 2 nodes), written in golang; it has a GET method and returns nothing. I want to expose it by Ingress.
After I installed it via helm and 2 pods came up, I tried to send a request (curl) from a client, but it returned a 404 error. When I curl the RESTful service from inside the nginx-ingress-controller pod, the service works well.
RESTful & nginx-ingress services:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
stee-webservice-svc NodePort 10.109.22.37 <none> 8080:30009/TCP 47m
nginx-ingress-controller LoadBalancer 10.106.34.249 <pending> 80:31368/TCP,443:31860/TCP 30h
Ingress description:
Name: nonexistent-raccoon-stee-ws
Namespace: default
Address:
Default backend: default-http-backend:80 (<none>)
Rules:
Host Path Backends
---- ---- --------
*
/steews stee-webservice-svc:8080 (<none>)
Annotations:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CREATE 52m nginx-ingress-controller Ingress default/nonexistent-raccoon-stee-ws
Normal CREATE 52m nginx-ingress-controller Ingress default/nonexistent-raccoon-stee-ws
Normal UPDATE 2m27s (x101 over 52m) nginx-ingress-controller Ingress default/nonexistent-raccoon-stee-ws
Normal UPDATE 2m27s (x101 over 52m) nginx-ingress-controller Ingress default/nonexistent-raccoon-stee-ws
curl from client
curl http://10.106.34.249:80/steews/get -kL
404: Page Not Found
The ingress-controller log shows that the request was received and a 404 was indeed returned to the client. So the problem is here: why does the Ingress not find the configured path "/steews" and route it correctly?
10.244.0.0 - [10.244.0.0] - - [06/Mar/2019:09:30:45 +0000] "GET /steews/get HTTP/1.1" 308 171 "-" "curl/7.58.0" 87 0.000 [default-stee-webservice-svc-8080] - - - - 61974b67eb85845faf3177979b851166
10.244.0.0 - [10.244.0.0] - - [06/Mar/2019:09:30:45 +0000] "GET /steews/get HTTP/2.0" 404 19 "-" "curl/7.58.0" 39 0.003 [default-stee-webservice-svc-8080] 10.244.1.38:8080 19 0.004 404 d29b5922d485c36cf0cf6f76b894770b*
curl inside the nginx-ingress-controller pod works fine:
kc exec -it nginx-ingress-controller-9cf6cf578-qhtl6 -- bash
www-data@nginx-ingress-controller-9cf6cf578-qhtl6:/etc/nginx$ curl stee-webservice-svc:8080/get -kL -vv
* Trying 10.109.22.37...
* TCP_NODELAY set
* Connected to stee-webservice-svc (10.109.22.37) port 8080 (#0)
> GET /get HTTP/1.1
> Host: stee-webservice-svc:8080
> User-Agent: curl/7.62.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 06 Mar 2019 09:24:23 GMT
< Content-Length: 0
<
* Connection #0 to host stee-webservice-svc left intact
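Note the first log line above is a 308 redirect, after which the backend is asked for /steews/get, a path it does not serve (inside the pod, plain /get works). A common fix for this kind of path mismatch is a rewrite; a sketch using the ingress-nginx 0.22+ annotation syntax (the Ingress name is hypothetical, the service matches the one above):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: stee-ws                                      # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2  # forward only the part after /steews
spec:
  rules:
  - http:
      paths:
      - path: /steews(/|$)(.*)
        backend:
          serviceName: stee-webservice-svc
          servicePort: 8080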

Kubernetes v1.12 dashboard is running but timeout occurred while accessing it via api server proxy

I have a Windows 10 Home (1803 update) host machine, VirtualBox 5.22, and 2 guest Ubuntu 18.04.1 servers.
Each guest has 2 networks: NAT (host IP 10.0.2.15) and a shared host-only network with gateway IP 192.168.151.1.
I set the IPs:
for the k8s master (ubuk8sma) - 192.168.151.21
for worker1 (ubuk8swrk1) - 192.168.151.22
I left Docker as-is, at version 18.09.0.
I installed k8s version stable-1.12 on master and worker. The master init is:
K8S_POD_CIDR='10.244.0.0/16'
K8S_IP_ADDR='192.168.151.21'
K8S_VER='stable-1.12' # or latest
sudo kubeadm init --pod-network-cidr=${K8S_POD_CIDR} --apiserver-advertise-address=${K8S_IP_ADDR} --kubernetes-version ${K8S_VER} --ignore-preflight-errors=all
Why I set "ignore errors" flag:
[ERROR SystemVerification]: unsupported docker version: 18.09.0
I was reluctant to reinstall a fully k8s-compatible Docker version (maybe not a very smart move, but I'm usually eager to try the latest stuff).
For CNI I installed flannel network:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
After installing worker1, the nodes' state looks like:
u1@ubuk8sma:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuk8sma Ready master 6d v1.12.2
ubuk8swrk1 Ready <none> 4d1h v1.12.2
No big issues showed up. Next, I wanted a visualization of this pretty k8s ecosystem, so I headed towards installing the k8s dashboard.
I followed the "defaults" path, with zero intervention where possible. I used this yaml:
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
At a basic level it looks installed, deployed to a Pod on the worker, and running. From the pod list info:
u1@ubuk8sma:~$ kubectl get all --namespace=kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-576cbf47c7-4tzm9 1/1 Running 5 6d
pod/coredns-576cbf47c7-tqtpw 1/1 Running 5 6d
pod/etcd-ubuk8sma 1/1 Running 7 6d
pod/kube-apiserver-ubuk8sma 1/1 Running 7 6d
pod/kube-controller-manager-ubuk8sma 1/1 Running 11 6d
pod/kube-flannel-ds-amd64-rt442 1/1 Running 3 4d1h
pod/kube-flannel-ds-amd64-zx78x 1/1 Running 5 6d
pod/kube-proxy-6b6mc 1/1 Running 6 6d
pod/kube-proxy-zcchn 1/1 Running 3 4d1h
pod/kube-scheduler-ubuk8sma 1/1 Running 10 6d
pod/kubernetes-dashboard-77fd78f978-crl7b 1/1 Running 1 2d1h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 6d
service/kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 2d1h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-flannel-ds-amd64 2 2 2 2 2 beta.kubernetes.io/arch=amd64 6d
...
daemonset.apps/kube-proxy 2 2 2 2 2 <none> 6d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 2 2 2 2 6d
deployment.apps/kubernetes-dashboard 1 1 1 1 2d1h
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-576cbf47c7 2 2 2 6d
replicaset.apps/kubernetes-dashboard-77fd78f978 1 1 1 2d1h
I started the proxy to validate both the API server and the dashboard service:
kubectl proxy
Version check for API server:
u1@ubuk8sma:~$ curl http://localhost:8001/version
{
"major": "1",
"minor": "12",
"gitVersion": "v1.12.2",
"gitCommit": "17c77c7898218073f14c8d573582e8d2313dc740",
"gitTreeState": "clean",
"buildDate": "2018-10-24T06:43:59Z",
"goVersion": "go1.10.4",
"compiler": "gc",
"platform": "linux/amd64"
}
And here is the problem I'm writing this question about:
u1@ubuk8sma:~$ curl "http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/"
Error: 'dial tcp 10.244.1.8:8443: i/o timeout'
Trying to reach: 'https://10.244.1.8:8443/'
Fragment of Pod info:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://fb0937959c7680046130e670c483877e4c0f1854870cb0b20ed4fe066d72df18
    image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
    imageID: docker-pullable://k8s.gcr.io/kubernetes-dashboard-amd64@sha256:1d2e1229a918f4bc38b5a3f9f5f11302b3e71f8397b492afac7f273a0008776a
    lastState:
      terminated:
        containerID: docker://f85e1cc50f59adbd8a13d42694aef7c5e726c07b3d852a26288c4bfc1124c718
        exitCode: 2
        finishedAt: 2018-11-30T06:53:21Z
        reason: Error
        startedAt: 2018-11-29T07:16:07Z
    name: kubernetes-dashboard
    ready: true
    restartCount: 1
    state:
      running:
        startedAt: 2018-11-30T06:53:23Z
  hostIP: 10.0.2.15
  phase: Running
  podIP: 10.244.1.8
  qosClass: BestEffort
  startTime: 2018-11-29T07:16:04Z
Docker check on worker1 node:
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
fb0937959c... sha256:0dab2435c100... "/dashboard --insecure-bind-address=0.0.0.0 --bind-address=0.0.0.0 --auto-generate-certificates" 27 hours ago Up 27 hours k8s_kubernetes-dashboard_kube...
Tried to check Pod logs, no luck:
DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
kubectl -n kube-system logs $DASHBOARD_POD_NAME
Error from server (NotFound): the server could not find the requested resource ( pods/log kubernetes-dashboard-77fd78f978-crl7b)
Tried to wget from API server:
API_SRV_POD_NAME='kube-apiserver-ubuk8sma'
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME wget https://10.244.1.8:8443/
No response.
Tried to check dashboard service existence, no luck:
u1@ubuk8sma:~$ kubectl get svc $DASHBOARD_SVC_NAME
Error from server (NotFound): services "kubernetes-dashboard" not found
Checked IP route table on API server:
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME ip route show
default via 10.0.2.2 dev enp0s3 src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 scope link src 10.0.2.15
10.0.2.2 dev enp0s3 scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
192.168.151.0/24 dev enp0s8 scope link src 192.168.151.21
For reference, enp0s3 is the NAT NIC, enp0s8 the host-only one.
I see the flannel route for 10.244.1.x. It seems the issue is hardly about network misconfiguration (but I could be wrong).
So, the dashboard Pod looks like it is running, but something has errors and I cannot diagnose what. Could you help find the root cause and ideally make the dashboard service run without errors?
Thanks in advance, folks!
Update1:
I see events on master:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 11h kubelet, ubuk8swrk1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "43191144d447d0e9da52c8b6600bd96a23fab1e96c79af8c8fedc4e4e50882c7" network for pod "kubernetes-dashboard-77fd78f978-crl7b": NetworkPlugin cni failed to set up pod "kubernetes-dashboard-77fd78f978-crl7b_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 11h (x4 over 11h) kubelet, ubuk8swrk1 Pod sandbox changed, it will be killed and re-created.
Normal Pulled 11h kubelet, ubuk8swrk1 Container image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0" already present on machine
Normal Created 11h kubelet, ubuk8swrk1 Created container
Normal Started 11h kubelet, ubuk8swrk1 Started container
The error about the absence of subnet.env is a bit strange, as both master and worker have it (well, maybe it is created on the fly):
u1@ubuk8swrk1:~$ ls -la /run/flannel/subnet.env
-rw-r--r-- 1 root root 96 Dec 3 08:15 /run/flannel/subnet.env
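For reference, flannel writes that file when it starts on a node; for this cluster's 10.244.0.0/16 pod CIDR its content should look roughly like this (values are assumptions):
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true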
This is the dashboard deployment descriptor:
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: deployment.kubernetes.io/revision: 1
Selector: k8s-app=kubernetes-dashboard
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: k8s-app=kubernetes-dashboard
Service Account: kubernetes-dashboard
Containers:
kubernetes-dashboard:
Image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: kubernetes-dashboard-77fd78f978 (1/1 replicas created)
Events: <none>
This is a reduced description of the pods (the original yaml is 35K, too much to share):
Name: coredns-576cbf47c7-4tzm9
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Status: Running
IP: 10.244.0.14
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://0efcd043407d93fb9d052045828489f6b99bb59b4f0882ec89e1897071609b77
Image: k8s.gcr.io/coredns:1.2.2
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 6
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: etcd-ubuk8sma
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: ubuk8sma/10.0.2.15
Labels: component=etcd
tier=control-plane
Status: Running
IP: 10.0.2.15
Containers:
etcd:
Container ID: docker://ba2bdcf5fa558beabdd8578628d71480d595d5ee3bb5c4edf42407419010144b
Image: k8s.gcr.io/etcd:3.2.24
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:905d7ca17fd02bc24c0eba9a062753aba15db3e31422390bc3238eb762339b20
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://127.0.0.1:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://127.0.0.1:2380
--initial-cluster=ubuk8sma=https://127.0.0.1:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380
--name=ubuk8sma
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Mon, 03 Dec 2018 08:12:56 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 28 Nov 2018 09:31:46 +0000
Finished: Mon, 03 Dec 2018 08:12:35 +0000
Ready: True
Restart Count: 8
Liveness: exec [/bin/sh -ec ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo] delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: kube-apiserver-ubuk8sma
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
Containers:
kube-apiserver:
Container ID: docker://099b2a30772b969c3919b57fd377980673f03a820afba6034daa70f011271a52
Image: k8s.gcr.io/kube-apiserver:v1.12.2
Image ID: docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.151.21
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Running
Started: Mon, 03 Dec 2018 08:13:00 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 8
Liveness: http-get https://192.168.151.21:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Events: <none>
Name: kube-flannel-ds-amd64-rt442
Namespace: kube-system
Node: ubuk8swrk1/10.0.2.15
Status: Running
IP: 10.0.2.15
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID: docker://a6377b0fe1b040235c24e9ca19455c56e77daecf688b212cfea5553b6e59ff68
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Ready: True
Restart Count: 4
Containers:
kube-flannel:
Container ID: docker://f7029bc2144c1ab8654407d742c1079df0059d418b7ba86b886091b5ad8c34a3
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 4
Events: <none>
Name: kube-proxy-6b6mc
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
The biggest suspect is the node IP. I see 10.0.2.15 (the NAT IP) everywhere, but the host-only NIC should be used. I went through a long story of setting up the network properly for my Ubuntu VMs.
I edited /etc/netplan/01-netcfg.yaml before the k8s setup (thanks to https://askubuntu.com/questions/984445/netplan-configuration-on-ubuntu-17-04-virtual-machine?rq=1 for the help). Example for the master config:
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: yes
      dhcp6: yes
      routes:
      - to: 0.0.0.0/0
        via: 10.0.2.2
        metric: 0
    enp0s8:
      dhcp4: no
      dhcp6: no
      addresses: [192.168.151.21/24]
      routes:
      - to: 192.168.151.1/24
        via: 192.168.151.1
        metric: 100
Only after this and a few more changes did the NAT and host-only networks start working together. NAT remains the default net adapter; likely that's why its IP shows up everywhere. For the API server I set --advertise-address=192.168.151.21 explicitly, which reduced the use of the NAT IP at least for it.
So, maybe the root cause is different, but the current question is how to reconfigure the network to replace the NAT IP with the host-only one. I already tried this in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.151.21"
Restarted kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
That didn't help. I restarted the VMs; that didn't help either (I only expected kubelet-related changes, but nothing changed). I explored a few configs (5+) for potential changes, no luck.
Update2:
I mentioned the NAT address config issue above. I resolved it by editing the /etc/default/kubelet config, an idea I found in the comments of this article:
https://medium.com/@joatmon08/playing-with-kubeadm-in-vagrant-machines-part-2-bac431095706
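On kubeadm installs the systemd unit sources /etc/default/kubelet for extra flags, so the fix presumably amounts to (each node gets its own host-only IP; the value below is the worker's, as an assumption):
# /etc/default/kubelet
KUBELET_EXTRA_ARGS=--node-ip=192.168.151.22
# then: sudo systemctl daemon-reload && sudo systemctl restart kubelet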
The dashboard Pod status now has the proper IP:
hostIP: 192.168.151.22
phase: Running
podIP: 10.244.1.13
Then I went into the docker container for the API server and tried to reach the podIP via wget, ping, and traceroute. Timeouts everywhere. Routes:
/ # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
10.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s3
10.0.2.2 0.0.0.0 255.255.255.255 UH 100 0 0 enp0s3
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.1.0 10.244.1.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.151.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s8
Attempt to perform a curl call from the master VM:
u1@ubuk8sma:~$ curl -v -i -kSs "https://192.168.151.21:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/" -H "$K8S_AUTH_HEADER"
...
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x565072b5a750)
> GET /api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ HTTP/2
> Host: 192.168.151.21:6443
> User-Agent: curl/7.58.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJSUzI1.....
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 503
HTTP/2 503
< content-type: text/plain; charset=utf-8
content-type: text/plain; charset=utf-8
< content-length: 92
content-length: 92
< date: Tue, 04 Dec 2018 08:44:25 GMT
date: Tue, 04 Dec 2018 08:44:25 GMT
<
Error: 'dial tcp 10.244.1.13:8443: i/o timeout'
* Connection #0 to host 192.168.151.21 left intact
Trying to reach: 'https://10.244.1.13:8443/'
Service info for dashboard:
u1@ubuk8sma:~$ kubectl -n kube-system get service kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 5d
A bit more details:
u1@ubuk8sma:~$ kubectl -n kube-system describe services kubernetes-dashboard
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: <none>
Selector: k8s-app=kubernetes-dashboard
Type: ClusterIP
IP: 10.103.36.134
Port: <unset> 443/TCP
TargetPort: 8443/TCP
Endpoints: 10.244.1.13:8443
Session Affinity: None
Events: <none>
I also tried to get a shell there, both via kubectl and docker. For any usual Linux command I see this 'OCI runtime exec failed' issue:
u1@ubuk8sma:~$ DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
u1@ubuk8sma:~$ kubectl -v=9 -n kube-system exec "$DASHBOARD_POD_NAME" -- env
I1204 09:57:17.673345 23517 loader.go:359] Config loaded from file /home/u1/.kube/config
I1204 09:57:17.679526 23517 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b'
I1204 09:57:17.703924 23517 round_trippers.go:405] GET https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b 200 OK in 23 milliseconds
I1204 09:57:17.703957 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.703971 23517 round_trippers.go:414] Content-Length: 3435
I1204 09:57:17.703984 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
I1204 09:57:17.703997 23517 round_trippers.go:414] Content-Type: application/json
I1204 09:57:17.704866 23517 request.go:942] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"kubernetes-dashboard-77fd78f978-crl7b","generateName":"kubernetes-dashboard-77fd78f978-","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b","uid":"a1d005b8-f3a6-11e8-a2d0-08002783a80f"...
I1204 09:57:17.740811 23517 round_trippers.go:386] curl -k -v -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" -H "X-Stream-Protocol-Version: v3.channel.k8s.io" -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true'
I1204 09:57:17.805528 23517 round_trippers.go:405] POST https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true 101 Switching Protocols in 64 milliseconds
I1204 09:57:17.805565 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.805581 23517 round_trippers.go:414] Connection: Upgrade
I1204 09:57:17.805594 23517 round_trippers.go:414] Upgrade: SPDY/3.1
I1204 09:57:17.805607 23517 round_trippers.go:414] X-Stream-Protocol-Version: v4.channel.k8s.io
I1204 09:57:17.805620 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"env\": executable file not found in $PATH": unknown
F1204 09:57:18.088488 23517 helpers.go:119] command terminated with exit code 126
So, I cannot reach the pod and cannot get a shell in it. But at least I see some logs:
u1@ubuk8sma:~$ kubectl -n kube-system logs -p $DASHBOARD_POD_NAME
2018/12/03 08:15:16 Starting overwatch
2018/12/03 08:15:16 Using in-cluster config to connect to apiserver
2018/12/03 08:15:16 Using service account token for csrf signing
2018/12/03 08:15:16 No request provided. Skipping authorization
2018/12/03 08:15:16 Successful initial request to the apiserver, version: v1.12.2
2018/12/03 08:15:16 Generating JWE encryption key
2018/12/03 08:15:16 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2018/12/03 08:15:16 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2018/12/03 08:15:18 Initializing JWE encryption key from synchronized object
2018/12/03 08:15:18 Creating in-cluster Heapster client
2018/12/03 08:15:19 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/03 08:15:19 Auto-generating certificates
2018/12/03 08:15:19 Successfully created certificates
2018/12/03 08:15:19 Serving securely on HTTPS port: 8443
2018/12/03 08:15:49 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
No ideas where to go next to fix this timeout.