unable to upgrade connection: pod does not exist - kubernetes

I have a error when i want to access to my pod:
error: unable to upgrade connection: pod does not exist
it's a cluster with 3 nodes, below some details. Thanks in advance
root#kubm:~/deploy/nginx# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubm Ready master 37h v1.17.0 10.0.2.15 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://19.3.5
kubnode Ready <none> 37h v1.17.0 10.0.2.15 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://19.3.5
kubnode2 Ready <none> 37h v1.17.0 10.0.2.15 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://19.3.5
root#kubm:~/deploy/nginx# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-59c9f8dff-v7dvg 1/1 Running 0 16h 10.244.2.3 kubnode2 <none> <none>
root#kubm:~/deploy/nginx# kubectl exec -it nginx-59c9f8dff-v7dvg -- /bin/bash
**error: unable to upgrade connection: pod does not exist**

I had the same issue running a cluster with Vagrant and Virtualbox the first time.
Adding KUBELET_EXTRA_ARGS=--node-ip=x.x.x.x where x.x.x.x is your VM's IP in /etc/default/kubelet (this can be part of the provisioning script for example) and then restarting kubelet (systemctl restart kubelet) fixes the issues.
This is the recommended way to add extra runtime arguments to kubelet as you can see in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf. Alternatively you can also edit the kubelet config file under /etc/kubernetes/kubelet.conf

The 10.0.2.15 IP address is the default for virtualbox NAT
If you deploy a VM using a vagrantfile, your eth0 adapter will use the 10.0.2.15 IP address and the eth1 adapter will be assigned an other IP address.
K8s uses the eth0 adapter to route packets between pods.

I had the same issue and the problem was POD status as "ImagePullBackOff". Due to this, it was throwing error
error: unable to upgrade connection: container not found ("nginx")
[mayur#mayur_cloudtest ~]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-598b589c46-zcr5d 0/1 ImagePullBackOff 0 116s
[mayur#mayur_cloudtest ~]$
[mayur#mayur_cloudtest ~]$
[mayur#mayur_cloudtest ~]$ kubectl exec -it nginx-598b589c46-zcr5d -- /bin/bash
error: unable to upgrade connection: container not found ("nginx")

I use below command to get into a pod.
kubectl exec -i -t <pod-name> -- /bin/bash
Note -i and -t flag have a space on the command..
If you have multi-container pod you should pass container name with -c flag or it will by default connect to first container in POD.
ubuntu#cluster-master:~$ kubectl exec -i -t nginx -- /bin/bash
root#nginx:/# whoami
root
root#nginx:/# date
Tue Jan 7 14:12:29 UTC 2020
root#nginx:/#
Refer help section of command kubectl exec --help

Related

Kube-state-metrics error: Failed to create client: ... i/o timeout

I'm running Kubernetes in virtual machines and going through the basic tutorials, currently Add logging and metrics to the PHP / Redis Guestbook example. I'm trying to install kube-state-metrics:
git clone https://github.com/kubernetes/kube-state-metrics.git kube-state-metrics
kubectl create -f kube-state-metrics/kubernetes
but it fails.
kubectl describe pod --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7
...
Warning Unhealthy 28m (x8 over 30m) kubelet, kubernetes-node1 Readiness probe failed: Get http://192.168.129.102:8080/healthz: dial tcp 192.168.129.102:8080: connect: connection refused
kubectl logs --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7 -c kube-state-metrics
I0514 17:29:26.980707 1 main.go:85] Using default collectors
I0514 17:29:26.980774 1 main.go:93] Using all namespace
I0514 17:29:26.980780 1 main.go:129] metric white-blacklisting: blacklisting the following items:
W0514 17:29:26.980800 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0514 17:29:26.983504 1 main.go:169] Testing communication with server
F0514 17:29:56.984025 1 main.go:137] Failed to create client: ERROR communicating with apiserver: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout
I'm unsure if this 10.96.0.1 IP is correct. My virtual machines are in a bridged network 10.10.10.0/24 and a host-only network 192.168.59.0/24. When initializing Kubernetes I used the argument --pod-network-cidr=192.168.0.0/16 so that's one more IP range that I'd expect. But 10.96.0.1 looks unfamiliar.
I'm new to Kubernetes, just doing the basic tutorials, so I don't know what to do now. How to fix it or investigate further?
EDIT - additonal info:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubernetes-master Ready master 15d v1.14.1 10.10.10.11 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
kubernetes-node1 Ready <none> 15d v1.14.1 10.10.10.5 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
kubernetes-node2 Ready <none> 15d v1.14.1 10.10.10.98 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
The command I used to initialize the cluster:
sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=192.168.0.0/16
The reason for this is probably overlapping of Pod network with Node network - you set Pod network CIDR to 192.168.0.0/16 which your host-only network will be included into as its address is 192.168.59.0/24.
To solve this you can either change the pod network CIDR to 192.168.0.0/24 (it is not recommended as this will give you only 255 addresses for your pod networking)
You can also use different range for your Calico. If you want to do it on a running cluster here is an instruction.
Also other way I tried:
edit Calico manifest to different range (for example 10.0.0.0/8) - sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=10.0.0.0/8) and apply it after the init.
Another way would be using different CNI like Flannel (which uses 10.244.0.0/16).
You can find more information about ranges of CNI plugins here.

Why doesn't kube-proxy route traffic to another worker node?

I've deployed several different services and always get the same error.
The service is reachable on the node port from the machine where the pod is running. On the two other nodes I get timeouts.
The kube-proxy is running on all worker nodes and I can see in the logfiles from kube-proxy that the service port was added and the node port was opened.
In this case I've deployed the stars demo from calico
Kube-proxy log output:
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.229458 659 service.go:309] Adding new service port "management-ui/management-ui:" at 10.32.0.133:9001/TCP
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.257483 659 proxier.go:1427] Opened local port "nodePort for management-ui/management-ui:" (:30002/tcp)
The kube-proxy is listening on the port 30002
root#kuben1:/tmp# netstat -lanp | grep 30002
tcp6 0 0 :::30002 :::* LISTEN 659/kube-proxy
There are also some iptable rules defined:
root#kuben1:/tmp# iptables -L -t nat | grep management-ui
KUBE-MARK-MASQ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-MARK-MASQ tcp -- !10.200.0.0/16 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
The interesting part is that I can reach the service IP from any worker node
root#kubem1:/tmp# kubectl get svc -n management-ui
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
management-ui NodePort 10.32.0.133 <none> 9001:30002/TCP 52m
The service IP/port can be accessed from any worker node if I do a "curl http://10.32.0.133:9001"
I don't understand why kube-proxy does not "route" this properly...
Has anyone a hint where I can find the error?
Here some cluster specs:
This is a hand build cluster inspired by Kelsey Hightower's "kubernetes the hard way" guide.
6 Nodes (3 master: 3 worker) local vms
OS: Ubuntu 18.04
K8s: v1.13.0
Docker: 18.9.3
Cni: calico
Component status on the master nodes looks okay
root#kubem1:/tmp# kubectl get componentstatus
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
The worker nodes are looking okay if I trust kubectl
root#kubem1:/tmp# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kuben1 Ready <none> 39d v1.13.0 192.168.178.77 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben2 Ready <none> 39d v1.13.0 192.168.178.78 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben3 Ready <none> 39d v1.13.0 192.168.178.79 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
As asked by P Ekambaram:
root#kubem1:/tmp# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-bgjdg 1/1 Running 5 40d
calico-node-nwkqw 1/1 Running 5 40d
calico-node-vrwn4 1/1 Running 5 40d
coredns-69cbb76ff8-fpssw 1/1 Running 5 40d
coredns-69cbb76ff8-tm6r8 1/1 Running 5 40d
kubernetes-dashboard-57df4db6b-2xrmb 1/1 Running 5 40d
I've found a solution for my "Problem".
This behavior was caused by a change in Docker v1.13.x and the issue was fixed in kubernetes with version 1.8.
The easy solution was to change the forward rules via iptables.
Run the following cmd on all worker nodes: "iptables -A FORWARD -j ACCEPT"
To fix it the right way i had to tell the kube-proxy the cidr for the pods.
Theoretical that could be solved in two ways:
Add "--cluster-cidr=10.0.0.0/16" as argument to the kube-proxy command line(in my case in the systemd service file)
Add 'clusterCIDR: "10.0.0.0/16"' to the kubeconfig file for kube-proxy
In my case the cmd line argument doesn't had any effect.
As i've added the line to my kubeconfig file and restarted the kube-proxy on all worker nodes everything works well.
Here is the github merge request for this "FORWARD" issue: link

cilium configuration in IPv6 mode

I am using cilium in Kubernetes 1.12 in Direct Routing mode. It is working fine in IPv4 mode. We are using cilium/cilium:no-routes image and cloudnativelabs/kube-router to advertise the routes through BGP.
Now I would like to configure the same in IPv6 only Kubernetes cluster. But I found that kube-router pod is crashing and not creating the route entries for the --pod-network-cidr.
Following is the lab details -
master node: IPv6 private IP -fd0c:6493:12bf:2942::ac18:1164
Work node: IPv6 private IP -fd0c:6493:12bf:2942::ac18:1165
Public IP for both the nodes are IPv4 as i don't have IPv6 public IP.
IPv6 only K8s cluster is created as
master:
sudo kubeadm init --kubernetes-version v1.13.2 --pod-network-cidr=2001:2::/64 --apiserver-advertise-address=fd0c:6493:12bf:2942::ac18:1164 --token-ttl 0
worker:
sudo kubeadm join [fd0c:6493:12bf:2942::ac18:1164]:6443 --token 9k9sdq.el298rka0sjqy0ha --discovery-token-ca-cert-hash sha256:b830c22dc21561c9e9287275ecc675ec6de012662fabde3bd1aba03be66562eb
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP
EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master NotReady master 38h v1.13.2 fd0c:6493:12bf:2942::ac18:1164
<none> Ubuntu 18.10 4.18.0-13-generic docker://18.6.0
worker1 Ready <none> 38h v1.13.2 fd0c:6493:12bf:2942::ac18:1165
<none> Ubuntu 18.10 4.18.0-10-generic docker://18.6.0
master node is not ready as cni is not configured yet and codedns pods are not up yet.
Now install the cilium in Ipv6.
1. Run the etcd in master node.
sudo docker run -d --network=host \
--name "cilium-etcd" k8s.gcr.io/etcd:3.2.24 \
etcd -name etcd0 \
-advertise-client-urls http://[fd0c:6493:12bf:2942::ac18:1164]:4001 \
-listen-client-urls http://[fd0c:6493:12bf:2942::ac18:1164]:4001 \
-initial-advertise-peer-urls http://[fd0c:6493:12bf:2942::ac18:1164]:2382 \
-listen-peer-urls http://[fd0c:6493:12bf:2942::ac18:1164]:2382 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster etcd0=http://[fd0c:6493:12bf:2942::ac18:1164]:2382 \
-initial-cluster-state new
Here [fd0c:6493:12bf:2942::ac18:1164] is master node ipv6 ip.
2. sudo mount bpffs /sys/fs/bpf -t bpf
3. Run the kuberouter.
Expected Result:
Kube-router adds the routing entry for the POD-CIDR corresponding to the each of the other nodes in the cluster. Node public IP will be set as GW. Following result is obtained for IPv4. For IPv4, routing entry is created in node-1 for node-2 ( public IP 10.40.139.196 and POD CIDR 10.244.1.0/24). Device is the interface where public IP is bound.
$ ip route show
10.244.1.0/24 via 10.40.139.196 dev ens4f0.116 proto 17
Note: For IPv6 only Kubernetes, --pod-network-cidr=2001:2::/64
Actual result -
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-86c58d9df4-g7nvf 0/1 ContainerCreating 0 22h
coredns-86c58d9df4-rrtgp 0/1 ContainerCreating 0 38h
etcd-master 1/1 Running 0 38h
kube-apiserver-master 1/1 Running 0 38h
kube-controller-manager-master 1/1 Running 0 38h
kube-proxy-9xb2c 1/1 Running 0 38h
kube-proxy-jfv2m 1/1 Running 0 38h
kube-router-5xjv4 0/1 CrashLoopBackOff 15 73m
kube-scheduler-master 1/1 Running 0 38h
Question -
Can kuberouter use private IPv6 address which is used by the Kubernetes cluster instead of using the public IP which in our case isenter code here IPv4.

Kuberetes V1.6.2 flannel not running on master node

I am running 2 node cluster using vagrant, configured with kubeadm command. When I setup the cluster flannel was running on all three nodes. Now i don't see flannel running in master node. because of this overlay network is not working from master node.
Used this yaml files to configure flannel.
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# kubectl get pods --all-namespaces -o wide |grep fla
kube-system kube-flannel-ds-0d3bn 2/2 Running 0 20m 192.168.15.102 node-01
kube-system kube-flannel-ds-86bzs 2/2 Running 0 20m 192.168.15.103 node-02
# k get nodes -o wide
NAME STATUS AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION
master01 Ready 26d v1.6.2 <none> CentOS Linux 7 (Core) 3.10.0-514.10.2.el7.x86_64
node-01 Ready 26d v1.6.2 <none> CentOS Linux 7 (Core) 3.10.0-514.10.2.el7.x86_64
node-02 Ready 26d v1.6.2 <none> CentOS Linux 7 (Core) 3.10.0-514.10.2.el7.x86_64
How can I start the flannel pod in my master node?
I see you using RBAC, maybe there are not enough rights at a node.
Try creating a clusterrolebinding with the necessary rights
$ kubectl create clusterrolebinding nodeName --clusterrole=system:node --
user=nodeName
or can use cluster-admin for testing

How to fix weave-net CrashLoopBackOff for the second node?

I have got 2 VMs nodes. Both see each other either by hostname (through /etc/hosts) or by ip address. One has been provisioned with kubeadm as a master. Another as a worker node. Following the instructions (http://kubernetes.io/docs/getting-started-guides/kubeadm/) I have added weave-net. The list of pods looks like the following:
vagrant#vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-vm-master 1/1 Running 0 3m
kube-system kube-apiserver-vm-master 1/1 Running 0 5m
kube-system kube-controller-manager-vm-master 1/1 Running 0 4m
kube-system kube-discovery-982812725-x2j8y 1/1 Running 0 4m
kube-system kube-dns-2247936740-5pu0l 3/3 Running 0 4m
kube-system kube-proxy-amd64-ail86 1/1 Running 0 4m
kube-system kube-proxy-amd64-oxxnc 1/1 Running 0 2m
kube-system kube-scheduler-vm-master 1/1 Running 0 4m
kube-system kubernetes-dashboard-1655269645-0swts 1/1 Running 0 4m
kube-system weave-net-7euqt 2/2 Running 0 4m
kube-system weave-net-baao6 1/2 CrashLoopBackOff 2 2m
CrashLoopBackOff appears for each worker node connected. I have spent several ours playing with network interfaces, but it seems the network is fine. I have found similar question, where the answer advised to look into the logs and no follow up. So, here are the logs:
vagrant#vm-master:~$ kubectl logs weave-net-baao6 -c weave --namespace=kube-system
2016-10-05 10:48:01.350290 I | error contacting APIServer: Get https://100.64.0.1:443/api/v1/nodes: dial tcp 100.64.0.1:443: getsockopt: connection refused; trying with blank env vars
2016-10-05 10:48:01.351122 I | error contacting APIServer: Get http://localhost:8080/api: dial tcp [::1]:8080: getsockopt: connection refused
Failed to get peers
What I am doing wrong? Where to go from there?
I ran in the same issue too. It seems weaver wants to connect to the Kubernetes Cluster IP address, which is virtual. Just run this to find the cluster ip:
kubectl get svc. It should give you something like this:
$ kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 100.64.0.1 <none> 443/TCP 2d
Weaver picks up this IP and tries to connect to it, but worker nodes does not know anything about it. Simple route will solve this issue. On all your worker nodes, execute:
route add 100.64.0.1 gw <your real master IP>
this happens with a single node setup, too. I tried several things like reapplying the configuration and recreation, but the most stable way at the moment is to perform a full tear down (as described in docs) and put the cluster up again.
I use these scripts for relaunching the cluster:
down.sh
#!/bin/bash
systemctl stop kubelet;
docker rm -f -v $(docker ps -q);
find /var/lib/kubelet | xargs -n 1 findmnt -n -t tmpfs -o TARGET -T | uniq | xargs -r umount -v;
rm -r -f /etc/kubernetes /var/lib/kubelet /var/lib/etcd;
up.sh
#!/bin/bash
systemctl start kubelet
kubeadm init
# kubectl taint nodes --all dedicated- # single node!
kubectl create -f https://git.io/weave-kube
edit: I would also give other Pod networks a try, like Calico, if this is a weave related issue
The most common causes for this may be:
- presence of a firewall (e.g. firewalld on CentOS)
- network configuration (e.g. default NAT interface on VirtualBox)
Currently kubeadm is still alpha, and this is one of the issues that has already been reported by many of the alpha testers. We are looking into fixing this by documenting the most common problems, such documentation is going to be ready closer to beta version.
Right there exists a VirtualBox+Vargant+Ansible for Ubunutu and CentOS reference implementation that provides solutions for firewall, SELinux and VirtualBox NAT issues.
/usr/local/bin/weave reset
was the fix for me - Hope its useful - and yes make sure selinux is set to disabled
and firewalld is not running (on redhat / centos) releases
kube-system weave-net-2vlvj 2/2 Running 3 11d
kube-system weave-net-42k6p 1/2 Running 3 11d
kube-system weave-net-wvsk5 2/2 Running 3 11d