Minikube pods stuck in Waiting: ImagePullBackOff

I have a new minikube installation running on a CentOS 8 VM (VirtualBox), and when I run
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.4
the pod is created but gets stuck in "Waiting: ImagePullBackOff". Looking further into this, I can see that minikube cannot resolve DNS or download anything from the internet. The host can download fine, and other Docker containers can access the internet too. Running ping inside minikube ssh doesn't work either (ping is not installed):
[john@localhost ~]$ minikube ssh -- ping google.com
bash: ping: command not found
ssh: exit status 127
And curl returns that it cannot resolve DNS
[john@localhost ~]$ minikube ssh -- curl www.google.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0
curl: (6) Could not resolve host: www.google.com
ssh: exit status 6
So it looks to me like minikube cannot access the internet. I'm sure I have missed something very simple here, so if anyone could tell me what it is I would be very grateful. I have no proxies.

OK, I found the answer. It was the CentOS firewall. These commands fixed it:
sudo firewall-cmd --zone=trusted --change-interface=docker0 --permanent
sudo firewall-cmd --reload
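To verify the fix, you can check that docker0 is now in the trusted zone and retry the download from inside minikube (a minimal check, assuming firewalld; the label selector relies on the app=hello-minikube label that kubectl create deployment sets):
sudo firewall-cmd --zone=trusted --list-interfaces
minikube ssh -- curl -I www.google.com
kubectl delete pod -l app=hello-minikube    # optional: forces a new pod and a fresh image pull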

Related

Why can't telepresence 2 access the remote Kubernetes cluster after connecting?

I want to use telepresence 2 to debug an app inside a remote Kubernetes (v1.21) cluster in the cloud. After installing telepresence 2 on macOS Monterey, I am using this command to connect to the remote Kubernetes cluster:
telepresence connect
To my surprise, I could not ping any of the remote Kubernetes pods after getting telepresence 2 connected.
➜ ~ ping 10.97.196.216
PING 10.97.196.216 (10.97.196.216): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
^C
--- 10.97.196.216 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
I do not know whether I'm doing something wrong or missing some extra configuration. This is the telepresence 2 status output:
➜ ~ telepresence status
Root Daemon: Running
Version : v2.4.9 (api 3)
DNS :
Remote IP : 10.96.0.10
Exclude suffixes: [.arpa .com .io .net .org .ru]
Include suffixes: []
Timeout : 4s
Also Proxy : (0 subnets)
Never Proxy: (1 subnets)
User Daemon: Running
Version : v2.4.9 (api 3)
Ambassador Cloud : Logged out
Status : Connected
Kubernetes server : https://126.104.83.161:6443
Kubernetes context: context-reddwarf
Telepresence proxy: ON (networking to the cluster is enabled)
Intercepts : 0 total
What should I do to make it work? I am using this command to test the connection:
➜ telepresence ping cruise-redis-master.reddwarf-cache.svc.cluster.local
PING cruise-redis-master.reddwarf-cache.svc.cluster.local (10.108.202.100): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
and this is the resource information in the cluster:
➜ telepresence kubectl get statefulsets -n reddwarf-cache -o wide
NAME READY AGE CONTAINERS IMAGES
cruise-redis-master 1/1 276d redis docker.io/bitnami/redis:6.2.5-debian-10-r11
cruise-redis-replicas 3/3 276d redis docker.io/bitnami/redis:6.2.5-debian-10-r11
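One thing worth ruling out here is ICMP itself: ping can fail even when TCP traffic is proxied correctly, so a TCP-level probe is a more telling test (a hedged sketch, assuming the standard Redis port 6379 and that nc is available on macOS):
nc -vz cruise-redis-master.reddwarf-cache.svc.cluster.local 6379
nc -vz 10.108.202.100 6379
If these succeed while ping times out, cluster networking is fine and only ICMP is not being forwarded.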

No route to host from some Kubernetes containers to other containers in the same cluster

This is a Kubespray deployment using Calico. All the defaults were left as-is, except that there is a proxy. Kubespray ran to the end without issues.
Access to Kubernetes services started failing and after investigation, there was no route to host to the coredns service. Accessing a K8S service by IP worked. Everything else seems to be correct, so I am left with a cluster that works, but without DNS.
Here is some background information:
Starting up a busybox container:
# nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
Now the output while explicitly defining the IP of one of the CoreDNS pods:
# nslookup kubernetes.default 10.233.0.3
;; connection timed out; no servers could be reached
Notice that telnet to the Kubernetes API works:
# telnet 10.233.0.1 443
Connected to 10.233.0.1
kube-proxy logs (10.233.0.3 is the service IP for CoreDNS; the last line looks concerning, even though it is INFO):
$ kubectl logs kube-proxy-45v8n -nkube-system
I1114 14:19:29.657685 1 node.go:135] Successfully retrieved node IP: X.59.172.20
I1114 14:19:29.657769 1 server_others.go:176] Using ipvs Proxier.
I1114 14:19:29.664959 1 server.go:529] Version: v1.16.0
I1114 14:19:29.665427 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I1114 14:19:29.669508 1 config.go:313] Starting service config controller
I1114 14:19:29.669566 1 shared_informer.go:197] Waiting for caches to sync for service config
I1114 14:19:29.669602 1 config.go:131] Starting endpoints config controller
I1114 14:19:29.669612 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I1114 14:19:29.769705 1 shared_informer.go:204] Caches are synced for service config
I1114 14:19:29.769756 1 shared_informer.go:204] Caches are synced for endpoints config
I1114 14:21:29.666256 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.124.23:53
I1114 14:21:29.666380 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.122.11:53
All pods are running without crashing/restarts etc. and otherwise services behave correctly.
IPVS looks correct. CoreDNS service is defined there:
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.233.0.1:443 rr
-> x.59.172.19:6443 Masq 1 0 0
-> x.59.172.20:6443 Masq 1 1 0
TCP 10.233.0.3:53 rr
-> 10.233.122.12:53 Masq 1 0 0
-> 10.233.124.24:53 Masq 1 0 0
TCP 10.233.0.3:9153 rr
-> 10.233.122.12:9153 Masq 1 0 0
-> 10.233.124.24:9153 Masq 1 0 0
TCP 10.233.51.168:3306 rr
-> x.59.172.23:6446 Masq 1 0 0
TCP 10.233.53.155:44134 rr
-> 10.233.89.20:44134 Masq 1 0 0
UDP 10.233.0.3:53 rr
-> 10.233.122.12:53 Masq 1 0 314
-> 10.233.124.24:53 Masq 1 0 312
Host routing also looks correct.
# ip r
default via x.59.172.17 dev ens3 proto dhcp src x.59.172.22 metric 100
10.233.87.0/24 via x.59.172.21 dev tunl0 proto bird onlink
blackhole 10.233.89.0/24 proto bird
10.233.89.20 dev calib88cf6925c2 scope link
10.233.89.21 dev califdffa38ed52 scope link
10.233.122.0/24 via x.59.172.19 dev tunl0 proto bird onlink
10.233.124.0/24 via x.59.172.20 dev tunl0 proto bird onlink
x.59.172.16/28 dev ens3 proto kernel scope link src x.59.172.22
x.59.172.17 dev ens3 proto dhcp scope link src x.59.172.22 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
I have redeployed this same cluster in separate environments with Flannel, and with Calico using iptables instead of IPVS. I have also temporarily disabled the Docker HTTP proxy after deploying. None of this makes any difference.
Also:
kube_service_addresses: 10.233.0.0/18
kube_pods_subnet: 10.233.64.0/18
(They do not overlap)
What is the next step in debugging this issue?
I highly recommend avoiding the latest busybox image for troubleshooting DNS. There are a few reported issues with nslookup on versions newer than 1.28.
v1.28.4
user@node1:~$ kubectl exec -ti busybox busybox | head -1
BusyBox v1.28.4 (2018-05-22 17:00:17 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
v1.31.1
user@node1:~$ kubectl exec -ti busyboxlatest busybox | head -1
BusyBox v1.31.1 (2019-10-28 18:40:01 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busyboxlatest -- nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
command terminated with exit code 1
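If you want to keep troubleshooting DNS from inside a pod, a minimal sketch is to pin the image to busybox:1.28, which behaves correctly in the comparison above (the pod name busybox128 is only illustrative):
kubectl run busybox128 --image=busybox:1.28 --restart=Never -- sleep 3600
kubectl exec -ti busybox128 -- nslookup kubernetes.default
kubectl delete pod busybox128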
Going deeper and exploring more possibilities, I've reproduced your problem on GCP and after some digging I was able to figure out what is causing this communication problem.
GCE (Google Compute Engine) blocks traffic between hosts by default; we have to allow Calico traffic to flow between containers on different hosts.
According to the Calico documentation, you can do this by creating a firewall rule that allows this communication:
gcloud compute firewall-rules create calico-ipip --allow 4 --network "default" --source-ranges "10.128.0.0/9"
You can verify the rule with this command:
gcloud compute firewall-rules list
This is not present in the most recent Calico documentation, but it is still true and necessary.
Before creating the firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
After creating the firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
It doesn't matter whether you bootstrap your cluster using kubespray or kubeadm; this problem will happen because Calico needs to communicate between nodes and GCE blocks that by default.
This is what worked for me. I installed my k8s cluster using kubespray, configured with Calico as the CNI and containerd as the container runtime:
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -F
[delete coredns pod]
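For the last step, a concrete way to delete the CoreDNS pods so they are recreated (assuming the standard k8s-app=kube-dns label used by kubespray and kubeadm):
kubectl -n kube-system delete pod -l k8s-app=kube-dns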

Vault authentication using Kubernetes is failing

I created a Vault and Consul cluster on Kubernetes with TLS by following
https://testdriven.io/blog/running-vault-and-consul-on-kubernetes/
and was trying to configure the Kubernetes auth method using https://learn.hashicorp.com/vault/identity-access-management/vault-agent-k8s.
Everything went fine up to step 3 (Verify the Kubernetes auth method configuration); when I tested the connection I got the error "Failed to connect to vault port 8200: Connection refused".
Can anyone help me with this?
$ kubectl run --generator=run-pod/v1 tmp --rm -i --tty --serviceaccount=vault-auth --image alpine:3.7
# VAULT_ADDR=https://vault:8200
/ # curl -s $VAULT_ADDR/v1/sys/health | jq
/ # curl $VAULT_ADDR/v1/sys/health | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to vault port 8200: Connection refused
$ k get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
consul ClusterIP None <none> 8500/TCP,8443/TCP,8400/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP 177m
kubernetes ClusterIP 10.245.0.1 <none> 443/TCP 26h
vault ClusterIP 10.245.215.195 <none> 8200/TCP
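"Connection refused" usually means nothing is listening (or ready) behind the Service rather than a DNS problem, so a first check is whether the vault Service has ready endpoints and whether its pods are up (a hedged diagnostic sketch; the app=vault label is an assumption and may differ in your manifests):
kubectl get endpoints vault
kubectl describe svc vault
kubectl get pods -l app=vault -o wide
If the ENDPOINTS column is empty, the Vault pods are not ready or the Service selector does not match their labels, which would explain the refused connection.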

minikube mount broken on VPN

So I'm having issues with the minikube mount command while on a Big-IP VPN. Basically, the command looks like it's able to SSH into the minikube VM, but for whatever reason, minikube can't mount the host folder.
$ minikube mount --v=10 app_shared_sec:/app/shared/sec
Mounting app_shared_sec into /app/shared/sec on the minikube VM
This daemon process needs to stay alive for the mount to still be accessible...
ufs starting
Using SSH client type: native
&{{{<nil> 0 [] [] []} docker [0x140f940] 0x140f910 [] 0s} 127.0.0.1 57930 <nil> <nil>}
About to run SSH command:
sudo umount /app/shared/sec;
SSH cmd err, output: Process exited with status 32: umount: /app/shared/sec: not mounted.
Using SSH client type: native
&{{{<nil> 0 [] [] []} docker [0x140f940] 0x140f910 [] 0s} 127.0.0.1 57930 <nil> <nil>}
About to run SSH command:
sudo mkdir -p /app/shared/sec || true;
sudo mount -t 9p -o trans=tcp,port=51501,dfltuid=1001,dfltgid=1001,version=9p2000.u,msize=262144 192.168.99.1 /app/shared/sec;
sudo chmod 775 /app/shared/sec || true;
SSH cmd err, output: <nil>: mount: /app/shared/sec: mount(2) system call failed: Connection timed out.
Running netstat within the minikube VM seems to show that it is able to reach the host.
$ netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 0 0 0 eth0
10.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
10.0.2.2 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.99.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
The issue appears to be that the VPN blows away the vboxnet interface on the host, so the minikube VM has no way of communicating with it, causing the mount to fail. When I try to re-create the route, Big-IP seems to watch for changes and removes it. Not sure what else to do at this point.
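As a stopgap, you could try re-adding the host-only route the VPN removes, or tell minikube which host IP to serve the 9p mount from (a sketch under assumptions: a Linux host, the vboxnet0 interface, and the 192.168.99.1 address seen in the logs; as described above, Big-IP may simply delete the route again):
sudo ip route add 192.168.99.0/24 dev vboxnet0
minikube mount --ip 192.168.99.1 app_shared_sec:/app/shared/sec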

Unable to access kubernetes dashboard: no route to host

I installed kubernetes-dashboard on a Kubernetes cluster based on Red Hat Linux.
Everything seems to be fine
# kubectl get --namespace kube-system pods
NAME READY STATUS RESTARTS AGE
kubernetes-dashboard-4236001524-m9uxm 1/1 Running 0 2h
except that when I try to access the web interface
http://g-lsdp-kuber-master.rd.mydomain.fr:8080/ui
I'm redirected to this new URL
http://g-lsdp-kuber-master.rd.mydomain.fr:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/
and the result is :
Error: 'dial tcp 172.17.0.2:9090: getsockopt: no route to host'
Trying to reach: 'http://172.17.0.2:9090/'
I don't know where to find a solution. I already tried to add a DNS:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d415363280aa gcr.io/google_containers/skydns:2015-03-11-001 "/skydns" 38 minutes ago Up 38 minutes furious_ramanujan
12d3530c9f4d gcr.io/google_containers/kube2sky:1.11 "/kube2sky -v=10 -log" 39 minutes ago Up 39 minutes
with the same result.
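"no route to host" to a docker0 address on Red Hat is very often the node firewall rather than DNS, so before adding more DNS containers it is worth checking firewalld/iptables on the node running the dashboard pod, along the same lines as the CentOS fix at the top of this page (a hedged sketch, assuming firewalld is active):
sudo systemctl status firewalld
sudo firewall-cmd --zone=trusted --change-interface=docker0 --permanent
sudo firewall-cmd --reload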