Rancher Kubernetes can't create persistent volume claim - kubernetes

I can't create a Persistent Volume Claim in Kubernetes (Rancher 2.3).
My storage class uses the VMware cloud provider 'vSphere Storage for Kubernetes' provided by Rancher.
In the Rancher web interface, the Events show errors like:
(combined from similar events): Failed to provision volume with StorageClass "t-a-g1000-t6m-e0fd": Post https://vsphere.exemple.com:443/sdk: dial tcp: lookup vsphere.exemple.com on [::1]:53: read udp [::1]:51136->[::1]:53: read: connection refused
I get the same error on my Kubernetes Master:
docker logs kube-controller-manager

For some reason, the DNS resolver configuration of the kube-controller-manager container on the Kubernetes master was empty:
docker exec kube-controller-manager cat /etc/resolv.conf
# Generated by dhcpcd
# /etc/resolv.conf.head can replace this line
# /etc/resolv.conf.tail can replace this line
Since the host server's resolv.conf was correct, I simply restarted the container:
docker restart kube-controller-manager
(an alternative, ugly way would have been to edit the resolv.conf manually: docker exec into kube-controller-manager, then run the appropriate echo XXX >> /etc/resolv.conf ... bad idea)
Some other containers on this node may have a similar issue. This is a hacky way to identify those containers (it lists the containers whose resolv.conf has no nameserver entry):
cd /var/lib/docker/containers
ls -1 $(grep nameserver -L */resolv.conf) | sed -e 's#/.*##'
0c10e1374644cc262c8186e28787f53e02051cc75c1f943678d7aeaa00e5d450
70fd116282643406b72d9d782857bb7ec76dd85dc8a7c0a83dc7ab0e90d30966
841def818a8b4df06a0d30b0b7a66b75a3b554fb5feffe78846130cdfeb39899
ae356e26f1bf8fafe530d57d8c68a022a0ee0e13b4e177d3ad6d4e808d1b36da
d593735a2f6d96bcab3addafcfe3d44b6d839d9f3775449247bdb8801e2e1692
d9b0dfaa270d3f50169fb1aed064ca7a594583b9f71a111f653c71d704daf391
Restart affected containers:
cd /var/lib/docker/containers ; ls -1 $(grep nameserver -L */resolv.conf) | sed -e 's#/.*##' | xargs -n1 -r docker restart
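To confirm the fix took effect, you can re-check the resolver configuration with the same command as before; the output should now contain the nameserver entries from the host's /etc/resolv.conf:
docker exec kube-controller-manager cat /etc/resolv.conf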

Related

Unable to reach the internet/google.com from a pod, though Docker and k8s are able to pull images

I am trying to learn Kubernetes.
I created a single-node Kubernetes cluster on Oracle Cloud using the steps here:
cat /etc/resolv.conf
>> nameserver 169.254.169.254
kubectl run busybox --rm -it --image=busybox --restart=Never -- sh
cat /etc/resolv.conf
>> nameserver 10.33.0.10
nslookup google.com
>>Server: 10.33.0.10
Address: 10.33.0.10:53
;; connection timed out; no servers could be reached
ping 10.33.0.10
>>PING 10.33.0.10 (10.33.0.10): 56 data bytes
kubectl get svc -n kube-system -o wide
>> CLUSTER-IP - 10.33.0.10
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
>>[ERROR] plugin/errors: 2 google.com. A: read udp 10.32.0.9:57385->169.254.169.254:53: i/o timeout
I am not able to identify whether this is a CoreDNS error or a pod networking error. Any direction would really help.
Kubernetes deprecated Docker as a container runtime after v1.20.
The Kubernetes project decided to deprecate Docker as an underlying runtime in favor of runtimes that use the Container Runtime Interface (CRI) created for Kubernetes.
To support this, Mirantis and Docker came to the rescue by agreeing to partner in maintaining the shim code as a standalone project.
More details here
sudo systemctl enable docker
# -- Installing cri-dockerd
VER=$(curl -s https://api.github.com/repos/Mirantis/cri-dockerd/releases/latest|grep tag_name | cut -d '"' -f 4)
echo $VER
wget https://github.com/Mirantis/cri-dockerd/releases/download/${VER}/cri-dockerd-${VER}-linux-arm64.tar.gz
tar xvf cri-dockerd-${VER}-linux-arm64.tar.gz
install -o root -g root -m 0755 cri-dockerd /usr/bin/cri-dockerd
cp cri-dockerd /usr/bin/
# -- Verification
cri-dockerd --version
# -- Configure systemd units for cri-dockerd
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.service
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.socket
sudo cp cri-docker.socket cri-docker.service /etc/systemd/system/
sudo cp cri-docker.socket cri-docker.service /usr/lib/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket
# -- Using cri-dockerd on new Kubernetes cluster
systemctl status docker | grep Active
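Once cri-dockerd is running, kubeadm and the kubelet need to be pointed at its CRI socket. As a rough sketch (assuming the default socket path /run/cri-dockerd.sock created by the cri-docker.socket unit above; adjust if yours differs):
# Initialize a new control plane against the cri-dockerd socket
sudo kubeadm init --cri-socket unix:///run/cri-dockerd.sock
The same --cri-socket flag can be passed to kubeadm join when adding worker nodes.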
I ran into a similar issue with almost the same scenario as described above. The accepted solution https://stackoverflow.com/a/72104194/1119570 is wrong. This is a pure networking issue that is not related to the EKS upgrade in any way.
The root cause of our issue was that the worker node AWS EKS Linux 1.21 AMI had been hardened by our security department, which turned off the following setting in /etc/sysctl.conf:
net.ipv4.ip_forward = 0
After switching this setting to:
net.ipv4.ip_forward = 1
and rebooting the EC2 node, everything started working properly. Hope this helps!
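For reference, the same change can also be applied without a full reboot; this is a generic sysctl sketch, not part of the original answer:
# Apply immediately on the running node
sudo sysctl -w net.ipv4.ip_forward=1
# Persist the setting across reboots
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ip-forward.conf
sudo sysctl --system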

Get contents of /etc/resolv.conf of coredns pod in kubernetes

I am new to k8s and I am learning how DNS works inside a k8s cluster. I am able to get the contents of /etc/resolv.conf of a random pod in the default namespace, but I am unable to get the contents of /etc/resolv.conf of a CoreDNS pod in the kube-system namespace.
$>kubectl exec kubia-manual-v2 -- cat /etc/resolv.conf
Output:
nameserver 10.96.0.10
$>kubectl exec coredns-74ff55c5b-c8dk6 --namespace kube-system -- cat /etc/resolv.conf
Output:
OCI runtime exec failed: exec failed: container_linux.go:370: starting container process caused: exec: "cat": executable file not found in $PATH: unknown
command terminated with exit code 126
It looks like the cat binary is not present in the $PATH. So, I wanted to know how I can get the contents of /etc/resolv.conf of the CoreDNS pod.
CoreDNS uses scratch as the base layer and runs only its coredns binary, so there is no shell or cat in the image. You can see the Dockerfile here for reference.
You can follow the instructions in this SO post to copy busybox into your CoreDNS container, which will give you the ability to poke around in it. Keep in mind that you will probably have to SSH into the k8s node running this container to make this work.
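Alternatively, on clusters where kubectl debug with ephemeral containers is available, you can attach a throwaway busybox container to the CoreDNS pod and read the file through /proc. This is a rough sketch, not the original answer's method; it reuses the pod name from the question and may fail if the runtime or security settings forbid targeting another container's process namespace:
kubectl debug -n kube-system -it coredns-74ff55c5b-c8dk6 --image=busybox --target=coredns -- sh
# inside the debug shell: find the coredns process ID
ps
# read the target container's filesystem through /proc (replace <pid> with the coredns PID)
cat /proc/<pid>/root/etc/resolv.conf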

When I try to set up Kubernetes in AWS, I get the errors shown below

Below are the commands and their outputs:
root@k8s-master:~# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: cannot stat '/etc/kubernetes/admin.conf': No such file or directory
root@k8s-master:~# kubectl get services -n kube-system
The connection to the server localhost:8080 was refused - did you specify the right host or port?
It looks like you are not running EKS; otherwise you could not access the masters at all. With EKS, the masters are managed by AWS and you can't SSH to them.
Your kubectl command makes a call to the Kubernetes API server, so you have to check whether it is running on localhost on port 8080.
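The 'cannot stat /etc/kubernetes/admin.conf' message usually means the control plane has not been initialized (or kubeadm init failed) on that node. Assuming this is a self-managed kubeadm cluster rather than EKS, the standard post-init steps printed by kubeadm look like this:
sudo kubeadm init   # only if the control plane was never initialized
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config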

Is it possible to reverse-DNS query a pod IP address to get its hostname for a Kubernetes Deployment?

I have a Deployment whose replicas scale up and down, all behind a headless service. I am able to query <service>.<namespace>.svc.cluster.local, which returns the list of all pod IPs.
I wanted to know if it's possible to query each pod IP and get the hostname of the pod. It works for pods on the same host machine, but it does not resolve pods on other hosts.
I noticed that it works for a StatefulSet, but it is not working for a Deployment.
This has already been discussed here for kube-dns. There has been more discussion here too.
However, PTR records work fine for me with coredns and K8s 1.12:
$ kubectl get pod helloworld-xxxxxxxxxx-xxxxx -o=jsonpath="{.metadata.annotations['cni\.projectcalico\.org/podIP']}" | cut -d "/" -f 1
192.168.11.28
# Connect to another pod
$ kubectl exec -it anotherpod-svc-xxxxxxxxxx-xxxxx -- bash
root@anotherpod-xxxxxxxxxx-xxxxx:/# dig +short -x 192.168.11.28
192-168-11-28.helloworld.default.svc.cluster.local.
root@anotherpod-xxxxxxxxxx-xxxxx:/# dig +short 192-168-11-28.helloworld.default.svc.cluster.local
192.168.11.28
# Another helloworld pod on a different physical machine
$ kubectl get pod helloworld-xxxxxxxxxx-xxxxx -o=jsonpath="{.metadata.annotations['cni\.projectcalico\.org/podIP']}" | cut -d "/" -f 1
192.168.4.6
# Connect to another pod
$ kubectl exec -it anotherpod-svc-xxxxxxxxxx-xxxxx -- bash
root@anotherpod-svc-xxxxxxxxxx-xxxxx:/# dig +short -x 192.168.4.6
192-168-4-6.helloworld.default.svc.cluster.local.
root@anotherpod-xxxxxxxxxx-xxxxx:/# dig +short 192-168-4-6.helloworld.default.svc.cluster.local
192.168.4.6
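For completeness, the forward lookup against the headless service mentioned in the question can be run from the same pod; assuming the helloworld service in the default namespace used above, it should return the individual pod IPs:
root@anotherpod-xxxxxxxxxx-xxxxx:/# dig +short helloworld.default.svc.cluster.local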

DNS not resolving though all keys are there in etcd?

Here are some details, and why this is important for my next step in testing:
- I can resolve any outside DNS.
- etcd appears to have all keys updating correctly, along with directories (as expected).
- Local-to-Kubernetes DNS queries don't appear to be working against the etcd datastore, even though I can manually query for key-values.
This is the next step that I need to complete before I can start using an NGINX L7 LB demo.
I looked at the advice in #10265 first [just in case], but it appears I do have secrets for the service account... and I think(?) everything should be there as expected.
The only thing I really see in the kube2sky logs is that the etcd server is found. I would imagine I should be seeing more than this?
[fedora@kubemaster ~]$ kubectl logs kube-dns-v10-q9mlb -c kube2sky --namespace=kube-system
I0118 17:42:24.639508 1 kube2sky.go:436] Etcd server found: http://127.0.0.1:4001
I0118 17:42:25.642366 1 kube2sky.go:503] Using https://10.254.0.1:443 for kubernetes master
I0118 17:42:25.642772 1 kube2sky.go:504] Using kubernetes API
[fedora@kubemaster ~]$
More Details:
[fedora@kubemaster ~]$ kubectl exec -t busybox -- nslookup kubelab.local
Server: 10.254.0.10
Address 1: 10.254.0.10
nslookup: can't resolve 'kubelab.local'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
[fedora@kubemaster ~]$ etcdctl ls --recursive
/kubelab.local
/kubelab.local/network
/kubelab.local/network/config
/kubelab.local/network/subnets
/kubelab.local/network/subnets/172.16.46.0-24
/kubelab.local/network/subnets/172.16.12.0-24
/kubelab.local/network/subnets/172.16.70.0-24
/kubelab.local/network/subnets/172.16.21.0-24
/kubelab.local/network/subnets/172.16.54.0-24
/kubelab.local/network/subnets/172.16.71.0-24
....and so on...the keys are all there, as expected...
I see you changed the default "cluster.local" to "kubelab.local". Did you change the SkyDNS config to serve that domain?
kubectl exec --namespace=kube-system $podname -c skydns -- ps
PID USER COMMAND
1 root /skydns -machines=http://127.0.0.1:4001 -addr=0.0.0.0:53 -ns-rotate=false -domain=cluster.local.
11 root ps
Note the -domain flag.
If that is correct, check that you passed the correct --cluster-dns and --cluster-domain flags to the kubelet. Then show me /etc/resolv.conf from a pod that cannot do DNS lookups.
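For reference, a rough sketch of what the kubelet flags would need to look like for this setup, using the 10.254.0.10 DNS service IP from the nslookup output above and the custom kubelab.local domain (adjust to your own values):
# excerpt of the kubelet invocation; other flags omitted
kubelet --cluster-dns=10.254.0.10 --cluster-domain=kubelab.local
The skydns -domain flag shown above would likewise need to be kubelab.local. instead of the default cluster.local.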