Minikube Kubernetes pending pod on AWS EC2 - kubernetes

I've tried many times to install Kubernetes on the latest stable version of Debian on an AWS EC2 instance (2 vCPU, 4 GB RAM, 10 GB disk).
I'm now trying instead on Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1084-aws x86_64), on an AWS EC2 instance with the same compute configuration.
I've installed Docker, kubectl, docker-cri, crictl and Minikube, but the Kubernetes node is NotReady and pods stay Pending. The blocking point for me is the CNI: the CoreDNS pods are Pending and I see a few strange things in the logs, but I don't know how to solve it.
I've also tried to install Calico, as you can see from the Calico-related pods below. This is the first time I'm installing Kubernetes and Minikube.
Minikube is started with the following command: minikube start --vm-driver=none
minikube version: v1.27.1
root@awsec2:~# minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
root@awsec2:~# docker version
Client:
Version: 20.10.7
API version: 1.41
Go version: go1.13.8
Git commit: 20.10.7-0ubuntu5~18.04.3
Built: Mon Nov 1 01:04:14 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
root@ip-172-31-37-142:~# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-awsec2-ip NotReady control-plane 10h v1.25.2 172.31.37.142 <none> Ubuntu 18.04.6 LTS 5.4.0-1084-aws docker://20.10.7
root@aws:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-minikube 0/1 Pending 0 10h
kube-system coredns-565d847f94-kmbdr 0/1 Pending 0 11h
kube-system etcd-ip-172-31-37-142 1/1 Running 1 (10h ago) 11h
kube-system kube-apiserver-ip-172-31-37-142 1/1 Running 1 (10h ago) 11h
kube-system kube-controller-manager-ip-172-31-37-142 1/1 Running 1 (10h ago) 11h
kube-system kube-proxy-dff99 1/1 Running 1 (10h ago) 11h
kube-system kube-scheduler-ip-172-31-37-142 1/1 Running 1 (10h ago) 11h
kube-system storage-provisioner 0/1 Pending 0 11h
tigera-operator tigera-operator-6675dc47f4-gngrn 1/1 Running 2 (7m ago) 10h
In the minikube logs output I've seen this error but don't know how to solve it:
==> kubelet <==
-- Logs begin at Tue 2022-10-18 21:26:09 UTC, end at Wed 2022-10-19 08:57:51 UTC. --
Oct 19 08:52:52 ip-172-31-37-142 kubelet[17361]: E1019 08:52:52.018304 17361 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Could someone explain how to correct this? It seems like it should be a very standard issue.

Now, I've found a workaround that will work for my test.
I've run Minikube with Kubernetes 1.23 instead of 1.24.
I ran this command:
minikube start --vm-driver=none --kubernetes-version=v1.23.0
I didn't set up the Calico CNI this time; my node is Ready and the hello-minikube pod from above is running correctly.
I will test it like that, then I will try to upgrade to Kube 1.24.
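As a possible next step for the original CNI problem when retrying a newer Kubernetes, here is a rough sketch (not verified on this setup) of either letting Minikube install Calico itself or finishing the operator-based install by applying the Installation custom resource that tigera-operator waits for:
minikube start --vm-driver=none --kubernetes-version=v1.24.0 --cni=calico
# or, keeping the existing tigera-operator deployment:
kubectl create -f https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml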
Kind Regards,

Related

There are 2 networking components installed on the master node, Weave and Calico. How can I completely remove Calico from my Kubernetes cluster?

Weave's range overlaps with the host's IP address and its pod is stuck in the CrashLoopBackOff state. Calico needs to be removed first, as I have no idea how to make 2 networking modules work on the master!
emo@master:~$ sudo kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-dw6ch 0/1 ContainerCreating 0
kube-system coredns-64897985d-xr6br 0/1 ContainerCreating 0
kube-system etcd-master 1/1 Running 26 (14m ago)
kube-system kube-apiserver-master 1/1 Running 26 (12m ago)
kube-system kube-controller-manager-master 1/1 Running 4 (20m ago)
kube-system kube-proxy-g98ph 1/1 Running 3 (20m ago)
kube-system kube-scheduler-master 1/1 Running 4 (20m ago)
kube-system weave-net-56n8k 1/2 CrashLoopBackOff 76 (54s ago)
tigera-operator tigera-operator-b876f5799-sqzf9 1/1 Running 6 (5m57s ago)
master:
emo@master:~$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master Ready control-plane,master 6d19h v1.23.5 192.168.71.132 <none> Ubuntu 20.04.3 LTS 5.4.0-81-generic containerd://1.5.5
You may need to re-build your cluster after cleaning it up.
First, run kubectl delete for all the manifests you have applied to configure calico and weave. (e.g. kubectl delete -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml)
Then run kubeadm reset and remove the contents of /etc/cni/net.d/ to delete all of your CNI configurations. You also need to reboot the server to clear some leftover ip link entries, or remove them manually with ip link delete {name}.
Once that is done, the new installation should go well.
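A rough sketch of that cleanup (manifest URLs and interface names are examples; adjust them to whatever is actually installed):
kubectl delete -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
sudo kubeadm reset
sudo rm -rf /etc/cni/net.d/*
sudo reboot
# or, instead of rebooting, remove leftover interfaces by hand, e.g.:
sudo ip link delete weave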

Kubeadm Failed to create SubnetManager: error retrieving pod spec for kube-system

No matter what I do, it seems I cannot get rid of this problem. I have installed Kubernetes using kubeadm many times quite successfully; however, adding a v1.16.0 node is giving me a heck of a headache.
O/S: Ubuntu 18.04.3 LTS
Kubernetes version: v1.16.0
Kubeadm version: Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:34:01Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"
A query of the cluster shows:
NAME STATUS ROLES AGE VERSION
kube-apiserver-1 Ready master 110d v1.15.0
kube-apiserver-2 Ready master 110d v1.15.0
kube-apiserver-3 Ready master 110d v1.15.0
kube-node-1 Ready <none> 110d v1.15.0
kube-node-2 Ready <none> 110d v1.15.0
kube-node-3 Ready <none> 110d v1.15.0
kube-node-4 Ready <none> 110d v1.16.0
kube-node-5 Ready,SchedulingDisabled <none> 3m28s v1.16.0
kube-node-databases Ready <none> 110d v1.15.0
I have temporarily disabled scheduling to the node until I can fix this problem. A query of the pod status in the kube-system namespace shows the problem:
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-55zjs 1/1 Running 128 21d
coredns-fb8b8dccf-kzrpc 1/1 Running 144 21d
kube-flannel-ds-amd64-29xp2 1/1 Running 11 110d
kube-flannel-ds-amd64-hp7nq 1/1 Running 14 110d
kube-flannel-ds-amd64-hvdpf 0/1 CrashLoopBackOff 5 8m28s
kube-flannel-ds-amd64-jhhlk 1/1 Running 11 110d
kube-flannel-ds-amd64-k6dzc 1/1 Running 2 110d
kube-flannel-ds-amd64-lccxl 1/1 Running 21 110d
kube-flannel-ds-amd64-nnn7g 1/1 Running 14 110d
kube-flannel-ds-amd64-shss5 1/1 Running 7 110d
kubectl -n kube-system logs -f kube-flannel-ds-amd64-hvdpf
I1002 01:13:22.136379 1 main.go:514] Determining IP address of default interface
I1002 01:13:22.136823 1 main.go:527] Using interface with name ens3 and address 192.168.5.46
I1002 01:13:22.136849 1 main.go:544] Defaulting external address to interface address (192.168.5.46)
E1002 01:13:52.231471 1 main.go:241] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-amd64-hvdpf': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-amd64-hvdpf: dial tcp 10.96.0.1:443: i/o timeout
Although my searches turned up a few hits about iptables issues and kernel routing, I don't understand why previous versions installed without a hitch while this version is giving me such a problem.
I have installed this node and destroyed it quite a few times yet the result is always the same.
Anyone else having this issue or has a solution?
This occurs when it's not able to look up the host. Add the below after the name: POD_NAMESPACE entry:
- name: KUBERNETES_SERVICE_HOST
value: "10.220.64.186" #ip address of the host where kube-apiservice is running
- name: KUBERNETES_SERVICE_PORT
value: "6443"
According to the documentation about the version skew policy:
kubelet
kubelet must not be newer than kube-apiserver, and may be up to two minor versions older.
Example:
kube-apiserver is at 1.13
kubelet is supported at 1.13, 1.12, and 1.11
That means that worker nodes at version v1.16.0 are not supported with a master node at version v1.15.0.
To fix this issue I recommend reinstalling the node with version v1.15.0 to match the rest of the cluster.
Optionally, you can upgrade the whole cluster to v1.16.1; however, there are currently some problems with running Flannel as the network plugin on it. Please review the upgrade guide in the documentation before proceeding.
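A sketch of reinstalling the node at the matching version (the apt package names follow the packaging used for these releases, and the join command has to come from your own cluster):
# on the master:
kubectl drain kube-node-5 --ignore-daemonsets
kubectl delete node kube-node-5
# on kube-node-5:
sudo kubeadm reset
sudo apt-get install -y --allow-downgrades kubelet=1.15.0-00 kubeadm=1.15.0-00 kubectl=1.15.0-00
sudo kubeadm join <apiserver endpoint> --token <token> --discovery-token-ca-cert-hash <hash>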

kubectl get pods returns inconsistent results

When I execute kubectl get pods, I get different output for the same pod.
For example:
$ kubectl get pods -n ha-rabbitmq
NAME READY STATUS RESTARTS AGE
rabbitmq-ha-0 1/1 Running 0 85m
rabbitmq-ha-1 1/1 Running 9 84m
rabbitmq-ha-2 1/1 Running 0 50m
After that I execute the same command and here is the different result:
$ kubectl get pods -n ha-rabbitmq
NAME READY STATUS RESTARTS AGE
rabbitmq-ha-0 0/1 CrashLoopBackOff 19 85m
rabbitmq-ha-1 1/1 Running 9 85m
rabbitmq-ha-2 1/1 Running 0 51m
I have 2 master nodes and 5 worker nodes initialized with kubeadm. Each master node has one instance of the built-in etcd pod running on it.
Result of kubectl get nodes:
$ kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-meb-master1 Ready master 14d v1.14.3 10.30.29.11 <none> Ubuntu 18.04.2 LTS 4.15.0-51-generic docker://18.9.5
k8s-meb-master2 Ready master 14d v1.14.3 10.30.29.12 <none> Ubuntu 18.04.2 LTS 4.15.0-51-generic docker://18.9.6
k8s-meb-worker1 Ready <none> 14d v1.14.3 10.30.29.13 <none> Ubuntu 18.04.2 LTS 4.15.0-51-generic docker://18.9.5
k8s-meb-worker2 Ready <none> 14d v1.14.3 10.30.29.14 <none> Ubuntu 18.04.2 LTS 4.15.0-51-generic docker://18.9.5
k8s-meb-worker3 Ready <none> 14d v1.14.3 10.30.29.15 <none> Ubuntu 18.04.2 LTS 4.15.0-51-generic docker://18.9.5
k8s-meb-worker4 Ready <none> 14d v1.14.2 10.30.29.16 <none> Ubuntu 18.04.2 LTS 4.15.0-51-generic docker://18.9.5
k8s-meb-worker5 Ready <none> 5d19h v1.14.2 10.30.29.151 <none> Ubuntu 18.04 LTS 4.15.0-20-generic docker://18.9.5
Can this issue be related to unsynchronized contents of /var/lib/etcd/ on the master nodes?
Your pods are in the CrashLoopBackOff state.
That means that some containers inside the pod are exiting (the main process exits) and the pod gets restarted over and over again.
Depending on when you run the get po command, you might see your pod as Running (the process hasn't exited yet) or CrashLoopBackOff (Kubernetes is waiting before restarting your pod).
You can confirm this is the case by looking at the Restarts counter in the output.
I suggest you have a look at the restarting pods' logs to get an idea of why they're failing.
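For example, the previous container's logs and the pod events usually show why it keeps exiting:
kubectl -n ha-rabbitmq logs rabbitmq-ha-0 --previous
kubectl -n ha-rabbitmq describe pod rabbitmq-ha-0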
It seems there is an etcd inconsistency between the control-plane nodes due to an incomplete etcd restoration. Please refer to this link on how to do it properly: https://medium.com/@pranaybhardwaj007/etcd-backup-and-restore-in-ha-mode-8722b97d440d
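To check whether the etcd members really disagree, something like the following can be run from a master (a sketch; the certificate paths assume a standard kubeadm layout and etcdctl being available):
ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.30.29.11:2379,https://10.30.29.12:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status -w table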

Kubernetes - kube-system pods in master node keep restarting after worker node joins

I have followed this tutorial, this tutorial, and this one, but have been facing the same issue for the last 3 days.
I am able to set up the master node correctly with the following steps:
kubeadm init
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
and everything seems fine in
kubectl get all --namespace=kube-system
then,
on the worker node:
kubeadm join --token 864655.fdf6d0b389867b79 192.168.100.17:6443 --discovery-token-ca-cert-hash sha256:a2d840808b17b53b9612e6271ccde489f13dbede7d354f97188d0faa9e210af2
The output seems fine and is as below:
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "192.168.100.17:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.100.17:6443"
[discovery] Requesting info from "https://192.168.100.17:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.100.17:6443"
[discovery] Successfully established connection with API Server "192.168.100.17:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
BUT as soon as I run this command, all hell breaks loose. The
kubectl get all --namespace=kube-system
starts showing that all pods are kind of restarting all the time. The status keeps changing between Pending and Running, and at times some of the pods even disappear or show ContainerCreating status, etc.
NAME READY STATUS RESTARTS AGE
po/etcd-ubuntu 0/1 Pending 0 0s
po/kube-controller-manager-ubuntu 0/1 Pending 0 0s
po/kube-dns-6f4fd4bdf-cmcfk 3/3 Running 0 13m
po/kube-proxy-2chb6 1/1 Running 0 13m
po/kube-scheduler-ubuntu 0/1 Pending 0 0s
po/weave-net-ptdxr 2/2 Running 0 11m
I have also tried the second tutorial, with flannel, and get the exact same issue.
My Set Up
I created two new VMs with a fresh installation of Ubuntu 17.10 on VMware, each with 2 processors/2 cores, 6 GB of RAM and a 50 GB hard disk. My physical machine is an i7-6700K with 32 GB of RAM.
I installed kubeadm, kubelet and docker on both of them and then followed the steps as mentioned above.
I have also tried switching between NAT and Bridge on VMware and nothing changed.
The initial IP of both VMs with bridge network was 192.168.100.12 and 192.168.100.17.
The hostname -I for master:
192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2
The hostname -I for worker-node:
192.168.100.12 172.17.0.1 10.44.0.0 10.32.0.1
journalctl -xeu kubelet shows the following:
https://gist.github.com/saad749/9a771a3460bf88c274498b5bc4b7fd84
While trying with flannel (and still the same issue), the result from
kubectl describe nodes
is
https://gist.github.com/saad749/d24c453c8b4e663e9abf572a0fb38bf4
Am I missing any step before kubeadm init? Should I change the IP addresses (to what)? Are there any specific logs I should look into? Is there a more comprehensive tutorial for this?
All issues start after kubeadm join on the worker node; I can deploy Kubernetes on the master node, and anything else, and it works fine.
UPDATE:
Even after applying the suggestions from errordeveloper, the same issue persists.
I added the following flag to kubeadm init:
--apiserver-advertise-address 192.168.100.17
I updated kubeadm.conf to the following and did a reload and restart:
https://gist.github.com/saad749/c7149c87ec3e75a40586f626cf04279a
and also tried changing the cluster DNS:
https://gist.github.com/saad749/5fa66bebc22841e58119333e75600e40
This is the log from after initializing the master:
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system etcd-ubuntu 1/1 Running 0 22s 192.168.100.17 ubuntu
kube-system kube-apiserver-ubuntu 1/1 Running 0 29s 192.168.100.17 ubuntu
kube-system kube-controller-manager-ubuntu 1/1 Running 0 13s 192.168.100.17 ubuntu
kube-system kube-dns-6f4fd4bdf-wfqhb 3/3 Running 0 1m 10.32.0.7 ubuntu
kube-system kube-proxy-h4hz9 1/1 Running 0 1m 192.168.100.17 ubuntu
kube-system kube-scheduler-ubuntu 1/1 Running 0 34s 192.168.100.17 ubuntu
kube-system weave-net-fkgnh 2/2 Running 0 32s 192.168.100.17 ubuntu
The hostname -I and hostname -i results:
kube-master@ubuntu:~$ hostname -I
192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2 10.32.0.3 10.32.0.4 10.32.0.5 10.32.0.6 10.244.0.0 10.244.0.1
kube-master@ubuntu:~$ hostname -i
192.168.100.17
Results from:
kubectl describe nodes
https://gist.github.com/saad749/8f460650182a04d0ddf3158a52761a9a
The Internal IP seems correct now.
After joining from the second node, this happens:
kube-master@ubuntu:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu Ready master 49m v1.9.3
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system kube-controller-manager-ubuntu 0/1 Pending 0 0s <none> ubuntu
kube-system kube-dns-6f4fd4bdf-wfqhb 0/3 ContainerCreating 0 49m <none> ubuntu
kube-system kube-proxy-h4hz9 1/1 Running 0 49m 192.168.100.17 ubuntu
kube-system kube-scheduler-ubuntu 1/1 Running 0 1s 192.168.100.17 ubuntu
kube-system weave-net-fkgnh 2/2 Running 0 48m 192.168.100.17 ubuntu
ifconfig -a results:
https://gist.github.com/saad749/63a5a52bd3246ff72477b2aca7d158d0
journalctl -xeu kubelet results
https://gist.github.com/saad749/8a60870b35f93df8565e66cb208aff32
Sometimes the pods' IP is shown as 192.168.100.12, which is the IP of the non-master second node.
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system etcd-ubuntu 0/1 Pending 0 0s <none> ubuntu
kube-system kube-apiserver-ubuntu 0/1 Pending 0 0s <none> ubuntu
kube-system kube-controller-manager-ubuntu 1/1 Running 0 0s 192.168.100.12 ubuntu
kube-system kube-dns-6f4fd4bdf-wfqhb 2/3 Running 0 3h 10.32.0.7 ubuntu
kube-system kube-proxy-h4hz9 1/1 Running 0 3h 192.168.100.12 ubuntu
kube-system kube-scheduler-ubuntu 0/1 Pending 0 0s <none> ubuntu
kube-system weave-net-fkgnh 2/2 Running 1 3h 192.168.100.17 ubuntu
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system kube-dns-6f4fd4bdf-wfqhb 3/3 Running 0 3h 10.32.0.7 ubuntu
kube-system kube-proxy-h4hz9 1/1 Running 0 3h 192.168.100.12 ubuntu
kube-system weave-net-fkgnh 2/2 Running 0 3h 192.168.100.12 ubuntu
kubectl describe nodes
Name: ubuntu
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=ubuntu
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: node-role.kubernetes.io/master:NoSchedule
CreationTimestamp: Fri, 02 Mar 2018 08:21:47 -0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 02 Mar 2018 11:38:36 -0800 Fri, 02 Mar 2018 08:21:43 -0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 02 Mar 2018 11:38:36 -0800 Fri, 02 Mar 2018 08:21:43 -0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 02 Mar 2018 11:38:36 -0800 Fri, 02 Mar 2018 08:21:43 -0800 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 02 Mar 2018 11:38:36 -0800 Fri, 02 Mar 2018 11:28:25 -0800 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 192.168.100.12
Hostname: ubuntu
Capacity:
cpu: 4
memory: 6080832Ki
pods: 110
Allocatable:
cpu: 4
memory: 5978432Ki
pods: 110
System Info:
Machine ID: 59bf65b835b242a3aa182f4b8a542219
System UUID: 0C3C4D56-4747-D59E-EE09-F16F2793677E
Boot ID: 658b4a08-d724-425e-9246-2b41995ecc46
Kernel Version: 4.13.0-36-generic
OS Image: Ubuntu 17.10
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.9.3
Kube-Proxy Version: v1.9.3
ExternalID: ubuntu
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system kube-dns-6f4fd4bdf-wfqhb 260m (6%) 0 (0%) 110Mi (1%) 170Mi (2%)
kube-system kube-proxy-h4hz9 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-fkgnh 20m (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
280m (7%) 0 (0%) 110Mi (1%) 170Mi (2%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Rebooted 12m (x814 over 2h) kubelet, ubuntu Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0
Normal NodeHasNoDiskPressure 10m kubelet, ubuntu Node ubuntu status is now: NodeHasNoDiskPressure
Normal Starting 10m kubelet, ubuntu Starting kubelet.
Normal NodeAllocatableEnforced 10m kubelet, ubuntu Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 10m kubelet, ubuntu Node ubuntu status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 10m kubelet, ubuntu Node ubuntu status is now: NodeHasSufficientMemory
Normal NodeNotReady 10m kubelet, ubuntu Node ubuntu status is now: NodeNotReady
Warning Rebooted 2m (x870 over 2h) kubelet, ubuntu Node ubuntu has been rebooted, boot id: 658b4a08-d724-425e-9246-2b41995ecc46
Warning Rebooted 15s (x60 over 10m) kubelet, ubuntu Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0
What am I doing wrong?
So after following the advice from @errordeveloper and still hitting the wall, I was able to solve the issue, which turned out to be pretty simple.
Both my VMs had the same hostname.
hostname -f
would return
ubuntu
on both, and that apparently causes issues with Kubernetes.
I changed the name on my non-master node with
hostnamectl set-hostname kminion
and in the following files:
/etc/hostname
/etc/hosts
and everything went smoothly from there!
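In other words, the rename on the worker boiled down to something like this (kminion being the new name chosen above; editing /etc/hosts by hand works just as well as the sed):
sudo hostnamectl set-hostname kminion
sudo sed -i 's/ubuntu/kminion/g' /etc/hosts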
Should I change the IP addresses (to what)?
Yes, this is typically the way to make things work on VMs where the default route is for NATed access to the Internet.
You want to use the IP of the bridge network; for your master that appears to be 192.168.100.17 (but please double-check).
First, please try using kubeadm init --apiserver-advertise-address 192.168.100.17, but that may not solve all of the issues.
In your output of kubectl describe nodes, I can see this:
Addresses:
InternalIP: 172.17.0.1
Hostname: ubuntu
So you probably want to make sure that kubelet also doesn't use the NATed interface, for which you would need to use kubelet's --node-ip flag.
However, there are other ways to fix this problem, e.g. if you can ensure that hostname -i returns the IP of the bridged interface (which you can do by tweaking /etc/hosts).
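A sketch of pinning kubelet to the bridged address via a systemd drop-in (the drop-in file name is just an example; the exact mechanism varies by distro and kubeadm version):
sudo mkdir -p /etc/systemd/system/kubelet.service.d
printf '[Service]\nEnvironment="KUBELET_EXTRA_ARGS=--node-ip=192.168.100.17"\n' | \
  sudo tee /etc/systemd/system/kubelet.service.d/20-node-ip.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet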

Rancher Kubernetes Dashboard - Service Unavailable

I am new to Rancher and containers in general. While setting up a Kubernetes cluster using Rancher, I'm facing a problem accessing the Kubernetes dashboard.
rancher/server: 1.6.6
Single node Rancher server + External MySQL + 3 agent nodes
Infrastructure Stack versions:
healthcheck: v0.3.1
ipsec: net:v0.11.5
network-services: metadata:v0.9.2 / network-manager:v0.7.7
scheduler: k8s:v1.7.2-rancher5
kubernetes (if applicable): kubernetes-agent:v0.6.3
# docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 17.03.1-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.34-rancher
Operating System: RancherOS v1.0.3
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.798 GiB
Name: ch7radod1
ID: IUNS:4WT2:Y3TV:2RI4:FZQO:4HYD:YSNN:6DPT:HMQ6:S2SI:OPGH:TX4Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://proxy.ch.abc.net:8080
Https Proxy: http://proxy.ch.abc.net:8080
No Proxy: localhost,.xyz.net,abc.net
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Accessing the UI URL http://10.216.30.10/r/projects/1a6633/kubernetes-dashboard:9090/# shows "Service unavailable".
If I use the CLI section from the UI, I get the following:
> kubectl get nodes
NAME STATUS AGE VERSION
ch7radod3 Ready 1d v1.7.2
ch7radod4 Ready 5d v1.7.2
ch7radod1 Ready 1d v1.7.2
> kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-4njc2 0/1 ContainerCreating 0 5d
kube-system kube-dns-3942128195-ft56n 0/3 ContainerCreating 0 19d
kube-system kube-dns-646531078-z5lzs 0/3 ContainerCreating 0 5d
kube-system kubernetes-dashboard-716739405-lpj38 0/1 ContainerCreating 0 5d
kube-system monitoring-grafana-3552275057-qn0zf 0/1 ContainerCreating 0 5d
kube-system monitoring-influxdb-4110454889-79pvk 0/1 ContainerCreating 0 5d
kube-system tiller-deploy-737598192-f9gcl 0/1 ContainerCreating 0 5d
The setup uses a private registry (Artifactory). I checked Artifactory and could see several Docker-related images present. I was going through the private registry section and also saw this file. In case this file is required, where exactly do I keep it so that Rancher can fetch it and configure the Kubernetes dashboard?
UPDATE:
$ sudo ros engine switch docker-1.12.6
> ERRO[0031] Failed to load https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Get https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Proxy Authentication Required
> FATA[0031] docker-1.12.6 is not a valid engine
I thought maybe it's due to NGINX, so I stopped the NGINX container, but I am still getting the above error. Earlier I had tried the same command on this Rancher server and it worked fine. It works fine on the agent nodes, although they already have 1.12.6 configured.
UPDATE 2:
> kubectl -n kube-system get po
NAME READY STATUS RESTARTS AGE
heapster-4285517626-4njc2 1/1 Running 0 12d
kube-dns-2588877561-26993 0/3 ImagePullBackOff 0 5h
kube-dns-646531078-z5lzs 0/3 ContainerCreating 0 12d
kubernetes-dashboard-716739405-zq3s9 0/1 CrashLoopBackOff 67 5h
monitoring-grafana-3552275057-qn0zf 1/1 Running 0 12d
monitoring-influxdb-4110454889-79pvk 1/1 Running 0 12d
tiller-deploy-737598192-f9gcl 0/1 CrashLoopBackOff 72 12d
None of your pods are running; you need to resolve that issue first. Try restarting the whole cluster and check that all of the above pods reach Running status.
Based on @ivan.sim's suggestion, I posted 'UPDATE 2'. This finally got me looking in the right direction. I then searched online for the CrashLoopBackOff error, came across this link, and tried the following command (using the CLI option from the Rancher console), which was actually quite similar to what @ivan.sim suggested above, but it helped me find the node where the dashboard process was running:
> kubectl get pods -a -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system heapster-4285517626-4njc2 1/1 Running 0 12d 10.42.224.157 radod4
kube-system kube-dns-2588877561-26993 0/3 ImagePullBackOff 0 5h <none> radod1
kube-system kube-dns-646531078-z5lzs 0/3 ContainerCreating 0 12d <none> radod4
kube-system kubernetes-dashboard-716739405-zq3s9 0/1 Error 70 5h 10.42.218.11 radod1
kube-system monitoring-grafana-3552275057-qn0zf 1/1 Running 0 12d 10.42.202.44 radod4
kube-system monitoring-influxdb-4110454889-79pvk 1/1 Running 0 12d 10.42.111.171 radod4
kube-system tiller-deploy-737598192-f9gcl 0/1 CrashLoopBackOff 76 12d 10.42.213.24 radod4
Then I went to the host where the process was executing and tried the following commands:
[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker ps -a | grep dash
282334b0ed38 gcr.io/google_containers/kubernetes-dashboard-amd64@sha256:b537ce8988510607e95b8d40ac9824523b1f9029e6f9f90e9fccc663c355cf5d "/dashboard --insecur" About a minute ago Exited (1) 55 seconds ago k8s_kubernetes-dashboard_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_72
99836d7824fd gcr.io/google_containers/pause-amd64:3.0 "/pause" 5 hours ago Up 5 hours k8s_POD_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_1
[rancher@radod1 ~]$
[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker logs 282334b0ed38
Using HTTP port: 8443
Creating API server client for https://10.43.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md
After I got the above error, I searched online again and tried a few things. Finally, this link helped. After I executed the following commands on all agent nodes, the Kubernetes dashboard finally started working!
docker volume rm etcd
rm -rf /var/etcd/backups/*