kube-discovery fails to start when using kubeadm - kubernetes

I'm trying to install a cluster using kubeadm, using this guide.
I'm installing it on bare metal Ubuntu 16.04 server.
Docker is already preinstalled:
root@host# docker -v
Docker version 1.12.3, build 6b644ec
After executing the following:
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl kubernetes-cni
I run 'kubeadm init', and it hangs on the kube-discovery addon:
root@host# kubeadm init
Running pre-flight checks
<master/tokens> generated token: "<token>"
<master/pki> generated Certificate Authority key and certificate:
Issuer: CN=kubernetes | Subject: CN=kubernetes | CA: true
Not before: 2016-11-22 15:27:25 +0000 UTC Not After: 2026-11-20 15:27:25 +0000 UTC
Public: /etc/kubernetes/pki/ca-pub.pem
Private: /etc/kubernetes/pki/ca-key.pem
Cert: /etc/kubernetes/pki/ca.pem
<master/pki> generated API Server key and certificate:
Issuer: CN=kubernetes | Subject: CN=kube-apiserver | CA: false
Not before: 2016-11-22 15:27:25 +0000 UTC Not After: 2017-11-22 15:27:25 +0000 UTC
Alternate Names: [<ipaddress> kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local]
Public: /etc/kubernetes/pki/apiserver-pub.pem
Private: /etc/kubernetes/pki/apiserver-key.pem
Cert: /etc/kubernetes/pki/apiserver.pem
<master/pki> generated Service Account Signing keys:
Public: /etc/kubernetes/pki/sa-pub.pem
Private: /etc/kubernetes/pki/sa-key.pem
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 44.584082 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 1.003104 seconds
<master/apiclient> attempting a test deployment
<master/apiclient> test deployment succeeded
<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
I can see that this pod is restarting:
root@host# kubectl get pods --all-namespaces=true
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-dsjtb 1/1 Running 0 29m
kube-system etcd-host.test.com 1/1 Running 0 29m
kube-system kube-apiserver-host.test.com 1/1 Running 0 30m
kube-system kube-controller-manager-host.test.com 1/1 Running 0 29m
kube-system kube-discovery-1150918428-ulap3 0/1 CrashLoopBackOff 10 29m
kube-system kube-scheduler-host.test.com 1/1 Running 0 29m
root@host# kubectl logs kube-discovery-1150918428-ulap3 --namespace=kube-system
2016/11/22 13:31:32 root CA certificate does not exist: /tmp/secret/ca.pem
Do I need to provide it a certificate?

What specific version of Kubernetes are you trying to install? You can check it with:
apt-get policy kubelet
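If it turns out the repository has moved ahead of what the guide expects, you can list the available package versions and pin every component to a matching one. This is only a sketch; <version> is a placeholder to be filled in from the list the first command prints:
# Show which versions the apt repository offers for each component.
apt-cache madison kubeadm kubelet kubectl
# Install a matching set, pinning the versions explicitly.
apt-get install -y kubelet=<version> kubeadm=<version> kubectl=<version> kubernetes-cni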

Related

"server doesn't have a resource type "pods"" while installing NVIDIA Clara Deploy

I am trying to install the latest version of NVIDIA Clara Deploy Bootstrap following the official documentation (this & this). At one step of the installation there is a shell script named "bootstrap.sh", which is meant to install all the dependencies, including Kubernetes & kubectl, and create the cluster. But upon running sudo ./bootstrap.sh, I am getting this error: error: the server doesn't have a resource type "pods".
What I have done so far:
I am fairly new to Kubernetes, so I've tried the solution from this answer: running kubectl get pods gives me No resources found., and kubectl auth can-i get pods gives me yes. /etc/kubernetes/manifests, which is supposed to contain the manifest files mentioned in that answer, was empty, so I ran sudo kubeadm init.
Here is the full error message:
2020-10-17 20:57:37 [INFO]: Clara Deploy SDK System Prerequisites Installation
2020-10-17 20:57:37 [INFO]: Checking user privilege...
2020-10-17 20:57:37 [INFO]: Checking for NVIDIA GPU driver...
2020-10-17 20:57:37 [INFO]: NVIDIA CUDA driver version found: 418.87.01
2020-10-17 20:57:37 [INFO]: NVIDIA GPU driver found
2020-10-17 20:57:37 [INFO]: Check and install required packages: apt-transport-https ca-certificates curl software-properties-common network-manager unzip lsb-release dirmngr jq ...
Ign:1 http://deb.debian.org/debian stretch InRelease
Get:2 http://security.debian.org stretch/updates InRelease [53.0 kB]
Get:3 http://deb.debian.org/debian stretch-updates InRelease [93.6 kB]
Get:4 http://deb.debian.org/debian stretch-backports InRelease [91.8 kB]
Hit:5 http://deb.debian.org/debian stretch Release
Hit:6 http://packages.cloud.google.com/apt gcsfuse-stretch InRelease
Get:7 https://download.docker.com/linux/debian stretch InRelease [44.8 kB]
Get:8 http://packages.cloud.google.com/apt cloud-sdk-stretch InRelease [6,389 B]
Get:9 http://security.debian.org stretch/updates/main Sources [263 kB]
Hit:10 http://packages.cloud.google.com/apt google-compute-engine-stretch-stable InRelease
Get:11 http://security.debian.org stretch/updates/main amd64 Packages [604 kB]
Get:12 http://security.debian.org stretch/updates/main Translation-en [267 kB]
Hit:13 http://packages.cloud.google.com/apt google-cloud-packages-archive-keyring-stretch InRelease
Hit:14 https://nvidia.github.io/libnvidia-container/stable/debian9/amd64 InRelease
Hit:16 https://nvidia.github.io/nvidia-container-runtime/stable/debian9/amd64 InRelease
Hit:15 https://packages.cloud.google.com/apt kubernetes-xenial InRelease
Hit:18 https://nvidia.github.io/nvidia-docker/debian9/amd64 InRelease
Fetched 1,424 kB in 1s (1,175 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
apt-transport-https is already the newest version (1.4.10).
ca-certificates is already the newest version (20200601~deb9u1).
dirmngr is already the newest version (2.1.18-8~deb9u4).
jq is already the newest version (1.5+dfsg-1.3).
lsb-release is already the newest version (9.20161125).
network-manager is already the newest version (1.6.2-3+deb9u2).
unzip is already the newest version (6.0-21+deb9u2).
curl is already the newest version (7.52.1-5+deb9u12).
software-properties-common is already the newest version (0.96.20.2-1+deb9u1).
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.
2020-10-17 20:57:41 [INFO]: Starting network-manager service...
2020-10-17 20:57:41 [INFO]: Successfully installed required packages: apt-transport-https ca-certificates curl software-properties-common network-manager unzip lsb-release dirmngr jq !
2020-10-17 20:57:41 [INFO]: Disabling swap ...
2020-10-17 20:57:41 [INFO]: Start installing docker and nvidia-docker2 ...
2020-10-17 20:57:41 [INFO]: 'proteeti_prova' is already added to docker group. Skipping docker group configuration ...
2020-10-17 20:57:41 [INFO]: Skipping nvidia-docker install since it is already present.
WARNING: No swap limit support
2020-10-17 20:57:42 [INFO]: Docker Compose version 1.25.4 is already installed. Skipping docker-compose installation...
2020-10-17 20:57:42 [INFO]: The following versions of k8s components are already installed.
Error from server (NotFound): the server could not find the requested resource
2020-10-17 20:57:43 [INFO]: - kubectl: Client Version: v1.15.4
2020-10-17 20:57:43 [INFO]: - kubelet: Kubernetes v1.15.4
2020-10-17 20:57:44 [INFO]: - kubeadm: v1.15.4
2020-10-17 20:57:45 [INFO]: Skipping Kubernetes installation (version: 1.15.4-00) since Kubernetes is already present.
error: the server doesn't have a resource type "pods"
1. Instance:
GCP, Ubuntu 18.04
n1-standard-16 (16 vCPUs, 60 GB memory)
1 x NVIDIA Tesla T4
2. Downloading bootstrap, unpacking:
$curl -LO https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_bootstrap/versions/0.7.1-2008.1/files/bootstrap.zip
$unzip bootstrap.zip -d bootstrap
3. Installing cuda as a prerequisite and reboot:
$wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.0-455.23.05-1_amd64.deb
$sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.0-455.23.05-1_amd64.deb
$sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
$sudo apt-get update
$sudo apt-get -y install cuda
$sudo reboot
4. Enable IP Forwarding after reboot:
$sudo -s
#echo 1 > /proc/sys/net/ipv4/ip_forward
5. Running bootstrap.sh (1st time).
kubelet.service shows a code=exited, status=255 error:
$sudo ./bootstrap/bootstrap.sh
...
...
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Mon 2020-10-19 10:40:54 UTC; 2s ago
Docs: https://kubernetes.io/docs/home/
Process: 2356 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 2356 (code=exited, status=255)
This error means you should run kubeadm init manually. So run kubeadm init --pod-network-cidr=10.244.0.0/16 and then check sudo service kubelet status again to make sure it is running as expected. All the Kubernetes configs will be generated for you during kubeadm init --pod-network-cidr=10.244.0.0/16.
6. We add --pod-network-cidr=10.244.0.0/16 because we will use the Flannel CNI. You can see the same in bootstrap.sh, line 334: if ! sudo kubeadm init --pod-network-cidr="10.244.0.0/16"; then
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.15.12
[preflight] Pulling images required for setting up a Kubernetes cluster
...
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
...
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
...
[apiclient] All control plane components are healthy after 19.501975 seconds
...
Your Kubernetes control-plane has initialized successfully!
...
$ sudo service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2020-10-19 13:42:22 UTC; 4min 15s ago
7. Next is the regular step to be able to run kubectl commands as your user instead of root:
$mkdir -p $HOME/.kube
$sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$sudo chown $(id -u):$(id -g) $HOME/.kube/config
8. Show everything currently installed
$ kubectl get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-5c98db65d4-cpz4s 0/1 Pending 0 4m17s
kube-system pod/coredns-5c98db65d4-kgzg8 0/1 Pending 0 4m17s
kube-system pod/etcd-clara 1/1 Running 0 3m10s
kube-system pod/kube-apiserver-clara 1/1 Running 0 3m35s
kube-system pod/kube-controller-manager-clara 1/1 Running 0 3m17s
kube-system pod/kube-proxy-8qx4z 1/1 Running 0 4m18s
kube-system pod/kube-scheduler-clara 1/1 Running 0 3m23s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4m35s
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 4m34s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 beta.kubernetes.io/os=linux 4m33s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 4m34s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-5c98db65d4 2 2 0 4m18s
Take note: currently the coredns pods are in the Pending state. You can also see that the coredns deployment and replicaset are not ready:
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 4m34s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-5c98db65d4 2 2 0 4m18s
They are waiting until you apply the Flannel configuration YAML.
These are the relevant lines from the same script:
info "Deploy kubernetes pod network."
sudo kubectl apply -f $SCRIPT_DIR/kube-flannel.yml
sudo kubectl apply -f $SCRIPT_DIR/kube-flannel-rbac.yml
If you skip this and rerun the script at this point, you will get a timeout error:
2020-10-19 14:14:03 [INFO]: coredns pods are not running yet ...
9. Deploy Flannel
$ kubectl apply -f bootstrap/kube-flannel.yml
podsecuritypolicy.extensions/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created
$ kubectl apply -f bootstrap/kube-flannel-rbac.yml
clusterrole.rbac.authorization.k8s.io/flannel configured
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
Immediately after that, everything related to coredns starts to work: the pods are created and reach the Running state, and the deployment and replicaset become ready.
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-5c98db65d4-cpz4s 1/1 Running 0 21m
kube-system pod/coredns-5c98db65d4-kgzg8 1/1 Running 0 21m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 2/2 2 2 21m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-5c98db65d4 2 2 2 21m
In addition, you will see the new Flannel pod and daemonsets:
kube-system pod/kube-flannel-ds-amd64-64jbv 1/1 Running 0 3m59s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-flannel-ds-amd64 1 1 1 1 1 beta.kubernetes.io/arch=amd64 3m59s
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 beta.kubernetes.io/arch=arm 3m59s
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 beta.kubernetes.io/arch=arm64 3m59s
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 beta.kubernetes.io/arch=ppc64le 3m59s
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 beta.kubernetes.io/arch=s390x 3m59s
10. Finally it's time to continue running the script. It will TRY!!! to install helm and tiller, and restart dockerd. Everything is fine except tiller...
$sudo ./bootstrap/bootstrap.sh
[INFO]: Clara Deploy SDK System Prerequisites Installation
...
Skipping Kubernetes installation (version: 1.15.4-00) since Kubernetes is already present.
./bootstrap/bootstrap.sh: line 412: helm: command not found
...
[INFO]: Start installing helm ...
...
[INFO]: Restarting dockerd...
The connection to the server *.*.*.*:6443 was refused - did you specify the right host or port?
[INFO]: Waiting for Kubernetes to be ready...
Kubernetes master is running at https://*.*.*.*:6443
KubeDNS is running at https://*.*.*.*:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
...
[INFO]: Updating permissions...
[INFO]: tiller pod is not started yet ...
[INFO]: tiller pod is not started yet ...
[INFO]: tiller pod is not started yet ...
11. We have NO tiller pod. As a result, the deployment and replicaset are broken as well...
kube-system deployment.apps/tiller-deploy 0/1 0 0 7m26s
kube-system replicaset.apps/tiller-deploy-659c6788f5 1 0 0 7m26s
I don't see any other solution here other than to manually delete tiller's related components (deployment, service) and reinstall from scratch, with small workarounds.
#delete the tiller deployment and service
$kubectl delete deployment tiller-deploy -n kube-system
$kubectl delete service tiller-deploy -n kube-system
#install helm,tiller
$curl https://raw.githubusercontent.com/helm/helm/master/scripts/get | bash
$kubectl create serviceaccount --namespace kube-system tiller
$kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
$helm init --service-account tiller
Now if you check what has been deployed, you will clearly see that the tiller pod is in the Pending state, just as the tiller-deploy deployment is not ready:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/tiller-deploy-67847cd9b9-vlzm6 0/1 Pending 0 11m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/tiller-deploy 0/1 1 0 11m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/tiller-deploy-67847cd9b9 1 1 0 11m
12. Fixing tiller
Let's describe the tiller pod and look at its tolerations:
$ kubectl describe pod tiller-deploy-67847cd9b9-vlzm6 -n kube-system
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
I won't explain why (you can read about tolerations on your own), but the fix is to allow the master to run pods...
$kubectl taint nodes --all node-role.kubernetes.io/master-
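If you want to confirm the taint is actually gone before checking the pods, here is a quick check; the node name placeholder is whatever kubectl get nodes reports for your master:
$kubectl describe node <your-master-node> | grep Taints
# Expected output after the taint command above: Taints: <none>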
After that you will see
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/tiller-deploy-67847cd9b9-vlzm6 1/1 Running 0 13m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/tiller-deploy 1/1 1 1 13m
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/tiller-deploy-67847cd9b9 1 1 1 13m
13. Next, installing all components:
$curl -LO https://api.ngc.nvidia.com/v2/resources/nvidia/clara/clara_cli/versions/0.7.1-2008.1/files/cli.zip
$sudo unzip cli.zip -d /usr/bin/ && sudo chmod 755 /usr/bin/clara*
$ clara version
Clara CLI version: 0.7.1-12788.ae65aea0
$ clara config --key KEY --orgteam nvidia/clara -y
Configuration "ngc-clara"successfully created
$ clara pull platform
Clara Platform 0.7.1-2008.1
Chart saved at: /home/YOUR_USER/.clara/charts/clara
$ clara platform start
Starting clara...
NAME: clara
$ clara pull dicom
Clara Dicom Adapter 0.7.1-2008.1
Chart saved at: /home/YOUR_USER/.clara/charts/dicom-adapter
$ clara pull render
Clara Renderer 0.7.1-2008.1
Chart saved at: /home/YOUR_USER/.clara/charts/clara-renderer
$ clara pull monitor
Clara Monitor Server 0.7.1-2008.1
Chart saved at: /home/YOUR_USER/.clara/charts/clara-monitor-server
$ clara pull console
Clara Management Console 0.7.1-2008.1
Chart saved at: /home/YOUR_USER/.clara/charts/clara-console
$ clara dicom start
Starting DICOM Adapter...
NAME: clara-dicom-adapter
$ clara render start
NAME: clara-render-server
$ clara monitor start
NAME: clara-monitor-server
$ clara console start
NAME: clara-console
14. To verify that the installation is successful, run the following commands:
$ helm ls
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
clara 1 Mon Oct 19 16:16:36 2020 DEPLOYED clara-0.7.1-2008.1 1.0 default
clara-console 1 Mon Oct 19 16:28:30 2020 DEPLOYED clara-console-0.7.1-2008.1 1.0 default
clara-dicom-adapter 1 Mon Oct 19 16:22:36 2020 DEPLOYED dicom-adapter-0.7.1-2008.1 1.0 default
clara-monitor-server 1 Mon Oct 19 16:26:35 2020 DEPLOYED clara-monitor-server-0.7.1-2008.1 1.0 default
clara-render-server 1 Mon Oct 19 16:22:54 2020 DEPLOYED clara-renderer-0.7.1-2008.1 1.0 default
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
clara-clara-platformapiserver-54c5c44bbd-gqdd6 1/1 Running 0 13m
clara-console-8565b4d565-wcbg5 2/2 Running 0 2m2s
clara-console-mongodb-85f8bd5f95-ts2gp 1/1 Running 0 2m2s
clara-dicom-adapter-7948fcd445-mnsjd 1/1 Running 0 7m56s
clara-monitor-server-fluentd-elasticsearch-6zvhq 1/1 Running 0 3m57s
clara-monitor-server-grafana-5f874b974d-6l4s8 1/1 Running 0 3m57s
clara-monitor-server-monitor-server-59c8bf68f7-5dgxq 1/1 Running 0 3m57s
clara-render-server-clara-renderer-d79dd4779-wcjrv 3/3 Running 0 7m38s
clara-resultsservice-664477898f-9nk4f 1/1 Running 0 13m
clara-ui-6f89b97df8-792f6 1/1 Running 0 13m
clara-workflow-controller-69cbb55fc8-zjhdm 1/1 Running 0 13m
elasticsearch-master-0 1/1 Running 0 3m57s
elasticsearch-master-1 1/1 Running 0 3m57s
fluentd-km8nj 1/1 Running 0 13m
P.S. Sure, it would have been much easier to just fix the script for you, but I decided to show you what's going on in the background. I'm sure you can do that on your own, if needed.

Kubernetes kubelet-certificate-authority on premise with kubespray causes certificate validation error for master node

I'm setting up a k8s cluster on premise using kubespray.
I'm trying to harden the kubernetes cluster using CIS Benchmark documentation.
For the --kubelet-certificate-authority argument, I set up the TLS connection between the apiserver and the kubelets. Then I edited the API server pod specification file /etc/kubernetes/manifests/kube-apiserver.yaml on the master node and set the --kubelet-certificate-authority parameter like this: --kubelet-certificate-authority=/etc/kubernetes/ssl/apiserver.crt
But with that I'm no longer able to deploy pods (using helm), and I get the well-known error:
[centos@infra-vm ~]$ helm list
Error: forwarding ports: error upgrading connection: error dialing backend: x509: cannot validate certificate for 192.168.33.143 because it doesn't contain any IP SANs
Where 192.168.33.143 is the master node IP address.
I've checked the above certificate authority and it has IP SANs, so I really can't figure out where the issue comes from.
[centos@infra-vm ~]$ kubectl get pod --namespace kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7555c9885c-tjz78 1/1 Running 0 3d21h
calico-node-2p4p4 1/1 Running 0 3d21h
calico-node-4rhzj 1/1 Running 0 3d21h
coredns-56bc6b976d-wrxsl 1/1 Running 0 3d21h
coredns-56bc6b976d-zlvxb 1/1 Running 0 3d21h
dns-autoscaler-5fc5fdbf6-sl6gg 1/1 Running 0 3d21h
kube-apiserver-cpu-node0 1/1 Running 0 3d21h
kube-controller-manager-cpu-node0 1/1 Running 0 3d21h
nvidia-device-plugin-daemonset-1.12-zj82x 1/1 Running 0 3d20h
tiller-deploy-677fbf76bb-hcgtw 1/1 Running 0 3d21h
[centos@infra-vm ~]$ kubectl logs tiller-deploy-677fbf76bb-hcgtw --namespace kube-system
Error from server: Get https://192.168.33.143:10250/containerLogs/kube-system/tiller-deploy-677fbf76bb-hcgtw/tiller: x509: cannot validate certificate for 192.168.33.143 because it doesn't contain any IP SANs
[centos@infra-vm ~]$
Could someone help me figure out what is going on?
First of all /etc/kubernetes/ssl/apiserver.crt is not a valid CA certificate.
CA would have:
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment, Certificate Sign
Notice the Certificate Sign extension, which allows it to sign certificates.
You are seeing this error: cannot validate certificate for 192.168.33.143 because it doesn't contain any IP SANs because the kubelet is using self-signed certificates to serve HTTPS traffic on port 10250, and you are using an invalid certificate to validate them.
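You can see this for yourself by dumping the certificate the kubelet actually serves on port 10250. This is just a quick check, using the node IP from the error message; openssl prints the serving certificate even though the handshake does not complete:
echo | openssl s_client -connect 192.168.33.143:10250 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"
# No output here means the serving certificate has no SAN section at all,
# which is exactly what the apiserver is complaining about.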
So what should you do to make it work?
Use /etc/kubernetes/ssl/ca.crt to sign a new certificate for the kubelet with valid IP SANs.
Set --kubelet-certificate-authority=/etc/kubernetes/ssl/ca.crt (a valid CA).
In /var/lib/kubelet/config.yaml (the kubelet config file), set tlsCertFile and tlsPrivateKeyFile to point to the newly created kubelet crt and key files.
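A minimal sketch of the above with openssl, assuming the kubespray default CA paths (/etc/kubernetes/ssl/ca.crt and ca.key) and using node01 / 192.168.33.143 as an example node name and IP; adjust the names, IPs and destination paths to your setup:
# 1. Key + CSR for the kubelet serving certificate.
openssl genrsa -out kubelet-node01.key 2048
openssl req -new -key kubelet-node01.key -subj "/CN=node01" -out kubelet-node01.csr
# 2. Sign it with the cluster CA, adding DNS and IP SANs so the apiserver can validate it.
openssl x509 -req -in kubelet-node01.csr \
  -CA /etc/kubernetes/ssl/ca.crt -CAkey /etc/kubernetes/ssl/ca.key -CAcreateserial \
  -days 365 -extfile <(printf "subjectAltName=DNS:node01,IP:192.168.33.143") \
  -out kubelet-node01.crt
# 3. Copy the pair to the node, reference it from /var/lib/kubelet/config.yaml
#    (tlsCertFile / tlsPrivateKeyFile), then restart the kubelet:
#    systemctl restart kubelet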

New kubernetes install has remnants of old cluster

I did a complete tear down of a v1.13.1 cluster and am now running v1.15.0 with calico cni v3.8.0. All pods are running:
[gms@thalia0 ~]$ kubectl get po --namespace=kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-59f54d6bbc-2mjxt 1/1 Running 0 7m23s
calico-node-57lwg 1/1 Running 0 7m23s
coredns-5c98db65d4-qjzpq 1/1 Running 0 8m46s
coredns-5c98db65d4-xx2sh 1/1 Running 0 8m46s
etcd-thalia0.ahc.umn.edu 1/1 Running 0 8m5s
kube-apiserver-thalia0.ahc.umn.edu 1/1 Running 0 7m46s
kube-controller-manager-thalia0.ahc.umn.edu 1/1 Running 0 8m2s
kube-proxy-lg4cn 1/1 Running 0 8m46s
kube-scheduler-thalia0.ahc.umn.edu 1/1 Running 0 7m40s
But, when I look at the endpoint, I get the following:
[gms@thalia0 ~]$ kubectl get ep --namespace=kube-system
NAME ENDPOINTS AGE
kube-controller-manager <none> 9m46s
kube-dns 192.168.16.194:53,192.168.16.195:53,192.168.16.194:53 + 3 more... 9m30s
kube-scheduler <none> 9m46s
If I look at the log for the apiserver, I get a ton of TLS handshake errors, along the lines of:
I0718 19:35:17.148852 1 log.go:172] http: TLS handshake error from 10.x.x.160:45042: remote error: tls: bad certificate
I0718 19:35:17.158375 1 log.go:172] http: TLS handshake error from 10.x.x.159:53506: remote error: tls: bad certificate
These IP addresses were from nodes in a previous cluster. I had deleted them and done a kubeadm reset on all nodes, including master, so I have no idea why these are showing up. I would assume this is why the endpoints for the controller-manager and the scheduler are showing up as <none>.
In order to completely wipe your cluster, you should do the following:
1) Reset cluster
$sudo kubeadm reset (or use the command appropriate to your cluster)
2) Wipe your local directory with configs
$rm -rf .kube/
3) Remove /etc/kubernetes/
$sudo rm -rf /etc/kubernetes/
4) And, as one of the main points, get rid of your previous etcd state:
$sudo rm -rf /var/lib/etcd/
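After the wipe and a fresh kubeadm init, a quick sanity check (a sketch, reusing the apiserver pod name from your output) is to confirm that only the freshly joined nodes are registered and that the old addresses stop showing up in the apiserver log:
# Only the re-joined nodes should be listed.
$kubectl get nodes -o wide
# The TLS handshake errors from the old 10.x.x.x nodes should no longer appear.
$kubectl logs -n kube-system kube-apiserver-thalia0.ahc.umn.edu --tail=50 | grep "TLS handshake" || true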

kubernetes worker node in "NotReady" status

I am trying to setup my first cluster using Kubernetes 1.13.1. The master got initialized okay, but both of my worker nodes are NotReady. kubectl describe node shows that Kubelet stopped posting node status on both worker nodes. On one of the worker nodes I get log output like
> kubelet[3680]: E0107 20:37:21.196128 3680 kubelet.go:2266] node
> "xyz" not found.
Here are the full details:
I am using Centos 7 & Kubernetes 1.13.1.
Initializing was done as follows:
[root@master ~]# kubeadm init --apiserver-advertise-address=10.142.0.4 --pod-network-cidr=10.142.0.0/24
Successfully initialized the cluster:
You can now join any number of machines by running the following on each node
as root:
`kubeadm join 10.142.0.4:6443 --token y0epoc.zan7yp35sow5rorw --discovery-token-ca-cert-hash sha256:f02d43311c2696e1a73e157bda583247b9faac4ffb368f737ee9345412c9dea4`
Deployed the flannel CNI:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The join command worked fine.
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node01" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
Result of kubectl get nodes:
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 9h v1.13.1
node01 NotReady <none> 9h v1.13.1
node02 NotReady <none> 9h v1.13.1
On both nodes:
[root@node01 ~]# service kubelet status
Redirecting to /bin/systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2019-01-08 04:49:20 UTC; 32s ago
Docs: https://kubernetes.io/docs/
Main PID: 4224 (kubelet)
Memory: 31.3M
CGroup: /system.slice/kubelet.service
└─4224 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi
`Jan 08 04:54:10 node01 kubelet[4224]: E0108 04:54:10.957115 4224 kubelet.go:2266] node "node01" not found`
I would appreciate your advice on how to troubleshoot this.
The previous answer sounds correct. You can verify that by running
kubectl describe node node01 on the master, or wherever kubectl is correctly configured.
It seems like the reason for this error is an incorrect subnet. The Flannel documentation says that you should use /16, not /24, for the pod network.
NOTE: If kubeadm is used, then pass --pod-network-cidr=10.244.0.0/16
to kubeadm init to ensure that the podCIDR is set.
I tried to run kubeadm with /24 and, although the nodes were in the Ready state, the flannel pods did not run properly, which resulted in some issues.
You can check whether your flannel pods are running properly with:
kubectl get pods -n kube-system
If the status is anything other than Running, something is wrong. In that case you can check the details by running kubectl describe pod PODNAME -n kube-system. Try changing the subnet and update us on whether that fixed the problem.
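A sketch of redoing the init with the subnet Flannel expects; the advertise address is the one from your original init command, and note that kubeadm reset wipes the current cluster, so the workers have to be re-joined with the new token afterwards:
# On the master:
sudo kubeadm reset
sudo kubeadm init --apiserver-advertise-address=10.142.0.4 --pod-network-cidr=10.244.0.0/16
# Copy the new /etc/kubernetes/admin.conf to ~/.kube/config, then re-deploy the CNI:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml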
I ran into almost the same problem, and in the end I found that the reason was that the firewall had not been turned off. You can try the following commands:
sudo ufw disable
or
systemctl stop firewalld && systemctl disable firewalld
or (to put SELinux into permissive mode)
setenforce 0

minikube start error exit status 1

Here is my error when I run minikube start on Aliyun.
What I did:
minikube delete
kubectl config use-context minikube
minikube start --vm-driver=none
Aliyun (the third-party application server) could not install VirtualBox or KVM, so I tried to start it with --vm-driver=none.
[root@iZj6c68brirvucbzz5yyunZ home]# minikube delete
Deleting local Kubernetes cluster...
Machine deleted.
[root@iZj6c68brirvucbzz5yyunZ home]# kubectl config use-context minikube
Switched to context "minikube".
[root@iZj6c68brirvucbzz5yyunZ home]# minikube start --vm-driver=none
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
E0618 16:06:56.885163 500 start.go:294] Error starting cluster: kubeadm init error sudo /usr/bin/kubeadm init --config /var/lib/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests --ignore-preflight-errors=DirAvailable--data-minikube --ignore-preflight-errors=Port-10250 --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml --ignore-preflight-errors=Swap --ignore-preflight-errors=CRI running command: : running command: sudo /usr/bin/kubeadm init --config /var/lib/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests --ignore-preflight-errors=DirAvailable--data-minikube --ignore-preflight-errors=Port-10250 --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml --ignore-preflight-errors=Swap --ignore-preflight-errors=CRI
output: [init] Using Kubernetes version: v1.10.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING Hostname]: hostname "minikube" could not be reached
[WARNING Hostname]: hostname "minikube" lookup minikube on 100.100.2.138:53: no such host
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
Flag --admission-control has been deprecated, Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.
[certificates] Using the existing ca certificate and key.
[certificates] Using the existing apiserver certificate and key.
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated etcd/ca certificate and key.
[certificates] Generated etcd/server certificate and key.
[certificates] etcd/server serving cert is signed for DNS names [localhost] and IPs [127.0.0.1]
[certificates] Generated etcd/peer certificate and key.
[certificates] etcd/peer serving cert is signed for DNS names [minikube] and IPs [172.31.4.34]
[certificates] Generated etcd/healthcheck-client certificate and key.
[certificates] Generated apiserver-etcd-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/var/lib/localkube/certs/"
a kubeconfig file "/etc/kubernetes/admin.conf" exists already but has got the wrong CA cert
: running command: sudo /usr/bin/kubeadm init --config /var/lib/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests --ignore-preflight-errors=DirAvailable--data-minikube --ignore-preflight-errors=Port-10250 --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml --ignore-preflight-errors=Swap --ignore-preflight-errors=CRI
.: exit status 1
Versions of components:
[root@iZj6c68brirvucbzz5yyunZ home]# minikube version
minikube version: v0.28.0
[root@iZj6c68brirvucbzz5yyunZ home]# uname -a
Linux iZj6c68brirvucbzz5yyunZ 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@iZj6c68brirvucbzz5yyunZ home]# kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:13:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Why does minikube exit with status 1?
Thanks in advance.
First of all, try to clean up all traces of the previous unsuccessful minikube start. That should help with the mismatched-certificate issue.
rm -rf ~/.minikube ~/.kube /etc/kubernetes
Then try to start minikube again.
minikube start --vm-driver=none
If you are still running into errors, try to follow my "happy path":
(This was tested on a fresh GCP instance with Ubuntu 16 on board.)
# become root
sudo su
# turn off swap
swapoff -a
# edit /etc/fstab and comment swap partition.
# add repository key
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# add repository
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
# update repository cache
apt-get update
# install some software
apt-get -y install ebtables ethtool docker.io apt-transport-https kubelet kubeadm kubectl
# tune sysctl
cat <<EOF >>/etc/ufw/sysctl.conf
net/bridge/bridge-nf-call-ip6tables = 1
net/bridge/bridge-nf-call-iptables = 1
net/bridge/bridge-nf-call-arptables = 1
EOF
sudo sysctl --system
# download minikube
wget https://github.com/kubernetes/minikube/releases/download/v0.28.0/minikube-linux-amd64
# install minikube
chmod +x minikube-linux-amd64
mv minikube-linux-amd64 /usr/bin/minikube
# start minikube
minikube start --vm-driver=none
---This is what you should see----------
Starting local Kubernetes v1.10.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Downloading kubeadm v1.10.0
Downloading kubelet v1.10.0
Finished Downloading kubeadm v1.10.0
Finished Downloading kubelet v1.10.0
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
===================
WARNING: IT IS RECOMMENDED NOT TO RUN THE NONE DRIVER ON PERSONAL WORKSTATIONS
The 'none' driver will run an insecure kubernetes apiserver as root that may leave the host vulnerable to CSRF attacks
When using the none driver, the kubectl config and credentials generated will be root owned and will appear in the root home directory.
You will need to move the files to the appropriate location and then set the correct permissions. An example of this is below:
sudo mv /root/.kube $HOME/.kube # this will write over any previous configuration
sudo chown -R $USER $HOME/.kube
sudo chgrp -R $USER $HOME/.kube
sudo mv /root/.minikube $HOME/.minikube # this will write over any previous configuration
sudo chown -R $USER $HOME/.minikube
sudo chgrp -R $USER $HOME/.minikube
This can also be done automatically by setting the env var CHANGE_MINIKUBE_NONE_USER=true
Loading cached images from config file.
-------------------
#check the results
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
minikube Ready master 18s v1.10.0
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-minikube 1/1 Running 0 9m
kube-system kube-addon-manager-minikube 1/1 Running 0 9m
kube-system kube-apiserver-minikube 1/1 Running 0 9m
kube-system kube-controller-manager-minikube 1/1 Running 0 10m
kube-system kube-dns-86f4d74b45-p99gv 3/3 Running 0 10m
kube-system kube-proxy-hlfc8 1/1 Running 0 10m
kube-system kube-scheduler-minikube 1/1 Running 0 9m
kube-system kubernetes-dashboard-5498ccf677-scdf9 1/1 Running 0 10m
kube-system storage-provisioner 1/1 Running 0 10m