Kubernetes HA setup with kubeadm - scheduler and controller fail to start

I am attempting to build an HA cluster using kubeadm. Here is my configuration:
kind: MasterConfiguration
kubernetesVersion: v1.11.4
apiServerCertSANs:
- "aaa.xxx.yyy.zzz"
api:
  controlPlaneEndpoint: "my.domain.de:6443"
apiServerExtraArgs:
  apiserver-count: 3
etcd:
  local:
    image: quay.io/coreos/etcd:v3.3.10
    extraArgs:
      listen-client-urls: "https://127.0.0.1:2379,https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2379"
      advertise-client-urls: "https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2379"
      listen-peer-urls: "https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2380"
      initial-advertise-peer-urls: "https://$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4):2380"
      initial-cluster-state: "new"
      initial-cluster-token: "kubernetes-cluster"
      initial-cluster: ${CLUSTER}
      name: $(hostname -s)
  localEtcd:
    serverCertSANs:
    - "$(hostname -s)"
    - "$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)"
    peerCertSANs:
    - "$(hostname -s)"
    - "$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)"
networking:
  podSubnet: "${POD_SUBNET}/${POD_SUBNETMASK}"
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: foobar.fedcba9876543210
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
I run this on all three nodes and the nodes come up. After installing Calico, everything seems fine; I even added one worker successfully:
ubuntu@master-2-test2:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-1-test2 Ready master 1h v1.11.4
master-2-test2 Ready master 1h v1.11.4
master-3-test2 Ready master 1h v1.11.4
node-1-test2 Ready <none> 1h v1.11.4
Looking at the control plane, everything looks fine.
curl https://192.168.0.125:6443/api/v1/nodes works from both the masters and the worker node. All pods are running:
ubuntu@master-2-test2:~$ sudo kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-9lnk8 2/2 Running 0 1h
calico-node-f7dkk 2/2 Running 1 1h
calico-node-k7hw5 2/2 Running 17 1h
calico-node-rtrvb 2/2 Running 3 1h
coredns-78fcdf6894-6xgqc 1/1 Running 0 1h
coredns-78fcdf6894-kcm4f 1/1 Running 0 1h
etcd-master-1-test2 1/1 Running 0 1h
etcd-master-2-test2 1/1 Running 1 1h
etcd-master-3-test2 1/1 Running 0 1h
kube-apiserver-master-1-test2 1/1 Running 0 40m
kube-apiserver-master-2-test2 1/1 Running 0 58m
kube-apiserver-master-3-test2 1/1 Running 0 36m
kube-controller-manager-master-1-test2 1/1 Running 0 17m
kube-controller-manager-master-2-test2 1/1 Running 1 17m
kube-controller-manager-master-3-test2 1/1 Running 0 17m
kube-proxy-5clt4 1/1 Running 0 1h
kube-proxy-d2tpz 1/1 Running 0 1h
kube-proxy-q6kjw 1/1 Running 0 1h
kube-proxy-vn6l7 1/1 Running 0 1h
kube-scheduler-master-1-test2 1/1 Running 1 24m
kube-scheduler-master-2-test2 1/1 Running 0 24m
kube-scheduler-master-3-test2 1/1 Running 0 24m
But when I try to start a pod, nothing happens:
~$ kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx 1 0 0 0 32m
I then looked into the scheduler and controller manager logs, and to my dismay there are a lot of errors. The controller manager floods with:
E1108 00:40:36.638832 1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
E1108 00:40:36.639161 1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
and sometimes with:
garbagecollector.go:649] failed to discover preferred resources: Unauthorized
E1108 00:40:36.639356 1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
E1108 00:40:36.640568 1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
E1108 00:40:36.642129 1 reflector.go:205] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: Unauthorized
And the scheduler has similar errors:
E1107 23:25:43.026465 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta1.ReplicaSet: Get https://mydomain.de:6443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0: EOF
E1107 23:25:43.026614 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Node: Get https://mydomain.de:6443/api/v1/nodes?limit=500&resourceVersion=0: EOF
So far, I have no clue how to correct these errors. Any help would be appreciated.
more information:
The kubeconfig for kube-proxy is:
----
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://my.domain.de:6443
  name: default
contexts:
- context:
    cluster: default
    namespace: default
    user: default
  name: default
current-context: default
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
Events: <none>

Looks good to me. However, can you describe the worker node and check the resources available for pods? Also describe the pod and see what error it shows.

There's some authentication issue (certificate) while talking to the active kube-apiserver at this endpoint: https://mydomain.de:6443/apis/extensions/v1beta1/replicasets?limit=500&resourceVersion=0.
Some pointers:
Is the load balancer for your kube-apiserver pointing to the right one? Are you using an L4 (TCP) load balancer and not an L7 (HTTP) load balancer?
Did you copy the same certs everywhere and make sure that they are the same? (See the checksum sketch after the copy loop below.)
USER=ubuntu # customizable
CONTROL_PLANE_IPS="10.0.0.7 10.0.0.8"
for host in ${CONTROL_PLANE_IPS}; do
  scp /etc/kubernetes/pki/ca.crt "${USER}"@$host:
  scp /etc/kubernetes/pki/ca.key "${USER}"@$host:
  scp /etc/kubernetes/pki/sa.key "${USER}"@$host:
  scp /etc/kubernetes/pki/sa.pub "${USER}"@$host:
  scp /etc/kubernetes/pki/front-proxy-ca.crt "${USER}"@$host:
  scp /etc/kubernetes/pki/front-proxy-ca.key "${USER}"@$host:
  scp /etc/kubernetes/pki/etcd/ca.crt "${USER}"@$host:etcd-ca.crt
  scp /etc/kubernetes/pki/etcd/ca.key "${USER}"@$host:etcd-ca.key
  scp /etc/kubernetes/admin.conf "${USER}"@$host:
done
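To confirm that the copied certs really are identical, you can compare checksums across the control-plane nodes. This is only a minimal sketch reusing the USER and CONTROL_PLANE_IPS variables from the loop above; it assumes the files were left in the remote home directory, which is where the scp commands put them:
# The hash column printed for each host should match the local one.
for host in ${CONTROL_PLANE_IPS}; do
  echo "== $host =="
  ssh "${USER}"@$host "sha256sum ca.crt sa.pub front-proxy-ca.crt etcd-ca.crt"
done
echo "== local =="
sha256sum /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/sa.pub /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/etcd/ca.crt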
Did you check that the kube-apiserver and the kube-controller-manager configurations are equivalent under /etc/kubernetes/manifests?
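A rough way to do that check is to diff the static pod manifests on the first master against the other masters. The hostnames below are taken from the question and only node-specific fields (for example --advertise-address) should differ; it also assumes passwordless sudo on the remote nodes:
for host in master-2-test2 master-3-test2; do
  for manifest in kube-apiserver kube-controller-manager kube-scheduler; do
    echo "== $host / $manifest =="
    ssh ubuntu@$host "sudo cat /etc/kubernetes/manifests/$manifest.yaml" \
      | sudo diff /etc/kubernetes/manifests/$manifest.yaml - || true
  done
done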

Related

New kubernetes install has remnants of old cluster

I did a complete teardown of a v1.13.1 cluster and am now running v1.15.0 with Calico CNI v3.8.0. All pods are running:
[gms@thalia0 ~]$ kubectl get po --namespace=kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-59f54d6bbc-2mjxt 1/1 Running 0 7m23s
calico-node-57lwg 1/1 Running 0 7m23s
coredns-5c98db65d4-qjzpq 1/1 Running 0 8m46s
coredns-5c98db65d4-xx2sh 1/1 Running 0 8m46s
etcd-thalia0.ahc.umn.edu 1/1 Running 0 8m5s
kube-apiserver-thalia0.ahc.umn.edu 1/1 Running 0 7m46s
kube-controller-manager-thalia0.ahc.umn.edu 1/1 Running 0 8m2s
kube-proxy-lg4cn 1/1 Running 0 8m46s
kube-scheduler-thalia0.ahc.umn.edu 1/1 Running 0 7m40s
But when I look at the endpoints, I get the following:
[gms@thalia0 ~]$ kubectl get ep --namespace=kube-system
NAME ENDPOINTS AGE
kube-controller-manager <none> 9m46s
kube-dns 192.168.16.194:53,192.168.16.195:53,192.168.16.194:53 + 3 more... 9m30s
kube-scheduler <none> 9m46s
If I look at the log for the apiserver, I get a ton of TLS handshake errors, along the lines of:
I0718 19:35:17.148852 1 log.go:172] http: TLS handshake error from 10.x.x.160:45042: remote error: tls: bad certificate
I0718 19:35:17.158375 1 log.go:172] http: TLS handshake error from 10.x.x.159:53506: remote error: tls: bad certificate
These IP addresses were from nodes in a previous cluster. I had deleted them and done a kubeadm reset on all nodes, including master, so I have no idea why these are showing up. I would assume this is why the endpoints for the controller-manager and the scheduler are showing up as <none>.
In order to completely wipe your cluster, you should do the following:
1) Reset the cluster:
$ sudo kubeadm reset (or use the command appropriate to your cluster)
2) Wipe your local directory with configs:
$ rm -rf .kube/
3) Remove /etc/kubernetes/:
$ sudo rm -rf /etc/kubernetes/
4) And, most importantly, get rid of your previous etcd state:
$ sudo rm -rf /var/lib/etcd/
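If you have several nodes, it is convenient to collect these steps in a small script and run it on every node; this sketch is just the same commands as above in one place (run as a user with sudo rights):
#!/usr/bin/env bash
# Full teardown of a kubeadm node: reset, then remove leftover config and etcd state.
set -x
sudo kubeadm reset -f            # -f skips the confirmation prompt
rm -rf "$HOME/.kube"             # local kubectl config
sudo rm -rf /etc/kubernetes      # static pod manifests, certs, kubeconfigs
sudo rm -rf /var/lib/etcd        # previous etcd state (control-plane nodes)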

Kubernetes dashboard: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout

I have a Kubernetes cluster in Vagrant (1.14.0) and installed Calico.
I have installed the Kubernetes dashboard. When I use kubectl proxy to visit the dashboard, I get:
Error: 'dial tcp 192.168.1.4:8443: connect: connection refused'
Trying to reach: 'https://192.168.1.4:8443/'
Here are my pods (dashboard is restarting frequently):
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-etcd-cj928 1/1 Running 0 11m
calico-node-4fnb6 1/1 Running 0 18m
calico-node-qjv7t 1/1 Running 0 20m
calico-policy-controller-b9b6749c6-29c44 1/1 Running 1 11m
coredns-fb8b8dccf-jjbhk 1/1 Running 0 20m
coredns-fb8b8dccf-jrc2l 1/1 Running 0 20m
etcd-k8s-master 1/1 Running 0 19m
kube-apiserver-k8s-master 1/1 Running 0 19m
kube-controller-manager-k8s-master 1/1 Running 0 19m
kube-proxy-8mrrr 1/1 Running 0 18m
kube-proxy-cdsr9 1/1 Running 0 20m
kube-scheduler-k8s-master 1/1 Running 0 19m
kubernetes-dashboard-5f7b999d65-nnztw 1/1 Running 3 2m11s
Logs of the dashboard pod:
2019/03/30 14:36:21 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
I can telnet from both master and nodes to 10.96.0.1:443.
What is misconfigured? The rest of the cluster seems to work fine, although I see these logs in kubelet:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml"
kubelet seems to run fine on the master.
The cluster was created with this command:
kubeadm init --apiserver-advertise-address="192.168.50.10" --apiserver-cert-extra-sans="192.168.50.10" --node-name k8s-master --pod-network-cidr=192.168.0.0/16
You should define your hostname in /etc/hosts:
# hostname
YOUR_HOSTNAME
# nano /etc/hosts
YOUR_IP YOUR_HOSTNAME
If you set the hostname on your master but it still does not work, try:
# systemctl stop kubelet
# systemctl stop docker
# iptables --flush
# iptables -t nat --flush
# systemctl start kubelet
# systemctl start docker
Also, install the dashboard before joining the worker nodes, disable your firewall, and check your free RAM (see the sketch below).
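For the firewall and memory checks, something along these lines works on most systemd-based distributions; firewalld is an assumption here, so substitute ufw or whatever your distribution uses:
# Stop and disable the host firewall (firewalld assumed).
sudo systemctl stop firewalld
sudo systemctl disable firewalld
# Check free memory; kubeadm expects roughly 2 GB or more per node.
free -m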
Exclude the --node-name parameter from the kubeadm init command.
Try this command:
kubeadm init --apiserver-advertise-address=$(hostname -i) --apiserver-cert-extra-sans="192.168.50.10" --pod-network-cidr=192.168.0.0/16
For me, the issue was that I needed to create a NetworkPolicy that allowed egress traffic to the Kubernetes API; a minimal example is sketched below.
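A minimal sketch of such a policy, assuming the dashboard runs in kube-system (as in the pod list above) with the standard k8s-app=kubernetes-dashboard label, and that the API service IP is 10.96.0.1:443 as in the error message; adjust namespace, label, CIDR, and port for your cluster:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dashboard-egress-to-apiserver
  namespace: kube-system               # namespace the dashboard runs in here
spec:
  podSelector:
    matchLabels:
      k8s-app: kubernetes-dashboard    # standard dashboard pod label
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.96.0.1/32             # kubernetes service ClusterIP from the error
    ports:
    - protocol: TCP
      port: 443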

Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I am having trouble accessing the Kubernetes dashboard. I already created another question about it that you can see here, but while digging into my cluster, I think that the problem might be somewhere else, which is why I am creating a new question.
I start my master by running the following commands:
> kubeadm reset
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later
> kubectl apply -f https://git.io/weave-kube/
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 2m46s
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 2m46s
etcd-kubemaster 1/1 Running 0 93s
kube-apiserver-kubemaster 1/1 Running 0 93s
kube-controller-manager-kubemaster 1/1 Running 0 113s
kube-proxy-lxhvs 1/1 Running 0 2m46s
kube-scheduler-kubemaster 1/1 Running 0 93s
Here we can see that I have two coredns pods stuck in Pending state forever, and when I run the command:
> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq
I can see the following Warning in the Events part:
FailedScheduling: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Since it is a Warning and not an Error, and since, as a Kubernetes newbie, taints do not mean much to me, I tried to connect a node to the master (using the previously saved command):
> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
--discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]
> ssh [USER]@[WORKER_IP] 'bash' < join.sh
This node has joined the cluster.
On the master, I check that the node is connected:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 13m v1.14.1
kubeslave1 NotReady <none> 31s v1.14.1
And I check my pods:
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 14m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 14m
etcd-kubemaster 1/1 Running 0 13m
kube-apiserver-kubemaster 1/1 Running 0 13m
kube-controller-manager-kubemaster 1/1 Running 0 13m
kube-proxy-lxhvs 1/1 Running 0 14m
kube-proxy-xllx4 0/1 ContainerCreating 0 2m16s
kube-scheduler-kubemaster 1/1 Running 0 13m
We can see that another kube-proxy pod has been created and is stuck in ContainerCreating status.
And when I do a describe again:
kubectl -n kube-system describe pod kube-proxy-xllx4
I can see multiple identical Warnings in the Events part:
Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:43133->[::1]:53: read: connection refused
Here are my repositories:
docker image ls
REPOSITORY TAG
k8s.gcr.io/kube-proxy v1.14.1
k8s.gcr.io/kube-apiserver v1.14.1
k8s.gcr.io/kube-controller-manager v1.14.1
k8s.gcr.io/kube-scheduler v1.14.1
k8s.gcr.io/coredns 1.3.1
k8s.gcr.io/etcd 3.3.10
k8s.gcr.io/pause 3.1
And so, for the dashboard part, I tried to start it with the command:
> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
But the dashboard pod is stuck in Pending state.
kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 40m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 40m
etcd-kubemaster 1/1 Running 0 38m
kube-apiserver-kubemaster 1/1 Running 0 38m
kube-controller-manager-kubemaster 1/1 Running 0 39m
kube-proxy-lxhvs 1/1 Running 0 40m
kube-proxy-xllx4 0/1 ContainerCreating 0 27m
kube-scheduler-kubemaster 1/1 Running 0 38m
kubernetes-dashboard-5f7b999d65-qn8qn 1/1 Pending 0 8s
So, even though my problem originally was that I cannot access my dashboard, I guess that the real problem is deeper than that.
I know that I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.
I experienced the same issue with coredns pods stuck in Pending when setting up my own cluster; I resolved it by adding a pod network.
It looks like, because there is no network add-on installed, the nodes are tainted as not-ready. Installing the add-on removes the taints and the pods are able to schedule. In my case, adding flannel fixed the issue.
EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:
The network must be deployed before any applications. Also, CoreDNS will not start up before a network is installed. kubeadm only supports Container Network Interface (CNI) based networks (and does not support kubenet).
Actually, it is the opposite of a deep or serious issue; this is a trivial one. Whenever you see a pod stuck in Pending state, it means the scheduler is having a hard time scheduling the pod, mostly because there are not enough resources on the node.
In your case, the node has a taint and your pod doesn't have the matching toleration. What you have to do is describe the node and get its taint:
kubectl describe node | grep -i taints
Note: you might have more than one taint, so you might want to do kubectl describe no NODE, since with grep you will only see one taint.
Once you get the taint, which will be something like hello=world:NoSchedule (that is, key=value:effect), you will have to add a tolerations section to your Deployment. This is an example Deployment so you can see how it should look:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute #NoSchedule, PreferNoSchedule
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600
As you can see, there is a tolerations section in the YAML. So, if I had a node with the node=not-ready:NoExecute taint, no pod would be able to be scheduled on that node unless it had this toleration.
You can also remove the taint if you don't need it. To remove a taint, describe the node, get the key of the taint, and do:
kubectl taint node NODE key-
Hope it makes sense. Just add this section to your deployment, and it will work.
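For the cluster in this question, the taint is most likely the node.kubernetes.io/not-ready taint that remains on nodes until a pod network is installed, so installing a CNI is the real fix; purely for illustration, removing it by hand would look roughly like this (node name taken from the question, adjust to yours):
# Show the taints on the master, then remove the not-ready taint (key plus a trailing dash).
kubectl describe node kubemaster | grep -i taints
kubectl taint node kubemaster node.kubernetes.io/not-ready-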
Set up the flannel network tool by running the following commands:
$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml

no endpoints available for service "kubernetes-dashboard"

I'm trying to follow GitHub - kubernetes/dashboard: General-purpose web UI for Kubernetes clusters.
deploy/access:
# export KUBECONFIG=/etc/kubernetes/admin.conf
# kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret/kubernetes-dashboard-certs created
serviceaccount/kubernetes-dashboard created
role.rbac.authorization.k8s.io/kubernetes-dashboard-minimal created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard-minimal created
deployment.apps/kubernetes-dashboard created
service/kubernetes-dashboard created
# kubectl proxy
Starting to serve on 127.0.0.1:8001
curl:
# curl http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "no endpoints available for service \"kubernetes-dashboard\"",
"reason": "ServiceUnavailable",
"code": 503
}#
Please advise.
Per @VKR:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-576cbf47c7-56vg7 0/1 ContainerCreating 0 57m
kube-system coredns-576cbf47c7-sn2fk 0/1 ContainerCreating 0 57m
kube-system etcd-wcmisdlin02.uftwf.local 1/1 Running 0 56m
kube-system kube-apiserver-wcmisdlin02.uftwf.local 1/1 Running 0 56m
kube-system kube-controller-manager-wcmisdlin02.uftwf.local 1/1 Running 0 56m
kube-system kube-proxy-2hhf7 1/1 Running 0 6m57s
kube-system kube-proxy-lzfcx 1/1 Running 0 7m35s
kube-system kube-proxy-rndhm 1/1 Running 0 57m
kube-system kube-scheduler-wcmisdlin02.uftwf.local 1/1 Running 0 56m
kube-system kubernetes-dashboard-77fd78f978-g2hts 0/1 Pending 0 2m38s
$
logs:
$ kubectl logs kubernetes-dashboard-77fd78f978-g2hts -n kube-system
$
describe:
$ kubectl describe pod kubernetes-dashboard-77fd78f978-g2hts -n kube-system
Name: kubernetes-dashboard-77fd78f978-g2hts
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=77fd78f978
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/kubernetes-dashboard-77fd78f978
Containers:
kubernetes-dashboard:
Image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-gp4l7 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
kubernetes-dashboard-token-gp4l7:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-gp4l7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m39s (x21689 over 20h) default-scheduler 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
$
It would appear that you are attempting to deploy Kubernetes leveraging kubeadm but have skipped the step of Installing a pod network add-on (CNI). Notice the warning:
The network must be deployed before any applications. Also, CoreDNS will not start up before a network is installed. kubeadm only supports Container Network Interface (CNI) based networks (and does not support kubenet).
Once you do this, the CoreDNS pods should come up healthy. This can be verified with:
kubectl -n kube-system -l=k8s-app=kube-dns get pods
Then the kubernetes-dashboard pod should come up healthy as well.
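As a concrete example, installing a pod network add-on is a single kubectl apply; the command below uses the Calico v3.8 manifest mentioned elsewhere in this thread, but pick whichever add-on and version matches your cluster:
# Install a pod network add-on (Calico shown; Weave, flannel, etc. work the same way).
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
# Then watch CoreDNS come up once the network is in place.
kubectl -n kube-system get pods -l k8s-app=kube-dns -w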
You could refer to https://github.com/kubernetes/dashboard#getting-started.
Also, I see "https" in your link; please try this link instead:
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
I had the same problem. In the end it turned out to be a Calico network configuration problem. But step by step...
First I checked if the Dashboard Pod was running:
kubectl get pods --all-namespaces
The result for me was:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-bcc6f659f-j57l9 1/1 Running 2 19h
kube-system calico-node-hdxp6 0/1 CrashLoopBackOff 13 15h
kube-system calico-node-z6l56 0/1 Running 68 19h
kube-system coredns-74ff55c5b-8l6m6 1/1 Running 2 19h
kube-system coredns-74ff55c5b-v7pkc 1/1 Running 2 19h
kube-system etcd-got-virtualbox 1/1 Running 3 19h
kube-system kube-apiserver-got-virtualbox 1/1 Running 3 19h
kube-system kube-controller-manager-got-virtualbox 1/1 Running 3 19h
kube-system kube-proxy-q99s5 1/1 Running 2 19h
kube-system kube-proxy-vrpcd 1/1 Running 1 15h
kube-system kube-scheduler-got-virtualbox 1/1 Running 2 19h
kubernetes-dashboard dashboard-metrics-scraper-7b59f7d4df-qc9ms 1/1 Running 0 28m
kubernetes-dashboard kubernetes-dashboard-74d688b6bc-zrdk4 0/1 CrashLoopBackOff 9 28m
The last line indicates that the dashboard pod could not be started (status=CrashLoopBackOff).
And the second line shows that the calico node has problems. Most likely the root cause is Calico.
The next step is to have a look at the pod log (change the namespace/name to match YOUR pod list):
kubectl logs kubernetes-dashboard-74d688b6bc-zrdk4 -n kubernetes-dashboard
The result for me was:
2021/03/05 13:01:12 Starting overwatch
2021/03/05 13:01:12 Using namespace: kubernetes-dashboard
2021/03/05 13:01:12 Using in-cluster config to connect to apiserver
2021/03/05 13:01:12 Using secret token for csrf signing
2021/03/05 13:01:12 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get https://10.96.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf: dial tcp 10.96.0.1:443: i/o timeout
Hm - not really helpful. After searching for "dial tcp 10.96.0.1:443: i/o timeout" I found this information, where it says ...
If you follow the kubeadm instructions to the letter ... Which means install docker, kubernetes (kubeadm, kubectl, & kubelet), and calico with the Kubeadm hosted instructions ... and your computer nodes have a physical ip address in the range of 192.168.X.X then you will end up with the above mentioned non-working dashboard. This is because the node ip addresses clash with the internal calico ip addresses.
https://github.com/kubernetes/dashboard/issues/1578#issuecomment-329904648
Yes, indeed I do have a physical IP in the range of 192.168.x.x, like many others might have as well. I wish Calico would check this during setup.
So let's move the pod network to a different IP range:
You should use a classless reserved IP range for private networks, such as:
10.0.0.0/8 (16,777,216 addresses)
172.16.0.0/12 (1,048,576 addresses)
192.168.0.0/16 (65,536 addresses)
Otherwise Calico will terminate with an error saying "Invalid CIDR specified in CALICO_IPV4POOL_CIDR".
sudo kubeadm reset
sudo rm /etc/cni/net.d/10-calico.conflist
sudo rm /etc/cni/net.d/calico-kubeconfig
export CALICO_IPV4POOL_CIDR=172.16.0.0
export MASTER_IP=192.168.100.122
sudo kubeadm init --pod-network-cidr=$CALICO_IPV4POOL_CIDR/12 --apiserver-advertise-address=$MASTER_IP --apiserver-cert-extra-sans=$MASTER_IP
mkdir -p $HOME/.kube
sudo rm -f $HOME/.kube/config
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
sudo chown $(id -u):$(id -g) /etc/kubernetes/kubelet.conf
wget https://docs.projectcalico.org/v3.8/manifests/calico.yaml -O calico.yaml
sudo sed -i "s/192.168.0.0\/16/$CALICO_IPV4POOL_CIDR\/12/g" calico.yaml
sudo sed -i "s/192.168.0.0/$CALICO_IPV4POOL_CIDR/g" calico.yaml
kubectl apply -f calico.yaml
Now we test if all calico pods are running:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-bcc6f659f-ns7kz 1/1 Running 0 15m
kube-system calico-node-htvdv 1/1 Running 6 15m
kube-system coredns-74ff55c5b-lqwpd 1/1 Running 0 17m
kube-system coredns-74ff55c5b-qzc87 1/1 Running 0 17m
kube-system etcd-got-virtualbox 1/1 Running 0 17m
kube-system kube-apiserver-got-virtualbox 1/1 Running 0 17m
kube-system kube-controller-manager-got-virtualbox 1/1 Running 0 18m
kube-system kube-proxy-6xr5j 1/1 Running 0 17m
kube-system kube-scheduler-got-virtualbox 1/1 Running 0 17m
Looks good. If not, check CALICO_IPV4POOL_CIDR by editing the node config: KUBE_EDITOR="nano" kubectl edit -n kube-system ds calico-node
Let's apply the kubernetes-dashboard and start the proxy:
export KUBECONFIG=$HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
kubectl proxy
Now I can load http://127.0.0.1:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
If you are using Helm, check that kubectl proxy is running, then go to
http://localhost:8001/api/v1/namespaces/default/services/https:kubernetes-dashboard:https/proxy
Two tips about the link above:
with a Helm install, the namespace will be /default (not /kubernetes-dashboard)
you need to add https after /https:kubernetes-dashboard:
A better way is:
helm delete kubernetes-dashboard
kubectl create namespace kubernetes-dashboard
helm install -n kubernetes-dashboard kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard
Then go to
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:https/proxy
and you can easily follow creating-sample-user to get a token to log in.
I was facing the same issue, so I followed the official docs and then went to the https://github.com/kubernetes/dashboard URL; there is another way using Helm, described at https://artifacthub.io/packages/helm/k8s-dashboard/kubernetes-dashboard.
After installing Helm, run these two commands:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard
It worked, but in the default namespace, at this link:
http://localhost:8001/api/v1/namespaces/default/services/https:kubernetes-dashboard:https/proxy/#/workloads?namespace=default

Rancher Kubernetes Dashboard - Service Unavailable

I am new to Rancher and containers in general. While setting up a Kubernetes cluster using Rancher, I'm facing a problem accessing the Kubernetes dashboard.
rancher/server: 1.6.6
Single node Rancher server + External MySQL + 3 agent nodes
Infrastructure Stack versions:
healthcheck: v0.3.1
ipsec: net:v0.11.5
network-services: metadata:v0.9.2 / network-manager:v0.7.7
scheduler: k8s:v1.7.2-rancher5
kubernetes (if applicable): kubernetes-agent:v0.6.3
# docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 17.03.1-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.34-rancher
Operating System: RancherOS v1.0.3
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.798 GiB
Name: ch7radod1
ID: IUNS:4WT2:Y3TV:2RI4:FZQO:4HYD:YSNN:6DPT:HMQ6:S2SI:OPGH:TX4Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://proxy.ch.abc.net:8080
Https Proxy: http://proxy.ch.abc.net:8080
No Proxy: localhost,.xyz.net,abc.net
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Accessing UI URL http://10.216.30.10/r/projects/1a6633/kubernetes-dashboard:9090/# shows “Service unavailable”
If I use the CLI section from the UI, I get the following:
> kubectl get nodes
NAME STATUS AGE VERSION
ch7radod3 Ready 1d v1.7.2
ch7radod4 Ready 5d v1.7.2
ch7radod1 Ready 1d v1.7.2
> kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system heapster-4285517626-4njc2 0/1 ContainerCreating 0 5d
kube-system kube-dns-3942128195-ft56n 0/3 ContainerCreating 0 19d
kube-system kube-dns-646531078-z5lzs 0/3 ContainerCreating 0 5d
kube-system kubernetes-dashboard-716739405-lpj38 0/1 ContainerCreating 0 5d
kube-system monitoring-grafana-3552275057-qn0zf 0/1 ContainerCreating 0 5d
kube-system monitoring-influxdb-4110454889-79pvk 0/1 ContainerCreating 0 5d
kube-system tiller-deploy-737598192-f9gcl 0/1 ContainerCreating 0 5d
The setup uses a private registry (Artifactory). I checked Artifactory and I could see several Docker-related images present. I was going through the private registry section and I also saw this file. In case this file is required, where exactly do I keep it so that Rancher can fetch it and configure the Kubernetes dashboard?
UPDATE:
$ sudo ros engine switch docker-1.12.6
> ERRO[0031] Failed to load https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Get https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Proxy Authentication Required
> FATA[0031] docker-1.12.6 is not a valid engine
I thought maybe it's due to NGINX, so I stopped the NGINX container, but I am still getting the above error. Earlier I had tried the same command on this Rancher server and it used to work fine. It's working fine on the agent nodes, although they already have 1.12.6 configured.
UPDATE 2:
> kubectl -n kube-system get po
NAME READY STATUS RESTARTS AGE
heapster-4285517626-4njc2 1/1 Running 0 12d
kube-dns-2588877561-26993 0/3 ImagePullBackOff 0 5h
kube-dns-646531078-z5lzs 0/3 ContainerCreating 0 12d
kubernetes-dashboard-716739405-zq3s9 0/1 CrashLoopBackOff 67 5h
monitoring-grafana-3552275057-qn0zf 1/1 Running 0 12d
monitoring-influxdb-4110454889-79pvk 1/1 Running 0 12d
tiller-deploy-737598192-f9gcl 0/1 CrashLoopBackOff 72 12d
None of your pods are running; you need to resolve that issue first. Try to restart the whole cluster and make sure all of the above pods reach Running status.
Based on @ivan.sim's suggestion, I posted 'UPDATE 2'. This finally got me looking in the right direction. I then searched online for the CrashLoopBackOff error, came across this link, and tried the following command (using the CLI option from the Rancher console), which was actually quite similar to what @ivan.sim suggested above, but it helped me find the node where the dashboard process was running:
> kubectl get pods -a -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system heapster-4285517626-4njc2 1/1 Running 0 12d 10.42.224.157 radod4
kube-system kube-dns-2588877561-26993 0/3 ImagePullBackOff 0 5h <none> radod1
kube-system kube-dns-646531078-z5lzs 0/3 ContainerCreating 0 12d <none> radod4
kube-system kubernetes-dashboard-716739405-zq3s9 0/1 Error 70 5h 10.42.218.11 radod1
kube-system monitoring-grafana-3552275057-qn0zf 1/1 Running 0 12d 10.42.202.44 radod4
kube-system monitoring-influxdb-4110454889-79pvk 1/1 Running 0 12d 10.42.111.171 radod4
kube-system tiller-deploy-737598192-f9gcl 0/1 CrashLoopBackOff 76 12d 10.42.213.24 radod4
Then i went to the host where the process was executing and tried the following command:
[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker ps -a | grep dash
282334b0ed38 gcr.io/google_containers/kubernetes-dashboard-amd64@sha256:b537ce8988510607e95b8d40ac9824523b1f9029e6f9f90e9fccc663c355cf5d "/dashboard --insecur" About a minute ago Exited (1) 55 seconds ago k8s_kubernetes-dashboard_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_72
99836d7824fd gcr.io/google_containers/pause-amd64:3.0 "/pause" 5 hours ago Up 5 hours k8s_POD_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_1
[rancher@radod1 ~]$
[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker logs 282334b0ed38
Using HTTP port: 8443
Creating API server client for https://10.43.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md
After I got the above error, I searched online again and tried a few things. Finally, this link helped. After I executed the following commands on all agent nodes, the Kubernetes dashboard finally started working!
docker volume rm etcd
rm -rf /var/etcd/backups/*