I am running Kubernetes on bare metal and use the Kubernetes dashboard to manage the cluster. This works fine at first, but after 5-30 minutes, when I try to access the dashboard at:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
I get the following error:
Error: 'dial tcp 10.35.0.19:8443: connect: no route to host'
Trying to reach: 'https://10.35.0.19:8443/'
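To rule out a basic routing problem, here is a sketch of checks that could be run from one of the nodes (the IP and port are taken from the error above):
ping -c 3 10.35.0.19
curl -k https://10.35.0.19:8443/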
All pods in kube-system are up and running when I check them with kubectl get pods -n kube-system:
NAME READY STATUS RESTARTS AGE
coredns-86c58d9df4-87pfc 1/1 Running 0 1m
coredns-86c58d9df4-tflg5 1/1 Running 0 1m
etcd-controller01 1/1 Running 5 1m
etcd-controller02 1/1 Running 6 1m
heapster-798ffb9b4-744q4 1/1 Running 0 1m
kube-apiserver-controller01 1/1 Running 1 1m
kube-apiserver-controller02 1/1 Running 3 1m
kube-controller-manager-controller01 1/1 Running 5 1m
kube-controller-manager-controller02 1/1 Running 2 1m
kube-proxy-8qqnq 1/1 Running 0 1m
kube-proxy-9vgck 1/1 Running 0 1m
kube-proxy-dht69 1/1 Running 0 1m
kube-proxy-f7bx8 1/1 Running 0 1m
kube-proxy-jnxtq 1/1 Running 0 1m
kube-proxy-l5h7m 1/1 Running 0 1m
kube-proxy-p9gt5 1/1 Running 0 1m
kube-proxy-zv4sr 1/1 Running 0 1m
kube-scheduler-controller01 1/1 Running 3 1m
kube-scheduler-controller02 1/1 Running 4 1m
kubernetes-dashboard-57df4db6b-px8xc 1/1 Running 0 1m
metrics-server-55d46868d4-s9j5v 1/1 Running 0 1m
monitoring-grafana-564f579fd4-fm6lm 1/1 Running 0 1m
monitoring-influxdb-8b7d57f5c-llgz9 1/1 Running 0 1m
weave-net-2b2dm 2/2 Running 1 1m
weave-net-988rf 2/2 Running 0 1m
weave-net-hcm5n 2/2 Running 0 1m
weave-net-kb2gk 2/2 Running 0 1m
weave-net-ksvbf 2/2 Running 0 1m
weave-net-q9zlw 2/2 Running 0 1m
weave-net-t9f6m 2/2 Running 0 1m
weave-net-vdspp 2/2 Running 0 1m
When I restart all pods in this namespace with kubectl delete pods --all -n kube-system, the dashboard sometimes works again for 5-30 minutes; at other times it randomly starts working again on its own. I have tried restarting each pod in this namespace individually to track down which pod is causing the issue, but restarting the pods one by one does not bring the dashboard back up. Only deleting them all at once works.
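Since the overlay network here is Weave Net, its own status is worth capturing as well. A sketch, using a pod name from the list above and the status command from the standard weave-net image:
# ask Weave for its view of the mesh, then check its container logs
kubectl exec -n kube-system weave-net-2b2dm -c weave -- /home/weave/weave --local status
kubectl logs -n kube-system weave-net-2b2dm -c weave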
Does anybody have an idea why this happens and how I can fix this?
Thank you in advance!
This question is more about debugging methods for the current GKE setup, but a solution is welcome too.
We're using GKE version 1.22.3-gke.1500 with the following configuration:
We recently started facing an issue where commands like kubectl logs and kubectl exec don't work, and deleting a namespace takes forever.
Checking some services inside the cluster, it seems some network operations just randomly fail. For example, metrics-server keeps crashing with these error logs:
message: "pkg/mod/k8s.io/client-go#v0.19.10/tools/cache/reflector.go:156: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://10.97.0.1:443/api/v1/nodes?resourceVersion=387681528": net/http: TLS handshake timeout"
HTTP requests time out as well:
unable to fully scrape metrics: unable to fully scrape metrics from node gke-staging-n2d-standard-8-78c35b3a-6h16: unable to fetch metrics from node gke-staging-n2d-standard-8-78c35b3a-6h16: Get "http://10.148.15.217:10255/stats/summary?only_cpu_and_memory=true": context deadline exceeded
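Since kubectl exec and logs are unreliable here, one way to probe the failing kubelet endpoint is to SSH into the node with gcloud and curl from there. A sketch (node name and IP are taken from the errors above; you may need to pass --zone):
gcloud compute ssh gke-staging-n2d-standard-8-78c35b3a-6h16
# then, on the node:
curl -m 5 "http://10.148.15.217:10255/stats/summary?only_cpu_and_memory=true"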
I also tried restarting (by kubectl delete) most of the pods in this list:
kubectl get pod
NAME READY STATUS RESTARTS AGE
event-exporter-gke-5479fd58c8-snq26 2/2 Running 0 4d7h
fluentbit-gke-gbs2g 2/2 Running 0 4d7h
fluentbit-gke-knz2p 2/2 Running 0 85m
fluentbit-gke-ljw8h 2/2 Running 0 30h
gke-metadata-server-dtnvh 1/1 Running 0 4d7h
gke-metadata-server-f2bqw 1/1 Running 0 30h
gke-metadata-server-kzcv6 1/1 Running 0 85m
gke-metrics-agent-4g56c 1/1 Running 12 (3h6m ago) 4d7h
gke-metrics-agent-hnrll 1/1 Running 13 (13h ago) 30h
gke-metrics-agent-xdbrw 1/1 Running 0 85m
konnectivity-agent-87bc84bb7-g9nd6 1/1 Running 0 2m59s
konnectivity-agent-87bc84bb7-rkhhh 1/1 Running 0 3m51s
konnectivity-agent-87bc84bb7-x7pk4 1/1 Running 0 3m50s
konnectivity-agent-autoscaler-698b6d8768-297mh 1/1 Running 0 83m
kube-dns-77d9986bd5-2m8g4 4/4 Running 0 3h24m
kube-dns-77d9986bd5-z4j62 4/4 Running 0 3h24m
kube-dns-autoscaler-f4d55555-dmvpq 1/1 Running 0 83m
kube-proxy-gke-staging-n2d-standard-8-78c35b3a-8299 1/1 Running 0 11s
kube-proxy-gke-staging-n2d-standard-8-78c35b3a-fp5u 1/1 Running 0 11s
kube-proxy-gke-staging-n2d-standard-8-78c35b3a-rkdp 1/1 Running 0 11s
l7-default-backend-7db896cb4-mvptg 1/1 Running 0 83m
metrics-server-v0.4.4-fd9886cc5-tcscj 2/2 Running 82 33h
netd-5vpmc 1/1 Running 0 30h
netd-bhq64 1/1 Running 0 85m
netd-n6jmc 1/1 Running 0 4d7h
Some logs from the metrics server:
https://gist.github.com/nvcnvn/b77eb02705385889961aca33f0f841c7
UPDATE 1:
Some more logs from the API servers:
https://gist.github.com/nvcnvn/47df8798e798637386f6e0777d869d4f
If you cannot use kubectl to get info from your cluster, can you try to access it through its RESTful API?
http://blog.madhukaraphatak.com/understanding-k8s-api-part-2/
Try deleting the metrics-server pods, or get their logs using podman or a curl command.
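For example, a minimal sketch of going through the raw API with kubectl proxy (it assumes the pods above live in kube-system; the pod name is taken from the listing):
kubectl proxy --port=8001 &
# list pods via the raw API
curl http://localhost:8001/api/v1/namespaces/kube-system/pods
# fetch the metrics-server container logs via the log subresource
curl http://localhost:8001/api/v1/namespaces/kube-system/pods/metrics-server-v0.4.4-fd9886cc5-tcscj/log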
I followed the official Helm chart for fluent-bit and ended up with 20 pods in a namespace. How do I configure it to use one pod?
The replicaCount attribute in values.yaml is set to 1.
https://github.com/fluent/helm-charts/tree/main/charts/fluent-bit
helm upgrade -i fluent-bit helm/efk/fluent-bit --namespace some-ns
kubectl get pods -n some-ns
NAME READY STATUS RESTARTS AGE
fluent-bit-22dx4 1/1 Running 0 15h
fluent-bit-2x6rn 1/1 Running 0 15h
fluent-bit-42rfd 1/1 Running 0 15h
fluent-bit-54drx 1/1 Running 0 15h
fluent-bit-8f8pl 1/1 Running 0 15h
fluent-bit-8rtp9 1/1 Running 0 15h
fluent-bit-8wfcc 1/1 Running 0 15h
fluent-bit-bffh8 1/1 Running 0 15h
fluent-bit-lgl9k 1/1 Running 0 15h
fluent-bit-lqdrs 1/1 Running 0 15h
fluent-bit-mdvlc 1/1 Running 0 15h
fluent-bit-qgvww 1/1 Running 0 15h
fluent-bit-qqwh6 1/1 Running 0 15h
fluent-bit-qxbjt 1/1 Running 0 15h
fluent-bit-rqr8g 1/1 Running 0 15h
fluent-bit-t8vbv 1/1 Running 0 15h
fluent-bit-vkcfl 1/1 Running 0 15h
fluent-bit-wnwtq 1/1 Running 0 15h
fluent-bit-xqwxk 1/1 Running 0 15h
fluent-bit-xxj8q 1/1 Running 0 15h
Note that the chart's template supports two workload kinds: DaemonSet and Deployment.
This is controlled by the kind field of the values file.
The default is DaemonSet, so one pod is started on every node, regardless of replicaCount or affinity.
If you want a single pod, set kind to Deployment, set replicaCount to 1, and redeploy.
values.yaml
# Default values for fluent-bit.
# kind -- DaemonSet or Deployment
kind: Deployment
# replicaCount -- Only applicable if kind=Deployment
replicaCount: 1
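With those two values set, the redeploy could look like this (a sketch; it assumes the chart comes from the upstream fluent Helm repo rather than a local copy):
helm repo add fluent https://fluent.github.io/helm-charts
helm upgrade -i fluent-bit fluent/fluent-bit --namespace some-ns --set kind=Deployment --set replicaCount=1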
https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
I'm new to Helm. I'm trying to deploy a simple server on the master node. When I run helm install and check the result with the command kubectl get po,svc, I see a lot of pods created other than the ones I intended to deploy. So, my precise questions are:
Why so many pods got created?
How do I delete all those pods?
Below is the output of the command kubectl get po,svc:
NAME READY STATUS RESTARTS AGE
pod/altered-quoll-stx-sdo-chart-6446644994-57n7k 1/1 Running 0 25m
pod/austere-garfish-stx-sdo-chart-5b65d8ccb7-jjxfh 1/1 Running 0 25m
pod/bald-hyena-stx-sdo-chart-9b666c998-zcfwr 1/1 Running 0 25m
pod/cantankerous-pronghorn-stx-sdo-chart-65f5699cdc-5fkf9 1/1 Running 0 25m
pod/crusty-unicorn-stx-sdo-chart-7bdcc67546-6d295 1/1 Running 0 25m
pod/exiled-puffin-stx-sdo-chart-679b78ccc5-n68fg 1/1 Running 0 25m
pod/fantastic-waterbuffalo-stx-sdo-chart-7ddd7b54df-p78h7 1/1 Running 0 25m
pod/gangly-quail-stx-sdo-chart-75b9dd49b-rbsgq 1/1 Running 0 25m
pod/giddy-pig-stx-sdo-chart-5d86844569-5v8nn 1/1 Running 0 25m
pod/hazy-indri-stx-sdo-chart-65d4c96f46-zmvm2 1/1 Running 0 25m
pod/interested-macaw-stx-sdo-chart-6bb7874bbd-k9nnf 1/1 Running 0 25m
pod/jaundiced-orangutan-stx-sdo-chart-5699d9b44b-6fpk9 1/1 Running 0 25m
pod/kindred-nightingale-stx-sdo-chart-5cf95c4d97-zpqln 1/1 Running 0 25m
pod/kissing-snail-stx-sdo-chart-854d848649-54m9w 1/1 Running 0 25m
pod/lazy-tiger-stx-sdo-chart-568fbb8d65-gr6w7 1/1 Running 0 25m
pod/nonexistent-octopus-stx-sdo-chart-5f8f6c7ff8-9l7sm 1/1 Running 0 25m
pod/odd-boxer-stx-sdo-chart-6f5b9679cc-5stk7 1/1 Running 1 15h
pod/orderly-chicken-stx-sdo-chart-7889b64856-rmq7j 1/1 Running 0 25m
pod/redis-697fb49877-x5hr6 1/1 Running 0 25m
pod/rv.deploy-6bbffc7975-tf5z4 1/2 CrashLoopBackOff 93 30h
pod/sartorial-eagle-stx-sdo-chart-767d786685-ct7mf 1/1 Running 0 25m
pod/sullen-gnat-stx-sdo-chart-579fdb7df7-4z67w 1/1 Running 0 25m
pod/undercooked-cow-stx-sdo-chart-67875cc5c6-mwvb7 1/1 Running 0 25m
pod/wise-quoll-stx-sdo-chart-5db8c766c9-mhq8v 1/1 Running 0 21m
You can run the command helm ls to see all the deployed helm releases in your cluster.
To remove the release (and every resource it created, including the pods), run: helm delete RELEASE_NAME --purge.
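For example, a sketch using Helm v2 (odd-boxer is a release name inferred from the pod names above):
helm ls --short
helm delete odd-boxer --purge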
If you want to delete all the pods in your namespace without deleting your Helm release (I don't think this is what you're looking for), you can run: kubectl delete pods --all.
On a side note, if you're new to Helm, consider starting with Helm v3: it has many improvements, and the migration from v2 to v3 can become cumbersome, so if you can avoid it, you should.
I have just started a new Kubernetes 1.8.0 environment using minikube (0.27) on Windows 10.
I followed these steps, but it didn't work:
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
When I list pods this is the result:
C:\WINDOWS\system32>kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-minikube 1/1 Running 0 23m
kube-system heapster-69b5d4974d-s9vrf 1/1 Running 0 5m
kube-system kube-addon-manager-minikube 1/1 Running 0 23m
kube-system kube-apiserver-minikube 1/1 Running 0 23m
kube-system kube-controller-manager-minikube 1/1 Running 0 23m
kube-system kube-dns-545bc4bfd4-xkt7l 3/3 Running 3 1h
kube-system kube-proxy-7jnk6 1/1 Running 0 23m
kube-system kube-scheduler-minikube 1/1 Running 0 23m
kube-system kubernetes-dashboard-5569448c6d-8zqnc 1/1 Running 2 52m
kube-system kubernetes-dashboard-869db7f6b4-ddlmq 0/1 CrashLoopBackOff 19 51m
kube-system monitoring-influxdb-78d4c6f5b6-b66m9 1/1 Running 0 4m
kube-system storage-provisioner 1/1 Running 2 1h
As you can see, I now have two kubernetes-dashboard pods; one is Running and the other is in CrashLoopBackOff.
When I try to run minikube dashboard this is the result:
"Waiting, endpoint for service is not ready yet..."
I have tried to remove the kubernetes-dashboard-869db7f6b4-ddlmq pod:
kubectl delete pod kubernetes-dashboard-869db7f6b4-ddlmq
This is the result:
"Error from server (NotFound): pods "kubernetes-dashboard-869db7f6b4-ddlmq" not found"
"Error from server (NotFound): pods "kubernetes-dashboard-869db7f6b4-ddlmq" not found"
You failed to delete the pod because the namespace was missing: add -n kube-system. There should be only one dashboard pod if no modifications were applied. If minikube dashboard still fails to run after you delete the abnormal pod, please provide more logs.
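Concretely, a sketch of those two steps, grabbing the previous container's logs before deleting (the pod is in CrashLoopBackOff, so --previous shows the last crashed run):
kubectl logs -n kube-system kubernetes-dashboard-869db7f6b4-ddlmq --previous
kubectl delete pod -n kube-system kubernetes-dashboard-869db7f6b4-ddlmq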
I created a single-node Kubernetes cluster using minikube and installed Helm on it. But I am running into an issue while executing the helm ls and helm install commands. This is the issue I am facing:
"Get http://localhost:8080/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 127.0.0.1:8080: connect: connection refused".
These are the pods running in the kube-system namespace:
ubuntu#openshift:~$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
default-http-backend-vqbh4 1/1 Running 1 6h
etcd-minikube 1/1 Running 0 6h
kube-addon-manager-minikube 1/1 Running 4 1d
kube-apiserver-minikube 1/1 Running 0 6h
kube-controller-manager-minikube 1/1 Running 0 6h
kube-dns-86f4d74b45-xxznk 3/3 Running 15 1d
kube-proxy-j28zs 1/1 Running 0 6h
kube-scheduler-minikube 1/1 Running 3 1d
kubernetes-dashboard-5498ccf677-89hrf 1/1 Running 8 1d
nginx-ingress-controller-tjljg 1/1 Running 3 6h
registry-wzwnq 1/1 Running 1 7h
storage-provisioner 1/1 Running 8 1d
tiller-deploy-75d848bb9-tmm9b 1/1 Running 0 4h
If you have any idea, please help me. Thanks