minikube recover from unexpected power outages

Minikube fails to start after the PC unexpectedly loses power or is otherwise shut down abruptly:
PS C:\Windows\system32> minikube.exe start
* minikube v1.2.0 on windows (amd64)
* Tip: Use 'minikube start -p <name>' to create a new cluster, or 'minikube delete' to delete this one.
* Restarting existing virtualbox VM for "minikube" ...
* Waiting for SSH access ...
* Configuring environment for Kubernetes v1.15.0 on Docker 18.09.6
* Relaunching Kubernetes v1.15.0 using kubeadm ...
X Error restarting cluster: waiting for apiserver: timed out waiting for the condition
* Sorry that minikube crashed. If this was unexpected, we would love to hear from you:
- https://github.com/kubernetes/minikube/issues/new
* Problems detected in "kube-addon-manager":
- error: unable to recognize "STDIN": Get https://localhost:8443/api?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused
- error: unable to recognize "STDIN": Get https://localhost:8443/api?timeout=32s: net/http: TLS handshake timeout
- error: unable to recognize "STDIN": Get https://localhost:8443/api?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused
I have run into this several times. Each time I had to run minikube delete and minikube start, then redeploy my development environment. This is very troublesome. Is there a better solution?

Exiting due to DRV_CREATE_TIMEOUT: Failed to start host: creating host: create host timed out in 360.000000 seconds

When I run minikube start --driver=virtualbox --host-only-cidr='192.168.59.1/24' I get the following output:
minikube start --driver=virtualbox --host-only-cidr='192.168.59.1/24'
😄 minikube v1.24.0 on Ubuntu 20.04
✨ Using the virtualbox driver based on user configuration
👍 Starting control plane node minikube in cluster minikube
🔥 Creating virtualbox VM (CPUs=2, Memory=6000MB, Disk=20000MB) ..
🔥 Deleting "minikube" in virtualbox ...
🤦 StartHost failed, but will try again: creating host: create host timed out in 360.000000 seconds
🔥 Creating virtualbox VM (CPUs=2, Memory=6000MB, Disk=20000MB) ..
😿 Failed to start virtualbox VM. Running "minikube delete" may fix it: creating host: create host timed out in 360.000000 seconds
❌ Exiting due to DRV_CREATE_TIMEOUT: Failed to start host: creating host: create host timed out in 360.000000 seconds
💡 Suggestion: Try 'minikube delete', and disable any conflicting VPN or firewall software
🍿 Related issue: https://github.com/kubernetes/minikube/issues/7072
I'm not certain why this is not working. Any idea?

k3s - Metrics server doesn't work for worker nodes

I deployed a k3s cluster on two Raspberry Pi 4 boards, one as the master and the other as a worker, using the install script that k3s provides with the following options:
For the master node (192.168.1.113 is the master node's IP):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -
For the agent node:
curl -sfL https://get.k3s.io | \
K3S_URL=https://192.168.1.113:6443 \
K3S_TOKEN=<master-token> \
INSTALL_K3S_EXEC='agent' sh -
Everything seems to work, but kubectl top nodes returns the following:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k3s-master 137m 3% 1285Mi 33%
k3s-node-01 <unknown> <unknown> <unknown> <unknown>
I also tried to deploy the Kubernetes dashboard as described in the docs, but it fails because it can't reach the metrics server and gets a timeout error:
"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"
and I see a lot of errors in the pod logs:
2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
logs from the metrics-server pod:
elet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:03:24.767949 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
Moving this out of comments for better visibility.
After creating a small cluster of my own, I wasn't able to reproduce this behaviour: metrics-server worked fine for both nodes, and kubectl top nodes showed metrics for both of them (though it took some time to start collecting the metrics).
That leaves troubleshooting why it doesn't work in your case. Checking the metrics-server logs is the most efficient way to figure this out:
$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
The next steps depend on what the logs show. For instance, from the comments above:
first it was no route to host, which is a network problem: the node's address could not be reached or its hostname could not be resolved
then i/o timeout, which means a route exists but the service did not respond. This can happen when a firewall blocks certain ports or sources, when kubelet (which listens on port 10250) is not running, or, as it turned out for the OP, when an NTP issue affects certificates and connections.
Errors may be different in other cases; the important thing is to find the error and troubleshoot further based on it.
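To make the mapping from log message to next step concrete, here is a minimal sketch. classify_metrics_error is a hypothetical helper name, not part of kubectl or k3s, and the hints are just a condensed restatement of the cases listed above.

```shell
#!/bin/sh
# Map one metrics-server log line to the troubleshooting hint for that error.
# classify_metrics_error is a hypothetical helper, not a kubectl or k3s command.
classify_metrics_error() {
  case "$1" in
    *"no route to host"*)
      echo "network: check routing, name resolution, and the host firewall" ;;
    *"i/o timeout"*)
      echo "timeout: check firewall rules, kubelet on port 10250, and NTP/clock sync" ;;
    *)
      echo "other: read the full error and troubleshoot from there" ;;
  esac
}

# Usage against a live cluster (assumes the deployment is called
# metrics-server and lives in kube-system):
#   kubectl logs -n kube-system deployment/metrics-server |
#     while IFS= read -r line; do classify_metrics_error "$line"; done |
#     sort | uniq -c
```

Counting the hints with sort | uniq -c shows at a glance which kind of failure dominates the logs.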

minikube: failed to start on mac with error E1006

I'm trying to set up k8s locally on my Mac. After installing all the dependencies, I ran minikube start but got the following error message:
😄 minikube v1.4.0 on Darwin 10.14.6
💡 Tip: Use 'minikube start -p <name>' to create a new cluster, or 'minikube delete' to delete this one.
🏃 Using the running virtualbox "minikube" VM ...
⌛ Waiting for the host to be provisioned ...
🐳 Preparing Kubernetes v1.16.0 on Docker 18.09.9 ...
E1006 09:57:30.975647 22071 cache_images.go:79] CacheImage k8s.gcr.io/kube-apiserver:v1.16.0 -> /Users/chrisbao/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.16.0 failed: fetching image: Get https://k8s.gcr.io/v2/: dial tcp [2404:6800:4008:c04::52]:443: i/o timeout
E1006 09:57:30.976341 22071 cache_images.go:79] CacheImage gcr.io/k8s-minikube/storage-provisioner:v1.8.1 -> /Users/chrisbao/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1 failed: fetching image: Get https://gcr.io/v2/: dial tcp [2404:6800:4008:c00::52]:443: i/o timeout
and the minikube status command returns the following:
host: Running
kubelet:
apiserver: Stopped
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100
So how do I debug and fix it? What's the likely cause?
E1006 09:57:30.975647 22071 cache_images.go:79] CacheImage k8s.gcr.io/kube-apiserver:v1.16.0 -> /Users/chrisbao/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.16.0 failed: fetching image: Get https://k8s.gcr.io/v2/: dial tcp [2404:6800:4008:c04::52]:443: i/o timeout
It looks like you aren't able to pull the Kubernetes API server image from GCR. You can try using one of the available image mirrors via the --image-repository or --image-mirror-country flags. For example, if you are based in China, you can start minikube with:
minikube start --image-mirror-country=cn
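If you are not sure whether you need the mirror, one option is to probe the registry first and only add the flag when the probe fails. The following is a rough sketch under that assumption: choose_start_cmd and probe_gcr are hypothetical helper names, the 5-second timeout is arbitrary, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# choose_start_cmd is a hypothetical helper. It runs the probe command named
# in $1; if the probe succeeds it prints a plain start command, otherwise it
# prints one with the mirror flag for the country code in $2.
choose_start_cmd() {
  probe=$1
  country=$2
  if $probe >/dev/null 2>&1; then
    echo "minikube start"
  else
    echo "minikube start --image-mirror-country=$country"
  fi
}

# Probe for a real machine (assumes curl is installed): succeeds only if a
# connection to the registry from the error message completes within 5 seconds.
probe_gcr() {
  curl -s --max-time 5 -o /dev/null https://k8s.gcr.io/v2/
}

# Example: print the command to run, falling back to the cn mirror.
#   choose_start_cmd probe_gcr cn
```

Passing the probe as an argument keeps the decision logic separate from the network check, so you can swap in a different registry URL or test tool.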
You're getting a connection timeout when trying to pull images.
"Get https://k8s.gcr.io/v2/: dial tcp [2404:6800:4008:c04::52]:443: i/o timeout"
Can you confirm that you're able to access the internet from within your minikube VM?
minikube ssh
ping google.com
you should see something like

k8s: Unable to delete deployment due to lack of RAM

I got into a vicious circle. I was trying to deploy a few services on an AWS Ubuntu machine with 1 GB of RAM. By the end of the deployment, all the RAM was used. I decided to delete some of the deployments, but I could not even check the status of pods and deployments:
$ kubectl delete -f test.yaml
unable to recognize "test.yaml": Get https://172.31.38.138:6443/api?timeout=32s: dial tcp 172.31.38.138:6443: connect: connection refused
$ kubectl get deployments
Unable to connect to the server: dial tcp 172.31.38.138:6443: i/o timeout
I understand that the issue is lack of memory, so kube-dns, kube-proxy, etc. cannot work correctly. The question is:
How can I delete my test deployments without kubectl delete...?
Thanks
Stop the kubelet service, then run the docker system prune command to delete all pods, and finally restart kubelet.
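As a sketch, the three steps could look like the script below. It assumes a systemd host with Docker as the container runtime, and free_node_memory and run are hypothetical helper names. It defaults to a dry run that only prints the commands. Note that docker system prune removes stopped containers and other unused data, so containers that are still running may need to be stopped first.

```shell
#!/bin/sh
# run prints the command when DRY_RUN is 1 (the default here); otherwise it
# executes the command.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# free_node_memory is a hypothetical name for the three steps in the answer.
# Assumes systemd manages kubelet and Docker is the container runtime.
free_node_memory() {
  run sudo systemctl stop kubelet    # stop kubelet so it does not recreate containers
  run sudo docker system prune -f    # remove stopped containers and other unused data
  run sudo systemctl start kubelet   # bring kubelet back up
}

# Preview the commands:  free_node_memory
# Actually run them:     DRY_RUN=0 free_node_memory
```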

Unable to connect to the server: dial tcp accounts.google.com :443: getsockopt: operation timed out

I'm trying to get the list of pods from a gcloud project.
I created the project in GCP using a different laptop.
Now I'm using a different machine, but I'm logged into the same GCP account and using the same project.
When I run the command kubectl get pods I get the error below.
Unable to connect to the server: dial tcp a.b.c.d:443: getsockopt: operation timed out
I tried adding a --verbose argument, but that doesn't seem to be valid.
How can I proceed in resolving this error?
Running gcloud container clusters get-credentials my-cluster-name will log you into your cluster locally.
From the docs:
"updates a kubeconfig file with appropriate credentials and endpoint information to point kubectl at a specific cluster in Google Kubernetes Engine."
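On a new machine the full sequence might look like the sketch below. print_gke_login is a hypothetical helper that only prints the commands; the cluster name, zone, and project are placeholders you would replace with your own values. The -v=6 flag is kubectl's log verbosity option, which is the closest equivalent to the --verbose argument you tried.

```shell
#!/bin/sh
# print_gke_login is a hypothetical helper that prints the commands to run on
# the new machine. $1 = cluster name, $2 = zone, $3 = project (placeholders).
print_gke_login() {
  cluster=$1
  zone=$2
  project=$3
  # Fetch credentials and the endpoint for the cluster into the kubeconfig.
  echo "gcloud container clusters get-credentials $cluster --zone $zone --project $project"
  # List pods with verbose request logging to see which endpoint is contacted.
  echo "kubectl get pods -v=6"
}

# Example with placeholder values:
#   print_gke_login my-cluster-name us-central1-a my-project
```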