k8s: Unable to delete deployment due to lack of RAM

I got into a vicious circle. I was trying to deploy a few services on an AWS Ubuntu machine with 1 GB of RAM. By the end of the deployment, all the RAM was used up. I decided to delete some of the deployments, but I was not even able to check the status of the pods and deployments:
$ kubectl delete -f test.yaml
unable to recognize "test.yaml": Get https://172.31.38.138:6443/api?timeout=32s: dial tcp 172.31.38.138:6443: connect: connection refused
$ kubectl get deployments
Unable to connect to the server: dial tcp 172.31.38.138:6443: i/o timeout
I do understand that the issue is a lack of memory, which is why kube-dns, kube-proxy, etc. cannot work correctly. The question is:
How can I delete my test deployments without kubectl delete...?
Thanks

Stop the kubelet service, then run docker system prune to clean up the pods' containers, and finally restart the kubelet.
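A minimal sketch of that sequence, assuming a systemd-managed kubelet and the Docker runtime (service names may differ on your node):
# stop the kubelet so it stops restarting pod containers
sudo systemctl stop kubelet
# remove stopped containers, unused networks and dangling images to free memory and disk
sudo docker system prune -f
# bring the kubelet back up
sudo systemctl start kubelet
# once the API server responds again, delete the test deployments normally
kubectl delete -f test.yaml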

Related

Kubernetes - Failed to Apply a yaml from a raw url, Unable to connect to the server: dial tcp: lookup raw.githubusercontent.com on: server misbehaving

I'm Anddiy and I'm working with a Kubernetes cluster deployed by Rancher.
It's important to say that none of my machines have direct access to the internet; I'm using a proxy for downloads and the like, so I configured RKE2 with this proxy during the installation steps.
I have a machine with RKE2 that hosts my Rancher, and from the Rancher UI I created my Kubernetes cluster, here it is:
[15:19] root#vmrmmstnodehom01 [~]:# kubectl get nodes
NAME               STATUS   ROLES                      AGE     VERSION
vmrmmstnodehom01   Ready    controlplane               5d16h   v1.24.8
vmrmwrknodehom01   Ready    controlplane,etcd,worker   5d20h   v1.24.8
vmrmwrknodehom02   Ready    worker                     5d19h   v1.24.8
vmrmwrknodehom03   Ready    worker                     5d19h   v1.24.8
vmrmwrknodehom04   Ready    worker                     5d19h   v1.24.8
My cluster is a clean cluster, with no applications installed on it at the moment.
I tried to install the Longhorn application using this command (taken from the Longhorn documentation):
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml
But when I tried that, this error message was displayed:
Unable to connect to the server: dial tcp: lookup raw.githubusercontent.com on 10.129.251.125:53: server misbehaving
I checked whether it was my proxy that couldn't connect to this URL, but my machine connected to it successfully; I tested that with curl -v against the Longhorn URL.
I don't know whether the Kubernetes API has picked up the proxy configuration from my RKE2/Rancher setup, so I don't know if I need to set the proxy manually somewhere internally or something. I really don't know what is happening here.
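One way to sidestep this is to download the manifest through the proxy yourself and then apply the local copy; a rough sketch, assuming the proxy is reachable at http://my-proxy:3128 (a hypothetical address):
# fetch the manifest via the proxy, then apply the local file
curl -x http://my-proxy:3128 -fLo longhorn.yaml https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml
kubectl apply -f longhorn.yaml
If curl through the proxy succeeds while kubectl apply -f with the URL still fails, that suggests kubectl isn't picking up the HTTP_PROXY/HTTPS_PROXY environment variables in the shell where it runs, since kubectl downloads URL manifests on the client side.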

k3s - Metrics server doesn't work for worker nodes

I deployed a k3s cluster onto two Raspberry Pi 4 boards, one as the master and the second as a worker, using the installation script k3s offers with the following options:
For the master node:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -
(192.168.1.113 is the master node's IP.)
For the agent node:
curl -sfL https://get.k3s.io | \
K3S_URL=https://192.168.1.113:6443 \
K3S_TOKEN=<master-token> \
INSTALL_K3S_EXEC='agent' sh -
Everything seems to work, but kubectl top nodes returns the following:
NAME          CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
k3s-master    137m         3%          1285Mi          33%
k3s-node-01   <unknown>    <unknown>   <unknown>       <unknown>
I also tried to deploy the k8s dashboard, according to what is written in the docs but it fails to work because it can't reach the metrics server and gets a timeout error:
"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"
and I see a lot of errors in the pod logs:
2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
logs from the metrics-server pod:
E0917 14:03:24.767949 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
Moving this out of comments for better visibility.
After creating a small cluster, I wasn't able to reproduce this behaviour: metrics-server worked fine for both nodes, and kubectl top nodes showed information and metrics for both available nodes (though it took some time to start collecting the metrics).
That leads to troubleshooting why it doesn't work in your case. Checking the metrics-server logs is the most efficient way to figure this out:
$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
The next steps depend on what the logs show; for instance, in the comments above:
first it was no route to host, which is network-related (the host could not be reached or its name resolved);
then it was i/o timeout, which means the route exists but the service did not respond. This can happen because a firewall blocks certain ports/sources, because the kubelet is not running (it listens on port 10250), or, as it turned out for the OP, because an NTP issue affected certificates and connections.
The errors may be different in other cases; the important thing is to find the error and troubleshoot further based on it.
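For errors like these, a few quick checks from the master node can narrow things down; a sketch, assuming the worker is k3s-node-01 at 192.168.1.106:
# can the worker's hostname be resolved and the host reached?
ping -c 3 k3s-node-01
# is the kubelet port (10250) reachable, or is a firewall blocking it?
nc -vz 192.168.1.106 10250
# is the clock in sync? skew can break certificate validation
timedatectl status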

minikube: failed to start on mac with error E1006

I'm trying to set up k8s locally on my own Mac, and after installing all the dependencies, I try to run minikube start but get the following error message:
😄 minikube v1.4.0 on Darwin 10.14.6
💡 Tip: Use 'minikube start -p <name>' to create a new cluster, or 'minikube delete' to delete this one.
🏃 Using the running virtualbox "minikube" VM ...
⌛ Waiting for the host to be provisioned ...
🐳 Preparing Kubernetes v1.16.0 on Docker 18.09.9 ...
E1006 09:57:30.975647 22071 cache_images.go:79] CacheImage k8s.gcr.io/kube-apiserver:v1.16.0 -> /Users/chrisbao/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.16.0 failed: fetching image: Get https://k8s.gcr.io/v2/: dial tcp [2404:6800:4008:c04::52]:443: i/o timeout
E1006 09:57:30.976341 22071 cache_images.go:79] CacheImage gcr.io/k8s-minikube/storage-provisioner:v1.8.1 -> /Users/chrisbao/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1 failed: fetching image: Get https://gcr.io/v2/: dial tcp [2404:6800:4008:c00::52]:443: i/o timeout
and the minikube status command returns the following status info:
host: Running
kubelet:
apiserver: Stopped
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100
So how can I debug and fix it? What's the potential reason?
E1006 09:57:30.975647 22071 cache_images.go:79] CacheImage k8s.gcr.io/kube-apiserver:v1.16.0 -> /Users/chrisbao/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.16.0 failed: fetching image: Get https://k8s.gcr.io/v2/: dial tcp [2404:6800:4008:c04::52]:443: i/o timeout
It looks like you aren't able to pull the k8s API server image from GCR. You can try using one of the available image mirrors with the --image-repository or --image-mirror-country flags. E.g., if you are based in China, you can start minikube with:
minikube start --image-mirror-country=cn
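If that country mirror doesn't match your location, the repository can also be set explicitly to any registry that is reachable from your network; a sketch using one commonly cited mirror (substitute your own if needed):
minikube start --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers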
You're getting a connection timeout when trying to pull images.
"Get https://k8s.gcr.io/v2/: dial tcp [2404:6800:4008:c04::52]:443: i/o timeout"
Can you confirm that you're able to access the internet from within your minikube VM?
minikube ssh
ping google.com
You should see normal ping replies coming back; if the pings fail, the VM itself has no outbound internet access.

How to deal with error "dial tcp 10.240.0.4:10250: i/o timeout" to see pod's logs in AKS?

Before this, I could run kubectl logs <pod> without issue for many days/versions. However, after I pushed another image and deployed it recently, I ran into the error below:
Error from server: Get https://aks-agentpool-xxx-0:10250/containerLogs/default/<-pod->/<-service->: dial tcp 10.240.0.4:10250: i/o timeout
I tried to rebuild and redeploy, but it failed again.
Below was the Node info for reference:
I'm not sure whether your issue is caused by the problem described in this AKS troubleshooting guide, but maybe you can give it a try; it says:
Make sure that the default network security group isn't modified and that both port 22 and 9000 are open for connection to the API server.
Check whether the tunnelfront pod is running in the kube-system namespace using the kubectl get pods --namespace kube-system command. If it isn't, force deletion of the pod and it will restart.
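A sketch of those two checks; the pod name in the second command is a placeholder, replace it with whatever the first command returns:
kubectl get pods --namespace kube-system | grep tunnelfront
# force-delete the tunnelfront pod so that it gets recreated
kubectl delete pod tunnelfront-xxxxxxxxxx-xxxxx -n kube-system --grace-period=0 --force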

Kubernetes dashboard cannot be started

I created the dashboard after I installed Kubernetes with kubeadm.
kubectl create -f https://rawgit.com/kubernetes/dashboard/master/src/deploy/kubernetes-dashboard.yaml
After waiting a while, the pod crashed like this:
kubectl get pods --all-namespaces
kubernetes-dashboard-3203831700-wq0v4 0/1 CrashLoopBackOff 3 3m
And I checked the pod log:
kubectl logs -f kubernetes-dashboard-3203831700-wq0v4 -n kube-system
Using HTTP port: 9090
Creating API server client for https://10.96.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md
But when I tried it manually, the URL works:
# curl https://10.96.0.1:443/version
curl: (35) Peer reports incompatible or unsupported protocol version.
Has anybody encountered this issue before? Or can anyone help me?
I executed the following command:
rm -rf ~/.kube
Now it works. Still a bit strange :-(
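For anyone hitting the same timeout: one quick way to check whether the service IP is actually reachable from inside the pod network is to run a throwaway client pod; a sketch (the curl image is just a convenient assumption):
kubectl run nettest --rm -it --restart=Never --image=curlimages/curl -- curl -k https://10.96.0.1:443/version
If this times out from inside a pod but curl works from the node itself, the problem usually sits in the pod network (CNI) or iptables rules rather than in the API server.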