k8s (single-node) not working after restart - kubernetes

I installed Kubernetes on Ubuntu 16.04 (VirtualBox VM): a single node with master tainted. It worked well, but after I restarted the VM it stopped working.
kubectl commands no longer work and throw this error:
The connection to the server localhost:8001 was refused - did you specify the right host or port?
It looks similar to this thread, but the solution is not working for me.
When I run "sudo docker ps -a", all kube pods show as Exited.
Any help or pointers, please? Thanks in advance.

I've been having the same issue with my Rancher 2 setup. I have two nodes in one cluster. One of my node servers was restarted and never reconnected to the cluster, even though Docker and the containers were running fine.
One of the things I tried was reducing the number of workloads that can run on one node. I had increased it to 400, so I put it back to 100. That's when I got my first breakthrough on what could be happening with my downed node. I got the error "Path /var/lib/docker is mounted on / but it is not a shared or slave mount." A quick search led me to a similar issue on the Rancher GitHub page. A workaround by superseb fixed my issue. I SSHed into my node and ran:
> mount --make-rshared /
> docker start kubelet
Your issue might be different, but you could be hitting this same shared-mount problem.
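Before applying the workaround, it may help to confirm that the root mount really is neither shared nor slave. The following is a minimal sketch, assuming findmnt (from util-linux) is available on the node; the helper names is_shared_or_slave and root_propagation are made up for illustration and are not part of Rancher or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of any tool mentioned above.

# Succeeds when a propagation string (as printed by findmnt) names a
# shared or slave mount, e.g. "shared" or "private,slave".
is_shared_or_slave() {
  case "$1" in
    *shared*|*slave*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the propagation mode of a mount point (defaults to /).
root_propagation() {
  findmnt -n -o PROPAGATION "${1:-/}"
}

# Usage on the affected node (commented out so nothing is remounted by accident):
#   if ! is_shared_or_slave "$(root_propagation /)"; then
#     sudo mount --make-rshared /
#     sudo docker start kubelet
#   fi
```

Note that a remount done this way does not persist across reboots, so the check may need to be repeated after the next restart.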

Related

Kubectl connection refused existing cluster

Hope someone can help me.
To describe the situation in short, I have a self managed k8s cluster, running on 3 machines (1 master, 2 worker nodes). In order to make it HA, I attempted to add a second master to the cluster.
After some failed attempts, I found out that I needed to add controlPlaneEndpoint configuration to kubeadm-config config map. So I did, with masternodeHostname:6443.
I generated the certificate and join command for the second master, and after running it on the second master machine, it failed with
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
Checking the first master now, I get connection refused for the IP on port 6443. So I cannot run any kubectl commands.
Tried recreating the .kube folder, with all the config copied there, no luck.
Restarted kubelet, docker.
The containers running on the cluster seem ok, but I am locked out of any cluster configuration (dashboard is down, kubectl commands not working).
Is there any way I can make it work again without losing any of the configuration or the deployments already present?
Thanks! Sorry if it’s a noob question.
Cluster information:
Kubernetes version: 1.15.3
Cloud being used: bare-metal
Installation method: kubeadm
Host OS: RHEL 7
CNI and version: weave 0.3.0
CRI and version: containerd 1.2.6
This is an old, known problem with Kubernetes 1.15 [1,2].
It is caused by a short etcd timeout period. As far as I'm aware, it is a hard-coded value in the source and cannot be changed (a feature request to make it configurable is open for version 1.22).
Your best bet would be to upgrade to a newer version and recreate your cluster.

Unable to connect to the server: net/http: TLS handshake timeout

On minikube for Windows, I created a deployment on the Kubernetes cluster, then tried to scale it by changing replicas from 1 to 2. After that, kubectl hangs and my disk usage is at 100%.
I only have one container in my deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: first-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      run: app
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: demo
        image: ner_app
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
All I did was run this after the pods were successfully deployed and running:
kubectl scale --replicas=2 deployment first-deployment
In another terminal I was watching the pods using
kubectl get pods --watch
But everything is unresponsive and I'm not sure how to recover from this.
When I run kubectl get pods again it gives the following message
PS D:\docker\ner> kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
Is there a way to recover, or cancel whatever process is running?
Also, my VMs are on Hyper-V for Windows 10 Pro (minikube and Docker Desktop); both have the default RAM allocated, 2048MB.
The container in my pod is a machine learning process, and the model it loads could be large, on the order of 200MB to 300MB.
You may have some proxy problems. Try the following commands:
$ unset http_proxy
$ unset https_proxy
and repeat your kubectl call.
For me, the problem was that Docker ran out of memory. (EDIT: Possibly, anyway; I wrote this post a while ago and am now not so sure that is the root cause, but I did not write down my rationale, so I don't know.)
Anyway, to fix:
Fully close your k8s emulator. (docker desktop, minikube, etc.)
Shutdown WSL2. (wsl --shutdown) [EDIT: This step is apparently not necessary -- at least not always, since this time I skipped it, and the problem still resolved.]
Restart your k8s emulator.
Rerun the commands you wanted.
Sometimes it also works to simply:
Right click the Docker Desktop tray-icon, press "Restart Docker", and wait a few minutes for things to restart. (sometimes this fails, with Docker Desktop saying "Docker failed to start", so I'd generally recommend the more thorough process above)
Just happened to me on a new Windows 10 install with Ubuntu distro in WSL2. I solved the problem by running:
$ sudo ifconfig eth0 mtu 1350
(BTW, I was on a VPN connection when trying the 'kubectl get pods' command)
You can set up resource limits on deployments so that pods will not use the entire available resource in the node.
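As a rough illustration of that suggestion, the sketch below builds a kubectl set resources command for the deployment from the question. The 256Mi request and 512Mi limit are assumed values chosen only for the example (the question mentions a 200MB to 300MB model on a 2048MB VM), and resource_limit_cmd is a hypothetical helper that prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) a kubectl command that sets
# a memory request and limit on every container in a deployment.
resource_limit_cmd() {
  local deployment="$1" request="$2" limit="$3"
  printf 'kubectl set resources deployment %s --requests=memory=%s --limits=memory=%s\n' \
    "$deployment" "$request" "$limit"
}

# Assumed values for illustration only; size them to your own workload.
resource_limit_cmd first-deployment 256Mi 512Mi
```

Printing the command first makes it easy to review before pasting it into a terminal; a limit set too low would get the container OOM-killed instead of starving the node.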
In my case I have a private EKS cluster, and port 443 (HTTPS) was not allowed in the security groups.
The issue was solved after I opened port 443 (HTTPS) in the security groups.
Please refer to the AWS documentation for more details: "You must ensure that your Amazon EKS control plane security group contains rules to allow ingress traffic on port 443 from your connected network"
I solved this problem by running the following command:
minikube delete
and then starting it again:
minikube start --vm-driver="virtualbox"
Note that doing it this way deletes your pods, so when you run kubectl get pods you will see this result:
No resources found in default namespace.
You could try $ unset all_proxy to reset the socket proxy.
Also, if you're connected to a VPN, try disconnecting - it seems that can interfere with connecting to a cluster.
I think the other answers don't really mention or refer to the vpn and proxy documentation for minikube: https://minikube.sigs.k8s.io/docs/handbook/vpn_and_proxy/
The NO_PROXY variable here is important: without setting it, minikube may not be able to access resources within the VM. minikube uses four IP ranges, which should not go through the proxy:
192.168.99.0/24: Used by the minikube VM. Configurable for some hypervisors via --host-only-cidr
192.168.39.0/24: Used by the minikube kvm2 driver.
192.168.49.0/24: Used by the minikube docker driver’s first cluster.
10.96.0.0/12: Used by service cluster IPs. Configurable via --service-cluster-ip-range
So adding those IP ranges to your NO_PROXY environment variable should fix the issue.
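One way to add those ranges without clobbering an existing value could look like the sketch below. The add_no_proxy helper is hypothetical (it is not part of minikube), and the ranges are the ones listed above, so adjust them if you changed --host-only-cidr or the service CIDR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: appends entries to a comma-separated NO_PROXY value,
# skipping any entry that is already present.
add_no_proxy() {
  local current="$1"; shift
  local entry
  for entry in "$@"; do
    case ",$current," in
      *",$entry,"*) ;;                              # already listed, skip
      *) current="${current:+$current,}$entry" ;;   # append with a comma
    esac
  done
  printf '%s\n' "$current"
}

# Ranges taken from the list above; localhost entries are an assumption.
NO_PROXY="$(add_no_proxy "${NO_PROXY:-}" localhost 127.0.0.1 \
  10.96.0.0/12 192.168.99.0/24 192.168.39.0/24 192.168.49.0/24)"
export NO_PROXY
export no_proxy="$NO_PROXY"   # some tools read only the lowercase name
```

The variable has to be set in the same shell that runs minikube start and kubectl, so putting these lines in a shell profile is usually more reliable than typing them once.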
Simply closing cmd, opening again, then
minikube start
And then executing the commands again solved this issue for me.
P.S: minikube start took less than a minute
Adding the IP address to the no_proxy list worked for me.
Obtain the IP address from ip addr output.
export no_proxy=localhost,127.0.0.1,<IP_ADDRESS>
Restarting minikube should then work.
But if you don't want to delete the cluster, you can just switch to another cluster and then switch back.
I just clicked another Kubernetes cluster (e.g. docker-desktop) and then clicked back to the cluster I wanted to run (e.g. minikube).
If you're on Linux or Mac, open VirtualBox and choose 'Global Tools' on the toolbar. If you see two machines using the same IP address, remove one of them.
This answer comes up first when searching for the net/http TLS handshake timeout error, so for those having the issue with AWS EKS (and likely any K8s):
Adding the related IP/host to the NO_PROXY environment variable solves the problem, as suggested in the comments on the first answer.
For AWS EKS (when seeing this intermittently after a vpc-cni addon upgrade), replace the value with your specific region or a single URL for your use case:
NO_PROXY=$NO_PROXY,eks.amazonaws.com
At least for Windows 10 and 11
PS C:\> oc rollback dc/my-app
Unable to connect to the server: net/http: TLS handshake timeout
For OpenShift 4.x, the problem is that for some reason you have been logged out:
PS C:\> oc status
error: You must be logged in to the server (Unauthorized)
Logging in, e.g. with
oc login -u developer
resolves the problem.
Open PowerShell as an administrator and run the command "wsl --shutdown". You will see the same notification in your open Ubuntu terminal.
Open Docker Desktop.
Open a new terminal.
Run the command "minikube status" in the Ubuntu terminal.
Run the Minikube container. You can do this in Docker Desktop.
Run the command "minikube start".
That's it! You don't need to close your computer after this, and Minikube should work fine.

Unable to setup multiple node kubernetes cluster using kubeadm (Vagrant)

I have been setting up a multi-node Kubernetes cluster using kubeadm. The setup includes 1 master and 1 worker node. I created the VMs using Vagrant.
I followed the docs,
https://kubernetes.io/docs/setup/independent/install-kubeadm/
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm
Created 2 VMs using Vagrant
IP: Master- 192.168.33.10 , Worker- 192.168.1.21 (Both host only network)
I have experienced 2 scenarios,
Case 1:
Ran kubeadm init --pod-network-cidr=10.244.0.0/16 successfully with all pods running.
Installed "Canal" pod network add on.
Followed all the instructions given at the end of the successful kubeadm init command.
SSH'd into the 2nd VM and ran the kubeadm join .. command, and I am stuck at "[preflight] Running pre-flight checks"
Case 2:
Did the same process with the flag --apiserver-advertise-address=192.168.33.10
Successfully ran the command kubeadm init --apiserver-advertise-address=192.168.33.10
But when I ran the command kubectl get nodes it only showed the master node. (expected the worker node to show too).
Kindly help me understand how I can complete this setup. Thank you.
I have a GitHub repository which does exactly what you want. I am pretty sure you will get the idea from it. If anything is not clear, please leave a comment or update the original post.

Failed to create pod sandbox kubernetes cluster

I have the Weave network plugin.
Inside my folder /etc/cni/net.d there is a 10-weave.conf:
{
  "name": "weave",
  "type": "weave-net",
  "hairpinMode": true
}
My weave pods are running and the DNS pod is also running.
But when I try to run a pod, such as a simple nginx that pulls an nginx image, the pod gets stuck at ContainerCreating, and describe pod gives me the error "failed create pod sandbox".
When I run journalctl -u kubelet I get this error:
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Is my network plugin not configured correctly?
I used this command to configure my weave network:
kubectl apply -f https://git.io/weave-kube-1.6
After this didn't work I also tried this command:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
I even tried flannel, and that gives me the same error.
The system I am setting Kubernetes up on is a Raspberry Pi.
I am trying to build a Raspberry Pi cluster with 3 nodes and 1 master with Kubernetes.
Does anyone have ideas on this?
Thank you all for responding to my question. I have solved my problem now. For anyone who comes to this question in the future, the solution was as follows.
I had cloned my Raspberry Pi images because I wanted a basicConfig.img for when I needed to add a new node to my cluster or when one goes down.
Weave network (the plugin I used) got confused because the OS had the same machine-id on every node and the master. When I deleted the machine-id and created a new one (and rebooted the nodes), my error was fixed. The commands to do this were:
sudo rm /etc/machine-id
sudo rm /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure=/etc/machine-id
Once again my patience was being tested, because my Kubernetes setup was normal and my Raspberry Pi OS was normal. I found this with the help of someone in the Kubernetes community. This again shows how important and great our IT community is. To the people of the future who come to this question: I hope this solution fixes your error and reduces the time you spend searching for a stupid small thing.
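To check whether cloned images left several nodes with the same machine-id, something like the sketch below could be used. It assumes SSH access to each node; the host names master, node1 and node2 are placeholders, and duplicate_machine_ids is a hypothetical helper rather than an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "hostname machine-id" pairs on stdin and prints
# every machine-id that is reported by more than one host.
duplicate_machine_ids() {
  awk '{ count[$2]++ } END { for (id in count) if (count[id] > 1) print id }' | sort
}

# Usage with placeholder host names (commented out, needs SSH access):
#   for h in master node1 node2; do
#     printf '%s %s\n' "$h" "$(ssh "$h" cat /etc/machine-id)"
#   done | duplicate_machine_ids
```

Any id printed here is shared by at least two hosts, which is the situation the machine-id commands above are meant to fix.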
Looking at the pertinent code in Kubernetes and in CNI, the specific error you see seems to indicate that it cannot find any files ending in .json, .conf or .conflist in the directory given.
This makes me think it could be something like the conf file not being present on all the hosts, so I would verify that as a first step.
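That first verification step could be sketched as follows, to be run on each host. The function name list_cni_configs is made up for this example; the extensions it looks for are the three named above, and the default directory is the one from the kubelet error message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists CNI config files (.conf, .conflist, .json) in a
# directory and returns non-zero when none are found.
list_cni_configs() {
  local dir="${1:-/etc/cni/net.d}" found=1 f
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    if [ -f "$f" ]; then      # skips patterns that matched nothing
      printf '%s\n' "$f"
      found=0
    fi
  done
  return "$found"
}

# Usage on a node:
#   list_cni_configs || echo "no CNI config in /etc/cni/net.d on $(hostname)"
```

If the file is present on every host and the error persists, the cause is probably elsewhere, as it turned out to be in the accepted answer.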

Kubernetes cluster using Vagrant not working after restart

I installed a Kubernetes cluster by following the instructions here:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/getting-started-guides/vagrant.md
Everything looks fine the first time. I'm able to see the nodes, pods, deploy new pods, etc.
The problem shows up when I stop the cluster and try to start it again. I'm restarting the cluster as indicated in the documentation:
vagrant halt
./cluster/kube-up.sh
When I do that I see the following error:
Comment: Source file salt://kubelet/kubeconfig not found
...
Minion did not return. [No response]
Then, when I check the status of nodes it says the minion is NotReady.
If I have VirtualBox open while I run kube-up.sh, I see that the error is thrown before the minion VM is started. So it sounds like the minion is not running when it tries to configure it. That's just an observation, not sure what's the problem.
To work around this issue I have to destroy the cluster and create it again, which downloads and installs everything again, making it very slow to use.
I found this problem on GitHub:
https://github.com/GoogleCloudPlatform/kubernetes/issues/9270
Here it was suggested to use the code in HEAD. I did that and now it is working fine.