After some time my Kubernetes cluster does not work

I have a Kubernetes cluster. Everything worked fine, but after 8 days, when I run kubectl get pods, it shows:
The connection to the server <host>:6443 was refused - did you specify the right host or port?
I have one master and one worker.
I run them in my lab, without any cloud provider.
systemctl status kubelet
shows **node not found**.
I checked my /etc/hosts and it is correct.

I am short on hardware. I ran these commands to try to solve the issue:
sudo -i
swapoff -a
exit
strace -eopenat kubectl version

Most likely the servers were rebooted; I had a similar problem.
Check the kubelet logs on the master server and take action.
If you can share the kubelet logs, we will be able to offer further help.

A reboot itself should not be a problem, but if you did not disable swap permanently, a reboot will re-enable it and the API server will not launch; that would be my first guess (see the sketch after this answer for making the change permanent).
Second, check free disk space: the API server will not respond if the disk is full (the node will raise a disk-pressure condition and try to evict pods).
If that does not help, please add logs from the kubelet (systemctl and journalctl).
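For the swap part, a minimal sketch of making the change survive reboots, assuming swap is enabled through /etc/fstab (check the file before editing it):
sudo swapoff -a                                   # disable swap for the current boot
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab    # comment out swap entries so it stays off after a reboot
df -h                                             # while you are at it, check free disk space too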

Check /var/log/messages to get further information about the error,
or
systemctl status kubelet
Alternatively, journalctl will also show the details.

Related

Periodic problem with Kubernetes: "The connection to the server x.x.x.:6443 was refused - did you specify the right host or port?"

Good afternoon!
I am setting up my first k8s cluster :)
I set up a virtual machine in VMware, set up the control plane, connected a worker node, and configured kubectl on the master node and on a laptop (the machine VMware is installed on). I observe the following problem: periodically, every 2-5 minutes, the api-server stops responding, and any kubectl command (for example, kubectl get nodes) returns the error: "The connection to the server 192.168.131.133:6443 was refused - did you specify the right host or port?" A few minutes later everything recovers, and kubectl get nodes lists the nodes again. A few minutes after that, the same error appears. The error appears synchronously on the master node and on the laptop.
This is what it looks like (for about 10 minutes).
At the same time, if I execute the following commands on the master node
$ sudo systemctl stop kubelet
$ sudo systemctl start kubelet
everything is immediately restored. And after a few minutes, the same error comes back.
I would be grateful if you could help interpret these logs and tell me how to fix this problem.
kubectl logs at the time of the error (20:42:55):
(log output omitted)
I could imagine that the process on 192.168.131.133 is restarting, which leads to a connection refused while it is no longer listening on the API port.
You should start by investigating whether you can see any hardware issues.
Either CPU usage is climbing and causing a restart, or there is a memory leak.
You can check the running processes with
ps -ef
and use
top
to see CPU consumption.
There should also be some logs and events available in k8s.
It does not look like a connectivity issue, since you are receiving a clear failure back.
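A rough way to see whether the API server process keeps dying (a sketch, assuming a kubeadm-style cluster where kube-apiserver runs as a container on the master):
watch -n 5 'ps -eo pid,etime,rss,comm | grep [k]ube-apiserver'   # if the PID keeps changing, the process is crash-looping
sudo docker ps -a | grep kube-apiserver                          # exited API server containers also show up here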

Kubernetes liveness probe logging recovery

I am trying to test a liveness probe while learning kubernetes.
I have set up a minikube and configured a pod with a liveness probe.
Testing the script (e.g. via docker exec), it seems to report success and failure as required.
The probe leads to failure events, which I can view via kubectl describe pod <podname>,
but it does not report recovery from failures.
This answer says that liveness probe successes are not reported by default.
I have been trying to increase the log level with no success by running variations like:
minikube start --extra-config=apiserver.v=4
minikube start --extra-config=kubectl.v=4
minikube start --v=4
As suggested here & here.
What is the proper way to configure the logging level for a kubelet?
Can it be modified without restarting the pod or Minikube?
An event will be reported if a failure causes the pod to be restarted.
For Kubernetes itself, I understand that using the probe just to decide whether to restart the pod is sufficient.
Why aren't events recorded for recovery from a failure which does not require a restart?
This is how I would expect probes to work in a health monitoring system.
How would recovery be visible if the same probe were used in Prometheus or similar?
For an expensive probe I would not want it to be run multiple times.
(Granted, one probe could cache its output to a file, making the second probe cheaper.)
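For illustration, a minimal sketch of that caching idea, assuming a hypothetical expensive command called run-real-check and a 60-second cache window:
#!/bin/sh
# expensive-check.sh (hypothetical): run the real check at most once per minute,
# cache its exit code, and let the liveness probe (or anything else) read the cached result.
CACHE=/tmp/health-exit-code
NOW=$(date +%s)
if [ ! -f "$CACHE" ] || [ $((NOW - $(stat -c %Y "$CACHE"))) -gt 60 ]; then
    /usr/local/bin/run-real-check            # hypothetical expensive check
    echo $? > "$CACHE"
fi
exit "$(cat "$CACHE")"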
I have been trying to increase the log level with no success by running variations like:
minikube start --extra-config=apiserver.v=4
minikube start --extra-config=kubectl.v=4
minikube start --v=4
@Bruce, none of the options you mentioned will work, as they relate to other components of the Kubernetes cluster, and in the answer you referred to it was clearly said:
The output of successful probes isn't recorded anywhere unless your Kubelet has a log level of at least --v=4, in which case it'll be in the Kubelet's logs.
So you need to set --v=4 specifically for the kubelet. In the official docs you can see that it can be started with specific flags, including the one that changes the default verbosity level of its logs:
-v, --v Level
Number for the log level verbosity
The kubelet runs as a system service on each node, so you can check its status by simply issuing:
systemctl status kubelet.service
and if you want to see its logs, issue the command:
journalctl -xeu kubelet.service
Try:
minikube start --extra-config=kubelet.v=4
However, I'm not 100% sure whether Minikube is able to pass this parameter, so you'll need to verify it on your own. If it doesn't work, you should still be able to add it to the kubelet configuration file that specifies the parameters it is started with (don't forget to restart kubelet.service after submitting the changes; you simply need to run systemctl restart kubelet.service).
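On a regular (non-Minikube) node the same change can be sketched roughly as follows, assuming a kubeadm-provisioned node where extra kubelet flags live in /etc/default/kubelet (on RHEL-based systems the file is usually /etc/sysconfig/kubelet):
echo 'KUBELET_EXTRA_ARGS="--v=4"' | sudo tee /etc/default/kubelet   # note: this overwrites any existing extra args
sudo systemctl restart kubelet.service
journalctl -u kubelet.service -f                                    # the more verbose kubelet output shows up here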
Let me know if it helps and don't hesitate to ask additional questions if something is not completely clear.

Changing the CPU Manager Policy in Kubernetes

I'm trying to change the CPU Manager Policy for a Kubernetes cluster that I manage, as described here; however, I've run into numerous issues while doing so.
The cluster is running in DigitalOcean and here is what I've tried so far.
1. Since the article mentions that --cpu-manager-policy is a kubelet option, I assume that I cannot change it via the API server and have to change it manually on each node. (Is this assumption correct, BTW?)
2. I ssh into one of the nodes (droplets in DigitalOcean lingo) and run kubelet --cpu-manager-policy=static command as described in the kubelet CLI reference here. It gives me the message Flag --cpu-manager-policy has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
3. So I check the file pointed at by the --config flag by running ps aux | grep kubelet and find that it's /etc/kubernetes/kubelet.conf.
4. I edit the file and add a line cpuManagerPolicy: static to it, and also kubeReserved and systemReserved because they become required fields if specifying cpuManagerPolicy.
5. Then I kill the kubelet process and restart it. A couple of other things came up (delete this file, drain the node, etc.) that I was able to work through, and I was ultimately able to restart the kubelet.
I'm a little lost about the following things:
Do I need to do this for all nodes? My cluster has 12 of them, and doing all of these steps for each one seems very inefficient.
Is there any way I can set these parameters globally, i.e. cluster-wide, rather than going node by node?
How can I even confirm that what I did actually changed the CPU Manager policy?
One issue with Dynamic Configuration is that if the node fails to restart, the API does not give back a reasonable response telling you what you did wrong; you'll have to SSH into the node and tail the kubelet logs. Plus, you have to SSH into every node and set the --dynamic-config-dir flag anyway.
The following worked best for me:
SSH into the node and edit the kubelet unit file:
vim /etc/systemd/system/kubelet.service
Add the following lines:
--cpu-manager-policy=static \
--kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi \
--system-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi \
We need to set the --kube-reserved and --system-reserved flags because they're prerequisites to setting the --cpu-manager-policy flag.
Then drain the node and delete the CPU manager state file:
rm -rf /var/lib/kubelet/cpu_manager_state
Restart the kubelet
sudo systemctl daemon-reload
sudo systemctl stop kubelet
sudo systemctl start kubelet
Then uncordon the node and check the policy. This assumes that you're running kubectl proxy on port 8001:
curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | grep cpuManager
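Putting the whole per-node procedure together, it looks roughly like this (a sketch; NODE_NAME is a placeholder, the kubelet edits happen on the node itself, and drain/uncordon run from wherever kubectl is configured):
NODE_NAME="node-being-reconfigured"                 # placeholder name
kubectl drain "$NODE_NAME" --ignore-daemonsets      # move workloads off the node first
# on the node: edit /etc/systemd/system/kubelet.service as shown above, then
sudo rm -f /var/lib/kubelet/cpu_manager_state       # stale state is invalid once the policy changes
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# back on the workstation:
kubectl uncordon "$NODE_NAME"
curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | grep cpuManager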
If you use a newer k8s version and the kubelet is configured by a kubelet configuration file, e.g. config.yml, you can just follow the same steps @satnam mentioned above, but instead of adding --kube-reserved, --system-reserved, and --cpu-manager-policy, you need to add kubeReserved, systemReserved, and cpuManagerPolicy to your config.yml. For example:
systemReserved:
  cpu: "1"
  memory: "100Mi"
kubeReserved:
  cpu: "1"
  memory: "100Mi"
cpuManagerPolicy: "static"
Meanwhile, be sure the CPUManager feature gate is enabled.
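One additional way to confirm the policy on the node itself (an assumption based on how the kubelet persists its CPU manager state; the exact file format varies between versions):
sudo cat /var/lib/kubelet/cpu_manager_state   # should mention "policyName":"static" once the change has taken effect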
It might not be the global way of doing stuff, but I think it will be much more comfortable than what you are currently doing.
First you need to run
kubectl proxy --port=8001 &
Download the configuration:
NODE_NAME="the-name-of-the-node-you-are-reconfiguring"; curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' > kubelet_configz_${NODE_NAME}
Edit it accordingly and push the configuration to the control plane as a ConfigMap. You will see a valid response if everything went well. Then you will have to edit the Node object so it starts using the new ConfigMap. There are many more possibilities; for example, you can go back to the default settings if anything goes wrong.
This process is described with all the details in this documentation section.
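For reference, the push step from that procedure looks roughly like this (a sketch based on the Dynamic Kubelet Configuration workflow; note that recent Kubernetes releases have removed this feature, so check that your version still supports it):
# Push the edited file to the control plane as a ConfigMap (a hash suffix is appended to the name):
kubectl -n kube-system create configmap my-node-config \
    --from-file=kubelet=kubelet_configz_${NODE_NAME} --append-hash
# Then point the Node at it by setting spec.configSource, e.g. via kubectl edit node ${NODE_NAME}:
#   configSource:
#     configMap:
#       name: my-node-config-<hash printed by the previous command>
#       namespace: kube-system
#       kubeletConfigKey: kubelet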
Hope this helps.

Kubernetes is Down

All of a sudden, I get this error when I run kubectl commands
The connection to the server xx.xx.xxx.xx:6443 was refused - did you specify the right host or port?
I can see kubelet running, but none of the other Kubernetes related services are running.
docker ps -a | grep kube-api
returns nothing.
What I did after searching for a resolution on Google:
Turned off swap --> the issue persists.
Restarted the Linux machine --> right after the restart, I could see kubectl commands giving output, but after about 15 seconds it went back to the error mentioned above.
Restarted the kubelet --> for about a second I could see output from kubectl commands, but then it was back to square one.
I'm not sure what exactly I'm supposed to do here.
NB: the K8s cluster is installed with kubeadm.
Also, I can see pods in the Evicted state during the brief windows when kubectl get pods does return output.
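Evicted pods usually point at node pressure, so a rough first round of checks on the master would look something like this (a sketch, assuming a kubeadm cluster with Docker as the runtime):
df -h                                             # disk pressure is a common cause of evictions
free -m                                           # memory pressure is the other usual suspect
sudo journalctl -u kubelet --since "10 min ago"   # look for eviction, disk-pressure or connection-refused messages
sudo docker ps -a | grep kube-apiserver           # see whether the API server container keeps exiting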

Failed to create pod sandbox kubernetes cluster

I have a Weave network plugin.
Inside my folder /etc/cni/net.d there is a 10-weave.conf:
{
  "name": "weave",
  "type": "weave-net",
  "hairpinMode": true
}
My Weave pods are running and the DNS pod is also running.
But when I want to run a pod, like a simple nginx which will pull an nginx image,
the pod gets stuck at ContainerCreating, and describing the pod gives me the error "failed to create pod sandbox".
When I run journalctl -u kubelet I get this error:
cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Is my network plugin not configured correctly?
I used this command to configure my Weave network:
kubectl apply -f https://git.io/weave-kube-1.6
After this didn't work I also tried this command:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
I even tried Flannel, and that gives me the same error.
The system I am setting Kubernetes up on is a Raspberry Pi.
I am trying to build a Raspberry Pi cluster with 3 nodes and 1 master running Kubernetes.
Does anyone have ideas on this?
Thank you all for responding to my question. I have solved my problem now. For anyone who comes to this question in the future, the solution was as follows.
I had cloned my Raspberry Pi images because I wanted a basicConfig.img for when I needed to add a new node to my cluster or when one went down.
Weave Net (the plugin I used) got confused because the OS on every node and on the master had the same machine-id. When I deleted the machine-id and created a new one (and rebooted the nodes) my error was fixed. The commands to do this were:
sudo rm /etc/machine-id
sudo rm /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure=/etc/machine-id
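To confirm the fix, you can compare the regenerated IDs across the nodes; each one should now be different (a sketch with hypothetical hostnames):
for host in master node1 node2 node3; do     # hypothetical hostnames of the Pis
    echo "== $host =="
    ssh "$host" cat /etc/machine-id          # every node should print a unique ID
done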
Once again my patience was tested, because my Kubernetes setup was normal and my Raspberry Pi OS was normal. I found this with the help of someone in the Kubernetes community, which again shows how important and great the IT community is. To the people of the future who come to this question: I hope this solution fixes your error and reduces the amount of time you spend searching for such a small thing.
Looking at the pertinent code in Kubernetes and in CNI, the specific error you see seems to indicate that it cannot find any files ending in .json, .conf or .conflist in the directory given.
This makes me think it could be something like the conf file not being present on all the hosts, so I would verify that as a first step.
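A quick way to run that check (the kubelet looks for a *.conf, *.conflist or *.json file in this directory):
ls -l /etc/cni/net.d/      # run this on every host; an empty directory explains the "No networks found" error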