Kubelet's failing start attempts pollute the logs - CentOS

I have a bunch of fresh CentOS servers installed on AWS. The kubelet service pollutes the log file (/var/log/messages) with its attempts to start, but as I have no use for it, I would like to remove it. Is this an optional component of CentOS that I can safely remove (or disable kubelet.service)? I believe so, but I would not expect a brand new server to push out so many errors.
Currently, 97% of my /var/log/messages logs contain rows like:
Jan 17 03:21:03 systemd: Started kubelet: The Kubernetes Node Agent.
Jan 17 03:21:03 kubelet: F0117 03:21:03.101812 29626 server.go:198] failed to load Kubelet
config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file
"/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or
directory
***da da da, 40 more rows***
Jan 17 03:21:03 systemd: Unit kubelet.service entered failed state.
Jan 17 03:21:03 systemd: kubelet.service failed.
Jan 17 03:21:13 systemd: kubelet.service holdoff time over, scheduling restart.
Jan 17 03:21:13 systemd: Stopped kubelet: The Kubernetes Node Agent.
Jan 17 03:21:13 systemd: Started kubelet: The Kubernetes Node Agent.
***sleep for 10s and start all over***

As I have already mentioned in my comment, kubelet is part of a Kubernetes cluster: it is the primary node agent that runs on each node. I sincerely doubt that this CentOS image came with it preinstalled. If it really did, and as you said it is a "fresh CentOS server" that nobody had previously tinkered with, I would recommend choosing a different image if your servers have nothing to do with a Kubernetes cluster. However, if it serves as some kind of production environment and runs other important things, you should investigate how kubelet was installed and simply remove it.
I did not do the setup myself, but the template used is
258751437250/ami-centos-7-1.13.0-00-1543960911. We have not asked for
Kubernetes on it and are not using clusters
The simplest answer to your question is:
You can safely stop and disable it so it doesn't pollute your /var/log/messages any more:
sudo systemctl stop kubelet.service && sudo systemctl disable kubelet.service
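If you want to double-check that the restart loop is gone, and optionally make sure nothing re-enables the service later, a short sketch:
systemctl is-enabled kubelet.service   # should now report "disabled"
systemctl is-active kubelet.service    # should report "inactive" once it has been stopped
sudo systemctl mask kubelet.service    # optional: prevents anything from starting it again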
You can also remove it. Depending on how it was installed, you may need to do it in a specific way.
First check:
yum list installed | grep kubelet
If it's there you can:
yum remove kubelet
If it doesn't return any result you may try:
rpm -qa | grep kubelet
and if anything is found, remove it:
rpm -e kubelet
It may also be a remnant of an old Kubernetes installation that was set up with a tool like minikube or kubeadm. To check that, run:
sudo systemctl cat kubelet.service
and take a look at the ExecStart section. Depending on what you find there, it's very likely you'll need to uninstall some other unnecessary components, e.g. if you find something like /var/lib/minikube/binaries/v1.16.0/kubelet, it means kubelet is part of a minikube installation.
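As a follow-up check (a sketch; the path below is just an example and should be replaced with whatever your ExecStart actually points to), you can ask the package manager which package, if any, owns that binary:
rpm -qf /usr/bin/kubelet   # prints the owning RPM, or "not owned by any package"
# if no RPM owns it, the binary was most likely dropped there by a tool such as minikube or an AMI build script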
Chances are it was even partially uninstalled, but there are still some leftovers. As you can see, even its config file cannot be found:
error: open /var/lib/kubelet/config.yaml: no such file or
directory
In case of any doubts or additional questions, don't hesitate to ask.

Related

Ceph Manual Deployment no ceph -s output after mon installs (Nautilus)

I'm trying to build a cluster to test things before I apply them to our production cluster. We're using Ceph Nautilus, so I decided to install Nautilus first as well.
Used the docs below:
https://docs.ceph.com/en/latest/install/manual-deployment/
Everything seemed to go fine. I installed 3 monitors, generated the monmap, copied keyrings to the other monitors, and started the services; they are all up. But when I type ceph -s to check the cluster status, it just gets stuck forever without any output. Any command that uses the word "ceph" in it just gets stuck. As a result I can't continue to build the cluster, since I need to be able to use ceph commands after the monitor deployment to install other services.
Systemctl outputs are the same for all 3 monitors in the current state:
[root@mon2 ~]# systemctl status ceph-mon@mon2
● ceph-mon@mon2.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-04-28 09:55:24 +03; 25min ago
Main PID: 4725 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@mon2.service
└─4725 /usr/bin/ceph-mon -f --cluster ceph --id mon2 --setuser ceph --setgroup ceph
Apr 28 09:55:24 mon2 systemd[1]: Started Ceph cluster monitor daemon.
Resolved: the problem was caused by missing firewalld and SELinux configuration. After applying those settings and restarting the deployment process, my issue was solved.
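For reference, the firewall part can look roughly like this (a sketch, assuming firewalld on CentOS; run it on every monitor host):
sudo firewall-cmd --permanent --add-service=ceph-mon   # opens the Ceph monitor port(s)
sudo firewall-cmd --reload
# SELinux: either configure the daemons per the Ceph docs, or relax enforcement while testing
sudo setenforce 0   # temporary; /etc/selinux/config controls the persistent setting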

The connection to the server xxxx:6443 was refused - did you specify the right host or port?

I followed this to install Kubernetes on my cloud.
When I run the command kubectl get nodes I get this error:
The connection to the server localhost:6443 was refused - did you specify the right host or port?
How can I fix this?
If you followed only the mentioned docs, it means that you have only installed kubeadm, kubectl and kubelet.
If you want to run kubeadm properly, you need to complete 3 more steps.
1. Install docker
Install the Docker Ubuntu version. If you are using another system, choose it from the menu on the left. (A minimal install sketch is shown after the error output below.)
Why:
If you do not install Docker, you will receive an error like the one below:
[preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
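For Ubuntu, a minimal sketch using the docker.io package from Ubuntu's own repositories (installing Docker's docker-ce packages per the linked docs is an equally valid route):
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker
sudo docker version   # quick check that the daemon is running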
2. Initialization of kubeadm
You have properly installed kubeadm and Docker, but now you need to initialize kubeadm. Docs can be found here.
In short, you have to run the command:
$ sudo kubeadm init
After initialization you will receive information to run commands like:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
and a token for joining other VMs to the cluster. It looks like:
kubeadm join 10.166.XX.XXX:6443 --token XXXX.XXXXXXXXXXXX \
--discovery-token-ca-cert-hash sha256:aXXXXXXXXXXXXXXXXXXXXXXXX166b0b446986dd05c1334626aa82355e7
If you want to run some special action in the init phase, please check these docs.
3. Change node status to Ready
After the previous step you will be able to execute:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu-kubeadm NotReady master 4m29s v1.16.2
But your node will be in NotReady status. If you describe it with $ kubectl describe node you will see the error:
Ready False Wed, 30 Oct 2019 09:55:09 +0000 Wed, 30 Oct 2019 09:50:03 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
It means that you have to install one of the CNI plugins. A list of them can be found here.
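For example (a sketch; Flannel is just one of the available CNI plugins, and the manifest URL below is the pinned one that also appears in the Ansible task further down this page):
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
kubectl get nodes   # the node should move to Ready once the CNI pods are up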
EDIT
One more thing comes to mind.
Sometimes when you turn the VM off and on, you need to restart the kubelet and docker services. You can do it by using:
$ service docker restart
$ systemctl restart kubelet
Hope it helps.
Looks like the kubeconfig file is missing. Did you copy the admin.conf file to ~/.kube/config?
Check whether any proxies are set, such as "http_proxy" or "https_proxy"; these are usually set as environment variables. If so, remove the proxies and it should work for you.
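A quick way to check and clear them for the current session (a sketch):
env | grep -i proxy   # see whether any proxy variables are set
unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
# alternatively, add the API server address to NO_PROXY instead of removing the proxy entirely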
I did the following 2 steps, and kubectl works now:
$ service docker restart
$ systemctl restart kubelet

Joining cluster takes forever

I have set up my master node and I am trying to join a worker node as follows:
kubeadm join 192.168.30.1:6443 --token 3czfua.os565d6l3ggpagw7 --discovery-token-ca-cert-hash sha256:3a94ce61080c71d319dbfe3ce69b555027bfe20f4dbe21a9779fd902421b1a63
However the command hangs forever in the following state:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Since this is just a warning, why does it actually fail?
edit: I noticed the following in my /var/log/syslog
Mar 29 15:03:15 ubuntu-xenial kubelet[9626]: F0329 15:03:15.353432 9626 server.go:193] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Mar 29 15:03:15 ubuntu-xenial systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Mar 29 15:03:15 ubuntu-xenial systemd[1]: kubelet.service: Unit entered failed state.
First, if you want to see more detail when your worker joins the master, use:
kubeadm join 192.168.1.100:6443 --token m3jfbb.wq5m3pt0qo5g3bt9 --discovery-token-ca-cert-hash sha256:d075e5cc111ffd1b97510df9c517c122f1c7edf86b62909446042cc348ef1e0b --v=2
Using the above command I could see that my worker could not establish a connection with the master, so I just stopped the firewall:
systemctl stop firewalld
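If you'd rather not disable the firewall completely, a narrower option (a sketch, assuming firewalld) is to open the ports kubeadm needs:
sudo firewall-cmd --permanent --add-port=6443/tcp    # Kubernetes API server
sudo firewall-cmd --permanent --add-port=10250/tcp   # kubelet API
sudo firewall-cmd --reload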
This can be solved by creating a new token
using this command:
kubeadm token create --print-join-command
and using the generated token to join other nodes to the cluster.
The problem had to do with kubeadm not installing a CNI-compatible networking solution out of the box.
Therefore, without this step the Kubernetes nodes/master are unable to establish any form of communication.
The following Ansible task addressed the issue:
- name: kubernetes.yml --> Install Flannel
  shell: kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
  become: yes
  environment:
    KUBECONFIG: "/etc/kubernetes/admin.conf"
  when: inventory_hostname in (groups['masters'] | last)
I got the same error on CentOS 7, but in my case the join command worked without problems, so it was indeed just a warning.
> [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
> [preflight] Reading configuration from the cluster...
> [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
> [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
As the official documentation mentions, there are two common issues that make the init hang (I guess this also applies to the join command); the relevant one here is:
the default cgroup driver configuration for the kubelet differs from that used by Docker. Check the system log file (e.g. /var/log/message) or examine the output from journalctl -u kubelet.
First try the steps from the official documentation (a sketch of the cgroup driver fix is shown below), and if that does not work, please provide more information so we can troubleshoot further.
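If the cgroup driver mismatch from that warning turns out to be the culprit, one commonly used fix (a sketch, assuming Docker as the container runtime; note that this overwrites any existing /etc/docker/daemon.json) is to switch Docker to the systemd driver and restart the services:
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl restart docker
sudo systemctl restart kubelet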
I had a bunch of k8s deployment scripts that broke recently with this same error message... it looks like Docker changed its install. Try this:
previous install:
apt-get install docker-ce
updated install:
apt-get install docker-ce docker-ce-cli containerd.io
How is /var/lib/kubelet/config.yaml created?
Regarding the /var/lib/kubelet/config.yaml: no such file or directory error.
Below are steps that should occur on the worker node in order for the mentioned file to be created.
1) The creation of the /var/lib/kubelet/ folder. It is created when the kubelet service is installed, as mentioned here:
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
2) The creation of config.yaml. The kubeadm join flow has to take place: when you run kubeadm join, kubeadm uses the Bootstrap Token credential to perform a TLS bootstrap, which fetches the credential needed to download the kubelet-config-1.X ConfigMap and writes it to /var/lib/kubelet/config.yaml.
After a successful execution you should see the logs below:
.
.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
.
.
So, after these 2 steps you should have /var/lib/kubelet/config.yaml in place.
Failure of the kubeadm join flow
In your case, it seems that the kubeadm join flow failed, which might happen for multiple reasons such as a bad iptables configuration, ports that are already in use, a container runtime that is not installed properly, etc., as described here and here.
As far as I know, the fact that no CNI-compatible networking solution was in place should not affect the creation of /var/lib/kubelet/config.yaml:
A) We can see under the kubeadm preflight checks which issues will cause the join phase to fail.
B) I also tested this by removing the solution I currently use (Calico), then running kubeadm reset and kubeadm join again: no errors appeared in the kubeadm logs (I got the successful execution logs mentioned above) and /var/lib/kubelet/config.yaml was created properly.
(*) Of course the cluster can't function in this state; I just wanted to emphasize that I think the problem was one of the options mentioned in A.
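A quick way to verify this on the worker node, and to repeat the join once the underlying cause is fixed (a sketch; the address, token and hash below are placeholders):
ls -l /var/lib/kubelet/config.yaml   # a missing file means the join flow never completed
sudo kubeadm reset                   # clean up the partial state
sudo kubeadm join <control-plane-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>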

How to pass --pod-manifest-path to the kubelet quickly, without creating a new configuration file?

Running kubelet --pod-manifest-path=/newdir returns errors.
It's not clear to me where I can add --pod-manifest-path to a systemd unit file on Ubuntu. I know that for v1.12 there is the KubeletConfiguration type, but I am using v1.11.
You can find in documentation:
Configure your kubelet daemon on the node to use this directory by running it with the --pod-manifest-path=/etc/kubelet.d/ argument. On Fedora, edit /etc/kubernetes/kubelet to include this line:
KUBELET_ARGS="--cluster-dns=10.254.0.10 --cluster-domain=kube.local --pod-manifest-path=/etc/kubelet.d/"
Instructions for other distributions or Kubernetes installations may vary.
Restart kubelet. On Fedora, this is:
[root@my-node1 ~]$ systemctl restart kubelet
If you want to use --pod-manifest-path you can define it in the kubelet configuration.
Usually it is stored in /etc/kubernetes/kubelet, /etc/default/kubelet, or /etc/systemd/system/kubelet.service.
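On a kubeadm-style Ubuntu install whose unit passes $KUBELET_EXTRA_ARGS to the kubelet, one way to add the flag is a systemd drop-in (a sketch; the drop-in file name is arbitrary, and the KUBELET_EXTRA_ARGS assumption should be verified with systemctl cat kubelet):
sudo mkdir -p /etc/systemd/system/kubelet.service.d /etc/kubelet.d
cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/20-pod-manifest.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--pod-manifest-path=/etc/kubelet.d/"
EOF
sudo systemctl daemon-reload
sudo systemctl restart kubelet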

Zend Server CE messing with Apache?

I am starting with a clean install of Fedora 15 on a VirtualBox VM and trying to install Zend Server CE. To install, I added the Zend repo to yum and ran:
sudo yum install zend-server-ce-php-5.3
The installation itself seemed to go very well. I opened the browser at http://localhost:10081/ZendServer as directed. After clicking through the license page and entering an administrative password I got the error:
Failed to access Web server. Please make sure that the Web server is running and listening to the correct port
The Applications, Rules Management and Administration tabs function properly but the Monitor and Server Setup tabs both display the above error. It is a fact that the web server is not running, but when I try to rectify that I get another error:
$ sudo service httpd start
[sudo] Password for XXXXX:
Starting httpd (via systemctl): Job failed. See system logs and 'systemctl status' for details.
[FAILED]
For what it's worth (not much, I'm guessing) here are the details the message refers to:
$ sudo tail /var/log/messages
....
Jan 17 17:24:18 M5 systemd[1]: httpd.service: control process exited, code=exited status=1
Jan 17 17:24:18 M5 systemd[1]: Unit httpd.service entered failed state.
$ systemctl status httpd.service
httpd.service - LSB: start and stop Apache HTTP Server
Loaded: loaded (/etc/rc.d/init.d/httpd)
Active: failed since Tue, 17 Jan 2012 17:24:18 -0500; 3min 44s ago
Process: 19500 ExecStart=/etc/rc.d/init.d/httpd start (code=exited, status=1/FAILURE)
CGroup: name=systemd:/system/httpd.service
The diagnostics don't seem very helpful. I've tried various things, such as installing and starting httpd before installing Zend Server CE, and reinstalling httpd (no good: uninstalling it caused Zend to uninstall too). The httpd config isn't causing the problem, as the following output demonstrates:
$ /usr/sbin/apachectl configtest
Syntax OK
Is this a known problem? What's my next move? Do I start putting debug statements in the control script to see what's failing? I can do that, but I'm hoping someone out there has dealt with this problem and can give me a quick solution.
I was able to get better information on the cause of the problem by invoking the apachectl script directly rather than using the service:
$ sudo /usr/sbin/apachectl start
httpd: Syntax error on line 220 of /etc/httpd/conf/httpd.conf: Syntax error on line 6 of /etc/httpd/conf.d/zendserver_php.conf: Cannot load /usr/local/zend/lib/apache2/libphp5.so into server: /usr/local/zend/lib/apache2/libphp5.so: cannot enable executable stack as shared object requires: Permission denied
The syntax check on httpd.conf didn't catch this because it's not really a syntax error and it's not in httpd.conf either, but in the included zendserver_php.conf. A quick search shows that this error is the result of libphp5.so violating one of the constraints that SELinux enforces. SELinux is enabled by default in Fedora 15.
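One quick way to confirm that SELinux is indeed what is blocking it (a sketch; assumes the audit daemon is logging):
sudo grep libphp5 /var/log/audit/audit.log
# or, with the audit utilities installed:
sudo ausearch -m avc -ts recent | grep libphp5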
I don't like to reduce security, but that's the only way I've seen this issue addressed. So I disabled SELinux temporarily with the command:
$ sudo setenforce 0
I also edited /etc/selinux/config and changed SELINUX=enforcing to SELINUX=disabled so SELinux would stay disabled on reboot. Now my web server starts without a hitch:
[mike#M5 ~]$ sudo service httpd start
Starting httpd (via systemctl): [ OK ]
I would like to think someone in the Zend development community is working on this shared library issue. Reducing security is not an acceptable work-around in a lot of cases. If anybody has a better solution, I'd still like to know it.