how to know why the kube-proxy stopped [closed] - kubernetes

Today I found that the kube-proxy process on one node of my Kubernetes v1.15.2 cluster had stopped. This is the stopped status:
[root@uat-k8s-01 ~]# systemctl status -l kube-proxy
● kube-proxy.service - Kubernetes Kube-Proxy Server
Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Sat 2020-04-18 08:04:18 CST; 2 weeks 0 days ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 937394 ExecStart=/opt/k8s/bin/kube-proxy --config=/etc/kubernetes/kube-proxy-config.yaml --logtostderr=true --v=2 (code=killed, signal=PIPE)
Main PID: 937394 (code=killed, signal=PIPE)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
From these log hints I could not tell why the kube-proxy process stopped. This is the kube-proxy service config:
[root@uat-k8s-01 ~]# cat /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
WorkingDirectory=/opt/k8s/k8s/kube-proxy
ExecStart=/opt/k8s/bin/kube-proxy \
--config=/etc/kubernetes/kube-proxy-config.yaml \
--logtostderr=true \
--v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Is there any way to find out why kube-proxy failed and to prevent it from stopping again? This is the journal log output:
[root@uat-k8s-01 ~]# journalctl -u kube-proxy.service
-- No entries --

Use journalctl -u kube-proxy.service or check /var/log/kube-proxy.log to see the kube-proxy logs. In a real production setup you should send logs to a log aggregation system such as ELK or Splunk so that logs are not lost.
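If the journal keeps rotating away the evidence, one option (a minimal sketch, assuming a stock systemd-journald configuration) is to make the journal persistent so older entries survive rotation and reboots:
sudo mkdir -p /var/log/journal
# in /etc/systemd/journald.conf set: Storage=persistent
sudo systemctl restart systemd-journald
# older entries then stay available, e.g.:
journalctl -u kube-proxy.service --since "2020-04-18"
Separately, systemd by default treats termination by SIGPIPE as a clean exit, so Restart=on-failure will not restart the unit after code=killed, signal=PIPE. If you want kube-proxy brought back no matter how it dies (an assumption about the behaviour you want), a drop-in would do it:
sudo mkdir -p /etc/systemd/system/kube-proxy.service.d
printf '[Service]\nRestart=always\n' | sudo tee /etc/systemd/system/kube-proxy.service.d/restart.conf
sudo systemctl daemon-reload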

Related

Ceph Manual Deployment no ceph -s output after mon installs (Nautilus)

I'm trying to build a cluster to test stuff before I apply it to our production cluster. We're using Ceph Nautilus, so I decided to install Nautilus first as well.
Used the docs below:
https://docs.ceph.com/en/latest/install/manual-deployment/
Everything seemed to go fine. I installed 3 monitors, generated the monmap, copied keyrings to the other monitors, and started the services; they are all up. But when I type ceph -s to check the cluster status, it just gets stuck forever without any output. Any command that uses the word "ceph" in it just gets stuck. As a result I can't continue to build the cluster, since I need to be able to use ceph commands after the monitor deployment to install other services.
Systemctl outputs are the same for all 3 monitors in the current state:
[root@mon2 ~]# systemctl status ceph-mon@mon2
● ceph-mon@mon2.service - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2021-04-28 09:55:24 +03; 25min ago
Main PID: 4725 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@mon2.service
└─4725 /usr/bin/ceph-mon -f --cluster ceph --id mon2 --setuser ceph --setgroup ceph
Apr 28 09:55:24 mon2 systemd[1]: Started Ceph cluster monitor daemon.
Resolved: the problem was caused by missing firewalld and SELinux configuration. After applying those changes and restarting the deployment process, my issue was solved.
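For reference, a rough sketch of the kind of changes involved (3300/tcp and 6789/tcp are the documented Ceph monitor msgr2/msgr1 ports; adjust to your environment):
# on every monitor node: let monitor traffic through firewalld
sudo firewall-cmd --permanent --add-port=3300/tcp --add-port=6789/tcp
sudo firewall-cmd --reload
# and rule out SELinux while testing (configure a proper policy for production):
sudo setenforce 0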

How to resolve scheduler and controller-manager unhealthy state in Kubernetes [closed]

Symptom:
When we install a new Kubernetes cluster and execute the following command:
$ kubectl get cs / kubectl get componentstatuses
we get this error:
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}
Solution:
Modify the following files on all master nodes:
$ sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
Remove the line under spec -> containers -> command that contains: - --port=0
$ sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
Remove the line under spec -> containers -> command that contains: - --port=0
$ sudo systemctl restart kubelet.service
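For orientation, the relevant part of a kubeadm-generated kube-scheduler.yaml looks roughly like this (flags vary by version); the - --port=0 line is the one to delete, and the same applies to kube-controller-manager.yaml:
spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --port=0        # <- delete this line so the health port is served again
The kubelet watches the static-pod manifest directory, so saving the file normally restarts the pod on its own; the systemctl restart kubelet.service step just forces it.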
Another reason for this problem:
You may have used http_proxy in the Docker settings. In this case, you must add the master nodes' addresses to no_proxy.
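If the proxy is the cause, the usual place to exclude the masters is Docker's systemd proxy drop-in (file path per the Docker documentation; the addresses below are placeholders):
# /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,<master-node-1-ip>,<master-node-2-ip>"
# then: sudo systemctl daemon-reload && sudo systemctl restart docker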

Kubernetes Master Worker Node Kubeadm Join issue [closed]

I am installing Kubernetes on Oracle VirtualBox on my laptop using kubeadm.
Everything worked fine until I ran this command on the Kubernetes worker node to join it to the master node.
I got the error below after running:
sudo kubeadm join 192.168.56.100:6443 --token 0i2osm.vsp2mk63v1ypeyjf --discovery-token-ca-cert-hash sha256:18511321fcc4b622628dd1ad2f56dbdd319bf024740d58127818720828cc7bf0
Error
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
I tried deleting the files manually and ran the command again, but it didn't resolve the port issue.
Whenever I stop the kubelet, which is running on port 10250, and then run the command, it complains that the kubelet needs to be started; and when I start the kubelet, it again gives the error that port 10250 is in use.
It's a kind of chicken-and-egg situation.
Any views on how I can resolve it?
You should first try
kubeadm reset
because Kubernetes is already (partially) set up on that node, which is why you get these errors.
Regarding kubeadm reset:
1 ) As described here:
The "reset" command executes the following phases:
preflight Run reset pre-flight checks
update-cluster-status Remove this node from the ClusterStatus object.
remove-etcd-member Remove a local etcd member.
cleanup-node Run cleanup node.
So I recommend running the preflight phase on its own first (by using the --skip-phases flag) before executing all the phases together.
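A sketch of that order, using the --skip-phases flag with the phase names listed above (assuming a kubeadm version whose reset command supports phases):
# run only the preflight checks first
sudo kubeadm reset --skip-phases=update-cluster-status,remove-etcd-member,cleanup-node
# then, once you are happy with the output, run the full reset
sudo kubeadm reset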
2 ) When you execute the cleanup-node phase you can see that the following steps are being logged:
.
.
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [
/etc/kubernetes/manifests
/etc/kubernetes/pki
]
[reset] Deleting files: [
/etc/kubernetes/admin.conf
/etc/kubernetes/kubelet.conf
/etc/kubernetes/bootstrap-kubelet.conf
/etc/kubernetes/controller-manager.conf
/etc/kubernetes/scheduler.conf
]
.
.
Let's go over the [reset] entries and see how they solve the 4 errors you mentioned:
A ) The first [reset] entry will fix the Port 10250 is in use issue (kubelet was listening on this port).
B ) The third and fourth [reset] entries will fix the two errors of /etc/kubernetes/manifests is not empty and /etc/kubernetes/kubelet.conf already exists.
C ) And we're left with the /etc/kubernetes/pki/ca.crt already exists error.
I thought that the third [reset] entry of removing /etc/kubernetes/pki should take care of that.
But in my case, when I ran kubeadm join with verbosity level 5 (by appending the --v=5 flag), I encountered the error below:
I0929 ... checks.go:432] validating if ...
[preflight] Some fatal errors occurred:
[ERROR FileAvailable-etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
So I had to remove the /etc/kubernetes/pki folder manually and then the kubeadm join was successful again.
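Putting it together, the sequence that worked here looks roughly like this (token and hash are placeholders):
sudo kubeadm reset
sudo rm -rf /etc/kubernetes/pki        # the leftover CA that reset did not remove in this case
sudo kubeadm join <api-server-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> --v=5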
Do not run
kubeadm init
on a worker node before joining; it is only run on your primary (control-plane) node. Doing so can be why you already have these files when you should not. As Yasin said:
kubeadm reset

Joining cluster takes forever

I have set up my master node and I am trying to join a worker node as follows:
kubeadm join 192.168.30.1:6443 --token 3czfua.os565d6l3ggpagw7 --discovery-token-ca-cert-hash sha256:3a94ce61080c71d319dbfe3ce69b555027bfe20f4dbe21a9779fd902421b1a63
However, the command hangs forever in the following state:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Since this is just a warning, why does it actually fail?
Edit: I noticed the following in my /var/log/syslog:
Mar 29 15:03:15 ubuntu-xenial kubelet[9626]: F0329 15:03:15.353432 9626 server.go:193] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Mar 29 15:03:15 ubuntu-xenial systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Mar 29 15:03:15 ubuntu-xenial systemd[1]: kubelet.service: Unit entered failed state.
First, if you want to see more detail when your worker joins the master, use:
kubeadm join 192.168.1.100:6443 --token m3jfbb.wq5m3pt0qo5g3bt9 --discovery-token-ca-cert-hash sha256:d075e5cc111ffd1b97510df9c517c122f1c7edf86b62909446042cc348ef1e0b --v=2
Using the above command I could see that my worker could not establish a connection with the master, so I just stopped the firewall:
systemctl stop firewalld
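As an alternative to stopping the firewall entirely, opening just the ports kubeadm needs (per the Kubernetes "Ports and Protocols" documentation) is usually enough; a sketch for firewalld:
# on the control-plane node
sudo firewall-cmd --permanent --add-port=6443/tcp     # Kubernetes API server
sudo firewall-cmd --permanent --add-port=10250/tcp    # kubelet API
sudo firewall-cmd --reload
# on the worker node
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --reload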
This can be solved by creating a new token
using this command:
kubeadm token create --print-join-command
and use the generated join command for joining other nodes to the cluster.
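For reference, the command prints a ready-to-copy join command of the form below (values are placeholders), which you then run on the node you want to add:
$ kubeadm token create --print-join-command
kubeadm join <api-server-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>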
The problem had to do with kubeadm not installing a CNI-compatible networking solution out of the box; therefore, without this step the Kubernetes nodes/master are unable to establish any form of communication. The following Ansible task addressed the issue:
- name: kubernetes.yml --> Install Flannel
  shell: kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml
  become: yes
  environment:
    KUBECONFIG: "/etc/kubernetes/admin.conf"
  when: inventory_hostname in (groups['masters'] | last)
I did get the same error on CentOS 7, but in my case the join command worked without problems, so it was indeed just a warning.
> [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
> [preflight] Reading configuration from the cluster...
> [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
> [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
As the official documentation mentions, there are two common issues that make the init hang (I guess it also applies to the join command); one of them is that the default cgroup driver configuration for the kubelet differs from that used by Docker. Check the system log file (e.g. /var/log/messages) or examine the output from journalctl -u kubelet for an error about the cgroup driver mismatch.
First try the steps from the official documentation, and if that does not work, please provide more information so we can troubleshoot further.
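If it is the cgroup-driver mismatch, the fix recommended in the Kubernetes container-runtime docs of that era is to switch Docker to the systemd driver, roughly:
# /etc/docker/daemon.json  (merge with any existing settings)
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
# then: sudo systemctl restart docker && sudo systemctl restart kubelet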
I had a bunch of k8s deployment scripts that broke recently with this same error message... it looks like Docker changed its install. Try this:
previous install:
apt-get install docker-ce
updated install:
apt-get install docker-ce docker-ce-cli containerd.io
How is /var/lib/kubelet/config.yaml created?
Regarding the /var/lib/kubelet/config.yaml: no such file or directory error.
Below are steps that should occur on the worker node in order for the mentioned file to be created.
1 ) The creation of the /var/lib/kubelet/ folder. It is created when the kubelet service is installed as mentioned here:
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
2 ) The creation of config.yaml. The kubeadm join flow needs to take place: when you run kubeadm join, kubeadm uses the Bootstrap Token credential to perform a TLS bootstrap, which fetches the credential needed to download the kubelet-config-1.X ConfigMap, and writes it to /var/lib/kubelet/config.yaml.
After a successful execution you should see the logs below:
.
.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
.
.
So, after these 2 steps you should have /var/lib/kubelet/config.yaml in place.
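A quick way to confirm on the worker that this step completed (paths taken from the log lines above):
ls -l /var/lib/kubelet/config.yaml /var/lib/kubelet/kubeadm-flags.env
systemctl status kubelet --no-pager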
Failure of the kubeadm join flow
In your case, it seems that the kubeadm join flow failed, which might happen for multiple reasons, like bad iptables configuration, ports that are already in use, a container runtime that is not installed properly, etc., as described here and here.
As far as I know, the fact that no CNI-compatible networking solution was in place should not affect the creation of /var/lib/kubelet/config.yaml:
A ) We can see under the kubeadm preflight checks which issues will cause the join phase to fail.
B ) I also tested this by removing the networking solution I was using (Calico), then running kubeadm reset and kubeadm join again: no errors appeared in the kubeadm logs (I got the successful execution logs mentioned above) and /var/lib/kubelet/config.yaml was created properly.
(*) Of course, the cluster can't function in this state; I just wanted to emphasize that I think the problem was one of the issues mentioned in A.

How to download Kubernetes with systemd on CoreOS

I am provisioning a cluster of CoreOS machines, but I am having trouble downloading the Kubernetes tarball because of its size (~450 MB). I have managed to use this same technique to download the latest etcd2, fleet and flannel, but when downloading a file as big as Kubernetes, my service fails or stops without any stack trace. I think it is related to the fact that systemd is neither waiting for nor restarting the service as I would expect. This is my service file:
[Unit]
Description=updates kubernetes v1.2
[Service]
Type=oneshot
User=root
WorkingDirectory=/home/core
ExecStart=/usr/bin/mkdir -p /opt/bin
ExecStart=/usr/bin/mkdir -p /home/core/kubernetes
ExecStart=/bin/wget https://github.com/kubernetes/kubernetes/releases/download/v1.2.0/kubernetes.tar.gz
ExecStart=/usr/bin/tar zxf /home/core/kubernetes -C /home/core/kubernetes --strip-components=1
ExecStart=/usr/bin/mv kubernetes/platforms/linux/amd64/kubectl /opt/bin/kubectl
ExecStart=/usr/bin/tar zxf kubernetes/server/kubernetes-server-linux-amd64.tar.gz
ExecStart=/usr/bin/chmod a+x kubernetes/server/bin/*
ExecStart=/usr/bin/mv kubernetes/server/bin/* /opt/bin
ExecStart=/usr/bin/rm -f /home/core/kubernetes
I bet you need to set/increase the TimeoutStartSec= parameter, which probably defaults to 30 seconds or so.
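A minimal sketch of that change, assuming the rest of the unit stays as posted; add the directive to the [Service] section:
[Service]
Type=oneshot
# give the ~450 MB download enough time; newer systemd also accepts TimeoutStartSec=infinity
TimeoutStartSec=900
User=root
WorkingDirectory=/home/core
# ... the existing ExecStart= lines stay unchanged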