How to resolve scheduler and controller-manager unhealthy state in Kubernetes [closed]

Symptom:
After installing a new Kubernetes cluster, we run either of the following commands:
$ kubectl get cs        # or: kubectl get componentstatuses
and get this output:
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get "http://127.0.0.1:10252/healthz": dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}
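Note that even when kubectl get cs reports Unhealthy, the scheduler and controller-manager are usually running fine; they simply no longer serve the insecure HTTP ports (10251/10252) that the componentstatus check probes. As a quick sanity check on a master node (a sketch, using the default secure ports):
$ curl -k https://127.0.0.1:10259/healthz   # kube-scheduler
$ curl -k https://127.0.0.1:10257/healthz   # kube-controller-manager
Both should return ok.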

Solution:
Modify the following files on all master nodes:
$ sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
Delete the line under spec -> containers -> command that contains: - --port=0
$ sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
Delete the line under spec -> containers -> command that contains: - --port=0
$ sudo systemctl restart kubelet.service
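The same edit can be done non-interactively; the following is only a sketch, so back up the manifests first. The kubelet watches the static pod manifests and recreates the pods after the change:
$ sudo sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml
$ sudo sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml
$ sudo systemctl restart kubelet.service
$ kubectl get componentstatuses   # should report Healthy once the pods are back up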
Another possible cause:
You may have used http_proxy in the Docker settings. In that case, you must add the master nodes' addresses to no_proxy.
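For example, if Docker picks up its proxy settings from a systemd drop-in, the no_proxy entry would go there; the drop-in path is the one commonly used in the Docker documentation, and the proxy address and master IPs below are placeholders:
$ sudo mkdir -p /etc/systemd/system/docker.service.d
$ sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=127.0.0.1,localhost,10.0.0.11,10.0.0.12"
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker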

Related

etcdctl: unknown command "save" for "etcdctl" [closed]

I entered the etcd container:
kubectl -n kube-system exec -it etcd-k8scp -- sh
Then I try to back up etcd as explained in the K8s docs:
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshotdb
I get this error:
Error: unknown command "save" for "etcdctl"
What's wrong with my command?
I forgot to set $ENDPOINT.
If it is empty, then etcdctl gets this:
ETCDCTL_API=3 etcdctl --endpoints snapshot save snapshotdb
etcdctl thinks I want to address an endpoint called "snapshot" and execute the command "save".
:-)
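For reference, once the endpoint is actually set, the command parses as intended. The endpoint value and certificate paths below are assumptions based on a typical kubeadm-provisioned etcd pod:
ENDPOINT=https://127.0.0.1:2379
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  snapshot save snapshotdb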

how to know why the kube-proxy stopped [closed]

Today I found that the kube-proxy process on one node of my Kubernetes v1.15.2 cluster had stopped. This is the stopped status:
[root@uat-k8s-01 ~]# systemctl status -l kube-proxy
● kube-proxy.service - Kubernetes Kube-Proxy Server
Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Sat 2020-04-18 08:04:18 CST; 2 weeks 0 days ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 937394 ExecStart=/opt/k8s/bin/kube-proxy --config=/etc/kubernetes/kube-proxy-config.yaml --logtostderr=true --v=2 (code=killed, signal=PIPE)
Main PID: 937394 (code=killed, signal=PIPE)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
From these log hints I cannot tell why the kube-proxy process stopped. This is the kube-proxy service config:
[root@uat-k8s-01 ~]# cat /etc/systemd/system/kube-proxy.service
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
WorkingDirectory=/opt/k8s/k8s/kube-proxy
ExecStart=/opt/k8s/bin/kube-proxy \
--config=/etc/kubernetes/kube-proxy-config.yaml \
--logtostderr=true \
--v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Is there any way to find out why kube-proxy failed, and to avoid the stop next time? This is the journal log output:
[root@uat-k8s-01 ~]# journalctl -u kube-proxy.service
-- No entries --
Use journalctl -u kube-proxy.service or check /var/log/kube-proxy.log to see the kube-proxy logs. In a real production setup you should ship logs to a log aggregation system such as ELK or Splunk so that they are not lost.
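Since the journal had already been rotated here, one option is to make the systemd journal persistent so the evidence survives next time; a sketch, to be adapted to your distro:
$ sudo mkdir -p /var/log/journal
$ sudo sed -i 's/^#\?Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
$ sudo systemctl restart systemd-journald
$ journalctl -u kube-proxy.service      # entries now survive rotation and reboots
Note also that the unit shows code=killed, signal=PIPE, and systemd counts SIGPIPE as a clean exit, so Restart=on-failure will not restart the service after it; Restart=always would, although that still does not explain the original kill.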

Kubernetes Master Worker Node Kubeadm Join issue [closed]

I am installing Kubernetes on Oracle VirtualBox on my laptop using kubeadm.
Everything worked fine until I ran this command on the Kubernetes worker node to join it to the master node.
I got the following error after running:
sudo kubeadm join 192.168.56.100:6443 --token 0i2osm.vsp2mk63v1ypeyjf --discovery-token-ca-cert-hash sha256:18511321fcc4b622628dd1ad2f56dbdd319bf024740d58127818720828cc7bf0
Error
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
I tried deleting the files manually and ran the command again, but it didn't resolve the port issue.
Whenever I stop the kubelet, which is listening on port 10250, and then run the command, it complains that the kubelet needs to be started; when I start the kubelet again, it complains that port 10250 is in use.
It's a kind of chicken-and-egg thing.
Any views on how I can resolve it?
You should first try:
# kubeadm reset
You get these errors because Kubernetes is already (partially) installed on the node.
Regarding kubeadm reset:
1 ) As described here:
The "reset" command executes the following phases:
preflight Run reset pre-flight checks
update-cluster-status Remove this node from the ClusterStatus object.
remove-etcd-member Remove a local etcd member.
cleanup-node Run cleanup node.
So I recommend running the preflight phase first (using the --skip-phases flag to skip the others) before executing all the phases together.
2 ) When you execute the cleanup-node phase you can see that the following steps are being logged:
.
.
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [
/etc/kubernetes/manifests
/etc/kubernetes/pki
]
[reset] Deleting files: [
/etc/kubernetes/admin.conf
/etc/kubernetes/kubelet.conf
/etc/kubernetes/bootstrap-kubelet.conf
/etc/kubernetes/controller-manager.conf
/etc/kubernetes/scheduler.conf
]
.
.
Let's go over the [reset] entries and see how they solve the 4 errors you mentioned:
A ) The first [reset] entry will fix the Port 10250 is in use issue (kubelet was listening on this port).
B ) The fourth [reset] entry will fix the two errors of /etc/kubernetes/manifests is not empty and /etc/kubernetes/kubelet.conf already exists.
C ) And we're left with the /etc/kubernetes/pki/ca.crt already exists error.
I thought that the third [reset] entry of removing /etc/kubernetes/pki should take care of that.
But in my case, when I ran kubeadm join with verbosity level 5 (by appending the --v=5 flag), I encountered the error below:
I0929 ... checks.go:432] validating if ...
[preflight] Some fatal errors occurred:
[ERROR FileAvailable-etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
So I had to remove the /etc/kubernetes/pki folder manually and then the kubeadm join was successful again.
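Putting the above together on the worker node, the sequence looks roughly like this; the join command itself (server address, token, hash) must come from your own control-plane node, e.g. via kubeadm token create --print-join-command:
sudo kubeadm reset
sudo rm -rf /etc/kubernetes/pki        # the leftover ca.crt lives here
sudo kubeadm join <master-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> --v=5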
Do not run
kubeadm init
on a worker node before joining; it should only be run on your primary (control-plane) node. Running it on the worker is likely why you already have these files when you should not. As Yasin said:
kubeadm reset

Liquibase Running as a Kubernetes job is failing to get connected with Postgres container [closed]

I am setting up a minikube cluster with Postgres and Liquibase:
--> Postgres is deployed in a pod
--> A Liquibase job runs to update the Postgres database
Kubernetes job file that runs the Liquibase update command:
Dockerfile used to build the Liquibase image:
Error log:
The pod is not able to establish a connection to the database. Make sure the database username and password are correct. Instead of setting localhost in LIQUIBASE_URL in the Dockerfile, can you provide the IP here? Also try to exec into the pod and check whether you can ping the machine where the database is hosted.
The issue is resolved by giving the reference of the internal endpoint of the Postgres pod :)
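In practice this usually means pointing the JDBC URL at the cluster-internal Service (or endpoint) in front of Postgres instead of at localhost; the Service and database names below are placeholders, not values from the question:
kubectl get svc postgres                  # confirm the Service name and port
# then, in the job/container environment, something like:
# LIQUIBASE_URL=jdbc:postgresql://postgres.default.svc.cluster.local:5432/mydb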

supervisord http://localhost:9001 refused connection [closed]

Despite much effort trying all the solutions posted on Stack Overflow, I still cannot manage to solve this.
The problem (Case 1):
$ sudo supervisorctl -c /app/vpn_bot/supervisord.conf
http://localhost:9001 refused connection
Case 2:
$ sudo supervisorctl -c /app/vpn_bot/supervisord.conf
unix:///tmp/supervisorctl.sock refused connection
Here is the relevant supervisord.conf file:
[supervisord]
# nodaemon=true
[supervisorctl]
# case 1: serverurl=http://127.0.0.1:9001
serverurl=unix:///tmp/supervisorctl.sock # case 2
[unix_http_server]
file=/tmp/supervisorctl.sock
[inet_http_server]
port=127.0.0.1:9001
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
I have made sure sudo supervisord -c /app/vpn_bot/supervisord.conf is running, and port 9001 is not used by any other process.
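To double-check that, it may help to look at what supervisord is actually listening on; a quick sketch (the socket path and port come from the config above):
$ sudo ss -lntp | grep 9001              # is anything bound to 127.0.0.1:9001?
$ ls -l /tmp/supervisorctl.sock          # does the unix socket exist?
$ ps aux | grep [s]upervisord            # is supervisord running with this config?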
Can anyone offer some help here?