Running commands such as kubectl get nodes results in the following error:
The connection to the server :6443 was refused - did you specify the right host or port?
I ran systemctl status kubelet.service and got the following state:
root@k8s-l2bridge-ma:~# sudo systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Tue 2020-06-16 11:46:05 UTC; 9s ago
Docs: https://kubernetes.io/docs/home/
Process: 28012 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 28012 (code=exited, status=255)
Jun 16 11:46:05 k8s-l2bridge-ma systemd[1]: kubelet.service: Failed with result 'exit-code'.
How can I troubleshoot the failure and find out what is wrong? I found a few leads by googling, but nothing solved the problem.
Just make this modification in the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"
then execute these commands:
$ systemctl daemon-reload
$ systemctl restart kubelet
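To confirm the unit actually picked up the new flags, a quick sanity check (my addition, assuming a systemd-based distro as in the output above):
$ systemctl show kubelet -p Environment    # prints the Environment= values the unit loaded
$ journalctl -u kubelet -f                 # follow the kubelet log while it comes back up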
Take a look at: fail-kubelet-service, kubelet-failed-start.
In my case, turning off swap did the trick:
swapoff -a
To permanently disable Linux swap space, open the /etc/fstab file, find the swap line, and add a # sign in front of it to comment out the entire line.
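For example, both steps from the shell (a sketch; it assumes GNU sed and a conventional fstab swap entry, so back up the file first):
sudo swapoff -a                                   # disable swap immediately
sudo cp /etc/fstab /etc/fstab.bak                 # keep a backup
sudo sed -i '/^[^#].*\sswap\s/ s/^/#/' /etc/fstab # comment out uncommented swap lines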
I am trying to run the standalone Prometheus server on a Raspberry Pi 4 (8 GB). I am following the instructions laid out here: https://pimylifeup.com/raspberry-pi-prometheus/
My prometheus.service file is this:
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
User=pi
Restart=on-failure
ExecStart=/home/pi/prometheus/prometheus \
--config.file=/home/pi/prometheus/prometheus.yml \
--storage.tsdb.path=/home/pi/prometheus/data
[Install]
WantedBy=multi-user.target
But when I try to run the service, I get the following error:
● prometheus.service - Prometheus Server
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2022-11-24 18:42:51 GMT; 2s ago
Docs: https://prometheus.io/docs/introduction/overview/
Process: 485265 ExecStart=/home/pi/prometheus/prometheus --config.file=/home/pi/prometheus/prometheus.yml --storage.tsdb.path=/home/pi/prometheus/data (code=exited, status=2)
Main PID: 485265 (code=exited, status=2)
CPU: 160ms
Nov 24 18:42:51 master2 systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 5.
Nov 24 18:42:51 master2 systemd[1]: Stopped Prometheus Server.
Nov 24 18:42:51 master2 systemd[1]: prometheus.service: Start request repeated too quickly.
Nov 24 18:42:51 master2 systemd[1]: prometheus.service: Failed with result 'exit-code'.
Nov 24 18:42:51 master2 systemd[1]: Failed to start Prometheus Server.
What does Error Status 2 mean in this context? Is it a permission problem, or something else?
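One way to find out: status=2 is the exit code of the prometheus process itself, not a systemd error code, so the real message should be in the journal, or you can run the binary by hand with the same flags and read the error directly (a diagnostic sketch using the paths from your unit file):
journalctl -u prometheus.service -n 50 --no-pager
sudo -u pi /home/pi/prometheus/prometheus --config.file=/home/pi/prometheus/prometheus.yml --storage.tsdb.path=/home/pi/prometheus/data
Running it in a shell like this prints any flag or config parse error straight to the terminal.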
I am trying to add the Prometheus JMX agent (jmx_prometheus_javaagent-0.3.1.jar) to an existing Kafka cluster.
But when I run the Java agent, I get no response on the port:
curl http://localhost:8080
curl: (7) Failed connect to localhost:8080; Connection refused
Here is my configuration file "kafka.service":
[kafka@Kafka-dev prometheus]$ cat /etc/systemd/system/kafka.service
[Unit]
Description=Kafka
After=network.target
[Service]
User=kafka
Group=kafka
Environment="KAFKA_HEAP_OPTS=-Xmx256M -Xms128M"
Environment="KAFKA_OPTS=-javaagent:/home/kafka/prometheus/jmx_prometheus_javaagent-0.3.1.jar=8080:/home/kafka/prometheus/kafka-0-8-2.yml"
ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh -daemon /home/kafka/kafka/config/server.properties
SuccessExitStatus=143
[Install]
WantedBy=multi-user.target
Then when I start kafka.service, it looks like it works:
sudo systemctl restart kafka
But when I check the status I find that the service is inactive:
[kafka@Kafka-dev ~]$ sudo systemctl status kafka.service
● kafka.service - Kafka
Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2019-12-05 10:00:17 UTC; 1min 0s ago
Process: 125469 ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh -daemon /home/kafka/kafka/config/server.properties (code=exited, status=0/SUCCESS)
Main PID: 125469 (code=exited, status=0/SUCCESS)
Note: firewalls on the machine are disabled.
I suspect this has something to do with the configuration of jmx_prometheus_javaagent.
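One likely explanation (a guess, not confirmed in this thread): the -daemon flag makes kafka-server-start.sh fork the broker into the background and exit 0, so under systemd's default Type=simple the unit is considered finished the moment the wrapper exits, and systemd then cleans up whatever remains in the unit's cgroup, taking the broker and the JMX agent down with it. A minimal sketch of a fix is to run the broker in the foreground:
ExecStart=/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties
then reload and retest:
sudo systemctl daemon-reload
sudo systemctl restart kafka
curl http://localhost:8080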
I've followed the setup guide, using the gk-deploy script:
./gk-deploy.sh --admin-key xxxx --user-key xxx -v -g
This went fine. However, the deployed containers fail with
Readiness probe failed: /usr/local/bin/status-probe.sh failed check: systemctl -q is-active glusterd.service
The glusterd.service on the node is running:
$ systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2019-10-19 18:48:25 CEST; 1 day 17h ago
Docs: man:glusterd(8)
Process: 939 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_
Main PID: 955 (glusterd)
Tasks: 10 (limit: 4915)
Memory: 30.5M
CGroup: /system.slice/glusterd.service
└─955 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
....
GLUSTER_BLOCKD_STATUS_PROBE_ENABLE is set to "0"
I am aware of this post, but it does not really help me solve my problem.
Let's try a simpler readiness probe to diagnose the problem. I would run the following script to make sure the service is indeed OK, given that this is the check that is failing.
#!/bin/bash
# Mirror the failing probe check: succeed only if glusterd reports active
if systemctl -q is-active glusterd.service; then
    exit 0
else
    exit 1
fi
There is no point in checking the logs since systemctl will give us the necessary exit code.
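It is also worth running the exact probe command where it actually executes, inside the pod rather than on the host (a sketch; the pod name and namespace are placeholders for yours):
kubectl exec -it <glusterfs-pod> -n <namespace> -- systemctl -q is-active glusterd.service; echo $?
If this prints a non-zero code while the host's glusterd is active, the probe is failing against the containerized glusterd, not the one you inspected with systemctl status.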
I ran the command
systemctl stop kubelet
then tried to start it:
systemctl start kubelet
but it won't start.
Here is the output of systemctl status kubelet:
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Wed 2019-06-05 15:35:34 UTC; 7s ago
Docs: https://kubernetes.io/docs/home/
Process: 31697 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 31697 (code=exited, status=255)
Because of this, I am not able to run any kubectl command.
For example, kubectl get pods gives:
The connection to the server 172.31.6.149:6443 was refused - did you specify the right host or port?
This worked for me: disable swap with
swapoff -a
then try to start it again:
systemctl start kubelet
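To keep swap off across reboots, comment out the swap line in /etc/fstab, or mask the swap unit (a sketch; dev-sda2.swap is a hypothetical name, list yours first):
systemctl list-units --type swap
sudo systemctl mask dev-sda2.swap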
I needed to reset the kubelet service. Here are the steps:
Check the status of your docker service.
If it is stopped, start it with sudo systemctl start docker.
If it is not installed, install it:
# yum install -y kubelet kubeadm kubectl docker
Turn swap off: # swapoff -a
Reset kubeadm: # kubeadm reset
Now try # kubeadm init
After that, check # systemctl status kubelet
It should be running now.
Check the nodes:
kubectl get nodes
If the master node is not ready, refer to the following.
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
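Alternatively, if you are root, you can point kubectl straight at the admin config (kubeadm init prints this alternative as well):
export KUBECONFIG=/etc/kubernetes/admin.conf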
If you are not able to create a pod, check DNS:
kubectl get pods --namespace=kube-system
If the DNS pods are stuck in the Pending state, you need to install a pod network add-on. I used Calico (a quick verification check follows below):
kubectl apply -f https://docs.projectcalico.org/v3.7/manifests/calico.yaml
Now your master node is ready and you can deploy pods.
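To verify, you can watch the kube-system pods until the DNS pods move from Pending to Running (the same command as above, with -w to watch):
kubectl get pods --namespace=kube-system -w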
When joining a node:
sudo kubeadm join 172.16.7.101:6443 --token 4mya3g.duoa5xxuxin0l6j3 --discovery-token-ca-cert-hash sha256:bba76ac7a207923e8cae0c466dac166500a8e0db43fb15ad9018b615bdbabeb2
The output:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
And systemctl status kubelet:
node#node:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2019-04-17 06:20:56 UTC; 12min ago
Docs: https://kubernetes.io/docs/home/
Main PID: 26716 (kubelet)
Tasks: 16 (limit: 1111)
CGroup: /system.slice/kubelet.service
└─26716 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml -
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.022384 26716 kubelet.go:2244] node "node" not found
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.073969 26716 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Unauthorized
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.122820 26716 kubelet.go:2244] node "node" not found
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.228838 26716 kubelet.go:2244] node "node" not found
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.273153 26716 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Unauthorized
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.330578 26716 kubelet.go:2244] node "node" not found
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.431114 26716 kubelet.go:2244] node "node" not found
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.473501 26716 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.531294 26716 kubelet.go:2244] node "node" not found
Apr 17 06:33:38 node kubelet[26716]: E0417 06:33:38.632347 26716 kubelet.go:2244] node "node" not found
Regarding the Unauthorized errors: I checked on the master with kubeadm token list, and the token is valid.
So what's the problem? Thanks a lot.
Please verify the pre- and post-installation steps here:
Please also verify the status of your services (enabled and running) and your Docker environment:
sudo systemctl enable docker
sudo systemctl enable kubelet
systemctl daemon-reload
systemctl restart docker
systemctl restart kubelet
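A quick way to confirm both services are up after the restarts (a small extra check):
systemctl is-active docker kubelet
systemctl is-enabled docker kubelet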
Are the results the same if you run the init command with --ignore-preflight-errors=all?
For more details, please also use journalctl -u kubelet.
Once you have more details from your logs, please take a look at "github - kubeadm/issues" here:
Please provide more details about your environment so this issue can be recreated, and share any additional findings.
Could you please perform another test and run kubeadm init on your worker node, in the same way as on the first node (in short, create a second master node), just to verify your working environment?