Kubernetes service cannot send requests to itself

I have a service that, in some contexts, sends requests to itself.
I can reach the service from outside the cluster, but the self-requests fail (time-out).
Environment:
minikube v0.34.1
Linux version 4.15.0 (jenkins@jenkins) (gcc version 7.3.0 (Buildroot 2018.05)) #1 SMP Fri Feb 15 19:27:06 UTC 2019
I've been using https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#a-pod-cannot-reach-itself-via-service-ip as a troubleshooting guide, but I'm down to the step that says "seek help".
Troubleshooting results:
journalctl -u kubelet | grep -i hairpin
Feb 26 19:57:10 minikube kubelet[3066]: W0226 19:57:10.124151 3066 docker_service.go:540] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
Feb 26 19:57:10 minikube kubelet[3066]: I0226 19:57:10.124295 3066 docker_service.go:236] Hairpin mode set to "hairpin-veth"
The troubleshooting guide indicates that "hairpin-veth" is OK.
for intf in /sys/devices/virtual/net/docker0/brif/veth*; do cat $intf/hairpin_mode; done
0
...
0
Note that the guide used /sys/devices/virtual/net/cbr0/brif/*, but in this version of minikube, the path is /sys/devices/virtual/net/docker0/brif/veth*. I'd like to understand why the paths are different, but it appears that hairpin_mode is not enabled.
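The per-port check above can be wrapped so that the bridge directory is a parameter. This is a minimal sketch, not part of the guide: the function name show_hairpin_modes is made up for illustration, and the default path is simply the docker0 path from this question. On the path difference: cbr0 is the bridge that kubenet creates, and the kubelet log above says kubenet is not enabled, which would explain why the Docker bridge docker0 shows up here instead.

```shell
# Print the hairpin_mode flag for every port attached to a Linux bridge.
# The first argument is the bridge's brif directory; the default is the
# docker0 path from the question and may differ on other setups.
show_hairpin_modes() {
  brif_dir="${1:-/sys/devices/virtual/net/docker0/brif}"
  for port in "$brif_dir"/*; do
    # Skip anything without a hairpin_mode file (including an unmatched glob).
    [ -r "$port/hairpin_mode" ] || continue
    printf '%s %s\n' "$(basename "$port")" "$(cat "$port/hairpin_mode")"
  done
}

# Usage on the minikube VM (for example inside `minikube ssh`):
#   show_hairpin_modes
#   show_hairpin_modes /sys/devices/virtual/net/cbr0/brif
```

A value of 0 for a port means hairpin mode is off for that port, 1 means it is on.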
The next step in the guide is: Seek help if none of above works out.
Am I correct in believing that I need to enable hairpin_mode?
If so, how do I do so?

This seems to be a known issue; more information here:
As a workaround, you can try:
minikube ssh -- sudo ip link set docker0 promisc on
Please share the results.
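To confirm that the workaround took effect, one option is to look for the PROMISC flag in the `ip link show` output. A small sketch, assuming the usual iproute2 output where interface flags appear between angle brackets; the helper name has_promisc is made up for illustration.

```shell
# Succeeds if the `ip link show <dev>` output on stdin lists the PROMISC flag
# inside the <...> flag list.
has_promisc() {
  grep -q '<[^>]*PROMISC[^>]*>'
}

# Usage:
#   minikube ssh -- ip link show docker0 | has_promisc && echo "promisc is on"
```

The setting made with `ip link set` is not persistent, so it would have to be reapplied after the minikube VM restarts.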

Related

kubelet won't start after kubernetes/manifest update

This is rather strange behavior in our K8s cluster.
When we try to deploy a new version of our applications we get:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "<container-id>" network for pod "application-6647b7cbdb-4tp2v": networkPlugin cni failed to set up pod "application-6647b7cbdb-4tp2v_default" network: Get "https://[10.233.0.1]:443/api/v1/namespaces/default": dial tcp 10.233.0.1:443: connect: connection refused
I used kubectl get cs and found the controller and scheduler in an Unhealthy state.
As described here, I updated /etc/kubernetes/manifests/kube-scheduler.yaml and
/etc/kubernetes/manifests/kube-controller-manager.yaml by commenting out --port=0
When I checked systemctl status kubelet it was working.
Active: active (running) since Mon 2020-10-26 13:18:46 +0530; 1 years 0 months ago
I restarted the kubelet service, and the controller and scheduler were then shown as healthy.
But systemctl status kubelet now shows the following (right after the restart it still showed a running state):
Active: activating (auto-restart) (Result: exit-code) since Thu 2021-11-11 10:50:49 +0530; 3s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 21234 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET
Tried adding Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false" to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf as described here, but it is still not working properly.
I also uncommented --port=0 in the manifests mentioned above and tried restarting, with the same result.
Edit: This issue was caused by an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64 encoded when placing them in /etc/kubernetes/kubelet.conf.
Many others suggested running kubeadm init again, but this cluster was created using kubespray, with no manually added nodes.
We have bare-metal K8s running on Ubuntu 18.04.
K8: v1.18.8
We would like to know any debugging and fixing suggestions.
PS:
When we try to telnet 10.233.0.1 443 from any node, the first attempt fails and the second attempt succeeds.
Edit: Found this in kubelet service logs
Nov 10 17:35:05 node1 kubelet[1951]: W1110 17:35:05.380982 1951 docker_sandbox.go:402] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "app-7b54557dd4-bzjd9_default": unexpected command output nsenter: cannot open /proc/12311/ns/net: No such file or directory
Posting the comment as a community wiki answer for better visibility:
This issue was caused by an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the certificate and key values from /var/lib/kubelet/pki/kubelet-client-current.pem are base64 encoded when placing them in /etc/kubernetes/kubelet.conf.
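As a rough illustration of the two checks this answer implies, the sketch below prints a certificate's expiry date and produces single-line base64 of a file, which is the form the client-certificate-data and client-key-data fields of a kubeconfig hold. The helper names cert_end_date and b64_one_line are made up for illustration, and the sketch assumes openssl and base64 are installed on the node. It does not split the combined PEM file into its certificate and key parts.

```shell
# Print the notAfter date of the first certificate found in a PEM file.
cert_end_date() {
  openssl x509 -noout -enddate -in "$1"
}

# Print a file's contents as base64 on a single line (no wrapping newlines).
b64_one_line() {
  base64 < "$1" | tr -d '\n'
}

# Usage:
#   cert_end_date /var/lib/kubelet/pki/kubelet-client-current.pem
#   b64_one_line /path/to/client-cert.pem   # hypothetical path
```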

kubelet saying node "master01" not found

I am trying to set up a kubeadm cluster with three masters. I get this error from my init command:
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
But I am not using cgroupfs; I am using systemd.
And my kubelet complains that it does not know its node name.
Jan 23 14:54:12 master01 kubelet[5620]: E0123 14:54:12.251885 5620 kubelet.go:2266] node "master01" not found
Jan 23 14:54:12 master01 kubelet[5620]: E0123 14:54:12.352932 5620 kubelet.go:2266] node "master01" not found
Jan 23 14:54:12 master01 kubelet[5620]: E0123 14:54:12.453895 5620 kubelet.go:2266] node "master01" not found
Please let me know where the issue is.
The issue may be caused by the Docker version, as only Docker versions up to 18.06 are supported by the latest Kubernetes version, i.e. v1.13.xx.
I had the same issue, and it was resolved after downgrading Docker from 18.09 to 18.06.
If the problem is not related to Docker, it might be because the kubelet service failed to establish a connection to the API server.
I would first check the status of the kubelet with systemctl status kubelet and consider restarting it with systemctl restart kubelet.
If this doesn't help, try reinstalling kubeadm or running kubeadm init with another version (use the --kubernetes-version=X.Y.Z flag).
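When the kubelet log is flooded with the same "node not found" line, a summary of distinct messages can make the first real error (for example a failed connection to the API server) easier to spot. A sketch under the assumption that the lines use the klog layout seen in the question, where the message follows a "file.go:line] " marker; the function name summarize_kubelet_messages is made up for illustration.

```shell
# Read kubelet journal lines on stdin, strip everything up to the klog
# "file.go:line] " marker, and print each distinct message with its count,
# most frequent first.
summarize_kubelet_messages() {
  awk '
    {
      i = index($0, "] ")
      if (i == 0) next            # not a klog-formatted line
      count[substr($0, i + 2)]++
    }
    END { for (m in count) printf "%d\t%s\n", count[m], m }
  ' | sort -rn
}

# Usage:
#   journalctl -u kubelet --no-pager | summarize_kubelet_messages | head
```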
In my case, my k8s version is 1.21.1 and my Docker version was 19.03. I solved this bug by upgrading Docker to version 20.7.

kubeadm init kubelet complains default bind address already in use

kubeadm version 1.12.2
$ sudo kubeadm init --config kubeadm_new.config --ignore-preflight-errors=all
/var/log/syslog shows:
Nov 15 08:44:13 khteh-T580 kubelet[5101]: I1115 08:44:13.438374 5101 server.go:1013] Started kubelet
Nov 15 08:44:13 khteh-T580 kubelet[5101]: I1115 08:44:13.438406 5101 server.go:133] Starting to listen on 0.0.0.0:10250
Nov 15 08:44:13 khteh-T580 kubelet[5101]: E1115 08:44:13.438446 5101 kubelet.go:1287] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
Nov 15 08:44:13 khteh-T580 kubelet[5101]: E1115 08:44:13.438492 5101 server.go:753] Starting health server failed: listen tcp 127.0.0.1:10248: bind: address already in use
Nov 15 08:44:13 khteh-T580 kubelet[5101]: I1115 08:44:13.438968 5101 server.go:318] Adding debug handlers to kubelet server.
Nov 15 08:44:13 khteh-T580 kubelet[5101]: F1115 08:44:13.439455 5101 server.go:145] listen tcp 0.0.0.0:10250: bind: address already in use
I have tried sudo systemctl stop kubelet and manually killing the kubelet process, but to no avail. Any advice and insights are appreciated.
Here is what you can do:
Try the following command to find out which process is holding port 10250:
[root@master admin]# ss -lntp | grep 10250
LISTEN 0 128 :::10250 :::* users:(("kubelet",pid=23373,fd=20))
It will give you the PID and the name of that process. If an unwanted process is holding the port, you can kill it, and the port then becomes available to the kubelet.
After killing the process, run the above command again; it should return nothing.
Just to be on the safe side, run kubeadm reset and then kubeadm init, and it should go through.
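The PID and process name can also be pulled out of the ss line for use in a script. A minimal sketch that assumes the users:(("name",pid=N,fd=M)) layout shown in the sample output above; the function name port_holder is made up for illustration.

```shell
# Read `ss -lntp` lines on stdin and print "PID NAME" for each listening
# socket that reports an owning process.
port_holder() {
  sed -n 's/.*users:(("\([^"]*\)",pid=\([0-9]*\),.*/\2 \1/p'
}

# Usage (run as root so that ss can show the owning process):
#   ss -lntp | grep ':10250 ' | port_holder
```

If the name printed is kubelet itself, an earlier kubelet instance is most likely still running, which matches the "address already in use" lines in the question.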
Have you tried using netstat to see what other process is running that has already bound to that port?
sudo netstat -tulpn | grep 10250
For me, I later discovered that there were 2 extra core-dns-xxxxx pods stuck in "Terminating" in my cluster.
Deleting them forcefully solved the problem for me:
kubectl delete pod core-dns-xxxx --grace-period=0 --force
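To find such pods before deleting anything, the STATUS column of `kubectl get pods` can be filtered. A sketch assuming the default single-namespace column layout (NAME READY STATUS RESTARTS AGE); the function name terminating_pods is made up for illustration, and the kube-system namespace in the usage line is an assumption about where the DNS pods run.

```shell
# Print the names of pods whose STATUS column reads Terminating, given
# `kubectl get pods` output (single namespace, default columns) on stdin.
terminating_pods() {
  awk 'NR > 1 && $3 == "Terminating" { print $1 }'
}

# Usage:
#   kubectl get pods -n kube-system | terminating_pods
```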
Thanks.
I ditched kubeadm and now use microk8s.

PG::ConnectionBad Postgres Cluster down

DigitalOcean disabled my droplet's internet access. After I fixed the error (by rolling back to an older backup), they restored the internet access. But since then I constantly get an error when deploying, and I can't seem to get my Postgres database up and running.
I'm getting an error each time I try to deploy my application.
PG::ConnectionBad: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
So I used SSH to login to my server and check if my Postgres was actually running with:
pg_lsclusters
This results in:
Ver Cluster Port Status Owner    Data directory               Log file
9.5 main    5432 down   postgres /var/lib/postgresql/9.5/main /var/log/postgresql/postgresql-9.5-main.log
So my Postgres server seems to be down. I tried putting it 'up' again with:
pg_ctlcluster 9.5 main start
After doing so I got the error:
Insecure directory in $ENV{PATH} while running with -T switch at /usr/bin/pg_ctlcluster line 403.
And /usr/bin/pg_ctlcluster on line 403 says:
system 'systemctl', 'is-active', '-q', "postgresql\@$version-$cluster";
But I'm not too sure what the problem could be here or how I could fix it.
Update
I also tried updating the permissions on /bin to 755 as mentioned here. Sadly that did not fix my problem.
Update 2
I changed the permissions on /usr/bin to 755. Now when I try pg_ctlcluster 9.5 main start, I get this:
Job for postgresql@9.5-main.service failed because the control process exited with error code. See "systemctl status postgresql@9.5-main.service" and "journalctl -xe" for details.
And inside the systemctl status postgresql@9.5-main.service:
postgresql@9.5-main.service - PostgreSQL Cluster 9.5-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2018-01-28 17:32:38 EST; 45s ago
Process: 22473 ExecStart=postgresql@%i --skip-systemctl-redirect %i start (code=exited, status=1/FAILURE)
Jan 28 17:32:08 *url* systemd[1]: Starting PostgreSQL Cluster 9.5-main...
Jan 28 17:32:38 *url* postgresql@9.5-main[22473]: The PostgreSQL server failed to start.
Jan 28 17:32:38 *url* systemd[1]: postgresql@9.5-main.service: Control process exited, code=exited status=1
Jan 28 17:32:38 *url* systemd[1]: Failed to start PostgreSQL Cluster 9.5-main.
Jan 28 17:32:38 *url* systemd[1]: postgresql@9.5-main.service: Unit entered failed state.
Jan 28 17:32:38 *url* systemd[1]: postgresql@9.5-main.service: Failed with result 'exit-code'.
Thanks!
You had better not mix systemctl and pg_ctlcluster. Let systemctl make the calls to pg_ctlcluster with the right user and permissions. You should start your PostgreSQL instance with
sudo systemctl start postgresql@9.5-main.service
Also, check the errors in the startup log. You can post them too, so that we can help you figure out what's going on.
Your systemctl status output also shows that the service is disabled, so when the server reboots, you will have to start the service manually. To enable it, run:
sudo systemctl enable postgresql@9.5-main.service
I hope it helps
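The systemd unit name follows from the version and cluster name reported by pg_lsclusters. The sketch below derives it for every cluster that is down, assuming the column layout shown in the question (Ver, Cluster, Port, Status, ...) and the postgresql@<version>-<cluster>.service naming used by the Debian/Ubuntu packaging; the function name down_cluster_units is made up for illustration.

```shell
# Read pg_lsclusters output on stdin and print the systemd unit name for
# each cluster whose status starts with "down".
down_cluster_units() {
  awk 'NR > 1 && $4 ~ /^down/ { printf "postgresql@%s-%s.service\n", $1, $2 }'
}

# Usage:
#   pg_lsclusters | down_cluster_units
#   sudo systemctl start "$(pg_lsclusters | down_cluster_units | head -n 1)"
```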
This is mainly because the /etc/hosts file has somehow been changed. I removed an extra space inside the /etc/hosts file. Use cat /etc/hosts to inspect it.
Add these lines to the file:
127.0.0.1 localhost
127.0.1.1 your-host-name
::1 ip6-localhost ip6-loopback
I also set the permissions on the /etc/hosts file to 644. It keeps working for me even after a reboot of the system.
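A quick way to check whether the localhost mapping this answer relies on is present is sketched below. It only looks for a non-comment line that maps 127.0.0.1 to the name localhost; the function name has_localhost_entry is made up for illustration.

```shell
# Succeeds if the hosts file (default /etc/hosts) has a non-comment line
# mapping 127.0.0.1 to the name "localhost".
has_localhost_entry() {
  awk '
    /^[[:space:]]*#/ { next }
    $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1 }
    END { exit found ? 0 : 1 }
  ' "${1:-/etc/hosts}"
}

# Usage:
#   has_localhost_entry || echo "127.0.0.1 localhost is missing from /etc/hosts"
```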

Where are the Kubernetes kubelet logs located?

I installed Kubernetes on my Ubuntu machine. For some debugging purposes I need to look at the kubelet log file (if there is any such file).
I have looked in /var/logs but I couldn't find such a file. Where could it be?
If you run kubelet using systemd, then you could use the following method to see kubelet's logs:
# journalctl -u kubelet
If you want to go directly to the file, you can find the kubelet logs in the /var/log/syslog file. This applies to Ubuntu 16.04 and above.
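Whichever source is used, the kubelet output is usually noisy, so filtering by severity can help. A sketch assuming klog-style lines, where the message starts with a severity letter (I, W, E or F) followed by the month and day, as in "E1115 08:44:13..."; the function name kubelet_errors is made up for illustration.

```shell
# Keep only error (E) and fatal (F) kubelet lines from journal or syslog
# text on stdin.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}

# Usage:
#   journalctl -u kubelet --no-pager | kubelet_errors
#   kubelet_errors < /var/log/syslog
```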
It depends on how it was installed. I installed Kubernetes on some Ubuntu machines following the Docker-MultiNode instructions.
With this install, I find the logs using the docker logs command like this.
Find your container ID:
$ docker ps | grep kubelet
Use that container ID to view the logs:
$ docker logs <container-id>
Finally I found it in the /var/log/upstart directory. Kubernetes on my machine is started using upstart, which is why the log files are in the upstart directory.
I installed Kubernetes with kind (Kubernetes in Docker).
Find the kind Docker container to enter:
$ docker container ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
62588e4d284b kindest/node:v1.17.0 "/usr/local/bin/entr…" 2 weeks ago Up 2 weeks 127.0.0.1:32769->6443/tcp kind2-control-plane
$ docker container exec -it kind2-control-plane bash
root@kind2-control-plane:/#
Inside the kind2-control-plane container, you can find log files in two places:
/var/log/containers/
/var/log/pods/
You will find that they are the same, as the example below shows:
root@kind2-control-plane:/# cat /var/log/containers/redis-master-7db7f6579f-scw95_default_master-f6374281c2c6afcfcd0ee1214d9bd51c1684c0b6c0ba1056295246ecd055563c.log | tail -n 5
2020-04-08T12:09:29.824252114Z stdout F
2020-04-08T12:09:29.824372278Z stdout F [1] 08 Apr 12:09:29.822 # Server started, Redis version 2.8.19
2020-04-08T12:09:29.824440661Z stdout F [1] 08 Apr 12:09:29.823 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
2020-04-08T12:09:29.824459317Z stdout F [1] 08 Apr 12:09:29.823 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2020-04-08T12:09:29.82446451Z stdout F [1] 08 Apr 12:09:29.824 * The server is now ready to accept connections on port 6379
root@kind2-control-plane:/# cat /var/log/pods/default_redis-master-7db7f6579f-scw95_094824e1-25aa-4e1e-ab23-d4bae861988a/master/0.log | tail -n 5
2020-04-08T12:09:29.824252114Z stdout F
2020-04-08T12:09:29.824372278Z stdout F [1] 08 Apr 12:09:29.822 # Server started, Redis version 2.8.19
2020-04-08T12:09:29.824440661Z stdout F [1] 08 Apr 12:09:29.823 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
2020-04-08T12:09:29.824459317Z stdout F [1] 08 Apr 12:09:29.823 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2020-04-08T12:09:29.82446451Z stdout F [1] 08 Apr 12:09:29.824 * The server is now ready to accept connections on port 6379
root@kind2-control-plane:/# ls -l /var/log/containers/ | grep redis
lrwxrwxrwx 1 root root 101 Apr 8 12:09 redis-master-7db7f6579f-scw95_default_master-f6374281c2c6afcfcd0ee1214d9bd51c1684c0b6c0ba1056295246ecd055563c.log -> /var/log/pods/default_redis-master-7db7f6579f-scw95_094824e1-25aa-4e1e-ab23-d4bae861988a/master/0.log
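As the listing shows, the entries under /var/log/containers/ are symlinks into /var/log/pods/, which is why both paths print the same content. A small sketch that prints each link with its target; the function name show_log_links is made up for illustration, and the default directory is the one from this answer.

```shell
# Print "name -> target" for every symlinked .log file in a directory.
show_log_links() {
  dir="${1:-/var/log/containers}"
  for f in "$dir"/*.log; do
    # Skip regular files and an unmatched glob.
    [ -L "$f" ] || continue
    printf '%s -> %s\n' "$(basename "$f")" "$(readlink "$f")"
  done
}

# Usage inside the kind node container:
#   show_log_links
#   show_log_links /var/log/containers | grep redis
```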
If you want to know more about these directories, see 2019-2-merge-request on GitHub.