Cannot curl kubelet read-only port - kubernetes

I have a heapster pod running on one of the nodes in my Kubernetes cluster. It can GET http://<node-with-heapster-pod>:10255/stats/summary just fine, but the same GET request against any other node fails. When I run curl from within a given node I can access that port on that node, but when I curl any node from another machine I get the following error:
Failed to connect to 128.180.120.229 port 10255: No route to host
The following is the netstat output for all ports on which the kubelet is listening:
netstat -ap | grep -i "listen" | grep "kubelet"
tcp 0 0 localhost:10248 0.0.0.0:* LISTEN 7562/kubelet
tcp6 0 0 [::]:4194 [::]:* LISTEN 7562/kubelet
tcp6 0 0 [::]:10250 [::]:* LISTEN 7562/kubelet
tcp6 0 0 [::]:10255 [::]:* LISTEN 7562/kubelet
unix 2 [ ACC ] STREAM LISTENING 621349 7562/kubelet /var/run/dockershim.sock
I apologize for the messy last column. Any ideas why this may be? My iptables rules are set up to accept all incoming connections, and any node can reach port 10250 on the others fine, just not 10255.

You may not have ip_forward enabled on your system. Can you check this setting?
sysctl -n net.ipv4.ip_forward

If anybody still cares, port 10255 is the kubelet's read-only port, and it may or may not be enabled. You can confirm this by logging in to the worker node in question and looking at the kubelet's startup command.
systemctl status kubelet-worker.service
Some on-prem Kubernetes solutions set this to 0, which disables the port, as described in the kubelet reference:
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
--read-only-port int32 The read-only port for the Kubelet to serve on with no authentication/authorization (set to 0 to disable) (default 10255) (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)
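To tell a disabled port apart from a blocked one, it can help to probe each node and classify the failure. A "connection refused" result would mean that nothing is listening on the port (for example because the read-only port is set to 0), whereas "No route to host", as in the question, is usually produced by a firewall reject rule or a routing problem. The following is a minimal sketch in Python; the function name probe_tcp and the returned labels are made up for illustration.

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: packets are probably being dropped silently.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return "refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            # Same condition curl reports as "No route to host".
            return "no route"
        return f"error: {exc}"
```

Running probe_tcp("128.180.120.229", 10250) and probe_tcp("128.180.120.229", 10255) from another node, using the address from the question, would show whether the two ports fail in the same way.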

Related

Kube Controller Manager CrashLoopBackOff

My kube-controller-manager keeps going into CrashLoopBackOff status.
I found this upon looking in the logs of the pod:
failed to create listener: failed to listen on 0.0.0.0:10252: listen tcp 0.0.0.0:10252: bind: address already in use
Then I stumbled upon this article, whose author fortunately found a fix for it: he killed the process using the port and restarted his kube-controller-manager pod. https://medium.com/#deepeshtripathi/kubernetes-controller-pod-crashloopbackoff-resolved-16aaa1c27cfc
So I followed the same steps. But when I logged in to the master node to find which process is using this port, I could not see anything using it.
root@ip:/# netstat -tunlp | grep 1025
tcp6 0 0 :::10250 :::* LISTEN 1598/kubelet
tcp6 0 0 :::10251 :::* LISTEN 7472/kube-scheduler
tcp6 0 0 :::10255 :::* LISTEN 1598/kubelet
tcp6 0 0 :::10256 :::* LISTEN 5629/kube-proxy
Does anyone know how to fix this?
failed to create listener: failed to listen on 0.0.0.0:10252: listen tcp 0.0.0.0:10252: bind: address already in use
According to the error message, port 10252 is already in use, so you need to stop whatever process is listening on it. You can do that by running:
fuser -k 10252/tcp
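Before killing anything, it may be worth confirming that the port really is taken, since the netstat output in the question shows no listener on 10252. One way is simply to try binding it. The following is a minimal sketch in Python; the helper name port_in_use is hypothetical, the check covers IPv4 only (mirroring the 0.0.0.0:10252 address in the error), and it has to run in the same network namespace as the process that fails to start.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails with EADDRINUSE (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        # Anything else (e.g. permission denied) is a different problem.
        raise
    finally:
        sock.close()
    return False
```

If port_in_use(10252) returns False while the pod still crashes with the same message, the conflict is probably between two instances of the controller manager rather than with an unrelated process.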

How to enable listening 10255 in my kubelet service

I am learning to work with Kubernetes and trying to configure monitoring of my Kubernetes cluster. For this I use Metricbeat and ELK.
After deploying and configuring metricbeat, I get an error:
error making http request: Get http://172.16.0.205:10255/stats/summary: dial tcp 172.16.0.205:10255: connect: connection refused
I found that my Kubelet is not listening on port 10255:
[root@kube2 /]# netstat -ap | grep -i "listen" | grep "kubelet"
tcp 0 0 localhost:40450 0.0.0.0:* LISTEN 8560/kubelet
tcp 0 0 localhost:10248 0.0.0.0:* LISTEN 8560/kubelet
tcp6 0 0 [::]:10250 [::]:* LISTEN 8560/kubelet
How can I enable this port? I read that I need to use the parameter --read-only-port=10255, but I do not quite understand how to apply it to my kubelet. For example:
[root@kube2 /]# kubelet --config --read-only-port=10255
F1010 13:32:48.592306 15851 server.go:196] failed to load Kubelet config file --read-only-port=10255, error failed to read kubelet config file "/--read-only-port=10255", error: open /--read-only-port=10255: no such file or directory
It doesn't work. Which file does it need?
Can anyone help me with a solution to this problem?
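I resolved this issue. I added the --read-only-port flag to /var/lib/kubelet/kubelet-flags on every one of my Kubernetes nodes (on a stock kubeadm install, the file that holds KUBELET_KUBEADM_ARGS is /var/lib/kubelet/kubeadm-flags.env):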
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --read-only-port=10255"
and restarted the kubelet service.
Now port 10255 is open:
[root@kube2 7.1]# netstat -ap | grep -i "listen" | grep "kubelet"
tcp 0 0 localhost:44799 0.0.0.0:* LISTEN 6281/kubelet
tcp 0 0 localhost:10248 0.0.0.0:* LISTEN 6281/kubelet
tcp6 0 0 [::]:10250 [::]:* LISTEN 6281/kubelet
tcp6 0 0 [::]:10255 [::]:* LISTEN 6281/kubelet
And I now see Kubernetes logs in Kibana.
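When the same change has to be verified on several nodes, the netstat check can be scripted. The following is a minimal sketch in Python that extracts the listening TCP ports of one program from netstat text; the function name listening_ports is hypothetical, and the sample is the output posted above. It assumes the seven-column layout shown in this thread and skips entries where netstat printed a service name instead of a port number.

```python
from typing import List


def listening_ports(netstat_output: str, program: str) -> List[int]:
    """Collect TCP ports in LISTEN state owned by `program` (hypothetical helper)."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        if fields[6].partition("/")[2] != program:
            continue
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return sorted(ports)


SAMPLE = """\
tcp 0 0 localhost:44799 0.0.0.0:* LISTEN 6281/kubelet
tcp 0 0 localhost:10248 0.0.0.0:* LISTEN 6281/kubelet
tcp6 0 0 [::]:10250 [::]:* LISTEN 6281/kubelet
tcp6 0 0 [::]:10255 [::]:* LISTEN 6281/kubelet
"""

print(10255 in listening_ports(SAMPLE, "kubelet"))  # prints True
```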

kubernetes master starts only on tcp6, how to join a node?

I have a local Kubernetes master that listens on tcp6 port 6443 but not on tcp. How do I run kubeadm join so that it uses the right port?
tcp6 0 0 :::10250 :::* LISTEN -
tcp6 0 0 :::6443 :::* LISTEN -
tcp6 0 0 :::10251 :::* LISTEN -
Starting Nmap 7.01 ( https://nmap.org ) at 2019-09-25 15:40 CEST
Nmap scan report for 10.0.2.15
Host is up (0.000081s latency).
PORT STATE SERVICE
6443/tcp closed unknown
You should run the following command on the master host:
$ kubeadm init --apiserver-advertise-address=<private-ip of master host>
--apiserver-advertise-address parameter - if the node should host a new control plane instance, the IP address the API Server will advertise it's listening on. If not set the default network interface will be used.
Now try to run the join command that was generated in the output of kubeadm init. It should work fine.
Also check whether a firewall is running on your master node. If it is blocking incoming traffic, it should be disabled:
systemctl stop firewalld

What is the extra random port in kafka and how to set it to bind to localhost

This happens both on the host and in Docker images: besides the well-known port 9092, there is another, dynamically assigned port that Kafka listens on.
I am using the /usr/local/kafka/bin/kafka-server-start.sh to run kafka.
ps -ef |grep kafka |grep -v grep |awk '{print $2}'
15580
netstat -tnpl |grep 15580
tcp6 0 0 :::37023 :::* LISTEN 15580/java
tcp6 0 0 192.168.64.18:9092 :::* LISTEN 15580/java
What is port 37023 above? How can I disable it? Can it be bound to localhost?
The actual Kafka process only listens on the 9092 port by default.
Can you run lsof -i :37023, or whatever other dynamic port you get? That would get the PID of the process that is listening on that TCP port, and will probably point to the culprit.

Network connectivity/DNS issues on a GKE 1.10 kubernetes cluster

I'm running into DNS issues on a GKE 1.10 kubernetes cluster. Occasionally pods start without any network connectivity. Restarting the pod tends to fix the issue.
Here's the result of the same few commands inside a container without network, and one with.
BROKEN:
kc exec -it -n iotest app1-b67598997-p9lqk -c userapp sh
/app $ nslookup www.google.com
nslookup: can't resolve '(null)': Name does not resolve
/app $ cat /etc/resolv.conf
nameserver 10.63.240.10
search iotest.svc.cluster.local svc.cluster.local cluster.local c.myproj.internal google.internal
options ndots:5
/app $ curl -I 10.63.240.10
curl: (7) Failed to connect to 10.63.240.10 port 80: Connection refused
/app $ netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:8001 0.0.0.0:* LISTEN 1/python
tcp 0 0 ::1:50051 :::* LISTEN 1/python
tcp 0 0 ::ffff:127.0.0.1:50051 :::* LISTEN 1/python
WORKING:
kc exec -it -n iotest app1-7d985bfd7b-h5dbr -c userapp sh
/app $ nslookup www.google.com
nslookup: can't resolve '(null)': Name does not resolve
Name: www.google.com
Address 1: 74.125.206.147 wk-in-f147.1e100.net
Address 2: 74.125.206.105 wk-in-f105.1e100.net
Address 3: 74.125.206.99 wk-in-f99.1e100.net
Address 4: 74.125.206.104 wk-in-f104.1e100.net
Address 5: 74.125.206.106 wk-in-f106.1e100.net
Address 6: 74.125.206.103 wk-in-f103.1e100.net
Address 7: 2a00:1450:400c:c04::68 wk-in-x68.1e100.net
/app $ cat /etc/resolv.conf
nameserver 10.63.240.10
search iotest.svc.cluster.local svc.cluster.local cluster.local c.myproj.internal google.internal
options ndots:5
/app $ curl -I 10.63.240.10
HTTP/1.1 404 Not Found
date: Sun, 29 Jul 2018 15:13:47 GMT
server: envoy
content-length: 0
/app $ netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:15000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:8001 0.0.0.0:* LISTEN 1/python
tcp 0 0 10.60.2.6:56508 10.60.48.22:9091 ESTABLISHED -
tcp 0 0 127.0.0.1:57768 127.0.0.1:50051 ESTABLISHED -
tcp 0 0 10.60.2.6:43334 10.63.255.44:15011 ESTABLISHED -
tcp 0 0 10.60.2.6:15001 10.60.45.26:57160 ESTABLISHED -
tcp 0 0 10.60.2.6:48946 10.60.45.28:9091 ESTABLISHED -
tcp 0 0 127.0.0.1:49804 127.0.0.1:50051 ESTABLISHED -
tcp 0 0 ::1:50051 :::* LISTEN 1/python
tcp 0 0 ::ffff:127.0.0.1:50051 :::* LISTEN 1/python
tcp 0 0 ::ffff:127.0.0.1:50051 ::ffff:127.0.0.1:49804 ESTABLISHED 1/python
tcp 0 0 ::ffff:127.0.0.1:50051 ::ffff:127.0.0.1:57768 ESTABLISHED 1/python
These pods are identical, just one was restarted.
Does anyone have advice about how to analyse and fix this issue?
Some steps to try:
1) Run ifconfig eth0, or whatever the primary interface is.
Is the interface up? Are the tx and rx packet counts increasing?
2) If the interface is up, run tcpdump while you run the nslookup command that you posted, and see whether the DNS request packets are being sent out.
3) Check which node the pod is scheduled on when network connectivity breaks. Is it the same node every time? If so, are other pods on that node running into a similar problem?
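One more comparison that the posted output already allows is to diff the listening sockets of the two pods. The following is a minimal sketch in Python; the function name listeners is hypothetical, and the two samples are trimmed copies of the netstat output from the question.

```python
def listeners(netstat_output: str) -> set:
    """Return the local addresses of all TCP sockets in LISTEN state (hypothetical helper)."""
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            found.add(fields[3])
    return found


BROKEN = """\
tcp 0 0 127.0.0.1:8001 0.0.0.0:* LISTEN 1/python
tcp 0 0 ::1:50051 :::* LISTEN 1/python
tcp 0 0 ::ffff:127.0.0.1:50051 :::* LISTEN 1/python
"""

WORKING = """\
tcp 0 0 127.0.0.1:15000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:8001 0.0.0.0:* LISTEN 1/python
tcp 0 0 10.60.2.6:56508 10.60.48.22:9091 ESTABLISHED -
tcp 0 0 ::1:50051 :::* LISTEN 1/python
tcp 0 0 ::ffff:127.0.0.1:50051 :::* LISTEN 1/python
"""

# Listeners present in the working pod but absent from the broken one.
missing = sorted(listeners(WORKING) - listeners(BROKEN))
print(missing)
```

With the data from the question, the working pod has two listeners that the broken pod lacks (127.0.0.1:15000 and 0.0.0.0:15001), and its curl response carries "server: envoy". That may suggest that a proxy container is not up in the broken pod, but this is an inference from the posted output, not something confirmed in the thread.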
I also faced the same problem, and I simply worked around it for now by switching to the 1.9.x GKE version (after spending many hours trying to debug why my app wasn't working).
Hope this helps!