Too many TCP ports in CLOSE_WAIT state in Kafka broker - apache-kafka

Too many TCP connections are in the CLOSE_WAIT state on a Kafka broker, causing DisconnectException in Kafka clients.
tcp6 27 0 172.31.10.143:9092 172.31.0.47:45138 ESTABLISHED -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41612 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.0.47:45010 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:43000 CLOSE_WAIT -
tcp6 194 0 172.31.10.143:8080 172.31.20.219:45952 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.20.219:48006 CLOSE_WAIT -
tcp6 1 0 172.31.10.143:9092 172.31.0.47:44582 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:42828 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41934 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41758 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41584 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41852 CLOSE_WAIT -
tcp6 1 0 172.31.10.143:9092 172.31.0.47:44342 CLOSE_WAIT -
Error in Debezium:
connect-prod | 2019-02-14 06:28:54,885 INFO || [Consumer clientId=consumer-3, groupId=4] Error sending fetch request (sessionId=1727876188, epoch=INITIAL) to node 2: org.apache.kafka.common.errors.DisconnectException. [org.apache.kafka.clients.FetchSessionHandler]
connect-prod | 2019-02-14 06:28:55,448 INFO || [Consumer clientId=consumer-1, groupId=4] Error sending fetch request (sessionId=1379896198, epoch=INITIAL) to node 2: org.apache.kafka.common.errors.DisconnectException. [org.apache.kafka.clients.FetchSessionHandler]
What can be the reason behind this?

It appears that this is a known issue in Kafka 2.1.0.
https://issues.apache.org/jira/browse/KAFKA-7697
I think the connections stuck in CLOSE_WAIT are a side effect of the real problem.
This issue has been fixed in Kafka 2.1.1, which should be released in a few days. Looking forward to it.
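In the meantime, a quick way to see which clients account for the stuck connections is to tally CLOSE_WAIT entries per remote peer. The sketch below runs against a captured sample (addresses taken from the question); on a live broker you would pipe real `netstat -ant` output into the same awk/sort stages.

```shell
# Tally CLOSE_WAIT connections per remote peer. On a live broker, replace
# the captured sample with live output:  netstat -ant | awk ... (same stages).
sample='tcp6 27 0 172.31.10.143:9092 172.31.0.47:45138 ESTABLISHED -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41612 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.0.47:45010 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:43000 CLOSE_WAIT -'
printf '%s\n' "$sample" \
  | awk '$6 == "CLOSE_WAIT" {sub(/:[0-9]+$/, "", $5); print $5}' \
  | sort | uniq -c | sort -rn
```

The peer with the highest count is usually the client (here, the Debezium/Connect host) whose broken fetch sessions are piling up.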

Related

Minikube / Docker proxy is using port 80

I am using minikube in no-driver mode ( sudo minikube start --vm-driver none ) and I can't free port 80.
With
sudo netstat -nlplute
I get:
tcp 0 0 192.168.0.14:2380 0.0.0.0:* LISTEN 0 58500 7200/etcd
tcp6 0 0 :::80 :::* LISTEN 0 62030 8681/docker-proxy
tcp6 0 0 :::8080 :::* LISTEN 0 57318 8656/docker-proxy
I tried to stop Minikube, but that doesn't seem to work when using --vm-driver=none.
How should I free port 80?
EDIT: Full netstat output
➜ ~ sudo netstat -nlpute
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 102 35399 1019/systemd-resolv
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 0 6629864 11358/cupsd
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 128 45843 1317/postgres
tcp 0 0 127.0.0.1:6942 0.0.0.0:* LISTEN 1000 14547489 16086/java
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 0 58474 1053/kubelet
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 0 71361 10409/kube-proxy
tcp 0 0 127.0.0.1:45801 0.0.0.0:* LISTEN 0 57445 1053/kubelet
tcp 0 0 192.168.0.14:2379 0.0.0.0:* LISTEN 0 56922 7920/etcd
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 0 56921 7920/etcd
tcp 0 0 192.168.0.14:2380 0.0.0.0:* LISTEN 0 56917 7920/etcd
tcp 0 0 127.0.0.1:2381 0.0.0.0:* LISTEN 0 56084 7920/etcd
tcp 0 0 127.0.0.1:63342 0.0.0.0:* LISTEN 1000 14549242 16086/java
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 0 15699 1/init
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 0 60857 7889/kube-controlle
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 0 56932 7879/kube-scheduler
tcp 0 0 127.0.0.1:5939 0.0.0.0:* LISTEN 0 48507 2205/teamviewerd
tcp6 0 0 ::1:631 :::* LISTEN 0 6629863 11358/cupsd
tcp6 0 0 :::8443 :::* LISTEN 0 55158 7853/kube-apiserver
tcp6 0 0 :::44444 :::* LISTEN 1000 16217187 7252/___go_build_gi
tcp6 0 0 :::32028 :::* LISTEN 0 74556 10409/kube-proxy
tcp6 0 0 :::10250 :::* LISTEN 0 58479 1053/kubelet
tcp6 0 0 :::30795 :::* LISTEN 0 74558 10409/kube-proxy
tcp6 0 0 :::10251 :::* LISTEN 0 56926 7879/kube-scheduler
tcp6 0 0 :::10252 :::* LISTEN 0 60851 7889/kube-controlle
tcp6 0 0 :::30285 :::* LISTEN 0 74559 10409/kube-proxy
tcp6 0 0 :::31406 :::* LISTEN 0 74557 10409/kube-proxy
tcp6 0 0 :::111 :::* LISTEN 0 15702 1/init
tcp6 0 0 :::80 :::* LISTEN 0 16269016 16536/docker-proxy
tcp6 0 0 :::8080 :::* LISTEN 0 16263128 16524/docker-proxy
tcp6 0 0 :::10256 :::* LISTEN 0 75123 10409/kube-proxy
udp 0 0 0.0.0.0:45455 0.0.0.0:* 115 40296 1082/avahi-daemon:
udp 0 0 224.0.0.251:5353 0.0.0.0:* 1000 16274723 23811/chrome --type
udp 0 0 224.0.0.251:5353 0.0.0.0:* 1000 16270144 23728/chrome
udp 0 0 224.0.0.251:5353 0.0.0.0:* 1000 16270142 23728/chrome
udp 0 0 0.0.0.0:5353 0.0.0.0:* 115 40294 1082/avahi-daemon:
udp 0 0 127.0.0.53:53 0.0.0.0:* 102 35398 1019/systemd-resolv
udp 0 0 192.168.0.14:68 0.0.0.0:* 0 12307745 1072/NetworkManager
udp 0 0 0.0.0.0:111 0.0.0.0:* 0 18653 1/init
udp 0 0 0.0.0.0:631 0.0.0.0:* 0 6628156 11360/cups-browsed
udp6 0 0 :::5353 :::* 115 40295 1082/avahi-daemon:
udp6 0 0 :::111 :::* 0 15705 1/init
udp6 0 0 :::50342 :::* 115 40297 1082/avahi-daemon:
I've reproduced your environment (--vm-driver=none). At first I thought it might be connected with Minikube's built-in configuration; however, a clean Minikube does not use port 80 in its default configuration.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
$ minikube version
minikube version: v1.6.2
$ sudo netstat -nlplute
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 0 49556 9345/kube-controlle
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 0 50223 9550/kube-scheduler
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 101 15218 752/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 21550 1541/sshd
tcp 0 0 127.0.0.1:44197 0.0.0.0:* LISTEN 0 51016 10029/kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 0 51043 10029/kubelet
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 0 52581 10524/kube-proxy
tcp 0 0 127.0.0.1:2379 0.0.0.0:* LISTEN 0 49728 9626/etcd
tcp 0 0 10.156.0.11:2379 0.0.0.0:* LISTEN 0 49727 9626/etcd
tcp 0 0 10.156.0.11:2380 0.0.0.0:* LISTEN 0 49723 9626/etcd
tcp 0 0 127.0.0.1:2381 0.0.0.0:* LISTEN 0 49739 9626/etcd
tcp6 0 0 :::10256 :::* LISTEN 0 52577 10524/kube-proxy
tcp6 0 0 :::22 :::* LISTEN 0 21552 1541/sshd
tcp6 0 0 :::8443 :::* LISTEN 0 49120 9419/kube-apiserver
tcp6 0 0 :::10250 :::* LISTEN 0 51050 10029/kubelet
tcp6 0 0 :::10251 :::* LISTEN 0 50217 9550/kube-scheduler
tcp6 0 0 :::10252 :::* LISTEN 0 49550 9345/kube-controlle
udp 0 0 127.0.0.53:53 0.0.0.0:* 101 15217 752/systemd-resolve
udp 0 0 10.156.0.11:68 0.0.0.0:* 100 15574 713/systemd-network
udp 0 0 127.0.0.1:323 0.0.0.0:* 0 23984 2059/chronyd
udp6 0 0 ::1:323 :::* 0 23985 2059/chronyd
For a good description of what docker-proxy is used for, you can check this article:
When a container starts with its port forwarded to the Docker host on which it runs, in addition to the new process that runs inside the container, you may have noticed an additional process on the Docker host called docker-proxy
This docker-proxy might be something similar to a Docker zombie process, where the container was removed but the allocated port wasn't released. Unfortunately, this seems to be a recurring Docker issue, occurring across versions and operating systems since 2016. As I mentioned, I don't think there is currently a fix for this, but you can find workarounds:
cd /usr/libexec/docker/
ln -s docker-proxy-current docker-proxy
service docker restart
===
$ sudo service docker stop
$ sudo service docker start
===
$ sudo service docker stop
# remove all internal docker network: rm /var/lib/docker/network/files/
$ sudo service docker start
===
$ sudo systemctl stop docker
$ sudo systemctl start docker
There are a few GitHub threads mentioning this issue. For more information, please check this and this thread.
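Before resorting to a daemon restart, it can help to map the docker-proxy process back to the container that published the port. The sketch below pulls the PID out of the netstat output captured above (PID 16536 comes from that sample); on your machine you would feed in live `sudo netstat -nlpute` output and then ask Docker which container publishes the port.

```shell
# Extract the PID of the docker-proxy instance bound to :80 from captured
# `sudo netstat -nlpute` output, so it can be traced back to a container.
sample='tcp6 0 0 :::80 :::* LISTEN 0 16269016 16536/docker-proxy
tcp6 0 0 :::8080 :::* LISTEN 0 16263128 16524/docker-proxy'
pid=$(printf '%s\n' "$sample" | awk '$4 ~ /:80$/ {split($9, p, "/"); print p[1]}')
echo "$pid"
# With the PID in hand, on the real host:
#   docker ps --filter "publish=80"   # container whose mapping owns port 80
```

If `docker ps` shows a live container for that port, stopping or re-mapping that container frees the port without touching the daemon.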
After checking that my port 8080 was also used by docker-proxy, I ran
$ docker ps
and noticed that both port 80 and port 8080 are used by the Traefik controller:
$ kubectl get services
traefik-ingress-service ClusterIP 10.96.199.177 <none> 80/TCP,8080/TCP 25d
When I checked the traefik service definition, I found:
kind: Service
apiVersion: v1
metadata:
  name: traefik-ingress-service
spec:
  selector:
    k8s-app: traefik-ingress-lb
  ports:
  - protocol: TCP
    port: 80
    name: web
  - protocol: TCP
    port: 8080
    name: admin
So, I think this is why I get a docker-proxy. If I need it to use another port, I can change it here. My bad :(

Blocking an incoming foreign address with fail2ban or iptables

I am trying to ban an "IP address" or hostname (I don't know which it is): static.40.25.69.1 from my Ubuntu droplet, but without any luck. Banning IP addresses was easy, but I can't manage to do anything with the given address. The given address is crawling the site and adding a big load on the server, so I need to block it somehow.
My question is: how can I block it with iptables/fail2ban?
For IP addresses I had a manban jail and added them there, but the given address is not recognized as a valid IP address, so nothing happens.
I am using Ubuntu 16.04.
Netstat is showing the following:
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:34918 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:44008 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:48032 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:59180 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:55064 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:35442 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:50676 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:59708 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:43686 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:44120 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:43996 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:38754 CLOSE_WAIT
tcp6 1 0 ingatlanmaps.hu:https static.40.25.69.1:35100 CLOSE_WAIT
Many thanks,
Trix
Managed to find an answer. To find out the IP address you need to ban, you have to add an extra parameter to netstat so that it prints the numerical IP address instead of the reverse-DNS name.
netstat -n is the command needed.
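With numeric output, banning becomes mechanical: extract the unique remote addresses and feed each one to iptables (or your manban jail). The sketch below runs against a captured sample; the address 1.69.25.40 is purely illustrative, not a claim about what static.40.25.69.1 actually resolves to.

```shell
# Extract unique remote IPs from numeric `netstat -nt` output (captured
# sample below); each can then be dropped with iptables or fed to fail2ban.
sample='tcp6 1 0 10.0.0.5:443 1.69.25.40:34918 CLOSE_WAIT
tcp6 1 0 10.0.0.5:443 1.69.25.40:44008 CLOSE_WAIT'
offenders=$(printf '%s\n' "$sample" \
  | awk '{sub(/:[0-9]+$/, "", $5); print $5}' | sort -u)
printf '%s\n' "$offenders"
# For each offending address, as root on the droplet:
#   iptables -A INPUT -s "$ip" -j DROP
```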

Unable to connect to PostgreSQL on Google Cloud Instance

I have PostgreSQL running on my Google Cloud instance and I added a firewall rule for "tcp 5432" in the Google Cloud firewall, but I am still unable to connect; even telnet is not working.
officetaskpy#instance-1:/etc/postgresql/9.5/main$ netstat -ntpl
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:5910 0.0.0.0:* LISTEN 9020/Xvnc
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:44801 0.0.0.0:* LISTEN 16023/phantomjs
tcp 0 0 0.0.0.0:53619 0.0.0.0:* LISTEN 812/phantomjs
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 :::5432 :::* LISTEN -
Result of netstat command
Above is my firewall rule. Is there anything I am missing here?

What does the Recv-Q values in a Listen socket mean?

My program runs into trouble, with netstat output like below. It cannot receive a packet. What does the Recv-Q value in the first line mean? I checked the man page and did some googling, but found no answer.
[root#(none) /data]# netstat -ntap | grep 8000
tcp 129 0 0.0.0.0:8000 0.0.0.0:* LISTEN 1526/XXXXX-
tcp 0 0 9.11.6.36:8000 9.11.6.37:48306 SYN_RECV -
tcp 0 0 9.11.6.36:8000 9.11.6.34:44936 SYN_RECV -
tcp 365 0 9.11.6.36:8000 9.11.6.37:58446 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:55018 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:42830 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:56344 CLOSE_WAIT -
tcp 0 364 9.11.6.34:38947 9.11.6.36:8000 FIN_WAIT1 -
tcp 364 0 9.11.6.36:8000 9.11.6.37:52406 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:53603 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:47522 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:48191 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:51813 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:57789 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:34252 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:38930 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:44121 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:60465 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:37461 CLOSE_WAIT -
tcp 0 362 9.11.6.34:35954 9.11.6.36:8000 FIN_WAIT1 -
tcp 364 0 9.11.6.36:8000 9.11.6.37:55241 CLOSE_WAIT -
P.S.
See also at https://groups.google.com/forum/#!topic/comp.os.linux.networking/PoP0YOOIj70
Recv-Q is the Receive Queue: the number of bytes currently sitting in the socket's receive buffer. When the application reads from the socket, the bytes are removed from the buffer and copied into application memory. If the Recv-Q number gets too high, packets will be dropped because there is no place to put them. For the socket in the LISTEN state on your first line, Recv-Q is generally interpreted differently: it reflects the current size of the accept queue (connections completed by the kernel but not yet accept()ed by the application), so a value of 129 against a typical backlog of 128 suggests your program has stopped calling accept().
More info here: netstat
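To make this concrete, the filter below flags every socket in the capture whose Recv-Q is non-zero; on this output that is the LISTEN socket (un-accepted connections queued up) and the CLOSE_WAIT sockets (bytes the application never read before the peer closed). The sample lines are abbreviated from the question's netstat output.

```shell
# Print Recv-Q and state for every socket with a non-empty receive queue,
# from captured `netstat -ntap` output (abbreviated sample below).
sample='tcp 129 0 0.0.0.0:8000 0.0.0.0:* LISTEN 1526/app
tcp 0 0 9.11.6.36:8000 9.11.6.37:48306 SYN_RECV -
tcp 365 0 9.11.6.36:8000 9.11.6.37:58446 CLOSE_WAIT -'
printf '%s\n' "$sample" | awk '$2 > 0 {print $2, $6}'
```

Seeing both a saturated LISTEN queue and large CLOSE_WAIT queues together usually points at one cause: the application has stopped servicing its sockets.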

Zabbix agent windows TIME_WAIT sockets

I have a big problem with the Zabbix Windows agent.
The agent has a lot of sockets in TIME_WAIT state:
...........
TCP 10.0.10.4:10050 10.0.10.8:38681 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38683 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38710 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38736 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38755 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38764 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38781 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38811 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38835 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38849 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38878 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38888 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38913 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38933 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38952 TIME_WAIT 0
C:\>netstat -nao | find /c "TIME_WAIT"
200 <- it is too many.
Why does the agent open all these sockets?
Is there a way to close these sockets?
I have a lot of monitored items; could this be the problem?
The interval time is about 10 minutes.
thank you
any help is appreciated
IMHO it's not a big problem; this is simply how TCP works. Do you have any performance issue because your device has 200 TIME_WAIT connections?
If you have a lot of monitored items and your agent is in passive mode, then the Zabbix server has to create a lot of TCP connections to your agent. TIME_WAIT is almost the last state of a TCP connection; it indicates that this side has closed the connection, and the connection is kept around so that any delayed packets can be matched to it and handled appropriately. A common duration for the TIME_WAIT state is 30 seconds.
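A quick way to confirm the situation is to count connections per state. On the Windows host, the one-liner from the question (`netstat -nao | find /c "TIME_WAIT"`) already does this for one state; the awk sketch below generalises it to all states, shown here against a captured sample of the agent's netstat output.

```shell
# Count connections per TCP state from captured `netstat -nao` output.
sample='TCP 10.0.10.4:10050 10.0.10.8:38681 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38683 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.9:5001 ESTABLISHED 4'
printf '%s\n' "$sample" \
  | awk '{n[$4]++} END {for (s in n) print n[s], s}' | sort -rn
```

If TIME_WAIT dominates but ESTABLISHED stays low and steady, the pattern matches normal passive-check polling rather than a leak.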
You can tweak the Windows registry to decrease the duration of the TIME_WAIT state, but I don't recommend it if you don't know what you are doing.
http://help.globalscape.com/help/secureserver3/Windows_Registry_keys_for_TCP_IP_Performance_Tuning.htm
About TCP states:
http://commons.wikimedia.org/wiki/File:Tcp_state_diagram_fixed_new.svg
About the TIME_WAIT state (on Linux):
http://www.fromdual.com/huge-amount-of-time-wait-connections