Zabbix agent on Windows: TIME_WAIT sockets

I have a big problem with the Zabbix Windows agent.
The agent has a lot of sockets in the TIME_WAIT state:
...........
TCP 10.0.10.4:10050 10.0.10.8:38681 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38683 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38710 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38736 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38755 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38764 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38781 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38811 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38835 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38849 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38878 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38888 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38913 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38933 TIME_WAIT 0
TCP 10.0.10.4:10050 10.0.10.8:38952 TIME_WAIT 0
C:\>netstat -nao | find /c "TIME_WAIT"
200 <- that is too many.
Why does the agent open all these sockets?
Is there a way to close these sockets?
I have a lot of monitored items; could that be the problem?
The update interval is about 10 minutes.
Thank you, any help is appreciated.

IMHO it's not a big problem; it's simply how TCP works. Do you have any actual performance issue because your device has 200 TIME_WAIT connections?
If you have a lot of monitored items and your agent is in passive mode, then the Zabbix server has to create a lot of TCP connections to your agent. TIME_WAIT is almost the last state of such a TCP connection: it indicates that this side has closed the connection, and the connection is kept around so that any delayed packets can still be matched to it and handled appropriately. The TIME_WAIT state typically lasts from about 30 seconds up to a few minutes, depending on the OS.
You can tweak the Windows registry to decrease the duration of the TIME_WAIT state, but I don't recommend it unless you know what you are doing.
http://help.globalscape.com/help/secureserver3/Windows_Registry_keys_for_TCP_IP_Performance_Tuning.htm
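For illustration only, the registry value involved is TcpTimedWaitDelay under the Tcpip parameters key; the supported range and the default depend on the Windows version, so treat the value below (30 seconds) purely as an example:
reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay /t REG_DWORD /d 30 /f
A reboot is usually required before the new value takes effect.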
About TCP states:
http://commons.wikimedia.org/wiki/File:Tcp_state_diagram_fixed_new.svg
About the TIME_WAIT state (on Linux):
http://www.fromdual.com/huge-amount-of-time-wait-connections

Related

kubernetes v1.18.8 installation issue [closed]

I have deployed a Kubernetes cluster v1.18.8 with kubeadm in a production environment. The cluster setup is 3 master and 3 worker nodes with an external kube-api load balancer, and etcd resides on the master nodes. I didn't see any issue during installation and all pods in kube-system are running. However, I get an error when I run the command below:
kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}
While troubleshooting I found that the ports are not being listened on:
sudo netstat -tlpn |grep kube
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 132584/kubelet
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 133300/kube-proxy
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 197705/kube-control
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 213741/kube-schedul
tcp6 0 0 :::10250 :::* LISTEN 132584/kubelet
tcp6 0 0 :::6443 :::* LISTEN 132941/kube-apiserv
tcp6 0 0 :::10256 :::* LISTEN 133300/kube-proxy
If I check the same thing on the development environment Kubernetes cluster (v1.17), I see no issue.
kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
sudo netstat -tlpn |grep 102
tcp 0 0 127.0.0.1:10257 0.0.0.0:* LISTEN 2141/kube-controlle
tcp 0 0 127.0.0.1:10259 0.0.0.0:* LISTEN 2209/kube-scheduler
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 1230/kubelet
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 2668/kube-proxy
tcp6 0 0 :::10256 :::* LISTEN 2668/kube-proxy
tcp6 0 0 :::10250 :::* LISTEN 1230/kubelet
tcp6 0 0 :::10251 :::* LISTEN 2209/kube-scheduler
tcp6 0 0 :::10252 :::* LISTEN 2141/kube-controlle
On the newly created production cluster I have deployed nginx and another application just to test how the Kubernetes components behave, and didn't see any error.
Is this the expected behaviour in v1.18? I will really appreciate any help on this.
NOTE: No ports are blocked in internal communication.
The command kubectl get componentstatus is deprecated in newer versions (1.19+) and it already has many issues.
The main point to note here is that Kubernetes has disabled insecure serving of these components (at least from v1.18). Hence kube-controller-manager and kube-scheduler are no longer listening on ports 10252 and 10251. To restore this functionality you can remove the --port=0 flag from their manifest files (not recommended, as this can expose their metrics to the whole internet), which you can find inside:
/etc/kubernetes/manifests/
I commented out the --port=0 field from the manifest files just to check this, and the kubectl get componentstatus command worked.
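As a quick check before editing, you can locate the flag in the static pod manifests (the file names below are the kubeadm defaults on a control-plane node):
sudo grep -n 'port=0' /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests/kube-controller-manager.yaml
After removing the - --port=0 line, the kubelet recreates the static pods on its own and kubectl get componentstatus should report Healthy again.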

Unable to connect to PostgreSQL db on Ubuntu 18.04 Server

Having a hard time trying to connect to a PostgreSQL database on an Ubuntu 18.04 server.
Here is my:
postgresql.conf file:
port=5432
listen_addresses='*'
pg_hba.conf:
host all all 0.0.0.0/0 md5
The firewall is currently disabled.
Here is the output when I ran the following command (saw in another thread to do this):
sudo netstat -ltpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 608/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 842/sshd
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 2922/postgres
tcp 0 0 0.0.0.0:25 0.0.0.0:* LISTEN 1055/master
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 867/nginx: master p
tcp6 0 0 :::22 :::* LISTEN 842/sshd
tcp6 0 0 :::25 :::* LISTEN 1055/master
tcp6 0 0 :::80 :::* LISTEN
I have restarted PostgreSQL after each change using the command:
sudo service postgresql restart
I have tried to access the db using the Python library psycopg2 on macOS and I am getting this error:
could not connect to server: Connection refused
Is the server running on host "<ip_address>" and accepting
TCP/IP connections on port 5432?
What am I missing?
From the netstat output (postgres is still listening only on 127.0.0.1:5432) it is obvious that you didn't restart PostgreSQL after changing listen_addresses.
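A restart followed by a re-check should confirm it, for example:
sudo service postgresql restart
sudo netstat -ltpn | grep 5432
Once the new listen_addresses setting is active, the postgres line should show 0.0.0.0:5432 instead of 127.0.0.1:5432.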

Too many TCP ports in CLOSE WAIT condition in kafka broker

Too many TCP connections are in CLOSE_WAIT status on a Kafka broker, causing DisconnectException in Kafka clients.
tcp6 27 0 172.31.10.143:9092 172.31.0.47:45138 ESTABLISHED -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41612 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.0.47:45010 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:43000 CLOSE_WAIT -
tcp6 194 0 172.31.10.143:8080 172.31.20.219:45952 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.20.219:48006 CLOSE_WAIT -
tcp6 1 0 172.31.10.143:9092 172.31.0.47:44582 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:42828 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41934 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41758 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41584 CLOSE_WAIT -
tcp6 25 0 172.31.10.143:9092 172.31.46.69:41852 CLOSE_WAIT -
tcp6 1 0 172.31.10.143:9092 172.31.0.47:44342 CLOSE_WAIT -
Error in Debezium:
connect-prod | 2019-02-14 06:28:54,885 INFO || [Consumer clientId=consumer-3, groupId=4] Error sending fetch request (sessionId=1727876188, epoch=INITIAL) to node 2: org.apache.kafka.common.errors.DisconnectException. [org.apache.kafka.clients.FetchSessionHandler]
connect-prod | 2019-02-14 06:28:55,448 INFO || [Consumer clientId=consumer-1, groupId=4] Error sending fetch request (sessionId=1379896198, epoch=INITIAL) to node 2: org.apache.kafka.common.errors.DisconnectException. [org.apache.kafka.clients.FetchSessionHandler]
What can be the reason behind this?
It appears that this is a known issue in Kafka 2.1.0.
https://issues.apache.org/jira/browse/KAFKA-7697
I think the connections stuck in CLOSE_WAIT are a side effect of the real problem.
This issue has been fixed in Kafka version 2.1.1, which should be released in a few days. Looking forward to it.
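Until you can upgrade, one way to keep an eye on it is to count the CLOSE_WAIT sockets per remote peer on the broker (a rough sketch assuming the default netstat column layout, where the foreign address is field 5):
netstat -tn | grep ':9092' | grep CLOSE_WAIT | awk '{print $5}' | sort | uniq -c | sort -rn
If the count keeps growing for particular clients, those are likely the consumers affected by the bug described in KAFKA-7697.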

Unable to connect to PostgreSQL on Google Cloud Instance

I have PostgreSQL running on my Google Cloud instance and I added a firewall rule for "tcp 5432" in the Google Cloud firewall, but I am still unable to connect; even telnet is not working.
officetaskpy#instance-1:/etc/postgresql/9.5/main$ netstat -ntpl
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:5910 0.0.0.0:* LISTEN 9020/Xvnc
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:44801 0.0.0.0:* LISTEN 16023/phantomjs
tcp 0 0 0.0.0.0:53619 0.0.0.0:* LISTEN 812/phantomjs
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 :::5432 :::* LISTEN -
The result of the netstat command is shown above, and my firewall rule is in the screenshot (not included here). Is there anything I am missing here?

What does the Recv-Q values in a Listen socket mean?

My program runs into trouble with netstat output like the one below: it cannot receive packets. What does the Recv-Q value in the first line mean? I have read the man page and done some googling, but found no answer.
[root@(none) /data]# netstat -ntap | grep 8000
tcp 129 0 0.0.0.0:8000 0.0.0.0:* LISTEN 1526/XXXXX-
tcp 0 0 9.11.6.36:8000 9.11.6.37:48306 SYN_RECV -
tcp 0 0 9.11.6.36:8000 9.11.6.34:44936 SYN_RECV -
tcp 365 0 9.11.6.36:8000 9.11.6.37:58446 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:55018 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:42830 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:56344 CLOSE_WAIT -
tcp 0 364 9.11.6.34:38947 9.11.6.36:8000 FIN_WAIT1 -
tcp 364 0 9.11.6.36:8000 9.11.6.37:52406 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:53603 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:47522 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:48191 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:51813 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:57789 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:34252 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:38930 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:44121 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:60465 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:37461 CLOSE_WAIT -
tcp 0 362 9.11.6.34:35954 9.11.6.36:8000 FIN_WAIT1 -
tcp 364 0 9.11.6.36:8000 9.11.6.37:55241 CLOSE_WAIT -
P.S. See also https://groups.google.com/forum/#!topic/comp.os.linux.networking/PoP0YOOIj70
Recv-Q is the receive queue: the number of bytes currently sitting in the socket's receive buffer. When the application reads from the socket, the bytes are removed from the buffer and copied into application memory. If Recv-Q grows too large, incoming packets will be dropped because there is no room left to store them.
More info in the netstat man page.
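On Linux, ss can make this clearer: for sockets in the LISTEN state its Recv-Q/Send-Q columns show the backlog usage and the configured backlog limit rather than buffered bytes (see the ss man page; exact semantics vary a little between kernel versions):
ss -ltn | grep ':8000'
A large, non-decreasing Recv-Q on the listening socket suggests connections queued up waiting to be accepted, i.e. the application has probably stopped calling accept(); that would also fit the bytes piling up on the already-accepted CLOSE_WAIT connections above.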