DNS in K8S pod: tcpdump shows bad udp checksum but nslookup still works and UDP error counters don't increment - kubernetes

I have a K8s pod. Inside the pod, I do a DNS lookup with nslookup and it works fine. But when I run tcpdump on the pod interface (eth0), it clearly shows that the received DNS responses have bad UDP checksums. I checked the UDP counters with netstat, but the checksum error counter (InCsumErrors) never increments. Here are some relevant outputs.
IP config of pod:
root@node:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if10936: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether e2:22:5c:6c:53:bd brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.233.85.177/32 scope global eth0
valid_lft forever preferred_lft forever
Successful nslookup:
bash-4.4# nslookup google.com
Server: 169.254.25.10
Address: 169.254.25.10#53
Non-authoritative answer:
Name: google.com
Address: 216.58.207.238
Name: google.com
Address: 2a00:1450:400e:809::200e
tcpdump showing the bad UDP checksums for the above nslookup run:
root@node:~# tcpdump -ni eth0 -vvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:02:24.267999 IP (tos 0x0, ttl 64, id 50356, offset 0, flags [none], proto UDP (17), length 82)
10.233.85.177.52764 > 169.254.25.10.53: [bad udp cksum 0x23f2 -> 0xd1bd!] 43806+ A? google.com.qaammuk.svc.cluster.local. (54)
16:02:24.269489 IP (tos 0x0, ttl 64, id 56987, offset 0, flags [DF], proto UDP (17), length 175)
169.254.25.10.53 > 10.233.85.177.52764: [bad udp cksum 0x244f -> 0x2c2a!] 43806 NXDomain*- q: A? google.com.qaammuk.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (147)
16:02:24.269847 IP (tos 0x0, ttl 64, id 50357, offset 0, flags [none], proto UDP (17), length 74)
10.233.85.177.39433 > 169.254.25.10.53: [bad udp cksum 0x23ea -> 0xac65!] 45029+ A? google.com.svc.cluster.local. (46)
16:02:24.270901 IP (tos 0x0, ttl 64, id 56988, offset 0, flags [DF], proto UDP (17), length 167)
169.254.25.10.53 > 10.233.85.177.39433: [bad udp cksum 0x2447 -> 0x06d2!] 45029 NXDomain*- q: A? google.com.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (139)
16:02:24.271206 IP (tos 0x0, ttl 64, id 50358, offset 0, flags [none], proto UDP (17), length 70)
10.233.85.177.59330 > 169.254.25.10.53: [bad udp cksum 0x23e6 -> 0xdaca!] 2633+ A? google.com.cluster.local. (42)
16:02:24.272262 IP (tos 0x0, ttl 64, id 56989, offset 0, flags [DF], proto UDP (17), length 163)
169.254.25.10.53 > 10.233.85.177.59330: [bad udp cksum 0x2443 -> 0x3537!] 2633 NXDomain*- q: A? google.com.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (135)
16:02:24.272527 IP (tos 0x0, ttl 64, id 50359, offset 0, flags [none], proto UDP (17), length 56)
10.233.85.177.53873 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x278c!] 52759+ A? google.com. (28)
16:02:24.272707 IP (tos 0x0, ttl 64, id 56990, offset 0, flags [DF], proto UDP (17), length 82)
169.254.25.10.53 > 10.233.85.177.53873: [bad udp cksum 0x23f2 -> 0xe468!] 52759* q: A? google.com. 1/0/0 google.com. [8s] A 216.58.211.110 (54)
16:02:24.272963 IP (tos 0x0, ttl 64, id 50360, offset 0, flags [none], proto UDP (17), length 56)
10.233.85.177.54691 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x370f!] 47943+ AAAA? google.com. (28)
16:02:24.273141 IP (tos 0x0, ttl 64, id 56991, offset 0, flags [DF], proto UDP (17), length 94)
169.254.25.10.53 > 10.233.85.177.54691: [bad udp cksum 0x23fe -> 0xf8e0!] 47943* q: AAAA? google.com. 1/0/0 google.com. [8s] AAAA 2a00:1450:400e:809::200e (66)
netstat output showing the UDP counters from the Linux stack. No InCsumErrors:
root@node:~# netstat -s -u
Udp:
18 packets received
0 packets to unknown port received
0 packet receive errors
18 packets sent
0 receive buffer errors
0 send buffer errors
UdpLite:
IpExt:
InOctets: 2130
OutOctets: 1101
InNoECTPkts: 18
I tried with checksum offload both enabled and disabled on eth0. Same behavior in both cases.
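For reference, this is roughly how I toggled and checked the offload state (a sketch; the exact ethtool feature names vary by driver, and eth0 here is the pod's veth interface):
# Disable rx/tx checksum offload on the pod interface:
ethtool -K eth0 rx off tx off
# Check the current checksum-related offload settings:
ethtool -k eth0 | grep -i checksum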
Shouldn't a bad UDP checksum detected by tcpdump mean that the kernel will, at some point, drop those UDP packets before handing them to the socket that nslookup is bound to?

When you do nslookup google.com 8.8.8.8, everything looks fine. I think this is because, since you are using CoreDNS to resolve the domains, the packets run through a Service.
A Service in k8s is a virtual entity: it appears as a forwarding rule in iptables. During forwarding, the source IP address is swapped out without recalculating the checksum, hence the error reported by tcpdump.
According to RFC 768, the UDP checksum is defined as follows:
Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data
So, as you can see, fields from the IP header are part of the checksum, including the source IP that gets swapped out, and that changes the checksum.
Calculating the checksum is usually done with the NIC's hardware acceleration as packets are sent from or received by a node. It would take a lot of computation to recompute the checksums of every packet passing through iptables, and it would also be pointless: once a packet has been received on the node's network interface and validated, you can be sure it will stay valid within the node, even after iptables forwards it.
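If you want to double-check that the kernel itself is not rejecting these datagrams, you can watch its own UDP checksum error counter while repeating the lookup (a sketch; the InCsumErrors column only exists on reasonably recent kernels, and nstat comes with iproute2):
# Raw SNMP counters; the Udp: lines include an InCsumErrors column:
grep '^Udp:' /proc/net/snmp
# Equivalent check with nstat (-a prints absolute values, -z includes zero counters):
nstat -az UdpInCsumErrors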
Does K8S set up some rules for the Linux kernel to ignore bad UDP checksums for pod interfaces?
I know that, for example, the loopback interface does not checksum packets (at least not by default). Maybe bridge interfaces (e.g. docker0 and veth*) don't checksum either. I tried to find strong evidence for this, but I found nothing to either prove or disprove it.

Try disabling checksum and segmentation offload on the pod interface:
ethtool --offload eth0 rx off tx off
ethtool -K eth0 gso off

Related

Why can the third host receive the data flow between host no.1 and host no.2? (All three hosts are virtual machines installed in VMware with NAT mode)

I installed three operating systems (let's say 3 hosts) in VMware, all in NAT mode. The 3 hosts are named centos, centos 1, and centos 2 (as the screenshot below shows).
[screenshot: 3 hosts in VMware]
The first host's IP address is 192.168.248.132, the second's is 192.168.248.136, and we don't need to know the third host's IP because it's not relevant to this issue.
I typed the command "ping 192.168.248.136", and the output on the screen is:
PING 192.168.248.136 (192.168.248.136) 56(84) bytes of data.
64 bytes from 192.168.248.136: icmp_seq=1 ttl=64 time=0.435 ms
64 bytes from 192.168.248.136: icmp_seq=2 ttl=64 time=0.313 ms
64 bytes from 192.168.248.136: icmp_seq=3 ttl=64 time=0.385 ms
This means the ping command succeeded and host no.2 (whose IP address is 192.168.248.136) received the ICMP echo requests and replied.
Meanwhile, I ran "tcpdump -i ens33" on host no.3. If everything worked as I expected, host no.3 should not see any traffic between host no.1 and host no.2, because the ICMP packets are neither broadcast nor multicast, so only hosts no.1 and no.2 should send and receive them. Also, host no.3's network interface is not in promiscuous mode, so it should only receive its own frames. The ifconfig output from host no.3 below shows that it is not in promiscuous mode.
[root@localhost usr]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.248.137 netmask 255.255.255.0 broadcast 192.168.248.255
inet6 fe80::b488:bc2c:3770:a95f prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:0d:dc:86 txqueuelen 1000 (Ethernet)
RX packets 351081 bytes 512917768 (489.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 34947 bytes 2166260 (2.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
The flags are 4163<UP,BROADCAST,RUNNING,MULTICAST>; "PROMISC" is not listed, so the interface is not in promiscuous mode.
However, after I ran "tcpdump -i ens33" on host no.3, this appeared on the screen:
06:28:11.511233 IP 192.168.248.132 > 192.168.248.136: ICMP echo request, id 3137, seq 5, length 64
06:28:11.511503 IP 192.168.248.136 > 192.168.248.132: ICMP echo reply, id 3137, seq 5, length 64
Host no.3 received the traffic between no.1 and no.2, even though it was only supposed to be delivered to no.2.
So here is the question: why can host no.3 receive packets that were not meant for it?
By default, tcpdump puts the interface into promiscuous mode, which lets it see anything on the network segment it is connected to (even frames not addressed to it).
The three hosts appear to be connected to a virtual switch that does not isolate them from each other.
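A simple way to confirm this is to repeat the capture with promiscuous mode disabled; tcpdump's -p switch does exactly that (a sketch):
# Capture without putting ens33 into promiscuous mode; the echo request/reply
# between host no.1 and no.2 should no longer show up:
tcpdump -p -i ens33 icmp
# While an ordinary (promiscuous) capture is running, the interface's
# promiscuity counter can be inspected with:
ip -d link show ens33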

How to fix pgAdmin4 connection refused error

I'm getting this error when attempting to set up a new server in pgAdmin4:
Unable to connect to server:
could not connect to server: Connection refused (0x0000274D/10061)
Is the server running on host "192.168.210.146" and accepting
TCP/IP connections on port 5432?
I have PostgreSQL 12.7 running on CentOS 8 inside a VirtualBox 6.1 VM, which runs on my Windows 10 21H1 laptop. I can connect to the OS using PuTTY and the CentOS web console just fine.
Here is some network info via the CentOS web client terminal:
# nmap localhost
Starting Nmap 7.70 ( https://nmap.org ) at 2021-07-14 16:59 PDT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000014s latency).
Other addresses for localhost (not scanned): ::1
Not shown: 996 closed ports
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
5432/tcp open postgresql
9090/tcp open zeus-admin
Nmap done: 1 IP address (1 host up) scanned in 1.68 seconds
# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 954/sshd
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 972/postmaster
tcp 0 0 127.0.0.1:37753 0.0.0.0:* LISTEN 1620/cockpit-bridge
# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: enp0s3
sources:
services: cockpit dhcpv6-client postgresql ssh
ports: 5432/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
#
# ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.210.146 netmask 255.255.254.0 broadcast 192.168.211.255
inet6 fe80::a00:27ff:fecb:8d2d prefixlen 64 scopeid 0x20<link>
ether 08:00:27:cb:8d:2d txqueuelen 1000 (Ethernet)
RX packets 4704 bytes 512333 (500.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3757 bytes 2510585 (2.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 7252 bytes 2161674 (2.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 7252 bytes 2161674 (2.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
In the pgAdmin server setup screen I'm using the IP address listed above (192.168.210.146), the user postgres and its password, port 5432, and the database set to postgres.
I get the same error when trying to establish ODBC and JDBC connections from my laptop, but I'm not sure what in the Postgres environment needs to be fixed. I did add one entry to the pg_hba.conf file, as shown below, but that didn't help:
# IPv4 local connections:
host all all 127.0.0.1/32 ident
host all all 192.168.210.146/32 trust # added; not helping
Is there another file or setting that needs to be fixed?
Thanks.
The solution was to first uncomment the listen_addresses entry in postgresql.conf and then set it to the necessary IP address. Everything connects just fine now. Thanks.
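For anyone hitting the same error, a minimal sketch of the change (the data directory path and service name below are typical for the PGDG PostgreSQL 12 packages on CentOS 8 and may differ on your install):
# In /var/lib/pgsql/12/data/postgresql.conf (path assumed):
#   listen_addresses = '*'     # or a specific address such as '192.168.210.146'
# Restart PostgreSQL so it binds to the new address (service name assumed):
sudo systemctl restart postgresql-12
# Verify it no longer listens only on 127.0.0.1:
netstat -tlpn | grep 5432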

rsyslogd client not closing the TCP connection when server rsyslogd goes down

I have configured rsyslogd to send logs from a client machine to a remote rsyslogd server over TCP. After configuring and restarting the rsyslogd daemon on both client and server, I am able to send packets to the server and everything works fine. But later, when I restart rsyslogd on the server, the client keeps sending packets on the old TCP connection. The client then retries 16 times and fails to send that packet. Only when the next packet is sent does the client create a new connection, and communication works fine from then on.
When I restart rsyslogd on the server, I can see in a tcpdump capture that the server sends a FIN (flag [F]) to the client and the client acknowledges it. But when the client sends the next packet, it does not create a new connection.
restart rsyslog on server:
Server side tcpdump:
09:54:50.012933 IP x.x.x.101.514 > y.y.y.167.37141: Flags [F.], seq 1, ack 31, win 229, length 0
09:54:50.013050 IP y.y.y.167.37141 > x.x.x.101.514: Flags [.], ack 2, win 115, length 0
For the very next packet sent from the client, the server replies with an RST (flag [R]), but the client keeps retrying, 16 times:
tcpdump from server:
03:55:11.811611 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
03:55:11.811647 IP x.x.x.101.514 > y.y.y.167.37141: Flags [R], seq 1863584583, win 0, length 0
03:55:12.014158 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
03:55:12.014189 IP x.x.x.101.514 > y.y.y.167.37141: Flags [R], seq 1863584583, win 0, length 0
<this repeated 6 times on the server>
At the same time, on the client, we do not see the server's response arriving:
09:55:11.811077 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
09:55:12.013639 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
09:55:12.421627 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
<this retried 16 times>
Now, after the 16th retry (which took ~13 min), if we send a new packet it is sent correctly.
Here we can see a new session being created:
10:16:43.873325 IP y.y.y.167.39859 > x.x.x.101.514: Flags [S], seq 1000783963, win 14600, options [mss 1460,nop,wscale 7], length 0
10:16:43.873658 IP x.x.x.101.514 > y.y.y.167.39859: Flags [S.], seq 231452091, ack 1000783964, win 29200, options [mss 1460,nop,wscale 7], length 0
10:16:43.873740 IP y.y.y.167.39859 > x.x.x.101.514: Flags [.], ack 1, win 115, length 0
10:16:43.873904 IP y.y.y.167.39859 > x.x.x.101.514: Flags [P.], seq 1:31, ack 1, win 115, length 30
10:16:43.874084 IP x.x.x.101.514 > y.y.y.167.39859: Flags [.], ack 31, win 229, length 0
Has anyone faced such an issue? Can anyone tell me why the client does not close the connection when the server sends a FIN (flag [F])? Is there any configuration parameter in rsyslogd to create a new session when the server sends a FIN?
Why is the client still sending data after receiving the FIN and ACKing it?
TCP connection termination is a four-way handshake: once the client receives a FIN from the server, it acknowledges it and may still send any remaining data before sending its own FIN and waiting for the server's ACK to complete the handshake and fully close the connection.
The logs you have provided show that the connection was only half-closed when the server restarted (which it should not have done before the connection was fully closed), and that is why the client keeps sending its remaining data before completing the handshake.
What is the correct way to terminate abruptly?
When an endpoint needs to abruptly terminate a connection while data is still in flight, it should send an RST packet instead of a FIN.
Why is the RST packet sent by the server after the restart not seen on the client?
It may have been discarded because the connection was already half-closed by the FIN received earlier, or it may have been dropped by the client's firewall as a potential TCP reset attack.
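A quick way to see this from the client side is to inspect the socket state while the failures are happening (a sketch, assuming the standard syslog TCP port 514 and that iproute2's ss is available):
# Show the client's TCP session(s) towards the syslog server, including timers
# and the owning process; a session stuck in CLOSE-WAIT after the server
# restart is the stale connection rsyslogd keeps writing into:
ss -tanop '( dport = :514 )'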

How to debug "ERROR: Could not reach the worker node."?

I am trying to set up a gateway and a worker node on a BlueData 3.7 controller using SSH credentials.
I believe that I have met all the pre-requisites in the docs, however, I get an error trying to install the gateway and the worker in the Installation section of the controller UI:
The error I get for both hosts is:
ERROR: Could not reach the worker node.
The URLs for the errors are:
http://x.x.x.x/bdswebui/logworker/?id=/api/v1/workers/4
http://x.x.x.x/bdswebui/logworker/?id=/api/v1/workers/5
I have checked the logs on the gateway and the worker. Both show:
# tree /var/log/bluedata/
/var/log/bluedata/
└── install
1 directory, 0 files
All hosts can ssh to each other without a password prompt.
No firewall is running:
# iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
I've deleted the worker while listening in with tcpdump on the worker:
# tcpdump -i eth0 -ttttnnvvS src host x.x.x.x and tcp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
2019-08-17 00:08:41.570940 IP (tos 0x0, ttl 64, id 3977, offset 0, flags [DF], proto TCP (6), length 48)
x.x.x.x.39039 > y.y.y.y.22: Flags [S], cksum 0x6215 (correct), seq 1453535125, win 29200, options [mss 1460,nop,wscale 9], length 0
...
x.x.x.x.46064 > y.y.y.y.22: Flags [F.], cksum 0x564b (correct), seq 1997087540, ack 4031219947, win 238, length 0
2019-08-17 00:14:54.710739 IP (tos 0x0, ttl 64, id 15525, offset 0, flags [DF], proto TCP (6), length 40)
x.x.x.x.46064 > y.y.y.y.22: Flags [.], cksum 0x564a (correct), seq 1997087541, ack 4031219948, win 238, length 0
Checking port 46064 on the controller, I can see that the connection comes from a beam process, which gives me confidence that network connectivity between the two machines is OK:
# lsof -i -P -n | grep 46064
beam.smp 12714 root 16u IPv4 498735 0t0 TCP x.x.x.x:46064->y.y.y.y:22 (ESTABLISHED)
What else can I do to debug?
The debug information I needed could be found in the /tmp folder on the worker and gateway hosts, e.g.
/tmp/bd_prechecks.nnnnn.log
/tmp/bd_prechecks.nnnnn.log.xtrace
/tmp/bds-nnnnnnnnnnnnnn.log
/tmp/bds-nnnnnnnnnnnnnn.log.xtrace
/tmp/worker_setup_x.x.x.x-n-n-n-n-n-n
/tmp/worker_setup_x.x.x.x-n-n-n-n-n-n.xtrace
For more information, see http://docs.bluedata.com/37_step-1-troubleshooting
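In case it helps anyone else, this is roughly how I looked through them (the numeric parts of the file names are timestamps/PIDs, so globs are used here):
# List the BlueData install-time logs, newest first:
ls -lt /tmp/bd_prechecks.*.log* /tmp/bds-*.log* /tmp/worker_setup_* 2>/dev/null
# The .xtrace files appear to hold the shell trace (set -x output) of the setup
# scripts, which shows the command that actually failed:
tail -n 50 /tmp/bds-*.log.xtrace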

Kubelet periodically loses TCP connections with pods when doing liveness/readiness probe checks on GKE

We have a software system deployed on a single GKE (Google Kubernetes Engine) cluster node using around 100 pods. In each pod we defined a TCP readiness probe, and now we see the readiness probe periodically fail with "Unable to connect to remote host: Connection refused" on different pods.
From tcpdump traces on the cluster node and on the failing pods, we find that the packets sent from the cluster node look right, while the pod doesn't receive the TCP packets; the failing pod can still receive IP broadcast packets, though.
The weird thing is that if we ping/curl/wget the cluster node from the failing pod, regardless of whether the cluster node has an HTTP service or not, the TCP connection recovers immediately and the readiness check becomes fine again.
An example is as below:
The cluster node host: 10.44.0.1
The failing pod host: 10.44.0.92
tcpdump on cbr0 interface of the cluster node:
#sudo tcpdump -i cbr0 host 10.44.0.92
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:33:52.913052 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:33:52.913181 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:33:57.727497 IP 10.44.0.1.47736 > 10.44.0.92.mysql: Flags [S], seq 756717730, win 28400, options [mss 1420,sackOK,TS val 1084890021 ecr 0,nop,wscale 7], length 0
17:33:57.727537 IP 10.44.0.92.mysql > 10.44.0.1.47736: Flags [R.], seq 0, ack 756717731, win 0, length 0
17:34:07.727563 IP 10.44.0.1.48202 > 10.44.0.92.mysql: Flags [S], seq 2235831098, win 28400, options [mss 1420,sackOK,TS val 1084900021 ecr 0,nop,wscale 7], length 0
17:34:07.727618 IP 10.44.0.92.mysql > 10.44.0.1.48202: Flags [R.], seq 0, ack 2235831099, win 0, length 0
17:34:12.881059 ARP, Request who-has 10.44.0.92 tell 10.44.0.1, length 28
17:34:12.881176 ARP, Reply 10.44.0.92 is-at 0a:58:0a:2c:00:5c (oui Unknown), length 28
These are the readiness check packets sent from kubelet. We can see the failing pod responds with Flags [R.], seq 0, ack 756717731, win 0, length 0, which is a TCP RST/ACK rather than the expected SYN/ACK, so the handshake fails and the TCP connection is NOT established.
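The same check kubelet performs can be reproduced by hand from the node while a pod is in the failing state (a sketch; port 3306 is assumed from the mysql traffic in the capture, and nc needs to be available on the node):
# Plain TCP connect to the failing pod's mysql port with a 2-second timeout;
# while the pod is failing this is refused, mirroring the readiness probe:
nc -vz -w 2 10.44.0.92 3306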
Then, if we exec -it into the failing pod and ping the cluster node from the pod, as below:
root@mariadb:/# ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1): 56 data bytes
64 bytes from 10.44.0.1: icmp_seq=0 ttl=64 time=3.301 ms
64 bytes from 10.44.0.1: icmp_seq=1 ttl=64 time=0.303 ms
Then let's see what happens on the cluster node side in the tcpdump:
#sudo tcpdump -i cbr0 host 10.44.0.92
17:34:17.728039 IP 10.44.0.92.mysql > 10.44.0.1.48704: Flags [R.], seq 0, ack 2086181490, win 0, length 0
17:34:27.727638 IP 10.44.0.1.49202 > 10.44.0.92.mysql: Flags [S], seq 1769056007, win 28400, options [mss 1420,sackOK,TS val 1084920022 ecr 0,nop,wscale 7], length 0
17:34:27.727693 IP 10.44.0.92.mysql > 10.44.0.1.49202: Flags [R.], seq 0, ack 1769056008, win 0, length 0
17:34:34.016995 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:34:34.018358 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:34:34.020020 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 0, length 64
17:34:34.020101 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 0, length 64
17:34:35.017197 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 1, length 64
17:34:35.017256 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 1, length 64
17:34:36.018589 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 2, length 64
17:34:36.018700 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 2, length 64
17:34:37.019791 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 3, length 64
17:34:37.019837 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 3, length 64
17:34:37.730849 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [S], seq 1304758051, win 28400, options [mss 1420,sackOK,TS val 1084930025 ecr 0,nop,wscale 7], length 0
17:34:37.730900 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [S.], seq 1267340310, ack 1304758052, win 28160, options [mss 1420,sackOK,TS val 3617117819 ecr 1084930025,nop,wscale 7], length 0
17:34:37.730952 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [.], ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731149 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [F.], seq 1, ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731268 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [P.], seq 1:107, ack 2, win 220, options [nop,nop,TS val 3617117819 ecr 1084930025], length 106
17:34:37.731322 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [R], seq 1304758053, win 0, length 0
17:34:47.728119 IP 10.44.0.1.50138 > 10.44.0.92.mysql: Flags [S], seq 502800802, win 28400, options [mss 1420,sackOK,TS val 1084940022 ecr 0,nop,wscale 7], length 0
17:34:47.728179 IP 10.44.0.92.mysql > 10.44.0.1.50138: Flags [S.], seq 4294752326, ack 502800803, win 28160, options [mss 1420,sackOK,TS val 3617127816 ecr 1084940022,nop,wscale 7], length 0
We can see the ICMP packets from the ping command sent by the pod; right after the ICMP packets, the readiness check packets immediately look right and the TCP handshake succeeds.
Not only ping makes it work; other commands like curl/wget also do. We just need to reach the cluster node from the failing pod, and after that the TCP connections from the cluster node to the pod become correct again.
The failing pods change from time to time; it can happen to any pod. Since there are 100 pods running on the node, I'm not sure whether we are hitting some system limitation, but all the other pods work correctly, we don't see high CPU utilization, and there are still a few GB of memory left on the node.
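For completeness, this is the kind of check that can be run when a pod enters this state, to see whether its neighbor entry on cbr0 looks stale (just a guess based on the ARP traffic in the captures above, not a confirmed diagnosis):
# On the node: inspect the neighbor (ARP) entry for the failing pod on the bridge:
ip neigh show dev cbr0 | grep 10.44.0.92
# Inside the failing pod: inspect its entry for the node:
ip neigh show to 10.44.0.1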
Does anyone know what the issue could be?