How to debug "ERROR: Could not reach the worker node."? - bluedata-3.7

I am trying to set up a gateway and a worker node on a BlueData 3.7 controller using SSH credentials.
I believe that I have met all the prerequisites in the docs; however, I get an error when trying to install the gateway and the worker in the Installation section of the controller UI.
The error I get for both hosts is:
ERROR: Could not reach the worker node.
The URLs for the errors are:
http://x.x.x.x/bdswebui/logworker/?id=/api/v1/workers/4
http://x.x.x.x/bdswebui/logworker/?id=/api/v1/workers/5
I have checked the logs on the gateway and the worker. Both show:
# tree /var/log/bluedata/
/var/log/bluedata/
└── install
1 directory, 0 files
All hosts can ssh to each other without a password prompt.
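For reference, this can be re-checked non-interactively like the following sketch (hostnames are hypothetical; BatchMode makes ssh fail instead of prompting if key authentication is broken):
# run from the controller
ssh -o BatchMode=yes -o ConnectTimeout=5 root@gateway-host 'echo ok'
ssh -o BatchMode=yes -o ConnectTimeout=5 root@worker-host 'echo ok'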
No firewall is running:
# iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
I've deleted the worker while listening in with tcpdump on the worker:
# tcpdump -i eth0 -ttttnnvvS src host x.x.x.x and tcp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
2019-08-17 00:08:41.570940 IP (tos 0x0, ttl 64, id 3977, offset 0, flags [DF], proto TCP (6), length 48)
x.x.x.x.39039 > y.y.y.y.22: Flags [S], cksum 0x6215 (correct), seq 1453535125, win 29200, options [mss 1460,nop,wscale 9], length 0
...
x.x.x.x.46064 > y.y.y.y.22: Flags [F.], cksum 0x564b (correct), seq 1997087540, ack 4031219947, win 238, length 0
2019-08-17 00:14:54.710739 IP (tos 0x0, ttl 64, id 15525, offset 0, flags [DF], proto TCP (6), length 40)
x.x.x.x.46064 > y.y.y.y.22: Flags [.], cksum 0x564a (correct), seq 1997087541, ack 4031219948, win 238, length 0
Checking port 46064 on the controller, I can see that the connection is coming from a beam process, which gives me confidence that network connectivity between the two machines is OK:
# lsof -i -P -n | grep 46064
beam.smp 12714 root 16u IPv4 498735 0t0 TCP x.x.x.x:46064->y.y.y.y:22 (ESTABLISHED)
What else can I do to debug?

The debug information I needed could be found in the /tmp folder on the worker and gateway hosts, e.g.
/tmp/bd_prechecks.nnnnn.log
/tmp/bd_prechecks.nnnnn.log.xtrace
/tmp/bds-nnnnnnnnnnnnnn.log
/tmp/bds-nnnnnnnnnnnnnn.log.xtrace
/tmp/worker_setup_x.x.x.x-n-n-n-n-n-n
/tmp/worker_setup_x.x.x.x-n-n-n-n-n-n.xtrace
For more information, see http://docs.bluedata.com/37_step-1-troubleshooting
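As a quick sketch, the most recent of these files can be listed and inspected on each host like this (the glob patterns simply match the file names above; adjust as needed):
# newest BlueData pre-check / setup logs first
ls -lt /tmp/bd_prechecks.*.log* /tmp/bds-*.log* /tmp/worker_setup_* 2>/dev/null | head
# then inspect the newest pre-check log, for example:
tail -n 100 $(ls -t /tmp/bd_prechecks.*.log 2>/dev/null | head -1)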

Related

In a WebSocket connection how to determine who is terminating socket first? Client or Server? [duplicate]

I am trying to understand TIME_WAIT and CLOSE_WAIT.
I have opened a socket connection via Chrome console, which connects to my Java WebSocket server running locally - then closed it:
var webSocket = new WebSocket('ws://127.0.0.1:1234/support');
webSocket.close();
What I expect to see when I run sudo ss -apn|grep ':1234' is CLOSE_WAIT, since it is the client who closed the connection and should be the first to send a FIN packet, which would have changed the server socket status to CLOSE_WAIT; however, I see something different.
Tuesday 17 May 2022 06:23:21 PM IST
tcp6 0 0 :::1234 :::* LISTEN 86008/java
tcp6 0 0 127.0.0.1:1234 127.0.0.1:60672 TIME_WAIT -
Can somebody please explain what is happening?
Update 1:
masood@masood-ThinkPad-L450:~/Desktop/code/current/chat_server11$ sudo tcpdump -i lo port 1234
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
21:39:58.376375 IP6 ip6-localhost.34196 > ip6-localhost.1234: Flags [S], seq 969747395, win 65476, options [mss 65476,sackOK,TS val 3310041060 ecr 0,nop,wscale 7], length 0
21:39:58.376393 IP6 ip6-localhost.1234 > ip6-localhost.34196: Flags [S.], seq 630365028, ack 969747396, win 65464, options [mss 65476,sackOK,TS val 3310041060 ecr 3310041060,nop,wscale 7], length 0
21:39:58.376410 IP6 ip6-localhost.34196 > ip6-localhost.1234: Flags [.], ack 1, win 512, options [nop,nop,TS val 3310041060 ecr 3310041060], length 0
21:39:58.378183 IP6 ip6-localhost.34196 > ip6-localhost.1234: Flags [P.], seq 1:535, ack 1, win 512, options [nop,nop,TS val 3310041062 ecr 3310041060], length 534
21:39:58.378197 IP6 ip6-localhost.1234 > ip6-localhost.34196: Flags [.], ack 535, win 508, options [nop,nop,TS val 3310041062 ecr 3310041062], length 0
21:39:58.398870 IP6 ip6-localhost.1234 > ip6-localhost.34196: Flags [P.], seq 1:130, ack 535, win 512, options [nop,nop,TS val 3310041083 ecr 3310041062], length 129
21:39:58.398890 IP6 ip6-localhost.34196 > ip6-localhost.1234: Flags [.], ack 130, win 511, options [nop,nop,TS val 3310041083 ecr 3310041083], length 0
21:40:08.678597 IP6 ip6-localhost.34196 > ip6-localhost.1234: Flags [P.], seq 535:541, ack 130, win 512, options [nop,nop,TS val 3310051362 ecr 3310041083], length 6
21:40:08.678618 IP6 ip6-localhost.1234 > ip6-localhost.34196: Flags [.], ack 541, win 512, options [nop,nop,TS val 3310051362 ecr 3310051362], length 0
21:40:08.679275 IP6 ip6-localhost.1234 > ip6-localhost.34196: Flags [P.], seq 130:132, ack 541, win 512, options [nop,nop,TS val 3310051363 ecr 3310051362], length 2
21:40:08.679293 IP6 ip6-localhost.34196 > ip6-localhost.1234: Flags [.], ack 132, win 512, options [nop,nop,TS val 3310051363 ecr 3310051363], length 0
21:40:08.679438 IP6 ip6-localhost.1234 > ip6-localhost.34196: Flags [F.], seq 132, ack 541, win 512, options [nop,nop,TS val 3310051363 ecr 3310051363], length 0
21:40:08.679487 IP6 ip6-localhost.34196 > ip6-localhost.1234: Flags [F.], seq 541, ack 133, win 512, options [nop,nop,TS val 3310051363 ecr 3310051363], length 0
21:40:08.679506 IP6 ip6-localhost.1234 > ip6-localhost.34196: Flags [.], ack 542, win 512, options [nop,nop,TS val 3310051363 ecr 3310051363], length 0
I tried to capture packets using tcpdump, and I can clearly see that the server sends FIN+ACK directly without receiving a FIN. I am totally confused.
since it is the client who closed the connection and should be the first to send a FIN packet
That is an incorrect assumption. When the client calls close() on a fully established WebSocket connection, close() will perform a handshake that causes the server to close the TCP connection first (send the initial FIN), rather than the client.
This is described in the WebSocket protocol spec, RFC 6455, Section 7, "Closing the Connection":
7.1.1. Close the WebSocket Connection
To Close the WebSocket Connection, an endpoint closes the underlying TCP connection. An endpoint SHOULD use a method that cleanly closes the TCP connection, as well as the TLS session, if applicable, discarding any trailing bytes that may have been received. An endpoint MAY close the connection via any means available when necessary, such as when under attack.
The underlying TCP connection, in most normal cases, SHOULD be closed first by the server, so that it holds the TIME_WAIT state and not the client (as this would prevent it from re-opening the connection for 2 maximum segment lifetimes (2MSL), while there is no corresponding server impact as a TIME_WAIT connection is immediately reopened upon a new SYN with a higher seq number). In abnormal cases (such as not having received a TCP Close from the server after a reasonable amount of time) a client MAY initiate the TCP Close. As such, when a server is instructed to Close the WebSocket Connection it SHOULD initiate a TCP Close immediately, and when a client is instructed to do the same, it SHOULD wait for a TCP Close from the server.
As an example of how to obtain a clean closure in C using Berkeley sockets, one would call shutdown() with SHUT_WR on the socket, call recv() until obtaining a return value of 0 indicating that the peer has also performed an orderly shutdown, and finally call close() on the socket.
7.1.2. Start the WebSocket Closing Handshake
To Start the WebSocket Closing Handshake with a status code (Section 7.4) /code/ and an optional close reason (Section 7.1.6) /reason/, an endpoint MUST send a Close control frame, as described in Section 5.5.1, whose status code is set to /code/ and whose close reason is set to /reason/. Once an endpoint has both sent and received a Close control frame, that endpoint SHOULD Close the WebSocket Connection as defined in Section 7.1.1.
This is also reflected in the WebSockets standard's close() method:
Run the first matching steps from the following list:
If this's ready state is CLOSING (2) or CLOSED (3)
...
If the WebSocket connection is not yet established [WSP]
...
If the WebSocket closing handshake has not yet been started [WSP]
...
Otherwise
Set this's ready state to CLOSING (2).
NOTE: The WebSocket closing handshake is started, and will eventually invoke the close the WebSocket connection algorithm, which will establish that the WebSocket connection is closed, and thus the close event will fire, as described below.
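To see this in practice on the machine where you ran ss, you can filter for TIME_WAIT sockets on the server port (port 1234 taken from the example above):
# sockets in TIME_WAIT involving port 1234; the side whose *local* address
# is 127.0.0.1:1234 is the server, i.e. the side that sent the first FIN
sudo ss -tan state time-wait '( sport = :1234 or dport = :1234 )'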

DNS in K8S pod: tcpdump shows bad udp checksum but nslookup still works and UDP error counters don't increment

I have a K8S pod. Inside the pod, I do a DNS lookup using nslookup, and it works fine. But when I run tcpdump on the pod interface (eth0), it clearly shows that the received DNS response has a bad UDP checksum. I checked the UDP counters with netstat, but I don't see the checksum error counter (InCsumErrors) getting hit at all. Here are some relevant outputs.
IP config of pod:
root@node:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if10936: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether e2:22:5c:6c:53:bd brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.233.85.177/32 scope global eth0
valid_lft forever preferred_lft forever
Successful nslookup:
bash-4.4# nslookup google.com
Server: 169.254.25.10
Address: 169.254.25.10#53
Non-authoritative answer:
Name: google.com
Address: 216.58.207.238
Name: google.com
Address: 2a00:1450:400e:809::200e
Tcpdump showing bad udp cksum for above nslookup run:
root@node:~# tcpdump -ni eth0 -vvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:02:24.267999 IP (tos 0x0, ttl 64, id 50356, offset 0, flags [none], proto UDP (17), length 82)
10.233.85.177.52764 > 169.254.25.10.53: [bad udp cksum 0x23f2 -> 0xd1bd!] 43806+ A? google.com.qaammuk.svc.cluster.local. (54)
16:02:24.269489 IP (tos 0x0, ttl 64, id 56987, offset 0, flags [DF], proto UDP (17), length 175)
169.254.25.10.53 > 10.233.85.177.52764: [bad udp cksum 0x244f -> 0x2c2a!] 43806 NXDomain*- q: A? google.com.qaammuk.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (147)
16:02:24.269847 IP (tos 0x0, ttl 64, id 50357, offset 0, flags [none], proto UDP (17), length 74)
10.233.85.177.39433 > 169.254.25.10.53: [bad udp cksum 0x23ea -> 0xac65!] 45029+ A? google.com.svc.cluster.local. (46)
16:02:24.270901 IP (tos 0x0, ttl 64, id 56988, offset 0, flags [DF], proto UDP (17), length 167)
169.254.25.10.53 > 10.233.85.177.39433: [bad udp cksum 0x2447 -> 0x06d2!] 45029 NXDomain*- q: A? google.com.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (139)
16:02:24.271206 IP (tos 0x0, ttl 64, id 50358, offset 0, flags [none], proto UDP (17), length 70)
10.233.85.177.59330 > 169.254.25.10.53: [bad udp cksum 0x23e6 -> 0xdaca!] 2633+ A? google.com.cluster.local. (42)
16:02:24.272262 IP (tos 0x0, ttl 64, id 56989, offset 0, flags [DF], proto UDP (17), length 163)
169.254.25.10.53 > 10.233.85.177.59330: [bad udp cksum 0x2443 -> 0x3537!] 2633 NXDomain*- q: A? google.com.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1609862082 7200 1800 86400 5 (135)
16:02:24.272527 IP (tos 0x0, ttl 64, id 50359, offset 0, flags [none], proto UDP (17), length 56)
10.233.85.177.53873 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x278c!] 52759+ A? google.com. (28)
16:02:24.272707 IP (tos 0x0, ttl 64, id 56990, offset 0, flags [DF], proto UDP (17), length 82)
169.254.25.10.53 > 10.233.85.177.53873: [bad udp cksum 0x23f2 -> 0xe468!] 52759* q: A? google.com. 1/0/0 google.com. [8s] A 216.58.211.110 (54)
16:02:24.272963 IP (tos 0x0, ttl 64, id 50360, offset 0, flags [none], proto UDP (17), length 56)
10.233.85.177.54691 > 169.254.25.10.53: [bad udp cksum 0x23d8 -> 0x370f!] 47943+ AAAA? google.com. (28)
16:02:24.273141 IP (tos 0x0, ttl 64, id 56991, offset 0, flags [DF], proto UDP (17), length 94)
169.254.25.10.53 > 10.233.85.177.54691: [bad udp cksum 0x23fe -> 0xf8e0!] 47943* q: AAAA? google.com. 1/0/0 google.com. [8s] AAAA 2a00:1450:400e:809::200e (66)
netstat output to show udp counters from linux stack. No InCsumErrors:
root@node:~# netstat -s -u
Udp:
18 packets received
0 packets to unknown port received
0 packet receive errors
18 packets sent
0 receive buffer errors
0 send buffer errors
UdpLite:
IpExt:
InOctets: 2130
OutOctets: 1101
InNoECTPkts: 18
I tried with checksum offload both enabled and disabled on eth0. Same behavior in both cases.
Shouldn't a bad UDP checksum detected by tcpdump mean that the kernel will at some point drop the UDP packets before handing them over to the socket bound by nslookup?
When you do nslookup google.com 8.8.8.8, everything looks fine. I think this is because you are using CoreDNS to resolve the domains, so the packets run through a Service.
A Service in k8s is a virtual entity; it appears as forwarding rules in iptables. During the forwarding process, the source IP address is swapped out without recalculating the checksum, hence the error in tcpdump.
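As an illustration, assuming kube-proxy runs in iptables mode, these Service forwarding rules can be viewed on the node like this (the KUBE-SERVICES chain name is the kube-proxy default, not something specific to this cluster):
# Service virtual-IP dispatch rules installed by kube-proxy
sudo iptables -t nat -L KUBE-SERVICES -n | head
# or dump everything kube-proxy programmed into the nat table
sudo iptables-save -t nat | grep KUBE- | head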
And according to RFC 768, the UDP checksum is defined as follows:
Checksum is the 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data
So as you can see, the pseudo header built from information in the IP header is part of the checksum, and so is the source IP address that gets swapped out, which is what changes the checksum.
Calculating the checksum is usually offloaded to the NIC hardware when packets are sent from or received by a node. It would require a lot of computation to recompute the checksums of all packets going through iptables, and it is also unnecessary: once a packet is received on the node's network interface and confirmed valid, you can be sure it stays valid within the node even after iptables forwards it.
Does K8S set up some rules for the Linux kernel to ignore bad UDP checksums for pod interfaces?
I know that, for example, the loopback interface does not checksum packets (at least not by default). Maybe bridge interfaces (e.g. docker0 and veth*) also don't checksum. I tried to find strong evidence for this statement, but I didn't find anything to either prove or disprove it.
try:
ethtool --offload eth0 rx off tx off
ethtool -K eth0 gso off
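If you want to see what the interface currently has enabled before and after running those, something like this sketch should show it (interface name taken from the question):
# list checksum- and segmentation-offload settings for eth0
ethtool -k eth0 | grep -iE 'checksum|offload'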

rsyslogd client not closing the TCP connection when server rsyslogd goes down

I have configured rsyslogd on a client machine to send its logs to a remote server running rsyslogd, using the TCP protocol. After configuring and restarting the rsyslogd daemon on both the client and the server, I am able to send packets to the server and everything works fine. But later, when I restart rsyslogd on the server, the client keeps sending packets over the old TCP connection. The client then retries 16 times and fails to send the packet. On the retry for the next packet, the client creates a new connection and communication works fine from then on.
When I restart rsyslogd on the server, I captured the packets with tcpdump and we can see that the server sends flag [F] to the client and the client acknowledges it. But when the next packet is sent, the client does not create a new connection.
restart rsyslog on server:
Server side tcpdump:
09:54:50.012933 IP x.x.x.101.514 > y.y.y.167.37141: Flags [F.], seq 1, ack 31, win 229, length 0
09:54:50.013050 IP y.y.y.167.37141 > x.x.x.101.514: Flags [.], ack 2, win 115, length 0
For the very next packet sent from the client, the server sends flag [R], but the client keeps retrying 16 times:
tcpdump from server:
03:55:11.811611 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
03:55:11.811647 IP x.x.x.101.514 > y.y.y.167.37141: Flags [R], seq 1863584583, win 0, length 0
03:55:12.014158 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
03:55:12.014189 IP x.x.x.101.514 > y.y.y.167.37141: Flags [R], seq 1863584583, win 0, length 0
<this repeated 6 times on server>
At the same time, on the client we do not see the response from the server arriving:
09:55:11.811077 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
09:55:12.013639 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
09:55:12.421627 IP y.y.y.167.37141 > x.x.x.101.514: Flags [P.], seq 31:61, ack 2, win 115, length 30
<this retried 16 times>
Now, after the 16 retries (which took ~13 min), if we send a new packet it is sent correctly.
Here we see a new session is getting created:
10:16:43.873325 IP y.y.y.167.39859 > x.x.x.101.514: Flags [S], seq 1000783963, win 14600, options [mss 1460,nop,wscale 7], length 0
10:16:43.873658 IP x.x.x.101.514 > y.y.y.167.39859: Flags [S.], seq 231452091, ack 1000783964, win 29200, options [mss 1460,nop,wscale 7], length 0
10:16:43.873740 IP y.y.y.167.39859 > x.x.x.101.514: Flags [.], ack 1, win 115, length 0
10:16:43.873904 IP y.y.y.167.39859 > x.x.x.101.514: Flags [P.], seq 1:31, ack 1, win 115, length 30
10:16:43.874084 IP x.x.x.101.514 > y.y.y.167.39859: Flags [.], ack 31, win 229, length 0
Has anyone faced such an issue? Can anyone tell me why the client does not close the connection when the server sends flag [F]? Is there any configuration parameter in rsyslogd to create a new session when the server sends flag [F]?
Why is the client sending data after receiving the FIN and ACKing it?
TCP connection termination is a 4-way handshake, which means that once a client receives a FIN from the server, it acknowledges it and may still send its remaining data to the server before sending its own FIN and waiting for the server's ACK to complete the handshake and fully close the connection.
The logs you have provided show that the connection was half-open when the server restarted (which it should not have done before the connection was fully closed), and that is why the client is sending its remaining data before completing the handshake.
What is the correct way to terminate abruptly?
When an endpoint is about to abruptly terminate a connection while some data is still in transit, it should send an RST packet instead of a FIN.
Why is the RST packet sent by the server after the restart not received by the client?
It may have been discarded because the connection was already half-open after the FIN received earlier, or it may have been dropped by the client firewall as a potential TCP reset attack.
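A quick way to watch this from the client side is to check the state of the rsyslog TCP session around the server restart (port 514 taken from the captures above; -p needs root to show the owning process):
# on the client: show rsyslog's connection(s) to the server and their TCP state
sudo ss -tanp '( dport = :514 )'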

Kubelet periodically lose TCP connection with pods when doing liveness/readiness probe check on GKE

We have a software system deployed on a single GKE (Google Kubernetes Engine) cluster node using around 100 pods. In each pod we defined a TCP readiness probe, and now we can see the readiness probe periodically failing with "Unable to connect to remote host: Connection refused" on different pods.
From the tcpdump traces on the cluster node and the failing pods, we find that the packets sent from the cluster node look right, while the pod doesn't receive the TCP packets; the failing pod can still receive IP broadcast packets, though.
The weird thing is that if we ping/curl/wget the cluster node from the failing pod, regardless of whether the cluster node has an HTTP service or not, the TCP connection recovers immediately and the readiness check becomes fine.
An example is as below:
The cluster node host: 10.44.0.1
The failing pod host: 10.44.0.92
tcpdump on cbr0 interface of the cluster node:
#sudo tcpdump -i cbr0 host 10.44.0.92
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:33:52.913052 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:33:52.913181 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:33:57.727497 IP 10.44.0.1.47736 > 10.44.0.92.mysql: Flags [S], seq 756717730, win 28400, options [mss 1420,sackOK,TS val 1084890021 ecr 0,nop,wscale 7], length 0
17:33:57.727537 IP 10.44.0.92.mysql > 10.44.0.1.47736: Flags [R.], seq 0, ack 756717731, win 0, length 0
17:34:07.727563 IP 10.44.0.1.48202 > 10.44.0.92.mysql: Flags [S], seq 2235831098, win 28400, options [mss 1420,sackOK,TS val 1084900021 ecr 0,nop,wscale 7], length 0
17:34:07.727618 IP 10.44.0.92.mysql > 10.44.0.1.48202: Flags [R.], seq 0, ack 2235831099, win 0, length 0
17:34:12.881059 ARP, Request who-has 10.44.0.92 tell 10.44.0.1, length 28
17:34:12.881176 ARP, Reply 10.44.0.92 is-at 0a:58:0a:2c:00:5c (oui Unknown), length 28
These are readiness-check packets sent from the kubelet. We can see that the failing pod responds with Flags [R.], seq 0, ack 756717731, win 0, length 0, which is a TCP RST/ACK rather than the expected SYN/ACK handshake reply, so the TCP connection will NOT be established.
Then, if we exec -it into the failing pod and ping the cluster node from the pod as below:
root@mariadb:/# ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1): 56 data bytes
64 bytes from 10.44.0.1: icmp_seq=0 ttl=64 time=3.301 ms
64 bytes from 10.44.0.1: icmp_seq=1 ttl=64 time=0.303 ms
Then let's see what's happening on the cluster node side in the tcpdump:
#sudo tcpdump -i cbr0 host 10.44.0.92
17:34:17.728039 IP 10.44.0.92.mysql > 10.44.0.1.48704: Flags [R.], seq 0, ack 2086181490, win 0, length 0
17:34:27.727638 IP 10.44.0.1.49202 > 10.44.0.92.mysql: Flags [S], seq 1769056007, win 28400, options [mss 1420,sackOK,TS val 1084920022 ecr 0,nop,wscale 7], length 0
17:34:27.727693 IP 10.44.0.92.mysql > 10.44.0.1.49202: Flags [R.], seq 0, ack 1769056008, win 0, length 0
17:34:34.016995 ARP, Request who-has 10.44.0.1 tell 10.44.0.92, length 28
17:34:34.018358 ARP, Reply 10.44.0.1 is-at 0a:58:0a:2c:00:01 (oui Unknown), length 28
17:34:34.020020 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 0, length 64
17:34:34.020101 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 0, length 64
17:34:35.017197 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 1, length 64
17:34:35.017256 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 1, length 64
17:34:36.018589 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 2, length 64
17:34:36.018700 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 2, length 64
17:34:37.019791 IP 10.44.0.92 > 10.44.0.1: ICMP echo request, id 53, seq 3, length 64
17:34:37.019837 IP 10.44.0.1 > 10.44.0.92: ICMP echo reply, id 53, seq 3, length 64
17:34:37.730849 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [S], seq 1304758051, win 28400, options [mss 1420,sackOK,TS val 1084930025 ecr 0,nop,wscale 7], length 0
17:34:37.730900 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [S.], seq 1267340310, ack 1304758052, win 28160, options [mss 1420,sackOK,TS val 3617117819 ecr 1084930025,nop,wscale 7], length 0
17:34:37.730952 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [.], ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731149 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [F.], seq 1, ack 1, win 222, options [nop,nop,TS val 1084930025 ecr 3617117819], length 0
17:34:37.731268 IP 10.44.0.92.mysql > 10.44.0.1.49666: Flags [P.], seq 1:107, ack 2, win 220, options [nop,nop,TS val 3617117819 ecr 1084930025], length 106
17:34:37.731322 IP 10.44.0.1.49666 > 10.44.0.92.mysql: Flags [R], seq 1304758053, win 0, length 0
17:34:47.728119 IP 10.44.0.1.50138 > 10.44.0.92.mysql: Flags [S], seq 502800802, win 28400, options [mss 1420,sackOK,TS val 1084940022 ecr 0,nop,wscale 7], length 0
17:34:47.728179 IP 10.44.0.92.mysql > 10.44.0.1.50138: Flags [S.], seq 4294752326, ack 502800803, win 28160, options [mss 1420,sackOK,TS val 3617127816 ecr 1084940022,nop,wscale 7], length 0
We can see that the ICMP packets are those from the ping command sent from the pod; right after the ICMP packets, the readiness-check packets become right and the TCP handshake succeeds.
Not only ping makes it work; other commands like curl/wget also work. We just need to reach the cluster node from the failing pod, and after that the TCP connections from the cluster node to the pod become correct, as in the sketch below.
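For example, the behaviour described above can be reproduced by hand with something like this (addresses and the mysql port come from the dumps above; the pod name and the availability of nc on the node are assumptions):
# on the cluster node: the same kind of TCP check the kubelet readiness probe performs
nc -zv -w 2 10.44.0.92 3306
# from the failing pod: any traffic towards the node "unsticks" the connection
kubectl exec -it mariadb -- ping -c 3 10.44.0.1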
The failing pods change from time to time and it could happen to any pod. Since there are 100 pods up and running on the node, we are not sure whether this hits some system limitation; however, everything else works correctly, we don't see huge CPU utilization, and there are still a few GB of memory left on the node.
Does anyone know what the issue could be?

Why mongodb refuses ssl connections?

I am running a mongodb 3.2 instance on a vServer behind a firewall (which I am not allowed to configure). Mongo is reachable (and connectable) from anywhere (bind_ip: 0.0.0.0) if ssl is turned off in /etc/mongod.conf
Further, I generated a CA, a server.pem and a client.pem (to connect via the mongo shell). These certs are working fine, since I can connect to mongod from the machine mongod is running on:
$ mongo --host localhost --ssl --sslPEMKeyFile client.pem --sslCAFile ca.crt
BUT: when I try to connect from another machine with the same Certs it won't connect:
$ mongo --host mongo1.mydomain.net --ssl --sslPEMKeyFile client.pem --sslCAFile ca.crt
MongoDB shell version: 3.2.9
connecting to: <ip>:27017/test
2016-08-22T22:29:17.632+0200 W NETWORK [thread1] Failed to connect to <ip>:27017 after 5000 milliseconds, giving up.
2016-08-22T22:29:17.633+0200 E QUERY [thread1] Error: couldn't connect to server <ip>:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:231:14
@(connect):1:6
This is strange because /var/log/mongodb/mongod.log says that mongod is listening on port 27017 for ssl connections (netstat says the same):
2016-08-22T21:09:10.182+0200 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/var/lib/mongodb/diagnostic.data'
2016-08-22T21:09:10.182+0200 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2016-08-22T21:09:10.183+0200 I NETWORK [initandlisten] waiting for connections on port 27017 ssl
The ssl connection attempt from the other machine is not listed in the log file which is even more strange.
So I asked tcpdump:
23:10:32.984067 IP (tos 0x0, ttl 51, id 64132, offset 0, flags [DF], proto TCP (6), length 60)
<other_machine>.39644 > 172.12.51.23.27017: Flags [S], cksum 0xd3d0 (correct), seq 1809185188, win 29200, options [mss 1420,sackOK,TS val 7275296 ecr 0,nop,wscale 7], length 0
23:10:32.984112 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
172.12.51.23.27017 > <other_machine>.39644: Flags [S.], cksum 0x9506 (incorrect -> 0x14cf), seq 2653469724, ack 1809185189, win 28960, options [mss 1460,sackOK,TS val 93151206 ecr 7275296,nop,wscale 7], length 0
23:10:33.041545 IP (tos 0x0, ttl 51, id 64133, offset 0, flags [DF], proto TCP (6), length 52)
<other_machine>.39644 > 172.12.51.23.27017: Flags [.], cksum 0xb3c5 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 7275313 ecr 93151206], length 0
23:10:33.047713 IP (tos 0x0, ttl 63, id 49309, offset 0, flags [none], proto TCP (6), length 40)
<other_machine>.39644 > 172.12.51.23.27017: Flags [R.], cksum 0x55ec (correct), seq 1, ack 1, win 229, length 0
The first reply from the mongo server always has an invalid checksum, but I really don't know whether this is important or not. Actually, I have no clue how to solve this and I am just hoping that someone can help me with a hint.
cheers,
dymat
The hostname in the certificate might be different from the one used in the mongo command. Try adding the --sslAllowInvalidHostnames param at the end of the mongo command.
mongo --host mongo1.mydomain.net --ssl --sslPEMKeyFile client.pem --sslCAFile ca.crt --sslAllowInvalidHostnames
See the mongodb connection options at https://docs.mongodb.com/manual/reference/program/mongotop/#cmdoption--sslAllowInvalidHostnames
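If that doesn't help, one way to separate certificate problems from plain connectivity/firewall problems is to test the TLS handshake directly from the other machine (hostname and CA file taken from the question):
# if this hangs or is reset before the certificate exchange, the problem is
# at the network/firewall level rather than with the certificates
openssl s_client -connect mongo1.mydomain.net:27017 -CAfile ca.crt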
Thanks Mehmet!
This was my first idea, too. But after talking to the sysadmin, it turned out that they had installed an application firewall which classified SSL requests on port 27017 as not belonging to the mongodb server. The firewall expected unencrypted traffic on port 27017.
Once they reconfigured the firewall, everything worked as it is supposed to.
Bye,
dymat