TCP transactions taking progressively longer - sockets

I have a piece of equipment I'm trying to talk to over TCP. It's quite simple, and the following (python) code is pretty much exactly what I want to do in practice.
What should happen is that I send a packet requesting the device change a setting, then receive an "operation complete" packet in response (which is just '1').
I've implemented the code below in C with the same result, so I'm confident it isn't a Python problem.
import numpy as np
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Set some values to send to the device
values = np.arange(0.2, 1.0, 0.02)

sock.connect(('192.168.1.147', 9221))

for value in values:
    # This string sets a value, then requests an operation complete output.
    sock.sendall(('I1 %1.3f; *OPC?' % value).encode('utf-8'))
    print(sock.recv(32))  # Prints out '1', or b'1\r\n' in Python 3
What I actually experience is that the output is as expected, but the delay between sending and receiving grows progressively longer until I have to kill the program.
The resultant wireshark output is interesting:
No. Time Source Destination Protocol Length Info
4 2.970271597 192.168.1.106 192.168.1.147 TCP 74 49938 → 9221 [SYN] Seq=0 Win=26880 Len=0 MSS=8960 SACK_PERM=1 TSval=26446953 TSecr=0 WS=128
5 2.971102415 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460
6 2.971118924 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [ACK] Seq=1 Ack=1 Win=26880 Len=0
7 2.971152591 192.168.1.106 192.168.1.147 TCP 69 49938 → 9221 [PSH, ACK] Seq=1 Ack=1 Win=26880 Len=15
8 2.977589098 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=3
9 2.977597172 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [ACK] Seq=16 Ack=4 Win=26880 Len=0
10 2.977948459 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [ACK] Seq=1 Ack=16 Win=5840 Len=0
11 2.977952056 192.168.1.106 192.168.1.147 TCP 54 [TCP Dup ACK 9#1] 49938 → 9221 [ACK] Seq=16 Ack=4 Win=26880 Len=0
12 3.167593066 192.168.1.106 192.168.1.147 TCP 69 [TCP Spurious Retransmission] 49938 → 9221 [PSH, ACK] Seq=1 Ack=4 Win=26880 Len=15
13 3.168475846 192.168.1.147 192.168.1.106 TCP 60 [TCP Dup ACK 10#1] 9221 → 49938 [ACK] Seq=4 Ack=16 Win=5840 Len=0
14 3.168487149 192.168.1.106 192.168.1.147 TCP 69 49938 → 9221 [PSH, ACK] Seq=16 Ack=4 Win=26880 Len=15
15 3.174457755 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [PSH, ACK] Seq=4 Ack=16 Win=5840 Len=3
16 3.174481722 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [ACK] Seq=31 Ack=7 Win=26880 Len=0
17 3.174817948 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [ACK] Seq=4 Ack=31 Win=5840 Len=0
18 3.567587105 192.168.1.106 192.168.1.147 TCP 69 [TCP Spurious Retransmission] 49938 → 9221 [PSH, ACK] Seq=16 Ack=7 Win=26880 Len=15
19 3.568540028 192.168.1.147 192.168.1.106 TCP 60 [TCP Dup ACK 17#1] 9221 → 49938 [ACK] Seq=7 Ack=31 Win=5840 Len=0
20 3.568551611 192.168.1.106 192.168.1.147 TCP 69 49938 → 9221 [PSH, ACK] Seq=31 Ack=7 Win=26880 Len=15
21 3.574509787 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [PSH, ACK] Seq=7 Ack=31 Win=5840 Len=3
22 3.574533527 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [ACK] Seq=46 Ack=10 Win=26880 Len=0
23 3.574857577 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [ACK] Seq=7 Ack=46 Win=5840 Len=0
24 3.574870866 192.168.1.106 192.168.1.147 TCP 54 [TCP Dup ACK 22#1] 49938 → 9221 [ACK] Seq=46 Ack=10 Win=26880 Len=0
25 4.367591502 192.168.1.106 192.168.1.147 TCP 69 [TCP Spurious Retransmission] 49938 → 9221 [PSH, ACK] Seq=31 Ack=10 Win=26880 Len=15
26 4.368487116 192.168.1.147 192.168.1.106 TCP 60 [TCP Dup ACK 23#1] 9221 → 49938 [ACK] Seq=10 Ack=46 Win=5840 Len=0
27 4.368498284 192.168.1.106 192.168.1.147 TCP 69 49938 → 9221 [PSH, ACK] Seq=46 Ack=10 Win=26880 Len=15
28 4.374526599 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [PSH, ACK] Seq=10 Ack=46 Win=5840 Len=3
29 4.374558188 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [ACK] Seq=61 Ack=13 Win=26880 Len=0
30 4.374881659 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [ACK] Seq=10 Ack=61 Win=5840 Len=0
31 4.374896303 192.168.1.106 192.168.1.147 TCP 54 [TCP Dup ACK 29#1] 49938 → 9221 [ACK] Seq=61 Ack=13 Win=26880 Len=0
32 5.971603454 192.168.1.106 192.168.1.147 TCP 69 [TCP Spurious Retransmission] 49938 → 9221 [PSH, ACK] Seq=46 Ack=13 Win=26880 Len=15
33 5.972478351 192.168.1.147 192.168.1.106 TCP 60 [TCP Dup ACK 30#1] 9221 → 49938 [ACK] Seq=13 Ack=61 Win=5840 Len=0
34 5.972490012 192.168.1.106 192.168.1.147 TCP 69 49938 → 9221 [PSH, ACK] Seq=61 Ack=13 Win=26880 Len=15
35 5.978397699 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [PSH, ACK] Seq=13 Ack=61 Win=5840 Len=3
36 5.978418493 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [ACK] Seq=76 Ack=16 Win=26880 Len=0
37 5.978754841 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [ACK] Seq=13 Ack=76 Win=5840 Len=0
38 5.978768499 192.168.1.106 192.168.1.147 TCP 54 [TCP Dup ACK 36#1] 49938 → 9221 [ACK] Seq=76 Ack=16 Win=26880 Len=0
43 7.134413907 192.168.1.106 192.168.1.147 TCP 69 49938 → 9221 [FIN, PSH, ACK] Seq=76 Ack=16 Win=26880 Len=15
44 7.140478879 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [PSH, ACK] Seq=16 Ack=76 Win=5840 Len=3
45 7.140515226 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [RST] Seq=76 Win=0 Len=0
46 7.140822269 192.168.1.147 192.168.1.106 TCP 60 9221 → 49938 [ACK] Seq=16 Ack=91 Win=5840 Len=0
47 7.140838249 192.168.1.106 192.168.1.147 TCP 54 49938 → 9221 [RST] Seq=91 Win=0 Len=0
(At which point I killed it)
What it suggests to me is that the ACK is being sent but ignored for some reason (malformed in some way?). The original packet is then sent again with progressively longer delays (which I suspect is the correct behaviour for lost packets), which results in the transmission buffer backing up so that the new packet is not sent. Wireshark seems to think the resends are unnecessary, as shown by the "TCP Spurious Retransmission" labels.
The device is responding to the packet as soon as it is sent the first time, which I can see from its display.
It's possible there is some protocol incorrectness going on, but I don't know how to diagnose the problem. I'm happy with a workaround for this - it's not in production at this stage.
(FYI, the device is a TTi power supply).

So it turns out the problem was a firmware issue on the hardware. I'm curious about what was incorrect in the ACK packets to cause a resend, but the problem as far as I am concerned is solved.
For completeness and for anybody encountering the same problem, the power supply is a TTi PL303QMD. I upgraded from firmware version 3.02 to 4.06 and the problem was solved.
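As an aside, unrelated to the firmware fix: a single recv(32) is not guaranteed to return the whole reply in one call. A minimal sketch of a receive loop that reads until the terminator (assuming the device always ends its replies with '\r\n', as the output above suggests):
import socket

def query(sock, command, terminator=b'\r\n'):
    # Send a command and keep reading until the reply terminator arrives.
    sock.sendall(command.encode('utf-8'))
    reply = b''
    while not reply.endswith(terminator):
        chunk = sock.recv(32)
        if not chunk:                    # device closed the connection
            raise ConnectionError('connection closed by device')
        reply += chunk
    return reply

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('192.168.1.147', 9221))
print(query(sock, 'I1 0.500; *OPC?'))    # expect b'1\r\n'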

Related

Kubernetes readiness and liveness endpoints being called from the pod itself

I have a cluster running linkerd and several pods, all of which are meshed. One pod shows strange behavior in that I am seeing multiple calls to its liveness and readiness endpoints: one originates from the kubelet, the other originates from the pod itself.
I installed the linkerd debug container into the pod to verify this. Here is a sample of the debug logs showing /readyz being called twice as described:
61500 8585.442787151 10.56.108.1 → 10.56.108.3 HTTP 177 GET /readyz HTTP/1.1
61501 8585.442829241 10.56.108.3 → 10.56.108.1 TCP 68 5200 → 53564 [ACK] Seq=1 Ack=110 Win=43648 Len=0 TSval=885352394 TSecr=1855609776
61502 8585.443101851 10.56.108.3 → 10.56.108.3 HTTP 158 GET /readyz HTTP/1.1
61503 8585.443136941 10.56.108.3 → 10.56.108.3 TCP 68 5200 → 49518 [ACK] Seq=178 Ack=181 Win=43776 Len=0 TSval=2132012422 TSecr=2132012422
61504 8585.443699891 10.56.108.3 → 10.56.108.3 HTTP 245 HTTP/1.1 200 OK
61505 8585.443706991 10.56.108.3 → 10.56.108.3 TCP 68 49518 → 5200 [ACK] Seq=181 Ack=355 Win=43648 Len=0 TSval=2132012423 TSecr=2132012423
61506 8585.443804231 10.56.108.3 → 10.56.108.1 HTTP 198 HTTP/1.1 200 OK
I don't see this behavior on other pods in the cluster. Is there anything that could be configured in linkerd that might create this issue?
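One way to narrow down who is calling the endpoint is to log the peer address of every probe request. A minimal stand-in sketch (port 5200 is taken from the capture above; in a real pod the logging would go into the existing handler rather than a separate server):
# Stand-in probe server that logs who calls /readyz (sketch only).
from http.server import BaseHTTPRequestHandler, HTTPServer

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address shows whether the call came from the kubelet
        # (the node IP, 10.56.108.1 above) or from the pod itself (10.56.108.3).
        print('GET %s from %s' % (self.path, self.client_address[0]))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'ok')

HTTPServer(('', 5200), ProbeHandler).serve_forever()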

kubernetes externalTrafficPolicy: Cluster service timing out (tcp dump included)

I have a Kubernetes service set to externalTrafficPolicy: Cluster (it's a simple nginx backend). When I try to curl it from outside the cluster, it often times out. The loadBalancerSourceRanges are set to 0.0.0.0/0, and it actually succeeds only very infrequently (2/20 times).
I am aware that in an externalTrafficPolicy: Cluster service, the nodes in the cluster use iptables to reach the pod. So I did some tcpdumps from both the pod and a node in the cluster that is attempting to reach the pod.
Below is a tcpdump from a node that the backend pod tried to reach and send data to (note: I am using Calico as my cluster's CNI plugin).
10.2.243.236 is the IP of the backend pod
sudo tshark -i vxlan.calico | grep 10.2.243.236
Running as user "root" and group "root". This could be dangerous.
Capturing on 'vxlan.calico'
468 463 5.867497471 10.2.0.192 -> 10.2.243.236 TCP 58 38109 > http [SYN] Seq=0 Win=7514 Len=0 MSS=1460
464 5.867920115 10.2.243.236 -> 10.2.0.192 TCP 58 http > 38109 [SYN, ACK] Seq=0 Ack=1 Win=26200 Len=0 MSS=1310
604 599 7.372050068 10.2.243.236 -> 10.2.0.192 TCP 58 [TCP Retransmission] http > 38109 [SYN, ACK] Seq=0 Ack=1 Win=26200 Len=0 MSS=1310
759 781 9.372058511 10.2.243.236 -> 10.2.0.192 TCP 58 [TCP Retransmission] http > 38109 [SYN, ACK] Seq=0 Ack=1 Win=26200 Len=0 MSS=1310
1094 1078 13.372017415 10.2.243.236 -> 10.2.0.192 TCP 58 [TCP Retransmission] http > 38109 [SYN, ACK] Seq=0 Ack=1 Win=26200 Len=0 MSS=1310
1877 1913 21.372786131 10.2.243.236 -> 10.2.0.192 TCP 58 [TCP Retransmission] http > 38109 [SYN, ACK] Seq=0 Ack=1 Win=26200 Len=0 MSS=1310
3285 3281 37.372007425 10.2.243.236 -> 10.2.0.192 TCP 58 [TCP Retransmission] http > 38109 [SYN, ACK] Seq=0 Ack=1 Win=26200 Len=0 MSS=1310
So it basically seems like the node is initiating the TCP connection, but is not responding to the pod's syn-ack message, and eventually the connection times out.
How can I debug this further? I'm kind of stuck on how I can debug why the node is seemingly not responding to the connection it initiated in the first place.
NOTE: I can curl the pod IP successfully from inside every node.
Answer: We installed Calico on the Kubernetes cluster as the CNI plugin. We did not set kube-proxy's --cluster-cidr argument, as we believed Calico would take care of creating the rules.
Upon running iptables-save on the Kubernetes nodes, we found that no rule actually matched the pod CIDR range, and hence packets were being dropped by the default FORWARD DROP rule (this can be verified using iptables-save -c).
After setting kube-proxy's --cluster-cidr argument and restarting kube-proxy on all the worker nodes, the iptables rules were created and services with externalTrafficPolicy: Cluster worked as expected.
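For anyone reproducing this, a quick way to quantify the timeout rate before and after the fix is to attempt plain TCP connects to the service in a loop (a rough sketch; the IP and port below are placeholders, not values from this cluster):
import socket

SERVICE_IP, PORT = '203.0.113.10', 80    # placeholder external IP and port
attempts, successes = 20, 0
for _ in range(attempts):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(3)                      # anything slower counts as a timeout
    try:
        s.connect((SERVICE_IP, PORT))
        successes += 1
    except OSError:                      # timeout or refusal
        pass
    finally:
        s.close()
print('%d/%d connections succeeded' % (successes, attempts))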

How to solve: RPC: Port mapper failure - RPC: Unable to receive errno = Connection refused

I'm trying to set up an NFS server.
I have two programs, a server and a client. I start the server, which starts without errors; then I create a file with the client and the file is created correctly, but when I try to write something to that file I get the error:
call failed: RPC: Unable to receive; errno = Connection refused
And here is my rpcinfo -p output
# rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 662 status
100024 1 tcp 662 status
100005 1 udp 892 mountd
100005 1 tcp 892 mountd
100005 2 udp 892 mountd
100005 2 tcp 892 mountd
100005 3 udp 892 mountd
100005 3 tcp 892 mountd
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049 nfs_acl
100003 3 udp 2049 nfs
100227 3 udp 2049 nfs_acl
100021 1 udp 58383 nlockmgr
100021 3 udp 58383 nlockmgr
100021 4 udp 58383 nlockmgr
100021 1 tcp 39957 nlockmgr
100021 3 tcp 39957 nlockmgr
100021 4 tcp 39957 nlockmgr
536870913 1 udp 997
536870913 1 tcp 999
Does anyone know how I can solve this problem?
NOTE: I am using my laptop as both the server and the client.
Make sure rpcbind is running. Also, it is a good idea to check whether you can see the exports with "showmount -e <server>".
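Since the laptop acts as both server and client, one quick sanity check is whether the ports listed by rpcinfo -p actually accept TCP connections on localhost (a minimal sketch using the port numbers from the output above):
import socket

ports = {'portmapper': 111, 'status': 662, 'mountd': 892,
         'nfs': 2049, 'nlockmgr': 39957}
for name, port in ports.items():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2)
    try:
        s.connect(('127.0.0.1', port))
        print('%-10s port %5d: open' % (name, port))
    except OSError as exc:               # e.g. Connection refused
        print('%-10s port %5d: %s' % (name, port, exc))
    finally:
        s.close()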

Modify ip addresses in a socks capture pcap

I have a pcap capture of socks traffic. The traffic goes like -
client_ip <-> 127.0.0.1:9050 <-> destination_ip
Looking at the pcap in Wireshark therefore shows:
src_ip = 127.0.0.1
dst_ip = 127.0.0.1
Is it possible to change src_ip and dst_ip addresses?
I tried bittwiste as:
bittwiste -I in.pcap -O out.pcap -T ip -p 6 -s 127.0.0.1,1.2.3.4 -d 127.0.0.1,4.3.2.1
But only the first packet gets modified; all packets from the 2nd onwards remain the same.
I also tried tcprewrite as:
tcprewrite --seed=325 --infile=in.pcap --outfile=out.pcap
This changes all src_ip & dst_ip (127.0.0.1) to the same random IP, since it seems to find only one (same) endpoint IP.
How can I modify the src & dst IP addresses in a SOCKS traffic capture?
Thanks
TL;DR. The --endpoints option of tcprewrite is what you're looking for. It requires a cachefile from tcpprep:
$ tcpprep --port --pcap=in.pcap --cachefile=in.cache
$ tcprewrite --cachefile=in.cache --endpoints=1.2.3.4:4.3.2.1 --infile=in.pcap --outfile=out.pcap
$
$ tshark -r out.pcap
1 0.000000 1.2.3.4 → 4.3.2.1 TCP 74 49870 → 80 [SYN] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=10438137 TSecr=0 WS=128
2 0.000030 4.3.2.1 → 1.2.3.4 TCP 74 80 → 49870 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=10438137 TSecr=10438137 WS=128
3 0.000051 1.2.3.4 → 4.3.2.1 TCP 66 49870 → 80 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=10438137 TSecr=10438137
4 0.000101 1.2.3.4 → 4.3.2.1 HTTP 139 GET / HTTP/1.1
5 0.000121 4.3.2.1 → 1.2.3.4 TCP 66 80 → 49870 [ACK] Seq=1 Ack=74 Win=43776 Len=0 TSval=10438137 TSecr=10438137
6 0.023045 4.3.2.1 → 1.2.3.4 HTTP 11642 HTTP/1.1 200 OK (text/html)
7 0.023094 1.2.3.4 → 4.3.2.1 TCP 66 49870 → 80 [ACK] Seq=74 Ack=11577 Win=174720 Len=0 TSval=10438143 TSecr=10438143
8 0.023517 1.2.3.4 → 4.3.2.1 TCP 66 49870 → 80 [FIN, ACK] Seq=74 Ack=11577 Win=174720 Len=0 TSval=10438143 TSecr=10438143
9 0.023547 4.3.2.1 → 1.2.3.4 TCP 66 80 → 49870 [FIN, ACK] Seq=11577 Ack=75 Win=43776 Len=0 TSval=10438143 TSecr=10438143
10 0.023560 1.2.3.4 → 4.3.2.1 TCP 66 49870 → 80 [ACK] Seq=75 Ack=11578 Win=174720 Len=0 TSval=10438143 TSecr=10438143
Explanations
According to the documentation for tcprewrite, --endpoints=ip1:ip2 rewrites all packets to appear to be between ip1 and ip2. However, this option requires the --cachefile option.
The tcpprep cache file is used to split traffic into two sides depending on ports, IP addresses, MAC addresses, etc. Here, according to the tcpprep wiki, we want to use the --port option.
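If tcprewrite is not an option, a similar per-direction rewrite can be done by hand with scapy (a rough sketch, assuming scapy is installed and that each packet's direction can be inferred from the SOCKS port 9050):
from scapy.all import rdpcap, wrpcap, IP, TCP

packets = rdpcap('in.pcap')
for pkt in packets:
    if IP in pkt and TCP in pkt:
        if pkt[TCP].sport == 9050:        # proxy -> client direction
            pkt[IP].src, pkt[IP].dst = '4.3.2.1', '1.2.3.4'
        else:                             # client -> proxy direction
            pkt[IP].src, pkt[IP].dst = '1.2.3.4', '4.3.2.1'
        del pkt[IP].chksum                # recompute checksums on write
        del pkt[TCP].chksum
wrpcap('out.pcap', packets)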

What does the Recv-Q values in a Listen socket mean?

My program runs into trouble with a netstat output like the one below. It cannot receive a packet. What does the Recv-Q value in the first line mean? I've checked the man page and done some googling, but found no answer.
[root#(none) /data]# netstat -ntap | grep 8000
tcp 129 0 0.0.0.0:8000 0.0.0.0:* LISTEN 1526/XXXXX-
tcp 0 0 9.11.6.36:8000 9.11.6.37:48306 SYN_RECV -
tcp 0 0 9.11.6.36:8000 9.11.6.34:44936 SYN_RECV -
tcp 365 0 9.11.6.36:8000 9.11.6.37:58446 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:55018 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:42830 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:56344 CLOSE_WAIT -
tcp 0 364 9.11.6.34:38947 9.11.6.36:8000 FIN_WAIT1 -
tcp 364 0 9.11.6.36:8000 9.11.6.37:52406 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:53603 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:47522 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:48191 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:51813 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:57789 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.37:34252 CLOSE_WAIT -
tcp 364 0 9.11.6.36:8000 9.11.6.34:38930 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:44121 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:60465 CLOSE_WAIT -
tcp 365 0 9.11.6.36:8000 9.11.6.37:37461 CLOSE_WAIT -
tcp 0 362 9.11.6.34:35954 9.11.6.36:8000 FIN_WAIT1 -
tcp 364 0 9.11.6.36:8000 9.11.6.37:55241 CLOSE_WAIT -
P.S.
See also at https://groups.google.com/forum/#!topic/comp.os.linux.networking/PoP0YOOIj70
Recv-Q is the Receive Queue. It is the number of bytes that are currently in a receive buffer. Upon reading the socket, the bytes are removed from the buffer and put into application memory. If the Recv-Q number gets too high, packets will be dropped because there is no place to put them.
More info: see the netstat man page.
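A minimal sketch that makes the queued bytes visible on a connected socket (assuming local port 8000 is free): run it, then check netstat -nt in another terminal while it sleeps.
import socket, time

# Server that accepts a connection but never reads from it, so the
# client's bytes stay in the kernel receive buffer and show up as Recv-Q.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 8000))
server.listen(5)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', 8000))
conn, _ = server.accept()

client.sendall(b'x' * 364)   # unread bytes, like the 364s in the output above
time.sleep(60)               # meanwhile: netstat -nt | grep 8000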