How does packet interaction with TCP Selective Acknowledgement work? - sockets

Can anybody explain how the packet interaction with TCP Selective Acknowledgment works?
I found the definition on Wikipedia, but I cannot get a clear picture of what Selective Acknowledgment really does compared to Positive Acknowledgment and Negative Acknowledgment.

TCP breaks the information it sends into segments... essentially, segments are chunks of data no larger than the current value of the TCP MSS (maximum segment size) received from the other end. Those chunks carry incrementing sequence numbers (based on the total count of data bytes sent in the TCP session) that allow TCP to know when something got lost in flight; the first TCP sequence number (the ISN) is chosen at connection setup and, for security purposes, should be hard for an attacker to predict. Most of the time, the data to be sent is larger than a single segment (the MSS itself is derived from your local Ethernet MTU, minus the IP and TCP headers), so the sender can transmit multiple segments to you before you send an ACK.
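To make the bookkeeping concrete, here is a tiny sketch (plain Python, illustrative only; the ISN and MSS values are made up) of how a byte stream gets cut into MSS-sized segments, each tagged with the sequence number of its first byte:

```python
def segmentize(data: bytes, isn: int, mss: int = 1460):
    """Split a byte stream into MSS-sized chunks, tagging each chunk
    with the sequence number of its first byte (mod 2**32)."""
    segments = []
    seq = isn + 1          # the first data byte follows the SYN, which consumes one sequence number
    for offset in range(0, len(data), mss):
        chunk = data[offset:offset + mss]
        segments.append((seq % 2**32, chunk))
        seq += len(chunk)
    return segments

# 4000 bytes of data -> segments starting at ISN+1, ISN+1461, ISN+2921
for seq, chunk in segmentize(b"x" * 4000, isn=1000):
    print(seq, len(chunk))
```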
It is helpful to think of these things in the time sequence they got standardized...
First came Positive Acknowledgement, which is the mechanism of telling the sender you got the data; the acknowledgment number you send back is cumulative, covering the highest contiguous byte-sequence received across the TCP chunks (a.k.a. segments) he sent.
I will demonstrate below, but in my examples you will see small TCP segment numbers like 1, 2, 3, 4, 5... in reality these byte-sequence numbers will be large, incrementing, and have gaps between them (but that's normal... TCP typically sends data in chunks of hundreds to roughly 1460 bytes per segment).
So, let's suppose the sender xmits segment numbers 1,2,3,4,5 before you send your first ACK. If all goes well, you send an ACK covering 1,2,3,4,5 and life is good. If 2 gets lost, everything is on hold until the sender realizes that 2 has never been ACK'd; he knows because you keep sending duplicate ACKs for 1. Upon the proper timeout, the sender xmits 2,3,4,5 again.
Then Selective Acknowledgement was proposed as a way to make this more efficient. In the same example, you ACK 1 and SACK segments 3 through 5 along with it (if you use a sniffer, you'll see something like "ACK:1, SACK:3-5" in the ACK packets from you). This way, the sender knows it only has to retransmit TCP segment 2... so life is better. Also, note that a SACK block defines the edges of the contiguous data you have received; multiple non-contiguous ranges can be SACK'd at the same time.
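If it helps, here's a small illustrative sketch (plain Python, using the simplified per-segment numbering from this answer, not real 32-bit byte sequence numbers) of how a receiver could derive the cumulative ACK and the SACK ranges from the segments it currently holds:

```python
def ack_and_sack(received, highest_sent):
    """received: set of segment numbers held by the receiver.
    Returns (cumulative ACK, list of SACK ranges)."""
    # Cumulative ACK: highest segment such that everything up to it has arrived.
    ack = 0
    while ack + 1 in received:
        ack += 1
    # SACK ranges: contiguous runs of received segments above the ACK point.
    sacks, start = [], None
    for n in range(ack + 1, highest_sent + 1):
        if n in received and start is None:
            start = n
        elif n not in received and start is not None:
            sacks.append((start, n - 1))
            start = None
    if start is not None:
        sacks.append((start, highest_sent))
    return ack, sacks

print(ack_and_sack({1, 3, 4, 5}, 5))   # (1, [(3, 5)]) -> "ACK:1, SACK:3-5"
```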
Negative Acknowledgement is the mechanism of telling the sender only about missing data. If you don't tell them something is missing, they keep sending the data 'till you cry uncle.
HTH, \m

Related

Does TCP ensure packets are received in the sequence that the server sent them?

I'm working on a game server that communicates with a game client, and I wonder whether the packets the server sends to the client remain in sequence when the client receives them?
Like, the server sends packets A, B, C,
but the client receives B, A, C?
I have read the great blog http://packetlife.net/blog/2010/jun/7/understanding-tcp-sequence-acknowledgment-numbers/
It seems that every packet sent by the server has a corresponding ACK from the client, but it does not say why the packets the client receives arrive in the same sequence the server sent them.
It's worth reading TCP's RFC, particularly section 1.5 (Operation), which explains the process. In part, it says:
The TCP must recover from data that is damaged, lost, duplicated, or delivered out of order by the internet communication system. This is achieved by assigning a sequence number to each octet transmitted, and requiring a positive acknowledgment (ACK) from the receiving TCP. If the ACK is not received within a timeout interval, the data is retransmitted. At the receiver, the sequence numbers are used to correctly order segments that may be received out of order and to eliminate duplicates. Damage is handled by adding a checksum to each segment transmitted, checking it at the receiver, and discarding damaged segments.
I don't see where it's ever made explicit, but since the acknowledgement (as described in section 2.6) carries the next expected sequence number, the receiving TCP implementation only ever acknowledges the consecutive run of data from the beginning. That is, if you never receive the first packet, you never send an acknowledgement, even if you've received all the other packets in the message; if you've received 1, 2, 3, 5, and 6, you only acknowledge 1-3.
For completeness, I'd also direct your attention to section 2.6, again, after it describes the above-quoted section in more detail:
An acknowledgment by TCP does not guarantee that the data has been delivered to the end user, but only that the receiving TCP has taken the responsibility to do so.
So, TCP ensures the order of packets, unless the application doesn't receive them. That exception probably wouldn't be common, except for cases where the application is unavailable, but it does mean that an application shouldn't assume that a successful send is equivalent to a successful reception. It probably is, for a variety of reasons, but it's explicitly outside of the protocol's scope.
TCP guarantees sequence and integrity of the byte stream. You will not receive data out of sequence. From RFC 793:
Reliable Communication: A stream of data sent on a TCP connection is delivered reliably and in order at the destination.

How can I transfer large data over a TCP socket

How can I transfer large data without it being split? I'm using a TCP socket. It's for a game. I can't use UDP, and there might be 1200 values in an array. I'm sending the array in JSON format, but the server receives it split into pieces.
Also, is there any option to send an HTTP request like TCP? I need the responses in order, and it should also be fast.
Thanks,
You can't.
HTTP may chunk it
TCP will segment it
IP will packetize it
routers will fragment it ...
and TCP will reassemble it all at the other end.
There isn't a problem here to solve.
You do not have much control over splitting packets/datagrams. The network decides about this.
In the case of IP, you have the DF (don't fragment) flag, but I doubt it will be of much help here. If you are communicating over Ethernet, then a 1200-element array may not fit into a single Ethernet frame (the payload size is limited by the MTU, typically 1500 octets).
Why does your application depend on the fact that the whole data must arrive in a single unit, and not in a single connection (comprised potentially of multiple units)?
how can I transfer large data without splitting.
I'm interpreting the above to be roughly equivalent to "how can I transfer my data across a TCP connection using as few TCP packets as possible". As others have noted, there is no way to guarantee that your data will be placed into a single TCP packet -- but you can do some things to make it more likely. Here are some things I would do:
Keep a single TCP connection open. (HTTP traditionally opens a separate TCP connection for each request, but for low-latency you can't afford to do that. Instead you need to open a single TCP connection, keep it open, and continue sending/receiving data on it for as long as necessary).
Reduce the amount of data you need to send. (i.e. are there things that you are sending that the receiving program already knows? If so, don't send them)
Reduce the number of bytes you need to send. (The easiest way to do this is to zlib-compress your message-data before you send it, and have the receiving program decompress the message after receiving it. This can give you a size-reduction of 50-90%, depending on the content of your data)
Turn off Nagle's algorithm on your TCP socket (see the sketch after this list). That can reduce latency by up to about 200 ms and discourages the TCP stack from playing unnecessary games with your data.
Send each data packet with a single send() call (if that means manually copying all of the data items into a separate memory buffer before calling send(), then so be it).
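Putting a few of those suggestions together, here is a minimal sketch (Python standard library; the host name, port, and message contents are made-up placeholders) of disabling Nagle's algorithm and compressing a JSON message before handing it to the kernel in one call:

```python
import json
import socket
import zlib

def send_compressed(sock: socket.socket, values: list) -> None:
    """Serialize to JSON, zlib-compress, and hand the whole thing to the kernel at once."""
    payload = zlib.compress(json.dumps(values).encode("utf-8"))
    sock.sendall(payload)   # one call from our side; the stack may still segment it

sock = socket.create_connection(("game.example.com", 4000))   # hypothetical server
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)     # turn off Nagle's algorithm
send_compressed(sock, list(range(1200)))
```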
Note that even after you do all of the above, the TCP layer will still sometimes spread your messages across multiple packets, etc -- that's just the way TCP works. And even if your local TCP stack never did that, the receiving computer's TCP stack would still sometimes merge the data from consecutive TCP packets together inside its receive buffer. So the receiving program is always going to "receive it like splitted" sometimes, because TCP is a stream-based protocol and does not maintain message boundaries. (If you want message boundaries, you'll have to do your own framing -- the easiest way is usually to send a fixed-size (e.g. 1, 2, or 4-byte) integer byte-count field before each message, so the receiver knows how many bytes it needs to read in before it has a full message to parse)
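That length-prefix framing can look roughly like this (a minimal Python sketch; the 4-byte big-endian prefix is just one reasonable choice):

```python
import struct

def send_msg(sock, data: bytes) -> None:
    """Prefix each message with its length so the receiver can find the boundary."""
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, however TCP chose to split or merge them."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The receiver just calls recv_msg() in a loop and gets back exactly the messages the sender framed, no matter how TCP segmented or coalesced the bytes on the wire.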
Consider the idea that the issue may be elsewhere, or that you may be sending too much unnecessary data. For example, in PHP there is the isset() function. If you're creating an internet-based, turn-based game, you don't need to send all 1,200 variables back and forth every single time. Just send what changed, and when the other player receives that data, only update the variables that are set.
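A rough sketch of that "send only what changed" idea (plain Python dictionaries standing in for the game state; the field names are invented):

```python
def changed_fields(old_state: dict, new_state: dict) -> dict:
    """Return only the entries that differ, so we send those instead of everything."""
    return {k: v for k, v in new_state.items() if old_state.get(k) != v}

old = {"hp": 100, "x": 10, "y": 20}
new = {"hp": 95,  "x": 10, "y": 21}
print(changed_fields(old, new))   # {'hp': 95, 'y': 21} -- only this goes on the wire
```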

How bad is IP fragmentation

I understand that when sending IP messages around, each hop in the network path between me and my packet's destination will check whether the packet I sent is bigger than the next hop's MTU. If so, the packet will be fragmented and the resulting packets will be separately sent to the next hop, only to be reassembled at the destination (or, in some cases, at the first NAT router encountered).
As far as I understand, this thing can be pretty bad, but I don't really understand why.
I understand that if the connection tends to drop a lot of packets, losing a single fragment means I have to resend the whole packet (this is actually the only thing I figured out myself)
Is there a chance that instead of being fragmented my packet will just be dropped?
How are packet fragments identified? Can I be 100% sure that they will be reassembled correctly? For example, if I send two IP packets of the same length nearly simultaneously to the same destination, how likely is it that fragments of the two will be swapped, like AAA, BBB being reassembled into ABA, BAB?
In principle, if packets aren't dropped and fragments are reassembled correctly, actually using packet fragmentation seems like a good idea to save on local bandwidth and avoid having to send more and more headers instead of just one big packet.
Thank you
IP fragmentation can cause several problems:
1) Application layer loss is increased
As you mentioned, if a single fragment is dropped, the entire layer 4 packet will be lost. Thus, for a network with a small random packet loss rate p, the application layer loss rate is increased by a factor approximately equal to the number of fragments n for each layer 4 packet (the layer-4 loss rate is roughly 1 - (1 - p)^n ≈ n·p for small p).
2) Not all networks handle fragmented packets
Some systems, such as Google's Compute Engine, do not reassemble fragmented packets.
3) Fragmentation can cause re-ordering
When routers split traffic down parallel paths, they may try to keep packets from the same flow on a single path. Because only the first fragment has layer 4 information like UDP/TCP port number, subsequent fragments may be routed down a different path, delaying assembly of the layer 4 packet and causing re-ordering.
4) Fragmentation can cause confusing behavior that is hard to debug
For example, if you send two UDP streams, A and B, from one source to a destination running Linux, the destination may discard packets from one of the streams. This is because by default, Linux "times out" fragment queues if more than 64 other fragments have been received from the same source. If stream A has a much higher data rate than stream B, 64 fragments from stream A may arrive in between the fragments from stream B, causing the B fragment to be dropped.
Thus, while IP fragmentation can reduce overhead by letting applications hand off large datagrams instead of many small ones (each with its own headers), it may cause more trouble than it is worth.
To my knowledge, the only case where packets will be dropped rather than fragmented (barring cases where it would be dropped anyway), is packets which are marked "don't fragment". These packets are to be discarded rather than being fragmented.
Fragmented packets have identifier, fragment offset, and more fragments fields in their headers that, when combined, allow the destination host to reliably reassemble the packet upon receipt of all the fragments. The first fragment's offset is zero, and the last fragment has the more fragments flag set to zero. It is still possible (although very unlikely) to reassemble an incorrect packet if two packets' headers are mutated so their fragment offsets are exchanged, but their checksums are still valid. The probability of this happening is essentially zero. Bear in mind that IP does not provide any mechanism for ensuring the integrity of the data payload, only the integrity of the control information in the header.
Packet fragmentation necessarily wastes bandwidth because each fragment carries a copy of [most of] the original datagram's header. Packets can be fragmented down to only 8 bytes of payload per fragment, so we could have a maximum-sized packet of 60 + 65536 bytes fragmented into roughly 60 * 8192 + 65536 bytes, i.e. header overhead of about 750% of the payload in the worst case. The only example I can come up with where you would come out ahead is if you fragmented a packet in order to send its fragments in parallel using some kind of Frequency Division Multiplexing scheme with the knowledge that the other channels are free. At that point, it still seems like it would require more work than would be saved to detect that circumstance and divide the packet rather than just sending it.
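As a rough illustration of that arithmetic (a back-of-the-envelope sketch in Python, ignoring the exact IP total-length limit):

```python
def fragment_overhead(payload: int, header: int, fragment_payload: int) -> float:
    """Header bytes added by fragmentation, as a fraction of the payload.
    fragment_payload should be a multiple of 8 (IP fragment offsets are in 8-byte units)."""
    fragments = -(-payload // fragment_payload)   # ceiling division
    return fragments * header / payload

# worst case from the paragraph above: 8-byte fragments, 60-byte headers
print(fragment_overhead(65536, 60, 8))      # 7.5 -> ~750% header overhead
# typical case: 1480-byte fragments of a 65536-byte datagram over a 1500-byte MTU
print(fragment_overhead(65536, 20, 1480))   # ~0.014 -> ~1.4% overhead
```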
All the basic details about the mechanics of packet fragmentation in IP can be found in IETF RFC 791, if you're hungry for more information.

How will the TCP protocol delay packet transfer when one of the packets is dropped?

If client socket sends:
Packet A - dropped
Packet B
Packet C
Will the server socket receive and queue B and C, so that when A is received, B and C are passed to the server application immediately? Or will B and C be resent too? Or will no packets be sent at all until A is delivered?
TCP is a sophisticated protocol that changes many parameters depending on the current network state; there are whole books written about the subject. The clearest way to answer your question is to say that TCP generally maintains a send 'window' of a given size in bytes. This is the amount of data that can be sent before acknowledgments for previously sent data have to come back.
In older TCP specifications a dropped packet within that window would result in a complete resend of data from the dropped packet onwards. To solve this problem as it's obviously a little wasteful, TCP now employs a selective acknowledgment (SACK) option (RFC 2018). This would result in just the lost/corrupted packet being resent.
Back to your example, assuming the window size is large enough to encompass all three packets, and providing you are taking advantage of the latest TCP standard (don't see why you wouldn't), if packet A were dropped only packet A would be resent. If all packets are individually larger than the window then the packets must be sent and acknowledged sequentially.
It depends on the latencies. In general, A is resent first. If the receiver gets it and already has B and C queued, it can acknowledge them all at once.
If this happens fast enough, B and C won't be resent, or maybe only B.

Raw Socket Receive Buffer

We are currently testing a Telecom application over IP. We open a raw socket and receive messages from the remote side (message rate roughly 750+ messages/second, approximate size 180 bytes excluding the IP header).
On top of the raw socket sits a layer called SCTP (just like TCP) which indicates every now and then that it is missing some packets. Now, we are running Wireshark on the receiving node and we can see those packets in Wireshark.
It looks to me like the socket's receive buffer is too small, causing IP(?) to drop messages. However, the IP statistics (netstat -sv) show NO dropped packets. We have tried setting the socket receive queue to 40000 without any success.
I would appreciate any pointers as to what option, if any, of IP layer should we be configuring or is there any specific socket option that we need to set.
Thanks for your inputs. However, we have been able to "solve" this problem.
Earlier, I described how we read messages.
Once select returns, we run a loop (up to the number of raw messages to read, which was >1 in our case):
1) we call ioctl(FIONREAD) to find the number of bytes to read;
2) read that many bytes by calling recvfrom;
3) pass the bytes up to the user;
4) go into the loop again, call ioctl(FIONREAD), and repeat the steps.
However, at step 4, ioctl(FIONREAD) would return 0. Our code had a defensive check: it assumed that 0 bytes from ioctl(FIONREAD) meant the sender had sent an IP header with 0 payload. Therefore, it would call recvfrom(bytes to read=0) to flush out that IP header, lest select fire again on it.
At time t0, ioctl(FIONREAD) returns 0 as the number of bytes to read.
At time t1, recvfrom(bytes to read=0) is called.
Sometimes, between t0 and t1, actual data would get queued in the socket receive queue and then get discarded, because we were calling recvfrom(bytes=0).
Setting rawMsgsToRead=1 has "solved" this problem. However, my guess is that it will impact our performance. Is there any ioctl call that can differentiate between 0 octets in the queue and an IP header with a 0-byte payload?
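For what it's worth, here is a rough sketch (Linux-flavoured Python using termios.FIONREAD; the raw-socket protocol number is only an example and the whole thing needs root) of a receive loop that reads one datagram per readiness notification instead of trusting a FIONREAD result of 0 to mean "a zero-payload datagram is queued":

```python
import array
import fcntl
import select
import socket
import termios

# Example raw socket; protocol 132 is SCTP (illustrative only, requires root privileges).
sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, 132)
sock.setblocking(False)

def pending_bytes(s: socket.socket) -> int:
    """Ask the kernel how many bytes are queued on the socket (Linux FIONREAD)."""
    buf = array.array("i", [0])
    fcntl.ioctl(s.fileno(), termios.FIONREAD, buf)
    return buf[0]

while True:
    readable, _, _ = select.select([sock], [], [], 1.0)
    if not readable:
        continue
    # Read exactly one datagram per wakeup; don't loop on FIONREAD, because a
    # result of 0 here does not prove the queued datagram has a 0-byte payload.
    try:
        data, addr = sock.recvfrom(65535)
    except BlockingIOError:
        continue        # spurious wakeup; nothing was actually queued
    print(len(data), "bytes from", addr, "-", pending_bytes(sock), "bytes still queued")
```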
I have a few questions and a few things to think about.
1) Which implementation of SCTP are you using, and on which OS? Some SCTP implementations are more robust than others.
2) Is SCTP negatively acknowledging the dropped packets? Search for gap ACKs in Wireshark.
3) And where you see the dropped packets in Wireshark, are you sure that these are not retransmissions?
4) Where in the system is wireshark monitoring? If it is not on the same wire as your application then it may be seeing messages which your application doesn't.
5) What exactly is the indication SCTP is giving?
If you believe that the IP socket rx buffer is overflowing then you could consider reducing the size of the SCTP RX window; this is often configurable in sctp stacks. The Rx window limits the amount of data that can be outstanding waiting for acknowledgement and consequently restricts the amount of data which could be in the IP buffer.
You could also try raising the priority of your SCTP task so that it more quickly reads messages out of the IP buffers (This may be the easiest thing to try and in my opinion a good thing to do).
Regards