Why does SCTP not require a TIME_WAIT state? - sockets

I am reading "UNIX Network Programming: The Sockets API" and it mentions that SCTP does not require a TIME_WAIT state as TCP does due to its use of verification tags. Why is this the case? I understand why verification tags fix the issue with duplicate packets, since the receiver can determine whether a packet is part of the current SCTP association or not, but surely the final SCTP SHUTDOWN-COMPLETE packet can be lost just as the final ACK in TCP can be lost, so the peer performing the active close still has to maintain some sort of state to handle this event just as with TCP.

There is no need to maintain state information in this case. RFC 4960 defines a default handling for unknown ("out of the blue") packets.
Let's say you have two sides in your association, side A and side B, and v1/v2 are the verification tags used by these sides.
Side A initiates the shutdown.
```
A                              B
       Shutdown(v1)
  -------------------->
       Shutdown_ack(v2)
  <--------------------
       Shutdown_complete(v1)
  -------------------->
```
When side A sends SHUTDOWN COMPLETE, it deallocates all the resources used by this association. As far as side A is concerned, the association is gone.
If for some reason the SHUTDOWN COMPLETE chunk is lost, side B will retransmit the SHUTDOWN ACK chunk when its T2 timer (an RFC 4960 term) expires.
When side A receives this retransmitted SHUTDOWN ACK chunk, it will not be able to determine which association it belongs to, because that association has already been closed. So side A will treat this packet as "out of the blue". RFC 4960 section 8.4 describes how to handle out-of-the-blue packets; rule 5 describes how to handle an "out of the blue" SHUTDOWN ACK.
In this case side A will reply with SHUTDOWN COMPLETE. However, the packet that carries the SHUTDOWN COMPLETE chunk will be slightly different from the original one. The new packet will have the T bit set to 1 and contain what is called a reflected verification tag (which is simply the verification tag from the packet that carried the SHUTDOWN ACK).
```
A                              B
       Shutdown(v1)
  -------------------->
       Shutdown_ack(v2)
  <--------------------
       Shutdown_complete(v1)
  -------LOST--------
       Shutdown_ack(v2)
  <--------------------
       Shutdown_complete(v2), t-bit=1
  -------------------->
```
Side B knows how to handle a packet with the T bit set to 1 and processes the SHUTDOWN COMPLETE.
As you can see, side A does not keep any state information once it has sent SHUTDOWN COMPLETE. If any packets belonging to this association arrive after that, they will be treated as "out of the blue".
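To make rule 5 concrete, here is a toy sketch of that handling. This is illustration-only code with made-up types and a stubbed send function, not a real SCTP stack API:

```c
/* Toy illustration of RFC 4960 section 8.4, rule 5 (not a real SCTP API):
 * an out-of-the-blue SHUTDOWN ACK is answered with SHUTDOWN COMPLETE
 * carrying the reflected verification tag and the T bit set. */
#include <stdint.h>
#include <stdio.h>

enum chunk_type { SHUTDOWN_ACK, SHUTDOWN_COMPLETE };

struct toy_packet {
    uint32_t verification_tag;  /* tag in the SCTP common header */
    enum chunk_type chunk;
    int t_bit;                  /* T bit in the chunk flags */
};

/* Stand-in for actually putting a packet on the wire. */
static void send_packet(const struct toy_packet *p)
{
    printf("send chunk=%d vtag=%u t_bit=%d\n",
           p->chunk, p->verification_tag, p->t_bit);
}

/* Called for a packet that matches no known association. */
static void handle_ootb(const struct toy_packet *in)
{
    if (in->chunk == SHUTDOWN_ACK) {
        struct toy_packet out = {
            .verification_tag = in->verification_tag, /* reflected tag */
            .chunk            = SHUTDOWN_COMPLETE,
            .t_bit            = 1,
        };
        send_packet(&out);
        /* No state is created or kept for this association. */
    }
}

int main(void)
{
    struct toy_packet retransmitted_ack = { .verification_tag = 0xB0B0B0B0,
                                            .chunk = SHUTDOWN_ACK, .t_bit = 0 };
    handle_ootb(&retransmitted_ack);
    return 0;
}
```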

What is the difference between Nagle algorithm and 'stop and wait'?

I saw the socket option TCP_NODELAY, which is used to turn the Nagle algorithm on or off.
I checked what the Nagle algorithm is, and it seems similar to 'stop and wait'.
Can someone give me a clear difference between these two concepts?
In a stop and wait protocol, one

1. sends a message to the peer
2. waits for an ack for that message
3. sends the next message

(i.e. one cannot send a new message until the previous one has been acknowledged)
Nagle's algorithm as used in TCP is orthogonal to this concept. When the TCP application sends some data, the protocol buffers the data and waits a little while to see if there's more data to be sent, instead of sending the data to the peer immediately.
If the application has more data to send in this small timeframe, the protocol stack merges that data into the current buffer and can send it as one large message.
This concept could very well be applied to a stop and wait protocol as well. (Note that TCP is not a stop and wait protocol.)
The Nagle Algorithm is used to control whether the socket provider sends outgoing data immediately as-is at the cost of less efficient network transmissions (off), or if it buffers outgoing data so it can make more efficient network transmissions at the cost of speed (on).
Stop and Wait is a mechanism used to ensure the integrity of transmitted data, by making the sender send a frame of data and then wait for an acknowledgement from the receiver before sending another frame, thus ensuring frames are received in the same order in which they are sent.
These two features operate independently of each other.
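For completeness, turning Nagle off is just a socket option on the connected TCP socket. A minimal sketch in C, with error handling kept short:

```c
#include <netinet/in.h>    /* IPPROTO_TCP */
#include <netinet/tcp.h>   /* TCP_NODELAY */
#include <sys/socket.h>
#include <stdio.h>

/* Disable Nagle's algorithm on an already-created TCP socket so small
 * writes are sent immediately instead of being coalesced. */
static int disable_nagle(int sockfd)
{
    int flag = 1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY,
                   &flag, sizeof(flag)) < 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}
```

Whether you want it off depends on the traffic pattern: many small, latency-sensitive writes favour TCP_NODELAY, while bulk transfers usually do better with Nagle left on.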

How TCP handles Packet Loss - Implications in physical Layer

I am doing some exam revision and I have a question regarding TCP/IP. I am OK with the first part, on how TCP handles packet loss, but not sure about the second part, on why TCP's response is incorrect when the physical layer is wireless.
a) An assumption inherent in the TCP protocol is that lost acknowledgements(ACKs) are caused by congestion in the network. Explain how TCP responds to lost acknowledgements and discuss why this response is incorrect when the physical layer is implemented as a wireless carrier. Briefly discuss the consequences of this situation.
Lost acknowledgements cause the retransmission timer in the sender to expire. This causes the sender to retransmit the data, resulting in the receiver generating another ACK.
In other words, since the ACK doesn't arrive from the recipient, it is the sender who initiates the retransmission, assuming that the data must not have arrived since no ACK appeared within the retransmission timeout. TCP also treats that loss as a sign of congestion and backs off its sending rate; on a wireless link, where loss is frequently caused by interference or bit errors rather than congestion, backing off in this way needlessly reduces throughput.

Does TCP ensure packets are received in the sequence that the server sends them?

I'm working on a game server that communicates with a game client, and I wonder whether the packets the server sends to the client remain in sequence when the client receives them.
For example, the server sends packets A, B, C;
could the client receive them as B, A, C?
I have read the great blog http://packetlife.net/blog/2010/jun/7/understanding-tcp-sequence-acknowledgment-numbers/
It seems that every packet sent by the server has a corresponding ACK from the client, but it does not say why the packets received by the client are in the same sequence in which the server sent them.
It's worth reading TCP's RFC, particularly section 1.5 (Operation), which explains the process. In part, it says:
The TCP must recover from data that is damaged, lost, duplicated, or delivered out of order by the internet communication system. This is achieved by assigning a sequence number to each octet transmitted, and requiring a positive acknowledgment (ACK) from the receiving TCP. If the ACK is not received within a timeout interval, the data is retransmitted. At the receiver, the sequence numbers are used to correctly order segments that may be received out of order and to eliminate duplicates. Damage is handled by adding a checksum to each segment transmitted, checking it at the receiver, and discarding damaged segments.
I don't see where it's ever made explicit, but since the acknowledgement (as described in section 2.6) describes the next expected packet, the receiving TCP implementation is only ever acknowledging consecutive sequences of packets from the beginning. That is, if you never receive the first packet, you never send an acknowledgement, even if you've received all other packets in the message; if you've received 1, 2, 3, 5, and 6, you only acknowledge 1-3.
For completeness, I'd also direct your attention to section 2.6, again, after it describes the above-quoted section in more detail:
An acknowledgment by TCP does not guarantee that the data has been delivered to the end user, but only that the receiving TCP has taken the responsibility to do so.
So, TCP ensures the order of packets, unless the application doesn't receive them. That exception probably wouldn't be common, except for cases where the application is unavailable, but it does mean that an application shouldn't assume that a successful send is equivalent to a successful reception. It probably is, for a variety of reasons, but it's explicitly outside of the protocol's scope.
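As a toy illustration of the cumulative-acknowledgment idea described above (this is not how a real TCP stack is written, just the bookkeeping): the receiver only ever acknowledges up to the first gap.

```c
#include <stdio.h>

/* Given which segment numbers 1..n have arrived (arrived[i] != 0),
 * return the highest segment that can be cumulatively acknowledged:
 * everything up to the first gap. */
static int cumulative_ack(const int *arrived, int n)
{
    int acked = 0;
    for (int i = 0; i < n; i++) {
        if (!arrived[i])
            break;          /* first gap stops the cumulative ACK */
        acked = i + 1;
    }
    return acked;
}

int main(void)
{
    /* Segments 1, 2, 3, 5 and 6 arrived; 4 is missing. */
    int arrived[] = { 1, 1, 1, 0, 1, 1 };
    printf("ACK up to segment %d\n", cumulative_ack(arrived, 6)); /* prints 3 */
    return 0;
}
```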
TCP guarantees sequence and integrity of the byte stream. You will not receive data out of sequence. From RFC 793:
Reliable Communication: A stream of data sent on a TCP connection is delivered reliably and in order at the destination.

Message delimitation in TCP communication

I am a newbie to networks and in particular TCP (I have been playing around a bit with UDP, but that's it).
I am developing a simple protocol based on exchanging messages between two endpoints. Those messages need to be certified, so I implemented a cryptographic layer that takes care of that. However, while UDP has a sound definition of a packet that constitutes the minimum unit that can get transferred at a time, the TCP protocol (as far as my understanding goes) is completely stream oriented.
Now, this puzzles me a bit. When exchanging messages, how can I tell where one starts and the other one ends? In principle, I can obviously communicate fixed length messages or first communicate the size of each message in some header. However, this can be subject to attacks: while of course it is going to be impossible to distort or determine the content of the communication, the above technique would make it easy to completely disrupt my communication just by adding a single byte in the middle.
Say that I need to transfer a message 1234567 bytes long. First of all, I communicate 4 bytes with an integer representing the size of the message. Okay. Then I start sending out the actual message. That message gets split in several packets, which get separately received. Now, an attacker just sends in an additional packet, faking it as if it was part of the conversation. It can just be one byte long: this completely destroys any synchronization mechanism I have implemented! The message has a spurious byte in the middle, and it doesn't successfully get decoded. Not only that, the last byte of the first message disrupts the alignment of the second message and so on: the connection is destroyed, and with a simple, simple attack! How likely and feasible is this attack anyway?
So I am wondering: what is the maximum data unit that can be transferred at once? I understand that a call to send does not necessarily correspond to a single call to receive: the message can be split into different chunks. How can I group the packets together in some way so that I know that they get packed together? Is there a way to define a higher level message that gets reconstructed and aligned all together and triggers a single call to a receive-like function? If not, what other solutions can I find to keep my communication re-alignable even in the presence of an attacker?
Basically it is difficult to control the way the OS divides the stream into TCP packets (the RFC defining the TCP protocol states that a TCP stack should allow clients to force it to send buffered data by using the push function, but it does not define how many packets this should generate; and in any case the attacker can modify any of them).
And these TCP packets can get divided even further into IP fragments on their way through the network (which can be opted out of with the 'Don't Fragment' IP flag -- but that flag can cause your packets not to be delivered at all).
I think that your problem is not about introducing packets into a stream protocol, but about securing it.
IPSec could be very beneficial in your scenario, as it operates on the network layer.
It provides integrity for every packet sent, so any modification on the wire gets detected and the invalid packets are dropped. In the case of TCP, the dropped packets get retransmitted automatically.
(Almost) everything is done automatically by the OS -- so you do not need to worry about it (and make mistakes doing so).
Confidentiality can be assured as well (with the same advantage of not re-inventing the wheel).
IPSec should provide you with a reliable transport protocol on top of which you can use whatever framing format you like.
Another alternative is using SSL/TLS on top of the TCP session, which is less robust (as it closes the whole connection on an integrity error).
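Whichever secure layer you pick, you still need framing on top of the byte stream. A common approach is a length prefix plus a "read exactly N bytes" loop. Here is a minimal sketch in C (assumptions: a blocking, connected socket; an arbitrary 16 MB sanity limit on message size; partial sends not handled):

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly len bytes from the stream socket (TCP delivers bytes in
 * order, but not necessarily in the same chunks as they were sent). */
static int recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;      /* error or peer closed the connection */
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive one length-prefixed message; caller frees *msg on success. */
static int recv_msg(int fd, uint8_t **msg, uint32_t *msg_len)
{
    uint32_t len_net;
    if (recv_exact(fd, &len_net, sizeof(len_net)) < 0)
        return -1;
    *msg_len = ntohl(len_net);
    if (*msg_len > (1u << 24))      /* arbitrary sanity limit */
        return -1;
    *msg = malloc(*msg_len);
    if (!*msg)
        return -1;
    if (recv_exact(fd, *msg, *msg_len) < 0) {
        free(*msg);
        return -1;
    }
    return 0;
}

/* Send one length-prefixed message (partial sends not handled here). */
static int send_msg(int fd, const uint8_t *msg, uint32_t msg_len)
{
    uint32_t len_net = htonl(msg_len);
    if (send(fd, &len_net, sizeof(len_net), 0) != (ssize_t)sizeof(len_net))
        return -1;
    return send(fd, msg, msg_len, 0) == (ssize_t)msg_len ? 0 : -1;
}
```

This gives you message boundaries, but, as noted above, it does not by itself protect against injected bytes; the integrity has to come from the TLS/IPSec layer underneath.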
Now, an attacker just sends in an additional packet, faking it as if it was part of the conversation. It can just be one byte long: this completely destroys any synchronization mechanism I have implemented!
Thwarting such an injection problem is dealt with by securing the stream. Create an encrypted stream and send your packets through that.
Of course the encrypted stream itself then has this problem; its messages can be corrupted. But those messages have secure integrity checks. The problem is detected, and the connection can be torn down and re-established to resynchronize it.
Also, some fixed-length synchronizing/framing bit sequence can be used between messages: some specific bit pattern. It doesn't matter if that pattern occurs inside messages by accident, because we only ever specifically look for that pattern when things go wrong (a corrupt message is received), otherwise we skip that sequence. If a corrupt message is received, we then receive bytes until we see the synchronizing pattern, and assume that whatever follows it is the start of a message (length followed by payload). If that fails, we repeat the process. When we receive a correct message, we reply to the peer, which will re-transmit anything we didn't get.
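A rough sketch of that resynchronization idea (the 4-byte marker value below is arbitrary, chosen only for illustration): after a corrupt message, discard bytes until the marker appears again, then go back to parsing length-prefixed messages.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Arbitrary example framing marker; any agreed-upon pattern works. */
static const uint8_t SYNC[4] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* After detecting a corrupt message, read one byte at a time until the
 * sync pattern has been seen, then return so the caller can try to parse
 * the next length-prefixed message. Returns -1 on error/EOF. */
static int resync(int fd)
{
    size_t matched = 0;
    while (matched < sizeof(SYNC)) {
        uint8_t b;
        if (recv(fd, &b, 1, 0) != 1)
            return -1;
        if (b == SYNC[matched])
            matched++;
        else
            matched = (b == SYNC[0]) ? 1 : 0;  /* restart the match */
    }
    return 0;
}
```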
How likely and feasible is this attack anyway?
TCP connections are identified by four items: the source and destination IP, and source and destination port number. The attacker has to fake a packet which matches your stream in these four identifiers, and sneak that packet past all the routers and firewalls between that attacker and the receiving machine. The attacker also has to be in the right ballpark with regard to the TCP sequence number.
Basically, this is next to impossible for an attacker C to perpetrate against endpoints A and B which are both distant from C on the network. The fake source IP will be rejected long before C is able to reach its destination. It's more plausible as an inside job (which includes malware): C is close to A and B.

How will the TCP protocol delay packet transfer when one of the packets is dropped?

If the client socket sends:
Packet A - dropped
Packet B
Packet C
Will the server socket receive and queue B and C, and then, when A is received, pass B and C to the server application immediately? Or will B and C be resent too? Or will no packets be sent at all until A is delivered?
TCP is a sophisticated protocol that changes many parameters depending on the current network state; there are whole books written about the subject. The clearest way to answer your question is to say that TCP generally maintains a given send 'window' size in bytes. This is the amount of data that can be sent before acknowledgments for previously sent data have to come back.
In older TCP specifications a dropped packet within that window would result in a complete resend of data from the dropped packet onwards. Because that is obviously a little wasteful, TCP now employs a selective acknowledgment (SACK) option (RFC 2018), which results in just the lost/corrupted packet being resent.
Back to your example: assuming the window size is large enough to encompass all three packets, and provided you are taking advantage of the latest TCP standard (I don't see why you wouldn't), if packet A were dropped, only packet A would be resent. If the packets are individually larger than the window, then they must be sent and acknowledged sequentially.
It depends on the latencies. In general, first A is resent. If the receiver gets it and already has B and C, it can acknowledge them as well.
If this happens fast enough, B and C won't be resent, or maybe only B.
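Either way, the application never sees reordered or duplicated data; it only sees a byte stream that may stall briefly while the retransmission happens. On Linux (an OS-specific aside, not something from the answers above) you can observe those retransmissions with the TCP_INFO socket option, for example:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_INFO, struct tcp_info (Linux/glibc) */
#include <stdio.h>
#include <sys/socket.h>

/* Print a few retransmission-related counters for a connected TCP socket.
 * Linux-specific: struct tcp_info is not portable. */
static void print_retrans_stats(int sockfd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
        printf("total retransmitted segments: %u\n", info.tcpi_total_retrans);
        printf("segments currently marked lost: %u\n", info.tcpi_lost);
        printf("current retransmission timeout (us): %u\n", info.tcpi_rto);
    } else {
        perror("getsockopt(TCP_INFO)");
    }
}
```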