What is the difference between Nagle algorithm and 'stop and wait'? - sockets

I saw the socket option TCP_NODELAY, which is used to turn the Nagle algorithm on or off.
I checked what the Nagle algorithm is, and it seems similar to 'stop and wait'.
Can someone give me a clear difference between these two concepts?

In a stop and wait protocol, one
1. sends a message to the peer,
2. waits for an ack for that message, then
3. sends the next message.
(i.e. one cannot send a new message until the previous one has been acknowledged)
Nagle's algorithm as used in TCP is orthogonal to this concept. When the TCP application sends some data, the protocol buffers the data and waits a little while to see if there's more data to be sent, instead of sending it to the peer immediately.
If the application has more data to send in this small timeframe, the protocol stack merges that data into the current buffer and can send it as one large message.
This concept could very well be applied to a stop and wait protocol as well. (Note that TCP is not a stop and wait protocol.)

The Nagle algorithm controls whether the socket provider sends outgoing data immediately as-is, at the cost of less efficient network transmissions (off), or buffers outgoing data so that it can make more efficient network transmissions, at the cost of speed (on).
Stop and Wait is a mechanism used to ensure the integrity of transmitted data, by making the sender send a frame of data and then wait for an acknowledgement from the receiver before sending another frame, thus ensuring frames are received in the same order in which they are sent.
These two features operate independently of each other.
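For concreteness, here is a minimal sketch of toggling Nagle's algorithm from an application via the TCP_NODELAY option mentioned in the question; the host and port are placeholders:

```python
import socket

# Connect somewhere (host/port are placeholders for this sketch).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("example.com", 80))

# Setting TCP_NODELAY to 1 turns Nagle's algorithm OFF: small writes are
# sent immediately instead of being coalesced into larger segments.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Setting it back to 0 turns Nagle's algorithm ON again (the usual default).
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)

sock.close()
```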

Related

Are TCP/IP Sockets Atomic?

It is my understanding that a write to a TCP/IP socket will be atomic if the amount of data written is small. By atomic, I mean that the receiver will receive all of the data or none of the data. However, it is not atomic if the amount of data written is large. Am I correct? And if so, what counts as large?
Thanks,
Bob
No. TCP is a byte-stream protocol. No messages, no datagram-like behaviour.
For UDP, that is true, because all data written by the app is sent out in one UDP datagram.
For TCP, that is not true, unless the application sends only 1 byte of data at a time. A write to a TCP socket will write all of the data to a buffer that is associated with that socket. TCP will then read data from that buffer in the background and send it to the receiver. How much data TCP actually sends in one TCP segment depends on its flow control mechanisms and other factors, including:
Receive Window published by the other node (receiver)
Amount of data sent in previous segments in flight that are not acknowledged yet
Slow start and congestion avoidance algorithm state
Negotiated maximum segment size (MSS)
In TCP, you can never assume that what the application writes to a socket is actually received in one read by the receiver. Data in the socket's buffer can be sent to the receiver in one or many TCP segments. At any moment when data is made available, the receiver can perform a socket read and return with whatever data is actually available at that moment.
Of course, all sent data will eventually reach the receiver, if there is no failure in the middle preventing that, and if the receiver does not close the connection or stop reading before the data arrives.
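To illustrate the consequence of this, a receiver that needs a fixed amount of data typically loops over recv until it has everything. A minimal sketch (the helper name is my own):

```python
def recv_exact(sock, nbytes):
    """Read exactly nbytes from a connected TCP socket, looping over partial reads."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)   # may return fewer bytes than requested
        if not chunk:                  # b'' means the peer closed the connection
            raise ConnectionError("connection closed with %d bytes still missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```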

Message delimitation in TCP communication

I am a newbie to networks and in particular TCP (I have been fooling a bit with UDP, but that's it).
I am developing a simple protocol based on exchanging messages between two endpoints. Those messages need to be certified, so I implemented a cryptographic layer that takes care of that. However, while UDP has a sound definition of a packet that constitutes the minimum unit that can get transferred at a time, the TCP protocol (as far as my understanding goes) is completely stream oriented.
Now, this puzzles me a bit. When exchanging messages, how can I tell where one starts and the other one ends? In principle, I can obviously communicate fixed length messages or first communicate the size of each message in some header. However, this can be subject to attacks: while of course it is going to be impossible to distort or determine the content of the communication, the above technique would make it easy to completely disrupt my communication just by adding a single byte in the middle.
Say that I need to transfer a message 1234567 bytes long. First of all, I communicate 4 bytes with an integer representing the size of the message. Okay. Then I start sending out the actual message. That message gets split in several packets, which get separately received. Now, an attacker just sends in an additional packet, faking it as if it was part of the conversation. It can just be one byte long: this completely destroys any synchronization mechanism I have implemented! The message has a spurious byte in the middle, and it doesn't successfully get decoded. Not only that, the last byte of the first message disrupts the alignment of the second message and so on: the connection is destroyed, and with a simple, simple attack! How likely and feasible is this attack anyway?
So I am wondering: what is the maximum data unit that can be transferred at once? I understand that a call to send does not correspond to a single call to receive: the message can be split into different chunks. How can I group the packets together in some way so that I know that they get packed together? Is there a way to define a higher-level message that gets reconstructed and aligned all together and triggers a single call to a receive-like function? If not, what other solutions can I find to keep my communication re-alignable even in the presence of an attacker?
Basically, it is difficult to control the way the OS divides the stream into TCP packets. (The RFC defining the TCP protocol states that a TCP stack should allow clients to force it to send buffered data by using the push function, but it does not define how many packets this should generate. After all, the attacker can modify any of them.)
And these TCP packets can be divided even further into IP fragments on their way through the network (which can be avoided with the 'Don't Fragment' IP flag -- but that flag can cause your packets not to be delivered at all).
I think that your problem is not about introducing packets into a stream protocol, but about securing it.
IPSec could be very beneficial in your scenario, as it operates on the network layer.
It provides integrity for every packet sent, so any modification on-the-wire gets detected and the invalid packets are dropped. In the case of TCP, the dropped packets get retransmitted automatically.
(Almost) everything is done automatically by the OS -- so you do not need to worry about it (and make mistakes doing so).
The confidentiality can be assured as well (with the same advantage of not re-inventing the wheel).
IPSec should provide you with a reliable transport protocol on top of which you can use whatever framing format you like.
Another alternative is using SSL/TLS on top of the TCP session, which is less robust (as it closes the whole connection on an integrity error).
Now, an attacker just sends in an additional packet, faking it as if it was part of the conversation. It can just be one byte long: this completely destroys any synchronization mechanism I have implemented!
Thwarting such an injection problem is dealt with by securing the stream. Create an encrypted stream and send your packets through that.
Of course the encrypted stream itself then has this problem; its messages can be corrupted. But those messages have secure integrity checks. The problem is detected, and the connection can be torn down and re-established to resynchronize it.
Also, some fixed-length synchronizing/framing bit sequence can be used between messages: some specific bit pattern. It doesn't matter if that pattern occurs inside messages by accident, because we only ever specifically look for that pattern when things go wrong (a corrupt message is received), otherwise we skip that sequence. If a corrupt message is received, we then receive bytes until we see the synchronizing pattern, and assume that whatever follows it is the start of a message (length followed by payload). If that fails, we repeat the process. When we receive a correct message, we reply to the peer, which will re-transmit anything we didn't get.
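A rough sketch of such length-plus-sync-pattern framing, under the assumption that the integrity check lives in the (encrypted) message payload itself; the sync value and header layout are arbitrary choices for illustration:

```python
import struct

SYNC = b"\xde\xad\xbe\xef"   # arbitrary resynchronisation pattern for this sketch

def _read_exact(sock, n):
    # TCP is a byte stream, so loop until exactly n bytes have arrived.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed")
        buf += chunk
    return buf

def send_message(sock, payload):
    # Frame layout: sync pattern | 4-byte big-endian length | payload.
    sock.sendall(SYNC + struct.pack("!I", len(payload)) + payload)

def recv_message(sock):
    # Scan forward until the sync pattern is seen, then read length + payload.
    window = b""
    while not window.endswith(SYNC):
        window = window[-3:] + _read_exact(sock, 1)
    (length,) = struct.unpack("!I", _read_exact(sock, 4))
    return _read_exact(sock, length)
```

In the normal case the sync pattern immediately precedes each frame, so the scan consumes just those four bytes; after a corrupted message the same loop skips garbage until the next frame boundary.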
How likely and feasible is this attack anyway?
TCP connections are identified by four items: the source and destination IP, and source and destination port number. The attacker has to fake a packet which matches your stream in these four identifiers, and sneak that packet past all the routers and firewalls between that attacker and the receiving machine. The attacker also has to be in the right ballpark with regard to the TCP sequence number.
Basically, this is next to impossible for an attacker C to perpetrate against endpoints A and B which are both distant from C on the network. The fake source IP will be rejected long before C is able to reach its destination. It's more plausible as an inside job (which includes malware): C is close to A and B.

TCP connection for real time

I want to use a TCP connection for real-time data. I have a stream of data from a server, and I receive it with a client, but the client is too slow to receive as fast as the sender sends, so the server buffers the data until it reaches the destination. For example, if I "produce" data at time t, and suppose that the client is 10 times slower, then the data produced at time t will arrive at time 10t.
How can I make the server "drop" the data that can't reach the client in time, and send the new data that is expected to arrive on time instead?
N.B.: I know that the UDP protocol does this, but I want to do it over TCP.
I've done this sort of thing in the past, and got reasonably good results. Here's how I did it:
1) On the sending side, use setsockopt(SOL_SOCKET, SO_SNDBUF) to make the server's TCP socket's send buffer as small as you can get away with (since you can't drop data once it's already in the socket's send buffer, you want to keep as little data there as possible)
2) On the sending side, never proactively send() any outgoing data into the socket at all. Instead, write a function (we'll call it DumpCurrentStateToBuffer()) that writes the "current state" bytes (that you want to send to the client) into an in-memory buffer.
3) When the client's socket select()'s (or poll()'s, or whatever mechanism you use) as ready-for-write, call DumpCurrentStateToBuffer() to create a memory-buffer of bytes that are to be sent to the client. Now send that data to the client (if you're using blocking I/O you can do it synchronously, at the cost of potentially stalling your server until the data can be sent; OTOH if you're using non-blocking I/O, you may need to keep the memory-buffer and your current sent-bytes index into the buffer around as state variables, so you can keep sending more sub-chunks of the memory buffer over time, whenever the socket indicates that it can receive more bytes)
4) Once the memory-buffer's contents have been fully sent, you can free the memory buffer, and then wait for the socket to select as ready-for-write again; when it does, goto (3).
This technique doesn't solve all of TCP's non-real-time issues; for example, a dropped TCP packet will still have to be resent to the client. What it does do is guarantee that the server-to-client data backlog will never be more than one or two "states" long, because you never generate any new data unless/until there is at least some room in the socket's output buffer.
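A rough sketch of steps 1-4 using select() and a non-blocking socket; dump_current_state_to_buffer() and the buffer size are stand-ins of my own, not part of the original answer:

```python
import select
import socket

def dump_current_state_to_buffer():
    # Stand-in: serialise whatever the application's "current state" is right now.
    return b"latest state snapshot\n"

def stream_state_to(conn):
    conn.setblocking(False)
    # Step 1: keep the kernel send buffer small so stale data cannot pile up there.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)

    pending = b""   # bytes of the current snapshot not yet handed to the kernel
    while True:
        # Step 3: wake up only when the socket can accept more outgoing bytes.
        select.select([], [conn], [])
        if not pending:
            # Steps 2 and 4: generate a fresh snapshot only when there is room to send it.
            pending = dump_current_state_to_buffer()
        sent = conn.send(pending)   # may accept only part of the buffer
        pending = pending[sent:]
```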

Sending And Receiving Sockets (TCP/IP)

I know that it is possible for multiple packets to be stacked in the buffer waiting to be read, and that a long packet might require a loop of multiple send attempts to be fully sent. But I have a question about packaging in these cases:
If I call recv (or any alternative (low-level) function) when there are multiple packets awaiting to be read, would it return them all stacked into my buffer or only one of them (or part of the first one if my buffer is insufficient)?
If I send a long packet which requires multiple iterations to be sent fully, does it count as a single packet or as multiple packets? Basically, the question is whether anything marks that the packet sent so far is not yet complete.
These questions came to my mind when I thought about WebSocket packaging. Special characters are used to mark the beginning and end of a packet, which sort of leads to the conclusion that it's not otherwise possible to separate multiple packets.
P.S. All the questions are about TCP/IP, but you are welcome to share information (answers) about UDP as well.
TCP sockets are stream based. The order is guaranteed but the number of bytes you receive with each recv/read could be any chunk of the pending bytes from the sender. You can layer a message based transport on top of TCP by adding framing information to indicate the way that the payload should be chunked into messages. This is what WebSockets does. Each WebSocket message/frame starts with at least 2 bytes of header information which contains the length of the payload to follow. This allows the receiver to wait for and re-assemble complete messages.
For example, in libraries/interfaces that implement the standard WebSocket API or a similar API (such as a browser), the onmessage event will fire once for each message received, and the data attribute of the event will contain the entire message.
Note that in the older Hixie version of WebSockets, each frame was started with '\x00' and terminated with '\xff'. The current standardized IETF 6455 (HyBi) version of the protocol uses the header information that contains the length which allows much easier processing of the frames (but note that both the old and new are still message based and have basically the same API).
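As a rough illustration of that header (based on my reading of RFC 6455, ignoring fragmentation, control frames, and extensions), a frame can be decoded along these lines:

```python
import struct

def _read_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed")
        buf += chunk
    return buf

def read_ws_frame(sock):
    b0, b1 = _read_exact(sock, 2)            # 2-byte base header
    fin, opcode = bool(b0 & 0x80), b0 & 0x0F
    masked, length = bool(b1 & 0x80), b1 & 0x7F
    if length == 126:                         # 16-bit extended payload length follows
        (length,) = struct.unpack("!H", _read_exact(sock, 2))
    elif length == 127:                       # 64-bit extended payload length follows
        (length,) = struct.unpack("!Q", _read_exact(sock, 8))
    mask = _read_exact(sock, 4) if masked else b""
    payload = _read_exact(sock, length)
    if masked:                                # client-to-server frames are masked
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload
```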
TCP connection provides for stream of bytes, so treat it as such. No application message boundaries are preserved - one send can correspond to multiple receives and the other way around. You need loops on both sides.
UDP, on the other hand, is datagram (i.e. message) based. Here one read will always dequeue a single datagram (unless you mess with low-level flags on the socket). If your application buffer is smaller than the pending datagram, you read only a part of it and the rest is lost. The way around that is to limit the size of the datagrams you send to something below the normal MTU of 1500 (less the IP and UDP headers, so actually 1472).
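To contrast the two behaviours, a small sketch of the UDP case; the port number is arbitrary, and the truncation shown is the Linux/macOS behaviour (Windows raises an error for an oversized datagram instead):

```python
import socket

# Receiver bound to an arbitrary local port (assumed free for this sketch).
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 50007))

# One sendto() call produces exactly one datagram on the wire.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"A" * 1000, ("127.0.0.1", 50007))

# A single recvfrom() dequeues the whole datagram even though the buffer
# is only 100 bytes: we get the first 100 bytes, the other 900 are discarded.
data, addr = rx.recvfrom(100)
print(len(data))   # 100 on Linux/macOS; the rest of the datagram is lost

tx.close()
rx.close()
```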

TCP Socket Transfer

A while back I had a question about why my socket sometimes received only 653 octets (for example) when I sent 1024 octets, and thanks to Rakis I understood: the OS allows reception to occur in arbitrarily sized chunks.
This time I need a confirmation :)
On any OS (well, GNU/Linux and Windows at least), in any language (I'm using Python here), if I send a packet of an arbitrary number of bytes, could be 2 bytes, could be 12000 bytes, let's say X, when I write socket.send(X), am I absolutely guaranteed that X will be FULLY received (regardless of any chunks the receiving OS divides it into) on the other end of the socket BEFORE I do another socket.send(any string)?
Or in other words if i have the code :
socket.send(X)
socket.send(Y)
Even if X > MTU, so that it has to be sent as multiple packets, does it wait until every packet is sent and acknowledged by the endpoint of the socket before sending Y? Writing that out makes me believe that the answer is yes, it is guaranteed, and that this is exactly the purpose of setting a socket to blocking mode, but I want to be sure :D
Thanks in advance,
Nolhian
You are guaranteed that X will be received (at the application level) before Y, if it's a stream socket. If it's a datagram socket, no guarantees.
Depending on the networking implementation, it's possible that at a lower level, X will be sent, lost in transmission, then Y will be sent, then X will be re-sent because no acknowledgement was received.
Even in blocking mode, the socket.send(Y) can execute before X even makes it "onto the wire", because the OS will buffer network traffic.
No, you can't.
All you know is that the client will receive the data in order, assuming it does receive it all. There's no way of knowing (at the application level) whether the client has received all the data without having some sort of "ACK" at the application level protocol.
am i absolutely guaranteed that X will be FULLY received ( regardless of any chunks the receiving OS divides it into ) on the other end of the socket BEFORE i do another socket.send(any string) ?
No. In general, more data may be sent without waiting for the receiving side, within certain limits:
on the sending side, you will have a maximum amount of data you can enqueue for transmission until the client has acknowledged some receipt (but typically the client's OS will acknowledge and buffer quite a lot before it refuses further data until the application has processed some), after which the sending socket may start blocking. This limit:
forces the application design to consider how to enqueue and buffer excessive amounts of data, rather than having naively written applications utilise excessive amounts of Operating System-provided buffer memory
reduces retransmission rates when the receiving side is flooded with data too fast to process it
avoids sending huge amounts of data despite the network connection having been lost
So, strictly speaking and for large transmissions, the sender should be designed to handle sockets blocked from further sends (either knowing it is ok to block in the attempt (perhaps due to a dedicated sending thread) or waiting until it is possible to send more via non-blocking sockets or select/poll).
Whatever retransmission and buffering may be required, what you CAN be sure of is that the receiving side will have to read all of "X" before it starts being given the subsequently sent data "Y" (unless it specifically asks to have it otherwise, e.g. Out Of Band data).
Depending on the type of socket you use, you can, in some cases, have a guarantee that the data will be received, but no feedback or confirmation of when it actually was.
Back to your question:
does it wait until every packet is sent and acknowledged by the endpoint of the socket before sending Y
So, you could say:
YES it does wait until the data has been handed to the OS for sending (i.e. copied into the socket's send buffer), and
NO it does not wait for acknowledgment
A suggestion:
Since there are no auto-magic/built-in confirmations that your data was received, you could fairly easily implement your own logic for ACKnowledging that the package was received, which basically comes down to a custom communication protocol of your own.
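A minimal sketch of what such an application-level acknowledgement could look like; the length-prefixed frame layout and the ACK byte are made up for illustration:

```python
import struct

ACK = b"\x06"   # arbitrary acknowledgement byte chosen for this sketch

def _read_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed")
        buf += chunk
    return buf

def send_with_ack(sock, payload, timeout=5.0):
    # Send a length-prefixed message, then block until the peer confirms receipt.
    sock.sendall(struct.pack("!I", len(payload)) + payload)
    sock.settimeout(timeout)
    if _read_exact(sock, 1) != ACK:
        raise RuntimeError("peer did not acknowledge the message")

def recv_and_ack(sock):
    # Read one full message, then tell the sender the whole thing arrived.
    (length,) = struct.unpack("!I", _read_exact(sock, 4))
    payload = _read_exact(sock, length)
    sock.sendall(ACK)
    return payload
```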