Is the send() function in TCP guaranteed to arrive in order? - sockets

It is known that the return value of send() can be less than length, meaning that only part of the message, not the whole, has been sent. Suppose I send two packets whose contents are "ABC" and "DEF" respectively, each of length 3: I call send() for "DEF" after send() has been called for "ABC". However, there is a case where the return value of send() for "ABC" is less than its length, 3. I think there is then a chance that the messages are not delivered in order. For example, if the return value for "ABC" is 2, the received message would be "ABDEF".
Is the send() function in TCP guaranteed to arrive in order?

First of all, send() itself doesn't guarantee anything; send() only writes the data you want to transmit over the network into the socket's send buffer. There it's segmented (placed into TCP segments) by the operating system, which manages the reliability of the transmission. If the underlying buffer is full, then you'll get a return value lower than the number of bytes you wanted to write. This usually indicates that the buffer is not being emptied by the operating system fast enough, i.e. the rate at which you write data to the buffer is higher than the rate at which the data is being sent to the network (or read by the other party).
Second, TCP is a stream protocol: if you send() "ABC" and then "DEF", there's no guarantee about how that data will be segmented; it might end up in one packet, or in six packets, exactly like writing data to a file.
Third, the network stack (TCP/IP implementation in the OS) guarantees in-order delivery, as well as the other nice things promised by TCP - reliability, congestion control, flow control, etc.
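So the fix for the "ABC"/"DEF" worry is simply to resubmit the unsent tail. A minimal sketch of such a loop, assuming a blocking socket (the helper name send_all is mine):

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stddef.h>

/* Keep calling send() until all `len` bytes have been handed to the
 * kernel's send buffer. Returns 0 on success, -1 on error. */
static int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;      /* real error */
        }
        sent += (size_t)n;
    }
    return 0;
}

Calling send_all(fd, "ABC", 3) and then send_all(fd, "DEF", 3) guarantees the stream carries the bytes A, B, C, D, E, F in that order, although the receiver may still see them in arbitrary-sized chunks.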

Related

What happens when a process tries to read more bytes than were sent

Two processes communicate via sockets, and Process A sends Process B 100 bytes.
Process B tries to read 150 bytes. Later, Process A sends 50 bytes.
What is the result of Process B's read?
Will Process B's read wait until it receives 150 bytes?
That depends on many factors, especially the type of socket, but also the timing.
Generally, however, the receive buffer size is considered a maximum. So, if a process executes a recv with a buffer size of 150, but the operating system has only received 100 bytes so far from the peer socket, usually the available 100 are delivered to the receiving process (and the return value of the system call will reflect that). It is the responsibility of the receiving application to go back and execute recv again if it is expecting more data.
Another related factor (which will not generally be the case with a short transfer like 150 bytes but definitely will if you're sending a megabyte, say) is that the sender's apparently "atomic" send of 1000000 bytes will not all be delivered in one packet to the receiving peer, so if the receiver has a corresponding recv with a 1000000 byte buffer, it's very unlikely that all the data will be received in one call. Again, it's the receiver's responsibility to continue calling recv until it has received all the data sent.
And it's generally the responsibility of the sender and receiver to somehow coordinate what the expected size is. One common way to do so is by including a fixed-length header at the beginning of each logical transmission telling the receiver how many bytes are to be expected.
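As a sketch of that receive-until-complete discipline, assuming a blocking stream socket (the helper name recv_exact is mine):

#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

/* Keep calling recv() until exactly `len` bytes have arrived.
 * Returns 0 on success, -1 on error or if the peer closed early. */
static int recv_exact(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n <= 0)
            return -1;   /* error, or orderly shutdown before `len` bytes */
        got += (size_t)n;
    }
    return 0;
}

In the question's scenario, recv_exact(fd, buf, 150) would block across the two sends until all 150 bytes had arrived, whereas a single plain recv() with a 150-byte buffer would typically return the first 100 immediately.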
Depends on what kind of socket it is. For a STREAM socket, the read will return either the amount of data currently available or the amount requested (whichever is less) and will only ever block (wait) if there is no data available.
So in this example, assuming the 100 bytes have (all) been transmitted and received into the receive buffer when B reads from the socket and the additional 50 bytes have not yet been transmitted, the read will return those 100 bytes and will not wait.
Note also, the dependency of all the data being transmitted and received -- when process A writes data to a socket it will not necessarily be sent immediately or all at once. Depending on the underlying transport, there's an MTU size and any write larger than that will be broken up. Smaller writes may also be delayed and combined with later writes to make up the MTU. So in your case the send of 100 bytes might be too large (and broken up), or might be too small and not be transmitted immediately.
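If that delay-and-combine behavior is undesirable for a given application, the usual knob is the TCP_NODELAY socket option, which disables Nagle's algorithm. A minimal sketch:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_NODELAY */

/* Ask the stack to transmit small writes immediately instead of
 * holding them back to coalesce with later ones. Returns 0 on success. */
static int disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

Note this only affects the delaying of small writes; writes larger than the MTU are still broken up.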

TCP connection for real time

I want to use a TCP connection for real-time data. I have a stream of data from a server, and I receive it with a client, but the client is too slow to receive as fast as the sender sends, so the server buffers the data until it reaches the destination. For example, if I "produce" data at time t, and the client is 10 times slower, then the data produced at time t will arrive at time 10t.
I want to make the server "drop" the data that can't reach the client in time, and instead send the new data that is expected to arrive at that time. How can I do that?
N.B.: I know that UDP does this, but I want to do it with TCP.
I've done this sort of thing in the past, and got reasonably good results. Here's how I did it:
1) On the sending side, use setsockopt(SOL_SOCKET, SO_SNDBUF) to make the server's TCP socket's send buffer as small as you can get away with (since you can't drop data once it's already in the socket's send buffer, you want to keep as little data there as possible).
2) On the sending side, never proactively send() any outgoing data into the socket at all. Instead, write a function (we'll call it DumpCurrentStateToBuffer()) that writes the "current state" bytes (that you want to send to the client) into an in-memory buffer.
3) When the client's socket signals ready-for-write (via select(), poll(), or whatever mechanism you use), call DumpCurrentStateToBuffer() to create a memory buffer of bytes that are to be sent to the client. Now send that data to the client (if you're using blocking I/O you can do it synchronously, at the cost of potentially stalling your server until the data can be sent; OTOH if you're using non-blocking I/O, you may need to keep the memory buffer and your current sent-bytes index into the buffer around as state variables, so you can keep sending more sub-chunks of the memory buffer over time, whenever the socket indicates that it can receive more bytes).
4) Once the memory-buffer's contents have been fully sent, you can free the memory buffer, and then wait for the socket to select as ready-for-write again; when it does, goto (3).
This technique doesn't solve all of TCP's non-real-time issues; for example, a dropped TCP packet will still have to be resent to the client. What it does do is guarantee that the server-to-client data backlog will never be more than one or two "states" long, because you never generate any new data unless/until there is at least some room in the socket's output buffer.
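For concreteness, here is a rough sketch of steps 1 through 4, assuming a select()-driven loop; dump_current_state() is a hypothetical stand-in for DumpCurrentStateToBuffer(), and the buffer sizes are illustrative:

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <stddef.h>

/* Hypothetical: writes the latest state snapshot into `buf` and
 * returns its length; plays the role of DumpCurrentStateToBuffer(). */
extern size_t dump_current_state(char *buf, size_t max);

static void run_realtime_sender(int client_fd)
{
    /* Step 1: shrink the send buffer so stale data cannot pile up. */
    int sndbuf = 2048;   /* "as small as you can get away with" */
    setsockopt(client_fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof sndbuf);

    char state[65536];
    size_t len = 0, sent = 0;

    for (;;) {
        /* Step 3: wait until the socket reports ready-for-write. */
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(client_fd, &wfds);
        if (select(client_fd + 1, NULL, &wfds, NULL, NULL) <= 0)
            break;   /* select() error */

        /* Step 2: only now snapshot fresh data; never queue it earlier. */
        if (sent == len) {
            len = dump_current_state(state, sizeof state);
            sent = 0;
        }

        /* Step 4: push as much of the snapshot as the socket accepts;
         * leftovers carry over in `sent` to the next iteration. */
        ssize_t n = send(client_fd, state + sent, len - sent, 0);
        if (n < 0)
            break;
        sent += (size_t)n;
    }
}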

c++ posix sockets recv functionality

I have a perhaps noobish question to ask; I've looked around but haven't seen a direct answer addressing it, and thought I might get a quick answer here. In a simple TCP/IP client-server select loop using BSD sockets, if a client sends two messages that arrive simultaneously at a server, would one call to recv at the server return both messages bundled together in the buffer, or does recv force each distinct arriving message to be read separately?
I ask because I'm working in an environment where I can't tell how the client is building its messages to send. Normally recv reports that 12 bytes are read, then 915, then 12 bytes, then 915, and so on in an alternating 12/915 pattern... but then sometimes it reports 927 (which is 915 + 12). I was thinking that either the client is bundling some of its information together before it sends it out to the server, or that the messages arrive before recv is invoked and recv then pulls all the pending bytes at once. So I wanted to make sure I understood recv's behavior properly. I think perhaps I'm missing something here in my understanding, and I hope someone can point it out, thanks!
TCP/IP is a stream-based transport, not a datagram-based transport. In a stream, there is no 1-to-1 correlation between send() and recv(). That is only true for datagrams. So, you have to be prepared to handle multiple possibilities:
a single call to send() may fit in a single TCP packet and be read in full by a single call to recv().
a single call to send() may span multiple TCP packets and need multiple calls to recv() to read everything.
multiple calls to send() may fit in a single TCP packet and be read in full by a single call to recv().
multiple calls to send() may span multiple TCP packets and require multiple calls to recv() for each packet.
To illustrate this, consider two messages being sent: send("hello", 5) and send("world", 5). The following are a few possible combinations when calling recv():
"hello" "world"
"hel" "lo" "world"
"helloworld"
"hel" "lo" "worl" "d"
"he" "llow" "or" "ld"
Get the idea? This is simply how TCP/IP works. Every TCP/IP implementation has to account for this fragmentation.
In order to receive data properly, there has to be a clear separation between logical messages, not individual calls to send(), as it may take multiple calls to send() to send a single message, and multiple recv() calls to receive a single message in full. So, taking the earlier example into account, let's add a separator between the messages:
send("hello\n", 6);
send("world", 5);
send("\n", 1);
On the receiving end, you would call recv() as many times as it takes until a \n character is received, then you would process everything you had received leading up to that character. If there is any read data left over when finished, save it for later processing and start calling recv() again until the next \n character, and so on.
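A sketch of that loop for '\n'-separated messages (the buffer size is arbitrary, and a real version needs a policy for messages longer than the buffer):

#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <stdio.h>

/* Accumulate bytes until a '\n' arrives, then hand each complete
 * message to the application; bytes after the last '\n' are kept
 * for the next pass. */
static void read_messages(int fd)
{
    char acc[4096];
    size_t used = 0;
    for (;;) {
        ssize_t n = recv(fd, acc + used, sizeof acc - used, 0);
        if (n <= 0)
            break;   /* error or connection closed */
        used += (size_t)n;
        char *nl;
        while ((nl = memchr(acc, '\n', used)) != NULL) {
            size_t msglen = (size_t)(nl - acc);
            printf("message: %.*s\n", (int)msglen, acc);   /* process it */
            memmove(acc, nl + 1, used - msglen - 1);       /* keep leftovers */
            used -= msglen + 1;
        }
    }
}

With the three send() calls above, this recovers "hello" and then "world", no matter how recv() happens to chunk the bytes.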
Sometimes, it is not possible to place a unique character between messages (maybe the message body allows all characters to be used, so there is no distinct character available to use as a separator). In that case, you need to prefix the message with the message's length instead, either as a preceding integer, a structured header, etc. Then you simply call recv() as many times as needed until you have received the full integer/header, then you call recv() as many times as needed to read just as many bytes as the length/header specifies. When finished, save any remaining data if needed, and start calling recv() all over again to read the next message length/header, and so on.
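A sketch of that length-prefixed variant, assuming a 4-byte big-endian prefix (the helper names are mine, and read_full is the same keep-calling-recv() loop as before):

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <stddef.h>

/* Loop until all `len` bytes have been received. */
static int read_full(int fd, void *p, size_t len)
{
    char *c = p;
    while (len > 0) {
        ssize_t n = recv(fd, c, len, 0);
        if (n <= 0)
            return -1;
        c += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive one message framed as: 4-byte big-endian length, then
 * that many payload bytes. Returns the payload length, or -1. */
static int recv_message(int fd, char *payload, uint32_t max)
{
    uint32_t netlen;
    if (read_full(fd, &netlen, sizeof netlen) < 0)
        return -1;
    uint32_t len = ntohl(netlen);
    if (len > max)
        return -1;   /* peer announced more than we can hold */
    if (read_full(fd, payload, len) < 0)
        return -1;
    return (int)len;
}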
It is definitely valid for both messages to be returned in a single recv call (see Nagle's Algorithm). TCP/IP guarantees order (the bytes from the messages won't be mixed). In addition to them being returned together in a single call, it is also possible for a single message to require multiple calls to recv (although it would be unlikely with packets as small as described).
The only thing you can count on is the order of the bytes. You cannot count on how they are partitioned into recv calls. Sometimes things get merged either at the endpoint or along the way. Things can also get broken up along the way and so arrive independently. It does sound like your sender is sending alternating 12 and 915 but you can't count on it.

How to split united datagrams received with boost::asio UDP sockets

I've made my UDP server and client with boost::asio UDP sockets. Everything looked good before I started sending more datagrams. They arrive correctly from client to server, but they are united in my buffer into one message.
I use
udp::socket::async_receive with a std::array<char, 1 << 18> buffer
to make the async request, and I receive the data through the callback
void on_receive(const error_code& code, size_t bytes_transferred)
If I send data too often (every 10 milliseconds), I receive several datagrams at once in my buffer via the callback above. The question is: how do I separate them? Note: my UDP datagrams have variable length. I don't want to use an additional header with the size, because that would make my code useless for third-party datagrams.
I believe this is a limitation in the way boost::asio handles stateless data streams. I noticed exactly the same behavior when using boost::asio for a serial interface. When I was sending packets with relatively large gaps between them, I received each one in a separate callback. As the packet size grew and the gap between packets therefore decreased, it reached a stage where it would execute the callback only when the buffer was full, not after receipt of a single packet.
If you know exactly the size of the expected datagrams, then your solution of limiting the input buffer size is a perfectly sensible one, as you know a priori exactly how large the buffer needs to be.
If your congestion is coming from having multiple different packet types being transmitted, so you can't pre-allocate a buffer of the correct size, then you could potentially create different sockets on different ports for each type of transaction. It's a little more "hacky", but given the virtually unlimited nature of ephemeral port availability, as long as you're not using 20,000 different packet types, that would probably help you out as well.

Is a successful send() "atomic"?

Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?
Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?
I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again.
Update:
Based on the answers so far, my question could be rephrased as follows:
Is there any way for packets/data to be sent over the wire before the call to send() returns?
Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?
No, it's possible that part of your data gets passed over the wire while another part only goes as far as being copied into the internal buffers of the local TCP stack. send() returns the number of bytes passed to the local TCP stack, not the number of bytes that get passed onto the wire (and even if the data reaches the wire, it might not reach the peer).
Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?
As send() only returns the number of bytes passed into the local TCP stack, not whether send() actually transmits anything, you can't really distinguish these two cases anyway. But yes, it's possible that only some of the data makes it over the wire. Even if there's enough space in the local buffer, the peer might not have enough space. If you send 2 bytes but the peer only has room for 1 more byte, 1 byte might be sent, and the other will reside in the local TCP stack until the peer has room again.
(That's an extreme example; most TCP stacks protect against sending such small segments of data at a time, but the same applies if you try to send 4k of data and the peer only has room for 3k.)
I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again.
That will only happen if your socket is non-blocking. If it's blocking and the local buffers are full, send() will wait until there's room in the local buffers again (or it might return a short count if part of the data was delivered but an error occurred in the meantime).
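With a non-blocking socket, a common pattern is to treat the short count or EWOULDBLOCK as "wait until writable, then continue". A sketch (the helper name is mine):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <errno.h>
#include <stddef.h>

/* Send loop for a non-blocking socket: when the kernel buffer is
 * full (EWOULDBLOCK/EAGAIN), block in select() until the socket is
 * writable again, then continue where we left off. */
static int send_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n >= 0) {
            sent += (size_t)n;
        } else if (errno == EWOULDBLOCK || errno == EAGAIN) {
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0)
                return -1;   /* wait for buffer space to open up */
        } else if (errno != EINTR) {
            return -1;       /* real error */
        }
    }
    return 0;
}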
Edit to answer:
Is there any way for packets/data to be sent over the wire before the call to send() returns?
Yes. That might happen for many reasons.
e.g.
The local buffers get filled up by that recent send() call, and you use blocking I/O.
The TCP stack sends your data over the wire but decides to schedule other processes to run before your sending process returns from send().
Though this depends on the protocol you are using, the general answer is no.
For TCP, the data gets buffered inside the kernel and then sent out at the discretion of the TCP packetization algorithm, which is pretty hairy: it keeps multiple timers and minds the path MTU, trying to avoid IP fragmentation.
For UDP, you can only assume this kind of "atomicity" if your datagram does not exceed the link frame size (the usual value is 1472 = 1500 bytes of Ethernet frame - 20 bytes of IP header - 8 bytes of UDP header). Otherwise your sending host will have to IP-fragment the datagram.
Then intermediate routers can still IP-fragment the passing packet if their outgoing link MTU is less than the packet size.
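To stay within that limit, a sender can cap each datagram at 1472 bytes and let the application reassemble. A sketch (the constant assumes an Ethernet-sized path MTU, which as noted is not guaranteed end to end):

#include <sys/types.h>
#include <sys/socket.h>
#include <stddef.h>

/* 1500-byte Ethernet MTU - 20-byte IP header - 8-byte UDP header. */
#define MAX_UDP_PAYLOAD 1472

/* Split `data` into datagrams that fit in a single link frame, so
 * the sending host never has to IP-fragment them (intermediate
 * links with a smaller MTU can still fragment, as noted above). */
static int send_chunked(int fd, const struct sockaddr *to, socklen_t tolen,
                        const char *data, size_t len)
{
    while (len > 0) {
        size_t chunk = len < MAX_UDP_PAYLOAD ? len : MAX_UDP_PAYLOAD;
        if (sendto(fd, data, chunk, 0, to, tolen) < 0)
            return -1;
        data += chunk;
        len -= chunk;
    }
    return 0;
}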