What happens when a process tries to read more bytes than were sent - sockets

Two processes communicate via sockets. Process A sends Process B 100 bytes, and Process B tries to read 150 bytes. Later, Process A sends 50 bytes.
What is the result of Process B's read?
Will Process B's read wait until it receives 150 bytes?

That depends on many factors, especially the type of socket, but also the timing.
Generally, however, the receive buffer size is considered a maximum. So, if a process executes a recv with a buffer size of 150, but the operating system has only received 100 bytes so far from the peer socket, usually the available 100 are delivered to the receiving process (and the return value of the system call will reflect that). It is the responsibility of the receiving application to go back and execute recv again if it is expecting more data.
Another related factor (which will not generally be the case with a short transfer like 150 bytes but definitely will if you're sending a megabyte, say) is that the sender's apparently "atomic" send of 1000000 bytes will not all be delivered in one packet to the receiving peer, so if the receiver has a corresponding recv with a 1000000 byte buffer, it's very unlikely that all the data will be received in one call. Again, it's the receiver's responsibility to continue calling recv until it has received all the data sent.
And it's generally the responsibility of the sender and receiver to somehow coordinate what the expected size is. One common way to do so is by including a fixed-length header at the beginning of each logical transmission telling the receiver how many bytes are to be expected.
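To make the "keep calling recv until you have everything" and length-prefix ideas concrete, here is a minimal sketch in C; the 4-byte big-endian header and the helper names recv_all and recv_message are illustrative assumptions, not a fixed protocol.

    #include <arpa/inet.h>    /* ntohl */
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Keep calling recv() until `len` bytes have arrived, or an error/EOF occurs. */
    static int recv_all(int fd, void *buf, size_t len)
    {
        char *p = buf;
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0)              /* 0 = peer closed the connection, -1 = error */
                return -1;
            p += n;                  /* advance past the bytes already received */
            len -= (size_t)n;
        }
        return 0;
    }

    /* Read one logical message: a 4-byte length header, then that many bytes. */
    static char *recv_message(int fd, uint32_t *out_len)
    {
        uint32_t netlen;
        if (recv_all(fd, &netlen, sizeof netlen) < 0)
            return NULL;
        *out_len = ntohl(netlen);    /* the sender wrote the length in network byte order */

        char *body = malloc(*out_len);
        if (body == NULL)
            return NULL;
        if (recv_all(fd, body, *out_len) < 0) {
            free(body);
            return NULL;
        }
        return body;
    }

Note that a single call to recv_message may internally perform many recv() calls, which is exactly the "go back and execute recv again" behaviour described above.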

Depends on what kind of socket it is. For a STREAM socket, the read will return either the amount of data currently available or the amount requested (whichever is less) and will only ever block (wait) if there is no data available.
So in this example, assuming the 100 bytes have (all) been transmitted and received into the receive buffer when B reads from the socket and the additional 50 bytes have not yet been transmitted, the read will return those 100 bytes and will not wait.
Note also the dependency on all the data having been transmitted and received -- when process A writes data to a socket, it will not necessarily be sent immediately or all at once. Depending on the underlying transport, there's an MTU size, and any write larger than that will be broken up. Smaller writes may also be delayed and combined with later writes to make up the MTU. So in your case, the send of 100 bytes might be too large (and broken up), or might be too small and not be transmitted immediately.

Related

Understanding Indy Socket timeouts

I want to understand how Indy socket timeouts work, because I want to use them in the following way.
I have an application (TCP server/client) that transfers a file over the Internet. When I start the transfer, I want to be able to stop it fast enough (let's say, within 1500 ms) if I decide to. If some socket is reading data and something happens on the wire that makes it late, I won't be able to stop the transfer, because the socket is hung reading data. So I need to set some short timeouts that will not be triggered in normal operation, but if something happens and data is running late, control will be passed back to the main proc and I'll be able to check for the abort request.
Now, I don't know what to do next... If a socket read times out, what does that mean? That the socket did not receive any data for that period of time, or that it received some data into the buffer but didn't have time to finish? I have a feeling that these timeouts are the waiting periods for something to happen (the start of a read or a write operation). But once a read has started, what happens if the socket receives half of the data it was asked to read and then nothing more arrives? Will that call block the program execution forever? Because if that happens, then again I will not be able to check for the abort request.
Anyway... when the timeout occurs, will it raise an exception? Can I catch it and try again, on the same connection, as if nothing happened? Will the in/out buffer be modified after a timeout?
I am using this to set the Read and Write timeouts:
Socket.ReadTimeout:= WorkingRTimeOut;
Socket.Binding.SetSockOpt(SOL_SOCKET, SO_SNDTIMEO, WorkingWTimeOut);
Socket timeouts are applied on a per-byte basis.
If you ask a socket to read N number of bytes, it will return as many bytes as it can, up to N bytes max, from the socket's receive buffer. It can (and frequently does) return fewer bytes, requiring another read to receive the remaining bytes. If a timeout error occurs, it means no bytes at all arrived in time for the current read. There is no way to know why, or whether they ever will arrive.
If you ask a socket to send N number of bytes, it will accept as many bytes as it can, up to N bytes max, into the socket's write buffer. It can (and sometimes does) buffer fewer bytes, requiring another send to buffer the remaining bytes. If a timeout occurs, it means the socket's write buffer has filled up, the receiver is not reading fast enough (or at all) to clear space in the sender's write buffer in time.
If you ask Indy to read/send N number of bytes, it may perform multiple socket reads/sends internally, waiting for all of the expected bytes to be received/sent. So it may have read/sent X number of bytes, where X < N, before the timeout occurred. Sure, you could try another read/send again, asking for only the remaining bytes you haven't received/sent yet (N - X), but don't ask for the bytes you already received/sent (X). You might receive/send more bytes, or you might get another timeout, there is no way to know until you try. However, depending on context, it may not be easy/possible to know how many bytes were received/sent before the timeout, so you might not know how many remaining bytes to ask for again. In which case, about all you can sensibly do is just close the TCP connection, reconnect, and resume/start over.
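To make the per-call behaviour concrete, here is a minimal sketch in plain C (not Indy/Delphi) of what a read timeout looks like at the BSD-socket level that Indy wraps; the 1500 ms value and the -2 return convention are illustrative assumptions.

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <sys/types.h>

    /* One receive attempt with a 1500 ms timeout. A timeout does not tear the
     * connection down; it only means this particular call got nothing in time. */
    static ssize_t recv_with_timeout(int fd, char *buf, size_t len)
    {
        struct timeval tv = { .tv_sec = 1, .tv_usec = 500000 };   /* 1500 ms */
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        ssize_t n = recv(fd, buf, len, 0);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return -2;    /* timed out: no bytes arrived during this call, so the
                             caller can check its abort flag and simply try again */
        return n;         /* > 0 bytes received, 0 = peer closed, -1 = other error */
    }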
As for your ability to abort a connection quickly, you could move your read/send code to a worker thread, and then Disconnect() the socket from your main proc when needed. That will generally abort any blocking read/send in progress.

Ignore data coming in to TCP socket

Some protocols like HTTP can specify a message length, then send a (possibly very long) message. No other messages can be received while this message is being sent (something HTTP/2.0 tried to solve) so if you decide to ignore the message, you can't just continue waiting for messages and not pull its data.
Normally I read() up to the length of the message repeatedly into a junk buffer and just ignore the bytes. But that involves copying possibly millions of bytes from kernelspace into userspace (at 1 copy per memory page, so not millions of copies). Isn't there some way to just tell the kernel to discard the bytes instead of providing them?
It seemed like an obvious question, but the only answer I've been able to come up with is oddly resource heavy, either using splice() to dump the bytes into a pipe and seeking the pipe back to 0, or opening "/dev/null" and using sendfile() to send the bytes there. I could do that, and reserve a (single) file descriptor for flushing data out of clogged connections, without reading, but isn't there just a... ignore(descriptor, length) function?
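For reference, the junk-buffer approach described above looks roughly like this in C (a sketch; the scratch-buffer size and the helper name discard_bytes are illustrative). It works, but every byte is still copied into user space, which is exactly the cost in question.

    #include <sys/socket.h>
    #include <sys/types.h>

    /* Discard exactly `len` bytes from a connected TCP socket `fd` by reading
     * them into a scratch buffer and dropping them. Returns 0 on success,
     * -1 on error or premature EOF. */
    static int discard_bytes(int fd, size_t len)
    {
        char junk[64 * 1024];                    /* reused scratch buffer */
        while (len > 0) {
            size_t chunk = len < sizeof junk ? len : sizeof junk;
            ssize_t n = recv(fd, junk, chunk, 0);
            if (n <= 0)
                return -1;                       /* error or connection closed */
            len -= (size_t)n;                    /* bytes copied only to be thrown away */
        }
        return 0;
    }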

How to split united datagrams received with boost::asio UDP sockets

I've made my UDP server and client with boost::asio UDP sockets. Everything looked good before I started sending more datagrams. They arrive correctly from client to server, but they are united in my buffer into one message.
I use
udp::socket::async_receive with a std::array<char, 1 << 18> buffer
to make the async request, and I receive data through the callback
void on_receive(const error_code& code, size_t bytes_transferred)
If I send data too often (every 10 milliseconds), I receive several datagrams simultaneously into my buffer via the callback above. The question is: how do I separate them? Note: my UDP datagrams have variable length. I don't want to use an additional header with the size, because that would make my code useless for third-party datagrams.
I believe this is a limitation in the way boost::asio handles stateless data streams. I noticed exactly the same behavior when using boost::asio for a serial interface. When I was sending packets with relatively large gaps between them, I was receiving each one in a separate callback. As the packet size grew and the gap between the packets therefore decreased, it reached a stage where it would execute the callback only when the buffer was full, not after receipt of a single packet.
If you know exactly the size of the expected datagrams, then your solution of limiting the input buffer size is a perfectly sensible one, as you know a-priori exactly how large the buffer needs to be.
If your congestion is coming from having multiple different packet types being transmitted, so you can't pre-allocate the correct size buffer, then you could potentially create different sockets on different ports for each type of transaction. It's a little more "hacky", but given the virtually unlimited nature of ephemeral port availability, as long as you're not using 20,000 different packet types, that would probably help you out as well.

Is a successful send() "atomic"?

Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?
Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?
I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again.
Update:
Based on the answers so far, my question could be rephrased as follows:
Is there any way for packets/data to be sent over the wire before the call to send() returns?
Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?
No, it's possible that part of your data gets passed over the wire while another part only goes as far as being copied into the internal buffers of the local TCP stack. send() will return the number of bytes passed to the local TCP stack, not the number of bytes that get passed onto the wire (and even if the data reaches the wire, it might not reach the peer).
Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?
As send() only returns the number of bytes passed into the local TCP stack, not whether send() actually sends anything onto the wire, you can't really distinguish these two cases anyway. But yes, it's possible that only some of the data makes it over the wire. Even if there's enough space in the local buffer, the peer might not have enough space. If you send 2 bytes but the peer only has room for 1 more byte, 1 byte might be sent while the other resides in the local TCP stack until the peer has enough room again.
(That's an extreme example; most TCP stacks protect against sending such small segments of data at a time, but the same applies if you try to send 4k of data and the peer only has room for 3k.)
I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again
That will only happen if your socket is non-blocking. If it's blocking and the local buffers are full, send() will wait until there's room in the local buffers again (or it might return a short count if part of the data was delivered but an error occurred in the meantime).
Edit to answer:
Is there any way for packets/data to be sent over the wire before the call to send() returns?
Yes. That might happen for many reasons.
e.g.
The local buffers get filled up by that recent send() call, and you use blocking I/O.
The TCP stack sends your data over the wire but decides to schedule other processes to run before that sending process returns from send().
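As a practical consequence of send() only reporting how many bytes the local stack accepted, code that needs to hand over exactly N bytes typically loops. A minimal sketch (the helper name send_all and the error handling are illustrative assumptions; a successful return still says nothing about delivery to the peer):

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Keep calling send() until all `len` bytes have been accepted by the local
     * TCP stack. Success only means the bytes were queued locally, not that
     * they were transmitted, let alone received by the peer. */
    static int send_all(int fd, const void *buf, size_t len)
    {
        const char *p = buf;
        while (len > 0) {
            ssize_t n = send(fd, p, len, 0);
            if (n < 0) {
                if (errno == EINTR)
                    continue;         /* interrupted before anything was queued */
                return -1;            /* real error (e.g. connection reset) */
            }
            p += n;                   /* these bytes are now the kernel's problem */
            len -= (size_t)n;         /* though they may still sit in its buffers */
        }
        return 0;
    }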
Though this depends on the protocol you are using, the general answer is no.
For TCP, the data gets buffered inside the kernel and then sent out at the discretion of the TCP packetization algorithm, which is pretty hairy - it keeps multiple timers and minds the path MTU, trying to avoid IP fragmentation.
For UDP, you can only assume this kind of "atomicity" if your datagram does not exceed the link frame size (the usual value is 1472 = 1500 bytes of Ethernet frame - 20 bytes of IP header - 8 bytes of UDP header). Otherwise your sending host will have to IP-fragment the datagram.
Then intermediate routers can still IP-fragment the passing packet if their outgoing link MTU is less than the packet size.

TCP Socket Transfer

A while back I had a question about why my socket sometimes received only 653 octets (for example) when I sent 1024 octets, and thanks to Rakis I understood: the OS allows reception to occur in arbitrarily sized chunks.
This time I need a confirmation :)
On any OS (well, GNU/Linux and Windows at least), in any language (I'm using Python here), if I send a packet of a random number of bytes (could be 2 bytes, could be 12000 bytes), let's say X, when I write socket.send(X), am I absolutely guaranteed that X will be FULLY received (regardless of any chunks the receiving OS divides it into) on the other end of the socket BEFORE I do another socket.send(any string)?
Or in other words, if I have the code:
socket.send(X)
socket.send(Y)
Even if X > MTU, so it will be obliged to send multiple packets, does it wait until every packet is sent and acknowledged by the endpoint of the socket before sending Y? Well, writing that makes me believe that the answer is yes, it is guaranteed, and that this is exactly the purpose of setting a socket in blocking mode, but I want to be sure :D
Thanks in advance,
Nolhian
You are guaranteed that X will be received (at the application level) before Y, if it's a stream socket. If it's a datagram socket, no guarantees.
Depending on the networking implementation, it's possible that at a lower level, X will be sent, lost in transmission, then Y will be sent, then X will be re-sent because no acknowledgement was received.
Even in blocking mode, the socket.send(Y) can execute before X even makes it "onto the wire", because the OS will buffer network traffic.
No, you can't.
All you know is that the client will receive the data in order, assuming it does receive it all. There's no way of knowing (at the application level) whether the client has received all the data without having some sort of "ACK" at the application level protocol.
am I absolutely guaranteed that X will be FULLY received (regardless of any chunks the receiving OS divides it into) on the other end of the socket BEFORE I do another socket.send(any string)?
No. In general, more data may be sent without waiting for the receiving side, within certain limits:
on the sending side, you will have a maximum amount of data you can enqueue for transmission until the client has acknowledged some receipt (but typically the client's OS will acknowledge and buffer quite a lot before it refuses further data until the application has processed some), after which the sending socket may start blocking. This:
- forces the application design to consider how to enqueue and buffer excessive amounts of data, rather than having naively written applications utilise excessive amounts of Operating System-provided buffer memory
- reduces retransmission rates when the receiving side is flooded with data too fast to process it
- avoids sending huge amounts of data despite the network connection having been lost
So, strictly speaking and for large transmissions, the sender should be designed to handle sockets blocked from further sends (either knowing it is ok to block in the attempt (perhaps due to a dedicated sending thread) or waiting until it is possible to send more via non-blocking sockets or select/poll).
Whatever retransmission and buffering may be required, what you CAN be sure of is that the receiving side will have to read all of "X" before it starts being given the subsequently sent data "Y" (unless it specifically asks to have it otherwise, e.g. Out Of Band data).
Depending on the type of sockets that you use, you can, in some cases, have a guarantee that data will be received, but no feedback or confirmation of when it actually was.
Back to your question:
does it wait until every packet is sent and acknowledged by the endpoint of the socket before sending Y
So, you could say:
YES it does wait until it is sent, and
NO it does not wait for acknowledgment
A suggestion:
Since there are no auto-magic/built-in confirmations that your data was received, you could fairly easily implement your own logic for ACKnowledging that the message was received, which would basically come down to a custom communication protocol of your own.
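A minimal sketch of such a home-grown acknowledgment, written in C rather than Python and using an invented convention (a 4-byte length prefix and a single 'A' byte as the ACK) purely for illustration:

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Sender side: transmit one length-prefixed message, then block until the
     * peer's application explicitly confirms it received the whole thing.
     * For brevity this assumes send() accepts everything in one call; a robust
     * version would loop the way the send_all() sketch earlier does. */
    static int send_with_ack(int fd, const void *msg, uint32_t len)
    {
        uint32_t netlen = htonl(len);
        if (send(fd, &netlen, sizeof netlen, 0) != (ssize_t)sizeof netlen)
            return -1;
        if (send(fd, msg, len, 0) != (ssize_t)len)
            return -1;

        char ack;
        if (recv(fd, &ack, 1, 0) != 1 || ack != 'A')
            return -1;                /* no confirmation: treat X as not delivered */
        return 0;                     /* the peer's application has seen all of X */
    }

    /* Receiver side: after reading the full message, send the ACK byte back. */
    static int confirm_receipt(int fd)
    {
        return send(fd, "A", 1, 0) == 1 ? 0 : -1;
    }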