Are TCP/IP Sockets Atomic? - sockets

It is my understanding that a write to a TCP/IP socket will be atomic if the amount of data written is small. By atomic, I mean that the receiver will receive all of the data or none of the data. However, it is not atomic, if the amount of the data written is large. Am I correct? and if so, what counts as large?
Thanks,
Bob

No. TCP is a byte-stream protocol. No messages, no datagram-like behaviour.

For UDP, that is true, because all data written by the app is sent out in one UDP datagram.
For TCP, that is not true, unless the application sends only 1 byte of data at a time. A write to a TCP socket will write all of the data to a buffer that is associated with that socket. TCP will then read data from that buffer in the background and send it to the receiver. How much data TCP actually sends in one TCP segment depends on variables of its flow control mechanisms, and other factors, including:
Receive Window published by the other node (receiver)
Amount of data sent in previous segments in flight that are not acknowledged yet
Slow start and congestion avoidance algorithm state
Negotiated maximum segment size (MSS)
In TCP, you can never assume what the application writes to a socket is actually received in one read by the receiver. Data in the socket's buffer can be sent to the receiver in one or many TCP segments. At any moment when data is made available, the receiver can perform a socket read and return with whatever data is actually available at that moment.
Of course, all sent data will eventually reach the receiver, if there is no failure in the middle preventing that, and if the receiver does not close the connection or stop reading before the data arrives.

Related

In which scenario UDP datagrams can be reordered in a sender socket buffer?

Successfull sendto system call assumes that data is placed into a socket buffer.
Kernel documentation assumes that data stored in a socket buffer can somehow be reordered.
Packets can be reordered in the transmit path, for instance in the packet scheduler.
I cannot imagine when this can happen for a UDP socket. I was assuming that for a UDP socket its send buffer works as sort of a queue and data which is requested to be sent is never reordered.
So can you please explain how it can happen? Ideally I want to get some understanding to what extent this reordering can take place.
P.S. I am interested in sender side only. Because packets sent can be rerouted to MSG_ERRQUEUE using timestamping feature. I understand that nothing is guaranteed on receiver side.

Why does TCP not tell me how many bytes have been received?

When sending data on a TCP connection, the TCP stack makes sure the bytes arrive, and lost packets are resent as needed. To accomplish that, the other end sends ACK messages to acknowledge it has received the data.
As the TCP stack receives ACK messages, it knows that the other end has received some data and can stop retransmitting. While it doesn't know exactly how much data the other end has received (ACKs may get lost or delayed) it at least has a lower bound of data that has already been received on the other end.
To my knowledge, TCP implementations don't make that information available to high level callers.
Specifically I'm thinking of the POSIX socket APIs, as far as I know there's no way to tell how much data the other end has received. I only know how much I sent, but due to large buffer sizes it could take a lot of time for that data to be received.
Obviously, if I control the other end, I could occasionally report how much data has been received with explicit messages (either on the same tcp connection or via a separate channel), but that seems inelegant.
Why is that information not exposed to the user? It would be useful for things like progress bars.

What is the difference between Nagle algorithm and 'stop and wait'?

I saw the socket option TCP_NODELAY, which is used to turn on or off the Nagle alorithm.
I checked what the Nagle algorithm is, and it seems similar to 'stop and wait'.
Can someone give me a clear difference between these two concepts?
In a stop and wait protocol, one
sends a message to the peer
waits for an ack for that message
sends the next message
(i.e. one cannot send a new message until the previous one has been acknowledged)
Nagle's algorithem as used in TCP is orthoginal to this concept. When the TCP application sends some data, the protocol buffers the data and waits a little while to see if there's more data to be sent instead of sending data to the peer immediately.
If the application has more data to send in this small timeframe, the protocol stack merges that data into the current buffer and can send it as one large message.
This concept could very well be applied to a stop and go protocol as well. (Note that TCP is not a stop and wait protocol)
The Nagle Algorithm is used to control whether the socket provider sends outgoing data immediately as-is at the cost of less efficient network transmissions (off), or if it buffers outgoing data so it can make more efficient network transmissions at the cost of speed (on).
Stop and Wait is a mechanism used to ensure the integrity of transmitted data, by making the sender send a frame of data and then wait for an acknowledgement from the receiver before sending another frame, thus ensuring frames are received in the same order in which they are sent.
These two features operate independently of each other.

how can I transfer large data over tcp socket

how can I transfer large data without splitting. Am using tcp socket. Its for a game. I cant use udp and there might be 1200 values in an array. Am sending array in json format. But the server receiving it like splitted.
Also is there any option to send http request like tcp? I need the response in order. Also it should be faster.
Thanks,
You can't.
HTTP may chunk it
TCP will segment it
IP will packetize it
routers will fragment it ...
and TCP will reassemble it all at the other end.
There isn't a problem here to solve.
You do not have much control over splitting packets/datagrams. The network decides about this.
In the case of IP, you have the DF (don't fragment) flag, but I doubt it will be of much help here. If you are communicating over Ethernet, then 1200 element array may not fit into an Ethernet frame (payload size is up to the MTU of 1500 octets).
Why does your application depend on the fact that the whole data must arrive in a single unit, and not in a single connection (comprised potentially of multiple units)?
how can I transfer large data without splitting.
I'm interpreting the above to be roughly equivalent to "how can I transfer my data across a TCP connection using as few TCP packets as possible". As others have noted, there is no way to guarantee that your data will be placed into a single TCP packet -- but you can do some things to make it more likely. Here are some things I would do:
Keep a single TCP connection open. (HTTP traditionally opens a separate TCP connection for each request, but for low-latency you can't afford to do that. Instead you need to open a single TCP connection, keep it open, and continue sending/receiving data on it for as long as necessary).
Reduce the amount of data you need to send. (i.e. are there things that you are sending that the receiving program already knows? If so, don't send them)
Reduce the number of bytes you need to send. (The easiest way to do this is to zlib-compress your message-data before you send it, and have the receiving program decompress the message after receiving it. This can give you a size-reduction of 50-90%, depending on the content of your data)
Turn off Nagle's algorithm on your TCP socket. That will reduce latency by 200mS and discourage the TCP stack from playing unnecessary games with your data.
Send each data packet with a single send() call (if that means manually copying all of the data items into a separate memory buffer before calling send(), then so be it).
Note that even after you do all of the above, the TCP layer will still sometimes spread your messages across multiple packets, etc -- that's just the way TCP works. And even if your local TCP stack never did that, the receiving computer's TCP stack would still sometimes merge the data from consecutive TCP packets together inside its receive buffer. So the receiving program is always going to "receive it like splitted" sometimes, because TCP is a stream-based protocol and does not maintain message boundaries. (If you want message boundaries, you'll have to do your own framing -- the easiest way is usually to send a fixed-size (e.g. 1, 2, or 4-byte) integer byte-count field before each message, so the receiver knows how many bytes it needs to read in before it has a full message to parse)
Consider the idea that the issue may be else where or that you may be sending too much unnecessary data. In example with PHP there is the isset() function. If you're creating an internet based turn based game you don't (need to send all 1,200 variables back and forth every single time. Just send what changed and when the other player receives that data only change the variables are are set.

TCP connection for real time

I want to use a real time TCP connection, I have a streaming of data from server , and I receive it by a client, but this client is too slow to receive as fast as the sender is, so the server buffer the data until it's reach the destination, for example if I "produce" data at time t, and suppose that the client are 10 time slower, then the data produced at time t, will arrive at time 10t.
I want to make the server "drop" the data that can't reach the client at the present time, and send the new data which is expected to arrive at the time?
B.S : I know that UDP protocol do this, but I want to do this by TCP.
I've done this sort of thing in the past, and got reasonably good results. Here's how I did it:
1) On the sending side, use setsockopt(SOL_SOCKET, SO_SNDBUF) to make the server's TCP socket's send buffer as small as you can get away with (since you can't drop data once it's already in the socket's send buffer, you want to keep as little data there as possible)
2) On the sending side, never proactively send() any outgoing data into the socket at all. Instead, write a function (we'll call it DumpCurrentStateToBuffer()) that writes the "current state" bytes (that you want to send to the client) into an in-memory buffer.
3) When the client's socket select()'s (or poll()'s, or whatever mechanism you use) as ready-for-write, call DumpCurrentStateToBuffer() to create a memory-buffer of bytes that are to be sent to the client. Now send that data to the client (if you're using blocking I/O you can do it synchronously, at the cost of potentially stalling your server until the data can be sent; OTOH if you're using non-blocking I/O, you may need to keep the memory-buffer and your current sent-bytes index into the buffer around as state variables, so you can keep sending more sub-chunks of the memory buffer over time, whenever the socket indicates that it can receive more bytes)
4) Once the memory-buffer's contents have been fully sent, you can free the memory buffer, and then wait for the socket to select as ready-for-write again; when it does, goto (3).
This technique doesn't solve all of TCP's non-real-time issues; for example, a dropped TCP packet will still have to be resent to the client. What it does do is guarantee that the client-to-server data backlog will never be more than one or two "states" long, because you never generate any new data unless/until there is at least some room in the socket's output buffer.