Packet Size ,Window Size and Socket Buffer In TCP - sockets

After studying the "window size" concept, what I understood is that it keeps packet before sending over wire and till acknowledgement come for earliest packet . Once this gets filled up, subsequent packet will be dropped. Somewhere I also have read that TCP is a streaming protocol, and packet is what related to IP protocol at Network layer .
What I assumed till was that I have declared a Buffer (inside code) which I fill with some data and send this Buffer using socket. I declared a buffer of 10000 bytes and send it repeatedly using socket over 10 Gbps link .
I have following assumptions and questions. Please verify and help
If I want to send a packet of 64,256,512 etc. bytes, declared buffer inside code of that much space and send over socket. Each execution of send() command will send one packet of that much size .
So if I want to study the packet size variation effect on throughput, what do I have to do? Do I need to vary buffer size in code?
What are the socket buffer which we set using SO_SNDBUF and SO_RECVBUF? Google says it's buffer space for socket. Is it same as TCP window size or something different? Which parameter is more suitable to vary or to increase throughput?
Also there are three parameter in socket buffer: Min, Default and Max. Which one should I vary to my experiment and to get more relevance?

If I want to send a packet of 64,256,512 etc. bytes , Declared buffer inside code of that much space and send over socket .Each execution of send() command will send one packet of that much size.
Only if you disable the Nagle algorithm and the size is less than the path MTU. You mustn't rely on this.
So if I want to Study the Packet size variation effect on throughput, What I have to do , vary buffer space in Code?
No. Vary SO_RCVBUF at the receiver. This is the single biggest determinant of throughput, as it determines the maximum receive window.
what are the socket buffer which we set using SO_SNDBUF and SO_RCVBUF
Send buffer size at the sender, and receive buffer size at the receiver. In the kernel.
It's Same as TCP Window size
See above.
or else different ? Which parameter is more suitable to vary to increase throughput ?
See above.
Also there are three parameter in Socket Buffer min Default and Max . Which one should I vary for My experiment to get more relevance
None of them. These are the system-wide parameters. Just play with SO_SNDBUF and SO_RCVBUF for the specific sockets in your application.

TCP does not directly expose a way to control the way packets are sent since it is a stream protocol. But you can make the TCP stack send packets by disabling the Nagle algorithm. That way all data that you send will be sent out immediately instead of being buffered. Data will be split into packets of MTU size which is like ~1400 bytes. Depends on the link.
To answer (2): Disable nagling and invoke send with buffers of < 1400 bytes. Use Wireshark to make sure you got what you wanted.
The buffer settings have nothing to do with any of this. I know of no valid reason to touch them.
In general this question is probably moot since you seem to want to send a lot of data. Just leave Nagling enabled and send big buffers (such as 64KB).

I do some experience on Windows 10:
code from https://docs.python.org/3/library/socketserver.html#asynchronous-mixins,
RawCap for loopback capture,
WireShark for watching result.
The primary client code is:
def client(ip, port, message):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF, 100000)
sock.connect((ip, port))
sock.sendall(bytes(message, 'ascii'))
response = str(sock.recv(1024), 'ascii')
print("Received: {}".format(response))
Here is the result(the server port is 11111):
you can see, the tcp recive window size is the same as SO_RCVBUF, may it is platform indepent, you can verify it on other platform.
on https://msdn.microsoft.com/en-us/library/windows/hardware/ff570832(v=vs.85).aspx
The SO_RCVBUF socket option determines the size of a socket's receive buffer that is used by the underlying transport.
verified this.
Also, when I set SO_SNDBUF = 100000, it have no affects on the tcp transmission between client and server, as server just can discard data if client send much data one time.
So, if you want to change SO_RCVBUF to max Throughput, you can refer http://packetbomb.com/understanding-throughput-and-tcp-windows/, the os may offer func to detect ideal send backlog (ISB).

Related

I can not sent short messages by TCP protocol

I have a trouble to tune TCP client-server communication.
My current project has a client, running on PC (C#) and a server,
running on embedded Linux 4.1.22-ltsi.
Them use UDP communication to exchanging data.
The client and server work in blocking mode and
send short messages one to 2nd
(16, 60, 200 bytes etc.) that include either command or set of parameters.
The messages do note include any header with message length because
UDP is message oriented protocol. Its recvfrom() API returns number of received bytes.
For my server's program structure is important to get and process entire alone message.
The problem is raised when I try to implement TCP communication type instead of UDP.
The server's receive buffer (recv() TCP API) is 2048 bytes:
#define UDP_RX_BUF_SIZE 2048
numbytes = recv(fd_connect, rx_buffer, UDP_RX_BUF_SIZE, MSG_WAITALL/*BLOCKING_MODE*/);
So, the recv() API returns from waiting when rx_buffer is full, i.e after it receives
2048 bytes. It breaks all program approach. In other words, when client send 16 bytes command
to server and waits an answer from it, server's recv() keeps the message
"in stomach", until it will receive 2048 bytes.
I tried to fix it as below, without success:
On client side (C#) I set the socket parameter theSocket.NoDelay.
When I checked this on the sniffer I saw that client sends messages "as I want",
with requested length.
On server side I set TCP_NODELAY socket option to 1
int optval= 1;
setsockopt(fd,IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval);
On server side (Linux) I checked socket options SO_SNDLOWAT/SO_RCVLOWAT and they are 1 byte each one.
Please see the attached sniffer's log picture. 10.0.0.10 is a client. 10.0.0.106 is a server. It is seen, that client activates PSH flag (push), informing the server side to move the incoming data to application immediately and do not fill a buffer.
Additional question: what is SSH encrypted packets that runs between the sides. I suppose that it is my Eclipse debugger on PC (running server application through the same Ethernet connection) sends them. Am I right?
So, my problem is how to cause `recv() API to return each short message (16, 60, 200 bytes etc.) instead of accumulating them until receiving buffer fills.
TCP is connection oriented and it also maintains the order in which packets are sent and received.
Having said that, in TCP client, you will receive the stream of bytes and not the individual udp message as in UDP. So you will need to send the packet length and marker as the initial bytes.
So client can first find the packet length and then read data till packet length is reached and then expect new packet length.
You can also check for library like netty, zmq to do this extra work

Lowest TCP receive window size

What is the lowest possible TCP receive window size can be announced by Linux kernel TCP/IP stack impl and how can i configure to make it announce such one? My goal is to achieve low latency and sacrificing throughput?
Right now my test client application reads about 128 bytes each second from buffer and TCP stack waits till there are free SO_RCVBUF/2 bytes space before notifying server about new window size. I'd like to make my client announce window size lower than SO_RCVBUF/2 if its possible.
I do not think that changing the receive window size will have any effect on latency.
However, since your messages are small (128 bytes), you may like to disable Nagle algorithm in the sender that makes the sender wait when sending TCP payload smaller than MSS:
Applications that expect real-time responses and low latency can react poorly with Nagle's algorithm. Applications such as networked multiplayer video games or the movement of the mouse in a remotely controlled operating system, expect that actions are sent immediately, while the algorithm purposefully delays transmission, increasing bandwidth efficiency at the expense of latency. For this reason applications with low-bandwidth time-sensitive transmissions typically use TCP_NODELAY to bypass the Nagle delay.
On the receiver you may like to disable TCP delayed acknowledgements:
The additional wait time introduced by the delayed ACK can cause further delays when interacting with certain applications and configurations. If Nagle's algorithm is being used by the sending party, data will be queued by the sender until an ACK is received. If the sender does not send enough data to fill the maximum segment size (for example, if it performs two small writes followed by a blocking read) then the transfer will pause up to the ACK delay timeout. Linux 2.4.4+ supports a TCP_QUICKACK socket option that disables delayed ACK.
C++ code:
void disableTcpNagle(int sock) {
int value = 1;
if(::setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &value, sizeof value))
throw std::system_error(errno, std::system_category(), "setsockopt(TCP_NODELAY)");
}
void enableTcpQuickAck(int sock) {
int value = 1;
if(::setsockopt(sock, IPPROTO_TCP, TCP_QUICKACK, &value, sizeof value))
throw std::system_error(errno, std::system_category(), "setsockopt(TCP_QUICKACK)");
}

read() on a NON-BLOCKING tun/tap file descriptor gets EAGAIN error

I want to read IP packets from a non-blocking tun/tap file descriptor tunfd
I set the tunfd as non-blocking and register a READ_EV event for it in libevent.
when the event is triggered, I read the first 20 bytes first to get the IP header, and then
read the rest.
nr_bytes = read(tunfd, buf, 20);
...
ip_len = .... // here I get the IP length
....
nr_bytes = read(tunfd, buf+20, ip_len-20);
but for the read(tunfd, buf+20, ip_len-20)
I got EAGAIN error, actually there should be a full packet,
so there should be some bytes,
why I get such an error?
tunfd is not compatible with non-blocking mode or libevent?
thanks!
Reads and writes with TUN/TAP, much like reads and writes on datagram sockets, must be for complete packets. If you read into a buffer that is too small to fit a full packet, the buffer will be filled up and the rest of the packet will be discarded. For writes, if you write a partial packet, the driver will think it's a full packet and deliver the truncated packet through the tunnel device.
Therefore, when you read a TUN/TAP device, you must supply a buffer that is at least as large as the configured MTU on the tun or tap interface.

Examine data at in callout driver for FWPM_LAYER_EGRESS_VSWITCH_TRANSPORT_V4 layer in WFP

I am writing the callout driver for Hyper-V 2012 where I need to filter the packets sent from virtual machines.
I added filter at FWPM_LAYER_EGRESS_VSWITCH_TRANSPORT_V4 layer in WFP. Callout function receive packet buffer which I am typecasting it to NET_BUFFER_LIST. I am doing following to get the data pointer
pNetBuffer = NET_BUFFER_LIST_FIRST_NB((NET_BUFFER_LIST*)pClassifyData->pPacket);
pContiguousData = NdisGetDataBuffer(pNetBuffer, NET_BUFFER_DATA_LENGTH(pNetBuffer), 0, 1, 0);
I have simple client-server application to test the packet data. Client is on VM and server is another machine. As I observed, data sent from client to server is truncated and some garbage value is added at the end. There is no issue for sending message from server to client. If I dont add this layer filter client-server works without any issue.
Callback function receives the metadata which incldues ipHeaderSize and transportHeaderSize. Both these values are zero. Are these correct values or should those be non-zero??
Can somebody help me to extract the data from packet in callout function and forward it safely to further layers?
Thank You.
These are the TCP packets. I looked into size and offset information. It seems the problem is consistent across packets.
I checked below values in (NET_BUFFER_LIST*)pClassifyData->pPacket.
NET_BUFFER_LIST->NetBUfferListHeader->NetBUfferListData->FirstNetBuffer->NetBuffe rHeader->NetBufferData->CurrentMdl->MappedSystemVa
First 24 bytes are only sent correctly and remaining are garbage.
For example total size of the packet is 0x36 + 0x18 = 0x4E I don't know what is there in first 0x36 bytes which is constant for all the packets. Is it a TCP/IP header? Second part 0x18 is the actual data which i sent.
I even tried with API NdisQueryMdl() to retrieve from MDL list.
So on the receiver side I get only 24 bytes correct and remaining is the garbage. How to read the full buffer from NET_BUFFER_LIST?

limitation of the reception buffer

I established a connection with a client this way:
gen_tcp:listen(1234,[binary,{packet,0},{reuseaddr,true},{active,false},{recbuf,2048}]).
This code performs message processing:
loop(Socket)->
inet:setops(Socket,[{active,once}],
receive
{tcp,Socket,Data}->
handle(Data),
loop(Socket);
{Pid,Cmd}->
gen_tcp:send(Socket,Cmd),
loop(Socket);
{tcp_close,Socket}->
% ...
end.
My OS is Windows. When the size of the message is 1024 bytes, I lose bytes in Data. The server sends ACK + FIN to the client.
I believe that the Erlang is limited to 1024 bytes, therefore I defined recbuf.
Where the problem is: Erlang, Windows, hardware?
Thanks.
You may be setting the receive buffer far too small. Erlang certainly isn't limited to a 1024 byte buffer. You can check for yourself by doing the following in the shell:
{ok, S} = gen_tcp:connect("www.google.com", 80, [{active,false}]),
O = inet:getopts(S, [recbuf]),
gen_tcp:close(S),
O.
On Mac OS X I get a default receive buffer size of about 512Kb.
With {packet, 0} parsing, you'll receive tcp data in whatever chunks the network stack chooses to send it in, so you have to do message boundary parsing and buffering yourself. Do you have a reliable way to check message boundaries in the wire protocol? If so, receive the tcp data and append it to a buffer variable until you have a complete message. Then call handle on the complete message and remove the complete message from the buffer before continuing.
We could probably help you more if you gave us some information on the client and the protocol in use.