dpdk LRO offload on HINIC can't coalesce packets - packet

I have a problem with using LRO. LRO offload is enabled, but the package cannot coalesce.
I use dpdk 21.11, nic is HINIC 1822.
enable LRO in rte_eth_dev_configure
conf->rxmode.offloads = DEV_RX_OFFLOAD_TCP_LRO | DEV_RX_OFFLOAD_TCP_CKSUM |DEV_RX_OFFLOAD_IPV4_CKSUM;
read packets from rte_eth_rx_burst pkts->ol_flags is 0x10180 have RTE_MBUF_F_RX_LRO, but pkts->pkt_len and kts->data_len is 1514. 1514 is a single pkt max length, so it seems that there is no coalesced packets.
Client sends packet length is 8192.
What could be causing the LRO not to function properly?

Related

I can not sent short messages by TCP protocol

I have a trouble to tune TCP client-server communication.
My current project has a client, running on PC (C#) and a server,
running on embedded Linux 4.1.22-ltsi.
Them use UDP communication to exchanging data.
The client and server work in blocking mode and
send short messages one to 2nd
(16, 60, 200 bytes etc.) that include either command or set of parameters.
The messages do note include any header with message length because
UDP is message oriented protocol. Its recvfrom() API returns number of received bytes.
For my server's program structure is important to get and process entire alone message.
The problem is raised when I try to implement TCP communication type instead of UDP.
The server's receive buffer (recv() TCP API) is 2048 bytes:
#define UDP_RX_BUF_SIZE 2048
numbytes = recv(fd_connect, rx_buffer, UDP_RX_BUF_SIZE, MSG_WAITALL/*BLOCKING_MODE*/);
So, the recv() API returns from waiting when rx_buffer is full, i.e after it receives
2048 bytes. It breaks all program approach. In other words, when client send 16 bytes command
to server and waits an answer from it, server's recv() keeps the message
"in stomach", until it will receive 2048 bytes.
I tried to fix it as below, without success:
On client side (C#) I set the socket parameter theSocket.NoDelay.
When I checked this on the sniffer I saw that client sends messages "as I want",
with requested length.
On server side I set TCP_NODELAY socket option to 1
int optval= 1;
setsockopt(fd,IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval);
On server side (Linux) I checked socket options SO_SNDLOWAT/SO_RCVLOWAT and they are 1 byte each one.
Please see the attached sniffer's log picture. 10.0.0.10 is a client. 10.0.0.106 is a server. It is seen, that client activates PSH flag (push), informing the server side to move the incoming data to application immediately and do not fill a buffer.
Additional question: what is SSH encrypted packets that runs between the sides. I suppose that it is my Eclipse debugger on PC (running server application through the same Ethernet connection) sends them. Am I right?
So, my problem is how to cause `recv() API to return each short message (16, 60, 200 bytes etc.) instead of accumulating them until receiving buffer fills.
TCP is connection oriented and it also maintains the order in which packets are sent and received.
Having said that, in TCP client, you will receive the stream of bytes and not the individual udp message as in UDP. So you will need to send the packet length and marker as the initial bytes.
So client can first find the packet length and then read data till packet length is reached and then expect new packet length.
You can also check for library like netty, zmq to do this extra work

Can Windivert injects packets larger than MTU?

I used winpcap and I got errors on "pcap_sendpacket", I fragmented the packet in little IP packets with the size of the MTU and did not work even wireshark didnt show errors in the packets which I fragmented.
Now I have this question, Can windivert inject packets larger than MTU? I need to know that before try to disable the "large send offload", if I disable that will me be able to send packet with winpcap larger than MTU, and with windivert? Is the only way to solve this?.
Sometimes in my program I have to fordward packet which I receive in winpcap with a size of 2300 bytes and my MTU has 1500 and It fails. If i receive the packet with windivert and send it with windivert will I have errors? Is a solution to disable the LSO?.
Regards.
Now I have this question, Can windivert inject packets larger than MTU?
Yes you should be able to "inject" it. However, the packet may be dropped (IPv6) or fragmented (IPv4) by the network en route to the destination.

Packet Size ,Window Size and Socket Buffer In TCP

After studying the "window size" concept, what I understood is that it keeps packet before sending over wire and till acknowledgement come for earliest packet . Once this gets filled up, subsequent packet will be dropped. Somewhere I also have read that TCP is a streaming protocol, and packet is what related to IP protocol at Network layer .
What I assumed till was that I have declared a Buffer (inside code) which I fill with some data and send this Buffer using socket. I declared a buffer of 10000 bytes and send it repeatedly using socket over 10 Gbps link .
I have following assumptions and questions. Please verify and help
If I want to send a packet of 64,256,512 etc. bytes, declared buffer inside code of that much space and send over socket. Each execution of send() command will send one packet of that much size .
So if I want to study the packet size variation effect on throughput, what do I have to do? Do I need to vary buffer size in code?
What are the socket buffer which we set using SO_SNDBUF and SO_RECVBUF? Google says it's buffer space for socket. Is it same as TCP window size or something different? Which parameter is more suitable to vary or to increase throughput?
Also there are three parameter in socket buffer: Min, Default and Max. Which one should I vary to my experiment and to get more relevance?
If I want to send a packet of 64,256,512 etc. bytes , Declared buffer inside code of that much space and send over socket .Each execution of send() command will send one packet of that much size.
Only if you disable the Nagle algorithm and the size is less than the path MTU. You mustn't rely on this.
So if I want to Study the Packet size variation effect on throughput, What I have to do , vary buffer space in Code?
No. Vary SO_RCVBUF at the receiver. This is the single biggest determinant of throughput, as it determines the maximum receive window.
what are the socket buffer which we set using SO_SNDBUF and SO_RCVBUF
Send buffer size at the sender, and receive buffer size at the receiver. In the kernel.
It's Same as TCP Window size
See above.
or else different ? Which parameter is more suitable to vary to increase throughput ?
See above.
Also there are three parameter in Socket Buffer min Default and Max . Which one should I vary for My experiment to get more relevance
None of them. These are the system-wide parameters. Just play with SO_SNDBUF and SO_RCVBUF for the specific sockets in your application.
TCP does not directly expose a way to control the way packets are sent since it is a stream protocol. But you can make the TCP stack send packets by disabling the Nagle algorithm. That way all data that you send will be sent out immediately instead of being buffered. Data will be split into packets of MTU size which is like ~1400 bytes. Depends on the link.
To answer (2): Disable nagling and invoke send with buffers of < 1400 bytes. Use Wireshark to make sure you got what you wanted.
The buffer settings have nothing to do with any of this. I know of no valid reason to touch them.
In general this question is probably moot since you seem to want to send a lot of data. Just leave Nagling enabled and send big buffers (such as 64KB).
I do some experience on Windows 10:
code from https://docs.python.org/3/library/socketserver.html#asynchronous-mixins,
RawCap for loopback capture,
WireShark for watching result.
The primary client code is:
def client(ip, port, message):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF, 100000)
sock.connect((ip, port))
sock.sendall(bytes(message, 'ascii'))
response = str(sock.recv(1024), 'ascii')
print("Received: {}".format(response))
Here is the result(the server port is 11111):
you can see, the tcp recive window size is the same as SO_RCVBUF, may it is platform indepent, you can verify it on other platform.
on https://msdn.microsoft.com/en-us/library/windows/hardware/ff570832(v=vs.85).aspx
The SO_RCVBUF socket option determines the size of a socket's receive buffer that is used by the underlying transport.
verified this.
Also, when I set SO_SNDBUF = 100000, it have no affects on the tcp transmission between client and server, as server just can discard data if client send much data one time.
So, if you want to change SO_RCVBUF to max Throughput, you can refer http://packetbomb.com/understanding-throughput-and-tcp-windows/, the os may offer func to detect ideal send backlog (ISB).

recvfrom() only gets up to 2048 bytes from UDP socket

I have to call the function repeatedly to get all data, given that the len argument is set to 10240. But this results in blocking at last. How can I get all the data and safely return in a platform independent way?
BTW, I use netcat at the sender side:
cat ocr_pi.png | nc -u server 5555
Is this issue relative to nc's behavior? I didn't find any parameter to set UDP packet size(-O is for TCP).
Thanks.
UDP sends and receives data as messages. In the len argument, you tell recvfrom() the max message size you can receive, and then recvfrom() waits until a full message arrives, regardless of its size. UDP messages are self-contained. Unlike TCP, a UDP message cannot be partially sent/received. It is an all-or-nothing thing. If the size of the received message is greater than the len value you specify, the message is discarded and you get an error.
The only way recvfrom() blocks is if there is no message available to read. If you don't want to block, use select() (or pselect() or epoll or other platform equivalent) to specify a timeout to wait for a message to arrive, and then call recvfrom() only if there is actually something to read.

read() on a NON-BLOCKING tun/tap file descriptor gets EAGAIN error

I want to read IP packets from a non-blocking tun/tap file descriptor tunfd
I set the tunfd as non-blocking and register a READ_EV event for it in libevent.
when the event is triggered, I read the first 20 bytes first to get the IP header, and then
read the rest.
nr_bytes = read(tunfd, buf, 20);
...
ip_len = .... // here I get the IP length
....
nr_bytes = read(tunfd, buf+20, ip_len-20);
but for the read(tunfd, buf+20, ip_len-20)
I got EAGAIN error, actually there should be a full packet,
so there should be some bytes,
why I get such an error?
tunfd is not compatible with non-blocking mode or libevent?
thanks!
Reads and writes with TUN/TAP, much like reads and writes on datagram sockets, must be for complete packets. If you read into a buffer that is too small to fit a full packet, the buffer will be filled up and the rest of the packet will be discarded. For writes, if you write a partial packet, the driver will think it's a full packet and deliver the truncated packet through the tunnel device.
Therefore, when you read a TUN/TAP device, you must supply a buffer that is at least as large as the configured MTU on the tun or tap interface.