ReplayingDecoder in Netty reads 10240 Bytes in each Iteration - server

I am working on a high-performance TCP server that maintains persistent SSL-over-TCP connections with clients. The clients send application messages in a custom binary format, with sizes varying from 0.8 MB to 4 MB. To read these payloads, I am using Netty's ReplayingDecoder with checkpoint states. However, the number of bytes available to each call to decode() is always capped at 10240. I even tried changing the RCVBUF_ALLOCATOR to a larger window size, but that still doesn't help. I am using the Epoll transport to read data into the application.
Can Netty experts chime in and let me know where this magical 10240-byte (10 KB) limit is coming from, and how I can increase it to improve my overall TCP throughput?
.childOption(ChannelOption.RCVBUF_ALLOCATOR, new FixedRecvByteBufAllocator(64 * 1024))
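Roughly, that option sits in the ServerBootstrap like the minimal sketch below; only the childOption line above is verbatim from the question, while the class name, SslContext wiring, and pipeline placeholder are assumptions:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.FixedRecvByteBufAllocator;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.ssl.SslContext;

public final class ServerBootstrapSketch {

    // Returns a bootstrap configured with the fixed 64 KiB receive-buffer allocator.
    public static ServerBootstrap configure(SslContext sslContext) {
        EventLoopGroup bossGroup = new EpollEventLoopGroup(1);
        EventLoopGroup workerGroup = new EpollEventLoopGroup();

        return new ServerBootstrap()
            .group(bossGroup, workerGroup)
            .channel(EpollServerSocketChannel.class)
            // Ask the allocator for 64 KiB read buffers instead of the adaptive default.
            .childOption(ChannelOption.RCVBUF_ALLOCATOR, new FixedRecvByteBufAllocator(64 * 1024))
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                protected void initChannel(SocketChannel ch) {
                    ch.pipeline().addLast(sslContext.newHandler(ch.alloc()));
                    // ... followed by the ReplayingDecoder-based frame decoder
                }
            });
    }
}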

Related

Protocol with small(est) overhead over GPRS

Our company developed stations that collect data in agricultural fields. These fields can be in the middle of nowhere, so the stations use GSM/GPRS with a SIM card that automatically switches to the strongest provider.
Every 5 minutes, an internet connection is set up to send data to a server. The data has a structure with packet length, command, sensor data and a CRC check, but these data structures are sent with an HTTP POST to a URL.
For 480 bytes of data, about 2550 bytes of data traffic is used; there is a lot of overhead in the HTTP protocol. Because we only need to send 480 bytes of data, roughly 80% of the traffic is overhead from the POST over HTTP. We now have a few hundred stations and that number is growing, so data traffic costs are increasing rapidly.
We want to redesign the transmission of the data. The data is sent by a Microchip microprocessor in the stations.
Our goal is to decrease the overhead as much as possible, with guaranteed data delivery. So I looked into TCP and UDP:
TCP has failure detection and recovery, but higher overhead.
UDP has lower overhead, but delivery without failure is not guaranteed.
My first idea is to build a server that listens on a TCP port and to have the stations send the data over TCP, mainly because of the guaranteed data delivery.
With UDP we would have to implement checking and resending of the data ourselves, but the data structure of our records is already prepared for such checks.
So I am really in doubt about what to do, and I am trying to get answers to these questions:
How many bytes of overhead would it take TCP and UDP to send (and deliver) 480 bytes of data?
Are TCP and UDP the best ways to consider for sending 480 bytes of data, or is there a smarter solution with even lower overhead?
How many bytes of overhead would it take TCP and UDP to send (and deliver) 480 bytes of data?
A (typical) TCP header is 20 bytes long, although it can be (slightly) longer with options. If the entire 480 bytes are sent in a single TCP segment, you'd end up with 480 + 20 + 20 (IP header) = 520 bytes before layer-2 overhead.
UDP has an 8 byte header, so for UDP you'll have 480 + 8 + 20 = 508 bytes.
However, you should consider that TCP is a stream protocol. Reading from a TCP socket is like reading from a binary file: you'd need to split that stream into individual messages yourself, either with some sort of delimiter or by prepending the length of each message.
UDP, on the other hand, works on individual messages: reading from a UDP socket returns one message at a time.
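For the TCP case, length-prefixed framing is straightforward. A minimal Java sketch (the 2-byte prefix width is an assumption based on the ~480-byte records, and the class and method names are made up for illustration):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Length-prefixed framing over a TCP stream: each record is sent as a
// 2-byte big-endian length followed by the payload. The 2-byte width assumes
// records stay well under 65,535 bytes (here, around 480 bytes).
public final class LengthPrefixedFraming {

    public static void writeRecord(OutputStream out, byte[] record) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeShort(record.length);  // length prefix
        data.write(record);              // payload
        data.flush();
    }

    public static byte[] readRecord(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readUnsignedShort();  // read the 2-byte prefix
        byte[] record = new byte[length];
        data.readFully(record);                 // block until the whole record has arrived
        return record;
    }
}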
Are TCP and UDP the best ways to consider for sending 480 bytes of data, or is there a smarter solution with even lower overhead?
UDP and TCP are the lowest-level transport protocols on the internet; HTTP and other high-level protocols are built on top of them. If the size of the data is critical, raw TCP and UDP are as low-overhead as you're going to get, short of using raw sockets and embedding your data directly into IP packets.

TCP File Transfer - Kernel copy mechanisms

I am wondering whether the speed of the copy operations between user space and kernel space, and of the whole TCP send/receive process in general, depends on the type of file (.txt, .mp4).
I don't mean the file size, but the "structure" of the bytes or anything like that. I searched for quite a while but did not find anything related. Are there helpful phrases or terms I could look for?
Thanks in advance!
TCP has no idea of the application-level structure of a file and thus cannot make any decisions based on it. All it cares about is a byte stream.
But it is not uncommon for the way an application interacts with the kernel to depend on the specific protocol, and this can have a noticeable effect on performance. For example, one application might write 1000 bytes to the kernel at once, while another writes a 500-byte HTTP header followed by a 500-byte HTTP body. These might be exactly the same bytes in both cases, but the second involves more context switches because of the extra syscall and, depending on the socket options, may also produce two TCP packets instead of one, where each TCP packet has a noticeable overhead both in bytes and in processing time.
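As a rough illustration of the syscall point, coalescing the header and body in user space means they reach the kernel in a single write call. A small Java sketch (class and method names are made up; it assumes both chunks fit in the buffer):

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

// The same bytes, written as two separate chunks, can cost two syscalls (and
// possibly two TCP segments); buffering them in user space delivers both in one write.
public final class CoalescedWriteExample {

    public static void sendResponse(Socket socket, byte[] header, byte[] body) throws IOException {
        // The buffer is sized to hold header + body, so neither write below
        // reaches the kernel on its own.
        OutputStream out = new BufferedOutputStream(socket.getOutputStream(), header.length + body.length);
        out.write(header);
        out.write(body);
        out.flush();  // a single write() delivering header and body together
    }
}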

How to split united datagrams received with boost asio udp sockets

I've made my UDP server and client with boost::asio UDP sockets. Everything looked good before I started sending more datagrams. They arrive correctly from client to server, but they end up united into one message in my buffer.
I use
udp::socket::async_receive with std::array<char, 1 << 18> buffer
to make the async request, and I receive the data through the callback
void on_receive(const error_code& code, size_t bytes_transferred)
If I send data too often (every 10 milliseconds), I receive several datagrams at once in my buffer via the callback above. The question is: how do I separate them? Note: my UDP datagrams have variable length. I don't want to use an additional size header, because that would make my code useless for third-party datagrams.
I believe this is a limitation in the way boost::asio handles stateless data streams. I noticed exactly the same behaviour when using boost::asio for a serial interface: when I sent packets with relatively large gaps between them, I received each one in a separate callback, but as the packet size grew and the gap between packets shrank, it reached a stage where the callback executed only when the buffer was full, not after receipt of a single packet.
If you know exactly the size of the expected datagrams, then your solution of limiting the input buffer size is a perfectly sensible one, as you know a priori exactly how large the buffer needs to be.
If your congestion comes from having multiple different packet types in flight, so that you can't pre-allocate a buffer of the correct size, then you could create different sockets on different ports for each type of transaction, as in the sketch below. It's a little more "hacky", but given the virtually unlimited supply of ephemeral ports, as long as you're not using 20,000 different packet types, that would probably help you out as well.
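A rough sketch of the per-port idea, shown here with Java's DatagramSocket rather than boost::asio (the ports, packet sizes, and class name are made up):

import java.net.DatagramPacket;
import java.net.DatagramSocket;

// One socket per fixed-size packet type, so each receive buffer can be sized
// exactly and every receive() yields a single datagram of that type.
public final class PerTypeUdpReceivers {

    public static void main(String[] args) {
        listenOn(40001, 128);   // e.g. small status packets
        listenOn(40002, 1024);  // e.g. bulk data packets
    }

    private static void listenOn(int port, int packetSize) {
        new Thread(() -> {
            try (DatagramSocket socket = new DatagramSocket(port)) {
                byte[] buffer = new byte[packetSize];
                while (true) {
                    DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                    socket.receive(packet);  // blocks until exactly one datagram arrives
                    System.out.printf("port %d: %d-byte datagram%n", port, packet.getLength());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }).start();
    }
}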

Receive UDP data from multiple sources in Matlab

I need to set up a UDP reader in Matlab that receives data from a number of sources. I typically use this for a single data source:
[packet,~,~,senderaddress]=fread(s,s.BytesAvailable)
The problems are that I want to avoid waiting for the timeout, the incoming data is not terminated, and the packets are of unknown size. Has anyone else had this problem? Thanks
Why not use Java's networking capabilities? UDP File Exchange does some Java scripting to access a UDP socket. Define the maximum size of your packets (UDP datagrams can be up to 65,535 bytes long) and your preferred timeout.
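A rough sketch of what the Java side could look like (MATLAB can instantiate Java classes directly; the class name, port, and timeout handling here are placeholders):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;

// Receives one UDP datagram per call, with a bounded wait instead of a long timeout.
public final class UdpReader {

    private final DatagramSocket socket;
    private final byte[] buffer = new byte[65535];  // maximum UDP datagram size

    public UdpReader(int port, int timeoutMillis) throws Exception {
        socket = new DatagramSocket(port);
        socket.setSoTimeout(timeoutMillis);  // receive() gives up after this long
    }

    // Returns one datagram trimmed to its actual length, or null if none arrived in time.
    public byte[] receive() throws Exception {
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        try {
            socket.receive(packet);
        } catch (SocketTimeoutException timedOut) {
            return null;
        }
        byte[] data = new byte[packet.getLength()];
        System.arraycopy(buffer, 0, data, 0, packet.getLength());
        return data;
    }
}

DatagramPacket.getSocketAddress() would also identify which station sent each datagram, much like the senderaddress output of fread.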

Benefits of "Don't Fragment" on TCP Packets?

One of our customers is having trouble submitting data from our application (on their PC) to a server in a different geographical location. When sending packets under 1100 bytes everything works fine, but above this we see TCP retransmitting the packet every few seconds and getting no response. The packets we are using for testing are about 1400 bytes (but less than 1472). I can send a 1472-byte ICMP ping to www.google.com and get a response, so it's not their router or the first few hops.
I found that our application sets the DF flag on these packets, and I believe a router along the way to the server has an MTU less than or equal to 1100 and is dropping the packets.
This affects 1 client in 5000, but since everybody's route will be different, that is to be expected.
The data is a SOAP envelope and we expect a SOAP response back. I can't justify WHY we do it; the code was written by a previous developer.
So... Are there any benefits OR justification to setting the DF flag on TCP packets for application data?
I can think of reasons it is needed for network diagnostics applications, but not in our situation (we want the data to get to the endpoint, fragmented or not). One of our sysadmins said it might have something to do with our use of SSL, but as far as I know SSL is a stream: regardless of fragmentation, as long as the stream is rebuilt at the receiving end, there's no problem.
If there's no good justification I will be changing the behaviour of our application.
Thanks in advance.
The DF flag is typically set on IP packets carrying TCP segments.
This is because a TCP connection can dynamically change its segment size to match the path MTU, and better overall performance is achieved when the TCP segments are each carried in one IP packet.
So TCP packets have the DF flag set, which should cause an ICMP Fragmentation Needed packet to be returned if an intermediate router has to discard a packet because it's too large. The sending TCP will then reduce its estimate of the connection's Path MTU (Maximum Transmission Unit) and re-send in smaller segments. If DF wasn't set, the sending TCP would never know that it was sending segments that are too large. This process is called PMTU-D ("Path MTU Discovery").
If the ICMP Fragmentation Needed packets aren't getting through, then you're dealing with a broken network. Ideally, the first step would be to identify the misconfigured device and have it corrected; if that doesn't work out, you could add a configuration knob to your application that tells it to set the TCP_MAXSEG socket option with setsockopt(). (A typical example of a misconfigured device is a router or firewall that has been configured by an inexperienced network administrator to drop all ICMP, not realising that Fragmentation Needed packets are required by TCP PMTU-D.)
The operation of Path-MTU discovery is described in RFC 1191, https://www.rfc-editor.org/rfc/rfc1191.
It is better for TCP to discover the Path-MTU than to have every packet over a certain size fragmented into two pieces (typically one large and one small).
Apparently, some protocols like NFS benefit from avoiding fragmentation. However, you're right that you typically shouldn't be setting DF unless you really require it.