Performance issue sending larger messages via akka remote actors

Performance issue sending larger messages via akka remote actors - scala

The responses to concurrent requests to remote actors were taking long time to respond, aka 1 request takes 300 ms, but 100 concurrent requests took almost 30 seconds to complete! So it almost looks like the requests are being executed sequentially! The request size is small, but response size was about 120 kB in JVM before serialization. But the response had deep nested case class.
The response times are similar when running on two different JVMs on same machine as well. But responses are fast when in same JVM (i.e. local actors). It is a single client making concurrent requests to one remote actor.
I see this log in akka debug logs. What does this indicate?
DEBUG test-app akka.remote.EndpointWriter - Drained buffer with
maxWriteCount: 50, fullBackoffCount: 546, smallBackoffCount: 2,
noBackoffCount: 1 , adaptiveBackoff: 2000

The logs show that write to send-buffer failed. This could indicate that
send-buffer is too small
receive-buffer on the remote actor's side is too small
network issues
The send buffer size and receive buffer size directly limits the number of concurrent requests and responses! Increase the send buffer and receive buffer sizes, on both client and server, to support the required concurrency in both client and server.
If the buffer size is not adequate, netty will wait for the buffer to be cleared before attempting to rewrite to the buffer. And by default there will be a backoff time too, and this can be configured as well.
The settings are under remote.netty.tcp:
akka {
remote {
netty.tcp {
# Sets the send buffer size of the Sockets,
# set to 0b for platform default
send-buffer-size = 1024000b
# Sets the receive buffer size of the Sockets,
# set to 0b for platform default
receive-buffer-size = 2048000b
}
# Controls the backoff interval after a refused write is reattempted.
# (Transports may refuse writes if their internal buffer is full)
backoff-interval = 1 ms
}
}
For full configuration see Akka reference config.

Related

Multiple concurrent connections with Vertx

I'm trying to build a web application that should be able to handle at least 15000 rps. Some of the optimizations I have done is increase the worker pool size to 20 and set an accept back log to 25000. Since I have set my worker pool size to 20; wil this help with the the blocking piece of code?

A worker pool size of 20 seems to be the default.
I believe the important question in your case is how long do you expect each request to run. On my side, I expect to have thousands of short-lived requests, each with a payload size of about 5-10KB. All of these will be blocking, because of a blocking database driver I use at the moment. I have increased the default worker pool size to 40 and I have explicitly set my deploy vertical instances using the following formulae:
final int instances = Math.min(Math.max(Runtime.getRuntime().availableProcessors() / 2, 1), 2);
A test run of 500 simultaneous clients running for 60 seconds, on a vert.x server doing nothing but blocking calls, produced an average of 6 failed requests out of 11089. My test payload in this case was ~28KB.
Of course, from experience I know that running my software in production would often produce results that I have not anticipated. Thus, the important thing in my case is to have good atomicity rules in place, so that I don't get half-baked or corrupted data in the database.

Service Bus Explorer

Our project has Microsoft Service Bus (on-prem ) running on Windows 2012 R2 servers for message processing.
When sending messages to service bus topic above the size limit (say 10 mb ) , services bus shows processing error – throws socket timeout exception.
Just wanted to know ,
if anyone has worked with sending messages (say > 10 MBs ) to Service Bus Topics . Would appreciate any suggested approach on how to handle this.
Also is there a way to increase the service bus timeout configuration or message size limit settings on Service Bus Topics either through Powershell cmds or Service Bus Explorer.

Service Bus queues support a maximum message size of 256 Kb (the header, which includes the standard and custom application properties, can have a maximum size of 64 Kb).
There is no limit on the number of messages held in a queue but there is a cap on the total size of the messages held by a queue. This queue size is defined at creation time, with an upper limit of 5 GB.

Are you asking about sending a message which is of size 10 MB? Service Bus doesn't allow that large message. For Premium, the maximum message size is 1 MB, and for Standard, it's 256 KB as #Ana said.
Also is there a way to increase the service bus timeout configuration
or message size limit settings?
Yes, there is a possibility to handle time-to-live property of messages which can be configured either at the time of Queue/Subscription creation or while sending Individual message. Refer to set Time to live for Queue as well as message.
Also is there a way to increase message size limit settings?
No, as the maximum size is 1 MB (May be increased by Azure in the future).

To answer this "Can we Send messages (say > 10 MBs ) to Service Bus Topics".
Now as of today, the updated answer will be YES: The Premium tier of Service Bus, enabling Message size up to 100 MB. Where as Standard is up to 256 KB as of today.
How to enabling large messages support for an existing queue (or topic)
Recommended:
While 100 MB message payloads are supported, it's recommended to keep the message payloads as small as possible to ensure reliable performance from the Service Bus namespace.
The Premium tier is recommended for production scenarios.

Operating system computations for throughput

Can i get help with this? I can't seem to understand the question
"In this problem you are to compare reading a file using a single-threaded file server with a multi- threaded file server. It takes 16 msec to get a request for work, dispatch it, and do the rest of the necessary processing, assuming the data are in the block cache. If a disk operation is needed (assume a spinning disk drive with 1 head), as is the case one-fourth of the time, an additional 32 msec is required."

Can i get help with this?
I don't think so (I don't think there's enough information in the question for anyone to be able to understand it).
Example 1
The file server is single-threaded and handles asynchronous requests, and the "16 msec" is primarily "request delivery latency" (time between a process sending a request and the file server receiving the request). A process sends a single request asking to read from 1000 files, the file server receives this request, "immediately" sends back 750 replies (for file data that was cached) and sends a single request asking something (file system code, disk driver?) to fetch the remaining 250 things; then file server "immediately" waits for more requests while waiting for the reply from something (file system code, disk driver?) to complete the early 250 things. In this case you can say that throughput for single-threaded file server is virtually infinite (e.g. infinite throughput for "file data cache hit", which is the only thing that matters because you can make more requests while waiting for slow disk IO).
Example 2
The file server has 8 threads and handles synchronous requests. A single-threaded process sends 1 request (to read from 1 file) and then has to wait for the reply, the request is given to one of the file server's threads (doesn't matter which) and that thread takes an average of "16 + 32*0.25 = 24 msec" for the request to be handled before the process can make it's next request; and the process does this in a loop because it wants to read 1000 files. In this case throughput is "1/0.024 = 41.66 requests per second", which is extremely bad (primarily because the single-threaded process can't send requests fast enough to keep all threads of the multi-threaded server busy).
Example 3
The file server has 8 threads and handles synchronous requests. A process with 1000 threads send 1 request (to read from 1 file) from each of its threads. In this case we need to know how many CPUs there are (and how scheduler works) to determine anything about throughput. E.g. if there's only 2 CPUs then you're not going to get 8 file server threads running in parallel at the same time.

maximum number if requests per TCP connection

I am acting as server which receives multiple requests from client in socket and handles in a thread.
Should i set any parameter in TCP level to set maximum number of requests a connection can handle simultaneously?
because in my server side ,if processing the request is slow i observe that other requests are queued up (client says request has been sent but i receive it late)
Kindly guide me

If it takes a long time to do the work and you want to handle multiple connections simultaneously, you have to change how you do things.
If you are actively using a lot of CPU during processing a long request, you'll need multiple threads. That's the only way to actually get more CPU time / second -- assuming you have multiple cores available.
If you are waiting on things like file IO, then you can instead use asynchronous processing to handle the requests on a single thread, but just handle a little piece at a time.
Setting a maximum number of TCP connections won't help you handle more processes more quickly. It will just reject connections and not even allow a first-come first-served type of behavior - it will just be random if a specific client ever gets through or not.

Why is UDP packet reception seemingly optimized mid-execution?

I'm running a client server configuration over Ethernet and measuring packet latency at both ends. The client (windows) is sending packets every 5 ms (confirmed with wire shark) as it should. Yet, the server (embedded linux) only receives packets at 5 ms intervals for a few seconds, at which point it stops for 300 ms. After this break the latency is only 20 us. After another period of about a few seconds it takes another break for 300 ms. This repeats indefinitely (300ms break, 20 us packet latency burst). It seems as if the server program is being optimized mid-execution to read IO in shorter bursts. Why is this happening?
Disclaimer: I haven't posted the code, as the client and server are small subsets of more complex applications, however, I am willing to factor it out if an obvious answer doesn't present itself.

This is UDP so there is no handshake or any flow control mechanism. Those 300 ms must be because of work the server is doing in the processing of the UDP messages received. During those 300 ms the server has surely lost around 60 messages that were not read from client.
You might probably want to check the server does not take more than 5 ms in processing each message if it uses one thread to process. If the server uses multi-threading to process the messages and the processing takes some time, even if it takes 1 ms, you might be in a situation where at some point all threads are competing for resources and they don't finish in time to read the next message. For the problem you are describing I would bet the server is multithreaded and you have that problem. I cannot assure that 100% for lack of info though. But in any case, you want to check the time it takes to process messages because you might be dealing with real-time requirements.

I spaced out the measurements to 1 in every 1000 packets and now it is behaving itself. I was using printf every 5ms which must've eventually filled the printf tx queue entirely. This then delayed the execution for 300ms. Once printf caught its breath, the program had a queue full of incoming packets and thus was seemingly receiving packets every 20 us.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse