I have a queue with the following parameters:
x-ha-policy: all
durable: true
There is a static shovel configured to process this queue, with another server set as the destination. The shovel configuration:
{sources, [ {broker, "amqp://web:web#thisserver.spc/%2F"} ]},
{destinations, [ {broker, "amqp://web:web#remoteserver.spc/%2F"} ]},
{queue, <<"queuename">>},
{prefetch_count, 10},
{publish_fields, [ {exchange, <<"exchangename">>} ]},
{reconnect_delay, 5}
My problem is that the deliver/get and ack rates never go over 50/s for the "queuename" queue, causing a huge build-up (7 million messages) in the queue.
Would increasing the prefetch_count raise the message rate?
Also, "Ack required" is enabled for the consumer of the queue, and the default ack mode is on_confirm. If messages must be acknowledged, does that limit the message rate?
There could be a few things going on here that affect the message rates, including network speed, message size, memory available to the RabbitMQ broker, etc. You can raise the prefetch count, and that will help some. Given the high number of messages in the queue, you may also be hitting flow control; check out this article.
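For example, the prefetch could be raised in the shovel definition itself (a sketch based on the configuration above; 1000 is an illustrative value to measure and tune, not a recommendation):
{sources, [ {broker, "amqp://web:web#thisserver.spc/%2F"} ]},
{destinations, [ {broker, "amqp://web:web#remoteserver.spc/%2F"} ]},
{queue, <<"queuename">>},
%% illustrative: a larger prefetch keeps more messages in flight per network round trip
{prefetch_count, 1000},
{publish_fields, [ {exchange, <<"exchangename">>} ]},
{reconnect_delay, 5}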
2022-01-18 21:56:10,280 ERROR [org.apa.cam.pro.err.DefaultErrorHandler] (Camel (camel-1) thread #9 - KafkaProducer[test]) Failed delivery for (MessageId: 95835510BC9E9B2-0000000000134315 on ExchangeId: 95835510BC9E9B2-0000000000134315). Exhausted after delivery attempt: 1 caught: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation

Message History (complete message history is disabled)
---------------------------------------------------------------------------------------------------------------------------------------
RouteId              ProcessorId          Processor                                                                        Elapsed (ms)
[route1            ] [route1            ] [from[netty://udp://0.0.0.0:8080?receiveBufferSize=65536&sync=false]          ] [    125320]
...
[route1            ] [to1               ] [kafka:test?brokers=10.99.155.100:9092&producerBatchSize=0                    ] [         0]

Stacktrace
---------------------------------------------------------------------------------------------------------------------------------------
: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
Here's the flow for my project
External Service ---> Netty
Netty ---> Kafka (producer)
Kafka (consumer) ---> processing events
1 and 2 are running in one Kubernetes pod and 3 is running in a separate pod.
At the beginning, I encountered a TimeoutException saying something like:
org.apache.kafka.common.errors.TimeoutException: Expiring 20 record(s) for test-0:121924 ms has passed since batch creation
I searched online and found a couple of potential solutions:
Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
Based on those suggestions, I have:
made the timeout bigger, double the default value
set the batch size to 0, which does not send events in batches and keeps the memory usage low
Unfortunately I still encounter the error, because memory gets used up.
Does anyone know how to solve it?
Thanks!
There are several things to take into account here.
You have not shown what your throughput is; you have to take that value into account and ask whether your broker on 10.99.155.100:9092 is able to process such a load.
Did you check 10.99.155.100 during the transfer? The fact that Kafka can potentially process hundreds of thousands of messages per second doesn't mean that you can do it on any hardware.
Having said that, the timeout is the first thing that comes to mind, but in your case you already allow 2 minutes and still time out; to me this sounds more like a problem in your broker than in your producer.
To understand the issue: basically, you are filling your mouth faster than you can swallow; by the time you push a message, the broker is not able to acknowledge it in time (in this case, within 2 minutes).
What you can do here:
Check the broker performance for the given load.
Change your delivery.timeout.ms to an acceptable value; I guess you have SLAs to stick to.
Increase your retry backoff timer (retry.backoff.ms).
Do not set the batch size to 0; that attempts a live push to the broker, which seems not possible for this load.
Make sure your max.block.ms is set correctly.
Change to bigger batches (even if this increases latency), but not too big: you need to sit down, check how many records you are pushing, and size the batches accordingly.
Now, some rules:
delivery.timeout.ms must be bigger than the sum of request.timeout.ms and linger.ms.
All of the above are impacted by batch.size.
If you don't have many records, but those records are huge, control max.request.size.
So, to summarize, the properties to change are the following: delivery.timeout.ms, request.timeout.ms, linger.ms, max.request.size.
Assuming the hardware is good, and also assuming that you are not sending more than you should, those should do the trick.
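A minimal sketch of such tuning with the plain Java producer (all values here are illustrative assumptions to measure against your broker and SLAs, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "10.99.155.100:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// Rule: delivery.timeout.ms >= request.timeout.ms + linger.ms
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000); // illustrative
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);   // illustrative
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);               // illustrative
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 500);       // illustrative
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);           // bigger batches instead of 0
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000);         // illustrative
props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576);   // matters only for huge records
KafkaProducer<String, String> producer = new KafkaProducer<>(props);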
The scenario:
I have thousands of requests I need to issue each day.
I know the number at the beginning of the day, and I want to send all the data about the requests to Pub/Sub, one message per request.
I want to make the requests at a constant rate: for example, if I have 172800 requests, I want to process 2 per second.
The ideal setup would involve Pub/Sub push and Cloud Run.
Using pull with long-running instances is also an option.
Any other options are also welcome.
I want to avoid running in a loop and fetching records from a database with a limit, which is how I am doing it today.
You can use batching and flow control settings to fine-tune Pub/Sub performance, which will help with processing messages at a constant rate.
Batching
A batch, within the context of Cloud Pub/Sub, refers to a group of one or more messages published to a topic by a publisher in a single publish request. Batching is done by default in the client library or explicitly by the user. The purpose for this feature is to allow for a higher throughput of messages while also providing a more efficient way for messages to travel through the various layers of the service(s). Adjusting the batch size (i.e. how many messages or bytes are sent in a publish request) can be used to achieve the desired level of throughput.
Features specific to batching on the publisher side include setElementCountThreshold(), setRequestByteThreshold(), and setDelayThreshold() as part of setBatchSettings() on a publisher client (the naming varies slightly in the different client libraries). These features can be used to finely tune the behavior of batching to find a better balance among cost, latency, and throughput.
Note: The maximum number of messages that can be published in a single batch is 1000 messages or 10 MB.
An example of these batching properties can be found in the Publish with batching settings documentation.
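A rough sketch of those settings with the Java client (the project, topic, and thresholds are assumptions; this client takes org.threeten.bp.Duration):

import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

BatchingSettings batchingSettings =
    BatchingSettings.newBuilder()
        .setElementCountThreshold(100L)            // publish once 100 messages accumulate (illustrative)
        .setRequestByteThreshold(10000L)           // or once ~10 KB accumulate (illustrative)
        .setDelayThreshold(Duration.ofMillis(100)) // or after 100 ms, whichever comes first
        .build();

Publisher publisher =
    Publisher.newBuilder(TopicName.of("my-project", "my-topic")) // hypothetical names
        .setBatchingSettings(batchingSettings)
        .build();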
Flow Control
Flow control features on the subscriber side can help control the unhealthy behavior of tasks on the pipeline by allowing the subscriber to regulate the rate at which messages are ingested. These features provide the added functionality to adjust how sensitive the service is to sudden spikes or drops of published throughput.
Some features that are helpful for adjusting flow control and other settings on the subscriber are setMaxOutstandingElementCount(), setMaxOutstandingRequestBytes(), and setMaxAckExtensionPeriod().
Examples of these settings being used can be found in the Subscribe with flow control documentation.
For more information refer to this link.
If you are using long-running instances as subscribers, you will need to set the relevant FlowControl settings, for example .setMaxOutstandingElementCount(1000L).
Once you have set it to the desired number (for example 1000), it controls the maximum number of messages the subscriber receives before pausing the message stream, as explained in the code below from this documentation:
// The subscriber will pause the message stream and stop receiving more messages from the
// server if any one of the conditions is met.
FlowControlSettings flowControlSettings =
    FlowControlSettings.newBuilder()
        // 1,000 outstanding messages. Must be >0. It controls the maximum number of messages
        // the subscriber receives before pausing the message stream.
        .setMaxOutstandingElementCount(1000L)
        // 100 MiB. Must be >0. It controls the maximum size of messages the subscriber
        // receives before pausing the message stream.
        .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
        .build();
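For the settings to take effect, they are attached to the subscriber when it is built; a sketch, assuming a hypothetical subscription name and a trivial receiver:

import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;

MessageReceiver receiver =
    (message, consumer) -> {
      // process the message, then ack so the next one can be delivered
      consumer.ack();
    };

Subscriber subscriber =
    Subscriber.newBuilder(
            ProjectSubscriptionName.of("my-project", "my-subscription"), receiver) // hypothetical names
        .setFlowControlSettings(flowControlSettings)
        .build();
subscriber.startAsync().awaitRunning();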
As per LINGER_MS_DOC in the ProducerConfig Java class:
"The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay; that is, rather than immediately sending out a record the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get "BATCH_SIZE_CONFIG" worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting "LINGER_MS_CONFIG=5" for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load."
I searched for a suggested value for linger.ms but found no recommendation for a higher value; in most places, 5 ms is mentioned for linger.ms.
For testing, I have set "batch.size" to 16384 (16 KB) and "linger.ms" to 60000 (60 seconds).
Per the doc, I expected that if I send a message of size > 16384 bytes, the producer would not wait and would send it immediately, but I am not observing that behavior.
I am sending events of size > 16384 bytes, but it still waits for 60 seconds. Am I misunderstanding the purpose of "batch.size"? My understanding of "batch.size" and "linger.ms" is that whichever threshold is met first, the batch will be sent.
If linger.ms instead acts as a minimum wait time and "batch.size" is not given preference, then I guess setting a high value for linger.ms is not right.
Here are the Kafka properties used in the yaml:
producer:
  properties:
    acks: all
    retries: 5
    batch:
      size: 16384
    linger:
      ms: 10
    max:
      request:
        size: 1046528
Responses to concurrent requests to remote actors were taking a long time: 1 request takes 300 ms, but 100 concurrent requests took almost 30 seconds to complete! So it almost looks like the requests are being executed sequentially. The request size is small, but the response size was about 120 kB in the JVM before serialization, and the response had deeply nested case classes.
The response times are similar when running on two different JVMs on the same machine. But responses are fast when in the same JVM (i.e. local actors). It is a single client making concurrent requests to one remote actor.
I see this log in the Akka debug logs. What does this indicate?
DEBUG test-app akka.remote.EndpointWriter - Drained buffer with
maxWriteCount: 50, fullBackoffCount: 546, smallBackoffCount: 2,
noBackoffCount: 1 , adaptiveBackoff: 2000
The log shows that writes to the send buffer failed. This could indicate that
the send-buffer is too small
the receive-buffer on the remote actor's side is too small
there are network issues
The send-buffer size and receive-buffer size directly limit the number of concurrent requests and responses. Increase the send and receive buffer sizes on both client and server to support the required concurrency.
If the buffer size is not adequate, Netty will wait for the buffer to be cleared before attempting to write to it again. By default there is a backoff time too, and this can be configured as well.
The settings are under remote.netty.tcp:
akka {
remote {
netty.tcp {
# Sets the send buffer size of the Sockets,
# set to 0b for platform default
send-buffer-size = 1024000b
# Sets the receive buffer size of the Sockets,
# set to 0b for platform default
receive-buffer-size = 2048000b
}
# Controls the backoff interval after a refused write is reattempted.
# (Transports may refuse writes if their internal buffer is full)
backoff-interval = 1 ms
}
}
For full configuration see Akka reference config.
Sometimes my RPI drops its internet connection, cron jobs start failing left and right, the mail queue grows incredibly fast, and once I reconnect it floods me with email.
I'd like to limit the mail queue to a fixed number, is this possible?
Yes, you can limit the maximal number of recipients and the active queue size.
In main.cf, add these config options:
qmgr_message_active_limit = 40000
qmgr_message_recipient_limit = 40000
Refer to these links for a better understanding: active limit, recipient limit.
What you want is not really possible.
I'd say a solution would be either to limit the size of the queue (using the queue_minfree parameter) or to make the system more robust against internet outages (e.g. not sending mail for every error cron produces).
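For the first option, a main.cf entry along these lines could work (the value is an illustrative assumption; queue_minfree is the minimal free space, in bytes, that the queue file system must have for Postfix to accept new mail, so it indirectly caps queue growth):
queue_minfree = 104857600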