Will a Kafka producer always wait for the value specified by linger.ms before sending a request? - apache-kafka

As per LINGER_MS_DOC in the ProducerConfig Java class:
"The producer groups together any records that arrive in between
request transmissions into a single batched request. Normally this
occurs only under load when records arrive faster than they can be
sent out. However in some circumstances the client may want to reduce
the number of requests even under moderate load. This setting
accomplishes this by adding a small amount of artificial delay; that
is, rather than immediately sending out a record the producer will
wait for up to the given delay to allow other records to be sent so
that the sends can be batched together. This can be thought of as
analogous to Nagle's algorithm in TCP. This setting gives the upper
bound on the delay for batching: once we get "BATCH_SIZE_CONFIG" worth
of records for a partition it will be sent immediately regardless of
this setting, however if we have fewer than this many bytes
accumulated for this partition we will 'linger' for the specified time
waiting for more records to show up. This setting defaults to 0 (i.e.
no delay). Setting "LINGER_MS_CONFIG=5" for example, would have the
effect of reducing the number of requests sent but would add up to 5ms
of latency to records sent in the absence of load."
I searched for a suggested value for linger.ms but nowhere found a higher value recommended; in most places 5 ms is mentioned for linger.ms.
For testing, I have set batch.size to 16384 (16 KB) and linger.ms to 60000 (60 seconds).
As per the doc, I assumed that if I send a message larger than 16384 bytes, the producer would not wait and would send the message immediately, but I am not observing that behavior.
I am sending events larger than 16384 bytes, yet the producer still waits for 60 seconds. Am I misunderstanding the purpose of batch.size? My understanding of batch.size and linger.ms is that whichever condition is met first, the message/batch will be sent.
In this case, if linger.ms is effectively a minimum wait time and no preference is given to batch.size, then I guess setting a high value for linger.ms is not right.
Here are the Kafka properties used in the YAML:
producer:
  properties:
    acks: all
    retries: 5
    batch:
      size: 16384
    linger:
      ms: 10
    max:
      request:
        size: 1046528
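
For reference, here is a minimal sketch of the same settings on a plain Java producer; the broker address, topic, and serializers are illustrative assumptions, not taken from the question:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class LingerBatchDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative broker
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, 5);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // 16 KB per-partition batch buffer
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);     // wait up to 10 ms for the batch to fill
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1046528);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A batch is sent when it fills to batch.size or when linger.ms
            // elapses, whichever comes first.
            producer.send(new ProducerRecord<>("test", "key", "value"));
        }
    }
}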

Related

Kafka producer timeout exception

[1] 2022-01-18 21:56:10,280 ERROR [org.apa.cam.pro.err.DefaultErrorHandler] (Camel (camel-1) thread #9 - KafkaProducer[test]) Failed delivery for (MessageId: 95835510BC9E9B2-0000000000134315 on ExchangeId: 95835510BC9E9B2-0000000000134315). Exhausted after delivery attempt: 1 caught: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
[1]
[1] Message History (complete message history is disabled)
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] RouteId ProcessorId Processor Elapsed (ms)
[1] [route1 ] [route1 ] [from[netty://udp://0.0.0.0:8080?receiveBufferSize=65536&sync=false] ] [ 125320]
[1] ...
[1] [route1 ] [to1 ] [kafka:test?brokers=10.99.155.100:9092&producerBatchSize=0 ] [ 0]
[1]
[1] Stacktrace
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] : org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
Here's the flow for my project:
1. External Service ---> Netty
2. Netty ---> Kafka (producer)
3. Kafka (consumer) ---> processing events
Steps 1 and 2 run in one Kubernetes pod; step 3 runs in a separate pod.
I encountered a TimeoutException at the beginning, saying:
org.apache.kafka.common.errors.TimeoutException: Expiring 20 record(s) for test-0:121924 ms has passed since batch creation
I searched online and found a couple of potential solutions:
Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time
Based on the suggestions, I have:
- made the timeout bigger, doubling the default value
- set the batch size to 0, which does not send events in batches and keeps memory usage low
Unfortunately, I still encounter the error because memory is used up.
Does anyone know how to solve it?
Thanks!
There are several things to take into account here.
You haven't shown what your throughput is; you have to take that value into account and check whether your broker on 10.99.155.100:9092 is able to process such a load.
Did you check 10.99.155.100 during the transfer? The fact that Kafka can potentially process hundreds of thousands of messages per second doesn't mean you can do it on any hardware.
Having said that, the timeout is the first thing that comes to mind, but in your case you have 2 minutes and you are still timing out; to me, this sounds more like a problem in your broker than in your producer.
To understand the issue: basically, you are filling your mouth faster than you can swallow; by the time you push a message, the broker is not able to acknowledge it in time (in this case, 2 minutes).
Things you can do here:
- Check the broker performance for the given load.
- Change your delivery.timeout.ms to an acceptable value; I guess you have SLAs to meet.
- Increase your retry backoff timer (retry.backoff.ms).
- Do not set the batch size to 0; this will attempt a live push to the broker, which in this case seems not possible for the load.
- Make sure your max.block.ms is set correctly.
- Change to bigger batches (even if this increases latency), but not too big; you need to sit down, check how many records you are pushing, and size the batches accordingly.
Now, some rules:
- delivery.timeout.ms must be bigger than the sum of request.timeout.ms and linger.ms.
- All of the above are impacted by batch.size.
- If you don't have that many records, but those records are huge, then control max.request.size.
So, to summarize, the properties to change are the following: delivery.timeout.ms, request.timeout.ms, linger.ms, max.request.size.
Assuming the hardware is good, and also assuming that you are not sending more than you should, those should do the trick.
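
As a rough sketch, the suggested tuning could look like this in Java; every value below is an illustrative placeholder to be adjusted against your SLAs and measured load, not a recommendation:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class TunedProducerProps {
    static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "10.99.155.100:9092");
        // Rule from above: delivery.timeout.ms >= request.timeout.ms + linger.ms
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 180_000); // placeholder
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 60_000);   // placeholder
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);               // placeholder
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 500);        // placeholder
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60_000);         // placeholder
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65_536);           // bigger batches, never 0
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1_048_576);  // cap for huge records
        return props;
    }
}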

JMeter JSR223 PreProcessor and Sampler

I have a JSR223 PreProcessor in the Concurrency Thread Group which creates data to send to the Kafka producer, and I have a JSR223 Sampler that uses Kafka client 2.7.0 to send messages to Kafka.
The message sent to Kafka should be different each time; e.g., it has device information, which should differ, and events with a time (the current time). These are generated without any issues, as I tested with a few (50) threads. The problem I am having is when I want to send more messages, e.g. 6000 messages per second. How do I resolve this issue?
Below is my setup.
You're showing us a screenshot of the Concurrency Thread Group configured to start 6000 threads (virtual users) and hold them for 20 seconds.
It will result in 6000 messages per second only if your JSR223 PreProcessor and Sampler cumulative response time is exactly 1 second. If it is less, you will generate more messages per second, and vice versa.
For example:
if PreProcessor and Sampler execution time is 500 ms, you will end up with 12000 messages per second
if PreProcessor and Sampler execution time is 2000 ms, you will send only 3000 messages per second
If you're sending fewer messages than you need, consider following the JMeter Best Practices; at the very least, disable all the Listeners and run your test in non-GUI mode. Still not enough? Increase the concurrency. Increased concurrency, lacking resources, and still not enough? Go for Distributed Testing.
If you're sending more than 6000 messages per second, you can limit the JMeter sampler execution rate to the desired throughput using the Throughput Shaping Timer.
You can see your current throughput using, for example, the Transactions per Second listener.
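
For reference, the Kafka client calls inside such a JSR223 Sampler boil down to something like the following (shown as plain Java for brevity; in JMeter this would typically be written in Groovy, and the broker address and topic name are illustrative):

import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SamplerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("key.serializer", StringSerializer.class);
        props.put("value.serializer", StringSerializer.class);

        // In a real test, create the producer once and reuse it across samples;
        // creating it per sample dominates the response time and therefore the
        // achieved rate (threads / cumulative response time).
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String payload = "{\"deviceId\":\"" + UUID.randomUUID()
                    + "\",\"ts\":" + System.currentTimeMillis() + "}"; // unique per iteration
            producer.send(new ProducerRecord<>("events", payload));    // "events" topic is illustrative
        }
    }
}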

Does Kafka consumer fetch-min-size (fetch.min.bytes) wait for the mentioned size to get filled?

Suppose there are 107 records and each record is 1 KB. If the fetch size is 15 KB, then in 7 iterations 105 KB would be consumed. Now only 2 KB remain: will I get the remaining 2 records in the next iteration, or will the consumer wait for another 15 KB to accumulate? Assume no further records arrive after these.
It waits until the time defined in fetch.max.wait.ms is reached; this value is set to 500 ms by default. You can find the descriptions of the two relevant configurations in the Kafka documentation on the consumer:
fetch.min.bytes
The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request. The default setting of 1 byte means that fetch requests are answered as soon as a single byte of data is available or the fetch request times out waiting for data to arrive. Setting this to something greater than 1 will cause the server to wait for larger amounts of data to accumulate which can improve server throughput a bit at the cost of some additional latency.
fetch.max.wait.ms
The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes.
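
A minimal sketch of the two settings on a plain Java consumer, using the sizes from the question (broker, group id, and topic are illustrative):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FetchMinBytesDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // illustrative
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 15 * 1024); // 15 KB, as in the question
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);     // broker answers after 500 ms at the latest

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("test"));
            // The fetch returns once 15 KB have accumulated OR 500 ms have
            // passed, whichever comes first -- so the last 2 records arrive
            // after at most fetch.max.wait.ms, not once another 15 KB exist.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
        }
    }
}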

Kafka Connect fetch.max.wait.ms & fetch.min.bytes combined not honored?

I'm creating a custom SinkConnector using Kafka Connect (2.3.0) that needs to be optimized for throughput rather than latency. Ideally, what I want is:
Batches of ~20 megabytes or 100k records, whichever comes first; but if the message rate is low, process at least every minute (avoid small batches, but have MySinkTask.put() called at least once every minute).
This is what I set for consumer settings in an attempt to accomplish it:
consumer.max.poll.records=100000
consumer.fetch.max.bytes=20971520
consumer.fetch.max.wait.ms=60000
consumer.max.poll.interval.ms=120000
consumer.fetch.min.bytes=1048576
I need this fetch.min.bytes setting, or else MySinkTask.put() is called multiple times per second despite the other settings.
Now, what I observe in a low-rate situation is that MySinkTask.put() is called with 0 records multiple times and several minutes pass by until fetch.min.bytes is reached, and then I get them all at once.
I fail to understand so far:
Why is fetch.max.wait.ms=60000 not propagating down from the consumer to the put() call of my connector? Shouldn't it take precedence over fetch.min.bytes?
What setting controls the ~2x-per-second calls to MySinkTask.put() when fetch.min.bytes=1 (the default)? I don't understand why it does that; even the verbose output of the Connect runtime settings doesn't show any interval below multiples of seconds.
I've double-checked the log output, and the INFO o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values: lines printed by the Connect runtime show the expected values for the consumer.-prefixed settings I pass.
The "process at least every interval" part seems not possible, as the fetch.min.bytes consumer setting takes precedence and Connect does not allow you to dynamically adjust the ConsumerConfig while the Task is running. :-(
Work-around for now is batching in the Task manually; set fetch.min.bytes to 1 (yikes), buffer records in the Task on put() calls, and flush when necessary. This is not very ideal as it infers some overhead for the Connector which I hoped to avoid.
The logic how Connect does a ~ 2x per second batching from its consumer's poll to SinkTask.put() remains a mystery to me, but it's better than being called for every message.
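
A minimal sketch of that manual-buffering workaround, with hypothetical thresholds and a hypothetical writeBatch() standing in for the sink-specific write:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class MySinkTask extends SinkTask {
    private final List<SinkRecord> buffer = new ArrayList<>();
    private long lastFlushMs = System.currentTimeMillis();
    private static final int MAX_RECORDS = 100_000;     // hypothetical threshold
    private static final long MAX_INTERVAL_MS = 60_000; // flush at least once a minute

    @Override
    public void put(Collection<SinkRecord> records) {
        buffer.addAll(records); // put() may arrive with 0 records
        boolean full = buffer.size() >= MAX_RECORDS;
        boolean stale = System.currentTimeMillis() - lastFlushMs >= MAX_INTERVAL_MS;
        if (!buffer.isEmpty() && (full || stale)) {
            writeBatch(buffer); // hypothetical sink-specific bulk write
            buffer.clear();
            lastFlushMs = System.currentTimeMillis();
        }
    }

    private void writeBatch(List<SinkRecord> batch) {
        // sink-specific bulk write goes here
    }

    @Override public String version() { return "1.0"; }
    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }
}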

Kafka producer future metadata in callback

In my application, when I send messages, I use the RecordMetadata in the callback to save the offset of the record for future use. However, sometimes metadata.offset() returns -1, which makes things hard later.
Why does this happen, and is there a way to get the offset without consuming the topic to find it?
Edit: I am currently on acks=0; when I switch to acks=1 I no longer get these errors, but my performance drops drastically: from 100k messages in 10 seconds to 1 minute.
acks=0 If set to zero then the producer will not wait for any
acknowledgment from the server at all. The record will be immediately
added to the socket buffer and considered sent. No guarantee can be
made that the server has received the record in this case, and the
retries configuration will not take effect (as the client won't
generally know of any failures). The offset given back for each
record will always be set to -1.
This is not exactly true, as out of 100k messages I got 95k with offsets, but I guess that's normal.
I will still need to find another solution to get the offset with acks=0.
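
For reference, a minimal sketch of reading the offset from the send callback once acks is at least 1, which is the trade-off the edit above describes (broker and topic are illustrative); with acks=0 the broker never reports back, so metadata.offset() has nothing real to return:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OffsetCallbackDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
        props.put(ProducerConfig.ACKS_CONFIG, "1"); // leader ack: RecordMetadata carries a real offset
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key", "value"), (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("offset=" + metadata.offset());
                }
            });
        }
    }
}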