How to apply back pressure to a Kafka producer? - apache-kafka

Back pressure can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue.
In RabbitMQ it can be applied by setting a queue length limit.
How can it be done with Kafka?
Can this be done by keeping a rate limiter (token bucket) between the Kafka producer and the broker, where the current bucket size and refill rate are set dynamically from the consumer's rate of consumption? For example, a REST API on the producer receiving from the consumer the rate at which it is processing messages.
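For illustration, here is a minimal sketch of that idea, assuming Guava's RateLimiter as the token bucket; the rate-report hook, the topic name, and the initial rate are hypothetical placeholders:

    import com.google.common.util.concurrent.RateLimiter;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ThrottledSender {
        // Token bucket in front of the producer; initial rate is a placeholder.
        private final RateLimiter bucket = RateLimiter.create(1000.0);
        private final KafkaProducer<String, String> producer;

        public ThrottledSender(KafkaProducer<String, String> producer) {
            this.producer = producer;
        }

        // Hypothetical hook: called whenever the consumer reports its processing
        // rate (e.g. via a REST endpoint exposed by the producer application).
        public void onConsumerRateReport(double messagesPerSecond) {
            bucket.setRate(messagesPerSecond); // refill rate follows consumption rate
        }

        public void send(String value) {
            bucket.acquire(); // blocks until a token is available
            producer.send(new ProducerRecord<>("my-topic", value));
        }
    }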

Load is typically distributed amongst brokers, so client-side back pressure from the producer may not be necessary.
But you can add quotas per client to throttle their requests.
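As a sketch of how a quota can be set, assuming Kafka 2.6+ where the AdminClient exposes alterClientQuotas; the client id and byte rate here are placeholders:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.quota.ClientQuotaAlteration;
    import org.apache.kafka.common.quota.ClientQuotaEntity;

    public class ProducerQuota {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // Throttle the producer with client.id "my-producer" to ~1 MB/sec.
                ClientQuotaEntity entity = new ClientQuotaEntity(
                        Map.of(ClientQuotaEntity.CLIENT_ID, "my-producer"));
                ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                        entity,
                        List.of(new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0)));
                admin.alterClientQuotas(List.of(alteration)).all().get();
            }
        }
    }

The broker then throttles requests from that client id, which pushes back on the producer without any client-side changes.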

Related

Kafka reactor consumer high memory usage

We built a Java application with the Kafka reactor consumer. We tried to fetch messages from the Kafka broker as quickly as possible in each consumer, so we set fetch.max.wait.ms to a low number like 2 or 4. We see frequent GC due to high memory usage per consumer thread.
Once we plugged in VisualVM we found each consumer had a high allocated bytes/sec. We tried stopping the producer, but the memory usage didn't stop, and it looks very consistent across different consumer threads (some of the partitions the consumers are assigned should have zero new messages). Increasing fetch.max.wait.ms reduces the usage, but it doesn't explain the high usage when there are no new messages since the last fetch. Is it possible that the Kafka consumer poll also carries some historical buffer? (Reducing the VisualVM sampling frequency doesn't change the result.)

How to get the load of my Kafka topic. Is there any API with which I can measure the load on my Kafka topic?

I have a Kafka topic in my Spring Boot application to which I am sending some data from a producer. I want to check the load on my topic so that I can create new topics if the load on the previous topic exceeds some threshold.
Topics don't have "load" in the traditional sense. Sure, you can use JMX metrics to measure incoming byte rates, but that is network load, as measured by the broker. You can also measure outgoing rates by the producer, per partition, and aggregate to get data by topic.
The brokers hosting the partitions do have load: measurable network, disk, and CPU load.
Secondly, your producers would all need to be updated to actually send data to those new topics you'd created; neither they nor the brokers would know to "distribute load" to them.
The correct way to reduce broker load and distribute data to more brokers is to increase the cluster size; the correct way to scale production is to add more partitions.
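As a sketch of the JMX route mentioned above, assuming the broker exposes JMX on port 9999 (e.g. started with the JMX_PORT environment variable) and a topic named my-topic; BytesInPerSec under kafka.server:type=BrokerTopicMetrics is a standard broker metric:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TopicByteRate {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                // Per-topic incoming byte rate, as measured by this broker.
                ObjectName bytesIn = new ObjectName(
                        "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=my-topic");
                Double oneMinuteRate = (Double) mbeans.getAttribute(bytesIn, "OneMinuteRate");
                System.out.printf("my-topic bytes in/sec (1-minute avg): %.1f%n", oneMinuteRate);
            }
        }
    }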

Kafka Reduce Lag for Consumer

I am setting up a new Kafka cluster, and for testing purposes I created a topic with 1 partition and 3 replicas.
Now I am firing messages via the producer in parallel, say 50K messages per second, and I have created one consumer inside a group, which is only able to fetch 30K messages per second.
I can change topic-level, partition-level, and consumer-level configurations.
I am monitoring everything via Grafana + Prometheus.
Any idea which configuration or something else can help me consume more data?
Thanks in advance.
A Kafka consumer polls the broker for messages and fetches whatever messages are available for consumption, depending upon the consumer configuration used. In general, it is efficient to transfer as much data as possible in a single poll request if increasing throughput is your aim. But how much data is transferred in a single poll is determined by the size of the messages, the number of records, and some parameters which control how long to wait for messages to become available.
In general, you can influence throughput using one or more of the following consumer configurations (a combined sketch follows the list):
fetch.min.bytes
max.partition.fetch.bytes
fetch.max.bytes
max.poll.records
fetch.max.wait.ms
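Here is a minimal sketch combining all five settings; the values are illustrative assumptions that would need tuning against your actual message sizes, not recommendations:

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ThroughputTunedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "load-test-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1_048_576);          // wait for ~1 MB per fetch
            props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 5_242_880); // up to 5 MB per partition
            props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 52_428_800);          // up to 50 MB per fetch overall
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 2000);               // more records per poll()
            props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);               // max wait to fill fetch.min.bytes
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic"));
                // ... poll loop as usual ...
            }
        }
    }

Also note that with a single partition, one consumer is the ceiling no matter how these are tuned; adding partitions (and consumers in the group) is the other lever.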

Back pressure in Kafka

I have a situation in Kafka where the producer publishes messages at a much higher rate than the consumer can consume them. I have to implement back pressure in Kafka for further consumption and processing.
Please let me know how I can implement it in Spark and also in the normal Java API.
Kafka acts as the regulator here. You produce at whatever rate you want into Kafka, scaling the brokers out to accommodate the ingest rate. You then consume as you want to; Kafka persists the data and tracks the offsets of the consumers as they work their way through the data they read.
You can disable auto-commit with enable.auto.commit=false on the consumer and commit only when the consumer's processing is finished. That way the consumer will be slower, but Kafka knows exactly how many messages the consumer has processed. If you also configure the poll interval with max.poll.interval.ms and the number of messages to be consumed in each poll with max.poll.records, you should be good.
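A minimal sketch of that manual-commit pattern; the topic name, group id, and the process() step are placeholders:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class BackPressuredConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "slow-processing-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // commit manually
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);         // small batches per poll()
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000); // allow slow processing

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record);
                    }
                    consumer.commitSync(); // advance offsets only after the batch is processed
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            // placeholder for the application's (slow) processing step
        }
    }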

How to set Kafka Producer message rate per second?

I am reading a CSV file and giving its rows to my Kafka producer. Now I want my Kafka producer to produce messages at a rate of 100 messages per second.
Take a look at the linger.ms and batch.size properties of the Kafka producer.
You have to adjust these properties accordingly to get the desired rate.
The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay—that is, rather than immediately sending out a record the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get batch.size worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting linger.ms=5, for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load.
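For instance, a producer configured as below batches for up to 10 ms or 16 KB per partition, whichever comes first. The values are illustrative, and note that these settings shape batching rather than enforce an exact messages-per-second rate:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.LINGER_MS_CONFIG, 10);      // wait up to 10 ms for more records...
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16_384); // ...unless 16 KB accumulates first
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("my-topic", "hello"));
            }
        }
    }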
If you like stream processing then akka-streams has nice support for throttling: http://doc.akka.io/docs/akka/current/java/stream/stream-quickstart.html#time-based-processing
Then the akka-stream-kafka (aka reactive-kafka) library allows you to connect the two together: http://doc.akka.io/docs/akka-stream-kafka/current/home.html
In the Kafka JVM producer, throughput depends upon multiple factors, and it's most commonly measured in MB/sec rather than msg/sec. In your example, if each row of the CSV is, say, 1 MB in size, then you need to tune your producer configs to achieve 100 MB/sec so that you can reach your target throughput of 100 msg/sec. While tuning producer configs, you have to take into consideration your batch.size (measured in bytes) config value. If it's set too low, the producer will send messages more often and wait for a reply from the server on each request; this improves latency but reduces throughput. If you are using an async callback-based producer, your overall throughput will be limited by how many messages the producer can send before waiting for a reply from the server, determined by max.in.flight.requests.per.connection.
If you keep batch.size too high, producer throughput can also be affected, since after waiting for the linger.ms period the producer will send all the messages accumulated in a batch to the broker for that particular partition at once. And a bigger batch.size means a bigger buffer.memory, which might put pressure on GC.
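If a hard cap of 100 messages per second is the actual goal, a client-side token bucket in front of send() is a more direct route than batch tuning. A sketch assuming Guava's RateLimiter and a hypothetical input.csv:

    import com.google.common.util.concurrent.RateLimiter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PacedCsvProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            List<String> rows = Files.readAllLines(Path.of("input.csv")); // hypothetical input file
            RateLimiter limiter = RateLimiter.create(100.0); // 100 permits (messages) per second

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (String row : rows) {
                    limiter.acquire(); // blocks so sends average out to 100/sec
                    producer.send(new ProducerRecord<>("my-topic", row));
                }
                producer.flush();
            }
        }
    }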