Confluent Cloud - Can't see significant multithreading improvements in poll() - apache-kafka

I tried consuming messages from Confluent Cloud cluster using 2 approaches and I'm getting almost the same total poll() time for both-
Single threaded- Use only one consumer to sequentially read from all TopicPartitions
Multithreaded- Spawn multiple Consumers (equal to no. of TopicPartitions), assign each partition on separate consumer manually (using #assign()). Run these threads in parallel and do a #poll(). Processing of messages will also be done by the thread itself.
There is some speed increase in 2nd approach but it's mostly due to the processing part which happens in parallel. The time taken by poll() method (to fetch all ConsumerRecords, excluding the processing) is almost same in both cases.
My question- Is the poll() method in KafkaConsumer blocking on the server side in some way? Multiple consumers polling in parallel vs a single consumer polling sequentially is giving almost similar poll time. The only performance increase seems to be happening due to the processing part which happens after fetching all ConsumerRecords using poll().
NOTE: I am not using the consumer group functionality here as it is not suitable to our use-case. As mentioned, I'm manually assigning the TopicPartitions.

What could be advised when trying to do multi threading or parallel consumers are these two blogs - Confluent Cloud - Can't see significant multithreading improvements in poll() and https://www.confluent.io/blog/kafka-consumer-multi-threaded-messaging/. I was informed that the main point of running multiple concurrent consumers in a group is generally not about improving the performance of the fetch from Kafka, but improving processing throughput.

Related

Consuming messages in a Kafka topic ASAP

Imagine a scenario in which a producer is producing 100 messages per second, and we're working on a system that consuming messages ASAP matters a lot, even 5 seconds delay might result in a decision not to take care of that message anymore. also, the order of messages does not matter.
So I don't want to use a basic queue and a single pod listening on a single partition to consume messages, since in order to consume a message, the consumer needs to make multiple remote API calls and this might take time.
In such a scenario, I'm thinking of a single Kafka topic, with 100 partitions. and for each partition, I'm gonna have a separate machine (pod) listening for partitions 0 to 99.
Am I thinking right? this is my first project with Kafka. this seems a little weird to me.
For your use case, think of partitions = max number of instances of the service consuming data. Don't create extra partitions if you'll have 8 instances. This will have a negative impact if consumers need to be rebalanced and probably won't give you any performace improvement. Also 100 messages/s is very, very little, you can make this work with almost any technology.
To get the maximum performance I would suggest:
Use a round robin partitioner
Find a Parallel consumer implementation for your platform (for jvm)
And there a few producer and consumer properties that you'll need to change, but they depend your environment. For example batch.size, linger.ms, etc. I would also check about the need to set acks=all as it might be ok for you to lose data if a broker dies given that old data is of no use.
One warning: In Java, the standard kafka consumer is single threaded. This surprises many people and I'm not sure if the same is true for other platforms. So having 100s of partitions won't give any performance benefit with these consumers, and that's why it's important to use a Parallel Consumer.
One more warning: Kafka is a complex broker. It's trivial to start using it, but it's a very bumpy journey to use it correctly.
And a note: One of the benefits of Kafka is that it keeps the messages rather than delete them once they are consumed. If messages older than 5 seconds are useless for you, Kafka might be the wrong technology and using a more traditional broker might be easier (activeMQ, rabbitMQ or go to blazing fast ones like zeromq)
Your bottleneck is your application processing the event, not Kafka.
when you have ten consumers, there is overhead for connecting each consumer to Kafka so it will lower the performance.
I advise focusing on your application performance rather than message broker.
Kafka p99 Latency is 5 ms with 200 MB/s load.
https://developer.confluent.io/learn/kafka-performance/

Handling catastrophic failover in Kafka

Let's imaging a simple message processing pipeline, like on the image below:
A group of consumers listens to a topic, picks messages one by one, does some sort of processing and sends them over to the next topic.
Some messages crash the consumer or make it stuck forever (so then a liveness probe kills the consumer after timeout).
In this case a consumer is not able to commit the offset, so the malicious message gets picked up by another consumer. And also makes it crash.
Ideally we want to move the message to a dead letter topic after N such attempts.
This can be achieved by introducing a shared storage:
But this creates coupling between the services and introduces a Single Point of Failure (SPOF) which is the shared database.
I'm looking for ideas on how to work this around with stateless services.
If your context is correct with this approach (that's something you should judge, as I'm only trying to give a suggestion), please consider decoupling the consumption and the processing.
In your case, the consumer is stopped, not because it was not able to read from kafka, and/or the kafka broker wasn't able to provide messages, but because the processing of the message was too slow and/or unsuccesful.
The consumer, in fact, was correctly receiving the messages. It was the processing of them that made it be declared dead.
First of all, the KafkaConsumer javadoc block regarding this (just above the constructor summary). The second option is the one quoted here
2. Decouple Consumption and Processing
Another alternative is to have one or more consumer threads that do
all data consumption and hands off ConsumerRecords instances to a
blocking queue consumed by a pool of processor threads that actually
handle the record processing. This option likewise has pros and cons:
PRO: This option allows independently scaling the number of consumers
and processors. This makes it possible to have a single consumer that
feeds many processor threads, avoiding any limitation on partitions.
CON: Guaranteeing order across the processors requires particular care
as the threads will execute independently an earlier chunk of data may
actually be processed after a later chunk of data just due to the luck
of thread execution timing. For processing that has no ordering
requirements this is not a problem.
CON: Manually committing the position becomes harder as it requires
that all threads co-ordinate to ensure that processing is complete for
that partition.**
Esentially, works like this. The consumer keeps reading and gives the responsibility of the processing and process-timeout management to the processor threads .
The error handling of the message processing would be responsibility of the processor threads as well. For example, if a timeout is thrown or an exception occurs, the processor will send the message to your defined "dead" queue, or whatever management of this you wish to perform, without involving the consumer. Regardless of the processor threads' success or fail, the consumer will continue its job and never be considered dead for not calling poll() in the specified timeout.
You should control the amount of messages the consumer retrieves in its poll call in order not to saturate the processors. Its a game regarding how fast the processors finish their job, how many messages the consumer retrieves (max.poll.records) at each iteration, and what's the specified timeout for the consumer.
Decoupled workflow
The first element to be quoted is the queue (with a limited size, which you should also manage in order not getting too filled - OOM).
This queue would be the link between consumer and processor threads, essentially a buffer that could dynamically get bigger or smaller depending on the specific word load at each time; It would manage overloads, something like a dam, or barrier, to find a similarity.
----->WORKERTHREAD1
KAFKA <------> CONSUMER ----> QUEUE -----|
----->WORKERTHREAD2
What you get is a second queue-lag mechanism:
1. Kafka Consumer LAG (the messages still to be read from the partition/topic)
2. Queue LAG (received messages still need to be processed)
--->WORKERTHREAD1
KAFKA <--(LAG)--> CONSUMER ----> QUEUE --(LAG)--|
--->WORKERTHREAD2
The queue could be some kind of synchronized queue, such a ConcurrentLinkedQueue. for example. Or you could manage yourself the synchronization with a customized queue.
Essentially, the duties would be divided, and the consumer is given the easiest one (as its the one that is most crucial).
Responsibilities:
Consumer
consume-->send to queue
Workers
read from queue|-->[manage timeout]
|==>PROCESS MESSAGE ==> send to topic
|-->[handle failed messages]
You should also manage if the processor threads die/deadlock; but usually those mechanisms are already implemented in most of ThreadPool variants.
I suggest the workers to share a unique KafkaProducer; The producer is thread safe and since the output topic would be the same for the group of consumers, this would also increase its performance. Also from the Kafka Producer javadoc:
The producer is thread safe and sharing a single producer instance
across threads will generally be faster than having multiple
instances.
In resume, each consumer thread feeds n processor threads. Some variants could be:
- 1 consumer - 1 worker (no processing paralellization, just division of duties)
- 1 consumer - 2 workers
- 1 consumer - 4 workers
- 2 consumers - 4 workers (2 for each)
- 2 consumers - 8 workers (4 for each)
...
Read carefully the pros and contras from this mechanism in the javadoc, and judge if this could be a solution to your specific case.
In my oppinion, there's a PRO that doesn't get reflected in the docs, which is the root of this answer/suggestion:
Consumption shouldn't be affected by processing. This approach avoids any consumer thread being considered dead due to a slow processing of the messages, and offers an extra "safety-window" thanks to the queue. I'm not saying that, at the point in which all processors fail for every message, or the queue hits maximum size, for example, the consumer would continue happily as if that didn't affect it; It will in fact be stopped by processing, but much, much later and due to bigger reasons that couldn't be avoided. This approach offers some extra time, or extra shield, for that to happen. Just like a dam can fail if it can't hold any more water.
Well, hope you take this as a suggestion, and may it be helpful somehow. It may avoid most of the dead consumer issues you're having. If well managed, it's a good approach for 24/7 real time data workflow.

dealing with Kafka's exactly once processing edge-cases

Folks,
Trying to do a POC for processing messages using Kafka for an implementation which absolutely requires only once processing. Example: as a payment system, process a credit card transaction only once
What edge cases should we protect against?
One failure scenario covered here is:
1.) If a consumer fails, and does not commit that it has read through a particular offset, the message will be read again.
Lets say consumers live in Kubernetes pods, and one of the hosts goes offline. We will potentially have messages that have been processed, but not marked as processed in Kafka before the pods went away due to underlying hardware issue. Do i understand this error scenario correctly?
Are there other failure scenarios which we need to fully understand on the producer/consumer side when thinking of Kafka doing only-once processing?
Thanks!
im going to basically repeat and exand on an answer i gave here:
a few scenarios can result in duplication:
consumers only periodically checkpoint their positions. a consumer crash can result in duplicate processing of some range or records
producers have client-side timeouts. this means the producer may think a request timed out and re-transmit while broker-side it actually succeeded.
if you mirror data between kafka clusters thats usually done with a producer + consumer pair of some sort that can lead to more duplication.
there are also scenarios that end in data loss - look up "unclean leader election" (disabling that trades with availability).
also - kafka "exactly once" configurations only work if all you inputs, outputs, and side effects happen on the same kafka cluster. which often makes it of limited use in real life.
there are a few kafka features you could try using to reduce the likelihood of this happening to you:
set enable.idempotence to true in your producer configs (see https://kafka.apache.org/documentation/#producerconfigs) - incurs some overhead
use transactions when producing - incurs overhead and adds latency
set transactional.id on the producer in case your fail over across machines - gets complicated to manage at scale
set isolation.level to read_committed on the consumer - adds latency (needs to be done in combination with 2 above)
shorten auto.commit.interval.ms on the consumer - just reduces the window of duplication, doesnt really solve anything. incurs overhead at really low values.
I have to say that as someone who's been maintaining a VERY large kafka installation for the past few years I'd never use a bank that relied on kafka for its core transaction processing though ...

Kafka Random Access to Logs

I am trying to implement a way to randomly access messages from Kafka, by using KafkaConsumer.assign(partition), KafkaConsumer.seek(partition, offset).
And then read poll for a single message.
Yet i can't get past 500 messages per second in this case. In comparison if i "subscribe" to the partition i am getting 100,000+ msg/sec. (#1000 bytes msg size)
I've tried:
Broker, Zookeeper, Consumer on the same host and on different hosts. (no replication is used)
1 and 15 partitions
default threads configuration in "server.properties" and increased to 20 (io and network)
Single consumer assigned to a different partition each time and one consumer per partition
Single thread to consume and multiple threads to consume (calling multiple different consumers)
Adding two brokers and a new topic with it's partitions on both brokers
Starting multiple Kafka Consumer Processes
Changing message sizes 5k, 50k, 100k -
In all cases the minimum i get is ~200 msg/sec. And the maximum is 500 if i use 2-3 threads. But going above, makes the ".poll()", call take longer and longer (starting from 3-4 ms on a single thread to 40-50 ms with 10 threads).
My naive kafka understanding is that the consumer opens a connection to the broker and sends a request to retrieve a small portion of it's log. While all of this has some involved latency, and retrieving a batch of messages will be much better - i would imagine that it would scale with the number of receivers involved, with the expense of increased server usage on both the VM running the consumers and the VM running the broker. But both of them are idling.
So apparently there is some synchronization happening on broker side, but i can't figure out if it is due to my usage of Kafka or some inherent limitation of using .seek
I would appreaciate some hints of whether i should try something else, or this is all i can get.
Kafka is a streaming platform by design. It means there are many, many things has been developed for accelerating sequential access. Storing messages in batches is just one thing. When you use poll() you utilize Kafka in such way and Kafka do its best. Random access is something for what Kafka don't designed.
If you want fast random access to distributed big data you would want something else. For example, distributed DB like Cassandra or in-memory system like Hazelcast.
Also you could want to transform Kafka stream to another one which would allow you to use sequential way.

How many Producers can I use to write to a single topic

I have a web application which put messages into a Kafka topic. There are a lot of instances of this application (200) and each of them contains it's own Kafka Producer.
Questions:
Does there exist any upper bound of Producers amount per topic?
Does the number of Producers impact on Kafka performance? If yes, how?
What is the best practice for Producers? One synchronous producer per application, an asynchronous producer, or a custom pool of sync producers?
Is exists any upper bound of Producers amount per topic?
The only limitation I am aware of is the number of available IP addresses. It is unlikely you'd bump into any practical limit in your described application.
Does Producer amount impact on Kafka performance? If yes, how?
No, all other things being equal (traffic volume, asynchronous vs synchronous (including batch size / time constraints), etc).
Presumably there's some overhead somewhere for the connection, but its small enough that I've never managed to notice it.
What is Producer best practice (One sync producer per application, async producer or custom pool of sync producers)
Depends a whole bunch on your use case, which I am not clear on. For the most part, asynchronous > synchronous. If you choose to use asynchronous, then you have to deal with the risks of batching on the producers (ie data loss), and the delays associated with building up enough messages for a batch / waiting for the batch timeout to trigger. Those delays could be significant if your use case is sufficiently demanding.