Force assignment of ALL partitions - apache-kafka

My topic has many partitions, many producers, but only one consumer. The consumer can run only for a short period of time, let's say one minute. During this period, I need to make sure it will consume all the records from all the partitions that were produced before the consumer was initialized, but ALSO the records produced during the minute the consumer was running.
The problem is that I can't find the correct partition assignment strategy that guarantees I will get all the partitions. If I use consumer.Subscribe(topic), I only get some partitions, not all. If I use consumer.Assign(partition) during initialization, I WILL get all the active partitions, but if a new partition comes along and receives records, I will miss those.
The only solution I have so far is to re-do the assignments periodically (every 10 seconds).

If I use consumer.Subscribe(topic), I will only get some partitions, not all
It should get them all. If you don't, then you more than likely have an un-closed consumer instance in the same consumer group that has already been assigned the other partitions.
You can periodically run kafka-consumer-groups --describe command to inspect this.
Using assignment doesn't use the consumer group protocol, which is why it works.
There is no guarantee Kafka can provide that your consumer will read any/all data between two points in time. You'd need to track this on your own, and it may require your consumer instance to run for longer than you expect.
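For reference, here is a rough sketch (Java client; topic name, timings, and configuration values are illustrative, not from the question) of the periodic re-assignment workaround mentioned above: query partitionsFor() every 10 seconds and extend the manual assignment, so partitions created while the consumer is running are picked up as well.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class AssignAllPartitions {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // newly assigned partitions are read from their very first record
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            long deadline = System.currentTimeMillis() + 60_000;   // the one-minute run
            long lastRefresh = 0;
            Set<TopicPartition> assigned = new HashSet<>();

            while (System.currentTimeMillis() < deadline) {
                // every 10 seconds, re-check the partition list and extend the assignment
                if (System.currentTimeMillis() - lastRefresh >= 10_000) {
                    for (PartitionInfo p : consumer.partitionsFor("my-topic")) {
                        assigned.add(new TopicPartition(p.topic(), p.partition()));
                    }
                    consumer.assign(assigned);   // partitions that stay assigned keep their position
                    lastRefresh = System.currentTimeMillis();
                }
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // process the record
                }
            }
        }
    }
}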

Related

Is there a limit on consumers and consumer groups?

Is there any limit on the number of consumers or consumer groups in Kafka?
I am planning to push 200 MB of data every 10 mins to a topic and have 200+ distinct consumers listen and consume from this topic. Is there any other recommended way to do this?
As Rohit's answer states, there's no such limit.
Regarding your issue, it seems like you want to achieve some kind of parallelization of consumption. If you launch 200 consumers with 200 different consumer groups, each consumer will read all the data independently, so you'll have 200 threads reading the same 200 MB every 10 minutes (200 x 200 MB = 40 GB received every 10 minutes). I guess you wanted every consumer to read 1 MB every 10 minutes with your approach, but that's not how it works.
If the logic implemented by each consumer is the same, you shouldn't declare more than one consumer group. If you declare two consumer groups, each one will read the same data, and you'll just repeat the job done, duplicating the output. Set different consumer groups only if the job to be done on the topic's records is different: for example, one consumer group must store the records into a database, while the other consumer group must visualize the data in Grafana. Those are two different processing mechanisms, so each one must read all the data on its own. This is not the only reason to declare different consumer groups, just one example; there are multiple justifications for declaring more than one consumer group for a topic.
Imagine a scenario where the only job to be done is storing the messages into a database. If you declare two consumer groups and launch your consumers, what you'll get is duplicate values stored in your database, as the first consumer group is just doing the same work as the second. Not only are you re-reading from Kafka, you are re-storing the same messages into the database.
In order to launch multiple consumers that efficiently share the work (so that, for example, 4 consumers each read 50 MB), you must partition your topic.
Only one consumer thread from the same consumer group can read from a specific partition. If you have 4 partitions in that topic and 4 consumer threads that share the same consumer group, launching them will lead to each thread reading from one partition. If you launch two consumers, each will be assigned 2 partitions.
And in this scenario, you do have a limit on the number of consumers concurrently reading if they share the same consumer group: the number of partitions of that topic. If you launch a 5th consumer thread, it will block/wait, because it wasn't assigned any partition. In this example, consumer 5 waits until a partition becomes available for it (so it may wait forever).
What I suggest is: decide how many consumer threads you'll need to consume the data and partition the topic based on that. If you, for example, partition the topic into 8 partitions, you'll be able to launch 8 consumers from the same consumer group. Each one will then read, more or less (depending on the producer's partitioner), 25 MB (200/8) of the incoming data, efficiently sharing the workload: each consumer will read from its own partition.
If you launch 200 consumers with 200 different consumer groups,
you'll just multiply the work to be done by 200, as every single consumer will read the data from start to end.
If you launch 200 consumers with the same consumer group and the topic has a single partition,
you'll have one thread doing all the work and 199 idle consumers.
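As a rough illustration of the suggestion above (topic name, group id, partition count, and addresses are all made up): create the topic with 8 partitions once, then start 8 copies of the same consumer program, all sharing one group.id, so each instance ends up with roughly one eighth of the data.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class SharedGroupConsumer {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(adminProps)) {
            // 8 partitions, replication factor 1 (a local test value)
            admin.createTopics(List.of(new NewTopic("ingest-topic", 8, (short) 1))).all().get();
        } catch (ExecutionException e) {
            // topic already exists: another instance created it first, which is fine
        }

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "ingest-workers");   // the SAME group for all 8 instances
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("ingest-topic"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // this instance only sees the partitions assigned to it (~1/8 of the records)
                }
            }
        }
    }
}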
In Kafka, there is no limit on the number of Consumer groups for a particular topic. However, the increase in consumer groups increases network utilization.
Worth noting that newer versions of Kafka store offsets in the internal Kafka topic called __consumer_offsets.

Kafka message partitioning by key

We have a business process/workflow that is started when an initial event message is received and closed when the last message is processed. We have up to 100,000 processes executed each day. My problem is that the messages belonging to a specific process have to be processed in the same order they were received. If one of the messages fails, the process has to freeze until the problem is fixed, while all other processes continue. For this kind of situation I am thinking of using Kafka. The first solution that came to my mind was to use topic partitioning by message key. The key of the message would be the ProcessId. This way I could be sure that all of a process's messages would go to the same partition and Kafka would guarantee the order. As I am new to Kafka, what I managed to figure out is that partitions have to be created in advance, and that makes everything too difficult. So my questions are:
1) When I produce a message to a Kafka topic that does not exist, the topic is created at runtime. Is it possible to have the same behavior for topic partitions?
2) There can be more than 100,000 active partitions on the topic; is that a problem?
3) Can a partition be deleted after all messages from that topic were read?
4) Maybe you can suggest other approaches to my problem?
When I produce a message to a Kafka topic that does not exist, the topic is created at runtime. Is it possible to have the same behavior for topic partitions?
You need to specify the number of partitions while creating a topic. New partitions won't be created automatically (as is the case with topic creation); you have to change the number of partitions using the topic tool.
More info: https://kafka.apache.org/documentation/#basic_ops_modify_topic
As soon as you increase the number of partitions, the producer and consumer will be notified of the new partitions, leading them to rebalance. Once rebalanced, the producer and consumer will start producing to and consuming from the new partitions.
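If you prefer doing this from code rather than the topic tool, the Admin client exposes the same operation; a small sketch (topic name and target count are made up):

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;
import java.util.Map;
import java.util.Properties;

public class AddPartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // grow "orders" to 12 partitions in total; existing data stays where it is
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(12))).all().get();
        }
    }
}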
there can be more than 100,000 active partitions on the topic, is that a problem?
Yes, having that many partitions will increase overall latency.
Go through how-choose-number-topics-partitions-kafka-cluster on how to decide number of partitions.
can partition be deleted after all messages from that topic were read?
Deleting a partition would lead to data loss, and the remaining data's keys would not be distributed correctly, so new messages would not get directed to the same partitions as old existing messages with the same key. That's why Kafka does not support decreasing the partition count on a topic.
Also, the Kafka docs state that
Kafka does not currently support reducing the number of partitions for a topic.
I suppose you chose the wrong feature to solve your task.
In general, partitioning is used for load balancing.
Incoming messages will be distributed over the given number of partitions according to the partitioning strategy configured on the producer. In short, the default strategy just calculates i = key_hash mod number_of_partitions and puts the message into the i-th partition. You can read more about strategies here.
Message ordering is guaranteed only within a partition. With two messages from different partitions, you have no guarantee which one reaches the consumer first.
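To make the keying idea concrete, here is a minimal producer sketch (topic name, key, and values are hypothetical): records that share the same ProcessId key are hashed to the same partition by the default partitioner, so they stay in order relative to each other.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class ProcessEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String processId = "process-42";   // hypothetical ProcessId used as the key
            // both records hash to the same partition, so their relative order is preserved
            producer.send(new ProducerRecord<>("workflow-events", processId, "step-1"));
            producer.send(new ProducerRecord<>("workflow-events", processId, "step-2"));
        }
    }
}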
You would probably use consumer groups instead. It's a consumer-side option.
Each group consumes all messages from topic independently.
Group could consist of one consumer or more if you need it.
You could assign many groups and add new group (in fact, add new consumer with new groupId) dynamically.
As you can stop/pause any consumer, you could manually stop all consumers related to a specified group. I suppose there is no single command to do that, but I'm not sure. Anyway, if you have a single consumer in each group you can stop it easily.
If you want to remove a group, you just shut down and drop the related consumers. No action on the broker side is needed.
As a drawback, you'll get 100,000 consumers reading a (single) topic. That's a heavy network load, at the very least.

Get latest values from a topic on consumer start, then continue normally

We have a Kafka producer that produces keyed messages in a very high frequency to topics whose retention time = 10 hours. These messages are real-time updates and the used key is the ID of the element whose value has changed. So the topic is acting as a changelog and will have many duplicate keys.
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc.), it will somehow construct a table with the latest values of all the keys in a topic, and then keep listening for new updates as normal, keeping the minimum load on the Kafka server and letting the consumer do most of the job. We tried many ways and none of them seems ideal.
What we tried:
1 changelog topic + 1 compact topic:
The producer sends the same message to both topics wrapped in a transaction to assure successful send.
Consumer launches and requests the latest offset of the changelog topic.
Consumes the compacted topic from beginning to construct the table.
Continues consuming the changelog since the requested offset.
Cons:
Having duplicates in the compacted topic is very likely, even when setting the log compaction frequency as high as possible.
Twice the number of topics on the Kafka server.
KSQL:
With KSQL we either have to rewrite a KTable as a topic so that the consumer can see it (extra topics), or we would need consumers to execute KSQL SELECT queries against the KSQL REST Server and query the table (not as fast and performant as the Kafka APIs).
Kafka Consumer API:
Consumer starts and consumes the topic from beginning. This worked perfectly, but the consumer has to consume the 10 hours change log to construct the last values table.
Kafka Streams:
By using KTables as follows:
KTable<Integer, MarketData> tableFromTopic = streamsBuilder.table("topic_name", Consumed.with(Serdes.Integer(), customSerde));
KTable<Integer, MarketData> filteredTable = tableFromTopic.filter((key, value) -> keys.contains(value.getRiskFactorId()));
Kafka Streams will create 1 topic on the Kafka server per KTable (named {consumer_app_id}-{topic_name}-STATE-STORE-0000000000-changelog), which will result in a huge number of topics since we have a big number of consumers.
From what we have tried, it looks like we need to either increase the server load, or the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Thanks in advance.
By using KTables, Kafka Streams will create 1 topic on the Kafka server per KTable, which will result in a huge number of topics since we have a big number of consumers.
If you are just reading an existing topic into a KTable (via StreamsBuilder#table()), then no extra topics are being created by Kafka Streams. Same for KSQL.
It would help if you could clarify what exactly you want to do with the KTable(s). Apparently you are doing something that does result in additional topics being created?
1 changelog topic + 1 compact topic:
Why were you thinking about having two separate topics? Normally, changelog topics should always be compacted. And given your use case description, I don't see a reason why it should not be:
Now, what we're trying to achieve is that when a Kafka consumer launches, regardless of the last known state (new consumer, crashed, restart, etc..), it will somehow construct a table with the latest values of all the keys in a topic, and then keeps listening for new updates as normal [...]
Hence compaction would be very useful for your use case. It would also prevent this problem you described:
Consumer starts and consumes the topic from beginning. This worked perfectly, but the consumer has to consume the 10 hours change log to construct the last values table.
Note that, to reconstruct the latest table values, all three of Kafka Streams, KSQL, and the Kafka Consumer must read the table's underlying topic completely (from beginning to end). If that topic is NOT compacted, this might indeed take a long time depending on the data volume, topic retention settings, etc.
From what we have tried, it looks like we need to either increase the server load, or the consumer launch time. Isn't there a "perfect" way to achieve what we're trying to do?
Without knowing more about your use case, particularly what you want to do with the KTable(s) once they are populated, my answer would be:
Make sure the "changelog topic" is also compacted.
Try KSQL first. If this doesn't satisfy your needs, try Kafka Streams. If this doesn't satisfy your needs, try the Kafka Consumer.
For example, I wouldn't use the Kafka Consumer if it is supposed to do any stateful processing with the "table" data, because the Kafka Consumer lacks built-in functionality for fault-tolerant stateful processing.
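Following up on the first recommendation, here is a hedged sketch of enabling compaction on an existing topic via the Admin client (the topic name is illustrative; the same change can be made with the kafka-configs tool):

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CompactTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "market-data");
            Map<ConfigResource, Collection<AlterConfigOp>> updates = new HashMap<>();
            // switch the topic to log compaction so only the latest value per key is retained
            updates.put(topic, List.of(
                    new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact"), AlterConfigOp.OpType.SET)));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}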
Consumer starts and consumes the topic from beginning. This worked perfectly, but the consumer has to consume the 10 hours change log to construct the last values table.
The first time your application starts up, what you said is correct.
To avoid this during every restart, store the key-value data in a file.
For example, you might want to use a persistent map (like MapDB).
Since you give the consumer a group.id and you commit the offset either periodically or after each record is stored in the map, the next time your application restarts it will read from the last committed offset for that group.id.
So the problem of taking a lot of time occurs only initially (during the first run). As long as you have the file, you don't need to consume from the beginning.
In case the file is not there or is deleted, just seekToBeginning in the KafkaConsumer and build it again.
Somewhere you need to store these key-values for retrieval, so why can't it be a persistent store?
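A rough sketch of that consumer-plus-local-file approach (a serialized HashMap stands in here for a proper persistent map such as MapDB; the topic, group id, and file name are made up):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.io.*;
import java.time.Duration;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Properties;

public class LatestValuesConsumer {
    public static void main(String[] args) throws Exception {
        File store = new File("latest-values.ser");
        HashMap<String, String> latest = load(store);   // last known table, if any
        boolean rebuild = latest.isEmpty();             // no file yet: rebuild from the topic

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "table-builder");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("changelog-topic"), new ConsumerRebalanceListener() {
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    if (rebuild) consumer.seekToBeginning(partitions);   // file missing or deleted
                }
            });
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    latest.put(record.key(), record.value());   // keep only the newest value per key
                }
                save(store, latest);      // persist the table first...
                consumer.commitSync();    // ...then commit the offsets for this group.id
            }
        }
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(File f) throws Exception {
        if (!f.exists()) return new HashMap<>();
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (HashMap<String, String>) in.readObject();
        }
    }

    static void save(File f, HashMap<String, String> map) throws Exception {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(map);
        }
    }
}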
In case if you want to use Kafka streams for whatever reason, then an alternative (not as simple as the above) is to use a persistent backed store.
For example, a persistent global store.
streamsBuilder.addGlobalStore(Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore(topic), keySerde, valueSerde), topic, Consumed.with(keySerde, valueSerde), this::updateValue);
P.S.: There will be a file called .checkpoint in the directory which stores the offsets. If the topic is deleted in the middle, you get an OffsetOutOfRangeException. You may want to avoid this, perhaps by using an UncaughtExceptionHandler.
Refer to https://stackoverflow.com/a/57301986/2534090 for more.
Finally, it is better to use a Consumer with a persistent file rather than Streams for this, because of the simplicity it offers.

Kafka one consumer with two different checkpoints

I have a Kafka consumer project which consumes data from a specific Kafka topic. 90% of the records are processed as soon as I get them, but I have to delay processing some of the records (10%).
Since these records need to be delayed, I can't commit them, and that may cause Kafka to reassign the partitions to other nodes. To avoid that, I could read the same topic twice and delay the data-fetching part in the second consumer, but it requires deserializing twice, so it comes with an overhead.
Is it possible to read records using a single consumer but have two separate commits with Kafka consumers? It would basically be similar to having two different consumers in terms of commits: consumer.poll would be called from a single consumer, but there would be two consumer.commitSync calls for each batch. It would help me avoid the extra deserialization and also the network cost.
Below mentioned are the things you can do to achieve the above-mentioned task.
Create a pipeline having two topics (T1, T2): push most of the messages (90%) to topic T1 and the remaining 10% to topic T2 (a rough sketch of such a router follows this list).
Make your Kafka consumer configurable, i.e. so you can easily pass the polling interval, batch size, and batch timeout whenever you start your consumer.
Work out a suitable trigger, or if your second topic's consumption is time-based, schedule a cron job that starts and stops your consumer for topic T2 when required.
Regarding consumer groups, you can place both of your topics in the same group or in different ones. It's completely your choice.
This way you keep the topics clean, and every time you need to process the messages you can do it easily, having set up the pipeline just once.
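One way to read point 1 is as a small router service that consumes the original topic and forwards each record to T1 (process immediately) or T2 (process later). All topic names, the group id, and the needsDelay() predicate below are hypothetical; this is only a sketch of the idea.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class DelayRouter {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cProps.put(ConsumerConfig.GROUP_ID_CONFIG, "delay-router");
        cProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(List.of("source-topic"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // ~90% go to T1 for immediate processing, ~10% to T2 for delayed processing
                    String target = needsDelay(record) ? "T2" : "T1";
                    producer.send(new ProducerRecord<>(target, record.key(), record.value()));
                }
                consumer.commitSync();   // the router itself can commit right away
            }
        }
    }

    // hypothetical predicate: decides which records belong to the delayed 10%
    static boolean needsDelay(ConsumerRecord<String, String> record) {
        return record.value() != null && record.value().contains("delay-me");
    }
}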

Consuming from added partitions in Apache Kafka

I have a KafkaConsumer and use manual partition assignment. To distribute partitions I call consumer.partitionsFor(topicId) at regular intervals to detect added partitions, because the job runs forever and I want to support this case. However, this always returns the initial list of partitions unless I restart the consumer.
Is there a way detect added partitions from consumer? What to poll or listen for?
KafkaConsumer has a configuration property "metadata.max.age.ms", which defaults to 5 minutes. That means each call to consumer.partitionsFor will return a cached copy for that period and only then fetch new metadata.
Setting the property to 0 fetches new metadata each time.
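A small sketch of that setting in practice (topic name and polling interval are illustrative): with metadata.max.age.ms set to 0, every partitionsFor() call fetches fresh metadata, so newly added partitions show up without restarting the consumer.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.List;
import java.util.Properties;

public class PartitionWatcher {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "0");   // never serve cached metadata
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            while (true) {
                List<PartitionInfo> partitions = consumer.partitionsFor("my-topic");
                System.out.println("current partition count: " + partitions.size());
                Thread.sleep(10_000);   // re-check every 10 seconds
            }
        }
    }
}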
Maybe the partitions were in the process of being reassigned when you queried them with the assignment() method. You could sleep 5 seconds and then check again to test it.
Get the set of partitions currently assigned to this consumer. If subscription happened by directly assigning partitions using assign(Collection) then this will simply return the same partitions that were assigned. If topic subscription was used, then this will give the set of topic partitions currently assigned to the consumer (which may be none if the assignment hasn't happened yet, or the partitions are in the process of getting reassigned).