Kafka Rest Proxy Consumer Creation - apache-kafka

Let's say I have a service that consumes messages through kafka-rest-proxy, always in the same consumer group. Let's also say that it consumes from a topic that has one partition. When the service starts, it creates a new consumer in kafka-rest-proxy and uses the generated consumer URL until the service shuts down. When the service comes back up, it creates a new consumer in kafka-rest-proxy and uses the new URL (and new consumer) for consuming.
My Questions
Kafka allows at most one consumer per partition within a consumer group. What will happen in Kafka and kafka-rest-proxy when the consumer is restarted? That is, a new consumer is created in kafka-rest-proxy, but the old one didn't have a chance to be destroyed. So after 'n' restarts of my service there are 'n' consumers in kafka-rest-proxy, but only one of them is actively being used. Will I even be able to consume messages on my new consumer, since there are more consumers than partitions?
Let's make this more complicated and say that I have 5 instances of my service in the same consumer group and 5 partitions in the topic. After 'n' restarts of all 5 instances of my service, would I even be guaranteed to consume all messages without ensuring the proper destruction of the existing consumers? That is, what do Kafka and kafka-rest-proxy do during consumer creation when the consumers outnumber the partitions?
What is considered the kafka-rest-proxy best practice for ensuring stale consumers are always cleaned up? Do you suggest persisting the consumer URL? Should I force a kafka-rest-proxy restart to ensure existing consumers are destroyed before starting my service?
* EDIT *
I believe part of my question is answered with this configuration, but not all of it.
consumer.instance.timeout.ms - Amount of idle time before a consumer instance is automatically destroyed.
Type: int
Default: 300000
Importance: low

If you cannot cleanly shut down the consumer, it will stay alive for a period after the last request was made to it. The proxy garbage collects stale consumers for exactly this case -- if a consumer isn't cleanly shut down, it would otherwise hold on to some partitions indefinitely. By automatically garbage collecting the consumers, you don't need some separate durable storage to keep track of your consumer instances. As you discovered, you can control this timeout via the config consumer.instance.timeout.ms.
Since instances will be garbage collected, you are guaranteed to eventually consume all the messages. But during the timeout period, some partitions may still be assigned to the old set of consumers and you will not make any progress on those partitions.
Ideally unclean shutdown of your app is rare, so best practice is just to clean up the consumer when your app is shutting down. Even in exceptional cases, you can use the finally block of a try/catch/finally to destroy the consumer. If one is left alive, the group will eventually recover once the proxy garbage collects it. Other than that, consider lowering the consumer.instance.timeout.ms setting if your application can tolerate that. It just needs to be larger than the longest period between calls that use the consumer (and you should keep in mind possible error cases, e.g. if processing a message requires interacting with another system and that system can become slow/inaccessible, you should account for that when setting this config).
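As a rough illustration of the "destroy it in a finally block" advice, here is a minimal Python sketch. It assumes the REST Proxy v2 API (POST /consumers/{group} returning a base_uri, DELETE on that URI to destroy the instance); the helper names http_call and rest_consumer, the consumer settings in the request body, and the proxy address in the usage note are made up for the example.

```python
import json
from contextlib import contextmanager
from urllib import request

V2_JSON = "application/vnd.kafka.v2+json"


def http_call(method, url, body=None):
    """Send one request to the REST Proxy; return the decoded JSON reply or None."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = request.Request(url, data=data, method=method,
                          headers={"Content-Type": V2_JSON, "Accept": V2_JSON})
    with request.urlopen(req) as resp:
        payload = resp.read()
    return json.loads(payload) if payload else None


@contextmanager
def rest_consumer(proxy_url, group, call=http_call):
    """Create a consumer instance and always try to delete it on exit."""
    # The body values are illustrative assumptions, not required settings.
    created = call("POST", f"{proxy_url}/consumers/{group}",
                   {"format": "json", "auto.offset.reset": "earliest"})
    base_uri = created["base_uri"]  # the "consumer url" from the question
    try:
        yield base_uri
    finally:
        # Runs on normal exit and on exceptions. A hard kill skips this, and
        # the instance is then reclaimed after consumer.instance.timeout.ms.
        call("DELETE", base_uri)
```

Usage would look like "with rest_consumer("http://localhost:8082", "my-group") as consumer_url: ..." around the subscribe and poll loop. The call parameter is only there so the HTTP layer can be swapped out in tests.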
You can persist the URLs, but even that is at some risk for losing track of consumers since you can't atomically create the consumer and save its URL to some other persistent storage. Also, since completely uncontrolled failures where you have no chance to cleanup shouldn't be a common case, it often doesn't benefit you much to do that. If you need really fast recovery from that failure, the consumer instance timeout can probably be reduced significantly for your application anyway.
Re: forcing a restart of the proxy, this would be fairly uncommon since the REST Proxy is often a shared service and doing so would affect all other applications that are using it.

Related

How to manage Kafka transactional producer objects in request oriented applications

What is the best practice for managing Kafka producer objects in request oriented (e.g. http or RPC servers) applications, when configured as transactional producers? Specifically, how to share producer objects among serving threads, and how to define the transactional.id configuration value for those objects?
In non-transactional usage, producer objects are thread safe and it is common to share one object among all request serving threads. It is also straightforward to set up transactional producer objects for use by Kafka consumer threads: instantiating one object per consumer thread works well.
Combining transactional producers with request oriented applications appears to be more complicated, as the life-cycle of serving threads is usually dynamically controlled by a thread pool. I can think of a few options, all with downsides:
Share a single object, protected against concurrency by some kind of mutex. Contention under load would probably be a serious problem.
Instantiate a producer object for each request coming in. KafkaProducer objects are slow to initialize, as they maintain network connections, threads, and other heavyweight objects; paying this cost for each request seems impractical.
Maintain a pool of producer objects, and lease one for each request. The main downside I can see is the amount of machinery required. It is also unclear how to configure transactional.id for these objects, as their lifecycle does not map cleanly to a shard identifier in a partitioned, stateful, application as the documentation says.
Are there other options? Is there an optimal approach?
TL;DR
The transactional id exists to prevent duplicates caused by zombie processes in the read-process-write pattern, where you read from and produce to Kafka topics. For request oriented applications, e.g. messages produced in response to an incoming http request, the transactional id doesn't bring any benefit. (Of course you still need to assign one if you want to use transactions, and it shouldn't be repeated between producers in the same process or in different processes in your cluster.)
Long answer
As the docs say, a transactional producer cannot serve concurrent transactions:
As is hinted at in the example, there can be only one open transaction per producer. All messages sent between the beginTransaction() and commitTransaction() calls will be part of a single transaction
So, as you correctly explained, there can't be concurrent access to the producer, and we must pick one of the three options you described.
For this answer I'm going to assume that "request oriented" means http requests: each request triggers messages being produced within a transaction (more than one message, actually; for a single message an idempotent producer would be enough and transactions wouldn't be needed).
In terms of correctness, all of them are ok. Option 1 works, but depending on your application's throughput it could suffer from high contention. Option 2 also works, but you pay the price of higher latency and it won't be very efficient.
IMHO option 3 could be the best, since it is a compromise between the two previous options, although of course it requires a more careful implementation than just opening a new producer each time.
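To give an idea of how much machinery option 3 really needs, here is a minimal Python sketch of a leasing pool. It is not tied to any Kafka client: make_producer is a hypothetical factory that would build a transactional producer for the given transactional.id and call initTransactions once, and the id_prefix default is made up. In a real deployment the prefix would also have to differ between processes (for instance by including a host or instance name) so that ids are never shared.

```python
import queue
from contextlib import contextmanager


class ProducerPool:
    """Fixed-size pool in which every producer has its own transactional.id."""

    def __init__(self, size, make_producer, id_prefix="http-svc"):
        self._idle = queue.Queue()
        for i in range(size):
            # make_producer is a hypothetical factory: transactional.id -> producer.
            self._idle.put(make_producer(f"{id_prefix}-{i}"))

    @contextmanager
    def lease(self, timeout=None):
        """Borrow one producer; blocks while all producers are in use."""
        producer = self._idle.get(timeout=timeout)
        try:
            yield producer
        finally:
            # Always hand the producer back, even if the request failed.
            self._idle.put(producer)
```

A request handler would wrap its begin/send/commit sequence in "with pool.lease() as producer:", so each producer handles one transaction at a time while several requests proceed in parallel.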
Transactional id
The question that remains is how to assign a transactional id to the producer, especially in the last case (although options 1 and 3 share the same concern, since in both cases we reuse a producer with the same transactional id to handle different requests).
To answer this we first need to understand that the goal of transactional.id is to protect us from duplicate messages produced by zombie processes (a process that hangs for a while, e.g. because of a long GC pause, is considered dead, and then comes back and continues). This is called zombie fencing.
To understand the need for zombie fencing, it helps to know the use case where zombies can appear: the read-process-write pattern, where you read from a topic, process the element, and write to an output topic and to the offsets topic. Doing this in a transaction gives us atomicity and exactly-once semantics (as long as the process step has no side effects).
Idempotent producers prevent duplicates caused by producer retries (where the message was persisted by the broker but the ack wasn't received by the producer). The two-phase commit within Kafka (where we not only write to the output but also mark the message as consumed by producing to the offsets topic) prevents duplicates caused by consuming the message more than once (if the process crashes after producing to the output topic but before committing the offset).
There is still a subtle case where a duplicate can be introduced: a zombie producer. It is fenced by means of an epoch that increases monotonically each time a producer calls initTransactions and that is sent with every message the producer sends.
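The epoch mechanism can be pictured with a small toy model in Python. This is not the broker's implementation; ToyCoordinator and ProducerFenced are names invented for the sketch, and it only models the idea that a send carrying an older epoch than the latest one registered for the same transactional id is rejected.

```python
class ProducerFenced(Exception):
    """Raised when a producer with a stale epoch tries to write."""


class ToyCoordinator:
    """Toy stand-in for the transaction coordinator's epoch bookkeeping."""

    def __init__(self):
        self._epochs = {}  # transactional id -> latest epoch handed out
        self.log = []      # values accepted so far

    def init_transactions(self, transactional_id):
        # Each registration under the same id bumps the epoch by one.
        epoch = self._epochs.get(transactional_id, -1) + 1
        self._epochs[transactional_id] = epoch
        return epoch

    def send(self, transactional_id, epoch, value):
        # A zombie still holds the old epoch, so its writes are refused.
        if epoch < self._epochs[transactional_id]:
            raise ProducerFenced(f"{transactional_id}: epoch {epoch} is stale")
        self.log.append(value)
```

If two producers register under the same id, the first one gets epoch 0 and the second gets epoch 1; a later send from the first producer then raises ProducerFenced while the second keeps working.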
So, for a producer to be fenced, another producer must have been started with the same transactional id. The key here is explained by Jason Gustafson in this talk:
"what we are looking for is a guarantee that for each input partition there is only a single write that is responsible for reading that data and writing the output"
This means the transactional.id is assigned in terms of the partition being consumed in the "read-process-write" pattern.
So if a process that has been assigned partition 0 of topic A is considered dead, a rebalance will kick off and the new process that is assigned that partition should create a producer with the same transactional.id. That's why it should be something like <prefix><group>.<topic>.<partition>, as described in this answer, where the partition is part of the transactional.id. This also means one producer per assigned partition, which can be an overhead depending on how many topics and partitions your consumers are assigned.
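To make the naming scheme concrete, here is a tiny Python sketch that derives one id per assigned partition. The function name and the example prefix and group are made up; the point is only that any process assigned the same topic partition computes the same id, which is what lets the replacement fence the zombie.

```python
def transactional_ids(prefix, group, assignment):
    """Map each assigned (topic, partition) pair to its transactional.id."""
    return {
        (topic, partition): f"{prefix}{group}.{topic}.{partition}"
        for topic, partition in assignment
    }


# Whichever process is assigned partition 0 of topic A ends up with the same id.
ids = transactional_ids("tx-", "orders", [("A", 0), ("A", 1)])
```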
These slides from the talk clarify the situation:
Transactional id before process crash
Transactional id reassigned to other process after crash
Transactional id in http requests
Going back to your original question, http requests don't follow the read-process-write pattern where zombies can introduce duplicates. Each http request is unique; even if you introduce a unique identifier, it will be a different message from the point of view of the transactional producer.
In this case I would argue that the transactional producer may still be valuable if you want atomic writes to two different topics, but you can choose a random transactional id for option 2, or reuse it for options 1 and 3.
UPDATE
My answer is outdated, since it is based on an old version of Kafka.
The overhead of having one producer per partition, described above, was a concern tackled in KIP-447:
This architecture does not scale well as the number of input partitions increases. Every producer come with separate memory buffers, a separate thread, separate network connections. This limits the performance of the producer since we cannot effectively use the output of multiple tasks to improve batching. It also causes unneeded load on brokers since there are more concurrent transactions and more redundant metadata management.
This is the main difference as explained in this post
When the partition assignment is finalized after a consumer group rebalance, the first step for the consumer is to always get the next offset to begin fetching data. With this observation, the OffsetFetch protocol protection is enhanced, such that when a consumer group has pending transactional offsets associated with one partition, the OffsetFetch call can be blocked until the associated transaction completes. Previously, the “outdated” offset data would be returned and the application allowed to continue immediately.
With this new feature, the use of transactional.id is no longer clear to me.
It is still unclear to me why fencing requires blocking the poll when there are pending transactions; it seems to me that sending the consumer group metadata should be enough (I assume a zombie producer will be fenced because it commits with an old generation.id for that group.id, the generation.id being bumped with each rebalance). Either way, it seems the transactional.id doesn't play a major role anymore. For example, the Spring docs say:
With mode V1, the producer is "fenced" if another instance with the same transactional.id is started. Spring manages this by using a Producer for each group.id/topic/partition; when a rebalance occurs a new instance will use the same transactional.id and the old producer is fenced.
With mode V2, it is not necessary to have a producer for each group.id/topic/partition because consumer metadata is sent along with the offsets to the transaction and the broker can determine if the producer is fenced using that information instead.

Static membership in Apache Kafka for consumers

I learned that recent versions of Kafka offer static membership for consumer groups, in addition to the earlier dynamic membership. It helps in scenarios where a consumer is bounced as part of a rolling deployment: when the consumer comes back up after being bounced, it picks up the same partitions and resumes processing.
My question is: what happens if we deliberately shut down a consumer? How will the messages in the partitions that were assigned to that consumer get processed?
After a consumer has been shut down, the consumer group will undergo a normal rebalance once the consumer's session.timeout.ms has elapsed.
https://kafka.apache.org/10/documentation/streams/developer-guide/config-streams.html#kafka-consumers-and-producer-configuration-parameters
When configuring Static Membership, it is important to increase the session.timeout.ms higher than the default of 10000 so that consumers are not prematurely rebalanced. Set this value high enough to allow workers time to start, restart, deploy, etc. Otherwise, the application may get into a restart cycle if it misses too many heartbeats during normal operations. Setting it too high may cause long periods of partial unavailability if a worker dies, and the workload is not rebalanced. Each application will set this value differently based on its own availability needs.
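For illustration, a static member's settings might be put together like this in Python. The keys are the standard consumer configuration names (group.instance.id is what makes the member static); the function name and every value in the example are assumptions, not recommendations.

```python
def static_member_config(bootstrap, group, instance_id, session_timeout_ms):
    """Build consumer settings for one static member of a group."""
    return {
        "bootstrap.servers": bootstrap,
        "group.id": group,
        # Must be unique per instance and stable across restarts of that instance.
        "group.instance.id": instance_id,
        # Long enough to cover a restart or deploy, short enough to detect a dead worker.
        "session.timeout.ms": session_timeout_ms,
    }


# Hypothetical values: a worker that needs up to about a minute to restart.
config = static_member_config("localhost:9092", "my-group", "worker-1", 90_000)
```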
If you manually subscribe then you would have to deal with that scenario in your application code - that's the advantage of automatic subscription, all partitions will be assigned to one of the group after a rebalance.
To cater for consumers permanently leaving the group with manual subscription, I guess you would need to track subscriptions somewhere and maybe have each consumer pinging to let you know it is alive.
I'm not sure what use cases the manual subscription is catering for - I will have to go back and check the Javadoc in KafkaConsumer, which is pretty comprehensive. As long as you have no local state in consumers the automatic subscription seems much safer and more resilient.

Processing kafka messages taking long time

I have a Python process (or rather, a set of processes running in parallel within a consumer group) that processes data according to inputs coming in as Kafka messages from a certain topic. Usually each message is processed quickly, but sometimes, depending on the content of the message, it may take a long time (several minutes). In this case, the Kafka broker disconnects the client from the group and initiates a rebalance. I could set session_timeout_ms to a really large value, but it would have to be 10 minutes or more, which means that if a client dies, the cluster would not be properly rebalanced for 10 minutes. This seems to be a bad idea. Also, most messages (about 98% of them) are fast, so paying such a penalty for just 1-2% of messages seems wasteful. OTOH, large messages are frequent enough to cause a lot of rebalances and cost a lot of performance (since while the group is rebalancing, nothing is getting done, and then the "dead" client re-joins and causes another rebalance).
So, I wonder, are there any other ways of handling messages that take a long time to process? Is there any way to initiate heartbeats manually to tell the broker "it's ok, I am alive, I'm just working on the message"? I thought the Python client (I use kafka-python 1.4.7) was supposed to do that for me, but it doesn't seem to happen. Also, the API doesn't seem to have a separate "heartbeat" function at all. And as I understand it, calling poll() would actually get me the next messages while I am not even done with the current one, and would also mess up the iterator API for the Kafka consumer, which is quite convenient to use in Python.
In case it's important, the Kafka cluster is Confluent, version 2.3 if I remember correctly.
Since Kafka 0.10.1, polling and the session heartbeat are decoupled from each other.
You can get an explanation here
max.poll.interval.ms is how much time a consumer instance is permitted between calls to poll() to finish processing. If processing takes longer than max.poll.interval.ms, the consumer group presumes the consumer is dead, removes it from the group, and triggers a rebalance.
Increasing it lengthens the interval between expected polls, which gives consumers more time to handle a batch of records returned from poll(long).
But at the same time, it also delays group rebalances, since the consumer only joins the rebalance inside the call to poll.
session.timeout.ms is the timeout used to detect whether the consumer is still alive; the consumer sends a heartbeat at a defined interval (heartbeat.interval.ms). As a rule of thumb, heartbeat.interval.ms should be 1/3 of the session timeout, so that in case of network trouble a consumer gets about 3 heartbeat attempts before the session times out.
session.timeout.ms: low value would be good to detect failure more quickly.
max.poll.interval.ms: large value will reduce the risk of failure due to increased processing time however increases the rebalancing time.
Note: A large number of partitions and topics consumed by a consumer group also affects the overall rebalance time.
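A small Python sketch of how these settings could be derived for the slow-message case. The keyword names match the kafka-python KafkaConsumer arguments; the function itself, the 1.5 safety factor, and limiting each poll to one record are assumptions made for the example, not something the client requires.

```python
def consumer_timeouts(longest_processing_s, session_timeout_s=10, safety_factor=1.5):
    """Derive consumer timeout settings from the slowest expected message."""
    session_ms = int(session_timeout_s * 1000)
    return {
        # Kept low so a dead client is detected quickly.
        "session_timeout_ms": session_ms,
        # Rule of thumb from above: one third of the session timeout.
        "heartbeat_interval_ms": session_ms // 3,
        # Must exceed the worst-case time between two poll() calls.
        "max_poll_interval_ms": int(longest_processing_s * safety_factor * 1000),
        # Assumption: one record per poll, so the bound applies per message.
        "max_poll_records": 1,
    }


# Example: messages that can take up to 10 minutes to process.
settings = consumer_timeouts(600)
```

The resulting dict could be passed as keyword arguments, e.g. KafkaConsumer("my-topic", group_id="my-group", **settings), where the topic and group names are placeholders.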
The other approach, if you really want to get rid of rebalancing, is to assign partitions to each consumer instance manually. In that case, each consumer instance runs independently with its own assigned partitions, but you are not able to leverage the rebalance feature to assign partitions automatically.
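If you go the manual route, each instance needs some fixed rule for which partitions it owns. A minimal Python sketch of one such rule (round-robin by instance index) follows; the function name and the idea of numbering instances from 0 are assumptions for the example.

```python
def partitions_for(instance_index, instance_count, partition_count):
    """Return the partition numbers statically owned by one instance."""
    return [p for p in range(partition_count)
            if p % instance_count == instance_index]


# 5 partitions spread over 3 instances.
layout = [partitions_for(i, 3, 5) for i in range(3)]
```

With kafka-python, an instance would then call consumer.assign([TopicPartition("my-topic", p) for p in partitions_for(...)]) instead of subscribing, where the topic name is a placeholder. Nothing takes over the partitions of an instance that dies, which is the trade-off described above.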

Kafka Consumer Rebalancing and Its Impact

I'm new to Kafka and I'm trying to design a wrapper library in both Java and Go (uses Confluent/Kafka-Go) for Kafka to be used internally. For my use-case, CommitSync is a crucial step and we should do a read only after properly committing the old one. Repeated processing is not a big issue and our client service is idempotent enough. But data loss is a major issue and should not occur.
I will create X consumers initially and will keep polling from them. Hence I would like to know more about the negative scenarios that could happen here, their impact, and how to handle them properly.
I would like to know more about:
1) Network issue during consumer processing:
What happens when the network goes off for a brief period and comes back? Does the Kafka consumer handle this automatically and become alive again when the network comes back, or do we have to reinitialise it? If consumers come back alive, do they resume work from where they left off?
Eg: Consumer X read 50 records from Partition Y. Now internally the consumer offset has moved to +50. But before committing, a network issue happens and then the consumer comes back alive. Will the consumer still have the metadata about what it read in the last poll? Can it go on to commit +50 as the offset?
2) Rebalancing in consumer groups. Impact of them on existing consumer process - whether the existing working consumer instance will pause and resume work during a rebalance or do we have to reinitialize them? How long can rebalance occur? If the consumer comes back alive after rebalance, does it have metadata about it last read?
3) What happens when a consumer joins during a rebalancing. Ideally it is again a rebalancing scenario. What will happen now? The existing will be discarded and the new one starts or will wait for the existing rebalance to complete?
What happens when the network goes off for a brief period and comes back? Does the Kafka consumer handle this automatically and become alive again when the network comes back, or do we have to reinitialise it?
The consumer will try to reconnect. If the consumer group coordinator doesn't receive heartbeats from the consumer within the session timeout, the group rebalances.
If they come back alive, do they resume work from where they left off?
From the last committed offset, yes.
whether the existing working consumer instance will pause and resume work during a rebalance
It will pause and resume. No action needed.
How long can rebalance occur?
It depends on many factors, and rebalancing can continue indefinitely under certain conditions.
If the consumer comes back alive after rebalance, does it have metadata about it last read?
The last committed offsets are stored on the broker, not by consumers.
The existing will be discarded and the new one starts or will wait for the existing rebalance to complete?
All rebalances must complete before any polls continue.
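The "commit only after processing" requirement from the question, combined with offsets being stored on the broker, can be pictured with a toy Python model. Nothing here talks to Kafka; ToyPartition and consume are invented for the sketch, and the crash is simulated by skipping the commit.

```python
class ToyPartition:
    """In-memory stand-in for one partition plus its committed offset."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # kept "broker side", so it survives a consumer crash


def consume(partition, process, batch_size, crash_before_commit=False):
    """Read from the committed offset, process, then commit."""
    start = partition.committed
    batch = partition.records[start:start + batch_size]
    for record in batch:
        process(record)
    if crash_before_commit:
        return  # simulated failure: the new position is never committed
    partition.committed = start + len(batch)


seen = []
part = ToyPartition(range(5))
consume(part, seen.append, 3, crash_before_commit=True)  # work done, commit lost
consume(part, seen.append, 3)                            # restart: same records again
```

After the simulated crash, the second call starts again from the committed offset, so the first three records are processed twice and none are skipped. That is the at-least-once behaviour an idempotent client can tolerate.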

kafka consumer rebalancing in case of manual/assigned partitioning

I have a question about rebalancing. Right now, I am manually assigning partitions to consumers. So, as per the docs, there will be no rebalancing if a consumer leaves the group or crashes.
Let's say there are 3 partitions and 3 consumers in the same group, and each partition is manually assigned to one consumer. After some time, the 3rd consumer goes down. Since there is no rebalancing, what measures can I take to ensure minimum downtime?
Do I need to change the config of either of the first two consumers so that it starts consuming from the 3rd partition, or something else?
Well, I don't know why you would assign partitions to consumers manually.
I think you need to write rebalanceListener. https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/ConsumerRebalanceListener.html
My advice: just let kafka decide which consumer will listen to which partition and you would not have to worry about this.
Although there might be context that would make the approach valid, as written, I question your approach a little bit.
The best way to ensure minimum downtime is to let the kafka brokers and zookeeper do what they're good at, managing your workload (partitions) among your consumers, which includes reassigning partitions when a consumer goes down.
Your best path is likely to use the OnPartitionsRevoked and OnPartitionsAssigned events to handle whatever logic you need in order to take over a new partition (see JR's link for more detailed information on these events).
I'll describe a recent use-case I've had, in the hope it is relevant to your use-case.
I recently had 5 consumers that required an in-memory cache of 50 million objects. Without partitioning, each consumer had its own cache, resulting in 250 mil objects.
To reduce that number to the original 50 million, we could use the OnPartitionsRevoked event to clear the cache and the OnPartitionsAssigned event to repopulate it with the objects relevant to the assigned partitions.
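A minimal Python sketch of that cache pattern, independent of any Kafka client. The class name and load_partition (a hypothetical loader that returns the objects belonging to one partition) are made up; the two methods are meant to be called from whatever revoked/assigned callbacks your client library provides.

```python
class PartitionCache:
    """Keeps cached objects only for the partitions this consumer owns."""

    def __init__(self, load_partition):
        self._load = load_partition  # hypothetical loader: partition -> dict of objects
        self.by_partition = {}

    def on_partitions_assigned(self, partitions):
        # Populate the cache for newly assigned partitions only.
        for p in partitions:
            self.by_partition[p] = self._load(p)

    def on_partitions_revoked(self, partitions):
        # Drop everything that belongs to partitions we no longer own.
        for p in partitions:
            self.by_partition.pop(p, None)

    def size(self):
        return sum(len(objects) for objects in self.by_partition.values())
```

Each consumer then holds only its share of the objects, and the total across the group stays close to the size of a single full cache.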
Short of using those two handlers, if you really want to manually assign your partitions, you're going to have to do all of the orchestration yourself:
Something to monitor if one of the other consumers is down
Something to pick up the dead consumer's partition and process it
Orchestrate communication between the consumers to communicate when the dead consumer is alive again, so it can start working again.
As you can probably tell from the list, you're in for a real world of hurt if you force yourself down that path, and you probably won't do a better job than the Kafka brokers - there's an entire business whose sole focus is developing and maintaining Kafka so you don't have to handle all of that complexity.