Consumer Groups between Clouds - apache-kafka

I want to ask about consumer groups, but between clouds.
Here is my example. I have six cloud servers, let's say A B C D E F:
Producer: A
Broker cluster: B C
Consumers: D E F
Consumers with the same group: D E
Can we configure clouds D and E to use the same group? How does the Kafka broker notice the group when the consumers are on different clouds?
When I start my consumer script (I use Kafka with Laravel), I just run kafka:My-Topic and the consumer starts.
When I use multiple clouds, I need to run the script on each cloud. I don't know what happens on my broker: can the broker cluster know the consumer group, or do I need to declare each consumer group on the broker?
A broker cluster replicates between brokers, right? Can I assign broker B to D and E, and broker C to D, E, and F? Does that still count as a broker cluster?
What happens to the broker cluster when I have multiple groups with one consumer each? Does my message get consumed by one group, or will multiple groups each consume the message?
Is there any tutorial on how to create key-value storage for Kafka? I need to store data inside Kafka's memory.
Thanks

You simply give both apps the same group.id config. It doesn't really matter where they are running, as long as they can reach the brokers.
Yes, the brokers will know about the consumer groups that all clients are using; group membership is coordinated on the broker side, so there is nothing to declare up front.
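For example, a minimal sketch in Java (the question uses a Laravel client, but the config keys are the same across clients; the host names and the group name here are placeholders): run this same program on cloud D and on cloud E, and because both use the same group.id they form one group and split the topic's partitions between them.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Both machines point at the same brokers (B and C in the question)...
        props.put("bootstrap.servers", "brokerB:9092,brokerC:9092");
        // ...and use the same group.id, so they join the same consumer group.
        props.put("group.id", "my-topic-workers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("My-Topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```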
No, you cannot/shouldn't assign specific brokers to clients; clients use the bootstrap brokers only to discover the cluster and then talk to whichever brokers lead the partitions they need. You might be able to assign specific partitions, but this is rarely a good idea for basic consumption.
If there are multiple, unique group names, each group gets all messages, regardless of the number of consumers in each group.
Kafka itself doesn't have "in-memory" storage, and I don't understand this last point, exactly. Your consumer can build a dictionary on its own when it processes messages (see the sketch below), but anything in-memory obviously couldn't be distributed and queried across all consumers. It might therefore make sense to use something like KSQL to consume the events into a lookup table and query it from PHP via REST calls, rather than using Kafka clients directly.
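On that last point, here is a minimal sketch of the consumer-built dictionary (a fragment reusing the props and imports from the sketch above, plus java.util.Map and java.util.HashMap); the map is private to this process and is rebuilt from the topic on restart.

```java
// Process-private key-value view of the topic; rebuilt on restart,
// not visible to any of the other consumers.
Map<String, String> lookup = new HashMap<>();
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("My-Topic"));
    while (true) {
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            lookup.put(record.key(), record.value()); // last value per key wins
        }
    }
}
```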

Related

Handle the case of consumer death when each consumer subscribes to a partition in Kafka

My system has 2 services:
Service A publishes messages via a Kafka topic (with 5 partitions, P0, P1, P2, P3, P4, at initialization).
Service B has 10 instances in total, deployed both on-premise and on AWS. Every 2 instances handle the messages of 1 partition of the Kafka topic.
Instances of service B can't be deployed to K8s, as the infrastructure spans both on-premise and AWS.
The system seems to be designed well; however, I'm considering these cases and still don't have solutions for them:
What if the 2 consumers handling messages for partition P0 break down at the same time?
What needs to be done in service B when we need to increase/decrease the number of partitions?
What if service B needs to call back some information to service A after executing its business logic?
Can anyone please help me with this?
If I understood correctly, in total you would have 10 consumer instances, distributed and deployed under two consumer groups, each with 5 consumer instances.
So at any given point in time, each partition will be assigned to 2 consumer instances, one from each consumer group.
What if the 2 consumers handling messages for partition P0 break down at the same time?
Ans: A rebalance will happen (how quickly depends on factors such as the session and heartbeat timeouts and whether the consumer was closed cleanly, e.g. in a finally block), and partition P0 will be reassigned to other active consumer instances in each group.
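To make those timing factors concrete, a hedged fragment (values are illustrative, not recommendations; assumes a consumer configured as usual, with imports for ConsumerRebalanceListener, TopicPartition, and java.util.Collection): the session timeout bounds how long the broker waits before declaring a silent consumer dead, and the listener shows where a surviving instance sees P0 arrive.

```java
// Illustrative timeouts: a consumer that sends no heartbeat for
// session.timeout.ms is evicted and its partitions are rebalanced.
props.put("session.timeout.ms", "10000");
props.put("heartbeat.interval.ms", "3000");

consumer.subscribe(List.of("my-topic"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Flush or commit in-flight work before these partitions move away.
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // If both of P0's consumers in this group died, P0 shows up here
        // on one of the surviving members.
    }
});
```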
What needs to be done in service B when we need to increase/decrease the number of partitions?
Ans: Increasing the partitions is an admin task done on the broker side (e.g. with the kafka-topics.sh --alter command), and you need not do anything in Service B unless your consumer instances have any partition-assignor-based logic. Note that Kafka does not support decreasing the number of partitions of an existing topic.
What if service B needs to call back some information to service A after executing its business logic?
Ans: From the broker's perspective, Service A and Service B are two independent clients integrated via the broker. So unless you have exposed an RPC endpoint on Service A, the callback information also has to travel through the broker, e.g. as a message on a topic that Service A consumes.
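A minimal sketch of that reply-topic idea (the topic name, key, and payload below are hypothetical): Service B produces its result to a topic that Service A consumes, so the "callback" is just another message.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ServiceBCallback {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // In practice these would come from the original message (e.g. its
        // key or a header) and from Service B's business logic.
        String correlationId = "order-42";
        String resultJson = "{\"status\":\"processed\"}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Service A consumes this topic to receive the "callback".
            producer.send(new ProducerRecord<>("service-a.replies", correlationId, resultJson));
        }
    }
}
```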

Forward Messages from two Kafka Clusters to Another

I have three Kafka clusters: A, B, and C. I have data incoming on Cluster A on topic incoming.dataA and Cluster B on topic incoming.dataB
I need a way to send all messages received on incoming.dataA on Cluster A and incoming.dataB on Cluster B to a topic on Cluster C, received.data. Can this be done?
I am aware of mirroring and streaming, but neither of those helps when forwarding data from one Kafka cluster to another when their topic names differ.
MirrorMaker can only be used between two clusters, so you'd have to chain A->B->C
Your next option would be to use an Apache project (or just a regular client app) such as Spark, Flink, Beam, NiFi, or Camel to consume from each cluster with individually configured consumers, then forward the records with a single producer client (it would be recommended to join the data first, somehow, if ordering or other characteristics matter).
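For the regular-client route, a bare-bones forwarder sketch (cluster addresses are placeholders, topic names follow the question; error handling, the second source cluster's thread, and delivery guarantees are omitted):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Forwarder {
    public static void main(String[] args) {
        // Consumer against Cluster A; run a second copy of this loop (e.g.
        // in its own thread or process) against Cluster B / incoming.dataB.
        Properties src = new Properties();
        src.put("bootstrap.servers", "clusterA:9092"); // placeholder address
        src.put("group.id", "forwarder");
        src.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        src.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Single producer against Cluster C.
        Properties dst = new Properties();
        dst.put("bootstrap.servers", "clusterC:9092"); // placeholder address
        dst.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        dst.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(src);
             KafkaProducer<String, String> producer = new KafkaProducer<>(dst)) {
            consumer.subscribe(List.of("incoming.dataA"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofMillis(500))) {
                    // Note the topic rename: records land on received.data.
                    producer.send(new ProducerRecord<>("received.data", r.key(), r.value()));
                }
            }
        }
    }
}
```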

How to consume from two different clusters in Kafka?

I have two Kafka clusters, say A and B, where B is a replica of A. I would like to consume messages from cluster B only if A is down, and vice versa. However, consuming messages from both clusters would result in duplicate messages. So is there any way I can configure my Kafka consumer to receive messages from only one cluster?
Thanks
So is there any way I can configure my kafka consumer to receive messages from only one cluster.
Yes: a Kafka consumer instance will always receive messages from one Kafka cluster only. That is, there's no built-in option to use the same consumer instance for reading from 2+ clusters. But I think you are looking for something different, see below.
I would like to consume messages from cluster B only if A is down and viceversa. Nevertheless consuming messages from both the clusters would result in duplicate messages.
There's no built-in failover support such as "switch to cluster B if cluster A fails" in Kafka's consumer API. If you need such behavior (as in your case), you would need to do so in your application that uses the Kafka consumer API.
For example, you could create a consumer instance to read from cluster A, monitor that instance and/or that cluster to determine whether failover to cluster B is required, and (if needed) perform the failover to B by creating another consumer instance to read from B in the event that A fails.
There are a few gotchas, however, that make this failover behavior more complex than my simplified example. One difficulty is knowing which messages from cluster A have already been read when switching over to B: this is tricky because, typically, the message offsets differ between clusters, so determining whether the "copy" of a message (in B) was already read (from A) is not trivial.
Note: Sometimes you can simplify such failover logic in situations where, e.g., message processing is idempotent (i.e. where duplicate messages / duplicate processing of messages will not alter the processing outcome).
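To make the simplified example concrete, here is a sketch of such failover logic (everything in it is an assumption for illustration: the describeCluster health probe, the 30-second re-check interval, and the host/topic names; and as noted above, offsets are not translated between clusters, so duplicates are possible on every switch):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FailoverConsumer {
    // Crude health probe: a cluster that cannot answer describeCluster()
    // within the timeout is treated as down.
    static boolean isUp(String bootstrap) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrap);
        try (AdminClient admin = AdminClient.create(p)) {
            admin.describeCluster().nodes().get(3, TimeUnit.SECONDS);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String clusterA = "clusterA:9092", clusterB = "clusterB:9092";
        while (true) {
            // Prefer A; fall back to B only while A is unreachable.
            String active = isUp(clusterA) ? clusterA : clusterB;
            Properties props = new Properties();
            props.put("bootstrap.servers", active);
            props.put("group.id", "failover-app");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic"));
                long recheckAt = System.currentTimeMillis() + 30_000;
                while (System.currentTimeMillis() < recheckAt) {
                    for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                        // Processing must tolerate duplicates (idempotence),
                        // per the note above.
                        System.out.println(r.value());
                    }
                }
            } // consumer closed; the loop re-evaluates which cluster to use
        }
    }
}
```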

Apache Kafka Multiple Consumer Instances

I have a consumer that is supposed to read messages from a topic. This consumer actually reads the messages and writes them to a time series database. We have multiple instances of the time series database running as a cluster on multiple physical machines.
Our plan is to deploy the consumer on all the machines where the time series service is running. So if I have 5 nodes on which the time series service is running, I will install one consumer instance per node. All those consumer instances belong to the same consumer group. In pictures, the setup looks like this (diagram omitted):
The producers P1 and P2 write into 2 partitions, namely partition 1 and partition 2, of the Kafka topic. I then have 4 instances of the time series service, with one consumer running per instance. How should I consume properly so that I do not end up with duplicate messages in my time series database?
Edit: After reading through the Kafka documentation, I came across these two statements:
If all the consumer instances have the same consumer group, then this works just like a traditional queue balancing load over the consumers.
If all the consumer instances have different consumer groups, then this works like publish-subscribe and all messages are broadcast to all consumers.
So in my case above, it behaves like a queue? Is my understanding correct?
If all consumers belong to one group (i.e. have the same group.id), then the Kafka topic will behave for you as a queue.
Important: there is no reason to have more consumers than partitions, as (out-of-the-box) Kafka consumers are scaled by partitions; with 4 consumers in one group and only 2 partitions, 2 of the consumers will sit idle.
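As a quick sanity check against that limit, a sketch using the AdminClient (topic name and address are placeholders) to count partitions, i.e. the maximum number of useful consumers in one group:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;

public class PartitionCount {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(p)) {
            int partitions = admin.describeTopics(List.of("my-topic"))
                    .all().get()            // Map<String, TopicDescription>
                    .get("my-topic")
                    .partitions()
                    .size();
            // At most this many consumers in one group will receive data.
            System.out.println("partitions = " + partitions);
        }
    }
}
```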

Kafka Only One Consumer in Consumer Group Getting Messages

In my setup, I have a consumer group with three processes (3 instances of a service) that can consume from Kafka. What I've found to be happening is that the first node receives all of the traffic. If one node is manually killed, the next node picks up all Kafka traffic, but the last remaining node sits idle.
The desired behavior is that all messages get distributed evenly across all instances within the consumer group, which is what I thought should happen. As I understand it, Kafka is supposed to distribute messages evenly amongst all members of a consumer group. Is my understanding correct? I've been trying to determine why only one member of the consumer group is getting all the traffic, with no luck. Any thoughts/suggestions?
You need to make sure that the topic has more than one partition to be able to consume it in parallel. A consumer in a consumer group gets one or more partitions allocated by the broker, but a single partition will never be shared across several consumers within the same group unless a consumer goes offline. The number of partitions a topic has equals the maximum number of consumers in a consumer group that can feed from that topic.
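For instance, a minimal AdminClient sketch of creating a topic with enough partitions for three parallel consumers (topic name, partition count, and replication factor are placeholders; kafka-topics.sh --create/--alter does the same from the command line):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(p)) {
            // 3 partitions -> up to 3 consumers in one group work in parallel.
            NewTopic topic = new NewTopic("my-topic", 3, (short) 1);
            admin.createTopics(List.of(topic)).all().get(); // fails if it already exists
        }
    }
}
```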