Forward Messages from two Kafka Clusters to Another - apache-kafka

I have three Kafka clusters: A, B, and C. I have data incoming on Cluster A on topic incoming.dataA and on Cluster B on topic incoming.dataB.
I need a way to send all messages received on incoming.dataA on Cluster A and incoming.dataB on Cluster B to a topic on Cluster C, received.data. Can this be done?
I am aware of mirroring and streaming, but neither of those helps when forwarding data from one Kafka cluster to another when the topic names differ.

MirrorMaker can only be used between two clusters, so you'd have to chain A->B->C.
Your next option would be to use an Apache project (or just a regular client app) such as Spark, Flink, Beam, NiFi, or Camel to consume from each cluster with individually configured consumers and forward the records with a single producer client. (It would be advisable to somehow join or merge the data first, if ordering or other characteristics matter.)
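If you go the plain-client route, a minimal single-threaded sketch might look like the following (Java; the bootstrap addresses and group id are placeholders, and error handling, offset management, and ordering are all ignored):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClusterForwarder {

    // Builds a consumer against one source cluster; addresses are placeholders.
    static KafkaConsumer<String, String> consumerFor(String bootstrap, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("group.id", "forwarder");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of(topic));
        return consumer;
    }

    public static void main(String[] args) {
        KafkaConsumer<String, String> fromA = consumerFor("cluster-a:9092", "incoming.dataA");
        KafkaConsumer<String, String> fromB = consumerFor("cluster-b:9092", "incoming.dataB");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "cluster-c:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> toC = new KafkaProducer<>(producerProps);

        while (true) {
            // Poll each source cluster in turn and forward everything to C.
            for (KafkaConsumer<String, String> consumer : List.of(fromA, fromB)) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    toC.send(new ProducerRecord<>("received.data", record.key(), record.value()));
                }
            }
        }
    }
}
```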

Related

How to filter messages on a Kafka topic based on hostnames in message properties, when consumers are on 2 different servers?

I have a requirement to consume messages from Kafka based on a hostname in the message properties. Messages are dropped onto a Kafka topic from downstream, and on the listener side two processes are configured against the same Kafka topic but run on two different servers/hostnames. What is the procedure for filtering messages by the hostname property so that each message is routed to the right server? For example, message 1 has a hostname property of serverA and message 2 has a hostname of serverB; message 1 should be picked up by server A and message 2 by server B.
I think the best approach is to have 2 different topics for server A and server B; that way they can also scale independently. If you want those services to have one interface, you can do it as follows with 3 topics:
services A + B interface - you will have a router that listens to this topic and routes each message, based on your business logic, to topic A or topic B.
topic A - service A will listen to this one.
topic B - service B will listen to this one.
Both service A and service B will write their output to your response topic.
In this approach, topics A and B are an internal mechanism of your backend system; your clients only write to the interface topic.
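A minimal sketch of such a router (Java; the topic names, header key, and broker address are assumptions):

```java
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;

public class HostnameRouter {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "broker:9092");
        consumerProps.put("group.id", "hostname-router");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
        consumer.subscribe(List.of("interface.topic")); // the shared entry point

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "broker:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);

        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                // Read the hostname from the record headers and pick the target topic.
                Header header = record.headers().lastHeader("hostname");
                String hostname = header == null ? "" : new String(header.value(), StandardCharsets.UTF_8);
                String target = "serverA".equals(hostname) ? "topic.a" : "topic.b";
                producer.send(new ProducerRecord<>(target, record.key(), record.value()));
            }
        }
    }
}
```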
Another approach would be to implement a custom partitioner - I think that's overkill for your use case, and there are a lot of things to take into account, such as skewed data.
You could set the hostname as the record key in the producer, and define a static partitioner that routes certain host name patterns to specific partitions.
Then, the consumer would need to use assign rather than subscribe to read specific partitions.
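A sketch of such a partitioner (Java; it assumes the hostname is used as the record key and the topic has exactly two partitions):

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class HostnamePartitioner implements Partitioner {
    // Records keyed "serverA" go to partition 0, everything else to partition 1.
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        return "serverA".equals(key) ? 0 : 1;
    }

    @Override public void configure(Map<String, ?> configs) {}
    @Override public void close() {}
}
```

The producer would register this class via its partitioner.class config, and the consumer for server A would then call assign(List.of(new TopicPartition("my-topic", 0))) instead of subscribe.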
Alternatively, just make different topics unless you plan on having thousands of them

Consumer Group between Clouds

I want to ask about consumer groups, but across clouds.
Here is my example: I have 6 cloud servers, let's say A, B, C, D, E, F.
Producer: A
Broker cluster: B, C
Consumers: D, E, F
Consumers with the same group: D, E
I want to ask: can we configure clouds D and E to use the same group? How does the Kafka broker recognize the group across different clouds?
When I start my consumer script (I use Kafka with Laravel), I just run kafka:My-Topic and the consumer starts.
When I use multiple clouds, I need to run the script on each cloud. I don't know what happens on my broker: can the broker cluster know the consumer group, or do I need to declare each consumer group on the broker?
A broker cluster is replication between brokers, right? Can I set broker B for D and E, and broker C for D, E, and F? Does that count as a broker cluster?
What happens to the broker cluster when I have multiple groups with 1 consumer each? Does my message get consumed by 1 group, or will multiple groups each consume the message?
Is there any tutorial on how to create key-value storage for Kafka? I need to store data inside Kafka's memory.
Thanks.
You simply give both apps the same group.id config. It doesn't really matter where they are running, as long as they can reach the brokers.
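For instance, in Java client terms (the asker uses a Laravel wrapper, but the group.id setting is the same idea there; broker address and group name below are placeholders):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker-b:9092");  // any broker both clouds can reach
props.put("group.id", "my-shared-group");         // same value on cloud D and cloud E
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("My-Topic"));
```

Run this on both clouds and the brokers will balance the topic's partitions between the two consumers.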
Yes, the brokers will know about the consumer groups that all clients are using.
No, you cannot/shouldn't assign specific brokers to clients. You might be able to assign partitions, but this is rarely a good idea for basic consumption.
If there are multiple unique group names, each group gets all messages, regardless of the number of consumers in each group.
Kafka itself doesn't have "in memory" storage, and I don't understand this last point, exactly. Your consumer can build a dictionary on its own when it processes messages, but obviously anything in-memory cannot be distributed and queried across all consumers. It might therefore make sense to use something like KSQL to consume events into a lookup table and then query it with REST calls from PHP, rather than using Kafka clients.

How do I send data to multiple Kafka clusters at one time

I am looking for help with producing to multiple Kafka clusters in parallel. I have two environments to push data to (cert and dev), and every time I run the producer I send data to cert and dev separately (one topic). Is there a way I can send data to both clusters together?
Tying your application (producers) to a particular environment topology (cert/dev) doesn't sound like the best approach. There is no way to produce from the same producer instance to two clusters, so you would have to have two producer instances and hope that both behave exactly the same when producing. Any problem (e.g. a network glitch) that causes one to fail and not the other means you end up with divergence between your two environments.
Instead, use something like Confluent Replicator or MirrorMaker 2 to stream records from one cluster to another. That way you can build your application to produce records to a target cluster and, decoupled from that, populate additional environments/clusters as desired.
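For the MirrorMaker 2 route, a minimal sketch of a connect-mirror-maker.properties file (cluster aliases and broker addresses are placeholders):

```properties
# Define both clusters and replicate everything produced on cert into dev.
clusters = cert, dev
cert.bootstrap.servers = cert-broker:9092
dev.bootstrap.servers = dev-broker:9092

cert->dev.enabled = true
cert->dev.topics = .*
```

Your application then produces only to cert, and MirrorMaker 2 keeps dev populated.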

What is the best approach to keep two Kafka clusters in sync

I have to set up two Kafka clusters in two different data centers (DCs) with the same topics and configuration. The reason is that the connectivity between the two data centers is poor, so we cannot create one global cluster.
We have producers and consumers publishing and subscribing to the topics of each DC.
The problem is that I need to keep both clusters in sync.
Let's say all messages written to the first DC should eventually be replicated to the second, and the other way around.
I am evaluating the Kafka MirrorMaker tool, creating the mirror by consuming messages from the first cluster and producing them to the second. However, it is also required to replicate data from the second to the first, because writing is allowed on both clusters.
I don't think the Kafka MirrorMaker tool fits our case.
I'd appreciate any suggestions.
Thanks in advance.
Depending on your exact requirements, you can use MirrorMaker for your use case.
One option would be to just have two separate topics, let's call them topic1 on cluster 1 and topic2 on cluster 2. All your producing threads write to the "local" topic, and you use MirrorMaker to replicate this topic to the remote cluster.
For your consumers, you simply subscribe to both topics on whichever cluster is closest to you; that way you will get all records that were written on either cluster.
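In client terms that is just a two-topic subscription (topic names taken from the example above; consumer configuration omitted):

```java
// Consumer running near cluster 1: read the local topic plus the
// mirrored copy of the remote one.
consumer.subscribe(List.of("topic1", "topic2"));
```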
Alternatively, you could create aggregation topics on both clusters and use MirrorMaker to replicate data into them; this would enable you to have all data in one topic for consumption.
You would have duplicate data on the same cluster this way, but you could take care of that with lower retention settings on the input topics.
In order for this to work, you will need to configure MirrorMaker to replicate a topic into a topic with a different name, which is not a standard thing for it to do. I have written a small blog post on how to do this, if you want to investigate this option further.
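(The blog post covers the legacy MirrorMaker tool; as a rough illustration only, MirrorMaker 2 exposes the same idea through a custom ReplicationPolicy registered via its replication.policy.class setting. The class and topic names below are assumptions, not the post's code:)

```java
import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;

// Hypothetical MirrorMaker 2 policy that renames replicated topics instead of
// using the default "<sourceCluster>.<topic>" convention.
public class RenamingReplicationPolicy extends DefaultReplicationPolicy {
    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        if (topic.equals("topic1") || topic.equals("topic2")) {
            return "aggregate.data"; // merge both local topics into one name
        }
        return super.formatRemoteTopic(sourceClusterAlias, topic);
    }
}
```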

How to consume from two different clusters in Kafka?

I have two Kafka clusters, say A and B, where B is a replica of A. I would like to consume messages from cluster B only if A is down, and vice versa. However, consuming messages from both clusters would result in duplicate messages. So is there any way I can configure my Kafka consumer to receive messages from only one cluster?
Thanks.
So is there any way I can configure my kafka consumer to receive messages from only one cluster.
Yes: a Kafka consumer instance will always receive messages from one Kafka cluster only. That is, there's no built-in option to use the same consumer instance for reading from 2+ clusters. But I think you are looking for something different, see below.
I would like to consume messages from cluster B only if A is down and viceversa. Nevertheless consuming messages from both the clusters would result in duplicate messages.
There's no built-in failover support such as "switch to cluster B if cluster A fails" in Kafka's consumer API. If you need such behavior (as in your case), you would need to do so in your application that uses the Kafka consumer API.
For example, you could create a consumer instance to read from cluster A, monitor that instance and/or that cluster to determine whether failover to cluster B is required, and (if needed) perform the failover to B by creating another consumer instance to read from B in the event that A fails.
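A highly simplified sketch of that loop (Java; newConsumer() and process() are hypothetical helpers, and the "no records for 60 seconds" heuristic is only a stand-in for real health monitoring):

```java
// Read from cluster A; if nothing arrives for a while, assume A is down
// and rebuild the consumer against cluster B. Real code would also need
// offset translation between the clusters.
KafkaConsumer<String, String> consumer = newConsumer("cluster-a:9092");
long lastRecordAt = System.currentTimeMillis();
boolean onPrimary = true;

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    if (!records.isEmpty()) {
        lastRecordAt = System.currentTimeMillis();
        records.forEach(r -> process(r));
    } else if (onPrimary && System.currentTimeMillis() - lastRecordAt > 60_000) {
        // Crude failure signal: note that a merely quiet topic would also trigger this.
        consumer.close();
        consumer = newConsumer("cluster-b:9092");
        onPrimary = false;
    }
}
```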
There are a few gotchas, however, that make this failover behavior more complex than my simplified example. One difficulty is knowing which messages from cluster A have already been read when switching over to B: this is tricky because message offsets typically differ between clusters, so determining whether the "copy" of a message (in B) was already read (from A) is not trivial.
Note: Sometimes you can simplify such an application / such a failover logic in situations where e.g. message processing is idempotent (i.e. where duplicate messages / duplicate processing of messages will not alter the processing outcome).