Client Configuration for Message Load Balancing - activemq-artemis

We have 6 AMQ instances (3 masters + 3 slaves) in clustering mode, each on a different machine. Each instance has, let's say, queue/ExampleQueue. We would like to achieve load balancing, as we have a massive volume of messages to send to queue/ExampleQueue.
This example uses a ConnectionFactory from only one node.
In Spring we use the ConnectionFactory by configuring the URLs of all 6 nodes:
connectionFactory.ConnectionFactory: (tcp://vbox:61616,tcp://vbox:61716,tcp://vbox:61816,tcp://vbox:62616,tcp://vbox:62716,tcp://vbox:62816)?type=CF...
What would happen if we only configure 1 master node as follows:
connectionFactory.ConnectionFactory: (tcp://vbox:61616)?type=CF...
Would this one master node do round-robin load balancing?
Does it mean that with a multi-broker URL there tend to be more than one ConnectionFactory in use? In our case, that would be 6 ConnectionFactory instances involved. Would each of them also have its own topology instance? Is it necessary to also include the 3 slave URLs? What would happen if we set useTopologyForLoadBalancing=false?

Listing multiple brokers in the same URL is mainly useful when making the initial connection. Each broker in the list will be tried until a connection is successfully established. Then as soon as the client connects to any node in the cluster it will receive the topology of the entire cluster and any additional connections made with that same ConnectionFactory instance will be distributed across the cluster in a round-robin fashion based on the received topology.
The downside of listing just one broker in the URL is that if that broker is down the client won't be able to connect to anything at all.
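For illustration, a rough sketch in plain JMS (hostnames and ports taken from the question; the loop count is arbitrary) of creating several connections from a single ConnectionFactory so that, once the topology has been received, they get spread across the cluster:

import java.util.ArrayList;
import java.util.List;

import javax.jms.Connection;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class ClusterConnectionSketch {
    public static void main(String[] args) throws Exception {
        // List every master in the URL so the *initial* connection succeeds even
        // if some brokers are down. useTopologyForLoadBalancing=true is the
        // default; additional connections are then round-robined across the
        // topology received from the first node the client reaches.
        ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory(
                "(tcp://vbox:61616,tcp://vbox:61716,tcp://vbox:61816)"
                        + "?useTopologyForLoadBalancing=true");

        List<Connection> connections = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            // Each createConnection() after the first is balanced across the
            // cluster members reported in the topology, not only the listed URLs.
            connections.add(cf.createConnection());
        }

        // ... create sessions/producers for queue/ExampleQueue on each connection ...

        for (Connection connection : connections) {
            connection.close();
        }
        cf.close();
    }
}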

Related

Snowflake Kafka connector doubts and questions

I am using a 3-server cluster for the Kafka configuration, with the Snowflake connector REST API to push data to a Snowflake database. All 3 are different VMs running on AWS.
1. Do we need an individual ZooKeeper service up and running on each of the 3 Kafka servers in the cluster, or is only 1 enough? If it needs to run on all 3 servers, does it require different port configurations, for example:
1.a: zookeeper.connect=xx.xx.xx.xxx:2181, xx.xx.xx.xxx:2182, xx.xx.xx.xxx:2183, or should it be 2181 in every server.properties file?
1.b: PLAINTEXT://localhost:9091 on server 1, PLAINTEXT://localhost:9092 and PLAINTEXT://localhost:9093 on the others (and should this be localhost or the IP address)?
1.c: server.1=<zookeeper_1_IP>:2888:3888, server.2=<zookeeper_2_IP>:2888:3888, server.3=<zookeeper_3_IP>:2888:3888 (here the 2888:3888 ports need to be the same on each server, right?)
1.d: Does clientPort=2181 need to be the same across the services in all 3 VMs, or does it need to be different?
1.e: Should listeners = PLAINTEXT://your.host.name:9092 on each server have a separate port, like VM-Server1:9092, VM-Server2:9093, VM-Server3:9094? And should the worker nodes (Server2 and Server3) be given the master server's IP, or each worker node's own IP?
What should the connector configuration be, with regard to the REST API, for the configuration item "tasks.max":"1"? I am going with a 3-server cluster for Kafka and would be starting the distributed connector on all 3 machines.
I am getting duplicates when I start the distributed-connector service on the 2nd server. How can these duplicate records be avoided? With only 1 distributed connector running there are no duplicates, but the lag increases if only 1 distributed-connector service is up and running. Please advise.
Create a /data/zookeeper/myid file and give it the value 1 for zookeeper1, 2 for zookeeper2 and 3 for zookeeper3. Is this necessary when the nodes are on different VMs?
The distributed-connector service runs for some time after being started and then gets disconnected.
Are there any other parameters or best practices that need to be followed for the 3-server cluster architecture?
Kafka and Zookeeper
You only need one Kafka broker and Zookeeper server, although having more would provide fault tolerance. You don't need to manually create anything in Zookeeper such as myid files.
The ports don't need to be the same, but it is obviously easier to draw a network diagram and automate the configuration if they are.
Regarding Kafka listeners, read this post. For Zookeeper, follow its documentation if you want to create a cluster.
Or use Amazon MSK / Confluent Cloud, etc. instead of EC2, and this is all done for you.
Kafka Connect
tasks.max can be as high as you want, but if you have a source connector, then multiple threads will probably cause duplicates, yes.
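For reference, a rough sketch of setting tasks.max when creating the connector through the Connect REST API (the connector name, topic, and worker host below are hypothetical placeholders, not values from the question):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateConnectorSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector definition; "tasks.max" caps how many tasks
        // Connect may distribute across the workers in the cluster.
        String body = """
                {
                  "name": "example-snowflake-sink",
                  "config": {
                    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
                    "topics": "example-topic",
                    "tasks.max": "3"
                  }
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect-worker-1:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}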

Kafka producer posting messages to secondary cluster

Description of proposed cluster setup
2 data centres, each with a 5-node Kafka cluster
Both clusters have the same topics and the same producer/consumer instances working with them
There is no data replication across the clusters, so the data in Cluster 1 and Cluster 2 is distinct
There is no message affinity required. [It makes no functional difference if Producer 1 starts posting messages to Cluster 2, and vice versa]
What we want to achieve is: let's say Producer 1 posts a message asynchronously to Cluster 1 but receives a negative acknowledgment (after all the retry timeouts have elapsed). This is easily detected in the producer callback method.
On receiving this failure, we have the producer use another KafkaTemplate (configured with the details of Cluster 2), and it tries posting the same message to Cluster 2. [It applies the other way round as well: if Producer 2 is unable to post locally, it sends the message to Cluster 1.]
The advantages we get here are:
the message is not lost and is automatically posted to the other cluster
since this happens per message, once Cluster 1 is back up, Producer 1 automatically resumes sending messages to Cluster 1
One downside we see is that we are handling the failover logic ourselves, by producing to the secondary cluster in the exception-handling block of either a metadata fetch timeout or a negative acknowledgment.
I could not find a similar setup described anywhere on the net. Is there something fundamentally wrong with this approach?
Sure; just configure 2 sets of infrastructure beans - producer and consumer factories, container factories, templates.
You can't use Boot's auto configuration for that, but you can define the beans yourself.
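A minimal sketch of what the producer-side beans might look like (cluster addresses and bean names are placeholders; the consumer side would be duplicated the same way):

import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class DualClusterConfig {

    // Shared producer settings; only the bootstrap servers differ per cluster.
    private Map<String, Object> producerProps(String bootstrapServers) {
        return Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers,
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.ACKS_CONFIG, "all");
    }

    @Bean
    public ProducerFactory<String, String> cluster1ProducerFactory() {
        return new DefaultKafkaProducerFactory<>(producerProps("dc1-broker1:9092,dc1-broker2:9092"));
    }

    @Bean
    public ProducerFactory<String, String> cluster2ProducerFactory() {
        return new DefaultKafkaProducerFactory<>(producerProps("dc2-broker1:9092,dc2-broker2:9092"));
    }

    @Bean
    public KafkaTemplate<String, String> cluster1Template() {
        return new KafkaTemplate<>(cluster1ProducerFactory());
    }

    @Bean
    public KafkaTemplate<String, String> cluster2Template() {
        return new KafkaTemplate<>(cluster2ProducerFactory());
    }
}

The failover itself stays in your own code, as described in the question: if the send via cluster1Template fails in the callback after the retries are exhausted, re-send the same record with cluster2Template.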

How many bootstrap servers to provide for large Kafka cluster

I have a use case where my Kafka cluster will have 1000 brokers and I am writing a Kafka client.
In order to write the client, I need to provide a broker list.
The question is, what are the recommended guidelines for providing the broker list in the client?
Is there any proxy-like service available in Kafka that we can give to the client?
- that proxy would know all the brokers in the cluster and connect the client to the appropriate broker
- like in the Redis world, where we have twemproxy (nutcracker)
- can confluent-rest-api act as a proxy?
Is it recommended to provide a specific number of brokers in the client, for example a list of 3 brokers even though the cluster has 1000 nodes?
- what if the provided brokers crash?
- what if the provided brokers restart and their location/IP changes?
The list of broker URLs you pass to the client is only used to bootstrap the client. The client will then automatically learn about all other available brokers and connect to the brokers it needs to talk to.
Thus, if the client is already running and those brokers go down, the client will not even notice. Only if all the listed brokers are down at the same time when you start up the client will the client "hang", as it cannot connect to the cluster, and it will eventually time out.
It's recommended to provide at least 3 broker URLs to "survive" the outage of 2 brokers, but you can provide more if you need a higher level of resilience.
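A minimal sketch (broker hostnames are placeholders): only a handful of brokers go into bootstrap.servers, and the client discovers the other brokers from the cluster metadata:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BootstrapSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Three brokers (or a few more, for resilience at startup) are enough;
        // they are only used for the initial metadata fetch, after which the
        // client talks directly to whichever brokers lead the partitions it needs.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", "key", "value"));
        }
    }
}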

Connecting Storm with remote Kafka cluster, what would happen if new brokers are added

We are working on an application that uses Storm to pull data from a remote Kafka cluster. As the two clusters lie in different environments, there is an issue with network connectivity between them. In simple terms, by default the remote ZooKeeper nodes and Kafka brokers do not allow connections from our Storm worker/supervisor nodes. For that to work, firewall access needs to be granted.
My concern is: what would happen if new brokers or ZooKeeper nodes are added to the remote cluster? I understand that we don't have to specify all the zk nodes in order to consume, but say they add a few brokers and we need to consume from a partition that is served by those new nodes? What would be the impact on the running Storm application?

Zookeeper - what will happen if I pass in a connection string only some of the nodes from the zk cluster (ensemble)?

I have a ZooKeeper cluster consisting of N nodes (which know about each other). What if I pass only M < N of the nodes' addresses in the zk client connection string? What will be the cluster's behavior?
In a more specific case, what if I pass the host address of only 1 zk node from the cluster? Is it then possible for the zk client to connect to other hosts from the cluster? What if this one host is down? Will the client be able to connect to other ZooKeeper nodes in the ensemble?
The other question is: is it possible to limit the client to using only specific nodes from the ensemble?
What if I pass only M < N of the nodes' addresses in the zk client connection string? What will be the cluster's behavior?
ZooKeeper clients will connect only to the M nodes specified in the connection string. The ZooKeeper ensemble's back-end interactions (leader election and processing write transaction proposals) will continue to be processed by all N nodes in the cluster. Any of the N nodes still could become the ensemble leader. If a ZooKeeper server receives a write transaction request, and that server is not the current leader, then it will forward the request to the current leader.
In a more specific case, what if I pass the host address of only 1 zk node from the cluster? Is it then possible for the zk client to connect to other hosts from the cluster? What if this one host is down? Will the client be able to connect to other ZooKeeper nodes in the ensemble?
No, the client would only be able to connect to the single address specified in the connection string. That address effectively becomes a single point of failure for the application, because if the server goes down, the client will not have any other options for establishing a connection.
The other question is: is it possible to limit the client to using only specific nodes from the ensemble?
Yes, you can limit the nodes that the client considers for establishing a connection by listing only those nodes in the client's connection string. However, keep in mind that any of the N nodes in the cluster could still become the leader, and then all client write requests will get forwarded to that leader. In that sense, the client is using the other nodes indirectly, but the client is not establishing a direct socket connection to those nodes.
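For illustration, a minimal sketch with the stock ZooKeeper Java client (hostnames are placeholders): listing only two ensemble members means the client will only ever open a socket to one of those two, even if the ensemble is larger.

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PartialEnsembleSketch {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);

        // Only zk1 and zk2 appear in the connection string, so the client will
        // only ever connect directly to one of these two. Write requests may
        // still be forwarded internally to whichever ensemble node is leader.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181", 30_000, (WatchedEvent event) -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });

        connected.await();
        System.out.println("Session id: 0x" + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}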
The ZooKeeper Overview page in the Apache documentation has further discussion of client and server behavior in a ZooKeeper cluster. For example, there is a relevant quote in the Implementation section:
As part of the agreement protocol all write requests from clients are
forwarded to a single server, called the leader. The rest of the
ZooKeeper servers, called followers, receive message proposals from
the leader and agree upon message delivery. The messaging layer takes
care of replacing leaders on failures and syncing followers with
leaders.