How to set up a Kafka Schema Registry cluster - apache-kafka

I have set up a ZooKeeper and Kafka broker cluster. I want to set up multiple Schema Registry instances for failover.
The ZooKeeper cluster has 3 nodes.
The Kafka broker cluster has 3 nodes.
Could you please describe the detailed steps to set up multiple Schema Registry instances?
I am using Confluent 5.0.

Schema Registry is designed to work as a distributed service using a single-master architecture, so at any given time there is only one master and the rest of the nodes refer back to it. You can refer to the Schema Registry architecture here.
You can run a 3-node Schema Registry cluster (it can run on the same nodes as ZooKeeper/Kafka). As you are using Confluent 5.0, you can use the Confluent CLI:
confluent start schema-registry
Update schema-registry.properties on each node:
#zookeeper urls
kafkastore.connection.url=zookeeper-1:2181,zookeeper-2:2181,...
#make every node eligible to become master for failover
master.eligibility=true
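Note that on Confluent 5.x you can also point Schema Registry directly at the brokers instead of ZooKeeper; the broker hostnames below are placeholders for your own machines:
#broker bootstrap servers (alternative to the zookeeper urls above)
kafkastore.bootstrap.servers=PLAINTEXT://kafka-1:9092,PLAINTEXT://kafka-2:9092,PLAINTEXT://kafka-3:9092
#REST listener for the registry itself
listeners=http://0.0.0.0:8081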
On the consumer and producer side, pass the list of Schema Registry URLs in the consumer and producer properties:
props.put("schema.registry.url", "http://schemaregistry-1:8081,http://schemaregistry-2:8081,http://schemaregistry-3:8081");
By default, the Schema Registry port is 8081.
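Once each node is up, you can sanity-check it over its REST interface, for example by listing the registered subjects (the hostname is a placeholder):
curl http://schemaregistry-1:8081/subjects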
Hope this helps.

Related

Kafka broker setup

To connect to a Kafka cluster I've been provided with a set of bootstrap servers with name and port:
s1:9092
s2:9092
s3:9092
Kafka and Zookeeper are running on the instance s4. From reading https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-properties-bootstrap-servers.html, it states:
bootstrap server is a comma-separated list of host and port pairs that
are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster
that a Kafka client connects to initially to bootstrap itself.
I reference the above bootstrap server definition because I'm trying to understand the relationship between the Kafka brokers s1, s2, s3 and the Kafka/ZooKeeper instance running on s4.
To connect to the Kafka cluster, I set the broker list to a CSV list of 's1,s2,s3'. When I send messages to this list of brokers, I verify the messages are added to the topic by sshing onto the s4 box and viewing the messages on the topic.
What is the link between the Kafka brokers s1, s2, s3 and s4? I cannot ssh onto any of the brokers s1, s2, s3, as they do not seem accessible via SSH - should s1, s2, s3 be accessible?
The individual responsible for the setup of the Kafka box is no longer available, and I'm confused as to how this configuration works. I've searched for configuration references to the brokers s1, s2, s3 on s4, but there does not appear to be any.
When Kafka is set up and configured, what links the brokers (in this case s1, s2, s3) to s4?
I start Kafka and Zookeeper on the same server, s4.
Should Kafka and Zookeeper also be running on s1,s2,s3?
What is the link between the Kafka brokers s1,s2,s3 and s4?
As per the Kafka documentation about adding nodes to a cluster, each server must share the same zookeeper.connect string and have a unique broker.id to be part of the cluster.
You can check which nodes are in the cluster via zookeeper-shell with ls /brokers/ids, via the Kafka AdminClient API, or with kafkacat -L.
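For example, either of the following should show the registered broker IDs / cluster metadata (the ZooKeeper and broker addresses are placeholders for your own hosts):
zookeeper-shell.sh zookeeper-1:2181 ls /brokers/ids
kafkacat -b s1:9092 -L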
should s1,s2,s3 be accessible?
Via SSH? They don't have to be.
They should respond to TCP connections from your Kafka client machines on their Kafka server ports, though.
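A quick way to test that from a client machine, using the host/port pairs you were given:
nc -vz s1 9092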
Should Kafka and Zookeeper also be running on s1,s2,s3?
You should not have 4 ZooKeeper servers in a cluster (odd numbers only).
Otherwise, you've at least been given ports for Kafka on those machines, so Kafka should be running on them.

What is the right order to stop/start Kafka with ZooKeeper and Schema Registry

We have 3 Kafka brokers in the cluster.
Kafka version - 1.0.0
Each Kafka machine also includes a ZooKeeper server and Schema Registry.
What is the right order to stop/start the following services:
1. Kafka
2. ZooKeeper
3. Schema Registry
To start the services:
ZooKeeper is a prerequisite for Kafka and Schema Registry, so it has to go first.
(Ideally, you would also want to verify that leader election took place, using the 4-letter stat command.)
Kafka is a prerequisite for Schema Registry, so it goes second.
Schema Registry goes last.
To stop them, go in the reverse order.
Schema Registry
Kafka
ZooKeeper
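A minimal sketch of the start sequence, assuming a standard Confluent Platform layout (adjust the paths to your own install):
bin/zookeeper-server-start etc/kafka/zookeeper.properties
echo stat | nc localhost 2181    # 4-letter word check: confirms the node is serving and shows its Mode (leader/follower)
bin/kafka-server-start etc/kafka/server.properties
bin/schema-registry-start etc/schema-registry/schema-registry.properties
To stop, use the matching scripts in reverse: schema-registry-stop, then kafka-server-stop, then zookeeper-server-stop.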
References
Using Schema Registry
As the accepted answer says, it goes like:
Zookeeper
Kafka
Schema Registry
+ Kafka Connect
If you are using Kafka Connect, like me, you should start it after you've started both Zookeeper and Kafka. I don't think the order between Schema Registry and Kafka Connect matters, but I start Kafka Connect last and haven't faced any problems so far.
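For reference, distributed Kafka Connect is started the same way once the others are up (again assuming a Confluent Platform layout; your worker properties file may live elsewhere):
bin/connect-distributed etc/kafka/connect-distributed.properties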

2 clusters of ZooKeeper servers in a Hadoop+Kafka cluster - is it possible?

We have a Kafka cluster with the following details:
3 Kafka machines
3 ZooKeeper servers
We also have a Hadoop cluster that includes datanode machines,
and all applications use the ZooKeeper servers, including the Kafka machines.
Now we want to make the following changes:
We want to add 3 additional ZooKeeper servers that will be in a separate cluster,
and only the Kafka machines will use these additional ZooKeeper servers.
Is this possible?
Editing ha.zookeeper.quorum in the Hadoop configuration to be separate from zookeeper.connect in the Kafka configuration, so that you have two individual ZooKeeper clusters, can be achieved, yes.
However, I don't think Ambari or Cloudera Manager, for example, allows you to view or configure more than one ZooKeeper cluster at a time.
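In other words, the two settings simply point at different ensembles. A sketch with placeholder hostnames (ha.zookeeper.quorum is actually set as an XML property in the Hadoop site files; it is shown here as key=value for brevity):
# Hadoop keeps using the original ensemble
ha.zookeeper.quorum=zk-hadoop-1:2181,zk-hadoop-2:2181,zk-hadoop-3:2181
# Kafka server.properties points at the new, dedicated ensemble
zookeeper.connect=zk-kafka-1:2181,zk-kafka-2:2181,zk-kafka-3:2181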
Yes, that's possible. Kafka uses ZooKeeper to perform various distributed coordination tasks, such as deciding which Kafka broker is responsible for allocating partition leaders and storing metadata about the topics on the brokers.
After stopping Kafka, the original ZooKeeper cluster's data can be copied to the new cluster with a tool; zkcopy is one such ZooKeeper data transfer utility.
But if your Kafka cluster cannot be stopped, you will need to think about how to transfer the ZooKeeper data to the additional ZooKeeper servers while it keeps running.

How to auto scale Apache ZooKeeper

Is anyone using Auto Scaling to scale their ZooKeeper cluster? If ZooKeeper scales, how do clients know it has been scaled up or down? In particular, with something like Kafka, where the ZooKeeper list is added to the config file, what happens when ZooKeeper is scaled - how does Kafka know it has been scaled, etc.?
Short answer: ZooKeeper clients do not really need to know/track whether new nodes have been added to the ZooKeeper cluster. They just need at least one healthy ZK node available to them.
Longer answer (with Kafka as example client of ZK):
1. If you're only adding new nodes to the ZooKeeper cluster, it's not essential for the Kafka brokers to know about this, because the zookeeper.connect configuration still contains healthy ZK nodes.
2. If, however, you're replacing/removing some of the ZooKeeper nodes, and these are the only nodes present in the zookeeper.connect configuration, then a rolling restart of the Kafka nodes will be required after updating the zookeeper.connect configuration.
For #1 above, it's best to add the new ZK nodes to the Kafka configuration at the next opportunity for a Kafka cluster restart.
The same applies to other technologies that depend on ZK (e.g. Apache Storm).
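For Kafka specifically, the change in either case is just the zookeeper.connect line in server.properties, rolled out with a broker restart (the hostnames below are placeholders):
# before
zookeeper.connect=zk-1:2181,zk-2:2181,zk-3:2181
# after adding new ZK nodes
zookeeper.connect=zk-1:2181,zk-2:2181,zk-3:2181,zk-4:2181,zk-5:2181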

Zookeeper install via Ambari

Performing an install via Ambari 1.7 and would like to get some clarification regarding the ZooKeeper installation. The setup involves (3) ZooKeeper and (3) Kafka instances.
The Ambari UI asks to specify ZooKeeper master(s) and ZooKeeper clients/slaves. Should I choose all three ZooKeeper nodes as masters and install the ZooKeeper client on each Kafka server?
ZooKeeper doesn't have any master node(s), and I am a little confused here by this Ambari master/slave terminology.
Zookeeper Server is considered a MASTER component in Ambari terminology. Kafka has the requirement that Zookeeper Server be installed on at least one node in the cluster. Thus the only requirement you have is to install Zookeeper server on one of the nodes in your cluster for Kafka to function. Kafka does not require Zookeeper clients on each Kafka node.
You can determine all this information by looking at the Service configurations for KAFKA and ZOOKEEPER. The configuration is specified in the metainfo.xml file for each component under the stack definition. The location of the definitions will differ based on the version of Ambari you have installed.
On newer versions of Ambari this location is:
/var/lib/ambari-server/resources/common-services/<service name>/<service version>
On older versions of Ambari this location is:
/var/lib/ambari-server/resources/stacks/HDP/<stack version>/services/<service name>
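If you're unsure which layout your install uses, you can simply search for the definitions (the path assumes a default Ambari server install):
find /var/lib/ambari-server/resources -name metainfo.xml -path '*ZOOKEEPER*'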