HBase: MasterNotRunningException: the node /hbase is not in zookeeper

I'm building a pipeline with StreamSets to read data from a Kafka topic and write it to an HBase table. I am able to write to an HDFS file, but when I try to use an HBase destination I get the following error:
MasterNotRunningException: the node /hbase is not in zookeeper
I'm using Cloudera to manage the services, and I configured the following properties on the HBase destination:
ZooKeeper quorum: (my ZooKeeper server IP)
ZooKeeper client port: 2181
ZooKeeper parent znode: /hbase
I have the following configuration on the HBase service in Cloudera:
zookeeper.znode.parent: /hbase
so there is no mismatch between the two settings.
What could be going wrong?
Thank you in advance.

Check your ZooKeeper server IP address: you should give the IP of the ZooKeeper service that is in the same cluster as HBase. If you have multiple clusters managed by Cloudera Manager, you may have multiple ZooKeeper services in different clusters.
It is fine to use one ZooKeeper service for Kafka from one cluster and a different one for HBase from another cluster, as long as StreamSets is configured accordingly.
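One quick way to confirm which cluster a quorum address actually serves is to connect with the ZooKeeper CLI and list the root znodes; a minimal check, assuming the placeholder IP 10.0.0.5 stands in for the quorum you configured:

zkCli.sh -server 10.0.0.5:2181
ls /

If /hbase does not appear in the output of ls /, the configured quorum belongs to a different cluster than the one running HBase, which produces exactly this MasterNotRunningException.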

Related

Can Kafka Connect consume data from a separate kerberized Kafka instance and then route to Splunk?

My pipeline is:
Kerberized Kafka --> Logstash (hosted on a different server) --> Splunk.
Can I replace the Logstash component with Kafka Connect?
Could you point me to a resource/guide where I can use Kerberized Kafka as a source for my Kafka Connect (which is hosted separately)?
From the documentation, I understood that this is quite possible if Kafka Connect is hosted on the same cluster as Kafka. But I don't have that option right now, as our Kafka cluster is multi-tenant and hence not approved for additional processes on the cluster.
Kerberos keytabs aren't generally machine- or JVM-specific, so yes: Kafka Connect can be configured very similarly to Logstash, since both are JVM processes speaking the native Kafka protocol.
You shouldn't run Connect on the brokers anyway.
If you can't add Kafka Connect to the machines of the existing Kafka cluster, you will have to spin up a separate Kafka Connect deployment (a distributed cluster or a standalone worker), as sketched below.
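A minimal sketch of the worker properties for connecting to a Kerberized cluster; the broker hostname, keytab path, and principal are placeholders, and your cluster may require SASL_SSL rather than SASL_PLAINTEXT:

bootstrap.servers=kerberized-broker.example.com:9092
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
  useKeyTab=true \
  keyTab="/etc/security/keytabs/connect.keytab" \
  principal="connect@EXAMPLE.COM";
# The producers/consumers that Connect creates for connectors take the same
# settings with a prefix (including producer.sasl.jaas.config and
# consumer.sasl.jaas.config):
producer.security.protocol=SASL_PLAINTEXT
producer.sasl.mechanism=GSSAPI
consumer.security.protocol=SASL_PLAINTEXT
consumer.sasl.mechanism=GSSAPI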

How to set kafka schema registry cluster

I have set up a ZooKeeper and Kafka broker cluster. I want to set up a multi-node Schema Registry cluster for failover.
The ZooKeeper cluster has 3 nodes.
The Kafka broker cluster has 3 nodes.
Could you please give detailed steps on how to set up multiple Schema Registry instances?
I am using Confluent 5.0.
Schema Registry is designed to work as a distributed service with a single-master architecture, so at any given time there is only one master and the remaining nodes forward write requests to it. You can refer to the Schema Registry architecture documentation for details.
You can run a three-node Schema Registry cluster (the instances can run on the same nodes as ZooKeeper/Kafka). As you are using Confluent 5.0, you can start each instance with the Confluent CLI:
confluent start schema-registry
Then update schema-registry.properties on each node:
# ZooKeeper URLs
kafkastore.connection.url=zookeeper-1:2181,zookeeper-2:2181,...
# make every node eligible to become master, for failover
master.eligibility=true
On the client side, pass the list of Schema Registry URLs in the consumer and producer properties:
props.put("schema.registry.url", "http://schemaregistry-1:8081,http://schemaregistry-2:8081,http://schemaregistry-3:8081");
By default, Schema Registry listens on port 8081.
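For context, a minimal producer sketch (topic, broker names, and the Avro record are hypothetical) showing where that property fits, assuming the Confluent Avro serializer is on the classpath:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "broker-1:9092,broker-2:9092,broker-3:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
// Listing every node lets the serializer fail over if one registry is down
props.put("schema.registry.url", "http://schemaregistry-1:8081,http://schemaregistry-2:8081,http://schemaregistry-3:8081");
KafkaProducer<String, Object> producer = new KafkaProducer<>(props);
// avroRecord: an org.apache.avro.generic.GenericRecord built elsewhere
producer.send(new ProducerRecord<>("my-topic", "key", avroRecord));
producer.close();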
Hope this helps.

For Kafka,what IP values need to be setup in listeners & advertised.listeners value?

I have created a multi-node Azure Databricks cluster inside one VNet and a multi-node Kafka HDInsight cluster inside a different VNet, and I have peered the two VNets. After peering, the machines are able to ping each other.
I am trying to write messages to a Kafka topic from the Databricks cluster using Spark Structured Streaming, and I am getting a socket timeout error.
Upon research, I found that in Kafka we need to set up listeners and advertised.listeners in the server.properties file.
In my scenario, what values should I put for listeners and advertised.listeners? It would be very helpful if anyone could suggest what changes I need to make in the server.properties file.
You need to create a listener for the host/IP on which your client machine (where Spark is running) can connect to your broker.
See https://rmoff.net/2018/08/02/kafka-listeners-explained/
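For illustration, a sketch of the relevant server.properties entries, assuming the broker's private IP is 10.0.0.4 (a placeholder) and that clients in the peered VNet can reach it on that address:

listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://10.0.0.4:9092,EXTERNAL://10.0.0.4:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL

Clients in the peered VNet would then bootstrap against 10.0.0.4:9093; since the broker hands the advertised address back to clients, that address must be resolvable and reachable from the client side.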

Having Kafka connected with ip as well as service name - Openshift

In our OpenShift ecosystem, we have a Kafka instance sourced from wurstmeister/kafka. As of now I am able to make Kafka accessible within the OpenShift system using the parameters below:
KAFKA_LISTENERS=PLAINTEXT://:9092
KAFKA_ADVERTISED_HOST_NAME=kafka_service_name
And of course, the parameters for the port and ZooKeeper are there.
I am able to access Kafka from the pods within the OpenShift system, but I am unable to access the Kafka service from the host machine, even though I am able to reach the Kafka pod by its IP and can telnet to it with telnet Pod_IP 9092.
When I try to connect with the Kafka producer from the host machine, I get the error below:
[2017-08-07 07:45:13,925] WARN Error while fetching metadata with correlation id 2 : {tls21=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
And when I try to connect with a Kafka consumer from the host machine using the IP, it returns nothing.
Note: As of now, its a single openshift server. And the use case is for dev testing.
Maybe you want to take a look at this proof of concept for running Kafka on OpenShift?
https://github.com/EnMasseProject/barnabas
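Separately, LEADER_NOT_AVAILABLE from the host usually points at the advertised listener: the broker advertises kafka_service_name, which the host machine cannot resolve. One common approach with the wurstmeister image is to declare two listeners, one for in-cluster clients and one for external ones; a sketch, where the external hostname and port are placeholders for however you expose the pod (e.g. a route or NodePort):

KAFKA_LISTENERS=INTERNAL://:9092,EXTERNAL://:9094
KAFKA_ADVERTISED_LISTENERS=INTERNAL://kafka_service_name:9092,EXTERNAL://host.example.com:9094
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME=INTERNAL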

Zookeeper install via Ambari

I am performing an install via Ambari 1.7 and would like some clarification regarding the ZooKeeper installation. The setup involves three (3) ZooKeeper and three (3) Kafka instances.
Ambari UI asks to specify Zookeeper master(s) and Zookeeper clients/slaves. Should I choose all three Zookeeper nodes as masters and install Zookeeper client on each Kafka server?
Zookeeper doesn't have any master node(s) and I am a little confused here with this Ambari master/slave terminology.
Zookeeper Server is considered a MASTER component in Ambari terminology. Kafka has the requirement that Zookeeper Server be installed on at least one node in the cluster. Thus the only requirement you have is to install Zookeeper server on one of the nodes in your cluster for Kafka to function. Kafka does not require Zookeeper clients on each Kafka node.
You can determine all this information by looking at the service configurations for KAFKA and ZOOKEEPER. The configuration is specified in the metainfo.xml file for each component under the stack definition. The location of the definitions differs based on the version of Ambari you have installed.
On newer versions of Ambari this location is:
/var/lib/ambari-server/resources/common-services/<service name>/<service version>
On older versions of Ambari this location is:
/var/lib/ambari-server/resources/stacks/HDP/<stack version>/services/<service name>
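A quick way to locate those definition files, assuming shell access to the Ambari server host (adjust the base path to whichever of the two layouts your version uses):

find /var/lib/ambari-server/resources -name metainfo.xml | grep -iE 'kafka|zookeeper'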