Kafka - ZooKeeper doesn't run with the others

I have a problem with Apache Kafka.
I have 4 clusters where I want to install Kafka instances. On 3 clusters it works: they can produce and consume messages between each other, and the ZooKeepers work fine. But on the 4th cluster I can't run a ZooKeeper connected to the other ZooKeepers. If I set only the local server in zoo.cfg (0.0.0.0:2888:3888), ZooKeeper runs in standalone mode, but if I add the other servers I get an error:
./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Error contacting service. It is probably not running.
How can I fix this error? I will add that I can ping the servers, so they can see each other.
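For comparison, here is a minimal replicated zoo.cfg sketch; all IPs and the data dir are placeholders, not values from this setup. Each host lists every ensemble member, may use 0.0.0.0 for its own entry, and must have a myid file whose number matches its own server.N line:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# one entry per ensemble member; on each host its own entry can be 0.0.0.0
server.1=0.0.0.0:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888
server.4=192.168.0.4:2888:3888
# /var/lib/zookeeper/myid on this host must contain just: 1

Note that ping only proves ICMP connectivity; the peers additionally need TCP ports 2888 (quorum) and 3888 (leader election) open between them, and a firewall blocking those is a common cause of this kind of failure.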

Related

Snowflake Kafka connector doubts and questions

I am using a 3-server cluster for the Kafka configuration, with the Snowflake connector REST API to push data to a Snowflake database. All 3 are different VMs running on AWS.
1. In this setup, do the ZooKeeper services need to be up and running on all 3 individual Kafka servers in the cluster, or is only 1 enough? And if ZooKeeper must run on all 3 servers, does it require different port configurations, for example:
1.a: zookeeper.connect=xx.xx.xx.xxx:2181,xx.xx.xx.xxx:2182,xx.xx.xx.xxx:2183, or should it be 2181 in every server.properties file?
1.b: PLAINTEXT://localhost:9091 on server 1, PLAINTEXT://localhost:9092 and PLAINTEXT://localhost:9093 on the others (and should this be localhost, or the IP address)?
1.c: server.1=<zookeeper_1_IP>:2888:3888, server.2=<zookeeper_2_IP>:2888:3888, server.3=<zookeeper_3_IP>:2888:3888 (here the 2888:3888 ports need to be the same on each server, right?)
1.d: Does clientPort=2181 need to be the same across the services in all 3 VMs, or should it be different?
1.e: Should the listeners = PLAINTEXT://your.host.name:9092 entry on each server have a separate port, like VM-Server1:9092, VM-Server2:9093, VM-Server3:9094? And should the master server's IP be given on the worker nodes (Server2 and Server3), or each worker node's own IP?
2. What should the connector configuration be, with regard to the REST API, for the configuration item "tasks.max":"1"? I am going with a 3-server cluster for Kafka and would be starting the distributed connector on all 3 machines.
3. I am getting duplicates if I start the distributed-connector services on the 2nd server. How can these duplicate records be avoided? If only 1 distributed connector is running, there are no duplicates, but the lag increases when only 1 distributed-connector service is up and running. Please advise.
4. Create the /data/zookeeper/myid file and give it the value 1 for zookeeper1, 2 for zookeeper2, and 3 for zookeeper3. Is this necessary when each is on a different VM?
5. The distributed-connector services, once started, execute for some time and then get disconnected.
6. Are there any other parameters or best practices that need to be followed for the 3-server cluster architecture?
Kafka and Zookeeper
You only need one Kafka broker and one Zookeeper server, although having more provides fault tolerance. With a single Zookeeper node you don't need to manually create anything such as myid files; those only become necessary when you form a multi-node ensemble.
The ports don't need to be the same, but it is obviously easier to draw a network diagram and automate the configuration if they are.
Regarding Kafka listeners, read this post. For Zookeeper, follow its documentation if you want to create a cluster.
Or use Amazon MSK / Confluent Cloud, etc. instead of EC2, and this is all done for you.
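As a concrete sketch of the "same ports everywhere" layout described above (angle-bracket values are placeholders): since each broker sits on its own VM, every server.properties can use port 9092, and only broker.id and the advertised IP differ per machine. advertised.listeners should carry the VM's reachable IP rather than localhost, otherwise remote clients are told to connect to themselves:

# server.properties on each VM
broker.id=1                                  # 2 and 3 on the other VMs
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://<this_vm_ip>:9092
zookeeper.connect=<vm1_ip>:2181,<vm2_ip>:2181,<vm3_ip>:2181

# zoo.cfg, identical on each VM (note server.1/2/3, one line per node)
clientPort=2181
server.1=<vm1_ip>:2888:3888
server.2=<vm2_ip>:2888:3888
server.3=<vm3_ip>:2888:3888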
Kafka Connect
tasks.max can be as high as you want, but if you have a source connector, then multiple tasks will probably cause duplicates, yes.
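For illustration, creating a connector with a given tasks.max through the Connect REST API looks roughly like this; the connector class and any Snowflake-specific settings are omitted placeholders:

curl -X POST http://connect-host:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-source",
        "config": {
          "connector.class": "com.example.MySourceConnector",
          "tasks.max": "1"
        }
      }'

Keeping tasks.max at 1 trades throughput for avoiding the duplicate reads described in the question.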

Kafka cluster continues to run without Zookeeper

I have a five-node Kafka cluster (Confluent 5.5 community edition) with 3 ZooKeeper nodes, each on a different AWS instance.
While doing failover testing, I noticed that the Kafka cluster works fine even if all ZooKeeper nodes are down.
I was able to produce, consume, and also create new consumers.
Why does the Kafka cluster not stop if it cannot connect to any ZooKeeper nodes?
What would be the possible issues if we were unaware of such a failure scenario in production and the Kafka cluster continued to run without ZooKeeper connectivity?
How do we handle such a scenario?
Broker leader election, topic creation, and simple ACLs (if you use them) still depend on Zookeeper. Other basic functions that rely on the Kafka bootstrap protocols might still work, sure. There should definitely be broker logs indicating the connection was lost.
Ideally you'd have basic process healthchecking and incident management software so that you don't miss critical services going down in prod.
How to handle it? Restart Zookeeper...
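As a minimal healthcheck sketch, ZooKeeper's four-letter-word commands can be probed from any monitoring script (the host name is a placeholder; on newer ZooKeeper versions the commands must be allowed via 4lw.commands.whitelist in zoo.cfg):

# returns "imok" if the ZooKeeper process is alive
echo ruok | nc zk1.example.com 2181
# "Mode: leader" / "Mode: follower" only appears once a quorum is formed
echo srvr | nc zk1.example.com 2181 | grep Mode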

Configuring Kafka Connect with multiple brokers

Steps
I have used two Kafka brokers, and I have started the Zookeeper, Kafka server, and Kafka Connect services.
I have one source-type Kafka connector which can be used for getting data from a database.
If I start the connector [connector 1] using the REST API, it will hit one Kafka server [server 1] through the load balancer. After that, server 1 will store and run the connector. But server 2 does not know about the connector [connector 1] running on server 1.
Expectation
So if Kafka server 1 goes down, the other Kafka server 2 should be able to run the connector from the failed Kafka server 1.
While starting the connector, the Kafka servers should know how many connectors are running, so that if one broker fails to do the job, another server is able to continue it.
Reality
Kafka server 2 is not doing the job as per the requirement.
Is there any way to achieve this through Kafka configuration?
Kindly suggest some ideas.
[Screenshots: Kafka Server 1, Kafka Server 2]
It appears that you have started all processes in a single pod.
You should run Kafka, Zookeeper, and Connect all as separate services in different pods.
I suggest you refer to the Confluent or Strimzi sites to find Kafka Kubernetes Helm charts / operators.
But to answer the question: you can give one or more brokers in the connect-distributed.properties bootstrap.servers value. Then each broker is connected to as part of the Kafka cluster, and Connect will reconnect in the event that one broker is unavailable.
"Kafka servers" (brokers) do not run connectors; Connect workers do.
If you want to run a cluster of Connect workers, you also need to set up their rest.advertised.listener address so that they can communicate with each other.
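A sketch of the relevant connect-distributed.properties entries for such a worker cluster (host names are placeholders): workers that share the same group.id and the same three storage topics form one Connect cluster and will rebalance connectors and tasks onto the surviving workers when one dies:

bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster-1                 # identical on every worker
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# address the other workers use to reach this worker's REST API
rest.advertised.host.name=worker1.example.com
rest.advertised.port=8083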

Run two Kafka servers separately on the same machine

I have two Kafka clusters which run on the same Ubuntu machine. The development cluster consists of 1 ZK, 1 Kafka broker, and 2 workers, and the production cluster consists of 3 ZK, 3 Kafka brokers, and a planned 3-4 workers.
Both are running, but the Zookeeper on the prod side is affected by the dev one: in my controller logs I see tasks that run in dev, which shows that my prod Kafka runs in the same cluster as the dev Kafka. Also, after several minutes the production cluster goes down and only one broker keeps running. How can I isolate and separate the two so that neither can affect the other?
Suggestion 1: Use Docker Compose to completely isolate both stacks.
Suggestion 2: Don't run more than one of any service on a single physical server. Otherwise, should this one machine crash and fail, you lose everything.
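If both stacks must stay on one machine without Docker, the usual cause of this kind of cross-talk is both clusters pointing at the same Zookeeper ensemble (or the same chroot). A sketch of keeping them apart (ports and paths are illustrative only):

# dev broker, server.properties
broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/var/kafka/dev-logs
zookeeper.connect=localhost:2181/dev       # dev ZK instance, own chroot

# prod broker 1, server.properties
broker.id=1
listeners=PLAINTEXT://:9192
log.dirs=/var/kafka/prod-logs-1
zookeeper.connect=localhost:2281,localhost:2282,localhost:2283/prod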

Timed out waiting for connection while in state: CONNECTING

I have a 3-node cluster for Kafka and ZooKeeper. Both the Kafka and ZooKeeper clusters use the same 3 machines. When I start the Kafka service, it fails to start if all 3 ZooKeeper servers are not up. Why is Kafka waiting for the ZooKeeper cluster to be fully available? I am not sure if I am overlooking some setting. I am trying to launch ZooKeeper followed by Kafka, one machine at a time. Error received: kafka.zookeeper.ZooKeeperClientTimeoutException
I have a 3 node cluster for Kafka and zookeeper. Both Kafka and zookeeper cluster use same 3 machines. When I start the Kafka service, it fails to start if all 3 zookeeper servers are not up. Why is Kafka waiting for zookeeper cluster to be fully available ? I am not sure if I am overlooking g some setting. I am trying to launch zookeeper followed by Kafka one machine at a time. Error recieved Kafka.zookeeper.ZooKeeperClientTimeoutException