Is a Kafka topic linked with ZooKeeper, and if ZooKeeper changes will the topic disappear? - apache-kafka

I was working with Kafka. I downloaded ZooKeeper, extracted it and started it.
Then I downloaded Kafka, extracted the zipped file and started Kafka. Everything was working well. I created a few topics and was able to send and receive messages. After that I stopped Kafka and ZooKeeper. Then I read that Kafka itself ships with ZooKeeper, so I started the ZooKeeper that is provided with Kafka. However, its data directory was different. I then started Kafka with the same configuration file and the same data directory location, but after starting Kafka I could not find the topics that I had created.
I just want to know: does this mean the metadata about the topics is maintained by ZooKeeper? I searched the Kafka documentation, but I could not find anything in detail.
https://kafka.apache.org/documentation/

Check this documentation provided by Confluent. According to it, Apache Kafka® uses ZooKeeper to store persistent cluster metadata, and ZooKeeper is a critical component of a Confluent Platform deployment. For example, if you lost the Kafka data in ZooKeeper, the mapping of replicas to brokers and the topic configurations would be lost as well, making your Kafka cluster no longer functional and potentially resulting in total data loss.
So the answer to your question is yes: the purpose of ZooKeeper is to store relevant metadata about the Kafka brokers, topics, etc.
Also, since you have just started working with Kafka and ZooKeeper, I would like to mention this: by default, Kafka stores its data in a temp location which gets deleted on system reboot, so you should change that as well.
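If you want to see this for yourself, you can browse the znodes that Kafka registers in ZooKeeper. A minimal sketch, assuming a local ZooKeeper on port 2181 and the scripts shipped with Kafka:
# Topics and broker registrations live under /brokers in ZooKeeper
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
If the ZooKeeper instance that Kafka connects to does not contain these znodes, Kafka behaves as if the topics never existed.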

The answer to your question is yes.
1) Initially you started the standalone ZooKeeper from its zip file and then stopped it. The topics you created were registered only in that standalone ZooKeeper instance, so once you stopped using it, the persistent cluster metadata Kafka relies on was effectively lost.
2) The second time, you started the ZooKeeper that comes packaged with Kafka. This new ZooKeeper instance has no information about the topics you created previously, so you need to create them again.
3) In case 1, if you just close the terminal and later start the standalone ZooKeeper again, you do not need to create the topics again; but if you stop using that standalone ZooKeeper server, the topics are effectively lost.
In simple terms: you created two separate ZooKeeper instances, and topics are not shared between them.
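One way to see that these are two independent instances is to compare their data directories. A sketch, assuming the default config files; the exact paths are assumptions and may both point at /tmp on a fresh install:
# Standalone ZooKeeper download: conf/zoo.cfg
dataDir=/var/lib/zookeeper-standalone
# ZooKeeper bundled with Kafka: config/zookeeper.properties
dataDir=/tmp/zookeeper
Topics created while Kafka was connected to one dataDir will not appear when it connects to the other.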

Related

Migrating Cloudera Kafka (CDK) to Apache Kafka

I am looking to migrate a small 4-node Kafka cluster with about 300GB of data on each broker to a new cluster. The problem is we are currently running Cloudera's flavor of Kafka (CDK) and we would like to run Apache Kafka. For the most part CDK is very similar to Apache Kafka, but I am trying to figure out the best way to migrate. I originally looked at using MirrorMaker, but to my understanding it will re-process messages once we cut over the consumers to the new cluster, so I think that is out. I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster (not sure how this will work yet, if at all), then decommission the CDK servers one at a time. Otherwise I am out of ideas other than spinning up a new Apache Kafka cluster and making code changes to every producer/consumer to point to the new cluster, which I am not really a fan of as it will cause downtime.
We are currently running CDK 3.1.0, which is equivalent to Apache Kafka 1.0.1.
MirrorMaker would copy the data, but not consumer offsets, so they'd be left at their configured auto.offset.reset policies.
I was wondering if we could spin up a new Apache Kafka cluster and add it to the CDK cluster
If possible, that would be the most effective way to migrate the cluster. For each new broker, give it a unique broker ID and the same Zookeeper connection string as the others, then it'll be part of the same cluster.
Then, you'll need to manually run the partition reassignment tool to move all existing topic partitions off the old brokers and onto the new ones, as the data will not be replicated automatically.
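As a rough sketch of what each new broker's server.properties would need (the broker ID, hostnames and paths below are placeholders for illustration; the existing CDK brokers' settings are the source of truth):
# server.properties on a new Apache Kafka broker joining the existing cluster
broker.id=10                                  # must be unique across old and new brokers
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181  # same ZooKeeper connection string the CDK brokers use
log.dirs=/var/lib/kafka/data                  # local data directory for this broker
Once a new broker registers in ZooKeeper it is part of the cluster, but it holds no partitions until you reassign them.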
Alternatively, you could try shutting down the CDK cluster, backing up the data directories onto new brokers, then starting the same version of Kafka from your CDK on those new machines (as the stored log format is important).
Also make sure that you back up a copy of the server.properties files for the new brokers.

How to add two more Kafka brokers on the local machine if my currently running Kafka broker already has data

I have one broker running on my local machine with Windows OS, which has 2-3 topics with messages stored. I want to scale up my machine by adding two more broker instances. I have followed all the steps to configure 3 brokers on the same machine by creating different properties files.
My broker=0 is getting shut down when I start the broker=1 server, with the error below.
[2019-07-11 13:56:33,580] INFO Stopping serving logs in dir C:\kafka_2.12-2.2.1\data\kafka (kafka.log.LogManager)
[2019-07-11 13:56:33,585] ERROR Shutdown broker because all log dirs in C:\kafka_2.12-2.2.1\data\kafka have failed (kafka.log.LogManager)
Is it possible to add more brokers if my existing broker instance already has data?
Or do I need to delete the data directory and start broker 0 fresh? Is there any way to preserve the data without deleting it from the Kafka server?
Yes, you can add brokers to your cluster and migrate/spread data across all your brokers.
The "Expanding your cluster" section in the documentation details the steps to achieve this.
After starting the new brokers, you basically need to use the bin/kafka-reassign-partitions.sh tool (third-party tools also exist) to move data onto them.
Please note, however, that adding brokers on the same machine does not provide much resiliency: if the machine goes down, all brokers are affected. But if you just want to play around and learn about Kafka, that may be fine.
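A rough sketch of that reassignment workflow, with the topic name, broker list and ZooKeeper address as placeholders (flag names are from the standard Kafka distribution; on Windows use the equivalent scripts under bin\windows):
# topics.json - the topics whose partitions should be spread over the new brokers
{"version": 1, "topics": [{"topic": "my-topic"}]}
# Generate a candidate assignment over brokers 0, 1 and 2
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics.json --broker-list "0,1,2" --generate
# Save the proposed assignment to reassign.json, then apply and verify it
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassign.json --execute
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file reassign.json --verify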
To run multiple brokers on the same physical machine, each broker's config must specify a unique broker.id, a different log.dirs, and a different port in listeners.
For example,
config/server{1,2,3}.properties
and in each config file set different values:
broker.id=<id>
log.dirs=/data/kafka<id>
listeners=PLAINTEXT://localhost:909<id>
When all three brokers are started, new topics will be spread evenly across the cluster, but the old ones need to be rebalanced (for example with the reassignment tool shown above).

Migrating topics, ACLs and messages from Apache Kafka to Confluent Platform

We are migrating our application from Apache Kafka to Confluent Platform.
Apache Kafka version: 1.1.0
Confluent Platform: 4.1.0
We have tried these options:
Manually copying the ZooKeeper logs and Kafka logs - not an optimal way because of the data volume and data-correctness concerns.
MirrorMaker - this will replicate newly created topics and ACLs, but it will not migrate the details already in Apache Kafka.
Please suggest better approaches for this.
You can keep your existing Kafka and Zookeeper installation.
Confluent does not change any way these run or manage data.
You can configure the REST Proxy, Schema Registry, Control Center, KSQL, etc. to use your existing bootstrap servers or ZooKeeper connection; nothing should need to be migrated, since you're only adding extra consumer/producer services which just happen to be provided by Confluent.
If you later plan on upgrading your brokers, then you can start up new ones from the Confluent package, migrate the partitions, then shut down the old ones. Do the same for ZooKeeper, but make sure that you have at least two nodes up during this process, and always have an odd number of them available after the transition.
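For example, pointing the Confluent components at the brokers you already run is just a matter of configuration. A sketch with placeholder hostnames; the property names below are the ones documented for Schema Registry and REST Proxy, but verify them against your Confluent version:
# etc/schema-registry/schema-registry.properties
kafkastore.bootstrap.servers=PLAINTEXT://existing-broker1:9092,PLAINTEXT://existing-broker2:9092
# etc/kafka-rest/kafka-rest.properties
bootstrap.servers=PLAINTEXT://existing-broker1:9092,PLAINTEXT://existing-broker2:9092
No data moves anywhere; the new services simply read from and write to the cluster you already have.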

Load data from separate kafka cluster to Samza?

I am trying to create a Samza job that resembles the Wikipedia example job as closely as I can make it. However, in the "WikipediaFeed" object I am trying to get data from a different Kafka broker than the one that is running when you start the Hello-Samza grid.
Do I have to create a thread-safe Kafka consumer inside the "WikipediaFeed" object to consume data from a different Kafka cluster, or is there another way I'm not seeing?
Edit 1:
Here is a link to their Wikipedia example.
https://github.com/apache/samza-hello-samza/tree/master/src/main
Thanks
In your example you need to change this config (https://github.com/apache/samza-hello-samza/blob/master/src/main/config/wikipedia-feed.properties):
systems.kafka.consumer.zookeeper.connect=KAFKA_CLUSTER_FRONTING:2181
systems.kafka.producer.bootstrap.servers=KAFKA_CLUSTER_FRONTING:9092
task.inputs=kafka.topic1,kafka.topic2,kafka.topic3
Point this config at your fronting Kafka cluster, and add your topics to task.inputs separated by ",".
Edit:
Just to be clear, you can deploy your Samza job into cluster 1 and consume a Kafka topic from another cluster; you only need to change the config in your Samza properties.
For more information, see: Samza config
Then, if you need to send your messages to yet another Kafka cluster after processing, you will need to create another system in your config, as sketched below.
For more information: https://samza.apache.org/learn/documentation/0.13/api/overview.html

How to save a kafka topic at shutdown

I'm configuring my first Kafka network. I can't seem to find any support for saving a configured topic. I know I can create a topic following the quickstart guide here, but how do I save it? I thought I could add the topic info to a .properties file inside the config dir, but I don't see any support for that.
If I shut down my machine, my topic is deleted. How do I save the configuration?
Could the topic be deleted because you are using the default broker config? With the default config, Kafka logs are stored under the /tmp folder, and this folder gets wiped out during a machine reboot. You could change the broker config and pick another location for the Kafka logs.
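A minimal sketch of that change, assuming the config files shipped with the quickstart; the target paths are just examples, so pick any directory that survives reboots:
# config/server.properties - move Kafka's log/data directory off /tmp
log.dirs=/var/lib/kafka-logs
# config/zookeeper.properties - move ZooKeeper's data directory off /tmp as well
dataDir=/var/lib/zookeeper
After a reboot, bin/kafka-topics.sh --list (with --bootstrap-server on newer versions, or --zookeeper on older ones) should still show your topics.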