mapR Kafka cannot start second time round - apache-kafka

To date I have either used an existing professional Hadoop installation with the components already running, or installed Kafka myself and used the bundled Zookeeper in a native VM.
I am trying to get the mapR Community Edition Sandbox to run now.
There is a Kafka library on mapR, but there is no Kafka process shown when using jps, which seems odd. I managed to get Kafka to start once.
There is a Zookeeper service on mapR but it uses port 5181, not 2181.
Kafka uses port 9092.
The log.dirs for Kafka was set to /tmp/kafka-logs; I changed that to /opt/kafka-logs.
The dataDir was also set to /tmp/zookeeper; I changed that to /opt/zookeeper.
I also changed the Zookeeper port to 5181 as that is what mapR uses.
It ran once, but since I restarted it I keep getting this type of error:
java.io.FileNotFoundException: /tmp/kafka-logs/.lock (Permission denied)
I think I have done chmod 777 where required, and I changed the paths from /tmp to /opt/... So why is it picking up /tmp again?
I have the impression that it keeps pointing to /tmp regardless of the updates to the configuration.
I also see a warning - although I do not think this is an issue:
[2019-01-14 13:26:46,355] WARN No meta.properties file under dir /tmp/kafka-logs/meta.properties (kafka.server.BrokerMetadataCheckpoint)
Maybe, because of mapR Streams, I cannot configure it to run natively?

OK, I could delete the question as I solved it, but for those on mapR I deduced:
You need to change the port from 2181 to 5181 in server.properties immediately, since in this case we integrate with an existing Zookeeper instance.
Likewise, change the log.dirs for Kafka from /tmp/kafka-logs to /opt/kafka-logs as soon as possible.
Likewise, change the dataDir from /tmp/zookeeper to /opt/zookeeper as soon as possible.
Trying to fix this later leads to all sorts of issues. I ended up just re-installing and doing it right from scratch.
mapR has a faster version called mapR Streams which implements Kafka. I did not want to use that for what I wanted to do, but the mapR Sandbox has a lot of up-to-date items straight out of the box, certainly compared to Cloudera.
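For anyone who wants to script the edits described above before the first start, here is a minimal Python sketch. The helper name set_properties, the sample file content and the localhost host name are assumptions made for illustration; only the port 5181 and the /opt/kafka-logs path come from the answer.

```python
def set_properties(text, updates):
    """Return `text` with every key in `updates` set to its new value.

    Matching `key=value` lines are rewritten in place, comments and blank
    lines are kept, and keys that do not occur yet are appended at the end.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in updates:
                line = f"{key}={updates[key]}"
                seen.add(key)
        out.append(line)
    for key, value in updates.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Illustrative content of a default server.properties (host name assumed).
server_properties = "zookeeper.connect=localhost:2181\nlog.dirs=/tmp/kafka-logs\n"

print(set_properties(server_properties, {
    "zookeeper.connect": "localhost:5181",  # port used by the mapR Zookeeper
    "log.dirs": "/opt/kafka-logs",
}))
```

The same helper could be applied to the dataDir entry in the Zookeeper properties file. It only transforms text; reading and writing the actual files is left out on purpose.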

Related

Kafka: ERROR Shutdown broker because all log dirs in E:\kafka\data\kafka have failed (kafka.log.LogManager)

I searched for solutions to this problem, but there are only temporary fixes, like changing the log dirs location in server.properties. The error creeps up on you after some days.
My application runs continuously, so are there any permanent fixes for this?
Existing bug reports about this issue, open since 2018, are not being addressed by Apache Kafka because the product is not officially supported on Windows. Only Linux etc. are supported.
According to users in this bug report, the problem is caused by the way Windows handles file deletion: the OS does not allow deleting files that are still open for writing, which is not a problem on Linux.
This causes a crash during log cleanup (by default, logs are retained for 168 hours, i.e. 7 days).
A fix was made by prehistoric-penguin and provided in pull request #6403, but it was never accepted by Apache Kafka, for the reason mentioned above.
Possible solutions/workarounds:
Run the Kafka cluster on a Linux machine.
Use prehistoric-penguin's fork to update your Kafka to a version that includes the fix (not tried myself).
Change the retention settings so that the cleanup occurs less often.
Automatically stop Kafka at the weekend after stopping all clients, delete the Kafka and Zookeeper logs, and build it all up again.
The retention setting is in \config\server.properties: log.retention.hours=168
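To illustrate why a longer retention makes the cleanup, and therefore the crash, occur less often, here is a simplified Python model of time-based retention. It is a sketch under assumptions, not Kafka's implementation: the function name expired_segments and the sample segment names and timestamps are made up, and a real broker applies further rules (for example, it does not delete the active segment).

```python
from datetime import datetime, timedelta


def expired_segments(segments, now, retention_hours=168):
    """Return the names of segments whose newest record is past retention.

    `segments` maps a segment file name to the timestamp of its newest
    record (a simplification of what the broker tracks).
    """
    cutoff = now - timedelta(hours=retention_hours)
    return sorted(name for name, newest in segments.items() if newest < cutoff)


now = datetime(2019, 1, 14, 12, 0)
segments = {
    "00000000000000000000.log": now - timedelta(days=8),
    "00000000000000001000.log": now - timedelta(days=2),
}

# With the default of 168 hours, the 8-day-old segment is due for deletion.
print(expired_segments(segments, now))
# With a 30-day retention, nothing is deleted yet.
print(expired_segments(segments, now, retention_hours=24 * 30))
```

A longer retention only postpones the deletion and increases disk usage in the meantime, so it is a workaround rather than a fix.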

Kafka not starting on windows

I have installed Zookeeper and Kafka separately and have started Zookeeper successfully. When I try to start Kafka on Windows using the command,
C:\kafka_2.12-2.3.0\bin\windows>kafka-server-start.bat ../../config/server.properties
I keep getting,
\Novosoft\C2J\Bin\c2jruntime.zip was unexpected at this time.
Not sure what's causing this.
My environment variables had CLASSPATH set to C:\Program Files (x86)\Novosoft\C2J\Bin\c2jruntime.zip.
Maybe Kafka did not like it. I removed it and it worked.
The Kafka server starts now.
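A likely explanation, which is an assumption and not confirmed in this thread, is that the Windows batch scripts expand %CLASSPATH% inside a parenthesized block, so the ")" in "Program Files (x86)" closes the block early and the rest of the path is reported as unexpected. A small Python sketch to spot such entries before starting Kafka (the function name risky_classpath_entries and the second sample entry are hypothetical):

```python
import re


def risky_classpath_entries(classpath, separator=";"):
    """Return CLASSPATH entries that contain spaces or parentheses.

    Such characters are the usual suspects when a .bat script fails with
    "... was unexpected at this time."
    """
    entries = [entry for entry in classpath.split(separator) if entry]
    return [entry for entry in entries if re.search(r"[ ()]", entry)]


# The first entry is the value from the question; the second is made up.
sample = r"C:\Program Files (x86)\Novosoft\C2J\Bin\c2jruntime.zip;C:\libs\a.jar"
for entry in risky_classpath_entries(sample):
    print("suspicious CLASSPATH entry:", entry)
```

On the affected machine the input would be the real variable, for example os.environ.get("CLASSPATH", "").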

After reboot KAFKA topic appears to be lost

Having installed KAFKA and having looked at these posts:
kafka loses all topics on reboot
Kafka topic no longer exists after restart
and thus having moved kafka-logs to an /opt... location, I still note that when I reboot:
I can re-create the topic again.
the kafka-logs directory contains information on topics, offsets etc. but it gets corrupted.
I am wondering how to rectify this.
Testing of new topics prior to reboot works fine.
There can be two potential problems:
If Kafka is running in Docker, then restarting the container cleans up the previous state and creates a new cluster (unless the data directories are on a persistent volume), hence all topics are lost.
Check the log.dir and the Zookeeper data path. If either is set to a /tmp directory, it will be cleaned on each reboot, so you will lose all logs and topics.
In this VM I noted that the Zookeeper log was defined under /tmp. I changed that to /opt (I presume it should be /var though) and Kafka data was no longer cleared when the instance terminated. Not sure how to explain this completely.

How to save a kafka topic at shutdown

I'm configuring my first kafka network. I can't seem to find any support on saving a configured topic. I know I can configure a topic from the quickstart guide here, but how do I save it? I thought I could add the topic info to a .properties file inside the config dir, but I don't see any support for that.
If I shut down my machine, my topic is deleted. How do I save the configuration?
Could the topic be deleted because you are using the default broker config? With the default config, Kafka logs are stored under the /tmp folder. This folder gets wiped out during a machine reboot. You could change the broker config and pick another location for Kafka logs.

How to create a durable topic in kafka

I am new to Kafka and am still learning the basics. I want to create a durable topic which is preserved even after the Zookeeper and/or Kafka server shuts down.
What I notice is this: I have a Zookeeper and a Kafka server running on my local MacBook. When I shut down the Zookeeper server and bring it up again quickly, I can see the previously created topics. But if I restart the system and then restart the Zookeeper server, I don't see the topic that I had created earlier.
I am running kafka_2.9.2-0.8.1.1 on my local system.
It happens because /tmp is getting cleaned after reboot resulting in loss of your data.
To fix this modify your Zookeeper dataDir property (in config/zookeeper.properties) and Kafka log.dirs (in config/server.properties) to be somewhere NOT in /tmp.
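The check described above can be automated with a short Python sketch that flags dataDir, log.dirs or log.dir values pointing under /tmp. The function name settings_under_tmp and the sample contents are assumptions for illustration; the keys and file names are the ones mentioned in the answer.

```python
def settings_under_tmp(text, keys=("dataDir", "log.dirs", "log.dir")):
    """Return {key: value} for watched keys whose value has a path under /tmp.

    `text` is the content of a .properties file; comment lines are ignored
    and comma-separated path lists are checked entry by entry.
    """
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in keys:
            paths = [path.strip() for path in value.split(",")]
            if any(p == "/tmp" or p.startswith("/tmp/") for p in paths):
                found[key] = value
    return found


# Illustrative file contents resembling the shipped defaults.
zookeeper_properties = "dataDir=/tmp/zookeeper\nclientPort=2181\n"
server_properties = "broker.id=0\nlog.dirs=/tmp/kafka-logs\n"

print(settings_under_tmp(zookeeper_properties))
print(settings_under_tmp(server_properties))
```

An empty result for both config/zookeeper.properties and config/server.properties suggests that neither data directory is at risk of being wiped on reboot.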