Wiping ALL Kafka data? - apache-kafka

Running Kafka on Windows 10 x64. I stopped Zookeeper and Kafka, deleted the logs folder in both, and deleted my kafka-streams folder. But still, when I start Kafka up, I get a bunch of messages like:
[2020-04-02 10:32:46,717] INFO [Partition xxx-stream-e1d97106-95ab-4f64-a692-ebfe73382e4c-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition-0 broker=0] No checkpointed highwatermark is found for partition xxx-stream-e1d97106-95ab-4f64-a
along with similar messages about consumer offsets.
Where is it storing this stuff? I thought it was all in the logs directory?
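In case it helps, these are the config keys that control where each piece lives; the values shown are the stock defaults from the shipped config files (the Streams one is the library default), so your actual paths may differ:
# server.properties: broker topic data, including the internal __consumer_offsets topic
log.dirs=/tmp/kafka-logs
# zookeeper.properties: Zookeeper's own data (broker/topic metadata)
dataDir=/tmp/zookeeper
# Kafka Streams application config: local state stores, a kafka-streams folder under the temp dir by default
state.dir=/tmp/kafka-streams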

Related

How to handle source topic deletion in Mirror Maker 2?

When I delete a topic in the source cluster that MM2 is replicating, MM2 starts throwing the error below continuously. While the error itself is expected, it never stops and ends up producing huge log files on my system. Is there a way to have MM2 handle source topic deletion gracefully?
[2022-05-12 14:42:57,473] WARN [Consumer clientId=consumer-4, groupId=null] Received unknown topic or partition error in fetch for partition sourcetopic-0 (org.apache.kafka.clients.consumer.internals.Fetcher:1250)
PS: I am running MM2 in dedicated cluster mode
You would need to write some API that updates the MirrorMaker configuration to drop the deleted topic once it is gone, and then restarts the consumer (the restart happens automatically when you POST an update to the Connect API).
There's no way for it to know whether a topic should still be consumed or not, as it has a static configuration.
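For illustration only, if the MirrorSourceConnector were running on a regular Connect cluster with its REST API exposed (in dedicated MM2 mode the config comes from the mm2 properties file, so as far as I know this doesn't apply directly), the update could be a PUT of the full config with the deleted topic dropped from the topics list. The connector name, hosts and topics below are made up:
curl -X PUT -H "Content-Type: application/json" \
  http://localhost:8083/connectors/my-mm2-source/config \
  -d '{
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "source",
    "target.cluster.alias": "target",
    "source.cluster.bootstrap.servers": "source-kafka:9092",
    "target.cluster.bootstrap.servers": "target-kafka:9092",
    "topics": "remaining-topic-1,remaining-topic-2"
  }'
Connect then restarts the connector's tasks with the new config, so the underlying consumer stops fetching the deleted topic.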

Kafka brokers shut down because log dirs have failed

I have a 3-broker Kafka cluster with the Kafka logs in the /tmp directory. I am running the Debezium source connector for MongoDB, which polls data from 4 collections.
However, within 5 minutes of starting the connector, the Kafka brokers shut down with the following error:
[2020-04-16 18:25:08,642] ERROR Shutdown broker because all log dirs in /tmp/kafka-logs-1 have failed (kafka.log.LogManager)
I have tried the various suggestions, viz. deleting the Kafka logs and cleaning out the Zookeeper logs, but I ran into the same problem again.
I have also noticed that the Kafka logs occupy 100% of the /tmp directory when this happens, so I also changed the log retention policy based on size:
log.retention.hours=168
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=10000
This also turned out to be futile.
I would like to have some assistance regarding this. Thanks in advance!
Your log files probably got corrupted because you ran out of storage.
I would suggest changing log.dirs in server.properties. Also make sure that you don't use the /tmp location, as it will be purged once your machine shuts down. Once you have changed log.dirs you can restart Kafka.
Note that the older messages will be lost.
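For example (the path is just an illustration; any persistent disk with enough space will do):
# server.properties
log.dirs=/var/lib/kafka/logs
For the same reason you may also want to move Zookeeper's dataDir off /tmp.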

Kafka Offsets.storage

What is the "offsets.storage" for kafka 0.10.1.1?
As per the documentation it shows up under Old Consumer Configs as "zookeeper".
offsets.storage (default: zookeeper): Select where offsets should be stored (zookeeper or kafka).
My consumer is a spring-boot 1.5.13.RELEASE app which uses kafka-clients 0.10.1.1 internally. As per the source code in ConsumerConfig.scala, offsetStorage is "zookeeper", but when I run the consumer, I see "__consumer_offsets" getting created under the /tmp/kafka-logs directory defined in server.properties [i.e. on the broker].
Moreover, it doesn't show up under the Zookeeper ephemeral nodes when I check with zookeeper-shell.sh:
ls /consumers
[]
If offsets.storage is zookeeper, then why does __consumer_offsets show up under /tmp/kafka-logs and not in the Zookeeper ephemeral nodes?
Spring Kafka uses the "new" (Java) consumer, not the old Scala consumer.
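The new Java consumer always commits offsets to the internal __consumer_offsets topic on the brokers; offsets.storage only ever applied to the old Scala consumer. That is why the topic shows up under log.dirs on the broker while nothing appears under /consumers in Zookeeper. You can inspect those broker-side offsets with the consumer-groups tool (group id below is a placeholder; on 0.10.x the --new-consumer flag is still required, later versions drop it):
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server localhost:9092 --describe --group your-group-id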

Checking Offset of Kafka topic for a storm consumer

I am using storm-kafka-client 1.2.1 and creating my spout config for KafkaTridentSpoutOpaque as below
kafkaSpoutConfig = KafkaSpoutConfig.builder(brokerURL, kafkaTopic)
        .setProp(ConsumerConfig.GROUP_ID_CONFIG, "storm-kafka-group")
        .setProcessingGuarantee(ProcessingGuarantee.AT_MOST_ONCE)
        .setProp(ConsumerConfig.CLIENT_ID_CONFIG, InetAddress.getLocalHost().getHostName())
        .build();
I am unable to find either my group-id or the offsets in Kafka or Zookeeper. In Zookeeper I tried zkCli.sh and ran ls /consumers, but there was nothing there; I think Kafka itself now maintains the offsets rather than Zookeeper.
I also tried Kafka, with the command below:
bin/kafka-run-class.sh kafka.admin.ConsumerGroupCommand --list --bootstrap-server localhost:9092
Note: This will not show information about old Zookeeper-based consumers.
console-consumer-20130
console-consumer-82696
console-consumer-6106
console-consumer-67393
console-consumer-14333
console-consumer-21174
console-consumer-64550
Can someone help me find my offsets, and will my events in Kafka be replayed again if I restart the topology?
Trident doesn't store offsets in Kafka, but in Storm's Zookeeper. If you're running with default settings for Storm's Zookeeper config the path in Storm's Zookeeper will be something like /coordinator/<your-topology-id>/meta.
The objects below that path will contain the first and last offset, as well as topic partition for each batch. So e.g. /coordinator/<your-topology-id>/meta/15 would contain the first and last offset emitted in batch number 15.
Whether the spout replays offsets after restart is controlled by the FirstPollOffsetStrategy you set in the KafkaSpoutConfig. The default is UNCOMMITTED_EARLIEST, which does not start over on restart. See the Javadoc at https://github.com/apache/storm/blob/v1.2.1/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L126.
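If you do want a redeploy to re-read the topic from the beginning, you could set the strategy explicitly on the builder. A minimal sketch based on the question's config (in 1.2.x the enum is nested inside KafkaSpoutConfig):
kafkaSpoutConfig = KafkaSpoutConfig.builder(brokerURL, kafkaTopic)
        .setProp(ConsumerConfig.GROUP_ID_CONFIG, "storm-kafka-group")
        // EARLIEST ignores previously stored offsets and starts from the first available offset on (re)deploy;
        // the default UNCOMMITTED_EARLIEST resumes from the offsets recorded in Storm's Zookeeper
        .setFirstPollOffsetStrategy(KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST)
        .build();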

Recovering Kafka Data from .log Files

I have a 1-node Kafka cluster that crashed recently. I was able to salvage the .log and .index files from /tmp/kafka-logs/mytopic-0/, and I have moved these files to a different server and installed Kafka on it.
Is there a way to have the new kafka server serve the data contained in these .log files?
Update:
I probably didn't do this the right way, but here is what I've tried:
created a topic named recovermytopic on the new kafka server
stopped kafka
moved all the .log files into /tmp/kafka-logs/recovermytopic-0
restarted kafka
It appeared that for each .log file, Kafka generated a .index file, which looked promising, but after the index files were created I saw the messages below:
WARN Partition [recovermytopic,0] on broker 0: No checkpointed highwatermark is found for partition [recovermytopic,0] (kafka.cluster.Partition)
INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [recovermytopic,0] (kafka.server.ReplicaFetcherManager)
When I try to check the topic using kafka-console-consumer, the kafka server says:
INFO Closing socket connection to /127.0.0.1. (kafka.network.Processor)
and no messages get consumed.
Kafka comes packaged with a DumpLogSegments tool that will extract messages (along with offsets, etc.) from Kafka data log files:
$KAFKA_HOME/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files mytopic-0/00000000000000132285.log > 00000000000000132285_messages.out
The output will vary a bit depending on which version of Kafka you're using, but it should be easy to extract the message keys and values with the use of sed or some other tool. The messages can then be replayed into your Kafka cluster using the kafka-console-producer.sh tool, or programmatically.
While this method is a bit roundabout, I think it's more transparent/reliable than trying to get a broker to start with data log files obtained from somewhere else. I've tested the DumpLogSegments tool with various versions of Kafka from 0.9 all the way up to 2.4.
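For example, a rough replay pipeline might look like the following. It assumes each dump line ends in "payload: <value>" (as --print-data-log produces) and that you only care about values, not keys; check your dump output first, since the format varies by version:
sed -n 's/.*payload: //p' 00000000000000132285_messages.out | \
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic recovermytopic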