I am a Kafka beginner trying to use log compaction (COMPACT mode).
From the Kafka documentation, I understood that setting log.cleaner.enable=true and log.cleanup.policy=compact enables compaction, but it does not seem to work.
Could you please help me? Thanks.
Environment
Ubuntu 14.04.3 LTS
Kafka kafka_2.11-0.8.2.2
Kafka config
log.retention.ms=10000
log.retention.check.interval.ms=1000
log.cleaner.enable=true
log.cleaner.delete.retention.ms=1000
log.cleaner.backoff.ms=1000
log.cleaner.min.cleanable.ratio=0.01
log.cleanup.policy=compact
In the Kafka log I see 'INFO [kafka-log-cleaner-thread-0], Starting (kafka.log.LogCleaner)', so I think COMPACT mode was enabled successfully.
I followed the Quick Start in the Kafka documentation and produced some data to the test topic.
root@50d8fe84c573:~/kafka_2.11-0.8.2.2# date
Sun Jan 10 14:23:26 UTC 2016
root@50d8fe84c573:~/kafka_2.11-0.8.2.2# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
[2016-01-10 14:23:38,312] WARN Property topic is not valid (kafka.utils.VerifiableProperties)
22:23:45
22:24:05
22:25:00
root@50d8fe84c573:~/kafka_2.11-0.8.2.2# date
Sun Jan 10 14:25:21 UTC 2016
root@50d8fe84c573:~/kafka_2.11-0.8.2.2# ./bin/kafka-console-consumer.sh --zookeeper localhost --topic test --from-beginning --property print.key=true
null 22:23:45
null 22:24:05
null 22:25:00
root@50d8fe84c573:~/kafka_2.11-0.8.2.2# date
Sun Jan 10 14:30:06 UTC 2016
root@50d8fe84c573:~/kafka_2.11-0.8.2.2# ./bin/kafka-console-consumer.sh --zookeeper localhost --topic test --from-beginning --property print.key=true
null 22:23:45
null 22:24:05
null 22:25:00
I expected only the latest message per key to be kept, so the 22:23:45 and 22:24:05 messages would be removed.
Thanks to morganw's suggestion, I have now pushed messages that have keys.
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# ./bin/kafka-topics.sh --zookeeper localhost --topic test --delete
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# date
Wed Jan 13 01:21:59 UTC 2016
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# bpython3
>>> from kafka import (
...     KafkaClient, KeyedProducer,
...     Murmur2Partitioner, RoundRobinPartitioner)
>>> kafka = KafkaClient('localhost:9092')
>>> producer = KeyedProducer(kafka)
>>> producer.send_messages(b'test', b'key', b'09:22:08')
[ProduceResponse(topic=b'test', partition=0, error=0, offset=0)]
>>> producer.send_messages(b'test', b'key', b'09:22:28')
[ProduceResponse(topic=b'test', partition=0, error=0, offset=1)]
>>> producer.send_messages(b'test', b'key', b'09:23:00')
[ProduceResponse(topic=b'test', partition=0, error=0, offset=2)]
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# date
Wed Jan 13 01:23:05 UTC 2016
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# ./bin/kafka-console-consumer.sh --zookeeper localhost --topic test --from-beginning --property print.key=true
key 09:22:08
key 09:22:28
key 09:23:00
^CConsumed 3 messages
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# date
Wed Jan 13 01:27:18 UTC 2016
root@9d8bdae48429:~/kafka_2.11-0.8.2.2# ./bin/kafka-console-consumer.sh --zookeeper localhost --topic test --from-beginning --property print.key=true
key 09:22:08
key 09:22:28
key 09:23:00
But as the result shows, the test topic still keeps ALL messages.
I don't know what I am missing.
Could you please help me? Thanks. :-)
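(For anyone reproducing this today: the KeyedProducer API used above was removed from kafka-python long ago. Below is a minimal equivalent sketch using the current KafkaProducer API; it assumes kafka-python >= 1.0 and a broker on localhost:9092.)

from kafka import KafkaProducer

# Assumes kafka-python >= 1.0; messages with the same key land in the same partition.
producer = KafkaProducer(bootstrap_servers='localhost:9092')

# Produce three keyed messages; after compaction, only the last value per key should survive.
for value in (b'09:22:08', b'09:22:28', b'09:23:00'):
    producer.send('test', key=b'key', value=value)

producer.flush()  # make sure everything is actually written before exiting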
Related
Does anyone know how to fix the error when creating a topic in Kafka?
C:\kafka\bin\windows>kafka-topics.bat --create --bootstrap-server localhost:2181 --replication-factor 1 --partition 1 --topic test
Exception in thread "main" joptsimple.UnrecognizedOptionException: partition is not a recognized option
at joptsimple.OptionException.unrecognizedOption(OptionException.java:108)
at joptsimple.OptionParser.handleLongOptionToken(OptionParser.java:510)
at joptsimple.OptionParserState$2.handleArgument(OptionParserState.java:56)
at joptsimple.OptionParser.parse(OptionParser.java:396)
at kafka.admin.TopicCommand$TopicCommandOptions.<init>(TopicCommand.scala:567)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:47)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
The parameter is --partitions (plural).
Also, the bootstrap server normally (by default) listens on port 9092, not on Zookeeper's port 2181:
C:\kafka\bin\windows>kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
In recent versions, you don't have to create topics through Zookeeper. You can create topics directly against Kafka's bootstrap servers. In a later version the Kafka team plans to remove Zookeeper altogether, so current versions are preparing for that.
Use the command below to create a new topic. I suggest adding the following parameters as well to control the topic behaviour appropriately (note that each setting needs its own --config flag):
kafka-topics.bat --create --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1 --config retention.ms=604800000 --config segment.bytes=26214400 --config retention.bytes=1048576000 --config min.insync.replicas=1 --topic test
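The same can be done programmatically; here is a minimal sketch using kafka-python's admin client (class names assume kafka-python >= 1.4, and the topic settings simply mirror the --config flags above):

from kafka.admin import KafkaAdminClient, NewTopic

# Assumes kafka-python >= 1.4 and a broker on localhost:9092.
admin = KafkaAdminClient(bootstrap_servers='localhost:9092')

# Topic-level configs are passed as strings, mirroring the --config flags.
topic = NewTopic(
    name='test',
    num_partitions=1,
    replication_factor=1,
    topic_configs={
        'retention.ms': '604800000',
        'segment.bytes': '26214400',
        'retention.bytes': '1048576000',
        'min.insync.replicas': '1',
    },
)
admin.create_topics([topic])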
I am using dockerised wurstmeister/kafka-docker. I created a topic using
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 27 --topic raw-sensor-data --config retention.ms=86400000
After a few days, I tried changing the retention period with:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name raw-sensor-data --add-config retention.ms=3600000
I also tried
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic raw-sensor-data --config retention.ms=3600000
and
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic raw-sensor-data --config cleanup.policy=delete
This is also reflected in the topic describe details:
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topics-with-overrides
Topic: raw-sensor-data PartitionCount: 27 ReplicationFactor: 1 Configs: cleanup.policy=delete,retention.ms=3600000
But I can still see the old data; it is not getting deleted after 1 hour.
In server.properties I have
log.retention.check.interval.ms=300000
Only closed log segments will be deleted; the active segment is never eligible. The default segment size is 1GB.
So, if the topic holds less data than that, the data stays in the active segment and will remain, regardless of how much time has passed.
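One way to make retention kick in on such a low-volume topic is to roll segments sooner by lowering segment.ms (or segment.bytes) on the topic. A sketch using kafka-python's admin client (assuming kafka-python >= 1.4; note that this legacy AlterConfigs call replaces the topic's whole override set, unlike kafka-configs.sh --add-config):

from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

# Assumes kafka-python >= 1.4 and a broker on localhost:9092.
admin = KafkaAdminClient(bootstrap_servers='localhost:9092')

# Roll a new segment every 10 minutes so closed segments become
# eligible for deletion once retention.ms has elapsed.
resource = ConfigResource(
    ConfigResourceType.TOPIC,
    'raw-sensor-data',
    configs={'segment.ms': '600000', 'retention.ms': '3600000'},
)
admin.alter_configs([resource])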
We are trying to capture the retention.bytes value for the topic topic_test.
We tried the following example, but it seems this isn't the right Zookeeper path:
zookeeper-shell kafka1:2181,kafka2:2181,kafka3:2181 <<< "ls /brokers/topics/topic_test/partitions/88/state"
Connecting to kafka1:2181,kafka2:2181,kafka3:2181
Welcome to ZooKeeper!
JLine support is disabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[]
Any idea where the per-topic retention bytes values can be captured from Zookeeper?
I did the following, but I do not see the retention bytes (what is wrong here?). We have Confluent Kafka version 0.1:
zookeeper-shell confluent1:2181 get /config/topics/test_topic
Connecting to kafka1:2181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
{"version":1,"config":{}}
cZxid = 0xb30a00000038
ctime = Mon Jun 29 11:42:30 GMT 2020
mZxid = 0xb30a00000038
mtime = Mon Jun 29 11:42:30 GMT 2020
pZxid = 0xb30a00000038
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 25
numChildren = 0
Configurations are stored in Zookeeper under the /config path.
For example, for the topic topic_test:
# Create topic
./bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
--topic topic_test --partitions 1 --replication-factor 1 --config retention.bytes=12345
# Retrieve configs from Zookeeper
./bin/zookeeper-shell.sh localhost get /config/topics/topic_test
Connecting to localhost
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
{"version":1,"config":{"retention.bytes":"12345"}}
Note that in most cases you should not rely on direct access to Zookeeper; instead, use the Kafka API to retrieve these values.
Using:
kafka-topics.sh:
./bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic topic_test
kafka-configs.sh:
./bin/kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics --entity-name topic_test
The Admin API using describeConfigs()
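As a minimal sketch of the Admin API route using kafka-python (method and class names assume kafka-python >= 1.4; the Java AdminClient equivalent is describeConfigs()):

from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

# Assumes kafka-python >= 1.4 and a broker on localhost:9092.
admin = KafkaAdminClient(bootstrap_servers='localhost:9092')

# Ask the broker (not Zookeeper) for the topic's configuration;
# the response includes every config entry, among them retention.bytes.
resource = ConfigResource(ConfigResourceType.TOPIC, 'topic_test')
response = admin.describe_configs([resource])
print(response)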
Steps followed:
cd /opt/kafka_2.11-0.9.0.0
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic-test
bin/kafka-topics.sh --list --zookeeper localhost:2181
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test
Time of occurrence: the moment you write anything in the producer shell, this error starts coming up.
Already tried: deleting topics from the Zookeeper shell and removing the topic logs from the tmp location.
[2018-10-25 10:03:17,919] INFO [Kafka Server 0], started (kafka.server.KafkaServer)
[2018-10-25 10:03:18,080] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [topic-test,0] (kafka.server.ReplicaFetcherManager)
[2018-10-25 10:03:18,099] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [topic-test,0] (kafka.server.ReplicaFetcherManager)
[2018-10-25 10:03:48,864] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.ArrayIndexOutOfBoundsException: 18
at org.apache.kafka.common.protocol.ApiKeys.forId(ApiKeys.java:68)
at org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:39)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:79)
at kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:426)
at kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:421)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.run(SocketServer.scala:421)
at java.lang.Thread.run(Thread.java:748)
It would be very helpful if someone could provide deeper insight into troubleshooting errors like this in the future.
I downloaded the latest kafka_2.11-2.0.0 and followed these steps:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic-test
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-test
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-test --from-beginning
Things work fine with matching versions. (The ArrayIndexOutOfBoundsException: 18 arises because a newer client sends an API key that the old 0.9 broker does not recognize.)
Please note that --bootstrap-server localhost:9092 has replaced the Zookeeper address in the consumer script.
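For reference, the programmatic equivalent of the console consumer, as a kafka-python sketch (assuming kafka-python >= 1.0 and the topic name used above):

from kafka import KafkaConsumer

# Assumes kafka-python >= 1.0; reads topic-test from the beginning,
# like --from-beginning on the console consumer.
consumer = KafkaConsumer(
    'topic-test',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
)
for message in consumer:
    print(message.key, message.value)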
I'm basically doing the kafka quickstart using kafka_2.11-2.0.0 which comes with zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT.
I also use Ubuntu:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial
I do these in the order below:
start zookeeper
start kafka server0 with port 9092
start kafka server1 with port 9093
start kafka server2 with port 9094
create a topic: kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
produce some messages: kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic
consume from server0: kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic
consume from server1: kafka-console-consumer.sh --bootstrap-server localhost:9093 --from-beginning --topic my-replicated-topic
check the leader for my-replicated-topic and find it to be server0 -> here's the tricky part: one may have to kill server1 and/or server2 (but never server0) and then restore them, just to get server0 to be the leader of my-replicated-topic
kill server0
check for the new leader (happens to be server2): kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
produce some messages to server2: kafka-console-producer.sh --broker-list localhost:9094 --topic my-replicated-topic
consume from server2 (or server1): kafka-console-consumer.sh --bootstrap-server localhost:9094 --from-beginning --topic my-replicated-topic -> this will hang until server0 is restarted
start server0 again
the consumer from server2 now outputs all messages, including the ones sent to server2 while it was the leader
What is wrong, and how would one solve the problem so that it does not matter which server becomes the leader?
I think this is due to the replication factor of the "__consumer_offsets" topic, which is set to one in the server.properties file for testing purposes.
Set offsets.topic.replication.factor=3 for high availability.
(Copied from a KAFKA-7526 comment.)
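To check whether this is the issue on a running cluster, one can inspect the internal topic's replica assignment; a sketch with kafka-python (assuming kafka-python >= 1.4; the metadata field names are as returned by its describe_topics and may differ across versions):

from kafka.admin import KafkaAdminClient

# Assumes kafka-python >= 1.4 and a broker on localhost:9092.
admin = KafkaAdminClient(bootstrap_servers='localhost:9092')

# If each partition of __consumer_offsets lists only one replica, consumer
# group coordination breaks whenever that single broker goes down.
for topic in admin.describe_topics(['__consumer_offsets']):
    for partition in topic['partitions']:
        print(partition['partition'], 'replicas:', partition['replicas'])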