__consumer_offsets is unable to sync - apache-kafka

I am using MM2 with the properties below.
The source (A) and sink (B) clusters each have their own separate ZooKeeper.
I consumed some data from topic test in source cluster A, then stopped the consumer and started the mirroring process.
When I pointed a consumer with the same group id at the sink, it started consuming from the beginning. I expected it to start in the sink from where it left off in the source.
###############
A.bootstrap.servers = localhost:9092
B.bootstrap.servers = localhost:9093
A->B.enabled = true
A->B.topics = test
#B->A.enabled = true
#B->A.topics = .*
checkpoints.topic.replication.factor=1
heartbeats.topic.replication.factor=1
offset-syncs.topic.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
config.storage.replication.factor=1

Since Kafka 2.7, MirrorMaker can automatically mirror consumer group offsets by setting sync.group.offsets.enabled=true.
In your example:
A->B.sync.group.offsets.enabled=true
Before 2.7, MirrorMaker does not automatically commit consumer group offsets on the target cluster, and you need to use RemoteClusterUtils to do the offset translation yourself, as sketched below.
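For the pre-2.7 route, here is a minimal sketch of the translation, assuming the consumer group is called my-group (a placeholder) and using the cluster aliases from the config above; RemoteClusterUtils ships with the connect-mirror-client artifact:

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

// Connect to the sink cluster (B) and read the checkpoints MirrorMaker
// wrote there for source cluster alias "A".
// Note: translateOffsets throws InterruptedException and TimeoutException.
Map<String, Object> props = new HashMap<>();
props.put("bootstrap.servers", "localhost:9093"); // B, per the config above

Map<TopicPartition, OffsetAndMetadata> translated =
    RemoteClusterUtils.translateOffsets(props, "A", "my-group", Duration.ofMinutes(1));

// Seed the group on B before consuming, e.g. consumer.commitSync(translated);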

Related

How do Apache Flume and Kafka work together?

Regarding this configuration, my understanding is that Flume reads messages from the Kafka topic source-topic, pushes each message/event to the Kafka channel/topic test-topic, and the sink then consumes it and writes it to Elasticsearch.
To test this flow, I explicitly pushed one message/event to the Kafka topic source-topic and expected to see this event on the sink side. But it did not work for me.
Then I did some debugging and figured the message/event must be in the Kafka channel. But when I ran bin/kafka-topics.sh --list --zookeeper localhost:2181, it did not list test-topic on the console.
Now my question is: is this channel name not a Kafka topic?
If not, how can I query the event from the Kafka channel? Or maybe someone can help me understand this flow.
test.sources = ks
test.sinks = es
test.channels = kc
# SOURCES
test.sources.ks.type = org.apache.flume.source.kafka.KafkaSource
test.sources.ks.zookeeperConnect = 127.0.0.1:2181
test.sources.ks.topic = source-topic
test.sources.ks.groupId = cst
test.sources.ks.batchSize = 1000
test.sources.ks.batchDurationMillis = 1000
test.sources.ks.kafka.consumer.timeout.ms = 100
test.sources.ks.kafka.auto.offset.reset = smallest
# sink
test.sinks.es.type = org.es.TestElasticSearchSink
test.sinks.es.hostNames = 127.0.0.1:9200
test.sinks.es.indexName = test-idx
test.sinks.es.batchSize = 1000
test.sinks.es.iaCacheLifetime = 20
# Normal channel
test.channels.kc.type = org.kc.TestKafkaChannel
test.channels.kc.capacity = 10000
test.channels.kc.transactionCapacity = 1000
test.channels.kc.brokerList = 127.0.0.1:9092
test.channels.kc.topic = test-topic
test.channels.kc.zookeeperConnect = 127.0.0.1:2181
test.channels.kc.parseAsFlumeEvent = false
test.channels.kc.readSmallestOffset = true
test.channels.kc.groupId = test-flume
You will probably want to pre-create all necessary Kafka topics before starting Flume; an example command is shown below. Beyond that, it's not clear what org.kc.TestKafkaChannel or org.es.TestElasticSearchSink are. Flume ships classes for both of these (a Kafka channel and an Elasticsearch sink), I believe, so anything "not working" would begin in either of your "custom" classes here...
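For example, pre-creating the channel topic on the ZooKeeper address from your config (a single-broker dev setup is assumed, hence replication factor 1):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-topic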
Alternatively, Kafka Connect already has an Elasticsearch sink connector, so you don't need an intermediate Kafka topic just to send data between Kafka and Elasticsearch. Logstash would work as well.

Kafka Mirror Maker 2 Offset Replication Not Working

We are testing a DR scenario for Kafka. We have two Kafka clusters in separate regions and use MirrorMaker 2 to replicate topics and messages.
Topics and messages replicate, but we are observing that consumer offsets do not.
E.g.:
produced 10 messages from a producer pointed at Kafka region 1
consumed 5 messages with a consumer pointed at Kafka region 1
stopped the consumer pointed at region 1
started a consumer pointed at region 2
consumed messages
The expectation here is that the region 2 consumer should consume from offset 6, but it starts consuming from offset 0.
Below is the property file:
clusters = primary, secondary
# primary cluster information
primary.bootstrap.servers = test1-primary.com:9094,test2-primary.com.apttuscloud.io:9094,test3-primary.com:9094
primary.security.protocol= SASL_SSL
primary.ssl.truststore.password= dummypassword
primary.ssl.truststore.location= /opt/bitnami/kafka/config/certs/kafka.truststore.jks
primary.ssl.keystore.password= dummypassword
primary.ssl.keystore.location= /opt/bitnami/kafka/config/certs/kafka.keystore.jks
primary.ssl.endpoint.identification.algorithm=
primary.sasl.mechanism= PLAIN
primary.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="dummyuser" password="dummypassword";
# secondary cluster information
secondary.bootstrap.servers = test1-secondary.com:9094,test2-secondary.com.apttuscloud.io:9094,test3-secondary.com:9094
secondary.security.protocol= SASL_SSL
secondary.ssl.truststore.password= dummypassword
secondary.ssl.truststore.location= /opt/bitnami/kafka/config/certs/kafka.truststore.jks
secondary.ssl.keystore.password= dummypassword
secondary.ssl.keystore.location= /opt/bitnami/kafka/config/certs/kafka.keystore.jks
secondary.ssl.endpoint.identification.algorithm=
secondary.sasl.mechanism=PLAIN
secondary.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="dummyuser" password="dummypassword";
# Topic Configuration
primary->secondary.enabled = true
primary->secondary.topics = .*
secondary->primary.enabled = true
secondary->primary.topics = .*
############################# Internal Topic Settings #############################
# The replication factor for mm2 internal topics "heartbeats", "B.checkpoints.internal" and
# "mm2-offset-syncs.B.internal"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3
checkpoints.topic.replication.factor= 3
heartbeats.topic.replication.factor= 3
offset-syncs.topic.replication.factor= 3
# The replication factor for connect internal topics "mm2-configs.B.internal", "mm2-offsets.B.internal" and
# "mm2-status.B.internal"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
replication.factor = 3
refresh.topics.enabled = true
sync.topic.configs.enabled = true
refresh.topics.interval.seconds = 10
topics.blacklist = .*[\-\.]internal, .*\.replica, __consumer_offsets
groups.blacklist = console-consumer-.*, connect-.*, __.*
primary->secondary.emit.heartbeats.enabled = true
primary->secondary.emit.checkpoints.enabled = true
Please note that some confidential values have been replaced with dummy values.
With MirrorMaker 2 in Kafka 2.5, offsets are not automatically translated when moving consumers between clusters.
So upon starting consumers on the other cluster, they need to use RemoteClusterUtils.translateOffsets() to find their offsets in that cluster (see the sketch in the first answer above).
In 2.7 (expected November 2020), MirrorMaker 2 can translate offsets automatically; see https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0
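On 2.7+, enabling it for the configuration in the question would look like this (the same sync.group.offsets.enabled flag shown in the first answer):
primary->secondary.sync.group.offsets.enabled = true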

Apache Beam KafkaIO mention topic partition instead of topic name

Apache Beam's KafkaIO supports Kafka consumers reading only from specified partitions. I have the following code:
KafkaIO.<String, String>read()
    .withCreateTime(Duration.standardMinutes(1))
    .withReadCommitted()
    .withBootstrapServers(endPoint)
    .withConsumerConfigUpdates(new ImmutableMap.Builder<String, Object>()
        .put(ConsumerConfig.GROUP_ID_CONFIG, groupName)
        .put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5)
        .put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
        .build())
    .commitOffsetsInFinalize()
    .withTopicPartitions(List<TopicPartition>)
I have the following 2 questions.
How do I get the partition names from Kafka, and how do I specify them in KafkaIO?
Does Apache Beam spawn a number of Kafka consumers equal to the size of the partition list given when creating the Kafka consumer?
I found the answers myself.
How do I tell KafkaIO to read from particular partitions?
KafkaIO has the method withTopicPartitions(List<TopicPartition>), which accepts a list of TopicPartition objects.
Partitions are numbered sequentially starting from zero, so the following should work:
KafkaIO.<String, String>read()
    .withCreateTime(Duration.standardMinutes(1))
    .withReadCommitted()
    .withBootstrapServers(endPoint)
    .withConsumerConfigUpdates(new ImmutableMap.Builder<String, Object>()
        .put(ConsumerConfig.GROUP_ID_CONFIG, groupName)
        .put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5)
        .put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest")
        .build())
    .commitOffsetsInFinalize()
    .withTopicPartitions(Arrays.asList(
        new TopicPartition(topicName, 0),
        new TopicPartition(topicName, 1),
        new TopicPartition(topicName, 2)))
To test it out, produce to a specific partition with kafkacat:
kafkacat -P -b localhost:9092 -t sample -p 0
This command produces to partition 0 of the topic sample.
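If you would rather discover a topic's partitions programmatically than assume sequential ids, a minimal sketch using the plain Kafka consumer API (reusing endPoint and topicName from the snippets above) could look like this:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", endPoint); // endPoint as in the pipeline above
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    // partitionsFor returns metadata for every partition of the topic
    for (PartitionInfo p : consumer.partitionsFor(topicName)) {
        System.out.println(p.topic() + "-" + p.partition());
    }
}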
Does Apache Beam spawn a number of Kafka consumers equal to the size of the partition list?
It spawns a single consumer group with as many consumers as the number of partitions explicitly specified when building the KafkaIO read transform.

kafka MirrorMaker : No broker partitions consumed by consumer thread kafka-mirror

This is regarding the Kafka MirrorMaker tool.
I have configured Kafka on two machines:
source:
destination: vm [ubuntu at the source only]
Kafka at both source and destination is the same version [kafka_2.11-0.9.0.0].
At the source and the destination, the respective ZooKeeper and Kafka servers are running.
With the MirrorMaker tool I wanted to replicate/mirror topics from source to destination.
Below is the command that I used:
./bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config ./config/mirror_consumer.properties --producer.config ./config/mirror_producer.properties --whitelist='.*' &>mirror-log.log
The configuration files contain:
a. mirror_consumer.properties
#host:port of kafka source zookeeper to be mirrored
zookeeper.connect=source-ip:3181
zookeeper.connection.timeout.ms=1000000
consumer.timeout.ms=-1
security.protocol=PLAINTEXT
group.id=kafka-mirror
where source-ip is the IP address of the source machine, and my ZooKeeper at the source is running on port 3181.
b. mirror_producer.properties
# mirror broker (local) at the destination
bootstrap.servers=localhost:9092
producer.type=async
where localhost resolves to the destination (the Ubuntu VM), and Kafka is running on the default port, 9092.
Initially, I created a few topics named source1 and source2.
From the source machine, I sent some messages to those topics using command-line producers.
After executing the MirrorMaker command on the destination, I could see the consumer at the destination trying to consume the topics.
Unfortunately, the consumer at the destination fails to read the partitions from the broker for each topic.
Please have a look at the sample log entries below:
[2016-05-06 13:25:00,931] WARN No broker partitions consumed by consumer thread kafka-mirror_mojes-VirtualBox-1462521159741-6c2475c3-0 for topic source1 (kafka.consumer.RangeAssignor)
[2016-05-06 13:25:00,931] WARN No broker partitions consumed by consumer thread kafka-mirror_mojes-VirtualBox-1462521295337-c3742307-0 for topic source1 (kafka.consumer.RangeAssignor)
[2016-05-06 13:25:00,931] WARN No broker partitions consumed by consumer thread kafka-mirror_mojes-VirtualBox-1462517840512-a134d048-0 for topic source2 (kafka.consumer.RangeAssignor)
[2016-05-06 13:25:00,932] WARN No broker partitions consumed by consumer thread kafka-mirror_mojes-VirtualBox-1462519206297-63bc9c58-0 for topic source2 (kafka.consumer.RangeAssignor)
[2016-05-06 13:25:00,932] WARN No broker partitions consumed by consumer thread kafka-mirror_mojes-VirtualBox-1462519513695-bee7950e-0 for topic source2 (kafka.consumer.RangeAssignor)
Please let me know if you see anything that is missing or needs to be fixed.
Thanks in advance.
We get this issue when there is a mismatch between the number of partitions in a topic and the number of consumer threads in the consumer group reading from it: with the old consumer's range assignor, any thread beyond the partition count gets no partitions and logs exactly this warning. Compare each topic's partition count against the number of mirror consumer threads; you can check the former as shown below.
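A quick way to check a topic's partition count (ZooKeeper address as in the question):
bin/kafka-topics.sh --describe --zookeeper source-ip:3181 --topic source1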

Kafka 0.8: is it possible to create a topic with partitions and replication using Java code?

In Kafka 0.8beta a topic can be created using a command like the one below:
bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 2 --partition 3 --topic test
The above command creates a topic named "test" with 3 partitions and 2 replicas per partition.
Can I do the same thing using Java?
So far, what I found is that using Java we can create a producer, as seen below:
Producer<String, String> producer = new Producer<String, String>(config);
producer.send(new KeyedMessage<String, String>("mytopic", msg));
This will create a topic named "mytopic" with the number of partitions specified by the "num.partitions" attribute, and start producing to it.
But is there a way to define the partitions and replication as well? I couldn't find any such example. If there isn't, does that mean we always need to create the topic beforehand (with partitions and replication as per our requirements) and then use the producer to produce messages to it? For example, would it be possible to create "mytopic" the same way but with a different number of partitions (overriding the num.partitions attribute)?
Note: My answer covers Kafka 0.8.1+, i.e. the latest stable version available as of April 2014.
Yes, you can create a topic programmatically via the Kafka API, and yes, you can specify the desired number of partitions as well as the replication factor for the topic.
Note that the recently released Kafka 0.8.1+ provides a slightly different API than Kafka 0.8.0 (which was used by Biks in his linked reply). I added a code example for creating a topic in Kafka 0.8.1+ to my reply to the question How Can we create a topic in Kafka from the IDE using API that Biks was referring to above. The snippet below, cleaned up with the required imports, creates "mytopic" with 3 partitions and a replication factor of 2:
import java.util.Properties;
import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.ZkConnection;

String zkConnect = "localhost:2181";
// 10s session timeout, 8s connection timeout; ZKStringSerializer$ ensures
// topic metadata is written in the format the brokers expect
ZkClient zkClient = new ZkClient(zkConnect, 10 * 1000, 8 * 1000, ZKStringSerializer$.MODULE$);
ZkUtils zkUtils = new ZkUtils(zkClient, new ZkConnection(zkConnect), false);
Properties topicConfig = new Properties(); // per-topic config overrides, none here
// "mytopic" with 3 partitions and a replication factor of 2
AdminUtils.createTopic(zkUtils, "mytopic", 3, 2, topicConfig);
zkClient.close();