How to influence the Kafka partition leader election process - apache-kafka

I am going to set up a Kafka cluster for our messaging-intensive system.
Currently we have two Kafka clusters, one based in London (LD) as primary and another based in New York (NY) as DR (backup), and we have written Java clients to replicate data from LD to NY.
Since Kafka has built-in partitioning and replication for scalability, high availability and failover, we want to create a single bigger cluster comprising servers in both London and New York.
But...
We have a connectivity problem between the NY and LD servers: the network speed is really bad.
I have run some tests on the servers.
producer config:
- acks=1 (requires acknowledgement from the partition leader only)
- sending async.
When producers in London send messages to brokers in LD, the throughput is 100,000 msg/sec; with a message size of 100 bytes, that is 10 MB/sec.
When producers in London send messages to a broker in NY, the throughput is 10 msg/sec; with a message size of 100 bytes, that is 1 KB/sec.
So...
I am looking for a way to make sure producers and consumers take advantage of locality, meaning that clients on a given network talk to the nearest broker.
Let's say: clients in LD will only talk to LD-based brokers.
(I understand that write/read requests are served only by the partition leader.)
Any suggestion would be highly appreciated.

From what I understand, your current setup is:
1 broker located in NY.
1 broker located in LD.
n topics. (I am going to assume the number of topics is 1.)
n partitions on the topic. (I am going to assume the number of partitions is 2.)
Both partitions are replicated across the two brokers.
You want to make the broker located in LD the leader of all partitions, so that all producers interact with this broker and the broker located in NY only holds replicas. If this is the case, you can do the following:
Check the configuration of your topic:
./kafka-topics.sh --describe --topic stream-log
Topic:<topic-name> PartitionCount:2 ReplicationFactor:2 Configs:
Topic: stream-log Partition: 0 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: stream-log Partition: 1 Leader: 1 Replicas: 1,0 Isr: 1,0
And assuming:
LD Broker ID: 0
NY Broker ID: 1
You can see that partition 1 is led by broker 1 (NY). To change that, the partitions need to be reassigned so that broker 0 (LD) comes first in every replica list, since the first replica in the list is the preferred leader:
./kafka-reassign-partitions.sh --reassignment-json-file manual_assign.json --execute
The contents of the JSON file:
{"partitions": [
{"topic": "<topic-name>", "partition": 0, "replicas": [0,1]},
{"topic": "<topic-name>", "partition": 1, "replicas": [0,1]}
],
"version":1
}
Finally, to force Kafka to move leadership to the preferred replicas, run:
./kafka-preferred-replica-election.sh
If you do not specify a list of topics, this last command affects all the topics you have created. That should not be a problem, but keep it in mind.
It is worth having a look at this guide, which explains something similar. If you are curious, you can check the official documentation of the tools here.
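For more than a couple of partitions, writing the reassignment file by hand gets tedious. Here is a minimal Python sketch that builds the same JSON structure, moving one broker to the front of each replica list. The function name build_reassignment and the hard-coded current assignment are assumptions for illustration; in practice you would copy the replica lists from your own --describe output.

```python
import json


def build_reassignment(topic, current_replicas, preferred_broker):
    """Build a reassignment plan that lists preferred_broker first for every partition.

    current_replicas maps partition number -> list of broker ids (hypothetical input,
    copied by hand from the --describe output).
    """
    partitions = []
    for partition, replicas in sorted(current_replicas.items()):
        if preferred_broker not in replicas:
            raise ValueError(
                f"broker {preferred_broker} holds no replica of partition {partition}"
            )
        # Keep the same set of replicas, only change the order.
        reordered = [preferred_broker] + [b for b in replicas if b != preferred_broker]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": reordered}
        )
    return {"partitions": partitions, "version": 1}


# Assignment taken from the describe output above: LD is broker 0, NY is broker 1.
plan = build_reassignment("stream-log", {0: [0, 1], 1: [1, 0]}, preferred_broker=0)
print(json.dumps(plan, indent=2))
```

The printed JSON can be saved as manual_assign.json and passed to the reassignment tool as shown above.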

Related

Under what circumstances is endOffset > lastMsg.offset + 1?

Kafka returns endOffset 15 for a partition, but the last message that can be consumed has offset 13 rather than 14, which is what I would expect. I wonder why.
The Kafka docs read
In the default read_uncommitted isolation level, the end offset is the high watermark (that is, the offset of the last successfully replicated message plus one). For read_committed consumers, the end offset is the last stable offset (LSO), which is the minimum of the high watermark and the smallest offset of any open transaction.
Here's kafkacat's output. I'm using kafkacat, because it can print the message offsets:
$ kafkacat -Ce -p0 -tTK -f'offset: %o key: %k\n'
offset: 0 key: 0108
offset: 1 key: 0253
offset: 4 key: 0278
offset: 5 key: 0198
offset: 8 key: 0278
offset: 9 key: 0210
offset: 10 key: 0253
offset: 11 key: 1058
offset: 12 key: 0141
offset: 13 key: 1141
% Reached end of topic TK [0] at offset 15: exiting
What's also baffling - and it may very well be related - is that the offsets are not consecutive, although I have not set up compaction etc.
Some more details:
$ kafka-topics.sh --bootstrap-server localhost:9092 --topic TK --describe
Topic: TK PartitionCount: 2 ReplicationFactor: 1 Configs: segment.bytes=1073741824
Topic: TK Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: TK Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Printing the keys via kafka-console-consumer.sh:
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TK \
--offset earliest --partition 0 --timeout-ms 5000 \
--property print.key=true --property print.value=false
0108
0253
0278
0198
0278
0210
0253
1058
0141
1141
[2021-09-15 10:54:06,556] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.TimeoutException
Processed a total of 10 messages
N.B. This topic has been produced to without involvement of transactions, and *) consumption is being done in read_uncommitted mode.
*) Actually, processing.guarantee is set to exactly_once_beta, so that would amount to using transactions.
More info
It turns out I can reliably reproduce this case with my Streams app (1. wipe kafka/zookeeper data, 2. recreate topics, 3. run app), whose output is the topic that shows this problem.
I've meanwhile trimmed down the Streams app to this no-op topology and can still reproduce it:
Topologies:
Sub-topology: 0
Source: KSTREAM-SOURCE-0000000000 (topics: [TK1])
--> KSTREAM-SINK-0000000001
Sink: KSTREAM-SINK-0000000001 (topic: TK)
<-- KSTREAM-SOURCE-0000000000
News
Meanwhile I have replaced the locally running Kafka broker (2.5.0) with one running in a Docker container (wurstmeister/kafka:2.13-2.6.0). The problem persists.
The app is using kafka libraries versioned 6.0.1-ccs, corresponding to 2.6.0.
You should avoid doing arithmetic on offsets. Kafka only guarantees that any new offset will be greater than the last one. You may wish to use keys instead, and check that you have received the expected number of messages by verifying that the expected number of keys has arrived.
Kafka has many things to juggle, such as exactly-once semantics, re-sending messages, and other internal tasks related to the topic. The records it writes for those purposes are discarded (not passed on to you). You will only see your own messages, and their offsets will only go up.
These transaction markers are not exposed to applications, but are used by consumers in read_committed mode to filter out messages from aborted transactions and to not return messages which are part of open transactions
When I remove the setting processing.guarantee: exactly_once_beta the problem goes away. In terms of this problem, it doesn't matter whether I use exactly_once_beta or exactly_once.
I still wonder why that happens with exactly_once(_beta) - after all, in my tests there is smooth sailing and no transaction rollbacks etc.
In my latest tests this rule seems to apply to all partitions with at least one item in them:
endOffset == lastMsg.offset + 3
Which is 2 more than expected.
The Kafka docs mentioned in the question say that
For read_committed consumers, the end offset is the last stable offset (LSO), which is the minimum of the high watermark and the smallest offset of any open transaction.
So is Kafka perhaps pre-allocating offsets for 2 (???) transactions per partition?
This seems to be the answer:
https://stackoverflow.com/a/54637004/200445
each commit (or abort) of a transaction writes a commit (or abort) marker into the topic -- those transactional markers also "consume" one offset
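To see how markers shift the end offset, here is a small Python sketch of a toy log. It assumes a deliberately simplified model in which every committed transaction appends exactly one marker after its records; the function name simulate_log is made up for illustration, and the model does not try to reproduce the exact gaps in the kafkacat output above.

```python
def simulate_log(transaction_sizes):
    """Return (visible_offsets, end_offset) for a toy partition log.

    Simplified assumption: each committed transaction writes its records and
    then one commit marker, and the marker consumes one offset.
    """
    visible_offsets = []
    next_offset = 0
    for record_count in transaction_sizes:
        for _ in range(record_count):
            visible_offsets.append(next_offset)  # a record the consumer will see
            next_offset += 1
        next_offset += 1  # the commit marker, never delivered to the application
    return visible_offsets, next_offset


# Three transactions carrying 2, 2 and 6 records.
visible, end_offset = simulate_log([2, 2, 6])
print("visible offsets:", visible)
print("end offset:", end_offset)
print("gap after last message:", end_offset - visible[-1])
```

In this model the end offset is always at least lastMsg.offset + 2, and the visible offsets are non-consecutive wherever a marker sits between two transactions.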

Meaning of `leader.imbalance.per.broker.percentage` in Apache Kafka Configuration

Going through the Kafka docs, I found this configuration: leader.imbalance.per.broker.percentage.
What does leader.imbalance.per.broker.percentage mean intuitively? How can I simulate the working of this configuration?
Type: int
Default: 10
Valid Values:
Importance: high
Update Mode: read-only
Why is the default value 10?
leader.imbalance.per.broker.percentage defines the percentage of non-preferred leaders allowed per broker. If the ratio goes over this value on a broker, and auto.leader.rebalance.enable is true, Kafka will automatically move the leadership for these partitions back onto the preferred leader.
If a partition has multiple replicas, any of them can become the leader; however, there is always a preferred one. The preferred leader is the replica listed first in the replica list. For example, in the following output broker 0 is the preferred leader:
Topic:test PartitionCount:3 ReplicationFactor:3 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
By default, this setting is 10, so Kafka allows up to 10% of a broker's preferred partitions to be led by non-preferred replicas before it elects the preferred replicas again.
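One way to get an intuition for the setting is to compute the ratio by hand for a small cluster. The Python sketch below is my reading of the rule described above, not code taken from Kafka: for each broker it counts the partitions that prefer that broker and the share of those currently led by someone else. The function names and the sample assignment are assumptions for illustration.

```python
from collections import defaultdict


def imbalance_by_broker(partitions):
    """partitions: list of (current_leader, replica_list) tuples.

    Returns broker id -> percentage of the partitions preferring that broker
    which are currently led by a different replica.
    """
    preferred_total = defaultdict(int)
    not_on_preferred = defaultdict(int)
    for leader, replicas in partitions:
        preferred = replicas[0]  # the first replica is the preferred leader
        preferred_total[preferred] += 1
        if leader != preferred:
            not_on_preferred[preferred] += 1
    return {
        broker: 100.0 * not_on_preferred[broker] / total
        for broker, total in preferred_total.items()
    }


def brokers_needing_election(partitions, threshold_percent=10):
    """Brokers whose imbalance exceeds the threshold (default mirrors the config default)."""
    ratios = imbalance_by_broker(partitions)
    return sorted(b for b, pct in ratios.items() if pct > threshold_percent)


# Hypothetical cluster state: broker 1 has lost leadership of one of its two partitions.
state = [
    (0, [0, 2, 1]),
    (0, [1, 0, 2]),  # preferred leader is 1, but 0 currently leads
    (2, [2, 1, 0]),
    (1, [1, 2, 0]),
]
print(imbalance_by_broker(state))
print(brokers_needing_election(state))
```

To simulate this on a real cluster, you could stop and restart one broker: its partitions fail over to other replicas, and after the restart they stay on the non-preferred leaders until the next rebalance check.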

Explain why metricbeat Kafka partition metric has a higher count than consumer metric

The problem
Hi, I am trying to visualize Kafka lags using Grafana. I have been logging Kafka offsets with Metricbeat and doing the math myself, since Metricbeat does not support logging Kafka lags in the version that I am using (though it has been implemented recently). Instead of using max(partition.offset.newest) - max(consumergroup.offset) to calculate the lags, I am using sum(partition.offset.newest) - sum(consumergroup.offset) filtered on a particular kafka.topic.name. However, the sums do not tally. Upon further investigation, I found that the counts do not even tally! The count for partition offsets is 30 per 10s while the count for consumergroup offsets is 12 per 10s. I expected both counts to be the same.
I do not understand why Metricbeat logs the partition metricset more often than the consumergroup metricset. At first I thought it was because of my Metricbeat configuration, where I had 2 host groups defined, which might have caused events to be logged multiple times. However, after changing my configuration, the counts just dropped by half.
TL;DR
Why are the Metricbeat counts for partition and consumergroup different?
Setup
Kafka 2 brokers
Kafka topic partitions:
Topic: xxx PartitionCount:3 ReplicationFactor:2 Configs:
Topic: xxx Partition: 0 Leader: 2 Replicas: 2,1 Isr: 2,1
Topic: xxx Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: xxx Partition: 2 Leader: 2 Replicas: 2,1 Isr: 2,1
Metricbeat config (modules.d/kafka.yml):
- module: kafka
#metricsets:
# - partition
# - consumergroup
period: 10s
hosts: ["xxx.yyy:9092"]
Versions
Kafka 2.11-0.11.0.0
Elasticsearch-7.2.0
Kibana-7.2.0
Metricbeats-7.2.0
After much debugging I have figured out what was wrong:
1. For some reason, my Kafka broker 1 reports only producer metrics and no consumer metrics; connecting to broker 2 solved this. Connecting to both brokers adds both sets of metrics together.
2. Lucene uses fuzzy search, so my data had some other consumer groups in it as well. For exact word matching, use kafka.partition.topic.keyword: 'xxx' instead. This made the ratio of my Kafka producer offsets to consumer offsets 2:1.
3. Metricbeat logs the replicas as well, so I need to set NOT kafka.partition.partition.is_leader: false to keep only the partition leaders. This made the consumer to partition ratio 1:1.
After these 3 steps, I can use the formula sum(partition.offset.newest) - sum(consumergroup.offset) to get the lags.
However, I still do not know why broker 1 doesn't have the consumer information.
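The filtering described above can be sketched in a few lines of Python. The event dictionaries here are a flattened, hypothetical stand-in for Metricbeat documents (the keys topic, partition, is_leader, offset_newest, group and offset are assumptions, not the real field names), and the numbers are made up.

```python
def total_lag(partition_events, consumergroup_events, topic, group):
    """Sum of newest leader offsets minus sum of committed offsets for one topic and group."""
    newest = {}
    for event in partition_events:
        # Exact topic match, and leaders only: replicas report the same partition again.
        if event["topic"] == topic and event["is_leader"]:
            newest[event["partition"]] = event["offset_newest"]
    committed = {}
    for event in consumergroup_events:
        if event["topic"] == topic and event["group"] == group:
            committed[event["partition"]] = event["offset"]
    # Partitions without a committed offset count as fully lagging (a design choice).
    return sum(newest.values()) - sum(committed.get(p, 0) for p in newest)


# 3 partitions with replication factor 2, so each partition is reported twice.
partition_events = [
    {"topic": "xxx", "partition": p, "is_leader": leader, "offset_newest": o}
    for p, o in [(0, 100), (1, 80), (2, 120)]
    for leader in (True, False)
]
# An event from a similarly named topic that a non-exact match would pull in.
partition_events.append(
    {"topic": "xxx-other", "partition": 0, "is_leader": True, "offset_newest": 999}
)
consumergroup_events = [
    {"topic": "xxx", "group": "g1", "partition": 0, "offset": 90},
    {"topic": "xxx", "group": "g1", "partition": 1, "offset": 80},
    {"topic": "xxx", "group": "g1", "partition": 2, "offset": 100},
]
lag = total_lag(partition_events, consumergroup_events, topic="xxx", group="g1")
print("total lag:", lag)
```

Without the is_leader filter the partition side would be summed twice, which is the 2:1 ratio observed above.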

Reading from multiple broker kafka with flink

I want to read from multiple Kafka brokers with Flink.
I have a cluster of 3 machines for Kafka, with the following topic:
Topic:myTopic PartitionCount:3 ReplicationFactor:1 Configs:
Topic: myTopic Partition: 0 Leader: 2 Replicas: 2 Isr: 2
Topic: myTopic Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: myTopic Partition: 2 Leader: 1 Replicas: 1 Isr: 1
From Flink I execute the following code :
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "x.x.x.x:9092,x.x.x.x:9092,x.x.x.x:9092");
properties.setProperty("group.id", "flink");
DataStream<String> stream = env.addSource(new FlinkKafkaConsumer09<>("myTopic", new SimpleStringSchema(), properties));
stream.map(....);
env.execute();
I launch the same job 3 times.
If I execute this code with one broker it works well, but with 3 brokers (on 3 different machines) only one partition is read.
(In this question) the solution proposed was
to create separate instances of the FlinkKafkaConsumer for each cluster (that's what you are already doing), and then union the resulting streams
It's not working in my case.
So my questions are:
Am I missing something?
If we add a new machine to the Kafka cluster, do we need to change Flink's code to add a consumer for the new broker? Or can we handle this automatically at runtime?
It seems you've misunderstood the concept of Kafka's distributed streams.
A Kafka topic consists of several partitions (3 in your case). Each consumer can consume one or more of these partitions. If you start 3 instances of your app with the same group.id, each consumer will indeed read data from just one broker: Kafka tries to distribute the load evenly, so you get one partition per consumer.
I recommend reading more about this topic, especially about the concept of consumer groups in the Kafka documentation.
Anyway, FlinkKafkaConsumer09 can run in multiple parallel instances, each of which will pull data from one or more Kafka partitions. You don't need to worry about creating more instances of the consumer. One instance of the consumer can pull records from all of the partitions.
I have no idea why you're starting the job 3 times instead of once with parallelism set to 3. That would solve your problem.
DataStream<T> stream =
env.addSource(new FlinkKafkaConsumer09<>("myTopic", new SimpleStringSchema(), properties))
.setParallelism(3);
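To make the "one or more partitions per consumer" idea concrete, here is a toy Python sketch that spreads partitions over a number of parallel consumers. It uses plain round-robin and is only an illustration; it is neither Kafka's group assignor nor Flink's internal assignment logic, and the function name assign_partitions is made up.

```python
def assign_partitions(partitions, num_consumers):
    """Spread partitions over consumers round-robin (illustrative only)."""
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for index, partition in enumerate(sorted(partitions)):
        assignment[index % num_consumers].append(partition)
    return assignment


topic_partitions = [0, 1, 2]  # myTopic has 3 partitions

# Parallelism 3: one partition per parallel instance.
print(assign_partitions(topic_partitions, 3))
# Parallelism 1: a single instance reads every partition.
print(assign_partitions(topic_partitions, 1))
# Parallelism 4: one instance stays idle, since there are only 3 partitions.
print(assign_partitions(topic_partitions, 4))
```

The last case is why setting parallelism higher than the partition count does not increase read throughput.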

Apache kafka : broker leader -1 (topic received from Orion via Cygnus)

I'm working with Apache Kafka and receiving topics from Orion Context Broker via Cygnus (Fiware Labs)
I'm receiving 10 topics, and I can see data arriving in the consumer console for 8 topics.
But for the 2 other topics, I cannot see any data arriving. And there is no error code (the consumer is just empty). If I try to add a test line to the topic via the producer console, I get this error:
ERROR Error when sending message to topic sensors_presence2_sensors with key: null, value: 4 bytes with error: Batch Expired (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
So I used the describe command and I get this :
Topic:sensors_presence2_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic: sensors_presence2_sensors Partition: 0 Leader: -1 Replicas: 2 Isr:
I'm just starting with Kafka, so for the moment I have 1 broker (0) and no partition. But why is my leader -1? This broker does not even exist. How can I change that? I didn't choose the configuration for my topics; they arrived automatically from Cygnus (Orion Context Broker) through an OrionKafkaSink.
An example of one of the 8 topics that works well :
Topic:sensors_presence1_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic:sensors_presence1_sensors Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Thanks
Edit: the Cygnus logs show that data is correctly sent to Kafka:
time=2016-03-02T11:07:09.504UTC | lvl=INFO | trans=1456915468-194-0000000039 | srv=egmmqtt | subsrv=egmmqttpath | function=persistAggregation | comp=Cygnus | msg=com.telefonica.iot.cygnus.sinks.OrionKafkaSink[279] : [kafka-sink] Persisting data at OrionKafkaSink. Topic (sensors_presence2_sensors), Data (...
The describe output shows Replicas: 2 and an empty Isr. That means a broker with id 2 was active when the topic was created, and that same broker (id=2) is not active now. Because of that, the Isr (in-sync replicas) list is empty and the leader is reported as -1, meaning no leader is currently available.
You cannot get Replicas: 2 on a Kafka cluster that has only ever had one node with broker id 0. Bring broker 2 back up and everything should work.
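With 10 topics it can help to scan the describe output for this condition automatically. Below is a small Python sketch that parses text in the format shown above and reports partitions with Leader: -1 or an empty Isr. The regular expression and the function name find_offline_partitions are assumptions based only on the sample output in this question; other Kafka versions may print the lines differently.

```python
import re

# Matches the per-partition lines of the describe output shown above.
PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]*)\s+Isr:\s*([\d,]*)"
)


def find_offline_partitions(describe_output):
    """Return partitions that have no leader (-1) or no in-sync replica."""
    problems = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # summary line or unrelated text
        leader = int(match.group(2))
        replicas = [int(b) for b in match.group(3).split(",") if b]
        isr = [int(b) for b in match.group(4).split(",") if b]
        if leader == -1 or not isr:
            problems.append(
                {
                    "partition": int(match.group(1)),
                    "leader": leader,
                    "replicas": replicas,
                    "isr": isr,
                }
            )
    return problems


broken = """Topic:sensors_presence2_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic: sensors_presence2_sensors Partition: 0 Leader: -1 Replicas: 2 Isr:"""
healthy = """Topic:sensors_presence1_sensors PartitionCount:1 ReplicationFactor:1 Configs:
Topic:sensors_presence1_sensors Partition: 0 Leader: 0 Replicas: 0 Isr: 0"""

print(find_offline_partitions(broken))
print(find_offline_partitions(healthy))
```

Any partition it reports lists the broker ids that need to come back (the replicas field) before a leader can be elected.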
Hope this helps!