What does running the following Kafka tool actually give?
./bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --throughput=10000 --topic=TOPIC --num-records=50000000 --record-size=200 --producer-props bootstrap.servers=SERVERS buffer.memory=67108864 batch.size=64000
When running with a single producer I get 90 MB/s. When I use 3 separate producers on separate nodes I get only around 60 MB/s per producer. (My Kafka cluster consists of 2 nodes, and the topic has 6 partitions.)
What does 90 MB/s mean? Is it the maximum rate at which a producer can produce?
Does partition count affect this value?
Why does it drop to 60 MB/s when there are 3 producers (still no network saturation on the broker front)?
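For a sense of what the MB/s figure represents, it is just records per second multiplied by the record size. A quick sketch of the conversion (my own arithmetic, using the 200-byte record size from the command above):

# Convert the reported throughput into records per second,
# using the 200-byte record size from the command above.
record_size_bytes = 200
reported_mb_per_sec = 90

# Assuming MB means 10**6 bytes here; the tool may use 1024*1024,
# which would shift the result slightly.
records_per_sec = reported_mb_per_sec * 10**6 / record_size_bytes
print(records_per_sec)  # 450000.0 records/sec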
Partitions define the unit of parallelism in Kafka, but increasing the partition count may reduce producer throughput because, due to replication, the available cluster bandwidth decreases.
However, in experiments the following was observed:
With 3 brokers: with 2 partitions on each broker, performance drops compared to 1 partition on each broker.
With 9 brokers: with 3 partitions on each broker, performance increases compared to 1 partition on each broker.
Going by the 3-broker result, performance in the 9-broker case should also have degraded, but instead it increased.
What could be the reason for this behaviour?
Experiment details:
kafka-producer-perf-test was used for benchmarking.
Parameters passed to the tool: --num-records 12000000 --throughput -1 --record-size 1000, plus the producer properties acks=1, linger.ms=100, buffer.memory=5242880, compression.type=none, request.timeout.ms=30000.
Results of the test are in the attached image.
I want to copy all messages from a topic in a Kafka cluster, so I ran Kafka Mirrormaker; however, it seems to have copied roughly only half of the messages from the source cluster (I checked that there is no consumer lag on the source topic). I have 2 brokers in the source cluster; does this have anything to do with it?
This is the source cluster config:
log.retention.ms=1814400000
transaction.state.log.replication.factor=2
offsets.topic.replication.factor=2
auto.create.topics.enable=true
default.replication.factor=2
min.insync.replicas=1
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
The source topic has 4 partitions and is not compacted. The Mirrormaker config is:
mirrormaker-consumer.properties
bootstrap.servers=broker1:9092,broker2:9092
group.id=picturesGroup3
auto.offset.reset=earliest
mirrormaker-producer.properties
bootstrap.servers=localhost:9092
max.in.flight.requests.per.connection=1
retries=2000000000
acks=all
max.block.ms=2000000000
Below are the stats from Kafdrop on the source cluster topic:
Partition | First Offset | Last Offset | Size | Leader Node | Replica Nodes | In-sync Replica Nodes | Offline Replica Nodes | Preferred Leader | Under-replicated
0 | 13659 | 17768 | 4109 | 1 | 1 | 1 | - | Yes | No
1 | 13518 | 17713 | 4195 | 2 | 2 | 2 | - | Yes | No
2 | 13664 | 17913 | 4249 | 1 | 1 | 1 | - | Yes | No
3 | 13911 | 18072 | 4161 | 2 | 2 | 2 | - | Yes | No
and these are the stats for the target topic after Mirrormaker run:
Partition | First Offset | Last Offset | Size | Leader Node | Replica Nodes | In-sync Replica Nodes | Offline Replica Nodes | Preferred Leader | Under-replicated
0 | 2132 | 4121 | 1989 | 1 | 1 | 1 | - | Yes | No
1 | 2307 | 4217 | 1910 | 1 | 1 | 1 | - | Yes | No
2 | 2379 | 4294 | 1915 | 1 | 1 | 1 | - | Yes | No
3 | 2218 | 4083 | 1865 | 1 | 1 | 1 | - | Yes | No
As you can see, roughly only half of the source messages are in the target topic, based on the Size column. What am I doing wrong?
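As a quick cross-check of that claim (my own arithmetic from the Size columns of the two tables above):

# Per-partition message counts (Size column) from the Kafdrop tables above.
source_sizes = [4109, 4195, 4249, 4161]
target_sizes = [1989, 1910, 1915, 1865]

print(sum(source_sizes))                      # 16714 messages in the source topic
print(sum(target_sizes))                      # 7679 messages in the target topic
print(sum(target_sizes) / sum(source_sizes))  # ~0.46, i.e. roughly half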
I realized that the issue happened because I was copying data from a cluster with 2 brokers to a cluster with 1 broker. So I assume Mirrormaker1 just copied data from one broker of the original cluster. When I configured the target cluster to have 2 brokers, all of the messages were copied to it.
Regarding the advice of #OneCricketeer to use Mirrormaker2: this also worked, however it took me a while to get to the correct configuration file:
clusters = source, dest
source.bootstrap.servers = sourcebroker1:9092,sourcebroker2:9092
dest.bootstrap.servers = destbroker1:9091,destbroker2:9092
topics = .*
groups = mm2topic
source->dest.enabled = true
offsets.topic.replication.factor=1
offset.storage.replication.factor=1
auto.offset.reset=latest
In addition, Mirrormaker2 can be found in the connect container in this KafkaConnect project (enter the container; the connect-mirror-maker.sh executable is in the /kafka/bin directory).
A major downside of the Mirrormaker2 solution is that it adds a prefix to the topic names in the target cluster (in my case the new names would require changing application code). The prefix can't be changed in the Mirrormaker2 configuration, so the only way around it is to implement a custom Java class, as explained here.
I have a Kafka Streams application with 4 instances, each running on a separate EC2 instance with 16 threads. Total threads = 16 * 4 = 64. The input topic has only 32 partitions, so I understand that some of the threads will remain idle.
I am continuously seeing this exception:
Caused by: org.apache.kafka.common.errors.InvalidProducerEpochException: Producer attempted to produce with an old epoch.
01:57:23.971 [kafka-producer-network-thread | bids_kafka_streams_beta_007-fd78c6fa-62bc-437d-add0-c31f5b7c1901-StreamThread-12-1_6-producer] ERROR org.apache.kafka.streams.processor.internals.RecordCollectorImpl - stream-thread [bids_kafka_streams_beta_007-fd78c6fa-62bc-437d-add0-c31f5b7c1901-StreamThread-12] task [1_6] Error encountered sending record to topic kafka_streams_bids_output for task 1_6 due to:
org.apache.kafka.common.errors.InvalidProducerEpochException: Producer attempted to produce with an old epoch.
Written offsets would not be recorded and no more records would be sent since the producer is fenced, indicating the task may be migrated out
The only settings I have changed in the Streams config are the producer configs, to reduce CPU usage on the brokers:
linger.ms=10000
commit.interval.ms=10000
Records are windowed over 2-minute windows.
Is it due to rebalancing? Why is it so frequent?
I am trying to come up with a configuration that enforces a producer quota based on the producer's average byte rate.
I did a test with a 3-node cluster. The topic, however, was created with 1 partition and a replication factor of 1, so that the producer_byte_rate can be measured on only 1 broker (the leader broker).
I set the producer_byte_rate to 20480 on client id test_producer_quota.
I used kafka-producer-perf-test to test out the throughput and throttle.
kafka-producer-perf-test --producer-props bootstrap.servers=SSL://kafka-broker1:6667 \
client.id=test_producer_quota \
--topic quota_test \
--producer.config /myfolder/client.properties \
--record-size 2048 --num-records 4000 --throughput -1
I expected the producer client to learn about the throttle and eventually smooth out the requests sent to the broker. Instead I noticed the throughput alternating between 98 recs/sec and 21 recs/sec for a period of more than 30 seconds. During this time the average latency slowly kept increasing, and when it finally hit 120000 ms I started to see the timeout exception below:
org.apache.kafka.common.errors.TimeoutException : Expiring 7 records for quota_test-0: 120000 ms has passed since batch creation.
What is possibly causing this issue?
The producer hits the timeout when latency reaches 120 seconds (the default value of delivery.timeout.ms).
Why isn't the producer learning about the throttle and quota and slowing down or backing off?
What other producer configuration could help alleviate this timeout issue?
(2048 * 4000) / 20480 = 400 (sec)
This means that if your producer tries to send the 4000 records at full speed (which is the case, because you set throughput to -1), it will batch them and put them in the queue in maybe one or two seconds (depending on your CPU).
Then, thanks to your quota setting (20480 bytes/sec), you can be sure that the broker won't 'complete' the processing of those 4000 records before at least 398-399 seconds have passed.
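To make that explicit, a small sketch of the arithmetic (using only the figures from the question):

# Figures from the question: 2048-byte records, 4000 records,
# producer_byte_rate quota of 20480 bytes/sec.
record_size = 2048
num_records = 4000
quota_bytes_per_sec = 20480

throttled_rate = quota_bytes_per_sec / record_size                # ~10 records/sec once throttled
drain_time_sec = num_records * record_size / quota_bytes_per_sec  # 400 seconds in total

print(throttled_rate, drain_time_sec)  # 10.0 400.0
# 400 s is far beyond the 120 s default delivery.timeout.ms, so batches created
# up front expire before they can be sent, producing the TimeoutException above.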
The broker does not return an error when a client exceeds its quota, but instead attempts to slow the client down. The broker computes the amount of delay needed to bring a client under its quota and delays the response for that amount of time.
With delivery.timeout.ms at its default of 120 seconds, you then get this TimeoutException.
I use Kafka in a Python service which should work in parallel in order to handle the slow API requests for each message efficiently.
I used Python's multiprocessing module and kafka-python for the consumers.
ZooKeeper and Kafka 2.11 run on the same Ubuntu server with mostly default configurations.
The topic is auto-created with another kafka-python producer and set to have 10 partitions in order to use up to 10 consumers at the same time.
When I check, I see that the queue is really long, so the producer is sending a lot of requests:
$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic usrReq --time -1
usrReq:8:1157
usrReq:2:1185
usrReq:5:1167
usrReq:4:1115
usrReq:7:1164
usrReq:10:1150
usrReq:1:1149
usrReq:9:1138
usrReq:3:1186
usrReq:6:1220
usrReq:0:6264
However, although 10 cores are working in parallel, the consumers take a very long time (117 seconds in the sample log below) to get the next message from the queue.
thread 7, consumer: 117.485 sec
api1:0.412 sec
api2:0.752 sec
db_insert:0.132 sec
This is how each process creates its own consumer, fetches messages, and runs the analysis:
import json

from kafka import KafkaConsumer

# One consumer per process, all joining the same consumer group.
consumer = KafkaConsumer(group_id='my-group',
                         bootstrap_servers='localhost',
                         value_deserializer=lambda m: json.loads(m.decode('ascii')))
consumer.subscribe(topics=['usrReq'])

while True:
    msg = next(consumer).value['id']
    method(msg)  # method() does the slow API calls and DB insert shown in the log above
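For reference, a minimal sketch of the multiprocessing scaffolding such a setup typically uses (an illustrative assumption with hypothetical names; the actual launcher code is not shown above):

import json
from multiprocessing import Process

from kafka import KafkaConsumer

NUM_WORKERS = 10  # one worker per partition, per the setup described above


def process_message(msg_id):
    # Stand-in for method() above: the slow API calls and DB insert go here.
    pass


def worker():
    # Each process builds its own consumer in the same group, as in the snippet above.
    consumer = KafkaConsumer(group_id='my-group',
                             bootstrap_servers='localhost',
                             value_deserializer=lambda m: json.loads(m.decode('ascii')))
    consumer.subscribe(topics=['usrReq'])
    for record in consumer:
        process_message(record.value['id'])


if __name__ == '__main__':
    procs = [Process(target=worker) for _ in range(NUM_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()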
Where could the problem be in this setup?