How to get Kafka Connect to balance tasks/connectors evenly? - apache-kafka

I'm trying to run a Kafka Connect distributed cluster with 5 workers and 2 sink connectors. One connector is configured with tasks.max (and actual tasks) set to 5. The other connector is configured for 6 tasks.
If I understand things correctly, this should mean 13 "work units" (1 connector + 5 tasks + 1 connector + 6 tasks) that need to be distributed across the workers, which I figured would come out to around 2-3 tasks/connectors per worker.
However, I'm finding that the eventual assignments are as follows:
Worker 0: (3 work units)
- Connector 0
- Connector 1 Task 0
- Connector 1 Task 1
Worker 1: (4 work units)
- Connector 1
- Connector 0 Task 0
- Connector 0 Task 2
- Connector 1 Task 3
Worker 2: (2 work units)
- Connector 0 Task 3
- Connector 0 Task 4
Worker 3: (4 work units)
- Connector 0 Task 1
- Connector 1 Task 2
- Connector 1 Task 4
- Connector 1 Task 5
Worker 4: (0 work units)
I'm using Kafka Connect workers running 2.6.0 and a connector built against 2.6.0 libraries. Workers are deployed as containers in an orchestration system (Nomad) and have gone through several rounds of rolling restarts/reallocations but the balance always seems off.
Based on this I'm finding myself at a loss for the following:
I get that tasks may become a bit unbalanced while containers are being restarted or moved, but I'd expect that once the last worker came up and joined the cluster, a final rebalance would have sorted things out. Can someone point me to why this might not be happening?
Is there a recommended way to trigger a rebalance for the whole cluster? From the docs it seems that changing a connector's config, or a worker failing/joining, might cause a rebalance, but that does not seem like an ideal process.
Can the task balancing process be configured or controlled in any way?
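For context, the "change a connector's config" route mentioned above would look roughly like the following through the REST API. The connector name is a placeholder, and whether re-submitting an unchanged config actually redistributes anything likely depends on the version and rebalance protocol, so this is a sketch rather than a guaranteed fix:

# Fetch the current config of a connector (hypothetical name "sink-0")
curl -s localhost:8083/connectors/sink-0/config > /tmp/sink-0.json
# PUT the same config back; a connector config update is one of the events that can trigger a rebalance
curl -s -X PUT -H "Content-Type: application/json" --data @/tmp/sink-0.json localhost:8083/connectors/sink-0/config
# Inspect the resulting assignments per worker
curl -s localhost:8083/connectors/sink-0/status | jq '.tasks[] | .worker_id' | sort | uniq -c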

Related

High CPU utilization KSQL db

We are running KSQL db server on kubernetes cluster.
Node config:
AWS EKS fargate
Number of nodes: 1
CPU: 2 vCPU (Request), 4 vCPU (Limit)
RAM: 4 GB (Request), 8 GB (Limit)
Java heap: 3 GB (Default)
Data size:
We have ~11 source topics with 1 partition each; some of them have ~10k records, a few have more than 100k records. There are ~7 sink topics, but producing those 7 sink topics involves ~60 KSQL tables, ~38 KSQL streams, and ~64 persistent queries because of joins and aggregations. So it is heavy computation.
ksqlDB version: 0.23.1, using the official Confluent ksqlDB Docker image
The problem:
When running our KSQL script we see CPU spike to 350-360% and memory to 20-30%. When that happens, Kubernetes restarts the server instance, which causes the ksql-migration to fail.
Error:
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection
refused:
<deployment-name>.<namespace>.svc.cluster.local/172.20.73.150:8088
Error: io.vertx.core.VertxException: Connection was closed
We have 30 migration files, and each file creates multiple tables and streams.
It always fails on v27.
What we have tried so far:
Running it alone; in that case it passes with no errors.
Increased the initial CPU to 4 vCPU, but there was no change in CPU utilization.
Had 2 nodes with 2 partitions in Kafka, but that had the same issue, with the addition that a few data columns ended up with no data.
So something is not right in our configuration or resource allocation.
What's the standard way to deploy KSQL on Kubernetes? Maybe it's not meant for Kubernetes.
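Not an answer, but for anyone comparing setups: the 3 GB default heap is independent of the pod limits, so one knob worth checking is the heap passed to the image. A minimal sketch, assuming the official confluentinc/ksqldb-server image honors KSQL_HEAP_OPTS the way the standalone start scripts do (sizes and addresses below are illustrative, not recommendations):

# Run ksqlDB with an explicit heap instead of the 3 GB default so the JVM can
# actually use the memory granted by the container limit
docker run -d \
  -e KSQL_BOOTSTRAP_SERVERS=broker:9092 \
  -e KSQL_LISTENERS=http://0.0.0.0:8088 \
  -e KSQL_HEAP_OPTS="-Xms4g -Xmx6g" \
  confluentinc/ksqldb-server:0.23.1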

Kafka Connect Task spawn strategy

I have a general question regarding Kafka-Connect. I went through documentation, blogs but couldn't find a straight answer.
If there are two workers, running single Connector(instance) then
How does a connector (instance) decide when to spawn a new task if, e.g., tasks.max = 10? And how does it decide how many tasks to spawn?
Does it depend on the underlying hardware configuration, e.g. the number of cores, memory, or CPU utilization?
The exact algorithm is internal to Kafka Connect, but it generally relates to the number of topics and partitions. So, for example, if you set tasks.max = 10 and have the following sink connector configurations:
1 topic, 1 partition - Kafka Connect will only spawn a single task
2 topics, 1 partition each - Kafka Connect will spawn 2 tasks, 1 for each topic
2 topics, 5 partitions each - Kafka Connect will spawn 10 tasks, 1 for each topic partition
4 topics, 5 partitions each - Kafka Connect will spawn 10 tasks, each handling data from 2 topic partitions.
Got this explanation on another forum.
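For illustration, here is roughly what submitting a sink connector with tasks.max looks like; tasks.max is only an upper bound, and the connector's own taskConfigs() implementation decides how many task configs it actually returns (the connector name, topics, and file path below are placeholders):

# Submit a hypothetical sink connector that may create up to 10 tasks
curl -s -X POST -H "Content-Type: application/json" localhost:8083/connectors -d '{
  "name": "example-sink",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "10",
    "topics": "topic-a,topic-b,topic-c,topic-d",
    "file": "/tmp/sink.out"
  }
}'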

Incremental Cooperative Rebalancing leads to unevenly balanced connectors

We have encountered lots of unevenly balanced connectors on our setup since upgrading to Kafka 2.3 (also with Kafka Connect 2.3), which should include the new Incremental Cooperative Rebalancing in Kafka Connect explained here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
Let me explain our setup a bit: we deploy multiple Kafka Connect clusters to dump Kafka topics to HDFS. A single Connect cluster is spawned for each hdfs-connector, meaning that at any time exactly one connector is running on a given Connect cluster. Those clusters are deployed on top of Kubernetes with randomly selected IPs from a private pool.
Let's take an example. For this HDFS connector, we spawned a Connect cluster with 20 workers. 40 tasks should run on this cluster, so we could expect 2 tasks per worker. But as the command below shows, when querying the Connect API after a while, the connector is really unbalanced: some workers are not doing any work at all, while one of them took ownership of 28 tasks.
bash-4.2$ curl localhost:8083/connectors/connector-name/status|jq '.tasks[] | .worker_id' | sort |uniq -c
...
1 "192.168.32.53:8083"
1 "192.168.33.209:8083"
1 "192.168.34.228:8083"
1 "192.168.34.46:8083"
1 "192.168.36.118:8083"
1 "192.168.42.89:8083"
1 "192.168.44.190:8083"
28 "192.168.44.223:8083"
1 "192.168.51.19:8083"
1 "192.168.57.151:8083"
1 "192.168.58.29:8083"
1 "192.168.58.74:8083"
1 "192.168.63.102:8083"
Here we would expect the whole pool of workers to be used and the connector to be evenly balanced after a while. We would expect to have something like:
bash-4.2$ curl localhost:8083/connectors/connector-name/status|jq '.tasks[] | .worker_id' | sort |uniq -c
...
2 "192.168.32.185:8083"
2 "192.168.32.53:8083"
2 "192.168.32.83:8083"
2 "192.168.33.209:8083"
2 "192.168.34.228:8083"
2 "192.168.34.46:8083"
2 "192.168.36.118:8083"
2 "192.168.38.0:8083"
2 "192.168.42.252:8083"
2 "192.168.42.89:8083"
2 "192.168.43.23:8083"
2 "192.168.44.190:8083"
2 "192.168.49.219:8083"
2 "192.168.51.19:8083"
2 "192.168.55.15:8083"
2 "192.168.57.151:8083"
2 "192.168.58.29:8083"
2 "192.168.58.74:8083"
2 "192.168.59.249:8083"
2 "192.168.63.102:8083"
The second result was actually achieved by manually killing some workers, and with a bit of luck (we haven't found a proper way to force an even balance across the Connect cluster so far; it's more a process of trial and error until the connector is evenly balanced).
Has anyone already come across this issue and managed to solve it properly?
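Not a proper solution, but for anyone debugging similar skew, two worker-level settings seem worth knowing about; whether either helps depends on why the assignment ended up lopsided in the first place, so treat this as a sketch (the delay value is illustrative):

# Append to the worker config used by connect-distributed and restart the workers.
# scheduled.rebalance.max.delay.ms controls how long a departed worker's tasks stay
# unassigned before being redistributed (default is 5 minutes), and
# connect.protocol=eager falls back to the pre-2.3 stop-the-world rebalancing.
cat >> connect-distributed.properties <<'EOF'
scheduled.rebalance.max.delay.ms=60000
connect.protocol=eager
EOF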

How to monitor consumer lag in kafka via jmx?

I have a Kafka setup that includes a JMX exporter to Prometheus. I'm looking for a metric that gives the offset lag per topic and group id. I'm running Kafka 2.2.0.
Some resources online point to a metric called kafka.consumer, but I have no such metric in my setup.
From my JMX terminal:
$>domains
#following domains are available
JMImplementation
com.sun.management
java.lang
java.nio
java.util.logging
jdk.management.jfr
kafka
kafka.cluster
kafka.controller
kafka.coordinator.group
kafka.coordinator.transaction
kafka.log
kafka.network
kafka.server
kafka.utils
I am, however, able to see the data I need by using the following command:
root@kafka-0:/kafka# bin/kafka-consumer-groups.sh --describe --group benchmark_consumer_group --bootstrap-server localhost:9092
Consumer group 'benchmark_consumer_group' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
benchmark_topic_10B 2 2795128 54223220 51428092 - - -
benchmark_topic_10B 9 4 4 0 - - -
benchmark_topic_10B 6 7 7 0 - - -
benchmark_topic_10B 7 5 5 0 - - -
benchmark_topic_10B 0 2834028 54224939 51390911 - - -
benchmark_topic_10B 1 15342331 54222342 38880011 - - -
benchmark_topic_10B 4 5 5 0 - - -
benchmark_topic_10B 5 6 6 0 - - -
benchmark_topic_10B 8 8 8 0 - - -
benchmark_topic_10B 3 4 4 0 - - -
But that does not help, since I need to track it from a metric. Also, this command takes about 25 seconds to execute, making it unreasonable to use as a source for metrics.
My guess is that the kafka.consumer metric does not exist in version 2.2.0 and was replaced with another, although I can't find any up-to-date resources online on how and where to get that metric.
You can give Kafka Minion (https://github.com/cloudworkz/kafka-minion) a try. While Kafka Minion internally works similarly to Burrow (it consumes the __consumer_offsets topic for consumer group offsets), it has several advantages for your use case.
Advantages of Kafka Minion over Burrow for your case:
Has native Prometheus support (no additional deployment necessary just to expose metrics to Prometheus)
Has a sample Grafana dashboard
Has additional metrics (such as the last commit timestamp for a consumergroup:topic:partition combination, commit rates, info about the cleanup policy, the ability to list all consumer groups for a given topic, etc.)
No ZooKeeper dependency (which also means that consumers who still commit offsets to ZooKeeper are not supported)
High availability support (!!). Burrow has the problem that it always exposes metrics, which will be wrong when it has just started consuming the __consumer_offsets topic. Therefore you cannot run it in an HA mode. This is a problem when you want to set up alerts based on consumer group lags
Kafka Minion does not support multiple clusters, which reduces complexity in the code and for the end user. You can obviously still deploy Kafka Minion per cluster
Disclaimer: I am the author of Kafka Minion, and I am still looking for more feedback from other users. I intend to actively maintain and develop the exporter for my projects, the company I am working for and for the community.
To answer your question regarding what you are seeing with the kafka-consumer-groups.sh shell script: this won't work for your use case, as it cannot report lag for inactive consumers, which is a bit counterproductive.
The kafka.consumer JMX metrics are only present on the consumer processes themselves, not on the Kafka broker processes. Note that you would not get the kafka.consumer metric from consumers using a consumer library other than the Java one.
Currently, there are no available JMX metrics for consumer lag from the Kafka broker itself. There are other solutions that are commonly used for monitoring consumer lag, such as Burrow by LinkedIn. There are also a few open source projects such as kafka9.offsets that expose consumer lag metrics via JMX, but may not be updated to work with the latest Kafka.
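To make the "only on the consumer processes" point concrete, here is a rough sketch of reading the consumer-side lag metric over JMX with jmxterm, assuming the consumer is a Java client with remote JMX enabled (the port, jar version, and client id below are placeholders):

# Query the consumer JVM (not the broker) for its fetch-manager lag metric
echo "get -b kafka.consumer:type=consumer-fetch-manager-metrics,client-id=benchmark-client records-lag-max" | \
  java -jar jmxterm-1.0.2-uber.jar -l localhost:9999 -v silent -n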

FlinkKafkaConsumer010 can not consume data with full parallelism

I have a Kafka (0.10.2.0) cluster with 10 partitions (and 10 individual Kafka server ports) on one machine, which holds 1 topic named "test".
I also have a Flink cluster with 294 task slots on 7 machines. A Flink app with parallelism 250 runs on this cluster, using FlinkKafkaConsumer010 to consume data from the Kafka servers with a single group id, "TestGroup".
But I found that only 2 Flink IPs have established TCP connections with the Kafka cluster (171 connections in total), and worse, only 10 of those connections transfer data; only these 10 connections carried data from beginning to end.
I have checked this question: Reading from multiple broker kafka with flink, but it did not work in my case.
I'd appreciate any information, thank you.
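Not a full answer, but since each Kafka partition is read by exactly one consumer subtask, at most 10 of the 250 subtasks can ever receive data from a 10-partition topic; the remaining subtasks stay idle. A quick way to check (and, if acceptable, raise) the partition count with the 0.10.x tooling, which still uses --zookeeper (host and target count are placeholders):

# Check how many partitions the "test" topic actually has
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test
# Optionally add partitions so more consumer subtasks can receive data
# (note: this changes the key-to-partition mapping for keyed producers)
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic test --partitions 50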