INVALID_FETCH_SESSION_EPOCH errors with Confluent HDFS sink connector

I have been trying to get the Confluent HDFS sink connector to work, without success. I have a distributed Kafka Connect cluster (3 nodes) running version 5.2.2 of the Confluent Platform, talking to a 5-node Kafka cluster running Kafka 2.3.0.
I have an event generator writing events to a topic with 3 partitions and replication factor 1. I see the events appearing in HDFS, and everything looks correct.
However, for topics with a different number of partitions, or a different replication factor, nothing is written to HDFS. Instead I get errors that look like this:
Node 121 was unable to process the fetch request with (sessionId=378162138, epoch=7242): INVALID_FETCH_SESSION_EPOCH. (org.apache.kafka.clients.FetchSessionHandler:381)
These errors appear in the Kafka Connect logs, and it looks like one of them appears for each attempt to commit.
For a topic with 3 partitions, and replication factor 1, everything works.
For a topic with 3 partitions, and replication factor 2, nothing is written to HDFS.
For a topic with 50 partitions, and replication factor 1, nothing is written.
In fact, if I increase either the replication factor or the number of partitions, nothing is written to HDFS.
I can't seem to find the right configuration to make this work. What settings should I be paying attention to?
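For context, a sink connector of this kind is normally registered through the Connect REST API with a configuration roughly like the sketch below; the connector name, topic, and HDFS URL are placeholders rather than values taken from this setup.
# Hypothetical HDFS sink connector registration (placeholders throughout),
# shown only to make the setup described above concrete.
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "hdfs-sink-example",
        "config": {
          "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
          "tasks.max": "3",
          "topics": "my-topic",
          "hdfs.url": "hdfs://namenode:8020",
          "flush.size": "1000"
        }
      }'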

Related

How to distribute messages to all partitions of the topic defined by `offset.storage.topic` in Kafka Connect

I have deployed Debezium using the Docker image pulled with docker pull debezium/connect.
In the documentation provided at https://hub.docker.com/r/debezium/connect, the description of the OFFSET_STORAGE_TOPIC environment variable is as follows:
This environment variable is required when running the Kafka Connect
service. Set this to the name of the Kafka topic where the Kafka
Connect services in the group store connector offsets. The topic must
have a large number of partitions (e.g., 25 or 50), be highly
replicated (e.g., 3x or more) and should be configured for compaction.
I've created the required topic named mydb-connect-offsets with 25 partitions and replication factor of 5.
The deployment is successful and everything is working fine. A sample message in the mydb-connect-offsets topic looks like this: the key is ["sample-connector",{"server":"mydatabase"}] and the value is
{
"transaction_id": null,
"lsn_proc": 211534539955768,
"lsn_commit": 211534539955768,
"lsn": 211534539955768,
"txId": 709459398,
"ts_usec": 1675076680361908
}
As the key is fixed, all the messages end up in the same partition of the topic. My question is: why does the documentation say that the topic must have a large number of partitions when only one partition is going to be used eventually? Also, what needs to be done to distribute the messages across all partitions?
The offsets are keyed by connector name because they must be ordered per connector.
The large partition count is there to handle offset storage for many distinct connectors in parallel, not just one.
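As a quick way to see this in practice (assuming a standard Kafka installation; the broker address is a placeholder), listing the per-partition end offsets of the offsets topic shows that a single connector's key lands in exactly one partition:
# Print the latest offset of every partition of the offsets topic.
# With only one connector, just one partition will show a non-zero offset,
# since the default partitioner picks murmur2(serialized key) % numPartitions.
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 \
  --topic mydb-connect-offsets \
  --time -1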

Kafka Cluster - issue with one broker not being utilized

I have a Kafka cluster with 3 brokers and 3 ZooKeeper nodes running. We added a 4th broker recently. When we brought it up as a new cluster, a few partitions were stored on the 4th broker as expected. The replication factor for all topics is 3, and each topic has 10 partitions.
Later, whenever we bring down the whole Kafka cluster for maintenance and bring it back up, all topic partitions end up on the first 3 brokers and no partitions are stored on the 4th broker. (Note: due to a bug, we had to use a new log directory every time Kafka was brought up, pretty much like a new cluster.)
I can see that all 4 brokers are available in ZooKeeper (when I do ls /brokers/ids I can see 4 broker ids), but partitions are not distributed to the 4th broker.
But when I trigger a partition reassignment to move a few partitions to the 4th broker, it works fine and the 4th broker starts storing the given partitions. Both producers and consumers are able to send and fetch data from the 4th broker. I can't find the reason why this storage imbalance is happening among the Kafka brokers. Please share your suggestions.
When we brought it up as a new cluster, a few partitions were stored on the 4th broker as expected.
This should only be expected when you create new topics or expand the partitions of existing ones. Topics do not automatically relocate to new brokers.
had to use a new log directory every time Kafka was brought up
That might explain why data is missing. It's unclear what bug you're running into, but this step shouldn't be necessary.
when I trigger a partition reassignment to move a few partitions to the 4th broker, it works fine and the 4th broker starts storing the given partitions. Both producers and consumers are able to send and fetch data from the 4th broker
This is the correct way to expand a cluster, and it sounds like it's working as expected.
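For completeness, the documented cluster-expansion workflow with kafka-reassign-partitions.sh looks roughly like this (file names, topic, and broker ids are placeholders; newer brokers take --bootstrap-server instead of --zookeeper):
# topics-to-move.json (hypothetical file) names the topics to rebalance:
#   {"version":1,"topics":[{"topic":"my-topic"}]}

# Step 1: generate a proposed assignment that includes the new broker (id 4 assumed):
kafka-reassign-partitions.sh --zookeeper zk:2181 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "1,2,3,4" --generate

# Step 2: save the proposed assignment to reassignment.json, then apply and verify:
kafka-reassign-partitions.sh --zookeeper zk:2181 \
  --reassignment-json-file reassignment.json --execute
kafka-reassign-partitions.sh --zookeeper zk:2181 \
  --reassignment-json-file reassignment.json --verify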

storm-kafka-client spout consumes messages at different speeds for different partitions

I have a Storm cluster of 5 nodes and a Kafka cluster installed on the same nodes.
Storm version: 1.2.1
Kafka version: 1.1.0
I also have a Kafka topic with 10 partitions.
Now I want to consume this topic's data and process it with Storm, but the consumption speed is really strange.
For testing, my Storm topology has only one component, the Kafka spout, and I always set the spout parallelism to 10 so that each partition is read by exactly one thread.
When I run this topology on just 1 worker, all partitions are read quickly and the lag is almost the same (very small).
When I run this topology on 2 workers, 5 partitions are read quickly, but the other 5 partitions are read very slowly.
When I run this topology on 3 or 4 workers, 7 partitions are read quickly and the other 3 partitions are read very slowly.
When I run this topology on more than 5 workers, 8 partitions are read quickly and the other 2 partitions are read slowly.
Another strange thing is that when I use a different consumer group id when configuring the Kafka spout, the test result may be different.
For example, when I use a specific group id and run the topology on 5 workers, only 2 partitions are read quickly, just the opposite of the test using another group id.
I have written a simple Java app that calls the high-level Kafka Java API. I ran it on each of the 5 Storm nodes and found it can consume data very quickly from every partition, so a network issue can be excluded.
Has anyone met the same problem before? Or does anyone have an idea of what may cause such a strange problem?
Thanks!
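One way to make the per-partition behaviour visible while the topology runs (assuming the Kafka 1.1.0 tooling from the question; the group id is a placeholder) is to watch the consumer group's lag per partition:
# Shows current offset, log-end offset, and lag for every partition in the group,
# which makes the fast/slow partition split described above easy to observe.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-storm-group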

Why is my kafka topic not consumable with a broker down?

My issue is that I have a three-broker Kafka cluster and an availability requirement to be able to consume from and produce to a topic when one or two of my three brokers are down.
I also have a reliability requirement for a replication factor of 3. These seem like conflicting requirements to me. Here is how my problem manifests:
I create a new topic with replication factor 3
I send several messages to that topic
I kill one of my brokers to simulate a broker issue
I attempt to consume the topic I created
My consumer hangs
I review my logs and see the error:
Number of alive brokers '2' does not meet the required replication factor '3' for the offsets topic
If I set all my broker's offsets.topic.replication.factor setting to 1, then I'm able to produce and consume my topics, even if I set the topic level replication factor to 3.
Is this an okay configuration? Or can you see any pitfalls in setting things up this way?
You only need as many brokers as your replication factor when creating the topic.
I'm guessing in your case, you start with a fresh cluster and no consumers have connected yet. In this case, the __consumer_offsets internal topic does not exist as it is only created when it's first needed. So first connect a consumer for a moment and then kill one of the brokers.
Apart from that, in order to consume you only need 1 broker up, the leader for the partition.
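A quick way to check this on the brokers (ZHOST is a placeholder ZooKeeper host, as elsewhere on this page):
# Relevant broker setting in server.properties (3 is the upstream default,
# which is why the error complains about a required replication factor of 3):
#   offsets.topic.replication.factor=3

# Once at least one consumer has connected, the internal topic exists and its
# replica placement can be inspected:
kafka-topics.sh --zookeeper ZHOST:2181 --describe --topic __consumer_offsets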

How to run Kafka on different machines

For the last 10 days I have been trying to set up Kafka on different machines:
Server32
Server56
Below is the list of tasks I have done so far:
Configured ZooKeeper and started it on both servers with
server.1=Server32_IP:2888:3888
server.2=Server56_IP:2888:3888
I also changed the server.properties and server-1.properties files as below:
broker.id=0
port=9092
log.dir=/tmp/kafka0-logs
host.name=Server32
zookeeper.connect=Server32_IP:9092,Server56_IP:9062
and for server-1.properties:
broker.id=1
port=9062
log.dir=/tmp/kafka1-logs
host.name=Server56
zookeeper.connect=Server32_IP:9092,Server56_IP:9062
I ran server.properties on Server32 and server-1.properties on Server56.
The problem is: when I start a producer on both servers and try to consume from either one, it works. BUT when I stop either server, the other is no longer able to send data.
Please help me understand the process.
Running 2 zookeepers is not fault tolerant. If one of the zookeepers is stopped, then the system will not work. Unlike Kafka brokers, zookeeper needs a quorum (or majority) of the configured nodes in order to work. This is why zookeeper is typically deployed with an odd number of instances (nodes). Since 1 of 2 nodes is not a majority it really is no better than running a single zookeeper. You need at least 3 zookeepers to tolerate a failure because 2 of 3 is a majority so the system will stay up.
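A minimal three-node ensemble, extending the server.1=/server.2= lines from the question (Server78_IP is a hypothetical third machine), would be configured in each zoo.cfg roughly like this:
# zoo.cfg on every ZooKeeper node; each node also needs a matching myid file
# (containing 1, 2 or 3) in its dataDir.
server.1=Server32_IP:2888:3888
server.2=Server56_IP:2888:3888
server.3=Server78_IP:2888:3888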
Kafka is different, so you can have any number of Kafka brokers, and if they are configured correctly and you create your topics with a replication factor of 2 or greater, then the Kafka cluster can continue if you take any one of the broker nodes down, even if it's just 1 of 2.
There's a lot of information missing here, like the Kafka version and whether you're using the new consumer API or the old one. I'm assuming you're probably using a newer version of Kafka like 0.10.x along with the new client APIs. With the new client APIs, consumer offsets are stored on the Kafka brokers rather than in ZooKeeper as in the older versions. I think your issue here is that you created your topics with a replication factor of 1 and, coincidentally, the Kafka broker you shut down was hosting the only replica, so you won't be able to produce or consume messages. You can confirm the health of your topics by running the command:
kafka-topics.sh --zookeeper ZHOST:2181 --describe
You might want to increase the replication factor to 2; that way you might be able to get away with one broker failing. Ideally you would have 3 or more Kafka broker servers with a replication factor of 2 or higher (obviously not more than the number of brokers in your cluster). Refer to the link below:
https://kafka.apache.org/documentation/#basic_ops_increase_replication_factor
"For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log."
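Following that link, increasing an existing topic's replication factor from 1 to 2 is done with a hand-written reassignment file; the topic name, partition list, and broker ids below are placeholders, and newer brokers take --bootstrap-server instead of --zookeeper:
# increase-replication-factor.json (hypothetical file) adds a second replica per partition:
#   {"version":1,
#    "partitions":[{"topic":"my-topic","partition":0,"replicas":[0,1]}]}
kafka-reassign-partitions.sh --zookeeper ZHOST:2181 \
  --reassignment-json-file increase-replication-factor.json --execute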