LibrdKafkaError: Broker: Unknown member occurring randomly after about 2 hours - apache-kafka

I am integrating node-rdkafka into our service, but I keep hitting the error Broker: Unknown member.
A similar issue on GitHub is https://github.com/confluentinc/confluent-kafka-dotnet/issues/1464, where the cause was a consumer reusing the same group id for retry or delay topics, but I don't have any retry or delay logic in my code.
There is also https://github.com/confluentinc/confluent-kafka-python/issues/1004, but I have rechecked all consumer group ids and they are unique.
The node-rdkafka producer config is as follows:
this.producer = new Producer({
  "client.id": this.cliendID,
  "metadata.broker.list": this.brokerList,
  "compression.codec": "lz4",
  "retry.backoff.ms": 200,
  "socket.keepalive.enable": true,
  "queue.buffering.max.messages": 100000,
  "queue.buffering.max.ms": 1000,
  "batch.num.messages": 1000000,
  "transaction.timeout.ms": 2000,
  "enable.idempotence": false,
  "max.in.flight.requests.per.connection": 1,
  "debug": this.debug,
  "dr_cb": true,
  "retries": 0,
  "log_cb": (_: any) => console.log(`log_cb =>`, _),
  "sasl.username": this.saslUsername,
  "sasl.password": this.saslPassword,
  "sasl.mechanism": this.saslMechanism,
  "security.protocol": this.securityProtocol
}, {
  "acks": -1
})
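For completeness, the producer is connected and used roughly like this (simplified sketch; the topic name and payload below are placeholders, and polling is what makes the dr_cb delivery reports fire):
this.producer.connect()
this.producer.setPollInterval(100) // poll regularly so delivery reports are emitted
this.producer.on('ready', () => {
  // "some-topic" and the payload are placeholders
  this.producer.produce(
    "some-topic",                                     // topic
    null,                                             // partition (null lets librdkafka choose)
    Buffer.from(JSON.stringify({ hello: "world" })),  // message value
    "some-key"                                        // optional key
  )
})
this.producer.on('delivery-report', (err, report) => {
  if (err) console.log('[DELIVERY_ERR]', err)
})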
The node-rdkafka consumer config is as follows:
this.consumer = new KafkaConsumer({
  "group.id": this.groupID,
  "metadata.broker.list": this.brokerList,
  "sasl.username": this.saslUsername,
  "sasl.password": this.saslPassword,
  "enable.auto.commit": false,
  "auto.commit.interval.ms": 2000,
  "session.timeout.ms": 45000,
  "max.poll.interval.ms": 300000,
  "heartbeat.interval.ms": 3000,
  "api.version.request.timeout.ms": 10000,
  "max.in.flight.requests.per.connection": 1,
  "debug": this.debug,
  "sasl.mechanism": this.saslMechanism,
  "log.connection.close": true,
  "log.queue": true,
  "log_level": 7,
  "log.thread.name": true,
  "isolation.level": "read_committed",
  "ssl.ca.location": "/etc/ssl/certs/",
  "log_cb": (_: any) => console.log(`log_cb =>`, _),
  "security.protocol": this.securityProtocol
}, {})
await new Promise(resolve => {
  this.consumer?.connect()
  this.consumer?.on('ready', () => {
    try {
      this.consumer?.subscribe(subscriptions)
      this.consumer?.consume() // flowing mode: messages are delivered via the 'data' event
      console.log('[SUCCESS] Subscribe Event => all event')
    } catch (err) {
      console.log('[FAILED] Subscribe => all event')
      console.log(err)
    }
    resolve(this.consumer)
  }).on('data', async (data) => {
    // dispatch to the handler registered for this topic
    this.topicFunctionMap[data.topic]({
      partition: data.partition,
      topic: data.topic,
      message: {
        key: data.key,
        offset: data.offset.toString(),
        size: data.size,
        value: data.value,
        timestamp: data.timestamp?.toString()
      }
    } as ISubsCallbackParam)
    // manual commit (enable.auto.commit is false)
    this.consumer?.commitSync({
      topic: data.topic,
      offset: data.offset,
      partition: data.partition
    })
  })
})
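One thing I realize about the 'data' handler above: the topic function is called without await, so commitSync runs before the handler finishes. A variation I have been considering (simplified sketch, not verified to fix the error) awaits the handler and guards the commit:
.on('data', async (data) => {
  try {
    // commit only after the handler for this topic has completed
    await this.topicFunctionMap[data.topic]({
      partition: data.partition,
      topic: data.topic,
      message: {
        key: data.key,
        offset: data.offset.toString(),
        size: data.size,
        value: data.value,
        timestamp: data.timestamp?.toString()
      }
    } as ISubsCallbackParam)
    this.consumer?.commitSync({
      topic: data.topic,
      offset: data.offset,
      partition: data.partition
    })
  } catch (err) {
    console.log('[HANDLER_OR_COMMIT_ERR]', err)
  }
})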
With this configuration the consumer receives events, but it does not last: after roughly two hours it randomly starts throwing that error.
I am not sure whether the cause is the manual commit or our handler taking too long; I have tried both async and sync commits, so commitSync itself does not seem to depend on our function.
Suppose it is because our handler takes too long and the consumer gets kicked out of the group; that looks like the culprit, because I also found the additional error Broker: Specified group generation id is not valid
source: https://github.com/confluentinc/confluent-kafka-dotnet/issues/1155
That issue says I need to increase the session timeout, so I raised it to "session.timeout.ms": 300000 (5 minutes) and set the heartbeat to "heartbeat.interval.ms": 3000; a GitHub issue I found suggests the heartbeat interval should be at most one third of the session timeout, so I figured 3 seconds would be fine.
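In other words, the values I ended up with satisfy that rule:
"session.timeout.ms": 300000,   // 5 minutes
"heartbeat.interval.ms": 3000,  // 3 seconds, well below 300000 / 3 = 100000 ms
"max.poll.interval.ms": 300000  // unchanged from the config above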
Using "session.timeout.ms": 300000 and "heartbeat.interval.ms":3000
the consumer is able to consume and last for long but the problems is:
first time using those config, its fine running around 0-2sec to receive
after a while, its received, but tooks 1-10sec to receive the message
The detailed error:
received event => onCustomerServiceRegister
[COMMIT_ERR] LibrdKafkaError: Broker: Unknown member
at Function.createLibrdkafkaError [as create] (/src/app/node_modules/node-rdkafka/lib/error.js:454:10)
at KafkaConsumer.Client._errorWrap (/src/app/node_modules/node-rdkafka/lib/client.js:481:29)
at KafkaConsumer.commitSync (/src/app/node_modules/node-rdkafka/lib/kafka-consumer.js:560:8)
at KafkaRDConnect.<anonymous> (/src/app/dist/events/connectors/kafkaRD.js:240:110)
at step (/src/app/dist/events/connectors/kafkaRD.js:53:23)
at Object.next (/src/app/dist/events/connectors/kafkaRD.js:34:53)
at /src/app/dist/events/connectors/kafkaRD.js:28:71
at new Promise (<anonymous>)
at __awaiter (/src/app/dist/events/connectors/kafkaRD.js:24:12)
at KafkaConsumer.<anonymous> (/src/app/dist/events/connectors/kafkaRD.js:213:72)
at KafkaConsumer.emit (node:events:376:20)
at KafkaConsumer.EventEmitter.emit (node:domain:470:12)
at /src/app/node_modules/node-rdkafka/lib/kafka-consumer.js:488:12 {

Related

Unable to send a message to Kafka Producer using kafka-node

I am using the default server.properties/zookeeper.properties files provided with Kafka.
I am trying to create a simple NodeJS app that sends messages through a producer and consumes them.
Below is the NodeJS code.
config.js
module.exports = {
  kafka_topic: 'catalog',
  kafka_server: 'localhost:9092',
};
nodejs-producer.js
const kafka = require('kafka-node');
const config = require('./config');

try {
  // set the desired timeout in options
  const options = {
    timeout: 5000,
  };
  const Producer = kafka.Producer;
  const client = new kafka.KafkaClient({ kafkaHost: config.kafka_server, requestTimeout: 5000 });
  const producer = new Producer(client);
  const kafka_topic = config.kafka_topic;
  let payloads = [
    {
      topic: kafka_topic,
      messages: 'This is test message'
    }
  ];
  producer.on('ready', async function() {
    let push_status = producer.send(payloads, (err, data) => {
      if (err) {
        console.log(err.toString());
        console.log('[kafka-producer -> ' + kafka_topic + ']: broker update failed');
      } else {
        console.log(data.toString());
        console.log('[kafka-producer -> ' + kafka_topic + ']: broker update success');
      }
    });
  });
  producer.on('error', function(err) {
    console.log(err);
    console.log('[kafka-producer -> ' + kafka_topic + ']: connection errored');
    throw err;
  });
}
catch(e) {
  console.log(e);
}
kafka version = 2.8.0
kafka-node version = 5.0.0
I am getting the error: Error: LeaderNotAvailable
How can I fix this? I tried playing with different values in server.properties, such as advertised.listeners, but did not find a solution.
I have already answered this problem here
In short: this problem happens when trying to produce messages to a topic that doesn't exist.
You may configure your Kafka installation to automatically create topics in that case. What then happens, in order, is: you still receive the error message on the first attempt, and the broker creates the topic. In my case I then had to produce the same message a second time, but that was on an old version of Kafka.
EDIT:
Here is a link to a post which explains how to set up your Kafka configuration to automatically create topics.
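If you prefer not to rely on auto-creation, the topic can also be created up front so the first send never hits LeaderNotAvailable. A minimal sketch using kafka-node's KafkaClient.createTopics (partition and replication values are placeholders):
const kafka = require('kafka-node');
const client = new kafka.KafkaClient({ kafkaHost: 'localhost:9092' });

// create the topic before producing
client.createTopics(
  [{ topic: 'catalog', partitions: 1, replicationFactor: 1 }],
  (err, result) => {
    if (err) {
      console.log('topic creation failed', err);
    } else {
      console.log('topic created (or it already exists)', result);
    }
  }
);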
I also faced the same issue while sending a message. I solved it by adding a partition to the payload and using the same partition in the consumer as well.
The change I made looks roughly like this (simplified sketch; topic name and partition number are placeholders):
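// producer: pin the payload to an explicit partition
const payloads = [
  { topic: 'catalog', messages: 'This is test message', partition: 0 }
];
producer.send(payloads, (err, data) => {
  if (err) console.log(err);
  else console.log(data);
});

// consumer: read from the same partition
const consumer = new kafka.Consumer(
  client,
  [{ topic: 'catalog', partition: 0 }],
  { autoCommit: true }
);
consumer.on('message', (message) => console.log(message));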
I got this error in a development environment and solved it by deleting the ZooKeeper snapshot and the Kafka log directory (which also holds the consumer offsets).
NOTE: Don't do this on production.
rm -rf /tmp/zookeeper
rm -rf /tmp/kafka-logs

Producing a batch of events to the same Kafka topic: events overwrite each other

I'm trying to produce several events to the same Kafka topic in one batch, but only the last event ends up on Kafka and the earlier ones are never sent.
// pseudocode of what I'm doing
// producer
await kafka.produce(
  event1 { message: "vito", topic: "corleone" },
  event2 { message: "sonny", topic: "corleone" },
  event3 { message: "fredo", topic: "corleone" }
)
// consumer listening to topic "corleone"
kafka.handler(payload) {
  log(payload) // prints "fredo" but doesn't print "vito" or "sonny"
}
What works though is if I have these events go to different topics:
// producer
await kafka.produce(
  event1 { message: "vito", topic: "corleone" },
  event2 { message: "sonny", topic: "deadinpart1" },
  event3 { message: "fredo", topic: "deadinpart2" }
)
If I do that, I receive all three events (by listening to all three topics), which makes me think that Kafka might not support multiple messages to the same topic in one batch.
My producer settings looks like this:
const kafkaConfig: KafkaConfigSchema = {
  brokers: config().kafka.brokers, // array of brokers
  useSasl: config().kafka.useSasl, // true
  useSsl: config().kafka.useSsl, // true
  username: config().kafka.username,
  password: config().kafka.password,
  groupId: config().kafka.groupId, // a unique string
};
Are there any settings I am missing, or am I doing something wrong architecturally by sending messages that share a topic in the same batch?
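For reference, our kafka.produce wrapper hides the underlying client, so here is a minimal sketch of the kind of batch call I mean, written against kafkajs purely as an illustration (kafkajs is an assumption; our wrapper may use a different client):
// assumption: a kafkajs-style client; our internal wrapper may differ
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'example', brokers: ['localhost:9092'] });
const producer = kafka.producer();

async function produceBatch() {
  await producer.connect();
  // several messages to the same topic in a single send call
  await producer.send({
    topic: 'corleone',
    messages: [
      { key: 'event1', value: 'vito' },
      { key: 'event2', value: 'sonny' },
      { key: 'event3', value: 'fredo' },
    ],
  });
  await producer.disconnect();
}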

Can’t get Logstash to read existing Kafka topic from start

I'm trying to consume a Kafka topic using Logstash, for indexing by Elasticsearch. The Kafka events are JSON documents.
We recently upgraded our Elastic Stack to 5.1.2.
I believe I was able to consume the topic fine in 5.0 using the same settings, but that was a while ago, so perhaps I'm doing something wrong now and just can't see it. This is my config (slightly sanitized):
input {
  kafka {
    bootstrap_servers => "host1:9092,host2:9092,host3:9092"
    client_id => "logstash-elastic-5-c5"
    group_id => "logstash-elastic-5-g5"
    topics => "trp_v1"
    auto_offset_reset => "earliest"
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    rename => { "@timestamp" => "indexedDatetime" }
    remove_field => [
      "@timestamp",
      "@version",
      "message"
    ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["host10:9200", "host11:9200", "host12:9200", "host13:9200"]
    action => "index"
    index => "trp-i"
    document_type => "event"
  }
}
When I run this, no messages are consumed, there is no sign of activity in the log after "[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] Setting newly assigned partitions", and in Kafka Manager the consumer immediately shows up with "total lag = 0" for the topic.
This version of the Kafka plugin stores consumer offsets in Kafka itself, so each time I run Logstash against the same topic I increment the group_id; in theory it should then start from the earliest offset for the topic.
Any advice?
EDIT: It appears that despite setting auto_offset_reset to "earliest", it isn't working - it's as if it's being set to "latest". I left Logstash running, then had more entries loaded into the Kafka queue, and they were processed by Logstash.

Reconnection to the failed mongo server

I'm connecting to Mongo with reconnect options at startup and using the created db handle throughout the whole app.
var options = {
  "server": {
    "auto_reconnect": true,
    "poolSize": 10,
    "socketOptions": {
      "keepAlive": 1
    }
  },
  "db": {
    "numberOfRetries": 60,
    "retryMiliSeconds": 5000
  }
};
MongoClient.connect(dbName, options).then(useDb).catch(errorHandler)
When I restart the mongo server, the driver reconnects successfully. But if I stop the server and start it again after about 30 seconds, I get the MongoError "topology was destroyed" on every operation. Those 30 seconds look like the default of numberOfRetries = 5, as if the option I pass has no effect. Am I doing something wrong? How can I keep the driver reconnecting for a longer time?
According to this answer, in order to fix this error you should increase the connection timeout in the options:
var options = {
  "server": {
    "auto_reconnect": true,
    "poolSize": 10,
    "socketOptions": {
      "keepAlive": 1,
      "connectTimeoutMS": 30000 // increased connection timeout
    }
  },
  "db": {
    "numberOfRetries": 60,
    "retryMiliSeconds": 5000
  }
};

can't accept new chunks because there are still 1 deletes from previous migration

I have a MongoDB 2.6.11 production cluster with 20 replica sets (shards). I am running into a disk space issue because the majority of the chunks are stored on one replica set. When I check the logs, I can see that moveChunk failed because of "deletes from previous migration":
2015-12-28T17:13:32.164+0000 [conn6504] about to log metadata event: { _id: "db1-2015-12-28T17:13:32-56816dbc6b0464b0a5801db8", server: "db1", clientAddr: "xx.xx.xx.11:50077", time: new Date(1451322812164), what: "moveChunk.start", ns: "emailing_nQafExtB.reports", details: { min: { email: "xxxxxxx" }, max: { email: "xxxxxxx" }, from: "shard16", to: "shard22" } }
2015-12-28T17:13:32.675+0000 [conn6504] about to log metadata event: { _id: "db1-2015-12-28T17:13:32-56816dbc6b0464b0a5801db9", server: "db1", clientAddr: "xx.xx.xx.11:50077", time: new Date(1451322812675), what: "moveChunk.from", ns: "emailing_nQafExtB.reports", details: { min: { email: "xxxxxxx" }, max: { email: "xxxxxxx" }, step 1 of 6: 3, step 2 of 6: 314, note: "aborted", errmsg: "moveChunk failed to engage TO-shard in the data transfer: can't accept new chunks because there are still 1 deletes from previous migration" } }
I followed the answer from this question, but it does not work for me. I ran the stepDown command on one primary and then on every primary in the cluster, and I did the same with the cleanupOrphaned command.
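Concretely, this is roughly what I ran in the mongo shell (namespace taken from the log above; repeated against each shard primary):
// step the primary down so a new primary is elected
rs.stepDown(60)

// then, on each shard primary's admin database, clean up orphaned documents
// for the namespace from the moveChunk error, looping until the command finishes
var nextKey = {};
var result;
while (nextKey != null) {
  result = db.adminCommand({ cleanupOrphaned: "emailing_nQafExtB.reports", startingFromKey: nextKey });
  nextKey = result.stoppedAtKey;
}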
Has anyone else run into this problem?
Thanks in advance for any insights.