I'm developping a Kafka producer on .NET6 using Confluent.Kafka (rev 1.9.0) and I'm looking for a way to create a topic specificying a "retention.ms" value different from the one configured in my broker configuration.
Here is the code I'm using:
TopicSpecification newTopic = new TopicSpecification
{
Name = TOPIC_NAME,
NumPartitions = 2,
ReplicationFactor = 1,
};
Dictionary<string, string> config = new Dictionary<string, string>();
config.Add("retention.ms", "3000");
newTopic.Configs = config;
try
{
await adminClient.CreateTopicsAsync(new List<TopicSpecification> { newTopic });
}
catch(CreateTopicsException e)
{
Console.WriteLine("An excpetion occured while creating the topic " + e.Results[0].Topic + ": " + e.Results[0].Error.Reason);
}
When I use that code, the topic is created but the retention.ms value I would like to use is not taken in account.
Could some one tell me what I'm doing wrong?
Thanks in advance for your help!
Is there any way we can programmatically find lag in the Kafka Consumer.
I don't want external Kafka Manager tools to install and check on dashboard.
We can list all the consumer group and check for lag for each group.
Currently we do have command to check the lag and it requires the relative path where the Kafka resides.
Spring-Kafka, kafka-python, Kafka Admin client or using JMX - is there any way we can code and find out the lag.
We were careless and didn't monitor the process, the consumer was in zombie state and the lag went to 50,000 which resulted in lot of chaos.
Only when the issue arises we think of these cases as we were monitoring the script but didn't knew it will be result in zombie process.
Any thoughts are extremely welcomed!!
you can get this using kafka-python, run this on each broker or loop through list of brokers, it will give all topic partitions consumer lag.
BOOTSTRAP_SERVERS = '{}'.format(socket.gethostbyname(socket.gethostname()))
client = BrokerConnection(BOOTSTRAP_SERVERS, 9092, socket.AF_INET)
client.connect_blocking()
list_groups_request = ListGroupsRequest_v1()
future = client.send(list_groups_request)
while not future.is_done:
for resp, f in client.recv():
f.success(resp)
for group in future.value.groups:
if group[1] == 'consumer':
#print(group[0])
list_mebers_in_groups = DescribeGroupsRequest_v1(groups=[(group[0])])
future = client.send(list_mebers_in_groups)
while not future.is_done:
for resp, f in client.recv():
#print resp
f.success(resp)
(error_code, group_id, state, protocol_type, protocol, members) = future.value.groups[0]
if len(members) !=0:
for member in members:
(member_id, client_id, client_host, member_metadata, member_assignment) = member
member_topics_assignment = []
for (topic, partitions) in MemberAssignment.decode(member_assignment).assignment:
member_topics_assignment.append(topic)
for topic in member_topics_assignment:
consumer = KafkaConsumer(
bootstrap_servers=BOOTSTRAP_SERVERS,
group_id=group[0],
enable_auto_commit=False
)
consumer.topics()
for p in consumer.partitions_for_topic(topic):
tp = TopicPartition(topic, p)
consumer.assign([tp])
committed = consumer.committed(tp)
consumer.seek_to_end(tp)
last_offset = consumer.position(tp)
if last_offset != None and committed != None:
lag = last_offset - committed
print "group: {} topic:{} partition: {} lag: {}".format(group[0], topic, p, lag)
consumer.close(autocommit=False)
Yes. We can get consumer lag in kafka-python. Not sure if this is best way to do it. But this works.
Currently we are giving our consumer manually, you also get consumers from kafka-python, but it gives only the list of active consumers. So if one of your consumers is down. It may not show up in the list.
First establish client connection
from kafka import BrokerConnection
from kafka.protocol.commit import *
import socket
#This takes in only one broker at a time. So to use multiple brokers loop through each one by giving broker ip and port.
def establish_broker_connection(server, port, group):
'''
Client Connection to each broker for getting consumer offset info
'''
bc = BrokerConnection(server, port, socket.AF_INET)
bc.connect_blocking()
fetch_offset_request = OffsetFetchRequest_v3(group, None)
future = bc.send(fetch_offset_request)
Next we need to get the current offset for each topic the consumer is subscribed to. Pass the above future and bc here.
from kafka import SimpleClient
from kafka.protocol.offset import OffsetRequest, OffsetResetStrategy
from kafka.common import OffsetRequestPayload
def _get_client_connection():
'''
Client Connection to the cluster for getting topic info
'''
# Give comma seperated info of kafka broker "broker1:port1, broker2:port2'
client = SimpleClient(BOOTSTRAP_SEREVRS)
return client
def get_latest_offset_for_topic(self, topic):
'''
To get latest offset for a topic
'''
partitions = self.client.topic_partitions[topic]
offset_requests = [OffsetRequestPayload(topic, p, -1, 1) for p in partitions.keys()]
client = _get_client_connection()
offsets_responses = client.send_offset_request(offset_requests)
latest_offset = offsets_responses[0].offsets[0]
return latest_offset # Gives latest offset for topic
def get_current_offset_for_consumer_group(future, bc):
'''
Get current offset info for a consumer group
'''
while not future.is_done:
for resp, f in bc.recv():
f.success(resp)
# future.value.topics -- This will give all the topics in the form of a list.
for topic in self.future.value.topics:
latest_offset = self.get_latest_offset_for_topic(topic[0])
for partition in topic[1]:
offset_difference = latest_offset - partition[1]
offset_difference gives the difference between the last offset produced in the topic and the last offset (or message) consumed by your consumer.
If you are not getting current offset for a consumer for a topic then it means your consumer is probably down.
So you can raise alerts or send mail if the offset difference is above a threshold you want or if you get empty offsets for your consumer.
The java client exposes the lag for its consumers over JMX; in this example we have 5 partitions...
Spring Boot can publish these to micrometer.
I'm writing code in scala but use only native java API from KafkaConsumer and KafkaProducer.
You need only know the name of Consumer Group and Topics.
it's possible to avoid pre-defined topic, but then you will get Lag only for Consumer Group which exist and which state is stable not rebalance, this can be a problem for alerting.
So all that you really need to know and use are:
KafkaConsumer.commited - return latest committed offset for TopicPartition
KafkaConsumer.assign - do not use subscribe, because it causes to CG rebalance. You definitely do not want that your monitoring process to influence on the subject of monitoring.
kafkaConsumer.endOffsets - return latest produced offset
Consumer Group Lag - is a difference between the latest committed and latest produced
import java.util.{Properties, UUID}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import scala.collection.JavaConverters._
import scala.util.Try
case class TopicPartitionInfo(topic: String, partition: Long, currentPosition: Long, endOffset: Long) {
val lag: Long = endOffset - currentPosition
override def toString: String = s"topic=$topic,partition=$partition,currentPosition=$currentPosition,endOffset=$endOffset,lag=$lag"
}
case class ConsumerGroupInfo(consumerGroup: String, topicPartitionInfo: List[TopicPartitionInfo]) {
override def toString: String = s"ConsumerGroup=$consumerGroup:\n${topicPartitionInfo.mkString("\n")}"
}
object ConsumerLag {
def consumerGroupInfo(bootStrapServers: String, consumerGroup: String, topics: List[String]) = {
val properties = new Properties()
properties.put("bootstrap.servers", bootStrapServers)
properties.put("auto.offset.reset", "latest")
properties.put("group.id", consumerGroup)
properties.put("key.deserializer", classOf[StringDeserializer])
properties.put("value.deserializer", classOf[StringDeserializer])
properties.put("key.serializer", classOf[StringSerializer])
properties.put("value.serializer", classOf[StringSerializer])
properties.put("client.id", UUID.randomUUID().toString)
val kafkaProducer = new KafkaProducer[String, String](properties)
val kafkaConsumer = new KafkaConsumer[String, String](properties)
val assignment = topics
.map(topic => kafkaProducer.partitionsFor(topic).asScala)
.flatMap(partitions => partitions.map(p => new TopicPartition(p.topic, p.partition)))
.asJava
kafkaConsumer.assign(assignment)
ConsumerGroupInfo(consumerGroup,
kafkaConsumer.endOffsets(assignment).asScala
.map { case (tp, latestOffset) =>
TopicPartitionInfo(tp.topic,
tp.partition,
Try(kafkaConsumer.committed(tp)).map(_.offset).getOrElse(0), // TODO Warn if Null, Null mean Consumer Group not exist
latestOffset)
}
.toList
)
}
def main(args: Array[String]): Unit = {
println(
consumerGroupInfo(
bootStrapServers = "kafka-prod:9092",
consumerGroup = "not-exist",
topics = List("events", "anotherevents")
)
)
println(
consumerGroupInfo(
bootStrapServers = "kafka:9092",
consumerGroup = "consumerGroup1",
topics = List("events", "anotehr events")
)
)
}
}
if anyone is looking for consumer lag in confluent cloud here is simple script
BOOTSTRAP_SERVERS = "<>.aws.confluent.cloud"
CCLOUD_API_KEY = "{{ ccloud_apikey }}"
CCLOUD_API_SECRET = "{{ ccloud_apisecret }}"
ENVIRONMENT = "dev"
CLUSTERID = "dev"
CACERT = "/usr/local/lib/python{{ python3_version }}/site-packages/certifi/cacert.pem"
def main():
client = KafkaAdminClient(bootstrap_servers=BOOTSTRAP_SERVERS,
ssl_cafile=CACERT,
security_protocol='SASL_SSL',
sasl_mechanism='PLAIN',
sasl_plain_username=CCLOUD_API_KEY,
sasl_plain_password=CCLOUD_API_SECRET)
for group in client.list_consumer_groups():
if group[1] == 'consumer':
consumer = KafkaConsumer(
bootstrap_servers=BOOTSTRAP_SERVERS,
ssl_cafile=CACERT,
group_id=group[0],
enable_auto_commit=False,
api_version=(0,10),
security_protocol='SASL_SSL',
sasl_mechanism='PLAIN',
sasl_plain_username=CCLOUD_API_KEY,
sasl_plain_password=CCLOUD_API_SECRET
)
list_members_in_groups = client.list_consumer_group_offsets(group[0])
for (topic,partition) in list_members_in_groups:
consumer.topics()
tp = TopicPartition(topic, partition)
consumer.assign([tp])
committed = consumer.committed(tp)
consumer.seek_to_end(tp)
last_offset = consumer.position(tp)
if last_offset != None and committed != None:
lag = last_offset - committed
print("group: {} topic:{} partition: {} lag: {}".format(group[0], topic, partition, lag))
consumer.close(autocommit=False)
I have a kafka producer initialized as follows:
var config = new Dictionary<string, object> { { "bootstrap.servers", BOOTSTRAP_SERVERS } };
Producer _producer = new Producer(config);
After the initialization, when I am about to produce a message on a particular topic I want to add compression to the producer (Only for the topic). I need to add the compression config { "compression.codec", "gzip" }.
How can I do this?
I have a kafka 0.10 cluster with several topics that have messages produced to them.
When I subscribe to the topics with a KafkaConsumer and a new group Id I get no records returned, but if I subscribe to the topics with a ConsumerRebalanceListener that seeks to the beginning with the same group Id, then I get the records in the topic.
#Grab('org.apache.kafka:kafka-clients:0.10.0.0')
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.consumer.ConsumerRecords
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.PartitionInfo
Properties props = new Properties()
props.with {
put("bootstrap.servers","***********:9091")
put("group.id","script-test-noseek")
put("enable.auto.commit","true")
put("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer")
put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer")
put("session.timeout.ms",30000)
}
KafkaConsumer consumer = new KafkaConsumer(props)
def topicMap = [:]
consumer.listTopics().each { topic, partitioninfo ->
topicMap[topic] = 0
}
topicMap.each {topic, count ->
def stopTime = new Date().time + 30_000
def stop = false
println "Starting topic: $topic"
consumer.subscribe([topic])
//consumer.subscribe([topic], new CRListener(consumer:consumer))
while(!stop) {
ConsumerRecords<String, String> records = consumer.poll(5_000)
topicMap[topic] += records.size()
consumer.commitAsync()
if ( new Date().time > stopTime || records.size() == 0) {
stop = true
}
}
consumer.unsubscribe()
}
def total = 0
println "------------------- Results -----------------------"
topicMap.each { k,v ->
if ( v > 0 ) {
println "Topic: ${k.padRight(64,' ')} Records: ${v}"
}
total += v
}
println "==================================================="
println "Total: ${total}"
def dummy = "Process End"
class CRListener implements ConsumerRebalanceListener {
KafkaConsumer consumer
void onPartitionsAssigned(java.util.Collection partitions) {
consumer.seekToBeginning(partitions)
}
void onPartitionsRevoked(java.util.Collection partitions) {
consumer.commitSync()
}
}
The code is Groovy 2.4.x. And I've masked the bootstrap server.
If I uncomment the consumer subscribe line with the listener, it does what I expect it to do. But as it is I get no results.
Assume that I change the group Id for each run, just so as to not be picking up where another execution leaves off.
I can't see what I'm doing wrong. Any help would be appreciated.
If you use a new consumer group id and want to read the whole topic from beginning, you need to specify parameter "auto.offset.reset=earliest" in your properties. (default value is "latest")
Properties props = new Properties()
props.with {
// all other values...
put("auto.offset.reset","earliest")
}
On consumer start-up the following happens:
look for (valid) committed offset for use group.id
if (valid) offset is found, resume from there
if no (valid) offset is found, set offset according to auto.offset.reset
I am just exploring Kafka, currently i am using One producer and One topic to produce messages and it is consumed by one Consumer. very simple.
I was reading the Kafka page, the new Producer API is thread-safe and sharing single instance will improve the performance.
Does it mean i can use single Producer to publish messages to multiple topics?
Never tried it myself, but I guess you can. Since the code for producer and sending the record is (from here https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html):
Producer<String, String> producer = new KafkaProducer<>(props);
for(int i = 0; i < 100; i++)
producer.send(new ProducerRecord<String, String>("my-topic", Integer.toString(i), Integer.toString(i)));
So, I guess, if you just write different topics in the ProducerRecord, than it should be possible.
Also, here http://kafka.apache.org/081/documentation.html#producerapi it explicitly says that you can use a method send(List<KeyedMessage<K,V>> messages) to write into multiple topics.
If I understand you correctly, you are more looking using the same producer instance to send the same/multiple messages on multiple topics.
Not sure about java, but here you can do in C#(.NET) using the Kafka .NET Client DependentProducerBuilder
using (var producer = new ProducerBuilder<string, string>(config).Build())
using (var producer2 = new DependentProducerBuilder<Null, int>(producer.Handle).Build())
{
producer.ProduceAsync("first-topic", new Message<string, string> { Key = "my-key-value", Value = "my-value" });
producer2.ProduceAsync("second-topic", new Message<Null, int> { Value = 42 });
producer2.ProduceAsync("first-topic", new Message<Null, int> { Value = 107 });
producer.Flush(TimeSpan.FromSeconds(10));
}