Kafka last message poll results in 0 messages - apache-kafka

I have a Kafka topic (1.0.0) with a single partition. The consumer is packaged inside an EAR and deployed to WildFly 10, and polling for the last message always returns 0 records, although the topic is not empty.
final TopicPartition tp = new TopicPartition(topic, 0);
final Long beginningOffset = consumer.beginningOffsets(Collections.singleton(tp)).get(tp);
final Long endOffset = consumer.endOffsets(Collections.singleton(tp)).get(tp);
consumer.assign(Collections.singleton(tp));
consumer.seek(tp, endOffset - 1);
When I then poll, I get 0 records, although the logging states:
Consumer is now at position 377408 while Topic begin is 0 and end is 377409
When I change to -2 like:
consumer.seek(tp, endOffset - 2);
I DO get one message:
Consumer is now at position 377407 while Topic begin is 0 and end is 377409
But of course this is not the proper record. WHERE is message 377408?
I have tried many ways to seek to the end, but it never works.
Here is my Consumer config:
Properties properties = new Properties();
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, Configuration.KAFKA_SERVERS.getAsString());
properties.put(ConsumerConfig.GROUP_ID_CONFIG, GROUP_ID);
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class);
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
properties.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
Note: I tried with read_uncommitted AND read_committed, both give the same result.

As mentioned in the Javadoc, this is because endOffsets() returns:
the offset of the last successfully replicated message plus one
This is effectively the offset that the next message will be assigned, so the last record in the log sits at endOffset - 1. In your case that slot (377408) is most likely occupied by a transaction control record (a commit/abort marker from a transactional producer), which the consumer never hands back to the application. That is why seeking to endOffset - 1 returns nothing, while seeking to endOffset - 2 returns the last actual message.
I agree this may not be the most intuitive behaviour, but this is how it currently works!
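As an aside, a single poll() can legitimately return zero records even when data is available, because the fetch may not complete within the poll timeout. When reading near the end of the log it is therefore worth polling in a loop with a deadline. A minimal sketch, assuming the consumer and tp from the question, and assuming the record at endOffset - 1 is an actual data record rather than a transaction marker:
consumer.assign(Collections.singleton(tp));
consumer.seek(tp, endOffset - 1);
// Keep polling until something arrives or a deadline passes; a single short poll often returns nothing.
long deadline = System.currentTimeMillis() + 10_000;
ConsumerRecords<Long, String> records = ConsumerRecords.empty();
while (records.isEmpty() && System.currentTimeMillis() < deadline) {
    records = consumer.poll(500); // use poll(Duration.ofMillis(500)) on clients >= 2.0
}
records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
If the loop still comes back empty, the slot just below the end offset really does not hold a consumable record.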

Related

Kafka MirrorMaker2 - not mirroring consumer group offsets

I have set up MirrorMaker 2 to replicate data between two DCs.
My mm2.properties:
# mm2.properties
name=source->dest
clusters=source, dest
source.bootstrap.servers=localhost:9091
dest.bootstrap.servers=localhost:9092
source->dest.enabled=true
offset.storage.partitions=2
config.storage.replication.factor=1
status.storage.replication.factor=1
Seeing the below on MM2 startup:
[2020-02-16 07:31:07,547] INFO MirrorConnectorConfig values:
admin.timeout.ms = 60000
checkpoints.topic.replication.factor = 3
config.action.reload = restart
config.properties.blacklist = [follower\.replication\.throttled\.replicas, leader\.replication\.throttled\.replicas, message\.timestamp\.difference\.max\.ms, message\.timestamp\.type, unclean\.leader\.election\.enable, min\.insync\.replicas]
config.property.filter.class = class org.apache.kafka.connect.mirror.DefaultConfigPropertyFilter
connector.class = org.apache.kafka.connect.mirror.MirrorCheckpointConnector
consumer.poll.timeout.ms = 1000
emit.checkpoints.enabled = true
emit.checkpoints.interval.seconds = 60
emit.heartbeats.enabled = true
emit.heartbeats.interval.seconds = 1
enabled = true
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
group.filter.class = class org.apache.kafka.connect.mirror.DefaultGroupFilter
groups = [.*]
groups.blacklist = [console-consumer-.*, connect-.*, __.*]
header.converter = null
heartbeats.topic.replication.factor = 3
key.converter = null
metric.reporters = null
name = source->dest
offset-syncs.topic.replication.factor = 3
offset.lag.max = 100
refresh.groups.enabled = true
refresh.groups.interval.seconds = 600
refresh.topics.enabled = true
refresh.topics.interval.seconds = 600
replication.factor = 2
replication.policy.class = class org.apache.kafka.connect.mirror.DefaultReplicationPolicy
replication.policy.separator = .
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
source.cluster.alias = source
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
sync.topic.acls.enabled = true
sync.topic.acls.interval.seconds = 600
sync.topic.configs.enabled = true
sync.topic.configs.interval.seconds = 600
target.cluster.alias = dest
task.assigned.groups = null
task.assigned.partitions = null
tasks.max = 1
topic.filter.class = class org.apache.kafka.connect.mirror.DefaultTopicFilter
topics = [.*]
topics.blacklist = [.*[\-\.]internal, .*\.replica, __.*]
transforms = []
value.converter = null
(org.apache.kafka.connect.mirror.MirrorConnectorConfig:347)
My data is being replicated as expected. The source topic gets created in the destination cluster as source.<TOPIC>, but the consumer group offsets are not being replicated.
Started a consumer group in the source cluster.
./kafka-console-consumer.sh --bootstrap-server localhost:9091 --topic test-1 --group test-1-group
I consumed a few messages and stopped it. Then I posted new messages to this topic, and MirrorMaker mirrored the data to the target cluster as well.
I tried to consume messages from the target cluster as follows:
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic source.test-1 --group test-1-group
Since I use the same consumer group, I was expecting my offsets to be synced as well, so that I would not re-consume the messages I had already consumed in cluster 1. But it still consumes all the messages. Is there anything I am missing here?
There are several fundamental reasons why replicating offsets is non-trivial:
Kafka is an at-least-once system (ignoring the hype). This means that MirrorMaker, because it is built on top of Kafka consumers and producers that can each time out or disconnect, will deliver some degree of duplicate records to the destination. As a result, offsets don't map 1:1 between source and destination. Even if you were to use the "exactly once" support (which the MM2 KIP clearly says it is not using), all it would do is skip over partially-delivered batches, but those batches would still occupy offsets at the destination.
If you set up mirroring long after the source topic has started expiring records, your destination topic will start at offset 0 while the source will have much higher "oldest" offsets. There has been an attempt to address this (see KIP-391), but it was never accepted.
In general there is no guarantee that your mirroring topology mirrors from a single source into a single destination. The LinkedIn topology, for example, mirrors from multiple source clusters into "aggregate" tier clusters. Mapping offsets is meaningless for such topologies.
Looking at the MM2 KIP, there is an "offset sync topic" mentioned.
In your code you can use the RemoteClusterUtils class to translate checkpoints between clusters:
Map<TopicPartition, OffsetAndMetadata> newOffsets = RemoteClusterUtils.translateOffsets(
        newClusterProperties, oldClusterName, consumerGroupId, Duration.ofSeconds(30));
newOffsets.forEach((tp, offsetAndMetadata) -> consumer.seek(tp, offsetAndMetadata.offset()));
This was taken from the following presentation: https://www.slideshare.net/ConfluentInc/disaster-recovery-with-mirrormaker-20-ryanne-dolan-cloudera-kafka-summit-london-2019
Alternatively, you could use the seek-by-timestamp API (offsetsForTimes) to start your consumer group on the destination at roughly the time the data was delivered to the destination (or delivered to the source, if the destination broker's log-append timestamp settings don't overwrite those times). You would need to rewind a little for safety.
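A rough sketch of that timestamp-based approach with the plain Java consumer; the topic name, timestamp and rewind amount below are placeholders, and consumer is assumed to point at the destination cluster:
// Find the destination offset closest to a rough wall-clock time, then rewind a bit for safety.
long targetTs = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(1);
TopicPartition tp = new TopicPartition("source.test-1", 0);
Map<TopicPartition, OffsetAndTimestamp> byTime =
        consumer.offsetsForTimes(Collections.singletonMap(tp, targetTs));
OffsetAndTimestamp hit = byTime.get(tp);
consumer.assign(Collections.singleton(tp));
consumer.seek(tp, hit != null ? Math.max(0, hit.offset() - 100) : 0); // null means no records at or after targetTs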
Kafka 2.7 introduced "automated consumer offset sync".
By default, consumer offsets are not synced between clusters.
You should explicitly enable this feature.
support automated consumer offset sync across clusters in MM 2.0
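With Kafka 2.7+ this is handled by the MirrorCheckpointConnector; per KIP-545 the relevant settings would look something like this in mm2.properties (adjust the flow prefix to your own setup):
# enable automated consumer group offset sync for the source->dest flow
source->dest.sync.group.offsets.enabled = true
source->dest.sync.group.offsets.interval.seconds = 60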
My data is being replicated as expected. The source topic gets created in the destination cluster as source.<TOPIC>, but the consumer group offsets are not being replicated.
By default, MM2 won't replicate consumer groups from kafka-console-consumer. In the MM2 logs on startup, we can see that groups.blacklist = [console-consumer-.*, connect-.*, __.*]. I believe you can override this in your mm2.properties configuration file.
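For example, something like the following should stop excluding console-consumer groups for this flow (untested, and the exact property naming can vary between MM2 versions):
# mm2.properties: drop console-consumer-.* from the default group blacklist
source->dest.groups.blacklist = connect-.*, __.*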
Since I use the same consumer group, I was expecting my offsets to be synced as well, so that I would not re-consume the messages I had already consumed in cluster 1.
Once the consumer groups are properly being mirrored and the checkpoints are enabled, there should be an internal topic that is automatically created in your destination cluster (something like dest.checkpoints.internal). This checkpoint topic contains the last committed offsets in the source and destination clusters for mirrored topic partitions in each consumer group.
Then you can use Kafka’s RemoteClusterUtils utility class to translate these offsets and get the synced offsets for source.test-1 that map to the consumer's last committed offsets for test-1. If you end up creating a consumer with Java, you can add the RemoteClusterUtils as a dependency to your project:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>connect-mirror-client</artifactId>
    <version>2.4.0</version>
</dependency>
Otherwise, it is likely that you will have to write a tool that wraps RemoteClusterUtils.java to get the translated offsets. This functionality or something similar looks to be planned as part of a future release for MM2.
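A minimal sketch of such a wrapper, assuming MM2 has already produced checkpoints on the destination, that "source" is the remote cluster alias, and that the group is not currently active on the destination (bootstrap servers and group id are placeholders):
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class TranslateGroupOffsets {
    public static void main(String[] args) throws Exception {
        // Client properties for the destination cluster; "source" is the remote cluster alias.
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-1-group");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        // Read the group's latest checkpoint and translate it into destination offsets.
        Map<TopicPartition, OffsetAndMetadata> translated = RemoteClusterUtils.translateOffsets(
                props, "source", "test-1-group", Duration.ofSeconds(30));
        System.out.println("Translated offsets: " + translated);

        // Commit the translated offsets so a consumer in this group resumes from them.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(translated.keySet());
            consumer.commitSync(translated);
        }
    }
}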
I see that your checkpoint configuration is:
emit.checkpoints.enabled = true
emit.checkpoints.interval.seconds = 60
So your checkpoints topic will reflect the new changes only after 60 seconds. If you try immediately it won't work, so try again after a minute.

Kafka consumer not getting a single message from partition

I just noticed that when I produce a single message into a partition, my consumer does not receive it. Only after I produce a few more messages into the same partition does the consumer receive them. My fetch.min.bytes is set to 1.
Is there some other config that could have an effect here?
I have a dedicated consumer for each partition.
Here is the relevant part of the consumer code. My consumer starts several threads for different topics, defined by configs['stream']. It uses https://github.com/mmustala/rdkafka-ruby, which is a fork of the original rdkafka gem; I added a batch-consuming method and a method to shut down the consumer in a managed way.
key = configs['app_key']
consumer = Rdkafka::Config.new(config(configs)).consumer
topic = "#{topic_prefix}#{app_env}_#{configs['stream']}"
consumer.subscribe(topic)
logger.info "#{rand}| Starting consumer for #{key} with topic #{topic}"
begin
  retry_counter = 0
  retries_started_at = nil
  current_assignment = nil
  partitions = []
  consumer.each_batch(configs['max_messages_per_partition'] || 5, 100, rand) do |messages|
    partitions = messages.collect {|m| m.partition}.uniq.sort
    logger.info "#{rand}| Batch started. Received #{messages.length} messages from partitions #{partitions} for app #{key}"
    current_assignment = consumer.assignment.to_h
    values = messages.collect {|m| JSON.parse(m.payload)}
    skip_commit = false
    begin
      values.each_slice((values.length / ((retry_counter * 2) + 1).to_f).ceil) do |slice|
        logger.info "#{rand}| Sending #{slice.length} messages to lambda"
        result = invoke_lambda(key, slice)
        if result.status_code != 200 || result.function_error
          logger.info "#{rand}| Batch finished with error #{result.function_error}"
          raise LambdaError, result.function_error.to_s
        end
      end
    rescue LambdaError => e
      logger.warn "#{rand}| #{e}"
      if consumer.running? && current_assignment == consumer.assignment.to_h
        retry_counter += 1
        retries_started_at ||= Time.now
        if retry_counter <= 5 && Time.now - retries_started_at < 600
          logger.warn "#{rand}| Retrying from: #{e.cause}, app_key: #{key}"
          Rollbar.warning("Retrying from: #{e.cause}", app_key: key, thread: rand, partitions: partitions.join(', '))
          sleep 5
          retry if consumer.running? && current_assignment == consumer.assignment.to_h
        else
          raise e # Raise to exit the retry loop so that consumers are rebalanced.
        end
      end
      skip_commit = true
    end
    retry_counter = 0
    retries_started_at = nil
    if skip_commit
      logger.info "#{rand}| Commit skipped"
    else
      consumer.commit
      logger.info "#{rand}| Batch finished"
    end
  end
  consumer.close
  logger.info "#{rand}| Stopped #{key}"
rescue Rdkafka::RdkafkaError => e
  logger.warn "#{rand}| #{e}"
  logger.info "#{rand}| assignment: #{consumer.assignment.to_h}"
  if e.to_s.index('No offset stored')
    retry
  else
    raise e
  end
end
config
def config(app_config)
  {
    "bootstrap.servers": brokers,
    "group.id": app_configs['app_key'],
    "enable.auto.commit": false,
    "enable.partition.eof": false,
    "log.connection.close": false,
    "session.timeout.ms": 30*1000,
    "fetch.message.max.bytes": ['sources'].include?(app_configs['stream']) ? 102400 : 10240,
    "queued.max.messages.kbytes": ['sources'].include?(app_configs['stream']) ? 250 : 25,
    "queued.min.messages": (app_configs['max_messages_per_partition'] || 5) * 10,
    "fetch.min.bytes": 1,
    "partition.assignment.strategy": 'roundrobin'
  }
end
Producer code uses https://github.com/zendesk/ruby-kafka
def to_kafka(stream_name, data, batch_size)
  stream_name_with_env = "#{Rails.env}_#{stream_name}"
  topic = [Rails.application.secrets.kafka_topic_prefix, stream_name_with_env].compact.join
  partitions_count = KAFKA.partitions_for(topic)
  Rails.logger.info "Partition count for #{topic}: #{partitions_count}"
  if @job.active? && @job.partition.blank?
    @job.connect_to_partition
  end
  partition = @job.partition&.number.to_i % partitions_count
  producer = KAFKA.producer
  if data.is_a?(Array)
    data.each_slice(batch_size) do |slice|
      producer.produce(JSON.generate(slice), topic: topic, partition: partition)
    end
  else
    producer.produce(JSON.generate(data), topic: topic, partition: partition)
  end
  producer.deliver_messages
  Rails.logger.info "records sent to topic #{topic} partition #{partition}"
  producer.shutdown
end
UPDATE: It looks like the number of messages is irrelevant. I just produced over 100 messages into one partition and the consumer has not yet started to consume those.
UPDATE2: It didn't start consuming the messages during the night. But when I produced a new set of messages into the same partition this morning, it woke up and started to consume the new messages I just produced. It skipped over the messages produced last night.
I believe the issue was that the partition had not received messages for a while and apparently did not have a committed offset saved. When an offset finally had to be acquired, it was set to the largest value, which is the default. After I set auto.offset.reset: 'smallest' in the consumer config, I have not seen messages being skipped again.

Latest records/messages present in a topic kafka

Is there a way to fetch the latest 1000 records/messages present in a Kafka topic, similar to tail -n 1000 on a file in Linux?
Using Python Kafka (kafka-python), I found this way to get the last message.
Configure it to get the n last messages, but make sure there are enough messages in case the topic is nearly empty. Honestly, this looks like a job for streaming, i.e. Kafka Streams or KSQL.
#!/usr/bin/env python
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'example_topic'
GROUP = 'demo'
BOOTSTRAP_SERVERS = ['bootstrap.kafka:9092']

consumer = KafkaConsumer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id=GROUP,
    # enable_auto_commit=False,
    auto_commit_interval_ms=0,
    max_poll_records=1
)

candidates = []
consumer.commit()
msg = None

partitions = consumer.partitions_for_topic(TOPIC)
for p in partitions:
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    committed = consumer.committed(tp)
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    print(f"\ntopic: {TOPIC} partition: {p} committed: {committed} last: {last_offset} lag: {(last_offset - committed)}")
    consumer.poll(
        timeout_ms=100,
        # max_records=1
    )
    # consumer.assign([partition])
    consumer.seek(tp, last_offset - 4)
    for message in consumer:
        # print(f"Message is of type: {type(message)}")
        print(message)
        # print(f'message.offset: {message.offset}')
        # TODO find out why the number is -1
        if message.offset == last_offset - 1:
            candidates.append(message)
            # print(f' {message}')
            # comment if you don't want the messages committed
            consumer.commit()
            break

print('\n\ngooch\n\n')
latest_msg = candidates[0]
for msg in candidates:
    print(f'finalists:\n {msg}')
    if msg.timestamp > latest_msg.timestamp:
        latest_msg = msg

consumer.close()
print(f'\n\nlatest_message:\n{latest_msg}')
I know that in Java/Scala Kafka Streams it is possible to create a table, i.e. a derived topic holding only the latest entry per key of another topic, so Confluent's Kafka library in C (librdkafka) might offer a more elegant and efficient way; it has Python bindings among others, and there is the kafkacat CLI.
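For reference, a small Kafka Streams sketch of that idea (topic and serde choices here are made up): a KTable keeps only the latest value per key of a topic, which is often what a "give me the newest state" requirement actually needs.
// Inside topology construction; start it afterwards with new KafkaStreams(builder.build(), streamsConfig).
StreamsBuilder builder = new StreamsBuilder();
KTable<String, String> latest = builder.table(
        "example_topic",
        Consumed.with(Serdes.String(), Serdes.String()));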
You can use the seek method of the KafkaConsumer class: you need to find the current end offset for every partition, and then calculate the correct starting offsets.
consumer = KafkaConsumer()
partition = TopicPartition('foo', 0)
start = 1234
end = 2345
consumer.assign([partition])
consumer.seek(partition, start)
for msg in consumer:
    if msg.offset > end:
        break
    else:
        print msg
source

org.apache.kafka.common.errors.RecordTooLargeException in Flume Kafka Sink

I am trying to read data from a JMS source and push it into a Kafka topic. After a few hours I observed that the push frequency to the Kafka topic dropped to almost zero, and after some initial analysis I found the following exception in the Flume logs.
28 Feb 2017 16:35:44,758 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:158) - Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to publish events
at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:252)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1399305 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:686)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:449)
at org.apache.flume.sink.kafka.KafkaSink.process(KafkaSink.java:212)
... 3 more
Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1399305 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
My Flume logs show the currently set value for max.request.size as 1048576, which is clearly much less than 1399305. Increasing max.request.size should eliminate these exceptions, but I am unable to find the correct place to update that value.
My flume.config:
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.channels.c1.type = file
a1.channels.c1.transactionCapacity = 1000
a1.channels.c1.capacity = 100000000
a1.channels.c1.checkpointDir = /data/flume/apache-flume-1.7.0-bin/checkpoint
a1.channels.c1.dataDirs = /data/flume/apache-flume-1.7.0-bin/data
a1.sources.r1.type = jms
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.preserveExisting = true
a1.sources.r1.channels = c1
a1.sources.r1.initialContextFactory = some context urls
a1.sources.r1.connectionFactory = some_queue
a1.sources.r1.providerURL = some_url
#a1.sources.r1.providerURL = some_url
a1.sources.r1.destinationType = QUEUE
a1.sources.r1.destinationName = some_queue_name
a1.sources.r1.userName = some_user
a1.sources.r1.passwordFile= passwd
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = some_kafka_topic
a1.sinks.k1.kafka.bootstrap.servers = some_URL
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.flumeBatchSize = 1
a1.sinks.k1.channel = c1
Any help will be really appreciated !!
This change has to be made on the Kafka producer side.
Update the Kafka producer configuration file producer.properties with a larger value, for example:
max.request.size=10000000
It seems I have resolved my issue: as suspected, increasing max.request.size eliminated the exception. For updating such Kafka sink (producer) properties, Flume provides the constant prefix kafka.producer., and we can append any Kafka producer property to that prefix.
So mine goes as: a1.sinks.k1.kafka.producer.max.request.size = 5271988
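For context, that property simply sits alongside the other sink settings in flume.config, e.g.:
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = some_kafka_topic
a1.sinks.k1.kafka.bootstrap.servers = some_URL
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.max.request.size = 5271988
a1.sinks.k1.channel = c1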

AssertionError: Unassigned partition

I'm trying to consume data from a topic by setting the offset, but I get an assertion error:
from kafka import KafkaConsumer

consumer = KafkaConsumer('foobar1',
                         bootstrap_servers=['localhost:9092'])
print 'process started'
print consumer.partitions_for_topic('foobar1')
print 'done'
consumer.seek(0,10)
for message in consumer:
    print ("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                          message.offset, message.key,
                                          message.value))
print 'process ended'
Error:-
Traceback (most recent call last):
  File "/Users/pn/Documents/jobs/ccdn/kafka_consumer_1.py", line 21, in <module>
    consumer.seek(0,10)
  File "/Users/pn/.virtualenvs/vpsq/lib/python2.7/site-packages/kafka/consumer/group.py", line 549, in seek
    assert partition in self._subscription.assigned_partitions(), 'Unassigned partition'
AssertionError: Unassigned partition
Here is an example to solve the problem:
from kafka import KafkaConsumer, TopicPartition
con = KafkaConsumer(bootstrap_servers = my_bootstrapservers)
tp = TopicPartition(my_topic, 0)
con.assign([tp])
con.seek_to_beginning()
con.seek(tp, 1000000)
Reference:
kafka consumer seek is not working: AssertionError: Unassigned partition
You have to call consumer.assign() with a list of TopicPartitions before calling seek.
Also note that the first argument to seek is also a TopicPartition.
See the KafkaConsumer API.
In my case, with Kafka 0.9 and kafka-python, partition assignment happens during for msg in consumer, so the seek operation should come after the iteration has started. I reset my group's offsets with the following code:
import kafka

ps = []
for i in xrange(topic_partition_number):
    ps.append(kafka.TopicPartition(topic, i))

consumer = kafka.KafkaConsumer(topic, bootstrap_servers=address, group_id=group)
for msg in consumer:
    print msg
    consumer.seek_to_beginning(*ps)
    consumer.commit()
    break
break