Restarting a Kafka python MultiProcessConsumer consumes all the messages in the queue again - offset

REF : Restarting a Kafka (python) consumer consumes all the messages in the queue again
I'm new to kafka and am trying to deal with offsets management too.
Using last release of Apache-Kafka (0.8.1.1.) with kafka-python 0.9.2 installed from pypi (last upload on 2014-08-27) which is different from the current master branch on github.
When testing with "SimpleConsumer" => crashing and restarting the script consumes messages from the last known offset.
When testing with "MultiProcessConsumer" => crashing and restarting the script restarts consuming from offset "0"
my little script (MultiProcessConsumer):
from kafka import KafkaClient, MultiProcessConsumer
KFK = KafkaClient("localhost:9092")
consumer = MultiProcessConsumer(KFK, "my-group1", "my-topic", num_procs=2)
I can check offsets by:
consumer.offsets
{0: 0, 1: 0}
Then, I run:
A = consumer.get_messages(count=1235)
consumer.offsets
{0: 1235, 1: 0}
After crashing and restarting the script again, first call of "consumer.offsets" returns "{0: 1235, 1: 0}" which is good. But running :
A.consumer.get_messages(count=388)
consumer.offsets
{0: 388, 1: 0}
Any idea about how to deal with this problem ? Moreover, is there anyway to alter properly the MultiProcessConsumer offsets to start from a defined position ?
Thanks for your help.
Edit :
After diving into kafka-python lib source and checking issues on GitHub,
see : https://github.com/mumrah/kafka-python/issues/173
So the problem is that when master multiprocessconsumer starts sub-process, it inits their offsets to "0" on each partition of the topic (because of autocommit set to false for sub-processes) instead of giving them the right values.
See "mahall" comment on GitHub.

It depends on the way consumer requests kafka broker for offset. Most likely you are doing something equivalent to this in Java
readOffset = getLastOffset(consumer,topic, partition, kafka.api.OffsetRequest.EarliestTime(), clientName);
try something like this
readOffset = getLastOffset(consumer,topic, partition, kafka.api.OffsetRequest.LatestTime(), clientName);

Related

How does a consumer know it is no longer listed in the Kafka cluster?

We have this issue that when Kafka brokers must be taken offline, no consumer service has any idea about that and keeps running.
We tried listing consumers in the new Kafka instance, and saw no existing consumer listed there. All consumers listed are those newly created.
We had to manually terminate all existing consumer services which is not convenient every time we hit this issue.
Question - How does a consumer know it is no longer listed in the Kafka cluster so it should terminate itself?
P.S. We use Spring Kafka.
1 -- To Check Clusters & Replica status ?
Check Kafka cluster all broker status
$ zookeeper-shell.sh localhost:9001 ls /brokers/ids
Check Kafka cluster Specific broker status
$ zookeeper-shell.sh localhost:9001 get /brokers/ids/<id>
specific to replica_unavailability check
$ kafka-check --cluster-type=sample_type replica_unavailability
For first broker check
$ kafka-check --cluster-type=sample_type --broker-id 3 replica_unavailability --first-broker-only
Any partitions replicas not available
$ kafka-check --cluster-type=sample_type replica_unavailability
Checking offline partitions
$ kafka-check --cluster-type=sample_type offline
2 -- Code sample to send/auto-shutdown
2 custom options to do handle the shutdown using a kill-message,
do it gracefully by sending a kill-message before taking down
brokers or topics.
Option 1: Consider an in-band message/signal - i.e. send a “kill” message pertaining to topics/brokers consumer is listening to as it follows the offset order on the topic-partition
Option 2: make the consumer listen to 2 topics for e.g. “topic” and “topic_kill”
The difference between the 2 options above, is that the first version is comes in the the order it was sent, consider that there maybe blocking messages maybe waiting, depending on your implementation, to be consumed before that “kill message”.
While, the second version allows kill-signal to arrive independently without being blocked out of band, this is a nicer & reusable architectural pattern, with a clear separation between data topic and signaling.
Code Sample a) producer sending the kill-message & b) consumer to recieve and handle the shutdown
// Producer -- modify and adapt as needed
import json
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers=['0.0.0.0:<my port number>'],
key_serializer=lambda m: m.encode('utf8'),
value_serializer=lambda m: json.dumps(m).encode('utf8'))
def send_kill(topic: str, partitions: [int]):
for p in partitions:
producer.send(topic, key='kill', partition=p)
producer.flush()
// Consumer to accept a kill-message -- please modify and adapt as needed
import json
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
consumer = KafkaConsumer(bootstrap_servers=['0.0.0.0:<my port number>'],
key_deserializer=lambda m: m.decode('utf8'),
value_deserializer=lambda m: json.loads(m.decode('utf8')),
auto_offset_reset="earliest",
group_id='1')
consumer.subscribe(['topic'])
for msg in consumer:
tp = TopicPartition(msg.topic, msg.partition)
offsets = {tp: OffsetAndMetadata(msg.offset, None)}
if msg.key == "kill":
consumer.commit(offsets=offsets)
consumer.unsuscribe()
exit(0)
# do your work...
consumer.commit(offsets=offsets)

How to move a topic from one broker to another broker in kafka?

I first tried to see if I can create a topic in a particular broker. But looks like this is not possible. Even if I mention the broker host in the bootstrap
admin_client = AdminClient({
"bootstrap.servers": "xxx1.com:9092,xxx2.com:9092"
})
futmap=admin_client.create_topics(topic_list)
The program is arbitrarily picking up one of the 5 brokers that I have as the leader broker for the topic. I am trying to understand why it happens like this.
I am also trying to see if I can reassign the topic leader to another broker. I know it may be possible through the kafka-reassign-partitions command line script, but I wanted to do it programmatically using python and confluent-Kafka package. Is it possible to do this programmatically. I did not find the reassign partition function in the ADMIN class of confluent-Kafka package
Thanks
I have finally got the solution for this, the documentation of the confluent Kafka python package is not adequate for this. But one good thing about open source is that you can read the code and figure out. So, to create the topic in a particular broker, I had to code the create topic code as below. Please note that I have used replica_assignment instead of replication_factor. These two are mutually exclusive. If you use the replication_factor, the partitions will be assigned by Kafka, you can control the assignment through replica_assignment. However, I am sure that this will get re-assigned in case of a rebalancing/re-assigning of partitions. But that can also be handled through the on_revoke event. But for now, this works for me.
def createTopic(admin_client,topics):
#topic_name=topics
topic_name = ['rajib1_test_xxx_topic']
replica_assignment = [[262, 261]]
topic_list = [NewTopic(topic, num_partitions=1, replica_assignment=replica_assignment) for topic in topic_name]
futmap=admin_client.create_topics(topic_list)
# Wait for each operation to finish.
for topic, f in futmap.items():
try:
f.result() # The result itself is None
print("Topic {} created".format(topic))
except Exception as e:
print("Failed to create topic {}: {}".format(topic, e))
#return futmap
You could also use the kafka-reassign-partitions.sh tool that comes with Kafka to change the replicas of one topic to another broker.
For example, if you want to have your (in this example single-replicated, and single-partitioned) topic "test" be located on broker "1", you can first define a plan (named replicachange.json):
{
"partitions":
[{"topic": "test", "partition": 0,
"replicas": [
1
]
}],
"version":1
}
and then execute it using:
kafka-reassign-partitions.sh --zookeeper localhost:2181 --execute \
--reassignment-json-file replicachange.json

io.confluent.ksql.exception.KafkaTopicExistsException: when launching ksql-server-start ksql-server.properties

I'm working with ksql from quite some time. Kafka cluster if of 3 nodes. I've been using udf as well and all looks good until I stop the servers and start them again.
On server start I'm seeing the following in the logs:
[2019-04-03 11:29:54,381] ERROR Exception encountered running command: A Kafka topic with the name 'czxcorp-structured-data-enriched' already exists, with different partition/replica configuration than required. KSQL expects 4 partitions (topic has 9), and 1 replication factor (topic has 1).. Retrying in 5000 ms (io.confluent.ksql.util.RetryUtil:80)
[2019-04-03 11:29:54,381] ERROR Stack trace: io.confluent.ksql.exception.KafkaTopicExistsException: A Kafka topic with the name 'czxcorp-structured-data-enriched' already exists, with different partition/replica configuration than required. KSQL expects 4 partitions (topic has 9), and 1 replication factor (topic has 1).
at io.confluent.ksql.services.TopicValidationUtil.validateTopicProperties(TopicValidationUtil.java:51)
at io.confluent.ksql.services.TopicValidationUtil.validateTopicProperties(TopicValidationUtil.java:35)
at io.confluent.ksql.services.KafkaTopicClientImpl.validateTopicProperties(KafkaTopicClientImpl.java:292)
at io.confluent.ksql.services.KafkaTopicClientImpl.createTopic(KafkaTopicClientImpl.java:76)
at io.confluent.ksql.planner.plan.KsqlStructuredDataOutputNode.createSinkTopic(KsqlStructuredDataOutputNode.java:244)
at io.confluent.ksql.planner.plan.KsqlStructuredDataOutputNode.buildStream(KsqlStructuredDataOutputNode.java:146)
at io.confluent.ksql.physical.PhysicalPlanBuilder.buildPhysicalPlan(PhysicalPlanBuilder.java:106)
at io.confluent.ksql.QueryEngine.buildPhysicalPlan(QueryEngine.java:113)
at io.confluent.ksql.KsqlEngine$EngineExecutor.execute(KsqlEngine.java:625)
at io.confluent.ksql.KsqlEngine$EngineExecutor.access$800(KsqlEngine.java:577)
at io.confluent.ksql.KsqlEngine.execute(KsqlEngine.java:247)
at io.confluent.ksql.rest.server.computation.StatementExecutor.startQuery(StatementExecutor.java:277)
at io.confluent.ksql.rest.server.computation.StatementExecutor.executeStatement(StatementExecutor.java:191)
at io.confluent.ksql.rest.server.computation.StatementExecutor.handleStatementWithTerminatedQueries(StatementExecutor.java:167)
at io.confluent.ksql.rest.server.computation.StatementExecutor.handleRestore(StatementExecutor.java:101)
at io.confluent.ksql.rest.server.computation.CommandRunner.lambda$null$0(CommandRunner.java:139)
at io.confluent.ksql.util.RetryUtil.retryWithBackoff(RetryUtil.java:63)
at io.confluent.ksql.util.RetryUtil.retryWithBackoff(RetryUtil.java:36)
at io.confluent.ksql.rest.server.computation.CommandRunner.lambda$processPriorCommands$1(CommandRunner.java:135)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at io.confluent.ksql.rest.server.computation.CommandRunner.processPriorCommands(CommandRunner.java:134)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:414)
at io.confluent.ksql.rest.server.KsqlServerMain.createExecutable(KsqlServerMain.java:80)
at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:42)
(io.confluent.ksql.util.RetryUtil:84)
Though I've stopped/terminated all the queries, the log prints all the commands I've executed from the beginning for my testing till data, including create, select, drop. I've pulled out the .jar(UDF) from /ext folder and the server started, though the log prints udf function(i'm using) not available.
This is my ksql-server.properties:
bootstrap.servers=hostname:9092
service.id=cyan_ksql
commit.interval.ms=5000
cache.max.bytes.buffering=20000000
num.stream.threads=10
fail.on.deserialization.error=false
listeners=http://localhost:8088
ksql.extension.dir=/opt/ksql-master/ext/
Going nuts with the error. I'm deleting the topic and somehow its recreated. Someone please help.
Check out the error:
A Kafka topic with the name 'czxcorp-structured-data-enriched' already exists, with different partition/replica configuration than required.
KSQL expects 4 partitions (topic has 9), and 1 replication factor (topic has 1)
If you've deleted the topic then either
it didn't actually get deleted
it got deleted and something else recreated it with nine partitions and your erroring KSQL query has not specified an override (WITH (PARTITIONS=9) to the default four
another KSQL command is creating it ahead of the one that errors out and your erroring KSQL query has not specified an override (WITH (PARTITIONS=9) to the default four
If you want to blow away your state and start from scratch, simply change your ksql.service.id which will cause KSQL to use a new command topic (which is what get replayed when you restart the process)

UnknownProducerIdException in Kafka streams when enabling exactly once

After enabling exactly once processing on a Kafka streams application, the following error appears in the logs:
ERROR o.a.k.s.p.internals.StreamTask - task [0_0] Failed to close producer
due to the following error:
org.apache.kafka.streams.errors.StreamsException: task [0_0] Abort
sending since an error caught with a previous record (key 222222 value
some-value timestamp 1519200902670) to topic exactly-once-test-topic-
v2 due to This exception is raised by the broker if it could not
locate the producer metadata associated with the producerId in
question. This could happen if, for instance, the producer's records
were deleted because their retention time had elapsed. Once the last
records of the producerId are removed, the producer's metadata is
removed from the broker, and future appends by the producer will
return this exception.
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:125)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:48)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:180)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1199)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:596)
at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:557)
at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:481)
at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:692)
at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:482)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:474)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.UnknownProducerIdException
We've reproduced the issue with a minimal test case where we move messages from a source stream to another stream without any transformation. The source stream contains millions of messages produced over several months. The KafkaStreams object is created with the following StreamsConfig:
StreamsConfig.PROCESSING_GUARANTEE_CONFIG = "exactly_once"
StreamsConfig.APPLICATION_ID_CONFIG = "Some app id"
StreamsConfig.NUM_STREAM_THREADS_CONFIG = 1
ProducerConfig.BATCH_SIZE_CONFIG = 102400
The app is able to process some messages before the exception occurs.
Context information:
we're running a 5 node Kafka 1.1.0 cluster with 5 zookeeper nodes.
there are multiple instances of the app running
Has anyone seen this problem before or can give us any hints about what might be causing this behaviour?
Update
We created a new 1.1.0 cluster from scratch and started to process new messages without problems. However, when we imported old messages from the old cluster, we hit the same UnknownProducerIdException after a while.
Next we tried to set the cleanup.policy on the sink topic to compact while keeping the retention.ms at 3 years. Now the error did not occur. However, messages seem to have been lost. The source offset is 106 million and the sink offset is 100 million.
As explained in the comments, there currently seems to be a bug that may cause problems when replaying messages older than the (maximum configurable?) retention time.
At time of writing this is unresolved, the latest status can always be seen here:
https://issues.apache.org/jira/browse/KAFKA-6817

How to recover lost message in kafka consumer

I'm writing an application in Apache camel. I am consume messages from some Kafka topic via camel Kafka component and dumps into database for recovery in case of any crash/restart happens. Below is the camel URI
kafka:?autoCommitEnable=false&groupId=r&keySerializerClass=org.apache.kafka.common.serialization.StringSerializer&serializerClass=org.apache.kafka.common.serialization.StringSerializer&topic=
My use case is - I have consumed some message(s) from Kafka but could not dumped the same into the database for recovery and crash happens.Now how to get the all the lost messages with the same consumer group ID after restarting the application ?
Thanks
Now how to get the all the lost messages with the same consumer group ID after restarting the application ?
Actually kafka store the consumer offset for you, If you do commit offset in your application. So when you restart the application, It will consume message from the last offset stored in kafka.
You could set the AutocommitEnable=true OR
I also found this https://github.com/apache/camel/blob/camel-2.18.2/components/camel-kafka/src/main/java/org/apache/camel/component/kafka/KafkaConsumer.java.
There are some piece code :
if (endpoint.getConfiguration().isAutoCommitEnable() != null
&& !endpoint.getConfiguration().isAutoCommitEnable()) {
long partitionLastoffset = partitionRecords.get(partitionRecords.size() - 1).offset();
consumer.commitSync(Collections.singletonMap(
partition, new OffsetAndMetadata(partitionLastoffset + 1)));
}
The Camel will take care of this even you do not set the AutocommitEnable.