Detected out-of-order KTable update when updating the same GlobalKTable from different input topics - apache-kafka

I have a 2.4.1 Kafka Streams app which consumes two different topics, let's say topic A and topic B, joins each of them with one GlobalKTable, and sends the resulting messages to the source topic of this GlobalKTable. So the output of the app is one of its own input sources.
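For context, here is a rough sketch of the topology (topic names and the merge logic are simplified placeholders):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();

// the GlobalKTable is built from "state-topic", which is also where the app writes its output
GlobalKTable<String, String> table = builder.globalTable("state-topic");

KStream<String, String> streamA = builder.stream("topic-A");
KStream<String, String> streamB = builder.stream("topic-B");

streamA
    .join(table, (key, value) -> key, (value, tableValue) -> value + tableValue) // placeholder merge
    .to("state-topic"); // feeds back into the GlobalKTable's source topic

streamB
    .join(table, (key, value) -> key, (value, tableValue) -> value + tableValue) // placeholder merge
    .to("state-topic");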
The problem I am experiencing happens with two messages that arrive close together (< 200 ms) in topics A and B: the output produced by the first message from topic A is not taken into account when the next message from topic B is processed. So the GlobalKTable state is overwritten with the incorrect result of the second join (B with the table), and the following warning shows up in the logs:
WARN org.apache.kafka.streams.kstream.internals.KTableSource
Detected out-of-order KTable update for store at offset 3, partition 5.
My questions are:
Am I understanding correctly from https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization that Kafka Streams buffers some messages for processing and does not take into account any updates to the app state made by those buffered messages?
To make the logic behave correctly, do I need to split the app into two, one per input topic? And is there any chance the same problem will occur there as well?
Can kafka-streams be improved to handle such situations correctly, e.g. by iterating on every input message instead of every n buffered messages / every n time units?
UPDATE
I tried to use 2 different streams apps (point 2):
topic A + join table -> source topic of table.
topic B + join table -> source topic of table.
Now I see different behavior, but the result is the same, even with max.task.idle.ms=1000:
Topic A: timestamp 1626946183424
Topic B: timestamp 1626946183427
Output - source table topic:
Table topic: 0 1 timestamp 1626946183427
Table topic: 0 2 timestamp 1626946183424
The output message is out of order too, and the warning shows up again:
Detected out-of-order KTable update for store at offset 2, partition 0
Detected out-of-order KTable update for store at offset 2, partition 0
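For reference, this is roughly how I set the idle time (application id and broker address are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-app-a");          // one app per input topic
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
// wait up to 1s for data on all input partitions before picking the next record,
// so records can be chosen in timestamp order across inputs (KIP-353)
props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, 1000L);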

Related

KStream-KTable leftJoin: join occurred while KTable is not fully loaded

I am trying to use a KStream-KTable leftJoin to enrich items from topic A with topic B. Topic A is my KStream, and topic B is my KTable, which has around 23M records. The keys of the two topics do not match, so I have to re-key the KStream of topic B and turn it into a KTable using a reducer.
Here is my code:
KTable<String, String> ktable = streamsBuilder
    .stream("TopicB", Consumed.withTimestampExtractor(new customTimestampsExtractor()))
    .filter((key, value) -> {...})
    .transform(new KeyTransformer()) // generate new key
    .groupByKey()
    .reduce((aggValue, newValue) -> {...});

streamsBuilder
    .stream("TopicA")
    .filter((key, value) -> {...})
    .transform(...)
    .leftJoin(ktable, (leftValue, rightValue) -> {...}) // ValueJoiner as a lambda
    .transform(...)
    .to("result");
1) The KTable initialization is slow (around 2000 msg/s). Is this normal? My topic has only 1 partition. Is there any way to improve the performance?
I tried to set the following to reduce the write throughput, but it doesn't seem to improve things much:
CACHE_MAX_BYTES_BUFFERING_CONFIG = 10 * 1024 * 1024
COMMIT_INTERVAL_MS_CONFIG = 15 * 1000
2) The join occurs before the KTable has finished loading from Topic B.
Here are the offsets at the time the join occurred (CURRENT-OFFSET/LOG-END-OFFSET):
Topic A: 32725/32726 (Lag 1)
Topic B: 1818686/23190390 (Lag 21371704)
I checked the timestamp of the Topic A record that failed: it is a record from 4 days ago, while the last processed record of Topic B is from 6 days ago.
As I understand it, Kafka Streams processes records based on timestamp, so I don't understand why, in my case, the KStream (Topic A) didn't wait until the KTable (Topic B) was loaded up to the point of 4 days ago before triggering the join.
I also tried having the timestamp extractor return 0, but that doesn't work either.
Update: when setting the timestamp to 0, I get the following error:
Caused by: org.apache.kafka.common.errors.UnknownProducerIdException: This exception is raised by the broker if it could not locate the producer metadata associated with the producerId in question. This could happen if, for instance, the producer's records were deleted because their retention time had elapsed. Once the last records of the producerID are removed, the producer's metadata is removed from the broker, and future appends by the producer will return this exception.
I also tried setting max.task.idle.ms to > 0 (3 seconds and 30 minutes), but I still get the same error.
Update: I fixed the 'UnknownProducerIdException' error by making the customTimestampsExtractor return a timestamp of 6 days ago, which is still earlier than the records from Topic A. I think (not sure) that setting it to 0 triggered retention on the changelog, which caused this error. However, the join is still not working: it still happens before the KTable has finished loading. Why is that?
I am using Kafka Streams 2.3.0.
Am I doing anything wrong here? Many thanks.
1. The KTable initialization is slow (around 2000 msg/s), is this normal?
This depends on your network, and I think the limiting factor is the consuming rate of TopicB. The two configs you use, CACHE_MAX_BYTES_BUFFERING_CONFIG and COMMIT_INTERVAL_MS_CONFIG, choose the trade-off between how much KTable output you want to produce (since a KTable changelog is a stream of revisions) and how much latency you accept when updating the KTable's underlying topic and downstream processors. Take a detailed look at the Kafka Streams caching config for state stores and the 'Tables, Not Triggers' part of that blog post.
I think a good way to increase the consuming rate of TopicB is to add more partitions.
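For reference, a minimal sketch of how those two configs are typically set (application id and broker address are placeholders, and the values are just the ones from the question, not recommendations):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "enrichment-app");       // hypothetical app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker address
// larger cache => more updates per key are deduplicated before being emitted downstream
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024);
// longer commit interval => the cache is flushed less often, at the cost of higher update latency
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 15 * 1000);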
KStream.leftJoin(KTable, ...) is always a table lookup: it joins the current stream record with the latest available record in the KTable, and it does not take stream time into account when deciding whether to join or not. If you want stream time to be considered when joining, take a look at the KStream-KStream join.
In your case this lag is the lag on TopicB, and it does not mean the KTable is not fully loaded. The KTable is "not fully loaded" only while it is in the state-restore process, i.e. while it reads the KTable's underlying changelog topic to restore the current state before your streams app actually runs; in that case you would not be able to do the join at all, because the streams app does not run until the state is fully restored.

How to ensure, for Kafka Streams listening to topics with multiple partitions, that all related data is processed?

I would like to know how Kafka Streams are assigned to partitions of topics for reading.
As far as I understand it, each Kafka Stream Thread is a Consumer (and there is one Consumer Group for the Stream). So I guess the Consumers are randomly assigned to the partitions.
But how does it work, if I have multiple input topics which I want to join?
Example:
Topic P contains persons. It has two partitions. The key of the message is the person-id so each message which belongs to a person always ends up in the same partition.
Topic O contains orders. It has two partitions. Let's say the key is also the person-id (of the person who ordered something). So here, too, each order message that belongs to a person always ends up in the same partition.
Now I have a stream which reads from both topics, counts all orders per person, and writes the result to another topic (where the message also includes the name of the person).
Data in topic P:
Partition 1: "hans, id=1", "maria, id=3"
Partition 2: "john, id=2"
Data in topic O:
Partition 1: "person-id=2, pizza", "person-id=3, cola"
Partition 2: "person-id=1, lasagne"
And now I start two streams.
Then this could happen:
Stream 1 is assigned to topic P partition 1 and topic O partition 1.
Stream 2 is assigned to topic P partition 2 and topic O partition 2.
This means that the order lasagne for hans would never get counted, because for that a stream would need to consume topic P partition 1 and topic O partition 2.
So how do I handle that problem? I guess it's fairly common that streams need to process data which relates to each other somehow. So it must be ensured that the related data (here: hans and lasagne) is processed by the same stream.
I know this problem does not occur if there is only one stream or if the topics only have one partition. But I want to be able to concurrently process messages.
Thanks
Your use case is a KStream-KTable join where the KTable stores the user info and the KStream is the stream of orders, so the 2 topics have to be co-partitioned: they must have the same number of partitions and be partitioned by the same key and the same Partitioner. If you're using person-id as the key for the Kafka messages and using the same Partitioner, you should not worry about this case, because related records end up on the same partition number.
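As a rough sketch of such a co-partitioned join (topic names, types, and the joiner are placeholders taken from the example, not from real code):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// both topics are keyed by person-id and have the same number of partitions (co-partitioned)
KTable<String, String> persons = builder.table("topic-P");    // person-id -> person name
KStream<String, String> orders = builder.stream("topic-O");   // person-id -> order

orders
    .join(persons, (order, personName) -> personName + " ordered " + order)
    .to("orders-per-person"); // same key => same partition => the same task sees both sides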
Update: as Matthias pointed out, each Stream Thread has its own Consumer instance.

Autoscaling with KAFKA and non-transactional databases

Say I have an application that reads a batch of data from Kafka, uses the keys of the incoming messages to query HBase (reads the current data from HBase for those keys), does some computation, and writes the data back to HBase for the same set of keys. For example:
{K1, V1}, {K2, V2}, {K3, V3} (incoming messages from Kafka) --> My Application (reads the current values of K1, K2 and K3 from HBase, uses the incoming values V1, V2 and V3 to do some computation, and writes the new values for K1 (V1+x), K2 (V2+y) and K3 (V3+z) back to HBase after the processing is complete).
Now, let's say I have one partition for the Kafka topic and 1 consumer. My application has one consumer thread that is processing the data.
The problem is that if HBase goes down, my application stops processing messages and a huge lag builds up in Kafka. Even though I have the ability to increase the number of partitions and, correspondingly, the consumers, I cannot increase either of them because of race conditions in HBase. HBase doesn't support row-level locking, so if I increase the number of partitions the same key could go to two different partitions and, correspondingly, to two different consumers, who may end up in a race condition where whoever writes last is the winner. I would have to wait until all the messages are processed before I can increase the number of partitions.
For example:
HBase goes down --> initially I have one partition for the topic and there is an unprocessed message {K3, V3} in partition 0 --> now I increase the number of partitions, and messages with key K3 can now end up in, let's say, partition 0 or partition 1 --> then the consumer consuming from partition 0 and another consumer consuming from partition 1 will end up competing to write to HBase.
Is there a solution to this problem? Of course, having the consumer that processes the message lock key K3 is not a solution, since we are dealing with big data.
When you increase the number of partitions, only new messages go to the newly added partitions. Kafka takes responsibility for processing each message exactly once.
A message will only ever appear in one and only one Kafka partition. The default partitioner applies a hash function to the message key, modulo the number of partitions. I believe this guarantee solves your problem.
But bear in mind that if you change the number of partitions, the same message key could be allocated to a different partition. That may matter if you care about the ordering of messages, which is only guaranteed per partition. If you care about the ordering of messages, repartitioning (e.g. increasing the number of partitions) is not an option.
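As a rough illustration of that key-to-partition mapping (this mirrors what the default partitioner does for records with a non-null key; the helper method name is mine, not a Kafka API):

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

// murmur2 hash of the serialized key, made non-negative, modulo the partition count
static int partitionFor(String key, int numPartitions) {
    byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}

// partitionFor("K3", 1) is always 0, but partitionFor("K3", 2) may be 0 or 1,
// which is why changing the partition count can move a key to a different partition.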
As Vassilis mentioned, Kafka guarantees that a single key will only ever go to one partition.
There are different strategies for distributing keys over partitions.
When you increase the number of partitions or change the partitioning strategy, a rebalance could occur, which may affect running consumers. If you stop the consumers for a while, you can avoid the possibility of the same key being processed by two consumers.

Kafka and timestamp ordering within a single topic partition for ingestion time

When exclusively reading messages from a single partition of a Kafka topic where timestamps are configured for ingestion (broker) time, can I assume that all messages retrieved from the partition will always be in strict timestamp order?
Kafka provides ordering guarantees when storing as well as retrieving messages, i.e. messages are stored and retrieved in the order they are sent.
Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset (as well as a lower timestamp) than M2 and appear earlier in the log.
A consumer instance sees records in the order they are stored in the log.
However, Kafka only provides a total order over records within a partition, not between different partitions of a topic. If you require a total order over records, this can be achieved with a topic that has only one partition, though that means only one consumer process per consumer group (not suggested). Applying this here: if you have only 1 partition, then it's a yes for your use case; with more partitions it's still a yes for ordering on a per-partition basis, but ordering can't be guaranteed across the topic (multiple partitions).
Yes, I was talking about a Kafka topic that is explicitly configured for log append time.
I'm assuming that since the broker determines the timestamp and the broker owns a particular partition that timestamps in that partition will reflect timestamp order.
Rephrasing the question, is this always true within a single partition configured for log append time:
timestamp x <= timestamp y
where
offset x < offset y
Thanks.
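If it helps, here is the kind of check I have in mind for a single partition (topic name, broker address, and group id are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
props.put("group.id", "ts-order-check");            // hypothetical group id
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("my-log-append-time-topic", 0); // hypothetical topic
    consumer.assign(Collections.singletonList(tp));
    consumer.seekToBeginning(Collections.singletonList(tp));

    long lastTimestamp = Long.MIN_VALUE;
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
    for (ConsumerRecord<String, String> record : records) {
        // With LogAppendTime the broker assigns the timestamp at append time,
        // so within one partition it should never decrease as offsets increase.
        if (record.timestamp() < lastTimestamp) {
            System.out.println("Out-of-order timestamp at offset " + record.offset());
        }
        lastTimestamp = record.timestamp();
    }
}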

Kafka topic per producer

Let's say I have multiple devices, and each device has different types of sensors. Now I want to send the data from each device, for each sensor, to Kafka. But I am confused about the Kafka topics for processing this real-time data.
Is it better to have a Kafka topic per device, with all the sensors from that device sending their data to that particular topic, or should I create one topic and have all the devices send their data to that one topic?
If I go with the first case, where we create a topic per device, then:
Device1 (sensor A, B, C) -> topic1
Device2 (sensor A, B, C) -> topic2
how many topics can I create?
Will this model scale?
Case 2: sending data to one topic
Device1 (sensor A, B, C), Device2 (sensor A, B, C)....DeviceN.... -> topic
Isn't this going to be a bottleneck for the data? Since it will behave as a queue, data from some sensors will be way behind in the queue and will not be processed in real time.
Will this model scale?
EDIT
Let's say each device is associated with a user (many to one), and I want to process data per device. The way I want to process the data is: each device and its sensor data will go to its user after some processing.
Say I do the following:
Device1
-> Sensor A - Topic1 Partition 1
-> Sensor B - Topic1 Partition 2
Device2
-> Sensor A - Topic2 Partition 1
-> Sensor B - Topic2 Partition 2
I want some pub/sub type of behavior. Since devices can be added or removed, and sensors can also be added or removed, is there a way to create these topics and partitions on the fly?
If not Kafka, what pub/sub system would be suitable for this kind of behavior?
It depends on your semantics:
a topic is a logical abstraction and should contain "unified" data, i.e., data with the same semantic meaning
a topic can easily be scaled out via its number of partitions
For example, if you have different types of sensors collecting different data, you should use a topic for each type.
Since devices can be added or removed, and sensors can also be added or removed, is there a way to create these topics and partitions on the fly?
If device metadata (to distinguish where the data comes from) is embedded in each message, you should use a single topic with many partitions to scale out. Adding new topics or partitions is possible but must be done manually. For adding new partitions, a problem might be that it changes your data distribution and thus might break semantics. Thus, best practice is to over-partition your topic from the beginning to avoid adding new partitions later.
If there is no embedded metadata, you would need multiple topics (e.g., per user or per device) to distinguish message origins.
As an alternative, maybe a single topic with multiple partitions and a fixed mapping from device/sensor to partition, via a custom partitioner, would work too (see the sketch below). For this case, adding new partitions is no problem, as you control the data distribution and can keep it stable.
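A minimal sketch of such a custom partitioner (the class name, the config key, and the device-id-as-message-key assumption are all mine, just to illustrate the idea):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Assigns each known device id to an explicitly configured partition, so adding new
// partitions (for new devices) never moves an existing device to a different partition.
public class DevicePartitioner implements Partitioner {

    private final Map<String, Integer> deviceToPartition = new HashMap<>();

    @Override
    public void configure(Map<String, ?> configs) {
        // Hypothetical config entry: "device.partition.map" = "device1:0,device2:1"
        String mapping = (String) configs.get("device.partition.map");
        if (mapping != null) {
            for (String entry : mapping.split(",")) {
                String[] parts = entry.split(":");
                deviceToPartition.put(parts[0], Integer.parseInt(parts[1]));
            }
        }
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String deviceId = (String) key; // the record key is assumed to be the device id
        Integer fixed = deviceToPartition.get(deviceId);
        // Fall back to a simple stable hash for devices that are not in the map.
        return fixed != null ? fixed : Math.floorMod(deviceId.hashCode(), numPartitions);
    }

    @Override
    public void close() {}
}

The producer would pick it up via its partitioner.class setting.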
Update
There is a blog post discussing this: https://www.confluent.io/blog/put-several-event-types-kafka-topic/
I would create topics based on sensors and partitions based on devices:
A sensor on Device 1 -> topic A, partition 1.
A sensor on Device 2 -> topic A, partition 2.
B sensor on Device 2 -> topic B, partition 2.
and so on.
I don't know what kind of sensors you have, but they seem to belong semantically to the same set of data. With the help of partitions you can have parallel processing.
But it depends on how you want to process your data: is it more important to process sensors together, or devices?