Is time-based log compaction in Kafka based on wall-clock time, event time, or a mix of both? - apache-kafka

I have been trying to understand how to configure time-based log compaction, but I still can't understand its behavior properly. In particular, I am interested in the behavior of log.roll.ms.
What I would like to understand is the following statement taken from the official Kafka docs: https://kafka.apache.org/documentation.html#upgrade_10_1_breaking
The log rolling time is no longer depending on log segment create time. Instead it is now based on the timestamp in the messages. More specifically. if the timestamp of the first message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms
Regarding T + log.roll.ms:
a) I understand that T is based on the timestamp of the message and can therefore be considered the event time. But what is the clock behind log.roll.ms? In Kafka Streams, for instance, when working with event time it is clear what the stream time is: it is the highest timestamp seen so far. So does the time for log compaction progress with the timestamps of the messages, and is therefore event time, or does it progress based on the wall-clock time of the brokers?
I thought it was event time, but then I saw the following talk, https://www.confluent.io/kafka-summit-san-francisco-2019/whats-the-time-and-why/, where Matthias J. Sax talks about it. His talk left me confused: it seems that compaction is driven by both the event time T and wall-clock time.
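To make my question concrete, here is how I currently read that statement, as a minimal sketch (plain Java, not actual broker code; the 7-day value is just the default I assume for log.roll.ms):
public class LogRollCheck {
    // log.roll.ms (illustrative default of roughly 7 days)
    static final long LOG_ROLL_MS = 7L * 24 * 60 * 60 * 1000;

    // T is the timestamp of the first record in the segment (event time).
    // Under this reading, the check is only evaluated when a new record is
    // appended, so wall-clock time alone would never trigger a roll.
    static boolean shouldRoll(long firstTimestampInSegment, long newRecordTimestamp) {
        return newRecordTimestamp >= firstTimestampInSegment + LOG_ROLL_MS;
    }
}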

Related

How does Google Dataflow determine the watermark for various sources?

I was just reviewing the documentation to understand how Google Dataflow handles watermarks, and it just mentions the very vague:
The data source determines the watermark
It seems you can add more flexibility through withAllowedLateness but what will happen if we do not configure this?
Thoughts so far
I found something indicating that if your source is Google PubSub it already has a watermark which will get taken, but what if the source is something else? For example a Kafka topic (which I believe does not inherently have a watermark, so I don't see how something like this would apply).
Is it always 10 seconds, or just 0? Is it looking at the last few minutes to determine the max lag and if so how many (surely not since forever as that would get distorted by the initial start of processing which might see giant lag)? I could not find anything on the topic.
I also searched outside the context of Google DataFlow for Apache Beam documentation but did not find anything explaining this either.
When using Apache Kafka as a data source, each Kafka partition may have a simple event time pattern (ascending timestamps or bounded out-of-orderness). However, when consuming streams from Kafka, multiple partitions often get consumed in parallel, interleaving the events from the partitions and destroying the per-partition patterns (this is inherent in how Kafka’s consumer clients work).
In that case, you can use Flink’s Kafka-partition-aware watermark generation. Using that feature, watermarks are generated inside the Kafka consumer, per Kafka partition, and the per-partition watermarks are merged in the same way as watermarks are merged on stream shuffles.
For example, if event timestamps are strictly ascending per Kafka partition, generating per-partition watermarks with the ascending-timestamps watermark generator will result in perfect overall watermarks. Note that no TimestampAssigner is provided in the example; the timestamps of the Kafka records themselves will be used instead.
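A minimal sketch of what that looks like with Flink's Kafka consumer (topic name, schema, and connection properties are placeholders; assumes a Flink version that has the WatermarkStrategy API):
import java.util.Properties;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PerPartitionWatermarks {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder

        FlinkKafkaConsumer<String> kafkaSource =
                new FlinkKafkaConsumer<>("myTopic", new SimpleStringSchema(), props);

        // Watermarks are generated inside the consumer, per Kafka partition, and
        // merged the same way as on a shuffle. No TimestampAssigner is set, so the
        // Kafka record timestamps themselves are used.
        kafkaSource.assignTimestampsAndWatermarks(
                WatermarkStrategy.<String>forMonotonousTimestamps());

        DataStream<String> stream = env.addSource(kafkaSource);
        stream.print();
        env.execute("per-partition-watermarks");
    }
}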
In any data processing system, there is a certain amount of lag between the time a data event occurs (the “event time”, determined by the timestamp on the data element itself) and the time the actual data element gets processed at any stage in your pipeline (the “processing time”, determined by the clock on the system processing the element). In addition, there are no guarantees that data events will appear in your pipeline in the same order that they were generated.
For example, let’s say we have a PCollection that’s using fixed-time windowing, with windows that are five minutes long. For each window, Beam must collect all the data with an event time timestamp in the given window range (between 0:00 and 4:59 in the first window, for instance). Data with timestamps outside that range (data from 5:00 or later) belong to a different window.
However, data isn’t always guaranteed to arrive in a pipeline in time order, or to always arrive at predictable intervals. Beam tracks a watermark, which is the system’s notion of when all data in a certain window can be expected to have arrived in the pipeline. Once the watermark progresses past the end of a window, any further element that arrives with a timestamp in that window is considered late data.
From our example, suppose we have a simple watermark that assumes approximately 30s of lag time between the data timestamps (the event time) and the time the data appears in the pipeline (the processing time), then Beam would close the first window at 5:30. If a data record arrives at 5:34, but with a timestamp that would put it in the 0:00-4:59 window (say, 3:38), then that record is late data.
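For example, a hedged sketch of five-minute fixed windows with 30 seconds of allowed lateness in the Beam Java SDK (the input PCollection and its contents are placeholders):
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class LateDataWindowing {
    // Five-minute fixed windows that accept late data until the watermark has passed
    // the end of the window plus 30 seconds; anything later is dropped. With discarding
    // mode, each firing emits only the elements that arrived since the previous firing.
    static PCollection<String> window(PCollection<String> events) {
        return events.apply(
                Window.<String>into(FixedWindows.of(Duration.standardMinutes(5)))
                      .withAllowedLateness(Duration.standardSeconds(30))
                      .discardingFiredPanes());
    }
}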

KStreamWindowAggregate 2.0.1 vs 2.5.0: skipping records instead of processing

I've recently upgraded my kafka streams from 2.0.1 to 2.5.0. As a result I'm seeing a lot of warnings like the following:
org.apache.kafka.streams.kstream.internals.KStreamWindowAggregate$KStreamWindowAggregateProcessor Skipping record for expired window. key=[325233] topic=[MY_TOPIC] partition=[20] offset=[661798621] timestamp=[1600041596350] window=[1600041570000,1600041600000) expiration=[1600059629913] streamTime=[1600145999913]
There seems to be new logic in the KStreamWindowAggregate class that checks whether a window has closed; if it has, the messages are skipped. In 2.0.1 these messages were still processed.
Question
Is there a way to get the same behavior as before? I'm seeing lots of gaps in my data with this upgrade and am not sure how to solve this, as these gaps were not seen previously.
The aggregate function that I'm using already deals with windowing and as a result with expired windows. How does this new logic relate to this expiring windows?
Update
While exploring further, I see that it is indeed related to the grace period in ms. In my custom timestamp extractor (which has the logic to use the timestamp from the payload instead of the record's own timestamp), I can see that for the records producing the expired-window warnings, the incoming timestamp is indeed more than 24 hours ahead of the event time from the payload.
I assume this is caused by consumer lags of over 24 hours.
The timestamp extractor's extract method has a partitionTime parameter which, according to the docs, is:
partitionTime the highest extracted valid timestamp of the current record's partition (could be -1 if unknown)
So is this the create time of the record on the topic? And is there a way to influence this so that my records are no longer skipped?
In 2.0.1 these messages were still processed.
That is a little bit surprising (even if I would need to double-check the code), at least for the default config. By default, store retention time is set to 24h, so in 2.0.1 messages older than 24h should also not have been processed, as the corresponding state had already been purged. If you did change the store retention time (via Materialized#withRetention) to a larger value, you would also need to increase the window grace period accordingly, via the TimeWindows#grace() method.
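For illustration, a sketch of such a configuration (store name, serdes, topic name, and durations are placeholders, not your actual topology):
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

public class GraceAndRetentionExample {
    static void build(StreamsBuilder builder) {
        KStream<String, String> input = builder.stream("MY_TOPIC");

        input.groupByKey()
             // 30s windows that accept out-of-order records up to 25h late
             .windowedBy(TimeWindows.of(Duration.ofSeconds(30))
                                    .grace(Duration.ofHours(25)))
             .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("my-counts")
                                // store retention must cover window size + grace period
                                .withRetention(Duration.ofHours(26))
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.Long()));
    }
}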
The aggregate function that I'm using already deals with windowing and as a result with expired windows. How does this new logic relate to this expiring windows?
Not sure what you mean by this, or how you actually do this? The old and new logic are similar with regard to how long a window is stored (the retention time config). The new part is the grace period, which you can increase up to the same value as the retention time if you wish.
About "partition time": it is computed base on whatever TimestampExtractor returns. For your case, it's the max of whatever you extracted from the message payload.

Deduplication using Kafka-Streams

I want to do deduplication in my kafka-streams application, which uses a state store, using this very good example:
https://github.com/confluentinc/kafka-streams-examples/blob/5.5.0-post/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java
I have a few questions about this example.
If I understand correctly, this example briefly does the following:
Message comes into input topic
Look at the store; if the key does not exist, write it to the state store and return the record
If it does exist, drop the record, so deduplication is applied.
But in the code example there is a time window size that you can set, as well as a retention time for the messages in the state store. You can also check whether a record is in the store by giving a timestamp range timeFrom to timeTo:
final long eventTime = context.timestamp();
// look up the key in the window store within [eventTime - leftDurationMs, eventTime + rightDurationMs]
final WindowStoreIterator<String> timeIterator = store.fetch(
        key,
        eventTime - leftDurationMs,
        eventTime + rightDurationMs
);
What is the actual purpose of timeTo and timeFrom? I am not sure why I am checking the next time interval, since that would be checking for future messages that have not arrived at my topic yet.
My second question: is this time interval related to, and should it hit, the previous time window?
If I am able to search a time interval by giving timeTo and timeFrom, why is the window size important?
If I give a window size of 12 hours, can I guarantee that messages are deduplicated for 12 hours?
I think like this:
The first message comes with key "A" in the first minute after the application starts up; after 11 hours, a message with key "A" comes again. Can I catch this duplicate by giving a large enough time interval, like eventTime - 12 hours?
Thanks for any ideas !
The time window size decides how long you want the deduplication to run: no duplicates ever, or none within, say, 5 minutes. Kafka has to store these records, so a large time window may consume a lot of resources on your server.
timeFrom and timeTo exist because your record (event) may arrive or be processed late in Kafka, so the event time of the record may be 1 minute ago rather than now. Kafka is then processing an "old" record, and that is why it needs to take into account records that are not that old, i.e. records that are "in the future" relative to the "old" one.
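For illustration, a sketch of how the window store behind such a deduplication transformer could be configured so that duplicates are remembered for 12 hours (store name, serdes, and durations are placeholders):
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;

public class DedupStoreSetup {
    static void addDedupStore(StreamsBuilder builder) {
        // The retention period of the window store bounds how far back duplicates
        // can still be detected; here both retention and window size are 12 hours.
        StoreBuilder<WindowStore<String, String>> dedupStoreBuilder =
                Stores.windowStoreBuilder(
                        Stores.persistentWindowStore("eventId-store",
                                Duration.ofHours(12),  // retention period
                                Duration.ofHours(12),  // window size
                                false),                // do not retain duplicates
                        Serdes.String(),
                        Serdes.String());
        builder.addStateStore(dedupStoreBuilder);
    }
}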

KTable suppress(Suppressed.untilTimeLimit()) does not hold records for the specified time

I have implemented a stream processing app that makes some calculations and transformations, and send the result to an output topic.
After that, I read from that topic and I want to suppress the results for 35 seconds, just like a timer, meaning that all the output records from that suppress will be sent to a specific "timeout" topic.
The simplified code looks like this:
inputStream
    .suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(35), Suppressed.BufferConfig.unbounded()))
    .toStream()
    .peek((key, value) -> LOGGER.warn("incidence with Key: {} timeout -> Time = {}", key, 35))
    .filterNot((key, value) -> value.isDisconnection())
The problem I have here is that suppress holds the records for an arbitrary amount of time, not the specified 35 seconds.
For more information: I'm using the event time extracted in the earlier processing described at the beginning, and records are arriving every second.
Thanks
Update
This is an input record example:
rowtime: 4/8/20 8:26:33 AM UTC, key: 34527882, value: {"incidenceId":"34527882","installationId":"18434","disconnection":false,"timeout":false, "creationDate":"1270801593"}
I ran into a similar issue some time ago, and the reason why suppress holds the records for an arbitrary time is that the suppress operator uses what they call stream time instead of the intuitive wall-clock time.
As of now, untilTimeLimit only supports stream time, which limits its usefulness. Work is underway to add a wall-clock time option, expanding this feature into a general rate control mechanism.
The important aspects of time for Suppress are:
The timestamp of each record: This is also known as event time.
What time is “now”? Stream processing systems have two main choices here:
The intuitive “now” is wall-clock time, the time you would see if you look at a clock on the wall while your program is running.
There is also stream time, which is essentially the maximum timestamp your program has observed in the stream so far. If you’ve been polling a topic, and you’ve seen records with timestamps 10, 11, 12, 11, then the current stream time is 12.
Reference: Kafka Streams’ Take on Watermarks and Triggers
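A tiny, self-contained illustration of how stream time advances (plain Java, not Kafka Streams internals):
public class StreamTimeIllustration {
    public static void main(String[] args) {
        // Stream time is the max record timestamp observed so far; it never moves backwards.
        long streamTime = Long.MIN_VALUE;
        for (long recordTimestamp : new long[]{10, 11, 12, 11}) {
            streamTime = Math.max(streamTime, recordTimestamp);
        }
        System.out.println(streamTime); // prints 12
        // A suppress(untilTimeLimit(Duration.ofSeconds(35), ...)) buffer only emits a
        // record once later records push stream time 35s past that record's timestamp;
        // if no new records arrive, nothing is emitted, however much wall-clock time passes.
    }
}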

StreamsException: Extracted timestamp value is negative, which is not allowed

This could be a duplicate of Error in Kafka Streams using kafka-node - negative timestamp, but it certainly is not. My Kafka Streams app does some transformation logic on each message and forwards it to a new topic. There is no time-based aggregation/processing in the app, so there is no need to use any custom timestamp extractor. This app was running fine for several days, but all of a sudden it threw a negative-timestamp exception.
Exception in thread "StreamThread-4" org.apache.kafka.streams.errors.StreamsException: Extracted timestamp value is negative, which is not allowed.
After this exception was thrown from all StreamThreads (10 in total), the app was essentially frozen, as there was no further progress on the stream for several hours, and no further exceptions were thrown. When I restarted the app, it started to process only the newly arriving messages.
Now the question is, what happened to the messages that came in between (after the exception was thrown and before the app was restarted)? If those missing messages had no embedded timestamp (highly unlikely, as nothing changed on the broker or the producer), shouldn't the app have thrown an exception for each such message? Or does the app stop the stream's progress the first time it detects a negative timestamp in a message? Is there a way to handle this situation so that the app can keep progressing the stream even after detecting a negative timestamp? My app uses Kafka Streams library version 0.10.0.1-cp1.
Note: I could easily put in a custom timestamp extractor that checks for a negative timestamp in each message, but that is a lot of unnecessary overhead for my app. All I want to understand is why the stream did not progress after a message with a negative timestamp was detected.
Even if you do not use any time-based operator, a Kafka Streams application checks that the timestamps returned by the timestamp extractor are valid, because timestamps are used to determine the processing order of records from different partitions, to ensure records are processed in order and all partitions are consumed in a time-aligned manner.
If a negative timestamp is detected, the application (or actually the corresponding thread) dies. Unfortunately, it is currently not possible to recover from such an exception and you would need to restart your application. See also Confluent FAQs: http://docs.confluent.io/3.1.1/streams/faq.html#invalid-timestamp-exception
If your application dies and you restart it, it will resume processing where it left off. Unfortunately, in Kafka 0.10.0.1 there is a bug (fixed in the upcoming 0.10.2 release): in case of failure, an incorrect offset can get committed and the application "steps over" some records. I assume this happened in your case, and if only some records had an invalid timestamp, those records might have been skipped, allowing your application to resume after the restart. This behavior is actually a bug -- without it, Kafka Streams would try to process those records with the invalid timestamp again and again and fail every time, until you provide a custom timestamp extractor that fixes the problem by returning a valid timestamp.
How to fix it:
The correct fix would be to provide a custom timestamp extractor that never returns an invalid (i.e., negative) timestamp.
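For example, a sketch of such an extractor (note that the exact TimestampExtractor interface differs between versions: 0.10.0.x uses a single-argument extract(record) method, while newer releases pass a second partitionTime argument as shown here):
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class NonNegativeTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
        final long ts = record.timestamp();
        if (ts >= 0) {
            return ts;
        }
        // Fall back to wall-clock time for records with an invalid (negative)
        // embedded timestamp so the stream thread does not die.
        return System.currentTimeMillis();
    }
}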
I have no explanation for why you got invalid timestamps, though... This is quite strange, and you might want to investigate your producer setup and try to figure out whether there is any possibility that your producer puts an invalid timestamp on a message (even if this is unlikely -- I have no other idea what the root cause of the problem could be).
Further remarks:
In the next release (0.10.2), handling of invalid timestamps gets simplified, and Kafka Streams provides more built-in timestamp extractors that handle records with invalid timestamps differently. For example, this allows you to auto-skip records with invalid timestamps instead of raising an error (the current behavior). For more details see KIP-93: https://cwiki.apache.org/confluence/display/KAFKA/KIP-93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
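For example, a sketch of configuring the built-in LogAndSkipOnInvalidTimestamp extractor (application id and bootstrap servers are placeholders; note the config constant is named TIMESTAMP_EXTRACTOR_CLASS_CONFIG in 0.10.2 and DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG in later releases):
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.LogAndSkipOnInvalidTimestamp;

public class SkipInvalidTimestampsConfig {
    static Properties streamsConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // Log and drop any record whose extracted timestamp is invalid (negative),
        // instead of raising an exception and killing the stream thread.
        props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
                  LogAndSkipOnInvalidTimestamp.class.getName());
        return props;
    }
}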