StreamsException: Extracted timestamp value is negative, which is not allowed - apache-kafka

This could look like a duplicate of "Error in Kafka Streams using kafka-node - negative timestamp", but it is not. My Kafka Streams app applies some transformation logic to each message and forwards it to a new topic. There is no time-based aggregation or processing in the app, so there is no need to use a custom timestamp extractor. The app ran fine for several days, but then, all of a sudden, it threw a negative timestamp exception.
Exception in thread "StreamThread-4" org.apache.kafka.streams.errors.StreamsException: Extracted timestamp value is negative, which is not allowed.
After this exception was thrown from all StreamThreads (10 in total), the app was effectively frozen: there was no further progress on the stream for several hours, and no further exception was thrown. When I restarted the app, it started to process only newly arriving messages.
Now the question is: what happened to the messages that arrived in between (after the exception was thrown and before the app was restarted)? If those missing messages had no embedded timestamp (highly unlikely, as nothing changed in the broker or the producer), shouldn't the app have thrown an exception for each such message? Or does the app stop making progress on the stream as soon as it detects the first negative timestamp? Is there a way to handle this situation so that the app keeps processing the stream even after detecting a negative timestamp? My app uses Kafka Streams library version 0.10.0.1-cp1.
Note: I could easily add a custom timestamp extractor that checks each message for a negative timestamp, but that is a lot of unnecessary overhead for my app. All I want to understand is why the stream stopped progressing after it detected a message with a negative timestamp.

Even if you do not have any time-based operator, a Kafka Streams application checks whether the timestamps returned by the timestamp extractor are valid, because timestamps are used to determine the processing order of records from different partitions, ensuring that records are processed in order and that all partitions are consumed in a time-aligned manner.
If a negative timestamp is detected, the application (or actually the corresponding thread) dies. Unfortunately, it is currently not possible to recover from such an exception and you would need to restart your application. See also Confluent FAQs: http://docs.confluent.io/3.1.1/streams/faq.html#invalid-timestamp-exception
If your application dies and you restart it, it will resume processing where it left off. Unfortunately, Kafka 0.10.0.1 has a bug (fixed in the upcoming release 0.10.2): in case of failure, an incorrect offset can get committed and the application "steps over" some records. I assume this happened in your case: if only some records have an invalid timestamp, those records might have been skipped, allowing your application to resume after the restart. This behavior is actually a bug. Without it, Kafka Streams would try to process the records with invalid timestamps again and again, failing every time, until you provide a custom timestamp extractor that fixes the problem by returning a valid timestamp.
How to fix it:
The correct fix would be to provide a custom timestamp extractor that never returns an invalid (i.e., negative) timestamp.
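A minimal sketch of such an extractor follows. It assumes Kafka Streams 0.10.2 or later, where extract() also receives the partition's previously extracted valid timestamp (or -1 if unknown); on 0.10.0.x the interface only has extract(ConsumerRecord), so the second parameter would have to go. The fallback order (embedded timestamp, then partition time, then wall-clock time) is an illustrative choice, similar in spirit to the UsePreviousTimeOnInvalidTimestamp extractor added in 0.10.2, not the only valid policy.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Sketch: a TimestampExtractor that never hands a negative timestamp to Kafka Streams.
// Assumes Kafka Streams 0.10.2+ (two-argument extract()); needs kafka-streams on the classpath.
final class NonNegativeTimestampExtractor implements TimestampExtractor {

    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
        return sanitize(record.timestamp(), partitionTime, System.currentTimeMillis());
    }

    // Fallback policy kept in a pure helper so it can be tested without a running app.
    static long sanitize(final long recordTimestamp, final long partitionTime, final long wallClockMs) {
        if (recordTimestamp >= 0) {
            return recordTimestamp;   // valid embedded timestamp: use it as-is
        }
        if (partitionTime >= 0) {
            return partitionTime;     // reuse the last valid timestamp seen on this partition
        }
        return wallClockMs;           // nothing known yet for this partition: use wall-clock time
    }
}
```

It is registered like any other extractor, through the timestamp extractor setting in StreamsConfig.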
I have no explanation for why you got invalid timestamps, though. This is quite strange, and you might want to investigate your producer setup to figure out whether your producer could be writing an invalid timestamp (even if this is unlikely, I have no other idea what the root cause of the problem could be).
Further remarks:
In the next release (0.10.2), handling invalid timestamps gets simplified and Kafka Streams provides more built-in timestamp extractors that handle records with invalid timestamps differently. For example, this allows you to auto-skip records with invalid timestamps instead of raising an error (current behavior). For more details see KIP-93: https://cwiki.apache.org/confluence/display/KAFKA/KIP-93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
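For 0.10.2 and later, here is a hedged sketch of switching to one of the KIP-93 built-in extractors, LogAndSkipOnInvalidTimestamp, so that records with invalid timestamps are logged and dropped instead of killing the stream thread. It uses DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, the config name from 1.0 on (0.10.2 and 0.11 use TIMESTAMP_EXTRACTOR_CLASS_CONFIG); the application id and bootstrap servers are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.LogAndSkipOnInvalidTimestamp;

// Sketch: Streams properties that log and skip records with invalid timestamps.
final class SkipInvalidTimestampsConfig {

    static Properties build(final String applicationId, final String bootstrapServers) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        // Built-in extractor from KIP-93: records whose timestamp is negative are
        // logged and skipped rather than raising a StreamsException.
        props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
                  LogAndSkipOnInvalidTimestamp.class);
        return props;
    }
}
```

Skipped records are never processed, so this trades completeness for availability; a custom extractor that repairs the timestamp keeps the records instead.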

Related

How can I implement a kafka streams retry for a error handler using a state store?

In a scenario where one wants to retry on deserialization errors (or any kind of error, for that matter), how can a state store be linked to the deserialization error handler, so that the offending event can be stored and reprocessed later?
I've tried to link a state store to the processorContext in the handler, without success.
This is based on the suggestion made by #matthias-j-sax here: Kafka Streams - Retrying a message
Additionally, once we do have the event on a state store and we're able to later fetch it using a punctuation, what would a retry mean? Stream it into the initial source topic once again?
I guess I'll answer my own question here: it looks like the only possible way is to forward the failed message to a child processor and do the additional error processing there.
That processor could store the message in a key/value state store and implement the retry logic with a scheduled punctuation.
As for the actual retry, it gets a bit tricky: if we're doing windowed aggregation with a custom timestamp extractor, we don't want to put the retried event on the topic with a timestamp that predates the stream time, as it will surely be dropped. So it looks like the timestamp needs to be modified before the retry.
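A minimal sketch of that idea, assuming Kafka Streams 3.x and its newer Processor API (org.apache.kafka.streams.processor.api), whose context offers currentStreamTimeMs(). The store name "retry-store", the one-minute interval, keeping one parked value per key, and re-stamping retried records with the current stream time are all hypothetical choices. Re-stamping avoids the late-record drop described above, but it also moves the event into a different window than its original event time would.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Hypothetical child processor: receives failed records, parks them in a key-value
// store, and re-emits them on a wall-clock schedule.
final class RetryProcessor implements Processor<String, String, String, String> {
    static final String STORE_NAME = "retry-store";            // hypothetical store name
    static final Duration RETRY_INTERVAL = Duration.ofMinutes(1);

    private ProcessorContext<String, String> context;
    private KeyValueStore<String, String> store;

    @Override
    public void init(final ProcessorContext<String, String> context) {
        this.context = context;
        this.store = context.getStateStore(STORE_NAME);
        context.schedule(RETRY_INTERVAL, PunctuationType.WALL_CLOCK_TIME, this::retryParked);
    }

    @Override
    public void process(final Record<String, String> failed) {
        // Simplification: one parked value per key; a later failure overwrites an earlier one.
        store.put(failed.key(), failed.value());
    }

    private void retryParked(final long wallClockMs) {
        final long streamTime = context.currentStreamTimeMs();
        if (streamTime < 0) {
            return;   // stream time not known yet; try again at the next punctuation
        }
        final List<String> retried = new ArrayList<>();
        try (KeyValueIterator<String, String> it = store.all()) {
            while (it.hasNext()) {
                final KeyValue<String, String> parked = it.next();
                // Re-stamp with the current stream time so a downstream windowed
                // aggregation does not discard the retry as a late record.
                context.forward(new Record<>(parked.key, parked.value, streamTime));
                retried.add(parked.key);
            }
        }
        retried.forEach(store::delete);   // delete only after the iterator is closed
    }
}
```

To wire it up, the store can be attached with Topology#addStateStore(Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("retry-store"), Serdes.String(), Serdes.String()), processorName), and the processor's children decide whether a retry goes back to the source topic or straight into the downstream logic.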

Is time-based log compaction in Kafka based on wall-clock time, event time, or a mix of both?

I have been trying to understand how to set up time-based log compaction, but I still can't understand its behavior properly. In particular, I am interested in the behavior of log.roll.ms.
What I would like to understand is the following statement from the official Kafka documentation: https://kafka.apache.org/documentation.html#upgrade_10_1_breaking
The log rolling time is no longer depending on log segment create time. Instead it is now based on the timestamp in the messages. More specifically, if the timestamp of the first message in the segment is T, the log will be rolled out when a new message has a timestamp greater than or equal to T + log.roll.ms
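To make the quoted rule concrete, here is a small plain-Java model of it (not broker code). It covers only the timestamp condition as worded in the upgrade note; size-based rolling and any other broker behavior are deliberately left out.

```java
// Plain-Java model of the quoted roll rule (not the broker implementation).
final class TimeBasedRollRule {

    // firstTimestampMs: T, the timestamp of the first message in the active segment.
    // incomingTimestampMs: the timestamp carried by the message being appended.
    static boolean shouldRoll(final long firstTimestampMs, final long incomingTimestampMs, final long logRollMs) {
        return incomingTimestampMs >= firstTimestampMs + logRollMs;
    }
}
```

Both operands of this comparison are message timestamps, so under this rule a message whose timestamp is older than T never triggers a time-based roll, however much wall-clock time has passed.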
Regarding T + log.roll.ms:
a) I understand that T is based on the timestamp of the message and can therefore be considered event time. However, what is the clock behind log.roll.ms? In Kafka Streams, for instance, when working with event time it is clear what stream time is: the highest timestamp seen so far. So does the time for log compaction advance with the timestamps of the messages (event time), or does it advance based on the wall-clock time of the brokers?
I thought it was event time, but then I saw the following talk, in which Matthias J. Sax discusses it: https://www.confluent.io/kafka-summit-san-francisco-2019/whats-the-time-and-why/
The talk confused me: it seems that compaction is indeed driven by both event time T and wall-clock time.

KStreamWindowAggregate 2.0.1 vs 2.5.0: skipping records instead of processing

I've recently upgraded my kafka streams from 2.0.1 to 2.5.0. As a result I'm seeing a lot of warnings like the following:
org.apache.kafka.streams.kstream.internals.KStreamWindowAggregate$KStreamWindowAggregateProcessor Skipping record for expired window. key=[325233] topic=[MY_TOPIC] partition=[20] offset=[661798621] timestamp=[1600041596350] window=[1600041570000,1600041600000) expiration=[1600059629913] streamTime=[1600145999913]
There seems to be new logic in the KStreamWindowAggregate class that checks whether a window has closed. If it has, the messages are skipped. In 2.0.1, these messages were still processed.
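The check behind this warning can be modeled in a few lines (a simplified plain-Java model, not the actual KStreamWindowAggregate code): a window counts as closed once its end is at or before streamTime minus the grace period, and the logged "expiration" is that close time. With the logged numbers, streamTime - expiration = 86,370,000 ms, i.e. 24 h minus the 30 s window size, which is consistent with the 2.x default grace period when TimeWindows#grace() is not set.

```java
// Simplified model of the "Skipping record for expired window" decision.
final class WindowCloseCheck {

    // Logged as "expiration": windows ending at or before this point are closed.
    static long closeTime(final long streamTimeMs, final long graceMs) {
        return streamTimeMs - graceMs;
    }

    static boolean isExpired(final long windowEndMs, final long streamTimeMs, final long graceMs) {
        return windowEndMs <= closeTime(streamTimeMs, graceMs);
    }

    public static void main(final String[] args) {
        final long streamTime = 1_600_145_999_913L;   // streamTime from the log line
        final long windowEnd = 1_600_041_600_000L;    // end of window [1600041570000,1600041600000)
        final long grace = 86_370_000L;               // 24 h minus the 30 s window size
        System.out.println(closeTime(streamTime, grace));             // 1600059629913, the logged expiration
        System.out.println(isExpired(windowEnd, streamTime, grace));  // true: the record is skipped
    }
}
```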
Question
Is there a way to get the same behavior as before? I'm seeing lots of gaps in my data since this upgrade and I'm not sure how to solve this, as these gaps were not seen before.
The aggregate function that I'm using already deals with windowing and, as a result, with expired windows. How does this new logic relate to these expiring windows?
Update
While exploring further, I do see that it is related to the grace period in ms. In my custom TimestampExtractor (which uses the timestamp from the payload instead of the normal record timestamp), I can see that for the expired-window warnings the incoming timestamp is indeed more than 24 hours later than the event time from the payload.
I assume this is caused by consumer lag of more than 24 hours.
The extract method of the timestamp extractor receives a partition time, which according to the docs is:
partitionTime - the highest extracted valid timestamp of the current record's partition (could be -1 if unknown)
So is this the create time of the record on the topic? And is there a way to influence it so that my records are no longer skipped?
Compared to 2.0.1 these messages were still processed.
That is a little surprising (though I would need to double-check the code), at least with the default config. By default, the store retention time is 24h, so in 2.0.1, messages older than 24h should not have been processed either, because the corresponding state was already purged. If you changed the store retention time to a larger value (via Materialized#withRetention), you also need to increase the window grace period accordingly, via the TimeWindows#grace() method.
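A hedged sketch of that change against the 2.5 DSL, raising the grace period together with the store retention so that retention stays at least window size plus grace. The topic MY_TOPIC comes from the log line in the question; the String keys, Long values, summing reducer, store name "windowed-sums", and the grace values used are assumptions for illustration. On 3.x, TimeWindows.ofSizeAndGrace(size, grace) replaces TimeWindows.of(size).grace(grace).

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.WindowStore;

// Sketch: windowed aggregation that tolerates records up to `grace` behind stream time.
final class LateTolerantWindowing {

    static Topology build(final Duration windowSize, final Duration grace) {
        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("MY_TOPIC", Consumed.with(Serdes.String(), Serdes.Long()))
               .groupByKey()
               // A window accepts records until stream time passes windowEnd + grace.
               .windowedBy(TimeWindows.of(windowSize).grace(grace))
               .reduce(Long::sum,
                       // Retention must cover window size + grace, otherwise Kafka Streams rejects it.
                       Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("windowed-sums")
                               .withRetention(windowSize.plus(grace)));
        return builder.build();
    }
}
```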
The aggregate function that I'm using already deals with windowing and as a result with expired windows. How does this new logic relate to these expiring windows?
I'm not sure what you mean by this, or how you actually do it. The old and new logic are similar with regard to how long a window is stored (the retention time config). The new part is the grace period, which you can increase to (almost) the retention time if you wish: the retention time must be at least the window size plus the grace period.
About "partition time": it is computed based on whatever your TimestampExtractor returns. In your case, it's the maximum of the timestamps you extracted from the message payloads on that partition.

Kafka Streams topology with windowing doesn't trigger state changes

I am building the following Kafka Streams topology (pseudo code):
gK = builder.stream().groupByKey();
g1 = gK.windowedBy(TimeWindows.of(Duration.ofHours(1))).reduce().mapValues().toStream().mapValues().selectKey();
g2 = gK.reduce().mapValues();
g1.leftJoin(g2).to();
As you can see, this is a diamond-shaped topology: it starts at a single input topic and ends at a single output topic, with messages flowing through two parallel flows that are joined together at the end. One flow applies (tumbling?) windowing; the other does not. Both flows work on the same key (apart from the Windowed key introduced intermediately by the windowing).
The timestamps of my messages are event time; that is, they are taken from the message body by my custom TimestampExtractor implementation. The actual timestamps in my messages are several years in the past.
At first sight, it all works well, both in my unit tests with a couple of input/output messages and in the runtime environment (with real Kafka).
The problem seems to appear when the number of messages becomes significant (e.g. 40K).
My failing scenario is the following:
1) ~40K records with the same key get uploaded into the input topic first.
2) ~40K updates come out of the output topic, as expected.
3) Another ~40K records, all with the same key but a different one than in step 1), get uploaded into the input topic.
4) Only ~100 updates come out of the output topic, instead of the expected ~40K new updates. There is nothing special about those ~100 updates; their contents seem to be right, but only for certain time windows. For other time windows there are no updates, even though the flow logic and input data should definitely generate 40K records. In fact, when I swap the datasets in steps 1) and 3), I get exactly the same situation: ~40K updates come from the second dataset (now uploaded first) and only ~100 from the first.
I can easily reproduce this issue locally in unit tests using TopologyTestDriver (but only with larger numbers of input records).
In my tests, I've tried disabling caching via StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG. Unfortunately, that didn't make any difference.
UPDATE
I tried both reduce() and aggregate() calls. The issue persists in both cases.
Something else I'm noticing: both with StreamsConfig.TOPOLOGY_OPTIMIZATION set to StreamsConfig.OPTIMIZE and without it, the mapValues() handler gets called in the debugger before the preceding reduce() (or aggregate()) handler, at least the first time. I didn't expect that.
I tried both join() and leftJoin(); unfortunately, the result is the same.
In the debugger, the second portion of the data doesn't trigger the reduce() handler in the "left" flow at all, but does trigger the reduce() handler in the "right" flow.
With my configuration, if the number of records in each dataset is 100, the problem doesn't manifest itself: I get 200 output messages, as expected. When I raise the number to 200 in each dataset, I get fewer than the 400 expected messages.
So at the moment it seems that "old" windows get dropped and new records for those old windows are ignored by the stream.
There is a window retention setting, but with the default value that I use, I expected windows to retain their state and stay active for at least 12 hours (which significantly exceeds the run time of my unit test).
I tried amending the left reducer with the following window store config:
Materialized.as(
Stores.inMemoryWindowStore(
"rollup-left-reduce",
Duration.ofDays(5 * 365),
Duration.ofHours(1), false)
)
Still no difference in results.
The same issue persists even with only the single "left" flow, without the "right" flow and without join(). It seems that the problem lies in the window retention settings of my setup. The timestamps (event time) of my input records span 2 years, and the second dataset starts again from the beginning of those 2 years. This place in Kafka Streams makes sure that the records of the second dataset get ignored:
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryWindowStore.java#L125
The Kafka Streams version is 2.4.0. I'm also using Confluent dependencies version 5.4.0.
My questions are:
What could be the reason for such behavior?
Did I miss anything in my stream topology?
Is such a topology expected to work at all?
After some time debugging, I found the reason for my problem.
My input datasets contain records with timestamps that span 2 years. When I load the first dataset, the "observed" time of my stream gets set to the maximum timestamp from that input dataset.
Uploading the second dataset, whose first records have timestamps 2 years before the new observed time, causes the stream internals to drop those messages. This can be seen if you set the Kafka logging level to TRACE.
So, to fix my problem I had to configure the retention and grace period for my windows:
instead of
.windowedBy(TimeWindows.of(windowSize))
I have to specify
.windowedBy(TimeWindows.of(windowSize).grace(Duration.ofDays(5 * 365)))
Also, I had to explicitly configure reducer storage settings as:
Materialized.as(
Stores.inMemoryWindowStore(
"rollup-left-reduce",
Duration.ofDays(5 * 365),
windowSize, false)
)
That's it, the output is as expected.
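Putting the pieces of this answer together, here is a sketch of the "left" flow alone (without the join) against the 2.4 DSL. The topic names "input" and "output", the String/Long serdes, and the summing reducer are placeholders. Unlike the snippet above, the store retention here is window size plus grace, so the retention is never shorter than the period during which the window still accepts records.

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.state.Stores;

// Sketch: hourly rollup that still accepts records up to ~5 years behind stream time.
final class RollupLeftFlow {

    static Topology build() {
        final Duration windowSize = Duration.ofHours(1);
        // Event times span ~2 years and datasets are replayed from the start,
        // so grace is set to 5 years, as in the answer.
        final Duration longHorizon = Duration.ofDays(5 * 365);

        final StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.Long()))
               .groupByKey()
               .windowedBy(TimeWindows.of(windowSize).grace(longHorizon))
               .reduce(Long::sum,
                       Materialized.as(Stores.inMemoryWindowStore(
                               "rollup-left-reduce", longHorizon.plus(windowSize), windowSize, false)))
               .toStream()
               // Unwrap the Windowed<String> key back to the plain key for the downstream join.
               .selectKey((windowedKey, sum) -> windowedKey.key())
               .to("output", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}
```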

Avoid Data Loss While Processing Messages from Kafka

I'm looking for the best approach to designing my Kafka consumer. Basically, I would like to know the best way to avoid data loss in case there are any exceptions/errors while processing the messages.
My use case is as follows.
a) The reason I am using a SERVICE to process the message is that in the future I plan to write an ERROR PROCESSOR application, which would run at the end of the day and try to process the failed messages again (not all messages, only those that fail because of missing dependencies, such as a missing parent).
b) I want to make sure there is zero message loss, so I will save the message to a file if there are any issues while saving it to the DB.
c) In the production environment there can be multiple instances of the consumer and services running, so there is a high chance that multiple applications try to write to the same file.
Q-1) Is writing to a file the only option to avoid data loss?
Q-2) If it is the only option, how can I make sure multiple applications can write to and read from the same file at the same time? Please consider that in the future, once the error processor is built, it might be reading messages from the same file while another application is trying to write to it.
ERROR PROCESSOR - Our source follows an event-driven mechanism, and there is a high chance that the dependent event (for example, the parent entity of something) sometimes gets delayed by a couple of days. In that case, I want my ERROR PROCESSOR to process the same messages multiple times.
I've run into something similar before. So, diving straight into your questions:
Not necessarily. You could send those messages back to Kafka in a new topic (let's say error-topic). Then, when your error processor is ready, it can just listen to this error-topic and consume those messages as they come in.
I think this question is addressed by the answer to the first one. Instead of writing to and reading from a file, with multiple file handles open concurrently, Kafka might be a better choice, as it is designed for such problems.
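A minimal sketch of the error-topic idea with the plain Java producer client. The topic name error-topic is the one suggested above; the "error-reason" header, the String serializers, and acks=all are assumptions. publish() blocks until the broker acknowledges the write, so the caller can commit the original offset only after the failed message is safely in Kafka.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch: park messages that failed processing in an error topic instead of a shared file.
final class ErrorTopicPublisher implements AutoCloseable {
    static final String ERROR_TOPIC = "error-topic";

    private final Producer<String, String> producer;

    ErrorTopicPublisher(final Producer<String, String> producer) {
        this.producer = producer;   // e.g. new KafkaProducer<>(producerProps(...))
    }

    static Properties producerProps(final String bootstrapServers) {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.ACKS_CONFIG, "all");   // wait for all in-sync replicas
        return props;
    }

    // Sends the failed message with a hypothetical "error-reason" header and waits for the ack.
    void publish(final String key, final String value, final String reason) {
        final ProducerRecord<String, String> record = new ProducerRecord<>(ERROR_TOPIC, key, value);
        record.headers().add("error-reason", reason.getBytes(StandardCharsets.UTF_8));
        try {
            producer.send(record).get();
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while publishing to " + ERROR_TOPIC, e);
        } catch (final ExecutionException e) {
            throw new IllegalStateException("could not publish to " + ERROR_TOPIC, e.getCause());
        }
    }

    @Override
    public void close() {
        producer.close();
    }
}
```

Since the error processor is just another consumer group on error-topic, one option for the "process the same messages multiple times" requirement is for it to publish a message back to the same topic whenever its parent is still missing.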
Note: The following point is just some food for thought based on my limited understanding of your problem domain, so you can safely ignore it.
One more point worth considering in your design for the service component: you might consider merging points 4 and 5 by sending all the error messages back to Kafka. That will let you process all error messages in a consistent way, as opposed to putting some messages in the error DB and some in Kafka.
EDIT: Based on the additional information on the ERROR PROCESSOR requirement, here's a diagrammatic representation of the solution design.
I've deliberately kept the output of the ERROR PROCESSOR abstract for now just to keep it generic.
I hope this helps!
If you don't commit the consumed message before writing to the database, nothing is lost as long as Kafka retains the message. The tradeoff is that if the consumer did write to the database but the Kafka offset commit fails or times out, you'd end up consuming those records again and potentially processing duplicates in your service.
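A minimal sketch of that ordering with the plain Java consumer, assuming auto-commit is disabled and that writeToDb stands in for your (hypothetical) database write. Offsets are committed only after every record of the polled batch has been written, which gives at-least-once delivery, so the database write should tolerate duplicates.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.serialization.StringDeserializer;

// Sketch of an at-least-once loop: write first, commit offsets afterwards.
final class AtLeastOnceLoop {

    static Properties consumerProps(final String bootstrapServers, final String groupId) {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // commit manually
        return props;
    }

    // One poll/write/commit cycle; call it in a loop on a KafkaConsumer built from consumerProps().
    static void pollOnce(final Consumer<String, String> consumer,
                         final java.util.function.Consumer<ConsumerRecord<String, String>> writeToDb) {
        final ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (final ConsumerRecord<String, String> record : records) {
            // If this throws, commitSync() is never reached: after a restart or rebalance the
            // group resumes from the last committed offset and re-reads the batch. (Continuing
            // to poll the same consumer instance without seeking back would skip it.)
            writeToDb.accept(record);
        }
        consumer.commitSync();   // commits the offsets of everything returned by poll()
    }
}
```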
Even if you did write to a file, you wouldn't be guaranteed ordering unless you opened a file per partition and ensured all consumers only ran on a single machine (because you're preserving state there, which isn't fault-tolerant). Deduplication would still need to be handled as well.
Also, rather than writing your own consumer to a database, you could look into the Kafka Connect framework. For validating messages, you can similarly deploy a Kafka Streams application that filters bad messages out of an input topic and writes the good ones to a topic that is sent to the DB.
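A hedged sketch of that validation step with the Kafka Streams DSL. The isValid rule (non-blank payload) is purely hypothetical, and the topic names input, db-topic, and error-topic are placeholders: valid records go to the topic that a Kafka Connect sink could load into the DB, invalid ones to the error topic.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

// Sketch: route valid records towards the DB topic and invalid ones to an error topic.
final class ValidationTopology {

    // Hypothetical validation rule; replace with real schema/business checks.
    static boolean isValid(final String value) {
        return value != null && !value.trim().isEmpty();
    }

    static Topology build() {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> input =
                builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()));
        final Produced<String, String> produced = Produced.with(Serdes.String(), Serdes.String());
        input.filter((key, value) -> isValid(value)).to("db-topic", produced);
        input.filterNot((key, value) -> isValid(value)).to("error-topic", produced);
        return builder.build();
    }
}
```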