When using wall clock time, is the timestamp passed to a punctuator milliseconds since UNIX epoch? (Kafka)

I'm using wall clock time and want to compare the timestamp passed to a punctuator's punctuate method to a UNIX timestamp stored in a record field.
Will the timestamp passed to the punctuator always represent milliseconds since the UNIX epoch? It would also be helpful to know what Java code is used to get the wall clock time.

Yes. For WALL_CLOCK_TIME punctuation, the timestamp passed is the system time, i.e., milliseconds since the UNIX epoch, as returned by System.currentTimeMillis().
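Since both values are epoch-based, the comparison is plain integer arithmetic once the units match. A minimal sketch in Python, assuming (hypothetically) that the record field holds seconds rather than milliseconds; the function names are made up for illustration and are not part of the Kafka Streams API:

```python
import time


def now_epoch_ms() -> int:
    """Python analogue of Java's System.currentTimeMillis()."""
    return time.time_ns() // 1_000_000


def seconds_to_ms(epoch_seconds: int) -> int:
    """Convert a UNIX timestamp in seconds to milliseconds."""
    return epoch_seconds * 1000


def is_due(punctuate_ts_ms: int, record_ts_ms: int) -> bool:
    """True once the wall clock timestamp has reached the record's timestamp."""
    return punctuate_ts_ms >= record_ts_ms


# Hypothetical record field stored in seconds; convert before comparing.
record_field_seconds = 1_700_000_000
print(is_due(now_epoch_ms(), seconds_to_ms(record_field_seconds)))
```

If the record field is already in milliseconds, the conversion step is unnecessary.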

Related

Implementing API throttling with RDB

I would like to implement this API throttling:
A user can only execute the operation once per minute (once executed, following requests will be rejected for 1 minute)
The expected total number of requests from all users is around 2 per second.
I am using PostgreSQL 14.5.
I guess I will need a table for exclusive processing. What kind of SQL/algorithm should I use?
You could store the timestamp of the latest accepted request in a column, one row per user. Every time a request arrives, the code checks whether less than a minute has passed since the last accepted timestamp and rejects the request if so; otherwise it accepts the request and updates the column.
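A minimal sketch of that check, with a Python dict standing in for the table (the names try_accept and last_accepted are hypothetical). In PostgreSQL the check and the update would need to happen atomically, for example in a single statement or inside a transaction that locks the user's row; that part is not shown here.

```python
from typing import Dict

WINDOW_SECONDS = 60.0  # one accepted request per user per minute


def try_accept(last_accepted: Dict[str, float], user_id: str, now: float,
               window: float = WINDOW_SECONDS) -> bool:
    """Accept the request unless the user's last accepted one is too recent.

    `last_accepted` plays the role of the table: user_id -> timestamp (seconds).
    """
    previous = last_accepted.get(user_id)
    if previous is not None and now - previous < window:
        return False  # still inside the one-minute window: reject
    last_accepted[user_id] = now  # record the newly accepted request
    return True


table: Dict[str, float] = {}
print(try_accept(table, "alice", 1000.0))  # first request
print(try_accept(table, "alice", 1030.0))  # 30 s later
print(try_accept(table, "alice", 1060.0))  # 60 s after the accepted one
```

Only accepted requests update the stored timestamp, so rejected requests do not extend the waiting period.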

Difference between eventTimeTimeout and processingTimeTimeout in mapGroupsWithState

What is the difference between eventTimeTimeout and processingTimeTimeout in mapGroupsWithState?
Also, is it possible to make a state expire every 10 minutes, so that if data for that particular key arrives after 10 minutes the state is rebuilt from the beginning?
In short:
Processing-time timeouts rely on the clock of the machine your job is running on. They are independent of any timestamps in your data/events.
Event-time timeouts rely on a timestamp column within your data that serves as the event time. In that case you need to declare a watermark on this timestamp column.
More details are available in the Scaladoc for the relevant class, GroupState:
With ProcessingTimeTimeout, the timeout duration can be set by calling GroupState.setTimeoutDuration. The timeout will occur when the clock has advanced by the set duration. Guarantees provided by this timeout with a duration of D ms are as follows:
Timeout will never occur before the clock time has advanced by D ms.
Timeout will occur eventually when there is a trigger in the query (i.e. after D ms). So there is no strict upper bound on when the timeout would occur. For example, the trigger interval of the query will affect when the timeout actually occurs. If there is no data in the stream (for any group) for a while, then there will not be any trigger and the timeout function call will not occur until there is data.
Since the processing time timeout is based on the clock time, it is affected by the variations in the system clock (i.e. time zone changes, clock skew, etc.).
With EventTimeTimeout, the user also has to specify the event time watermark in the query using Dataset.withWatermark(). With this setting, data that is older than the watermark is filtered out. The timeout can be set for a group by setting a timeout timestamp using GroupState.setTimeoutTimestamp(), and the timeout would occur when the watermark advances beyond the set timestamp. You can control the timeout delay by two parameters: (i) the watermark delay and (ii) an additional duration beyond the timestamp in the event (which is guaranteed to be newer than the watermark due to the filtering). Guarantees provided by this timeout are as follows:
Timeout will never occur before the watermark has exceeded the set timeout.
Similar to processing time timeouts, there is no strict upper bound on the delay when the timeout actually occurs. The watermark can advance only when there is data in the stream, and the event time of the data has actually advanced.
"Also, is it possible to make a state expire every 10 minutes, so that if data for that particular key arrives after 10 minutes the state is rebuilt from the beginning?"
This happens automatically when using mapGroupsWithState. You just need to make sure to actually remove the state once the 10-minute timeout fires (for example, by calling GroupState.remove() when GroupState.hasTimedOut is true).
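The processing-time guarantees quoted above ("never before D ms", "eventually, at a trigger") can be illustrated with a toy model. This is not Spark code: timeout_fire_time is a hypothetical helper that only mimics the stated guarantees, with trigger times given as plain millisecond values.

```python
from typing import Iterable, Optional


def timeout_fire_time(set_at_ms: int, duration_ms: int,
                      trigger_times_ms: Iterable[int]) -> Optional[int]:
    """Return the time of the first trigger at or after the deadline.

    Toy model: the timeout can only be observed when a trigger runs, so it
    never fires before set_at_ms + duration_ms and may fire arbitrarily later.
    Returns None if no trigger ever reaches the deadline.
    """
    deadline = set_at_ms + duration_ms
    for trigger in sorted(trigger_times_ms):
        if trigger >= deadline:
            return trigger
    return None


TEN_MINUTES_MS = 10 * 60 * 1000
# Triggers at 5 and 15 minutes: the 10-minute timeout is seen at 15 minutes.
print(timeout_fire_time(0, TEN_MINUTES_MS, [300_000, 900_000]))
# No triggers at all (no data in the stream): the timeout never fires.
print(timeout_fire_time(0, TEN_MINUTES_MS, []))
```

The same shape applies to event-time timeouts if the trigger times are read as watermark values instead of clock times.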

Calculating the timezone offset of a future date knowing only the current offset

On 3rd March 2021, the US Eastern time zone is 5 hours behind UTC, so the offset is 300 minutes.
For a future date, 01-Apr-2021, the clocks will have changed and the offset will be different.
But if the code only knows the offset and doesn't know which time zone the offset came from, am I correct that it is impossible to determine the future date's offset?
Yes, you are correct. There are many other time zones that also use the UTC-5 offset. Some of them do not switch for DST at all. Some of them switch for DST at a later or earlier date. For some of them, UTC-5 is the DST offset, so they switch back to UTC-6 rather than forward to UTC-4.
To have any understanding of how an offset will change over time, you need to identify the time zone, not just the offset. Preferably, you would use an IANA time zone identifier, such as "America/New_York".
See the timezone tag wiki, in particular the section titled "Time Zone != Offset".
Additionally, note that even with a time zone identified, future offsets are always just an estimate. If a government changes their mind about what the time zone or DST rules are between the date your time zone data was last updated and the date such a change goes into effect, then the offset you determined might be incorrect under the new rules. There's not much you can do about that, other than to not speculate too far into the future, and to urge governments not to make short-notice changes.
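To make the ambiguity concrete, here is a small sketch using Python's zoneinfo module (Python 3.9+; it needs the system time zone database or the tzdata package). The helper minutes_behind_utc is a hypothetical name, and its sign convention (positive means behind UTC) is chosen to match the "300 minutes" in the question.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def minutes_behind_utc(zone_name: str, year: int, month: int, day: int) -> int:
    """Offset at local noon on the given date, in minutes behind UTC."""
    local_noon = datetime(year, month, day, 12, 0, tzinfo=ZoneInfo(zone_name))
    return (-local_noon.utcoffset()) // timedelta(minutes=1)


# Two zones that share the same offset on 3 March 2021 diverge by 1 April;
# America/Chicago only reaches the 300-minute offset while on DST.
for zone in ("America/New_York", "America/Bogota", "America/Chicago"):
    march = minutes_behind_utc(zone, 2021, 3, 3)
    april = minutes_behind_utc(zone, 2021, 4, 1)
    print(f"{zone}: 3 Mar = {march} min, 1 Apr = {april} min")
```

Given only "300 minutes" on 3 March, the code cannot tell America/New_York from America/Bogota, and the two give different answers for 1 April.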

In Linux, when reading an I2C-based RTC, who handles counter carry-over conditions?

When reading multiple bytes from an I2C-based RTC, it seems possible for one of the values to increment while the bytes are being read.
For instance, if the time is:
2014-12-31 23:59:59
as you're reading this value, the time may roll-over to
2015-01-01 00:00:00
so you may actually read:
2015-01-01 23:59:59
(depending on which values you read first).
So, is it the rtc driver's responsibility to ensure a reliable read?
Reading the datasheet for the DS1337, page 9 states:
When reading or writing the time and date registers, secondary (user)
buffers are used to prevent errors when the internal registers update.
When reading the time and date registers, the user buffers are
synchronized to the internal registers on any start or stop and when
the register pointer rolls over to zero.
Therefore, if reading (or writing) occurs with a single I2C operation (without wrapping around), the RTC device guarantees that everything is synchronized.
[I haven't examined the datasheets for any other devices, but I assume they all work similarly.]
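As a sketch of what the consuming side can look like, the function below decodes seven bytes that were fetched in one burst read starting at register 0. It assumes a DS1337-style layout (seconds, minutes, hours, day of week, date, month, year, all BCD), 24-hour mode, and years 2000-2099; the function names are hypothetical and the I2C transfer itself is not shown.

```python
from datetime import datetime


def bcd_to_int(value: int) -> int:
    """Decode one packed BCD byte (two decimal digits)."""
    return (value >> 4) * 10 + (value & 0x0F)


def decode_rtc_burst(regs: bytes) -> datetime:
    """Decode registers 0x00-0x06 taken from a single burst read.

    Assumptions: DS1337-style register order, 24-hour mode, century bit
    ignored and years treated as 2000-2099.
    """
    if len(regs) != 7:
        raise ValueError("expected exactly 7 register bytes")
    seconds = bcd_to_int(regs[0] & 0x7F)
    minutes = bcd_to_int(regs[1] & 0x7F)
    hours = bcd_to_int(regs[2] & 0x3F)   # 24-hour mode assumed
    day = bcd_to_int(regs[4] & 0x3F)     # regs[3] is day of week, unused here
    month = bcd_to_int(regs[5] & 0x1F)   # mask off the century bit
    year = 2000 + bcd_to_int(regs[6])
    return datetime(year, month, day, hours, minutes, seconds)


# Register image for the moment just before the roll-over in the question.
burst = bytes([0x59, 0x59, 0x23, 0x04, 0x31, 0x12, 0x14])
print(decode_rtc_burst(burst))
```

Because all seven bytes come from one transfer, the decoded value is either entirely before or entirely after the roll-over, never a mixture.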

RTSP RTP client streaming, timestamp, live555

I have an IP camera located in a different country (and time zone) from my PC, with its own date-time set (for example, ~2012-04-16 11:30:00, while my PC's time is ~2012-04-16 06:10:00).
My purpose:
When streaming, I need to get the date-time value that is set in the camera ("11:30:00")
(I'm not interested in a current local time of my PC).
Is there any way to calculate camera's date-time value from RTP's timestamp?
Is there any other approach?
I'm using the Live555 library, and to retrieve a frame's date-time I was using the "presentation time" value, but this gives me the local time of my PC (not the time that is set in my camera).
So I'm stuck here.
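Read the RFC on the RTP packet layout (RFC 3550).
Note that the timestamp is a 32-bit field at bit offset 32 of the RTP header (bytes 4-7). It is set by the camera that encoded the stream, but it is a media-clock counter with a random starting value rather than a date-time; relating it to the camera's wall clock requires the NTP timestamp carried in the RTCP sender reports.
For a C++ implementation that processes RTP packets and headers, including the timestamp, see the link.
For a Java implementation of an RTP packet handler, see here.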
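For reference, extracting the timestamp from the 12-byte fixed RTP header takes only a few lines. This is a minimal sketch based on the RFC 3550 header layout; RtpHeader and parse_rtp_header are hypothetical names, and CSRC entries and header extensions are ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int  # media-clock ticks, not a date-time
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed 12-byte RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,           # top two bits of byte 0
        payload_type=second & 0x7F,   # low seven bits of byte 1
        sequence_number=seq,          # bytes 2-3
        timestamp=timestamp,          # bytes 4-7
        ssrc=ssrc,                    # bytes 8-11
    )


# Hand-built sample packet: version 2, payload type 96, timestamp 90000.
sample = struct.pack("!BBHII", 0x80, 96, 1234, 90000, 0xDEADBEEF) + b"payload"
print(parse_rtp_header(sample))
```

The value returned in timestamp counts ticks of the payload's media clock, so on its own it does not give the camera's "11:30:00".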