MQTT Mosquitto - Persistence file won't autosave (update) after reaching the time interval - persistence

I have set the autosave interval to 30 seconds, but the broker won't update the persistence file even though the interval has been reached.
These are my settings:
autosave_interval 30
autosave_on_changes true
persistence true
persistence_file mosquitto.db
persistence_location C:\mosquitto\persistence\
I have to manually close the broker in order to get the persistence file updated. Is there any other option I need to turn on, or is there some other condition?
Thank you in advance.

Old question but
If autosave_on_changes is enabled, the autosave_interval is
interpreted as changes in retained, received and queued messages
instead of seconds. The default database file name can be changed with
persistence_file.
What I understand from this is: since your autosave_on_changes is set to true, the 30 is counted as 30 changes to retained, received, and queued messages, not as 30 seconds, so the file is only saved after 30 such changes.
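For reference, a minimal sketch of the settings that would give a time-based save every 30 seconds instead (per the mosquitto.conf documentation, autosave_interval counts seconds only when autosave_on_changes is false):

```
autosave_interval 30
autosave_on_changes false
persistence true
persistence_file mosquitto.db
persistence_location C:\mosquitto\persistence\
```

With autosave_on_changes true, the same value of 30 instead means 30 message changes, which matches the behaviour you are seeing.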

Related

Difference between eventTimeTimeout and processingTimeTimeout in mapGroupsWithState

What is the difference between eventTimeTimeout and processingTimeTimeout in mapGroupsWithState?
Also, is it possible to make a state expire every 10 minutes, and if data for that particular key arrives after the 10 minutes, have the state be maintained from the beginning again?
In short:
processing-based timeouts rely on the time/clock of the machine your job is running on. They are independent of any timestamps given in your data/events.
event-based timeouts rely on a timestamp column within your data that serves as the event time. In that case you need to declare this timestamp as a Watermark.
More details are available in the Scala docs on the relevant class, GroupState:
With ProcessingTimeTimeout, the timeout duration can be set by calling GroupState.setTimeoutDuration. The timeout will occur when the clock has advanced by the set duration. Guarantees provided by this timeout with a duration of D ms are as follows:
Timeout will never occur before the clock time has advanced by D ms
Timeout will occur eventually when there is a trigger in the query (i.e. after D ms). So there is no strict upper bound on when the timeout would occur. For example, the trigger interval of the query will affect when the timeout actually occurs. If there is no data in the stream (for any group) for a while, then there will not be any trigger and the timeout function call will not occur until there is data.
Since the processing time timeout is based on the clock time, it is affected by the variations in the system clock (i.e. time zone changes, clock skew, etc.).
With EventTimeTimeout, the user also has to specify the event time watermark in the query using Dataset.withWatermark(). With this setting, data that is older than the watermark is filtered out. The timeout can be set for a group by setting a timeout timestamp using GroupState.setTimeoutTimestamp(), and the timeout would occur when the watermark advances beyond the set timestamp. You can control the timeout delay by two parameters: (i) the watermark delay and (ii) an additional duration beyond the timestamp in the event (which is guaranteed to be newer than the watermark due to the filtering). Guarantees provided by this timeout are as follows:
Timeout will never occur before the watermark has exceeded the set timeout.
Similar to processing time timeouts, there is no strict upper bound on the delay before the timeout actually occurs. The watermark can advance only when there is data in the stream, and the event time of the data has actually advanced.
"Also, is possible to make a state expire after every 10 min and if the data for that particular key arrives after 10 min the state should be maintained from the beginning?"
This happens automatically when using mapGroupsWithState. You just need to make sure to actually remove the state (GroupState.remove()) once the 10 minutes have passed.
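To make the event-time variant concrete, here is a small self-contained simulation (plain Python, not Spark APIs; all names and numbers are illustrative): the watermark trails the maximum event time by the declared delay, and a key's state is removed once the watermark passes the timeout timestamp set for it.

```python
# Illustrative simulation of EventTimeTimeout semantics (not Spark code).
# Watermark = max event time seen so far minus the declared watermark delay.
# A key's state times out once the watermark passes its timeout timestamp.

WATERMARK_DELAY = 60   # seconds, as declared via Dataset.withWatermark()
TIMEOUT_AFTER = 600    # keep state alive 10 minutes past the last event

states = {}            # key -> (value, timeout_timestamp)
max_event_time = 0

def process(key, value, event_time):
    """Update state for a key and refresh its timeout timestamp."""
    global max_event_time
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - WATERMARK_DELAY
    # Late data older than the watermark is filtered out, as in Spark.
    if event_time < watermark:
        return
    # Expire any state whose timeout timestamp the watermark has passed.
    for k in [k for k, (_, ts) in states.items() if watermark > ts]:
        del states[k]  # the "remove the state" step from the answer
    states[key] = (value, event_time + TIMEOUT_AFTER)

process("A", 1, 1000)
process("B", 2, 1000)
# An event 12 minutes later advances the watermark past A's and B's
# timeout timestamps, so both expire; only "C" remains afterwards.
process("C", 3, 1000 + 720)
```

In real mapGroupsWithState code, the expiry branch corresponds to the invocation where GroupState.hasTimedOut is true, which is where you would emit a final result and call state.remove().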

KStreamWindowAggregate 2.0.1 vs 2.5.0: skipping records instead of processing

I've recently upgraded my Kafka Streams application from 2.0.1 to 2.5.0. As a result I'm seeing a lot of warnings like the following:
org.apache.kafka.streams.kstream.internals.KStreamWindowAggregate$KStreamWindowAggregateProcessor Skipping record for expired window. key=[325233] topic=[MY_TOPIC] partition=[20] offset=[661798621] timestamp=[1600041596350] window=[1600041570000,1600041600000) expiration=[1600059629913] streamTime=[1600145999913]
There seems to be new logic in the KStreamWindowAggregate class that checks whether a window has closed. If it has been closed, the records are skipped. In 2.0.1 these messages were still processed.
Question
Is there a way to get the same behavior as before? I'm seeing lots of gaps in my data with this upgrade and I'm not sure how to solve this, as previously these gaps were not seen.
The aggregate function that I'm using already deals with windowing and, as a result, with expired windows. How does this new logic relate to these expiring windows?
Update
While exploring further, I can indeed see that it is related to the grace period in ms. In my custom timestamp extractor (which has the logic to use the timestamp from the payload instead of the normal record timestamp), I can see that for the expired-window warnings the incoming record timestamp is indeed more than 24 hours ahead of the event time from the payload.
I assume this is caused by consumer lag of over 24 hours.
The timestamp extractor extract method has a partition time which according to the docs:
partitionTime: the highest extracted valid timestamp of the current record's partition (could be -1 if unknown)
so is this the create time of the record on the topic? And is there a way to influence this in a way that my records are no longer skipped?
Compared to 2.0.1, these messages were still processed.
That is a little bit surprising (even if I would need to double-check the code), at least for the default config. By default, store retention time is set to 24h, and thus in 2.0.1 messages older than 24h should also not be processed, as the corresponding state has already been purged. If you changed the store retention time (via Materialized#withRetention) to a larger value, you would also need to increase the window grace period via the TimeWindows#grace() method accordingly.
The aggregate function that I'm using already deals with windowing and as a result with expired windows. How does this new logic relate to this expiring windows?
Not sure what you mean by this or how you actually do it? The old and new logic are similar with regard to how long a window is stored (the retention time config). The new part is the grace period, which you can increase to the same value as the retention time if you wish.
About "partition time": it is computed based on whatever the TimestampExtractor returns. In your case, it's the max of whatever you extracted from the message payloads.
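As a rough sketch of the expiry check (simplified Python, not the actual KStreamWindowAggregate code): stream time is the maximum record timestamp the task has seen, and a record is skipped once its window's end plus the grace period lies behind stream time. The numbers below are taken from the warning in the question, with the default grace of 24h retention minus the 30s window size.

```python
# Simplified sketch of the expired-window check (not Kafka Streams code).
WINDOW_SIZE_MS = 30_000
GRACE_MS = 86_370_000  # default: 24h retention minus the 30s window size

stream_time = 0  # highest record timestamp seen so far on this task

def accept(record_ts):
    """Return True if the record's window is still open for updates."""
    global stream_time
    stream_time = max(stream_time, record_ts)
    window_start = record_ts - (record_ts % WINDOW_SIZE_MS)
    window_end = window_start + WINDOW_SIZE_MS
    # The record is skipped once stream time has passed window end + grace.
    return window_end + GRACE_MS > stream_time

# Replay of the numbers from the warning above: stream time has raced
# ahead (e.g. due to >24h consumer lag), so the old record is skipped.
accept(1600145999913)         # advances stream time
print(accept(1600041596350))  # False: window [...570000, ...600000) expired
```

Note that with these values, stream time minus grace is 1600059629913, which is exactly the expiration shown in the warning, consistent with the >24h lag explanation.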

Kafka Connect fetch.max.wait.ms & fetch.min.bytes combined not honored?

I'm creating a custom SinkConnector using Kafka Connect (2.3.0) that needs to be optimized for throughput rather than latency. Ideally, what I want is:
Batches of ~20 megabytes or 100k records, whichever comes first; but if the message rate is low, process at least once every minute (avoid small batches, but keep a minimum MySinkTask.put() rate of once per minute).
This is what I set for consumer settings in an attempt to accomplish it:
consumer.max.poll.records=100000
consumer.fetch.max.bytes=20971520
consumer.fetch.max.wait.ms=60000
consumer.max.poll.interval.ms=120000
consumer.fetch.min.bytes=1048576
I need this fetch.min.bytes setting, or else MySinkTask.put() is called multiple times per second despite the other settings.
Now, what I observe in a low-rate situation is that MySinkTask.put() is called with 0 records multiple times and several minutes pass by, until fetch.min.bytes is reached, and then I get them all at once.
I fail to understand so far:
Why is fetch.max.wait.ms=60000 not propagating down from the consumer to the put() call of my connector? Shouldn't it take precedence over fetch.min.bytes?
What setting controls the ~2x-per-second calls to MySinkTask.put() when fetch.min.bytes=1 (the default)? I don't understand why it does that; even the verbose output of the Connect runtime settings doesn't show any interval below multiples of seconds.
I've double-checked the log output, and the INFO o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values: lines printed by the Connect runtime show the expected values as passed via the consumer.-prefixed settings.
The "process at least every interval" part seems not possible, as the fetch.min.bytes consumer setting takes precedence and Connect does not allow you to dynamically adjust the ConsumerConfig while the Task is running. :-(
The work-around for now is batching manually in the Task: set fetch.min.bytes to 1 (yikes), buffer records in the Task on put() calls, and flush when necessary. This is not ideal, as it adds some overhead to the connector that I had hoped to avoid.
The logic by which Connect does its ~2x-per-second batching from its consumer's poll to SinkTask.put() remains a mystery to me, but it's better than being called for every message.
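The manual-batching work-around can be sketched independently of the Connect runtime (plain Python; RecordBatcher and its thresholds are made-up names for illustration, and in a real SinkTask the flush callback would write to the sink, with the age check also driven by calls that carry empty record lists):

```python
import time

class RecordBatcher:
    """Buffer records across put() calls; flush on a size or age threshold.

    Illustrative stand-in for batching inside SinkTask.put() -- not a
    Kafka Connect API.
    """

    def __init__(self, flush_fn, max_records=100_000, max_age_s=60.0,
                 clock=time.monotonic):
        self.flush_fn = flush_fn
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffer = []
        self.oldest = None  # time the oldest buffered record arrived

    def put(self, records):
        # Called frequently with small (possibly empty) batches when
        # fetch.min.bytes=1; accumulate instead of writing immediately.
        if records and self.oldest is None:
            self.oldest = self.clock()
        self.buffer.extend(records)
        if (len(self.buffer) >= self.max_records or
                (self.oldest is not None and
                 self.clock() - self.oldest >= self.max_age_s)):
            self.flush_fn(self.buffer)
            self.buffer = []
            self.oldest = None

flushed = []
b = RecordBatcher(flushed.append, max_records=3, max_age_s=60.0)
b.put(["r1"]); b.put(["r2"])  # below both thresholds: nothing flushed yet
b.put(["r3"])                 # size threshold reached: one batch of three
```

A real implementation would also flush from SinkTask.flush()/preCommit() so that offsets are only committed for records actually written to the sink.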

kafka Streams session windows

Hello, I am working with Kafka session windows with an inactivity gap of 5 minutes. I want some kind of feedback when the inactivity gap is reached and the session is dropped for a key.
Let's assume I have a record
(A,1)
where 'A' is the key. Now, if I don't get any record with key 'A' for 5 minutes, the session is dropped.
I want to perform some operation at the end of the session, let's say (value)*2 for that session. Is there any way I can achieve this using the Kafka Streams API?
Kafka Streams does not drop a session after the gap time has passed. Instead, it will create a new session if another record with the same key arrives after the gap time has passed, and it maintains both sessions in parallel. This allows it to handle out-of-order data. It can even happen that two sessions get merged, if an out-of-order record falls into the gap and "connects" the two sessions with each other.
Sessions are maintained for 1 day by default. You can change this via the SessionWindows#until() method. If a session expires, it is dropped silently; there is no notification. You also need to consider the config parameter window.store.change.log.additional.retention.ms:
The default retention setting is Windows#maintainMs() + 1 day. You can override this setting by specifying StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG in the StreamsConfig.
Thus, if you want to react when time has passed, you should look into punctuations, which allow you to register regular callbacks (a kind of timer) based either on "event-time progress" or on wall-clock time. This allows you to react if a session has not been updated for a certain period of time and you consider it "completed".
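The punctuation idea can be sketched as follows (plain Python, not the Streams Processor API; the 5-minute gap and the (value)*2 operation come from the question): track the last-seen timestamp per key, and on each timer callback close and emit the sessions that have been idle longer than the gap.

```python
GAP_MS = 5 * 60 * 1000  # session inactivity gap from the question

sessions = {}  # key -> (aggregated value, last-seen timestamp)

def process(key, value, ts):
    """Add a record to the key's open session (sum as a stand-in aggregate)."""
    old, _ = sessions.get(key, (0, ts))
    sessions[key] = (old + value, ts)

def punctuate(now_ms):
    """Timer callback: close idle sessions and emit (value * 2) for each."""
    emitted = []
    for key in [k for k, (_, last) in sessions.items()
                if now_ms - last >= GAP_MS]:
        value, _ = sessions.pop(key)
        emitted.append((key, value * 2))  # end-of-session operation
    return emitted

process("A", 1, 0)
process("A", 2, 60_000)    # still within the 5-minute gap
print(punctuate(120_000))  # [] : session "A" is still active
print(punctuate(400_000))  # [('A', 6)] : idle > 5 min, emit 3*2
```

In Streams itself this would live in a Processor/Transformer with a state store, with punctuate registered via ProcessorContext.schedule().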

How to configure message Time-To-Live in MSMQ?

Does anyone know if it is possible to configure the message time-to-live in MSMQ, so that messages are moved to the dead-letter queue once the time-to-live has elapsed? I know there is some default value, but I don't know where it is or how to change it.
Just want to emphasise that I know how to do it programmatically when sending a message to a queue. But I need to change it in MSMQ itself. Per queue would be great, but if that's not possible, then for the whole MSMQ.
I found the LongLiveTime parameter in the registry (HKEY_LOCAL_MACHINE/SOFTWARE/Microsoft/MSMQ/Parameters/MachineCache/), which is by default set to 345600 (seconds? = 4 days). I changed this value to 30 (seconds) and restarted the machine, but it did not work. Moreover, the value was automatically reverted to 345600.
Can it be done?
Thank you
To set the Time-To-Reach-Queue (TTRQ) for a server, see Set the default lifetime for messages:
You can use this procedure to set the default lifetime for Message
Queuing messages. The lifetime of a Message Queuing message specifies
the maximum time interval for a message to reach a destination queue.
If this time interval is exceeded before the message reaches the
destination queue then the message is placed in the deadletter queue
if the PROPID_M_JOURNAL property of the message is set to
MQMSG_DEADLETTER.
Membership in \Domain Users, or equivalent, is the minimum
required to complete this procedure.
To set the default lifetime for Message Queuing messages:
Click Start, point to Programs, point to Administrative Tools, and then click Active Directory Sites and Services.
On the View menu, click Show Services Node.
In the console tree, right-click MsmqServices.
Where? - Active Directory Sites and Services/Services/MsmqServices
Click Properties.
On the General page, type a new value and select new units as needed.
Edit:
The only way to set the TTBR (Time-To-Be-Received) is when sending the message, as it includes the time taken for the message to reach the destination queue:
In each hop, Message Queuing subtracts the time elapsed on the
applicable computer from MaxTimeToReceive when it dispatches the
message to the next computer, where a new timer is set. After a
message arrives at the destination queue, MaxTimeToReceive can be used
to find out how much time remains in the time-to-be-received timer.