Mongo changestream timestamp - mongodb

When tailing the oplog, I see a timestamp for each event. Change streams have advantages over tailing the oplog directly, so I'd like to use those. However, I can't find any way of figuring out when a change occurred. This would be problematic if my script went down for a while and then resumed using a resume token.
Is there any way of getting that timestamp?

I can't find any way of figuring out when a change occurred.
Currently (MongoDB v3.6), there is no way to find out the timestamp of an event returned by the server from the receiving end. This is because the cluster time's timestamp is embedded in the resume token in a binary format.
There is a ticket requesting a tool to inspect this resume token: SERVER-32283. Feel free to watch/upvote the ticket for updates.
This would be problematic if my script went down for a while and then resumed using a resume token.
When you resume a change stream using a resume token, it will resume from that point forward. This is because the token contains the cluster time, so the server knows the last operation the token has 'seen'.
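As a minimal sketch with the Node.js driver (the connection string, collection names, and the token-persistence helpers are assumptions for illustration), storing and reusing the token might look like this:

```typescript
import { MongoClient } from "mongodb";

// Hypothetical helpers: persist the token between runs (a file, another collection, etc.).
async function loadToken(): Promise<unknown> {
  return undefined; // e.g. read the last saved token from disk
}
async function saveToken(_token: unknown): Promise<void> {
  // e.g. write the token to disk after each processed event
}

async function main() {
  const client = new MongoClient("mongodb://localhost:27017"); // assumed URI
  await client.connect();
  const orders = client.db("shop").collection("orders"); // assumed names

  let token = await loadToken();
  // Resume from the last saved token if we have one, otherwise start fresh.
  const stream = orders.watch([], token ? { resumeAfter: token } : {});

  for await (const change of stream) {
    // ...process the change event...
    token = change._id;      // the event's _id is its resume token
    await saveToken(token);  // save it so a restart can pick up from here
  }
}

main().catch(console.error);
```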
You also said down for a while. Change streams are built upon the replica set oplog, which means their resumability is limited by the size of the oplog window.
For example, if the last cached token is 24 hours old and the oplog window is only 12 hours, your application will not be able to use the resume token. Since you're comparing change streams with tailing the oplog, both have the same potential issue in this regard.
If this is a real concern for your use case, adjust your oplog window size accordingly, i.e. make it larger than the longest downtime the receiving client might have.
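To make that concrete, here is a rough sketch of estimating the current oplog window with the Node.js driver (reading local.oplog.rs requires connecting to a member with suitable privileges; in the shell, rs.printReplicationInfo() reports the same figure):

```typescript
import { MongoClient } from "mongodb";

// Estimate the oplog window in hours by comparing the wall-clock time ("wall",
// present on MongoDB 3.6+ oplog entries) of the oldest and newest entries.
async function oplogWindowHours(client: MongoClient): Promise<number> {
  const oplog = client.db("local").collection("oplog.rs");
  const first = await oplog.find().sort({ $natural: 1 }).limit(1).next();
  const last = await oplog.find().sort({ $natural: -1 }).limit(1).next();
  if (!first?.wall || !last?.wall) {
    throw new Error("could not read the oplog bounds");
  }
  return (last.wall.getTime() - first.wall.getTime()) / 3_600_000;
}
```

If the number this returns is smaller than the longest downtime you expect for the receiving client, grow the oplog before relying on resume tokens.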
See also Change Streams Production Recommendations

Related

MongoDB Atlas dedicated cluster: Should I be concerned about 'Restarts in last hour' alerts?

We’re using a standard 3-node Atlas replica set in a dedicated cluster (M10, Mongo 6.0.3, AWS) and have configured an alert to fire if the ‘Restarts in last hour is’ rule exceeds 0 for any node.
https://www.mongodb.com/docs/atlas/reference/alert-conditions/#mongodb-alert-Restarts-in-Last-Hour-is
We’re seeing this alert fire every now and then, and we’re wondering what this means for a node in a dedicated cluster and whether it’s something to be concerned about, since I don’t think we have any control over it. Should we disable this rule or increase the restart threshold?
Thanks in advance for any advice.
(Note I've asked this over at the Mongo community support site also, but haven't received any traction yet so asking here too)
I got an excellent response on my question at the Mongo community support site:
A node restarting is not necessarily a cause for concern. However, you should investigate the cause of the restart itself to better determine if this is an issue or not. You should take a look at your Project Activity Feed to see if you can determine why the nodes are restarting. I understand you have noted this is an M10 cluster, so you should have access to the MongoDB logs; you can also check those to try to determine the cause of the node restart. If you do not have access to the logs, you can consider working with Atlas in-app chat support to diagnose the issue.
It’s always good to keep the alerts active, as they can indicate a potential problem as soon as they occur. You can consider increasing the restart threshold to reduce alert noise after concluding whether the restarts are expected or not.
In my case, having checked the activity feed, I was able to match up all the alerts we were seeing with Mongo version auto-updates on the nodes. We still wanted to keep the alert, so we've increased our threshold to fire on >1 restart per hour rather than >0, assuming that auto-updates won't be applied multiple times in the same hour.

MarkLogic DLS Document Checkout Timeout

Does MarkLogic DLS offer a similar file versioning experience to Subversion?
Under Subversion, once the file (document) has been locked, others cannot update it anymore unless the file has been checked in (committed) or the lock has been released.
However in MarkLogic Library Services (DLS), once the document has been checked out, others could still call dls:document-checkout-update-checkin to update and release the lock. Does it mean it is the developer who should use those dls functions to implement the file lock and unlock mechanism?
I tried to use the timeout parameter in dls:document-checkout. However, it seems the document will remain in the checked-out status forever. But I do see that parameter when I call 'dls:document-checkout-status'.
Does it mean that it is the developer who should check the server timestamp together with the initial checkout timestamp and timeout duration to determine whether the file is still in lock status?
If so, I will need to write some XQuery programs and set up a scheduled task in ML to clean up stale checkouts daily. Is my understanding correct?
Per https://docs.marklogic.com/guide/app-dev/dls#id_56448, I believe the timeout is not enforced automatically - i.e. there's no background process in MarkLogic that is periodically inspecting documents to see if they should be automatically checked back in or un-checked out. The timeout appears to be meant to be used by a developer to apply their own logic with it - e.g. allowing a UI to state that "Jane checked this document out and only intended to keep it for 10 minutes, but that was 2 hours ago - would you like to break her checkout?"

General question: Firestore offline persistence and synchronization

I could not find detailed information in the documentation. I have several questions regarding the offline persistence of Firestore.
I understood that Firestore caches everything locally and syncs it back once online. My questions:
If I attach an onCompleteListener to my setDocument method, it only fires when the device is online and has network access. But with offline persistence enabled, how can I detect that data has successfully been written to the cache (is it always successful?!)? I see the data is immediately there without any listener ever triggering.
What if I write data to the cache while the device is offline, it then comes back online, and everything gets synced, but at that point some sort of error happens (so the onSuccessListener would contain an error, while the persistence cache already has the data)? How do I know that offline and online data are ALWAYS in sync once the network connection is restored on all devices?
What about race conditions? Let's say two users update a document at the "same time" while the device is offline. What happens once it comes back online?
But the most pressing question is: right now I continue with my program flow when the onSuccessListener fires, but it never does as long as the device is offline (showing an indefinite progress bar forever). I still need to continue with my program (that's why we have offline persistence). How do I do this?
How can I detect that data has successfully been written to the cache
This is the case when the statement that writes the data has completed. If writing to the local cache fails, an exception is thrown from that write statement.
Your second point is hard to summarize, but:
Firestore keeps the pending writes separate from the snapshots it returns for local reads, and will update the cached snapshot correctly both for successful and for rejected writes.
If you want to know whether the snapshot you read contains any pending writes, you can check the pendingWrites field in its metadata.
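As a rough illustration of both points (the question is about the mobile SDKs; this sketch uses the web SDK in TypeScript, where the same metadata is exposed, and the project config and document path are made up):

```typescript
import { initializeApp } from "firebase/app";
import {
  getFirestore, enableIndexedDbPersistence, doc, setDoc, onSnapshot,
} from "firebase/firestore";

const app = initializeApp({ /* your Firebase config */ });
const db = getFirestore(app);
// Offline persistence is on by default on Android/iOS; on the web it is opt-in.
enableIndexedDbPersistence(db).catch(console.error);

const ref = doc(db, "messages", "msg-1"); // assumed path

// Listen with metadata changes included, so the callback fires again when the
// pending-write flag flips after the server acknowledges (or rejects) the write.
onSnapshot(ref, { includeMetadataChanges: true }, (snap) => {
  if (snap.metadata.hasPendingWrites) {
    console.log("written to the local cache, not yet committed on the server");
  } else {
    console.log("committed on the server");
  }
});

// The returned promise resolves only once the server has acknowledged the write;
// while offline, the listener above still fires immediately from the cache.
setDoc(ref, { text: "hello" }).catch(console.error);
```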
What about race conditions? Let's say two users update a document at the "same time" while the device is offline. What happens once it comes back online?
The last write wins. If that's not what you need, use security rules to enforce your requirements on the server.

Issues with matching service in Cadence

Two days ago, we started seeing some issues with our Cadence setup.
The first thing we noticed is that Open workflows were not disappearing from the list once they completed. For example, a workflow appears as Open in the list, but when you click on it, you will see that it’s actually completed.
At the same time this started happening, we noticed that several workflows would take quite a long time to complete; several of them would get stuck in “Schedule” states and never go further from there. After checking the logs, the only error we saw was this:
{"level":"error","ts":"2021-03-06T19:12:04.865Z","msg":"Persistent store operation failure","service":"cadence-matching","component":"matching-engine","wf-task-list-name":"cadence-sys-history-scanner-tasklist-0","wf-task-list-type":1,"store-operation":"create-task","error":"InternalServiceError{Message: CreateTasks operation failed. Error : Request on table cadence.tasks with ttl of 630720000 seconds exceeds maximum supported expiration date of 2038-01-19T03:14:06+00:00. In order to avoid this use a lower TTL, change the expiration date overflow policy or upgrade to a version where this limitation is fixed. See CASSANDRA-14092 for more details.}","wf-task-list-name":"cadence-sys-history-scanner-tasklist-0","wf-task-list-type":1,"number":6300094,"next-number":6300094,"logging-call-at":"taskWriter.go:176","stacktrace":"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\n\t/cadence/common/log/loggerimpl/logger.go:134\ngithub.com/uber/cadence/service/matching.(*taskWriter).taskWriterLoop\n\t/cadence/service/matching/taskWriter.go:176"}
Does somebody have an idea of why this is happening?
The first one is because visibility sampling is enabled by default (to protect the default core DB). You can disable it by configuring system.enableVisibilitySampling to false.
But when doing that, it’s better to separate the visibility and default stores into different database clusters so that visibility doesn’t bring down the default (core data model) DB.
See more in https://github.com/uber/cadence/issues/3884
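If your cluster uses the file-based dynamic config (an assumption; how dynamic config is delivered depends on your deployment), the entry might look something like this:

```yaml
# Sketch of a dynamic config entry disabling visibility sampling.
system.enableVisibilitySampling:
- value: false
  constraints: {}
```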
The second is a bug fixed in 0.16.0.
It should be resolved if you upgrade the server.
See https://github.com/uber/cadence/pull/3627
and https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/recoveringTtlYear2038Problem.html

Firebase - Change data before being sent from cache to server

I am using Firebase for my chat application I am developing with Swift. I have offline persistence enabled, so normally each write is first cached and then sent to the server. What I want to do is, when sending a message, have the message status first set to "Sending" and the time to the current time, but when the data is sent to the server, change the status to "Sent" and the time to when the data was sent (because it could take minutes or more if there's a slow connection or no connection at all). Is this possible using Firebase? If not, any workarounds? Thanks in advance!
In Cloud Firestore you can detect the status of your write operations by:
attaching a completion listener to the write operation
looking at the metadata of document snapshots
For the first option, have a look at the example of writing a document in the documentation. The completion listener there allows you to detect when the write operation is completed.
But if you want to show in the UI, for each document, whether it has pending writes that have not yet been committed on the server, you might be better off looking at the metadata of each document snapshot, as explained in the documentation on events for local changes. While the changes are pending, snapshot.getMetadata().hasPendingWrites() will return true; once the changes are committed on the server, it will return false again.
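The question uses Swift, but as a small sketch of the idea with the web SDK in TypeScript (the iOS SDK exposes the same metadata on snapshots; the collection path and field names here are made up), the status label can be derived per message:

```typescript
import { initializeApp } from "firebase/app";
import {
  getFirestore, collection, query, orderBy, onSnapshot,
} from "firebase/firestore";

const app = initializeApp({ /* your Firebase config */ });
const db = getFirestore(app);

// Assumed chat layout: chats/{roomId}/messages with "text" and "createdAt" fields.
const messages = query(
  collection(db, "chats", "room-1", "messages"),
  orderBy("createdAt")
);

// includeMetadataChanges makes the listener fire again when a message flips
// from pending (only in the local cache) to committed on the server.
onSnapshot(messages, { includeMetadataChanges: true }, (snap) => {
  snap.docs.forEach((d) => {
    const status = d.metadata.hasPendingWrites ? "Sending" : "Sent";
    console.log(`${d.get("text")} [${status}]`);
  });
});
```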