Audit trail for TTL delete operations in AWS DocumentDB? - aws-documentdb

AWS DocumentDB supports automatic deletion of documents based on a TTL index, according to the documentation: https://docs.aws.amazon.com/documentdb/latest/developerguide/how-it-works.html#how-it-works.ttl-deletes
As far as I understand, the audit functionality does not track changes at the document level: https://docs.aws.amazon.com/documentdb/latest/developerguide/event-auditing.html
My question is whether these deletes can be tracked somehow, or whether they are logged somewhere in AWS.
Right now the only way to solve this looks like implementing a change stream listener in application code and logging deletes from there.
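For reference, the kind of TTL index in question is created by putting an expiry on an index field; a minimal sketch with the MongoDB Java driver (connection string, database, collection, field name and expiry are all illustrative):

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;
import java.util.concurrent.TimeUnit;

public class CreateTtlIndex {
    public static void main(String[] args) {
        // Connection string, database, collection and field name are illustrative.
        MongoCollection<Document> sessions = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("app").getCollection("sessions");
        // Documents become eligible for deletion one hour after the timestamp in "createdAt".
        sessions.createIndex(Indexes.ascending("createdAt"),
                new IndexOptions().expireAfter(1L, TimeUnit.HOURS));
    }
}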

DocumentDB currently does not support auditing DML operations such as TTL deletes. Implementing a change stream listener to log deletes is the right solution for now.
-Meet
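A minimal sketch of such a listener with the MongoDB Java driver, assuming change streams are enabled for the collection and that TTL deletes surface as ordinary delete events (connection string, database and collection names are illustrative):

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.Document;
import java.util.Collections;

public class TtlDeleteAuditListener {
    public static void main(String[] args) {
        MongoCollection<Document> orders = MongoClients.create("mongodb://docdb-cluster:27017")
                .getDatabase("app").getCollection("orders");
        // Watch only delete events and write an audit record for each one
        // (printed here; in practice you would persist it somewhere durable).
        orders.watch(Collections.singletonList(
                        Aggregates.match(Filters.eq("operationType", "delete"))))
                .forEach((ChangeStreamDocument<Document> change) ->
                        System.out.println("Deleted: " + change.getDocumentKey()));
    }
}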

Related

Records updated in Compass keep reverting

I have a MongoDB instance hosted on AWS DocumentDB. There is only one node in the replica set, and this is MongoDB 4.0.0 Community edition.
Twice now I've updated records in Compass and clicked the "Update" button. I've confirmed that the change was made. A few hours later, the change reverts.
From my research, this is typically caused by a MongoDB rollback, but everything I've read says that rollbacks occur when the secondary nodes in a replica set are out of sync with the primary. I don't have any secondary nodes.
Can anyone provide any insight? I'm not sure where else to look or what else to research.
Edit to add: Also, is this likely to be a hosting problem (AWS DocumentDB) or a database problem directly?
All writes on Amazon DocumentDB are durable; the write concern is majority by default and can't be changed. There's also no rollback mechanism that would cause the database server to revert to a previous state. You must have another client or application making updates and changing the document.
Try enabling the profiler, or, probably better, enable change streams and watch the changes to identify what's making the change.
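A minimal sketch of the change stream approach with the MongoDB Java driver (connection string, database and collection names are illustrative); logging the full post-update document and its cluster time makes it easier to correlate the unexpected write with application logs:

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import com.mongodb.client.model.changestream.FullDocument;
import org.bson.Document;

public class UpdateWatcher {
    public static void main(String[] args) {
        MongoCollection<Document> records = MongoClients.create("mongodb://docdb-cluster:27017")
                .getDatabase("app").getCollection("records");
        // Log every change with the full post-update document and its timestamp.
        records.watch()
                .fullDocument(FullDocument.UPDATE_LOOKUP)
                .forEach((ChangeStreamDocument<Document> change) ->
                        System.out.println(change.getOperationType() + " at "
                                + change.getClusterTime() + ": " + change.getFullDocument()));
    }
}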

Updating AWS Elasticsearch cluster settings

By default in Elasticsearch, the maximum number of open scroll contexts is 500, but I need to increase this number. There's no problem updating "search.max_open_scroll_context" on my local machine, but AWS Elasticsearch does not allow the change.
When I try to update it following the answer given in this thread, configure-search-max-open-scroll-context, the response is: {"Message":"Your request: '/_cluster/settings' payload is not allowed."} I can perform this operation on my local Elasticsearch, but AWS Elasticsearch doesn't seem to allow it. Does anyone have an answer to this for AWS Elasticsearch, or has anyone faced something similar?
This is restricted on the customer end in AWS ES.
You need to reach out to the AWS Support team for this. Just let them know the value of "search.max_open_scroll_context" that you are looking for, and they will update it from the backend.
Here is the link to AWS-supported operations on Elasticsearch.
Currently, AWS doesn't support updating "search.max_open_scroll_context". You can contact AWS Support to increase the scroll context count. Alternatively, you can use the search_after API instead of scroll.
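A minimal sketch of the search_after alternative with the Elasticsearch high-level REST client (index name, page size and sort fields are illustrative); each page carries the sort values of the last hit into the next request, so no scroll context is held open:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

public class SearchAfterPager {
    static void pageThrough(RestHighLevelClient client) throws Exception {
        Object[] searchAfter = null;
        while (true) {
            SearchSourceBuilder source = new SearchSourceBuilder()
                    .size(1000)
                    .sort("timestamp", SortOrder.ASC)
                    .sort("doc_id", SortOrder.ASC); // unique tie-breaker field for a stable order
            if (searchAfter != null) {
                source.searchAfter(searchAfter);
            }
            SearchResponse response = client.search(
                    new SearchRequest("my-index").source(source), RequestOptions.DEFAULT);
            SearchHit[] hits = response.getHits().getHits();
            if (hits.length == 0) {
                break;
            }
            // ... process hits ...
            searchAfter = hits[hits.length - 1].getSortValues();
        }
    }
}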

firestore, do I need to enable persistence in order to trigger listener locally?

According to this video, at 8:58, Todd says:
when you write to a document, the change is made locally and triggers any
real-time listeners on that document
Pretty convenient for UI updates.
My question is: do I need to enable persistence in order to trigger listeners locally?
It is not enabled on the web by default due to security issues, and I don't really want to enable it.
The Firestore SDK always fires events for operations that are made in the client. You don't need to enable disk persistence for that.
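For illustration, a minimal sketch with the Firestore Android (Java) SDK; the web SDK behaves the same way. Collection and document names are illustrative, and hasPendingWrites() distinguishes local events from server-confirmed ones:

import com.google.firebase.firestore.DocumentReference;
import com.google.firebase.firestore.FirebaseFirestore;

public class LocalListenerExample {
    public static void listen() {
        DocumentReference doc = FirebaseFirestore.getInstance()
                .collection("settings").document("profile");
        // Fires immediately for local writes, even with disk persistence disabled.
        doc.addSnapshotListener((snapshot, error) -> {
            if (error != null || snapshot == null) {
                return;
            }
            System.out.println("Data: " + snapshot.getData()
                    + " (pendingWrites=" + snapshot.getMetadata().hasPendingWrites() + ")");
        });
    }
}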

MongoDB/Spring: Subscribing to collection changes

I'm working with a Spring Boot app. I'm trying to implement callback-based event notification for collection modifications in a MongoDB database. I'm running out of ideas, as I have tried the following:
Classic Polling - Redundant, as the existing implementation is a REST endpoint that's polled by the UI, where it queries data.
Tailable Cursors - Requires the collection to be capped, a limitation that likely won't work for a database with a very high storage forecast.
Change Streams - I got a runtime exception stating that the storage engine doesn't support 'Majority Read Concern'.
collection.watch(asList(Aggregates.match(Filters.in("operationType", asList("insert", "update"))))).forEach(printBlock);
I'm not authorized to view the engine configuration, but I'm assuming that if the DBA can't change the storage engine to WiredTiger, then I can't use change streams. Is this correct? Are there other solutions? What about Spring's reactive MongoDB API? I was under the impression that it still depends on tailable cursors or change streams.
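For context, the change stream snippet above expands to roughly the following with a recent (4.x) synchronous Java driver, where forEach takes a plain java.util.function.Consumer (connection string, database and collection names are illustrative):

import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.Document;
import java.util.function.Consumer;
import static java.util.Arrays.asList;

public class CollectionChangeListener {
    public static void main(String[] args) {
        MongoCollection<Document> collection = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("app").getCollection("events");
        Consumer<ChangeStreamDocument<Document>> printBlock =
                change -> System.out.println(change.getOperationType() + ": " + change.getFullDocument());
        // Change streams require a replica set (or sharded cluster) whose storage
        // engine supports majority read concern, e.g. WiredTiger.
        collection.watch(asList(Aggregates.match(
                        Filters.in("operationType", asList("insert", "update")))))
                .forEach(printBlock);
    }
}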

Event logging for auditing with replay

I need to implement an auditing log for GDPR compliance so that we have a record of every consent given or revoked (an event) per user of our system. It has to store how & when it happened alongside things like what the wording of the consent actually was at the time.
So that we can recover from a backup restore, this log will be stored separately from our main DB. We will then need to be able to update the state of the user consent so that it accurately reflects the event log (i.e. the last known value (true/false) of each consent question per user).
I could simply do this using a second postgres instance (our main DB is postgres) with a single table to store the information and then some simple application code to log each event as well as update the main DB. There could also be some simple application logic to find the last known states of each consent from the event log and update the master DB.
To me it seems like a bit of overkill to use Postgres to store this info, though adding a new technology to store it also seems like overkill. Are there any technologies that are more suitable for this sort of thing? It sounds a lot like event sourcing to me.
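For illustration, the "last known state per user" replay described above boils down to a single query over the event table; a minimal sketch in Java/JDBC, assuming a hypothetical consent_events table with user_id, question_id, granted and recorded_at columns (connection details are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ConsentStateReplay {
    // Latest event per (user_id, question_id) wins.
    private static final String LAST_KNOWN_STATE =
            "SELECT DISTINCT ON (user_id, question_id) user_id, question_id, granted "
            + "FROM consent_events ORDER BY user_id, question_id, recorded_at DESC";

    public static void main(String[] args) throws Exception {
        try (Connection audit = DriverManager.getConnection(
                     "jdbc:postgresql://audit-db/consent", "app", "secret");
             PreparedStatement stmt = audit.prepareStatement(LAST_KNOWN_STATE);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                // Apply each row's granted flag to the main DB here.
                System.out.printf("%s / %s -> %b%n", rs.getString("user_id"),
                        rs.getString("question_id"), rs.getBoolean("granted"));
            }
        }
    }
}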
If you're already running Postgres, it doesn't seem like overkill, given that the log needs to be online and queryable. Something like Kafka is often a natural fit for this kind of problem, but that's even more overkill.
This bears a passing resemblance to event sourcing, but on a really small scale. Event sourcing usually means that all your data is expressed in terms of events, and replayed from beginning to end to materialize the current state.
Could you elaborate on this?:
So that we can recover from a backup restore, this log will be stored separately from our main DB.
Doesn't your main database recover from a backup / restore?