How to intercept chunk migration in MongoDB

I want to intercept chunk migration events of a particular (sharded) collection.
The changelog collection (in the config server database) is not capped, so we can't open a tailable cursor on it. One solution could be to copy the events of interest from the changelog collection into a capped collection and query that with a tailable cursor (a sketch follows below). I would like to know if there is a better way to do this.
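One way to implement that copy is sketched below with the MongoDB Java driver: a poller that reads new moveChunk events from config.changelog for a single namespace and relays them into a capped collection that a tailable cursor can then follow. The connection string, target database, and collection names are placeholders, and the changelog field names ("what", "ns", "time") and the "moveChunk.commit" event name should be verified against your MongoDB version.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

import java.util.ArrayList;
import java.util.Date;

public class ChunkMigrationRelay {
    public static void main(String[] args) throws InterruptedException {
        MongoClient client = MongoClients.create("mongodb://mongos-host:27017"); // placeholder mongos address
        MongoCollection<Document> changelog =
                client.getDatabase("config").getCollection("changelog");

        // Create the capped sink collection once (the size is an arbitrary choice here).
        MongoDatabase app = client.getDatabase("app"); // placeholder target database
        if (!app.listCollectionNames().into(new ArrayList<>()).contains("migrationEvents")) {
            app.createCollection("migrationEvents",
                    new CreateCollectionOptions().capped(true).sizeInBytes(16 * 1024 * 1024));
        }
        MongoCollection<Document> sink = app.getCollection("migrationEvents");

        Date lastSeen = new Date(0); // checkpoint; persist it durably in a real deployment
        while (true) {
            for (Document evt : changelog.find(Filters.and(
                            Filters.eq("what", "moveChunk.commit"), // assumed changelog event name
                            Filters.eq("ns", "app.shardedColl"),    // the sharded collection of interest
                            Filters.gt("time", lastSeen)))
                    .sort(Sorts.ascending("time"))) {
                sink.insertOne(evt); // a tailable cursor on "migrationEvents" now sees the event
                lastSeen = evt.getDate("time");
            }
            // Note: events sharing the checkpoint timestamp may be skipped; this is a sketch,
            // not production code.
            Thread.sleep(1000); // poll interval
        }
    }
}
```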

Related

Tailable cursor vs Change streams for notifications of insert operations in capped collections

I am curious about the benefits and drawbacks of using a tailable cursor versus a change stream with a filter for insert operations over a capped collection, in terms of both usability and performance.
I have a capped collection that contains update objects. The only possible operation on this collection is the insertion of new updates (as new records), which I need to relay to my application backend and distribute through SSE/WebSockets in real time. The subscription query will use a single parameter: a timestamp that is one of the object's properties. The collection has an index on that field. I'll also do some basic filtering over the newly added records, so the aggregation framework of change streams would be helpful.
I've read What is the difference between a changeStream and tailable cursor in MongoDB, which summarizes the differences between tailable cursors and change streams in general.
Also, the MongoDB article on capped collections only presents the tailable-cursor approach, and the article on tailable cursors states that they do not use indexes and that you should use a normal cursor to fetch the results.
In short, I need a reliable stream of collection updates based entirely on the insertion of new records in the collection. Which approach should I go for?
Tailable cursors and capped collections were created to support copying the oplog from a primary node to a secondary node. For any other activity the recommended approach is a change stream. Change streams are integrated with the MongoDB authentication model, and a change stream can be opened at the collection, database, or cluster level, including on sharded clusters. Tailable cursors only work on a single collection and are not resumable. Change streams are supported by all official MongoDB drivers.
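As a rough illustration of the change-stream side, here is a minimal sketch with the MongoDB Java driver: it watches a single collection for inserts only and pushes the timestamp filter into the aggregation pipeline. The database, collection, and timestamp field names are assumptions, and change streams require a replica set or sharded cluster.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.util.Date;
import java.util.List;

public class InsertStream {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        MongoCollection<Document> coll =
                client.getDatabase("app").getCollection("updates"); // placeholder names

        Date since = new Date(); // the subscription's timestamp parameter
        List<Bson> pipeline = List.of(Aggregates.match(Filters.and(
                Filters.eq("operationType", "insert"),
                Filters.gt("fullDocument.timestamp", since)))); // assumed field name

        try (MongoCursor<ChangeStreamDocument<Document>> cursor = coll.watch(pipeline).iterator()) {
            while (cursor.hasNext()) {
                // Relay the inserted document to SSE/WebSocket subscribers here.
                Document inserted = cursor.next().getFullDocument();
                System.out.println(inserted.toJson());
            }
        }
    }
}
```

Each event also carries a resume token, which is what makes a change stream resumable after a restart, unlike a tailable cursor.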

Reactive tailable cursor is closed when a MongoDB collection is empty

I'm developing a stream over a MongoDB collection using Spring Boot, Spring Data MongoDB, WebFlux, and tailable cursors.
The stream works when the collection has at least one document, because the cursor can then be obtained. The issue is that I want to open the stream on an empty collection, since I want to stream every document inserted into it.
I've been reading the docs, and this appears to be the documented behavior:
https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#tailable-cursors
Tailable cursors may become dead, or invalid, if either the query returns no match or the cursor returns the document at the “end” of the collection and the application then deletes that document
I'm evaluating the best way to open the stream on an empty collection with Spring Boot and Flux, but I would like to know if there is a known approach or workaround.
Thank you.
Indeed, even a "find all" query on an empty capped collection is treated as no match, and the cursor comes back dead.
reactiveMongoOperations.tail(new Query(), Event.class) returns a dead cursor, and so does the annotated repository flavor.
The Spring docs simply mirror the MongoDB docs, which state:
Tailable cursors may become dead, or invalid, if either:
the query returns no match.
the cursor returns the document at the “end” of the collection and then the application deletes that document.
The only workaround seems to be inserting an initial dummy entry before subscribing, as sketched below.
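A minimal sketch of that workaround with Spring Data's reactive support, assuming a capped-collection entity class Event with a hypothetical boolean marker field dummy (everything here beyond tail() itself is illustrative):

```java
import org.springframework.data.mongodb.core.ReactiveMongoOperations;
import org.springframework.data.mongodb.core.query.Query;
import reactor.core.publisher.Flux;

public class EventStream {

    private final ReactiveMongoOperations ops;

    public EventStream(ReactiveMongoOperations ops) {
        this.ops = ops;
    }

    public Flux<Event> stream() {
        Event placeholder = new Event();
        placeholder.setDummy(true);                           // hypothetical marker field
        return ops.insert(placeholder)                        // guarantee the query finds a match
                .thenMany(ops.tail(new Query(), Event.class)) // then open the tailable cursor
                .filter(e -> !e.isDummy());                   // hide the placeholder from subscribers
    }
}
```

The insert must complete before tail() runs, which is what thenMany provides; subscribing to the tail first would reproduce the dead-cursor problem.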

Subscribing to a collection that is not capped

I am aware that, with a combination of capped collections and tailable cursors, Mongo clients can subscribe to additions to a collection. This, however, introduces a few limitations:
When the collection is full, the oldest members are removed.
Existing documents cannot be changed unless they stay the same size (see: Cannot change the size of a document in a capped collection).
Is there something more generic (such as RDBMS triggers) I can employ to listen to changes of all sorts happening to a Mongo collection?
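Following the change-stream recommendation from the earlier answer, here is a minimal sketch of the trigger-like alternative: a change stream on an ordinary (non-capped) collection reports inserts, updates, replaces, and deletes, and UPDATE_LOOKUP asks the server to attach the current full document to update events. The database and collection names are placeholders, and a replica set or sharded cluster is required.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import com.mongodb.client.model.changestream.FullDocument;
import org.bson.Document;

public class CollectionWatcher {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        MongoCollection<Document> coll =
                client.getDatabase("app").getCollection("items"); // placeholder names

        try (MongoCursor<ChangeStreamDocument<Document>> cursor =
                     coll.watch().fullDocument(FullDocument.UPDATE_LOOKUP).iterator()) {
            while (cursor.hasNext()) {
                ChangeStreamDocument<Document> change = cursor.next();
                // React to any operation type, much like an RDBMS trigger would.
                System.out.println(change.getOperationType() + ": " + change.getDocumentKey());
            }
        }
    }
}
```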

MongoDB read, copy, process, and delete

I have to write an app that constantly polls a collection in a given MongoDB database. When it finds documents, it reads them, copies them to another database, does some extra processing, and deletes them from the original database.
What is the most efficient way to implement this? What are the best practices?
Is it better to process one document at a time: read one document, copy it, then delete it?
Or is it better to read all the documents, copy all of them, and then delete all of them?
What would be the best way to handle failures in the middle of one of these read/copy/delete cycles?
Bulk reads, inserts, and deletes are almost always more performant than single-document operations, but try to limit each batch to a maximum number of documents; in our setup, 500 seemed to be optimal.
For handling errors, you could use the following pseudo-transaction pattern (a code sketch follows after the link below):
1. findAndModify each document to be read, setting "state": "pending"
2. process the documents
3. bulk insert them into the target collection
4. delete all documents with "state": "pending" from the source
If something goes wrong in the processing part or the bulk insert, you can unlock all the locked documents and try again.
A more elaborate example of this kind of pseudo transaction can be found in the MongoDB tutorial:
http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
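A sketch of the pattern above with the MongoDB Java driver, assuming a single worker and placeholder collection names. The "state" field is the lock marker from the answer; findOneAndUpdate returns the pre-update document by default, so the lock field never leaks into the target collection.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class MoveBatch {

    static final int BATCH_SIZE = 500; // tune for your workload

    // Moves one batch from source to target; assumes a single worker process,
    // since the deleteMany/updateMany below key on "state": "pending" globally.
    public static void moveOne(MongoCollection<Document> source,
                               MongoCollection<Document> target) {
        // 1. Claim up to BATCH_SIZE documents by marking them "pending".
        List<Document> claimed = new ArrayList<>();
        for (int i = 0; i < BATCH_SIZE; i++) {
            Document doc = source.findOneAndUpdate(
                    Filters.exists("state", false),
                    Updates.set("state", "pending"));
            if (doc == null) break; // nothing left to claim
            claimed.add(doc);       // pre-update version, without the "state" field
        }
        if (claimed.isEmpty()) return;

        try {
            // 2. Process the documents here, then 3. bulk insert into the target.
            target.insertMany(claimed);
            // 4. Delete the claimed documents from the source.
            source.deleteMany(Filters.eq("state", "pending"));
        } catch (RuntimeException e) {
            // Unlock the claimed documents so the batch can be retried.
            source.updateMany(Filters.eq("state", "pending"), Updates.unset("state"));
            throw e;
        }
    }
}
```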

Reset the MongoDB oplog

I'm developing an application with Elasticsearch and MongoDB. Elasticsearch indexes the content from the MongoDB oplog via a component called a river.
Is it possible to reset the MongoDB oplog so that all previous entries disappear?
The oplog is for replication and shouldn't be tampered with.
The oplog is a capped collection:
You cannot delete documents from a capped collection. To remove all records from a capped collection, use the 'emptycapped' command. To remove the collection entirely, use the drop() method.
http://docs.mongodb.org/manual/core/capped-collections/
You might want to use a tailable cursor and tail the oplog in your river.
If your app is going to read the oplog continuously, it would need the ability to start at a particular timestamp (ts) value. Without that ability, if the app (or mongod) had to be restarted for any reason, it would have to re-process all the oplog entries that it had already processed but were still in the oplog. If the app does have the ability to start at a ts value, then just query the oplog for the max value of ts, and use that as the starting point.
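A sketch of that approach with the MongoDB Java driver, under the assumption that the river runs in a Java process: read the newest ts from local.oplog.rs as the starting checkpoint, then tail everything after it with an awaiting tailable cursor. Persist lastTs somewhere durable so a restart resumes instead of reprocessing.

```java
import com.mongodb.CursorType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import org.bson.BsonTimestamp;
import org.bson.Document;

public class OplogTail {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create("mongodb://localhost:27017");
        MongoCollection<Document> oplog =
                client.getDatabase("local").getCollection("oplog.rs");

        // Starting point: the max ts currently in the oplog.
        Document newest = oplog.find().sort(Sorts.descending("$natural")).limit(1).first();
        BsonTimestamp lastTs = (newest == null)
                ? new BsonTimestamp(0, 0)                  // empty oplog: start from the beginning
                : newest.get("ts", BsonTimestamp.class);

        try (MongoCursor<Document> cursor = oplog.find(Filters.gt("ts", lastTs))
                .cursorType(CursorType.TailableAwait)      // block waiting for new entries
                .iterator()) {
            while (cursor.hasNext()) {
                Document entry = cursor.next();
                lastTs = entry.get("ts", BsonTimestamp.class); // advance the checkpoint
                // Hand the entry to the indexer (the river) here.
            }
        }
    }
}
```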