I'm developing a stream with WebFlux from a MongoDB collection, using Spring Boot and Spring Data MongoDB with tailable cursors.
The stream works when the collection has at least one document, because then the cursor can be obtained. The issue is that I want to open the stream on an empty collection, since I want to stream every document in the collection.
I've been reading the docs and it's supposed to be correct:
https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#tailable-cursors
Tailable cursors may become dead, or invalid, if either the query returns no match or the cursor returns the document at the “end” of the collection and the application then deletes that document
I'm evaluating the best way to achieve this (opening the stream from an empty collection) with Spring Boot and Flux, but I would like to know if there is an idea or workaround.
Thank you.
Indeed, even a "find all" on an empty capped collection counts as no match, and the cursor is dead.
reactiveMongoOperations.tail(new Query(), Event.class) returns a dead cursor, and so does the annotated repository flavor.
The Spring docs just duplicate the Mongo docs, which state:
Tailable cursors may become dead, or invalid, if either:
the query returns no match.
the cursor returns the document at the “end” of the collection and then the application deletes that document.
The only workaround seems to be an initial dummy entry inserted prior to subscription.
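That workaround can be sketched as follows. This is a hedged, driver-agnostic sketch in Python; `seed_if_empty`, the `_seed` marker field, and `FakeCollection` are all invented for illustration. The real code would run against a pymongo-style capped collection before the tailable subscription is opened, and the subscriber would filter the marker document out.

```python
def seed_if_empty(collection):
    """Insert a marker document if the capped collection is empty.

    `collection` is expected to expose pymongo-like
    `count_documents` and `insert_one` methods.
    A tailable cursor on an empty capped collection dies immediately,
    so we seed one document before subscribing.
    """
    if collection.count_documents({}) == 0:
        collection.insert_one({"_seed": True})
        return True
    return False

# Minimal in-memory stand-in to demonstrate the logic (not a real driver):
class FakeCollection:
    def __init__(self):
        self.docs = []
    def count_documents(self, _filter):
        return len(self.docs)
    def insert_one(self, doc):
        self.docs.append(doc)

c = FakeCollection()
seeded = seed_if_empty(c)        # empty -> marker inserted
seeded_again = seed_if_empty(c)  # non-empty -> no-op
```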
Related
I am curious about the benefits and drawbacks of using a tailable cursor versus a change stream with a filter for insert operations over a capped collection, in terms of both usability and performance.
I have a capped collection that contains update objects. The only possible operation on this collection is the insertion of new updates (as new records), which I need to relay to my application backend and distribute through SSE/WebSockets in real time. I'll use a single parameter in the subscription query: a timestamp that is part of the object's properties. The collection has an index on that field. I'll also do some basic filtering over the newly added records, so the aggregation framework of change streams would be helpful.
I've read: What is the difference between a changeStream and tailable cursor in MongoDB which summarizes the differences between tailable cursors and change streams in general.
Also, the MongoDB article on capped collections just describes the tailable-cursor approach, and the article on tailable cursors states that they do not use indexes and that you should use a normal cursor to fetch the results.
In short, I need a reliable stream of collection updates based entirely on the insertion of new records in the collection. Which approach should I go for?
Tailable cursors and capped collections were created to support copying the oplog from a primary node to a secondary node. For any other activity the recommended approach is a change stream. Change streams are integrated with the MongoDB authentication model, and a change stream can be opened at the collection, database, cluster, and sharded-cluster level. Tailable cursors only work on a single collection and are not resumable. Change streams are supported by all official MongoDB drivers.
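For the insert-only use case above, the change-stream filter is just an aggregation pipeline. A minimal sketch in Python: `insert_only_pipeline` is a hypothetical helper, and the `fullDocument.ts` field stands in for the timestamp property mentioned in the question.

```python
def insert_only_pipeline(extra_match=None):
    """Build a change-stream pipeline that keeps only insert events.

    `extra_match` can add further filters on the change event, e.g.
    {"fullDocument.ts": {"$gte": some_timestamp}}.
    """
    match = {"operationType": "insert"}
    if extra_match:
        match.update(extra_match)
    return [{"$match": match}]

pipeline = insert_only_pipeline({"fullDocument.ts": {"$gte": 0}})

# Against a live deployment this would be used roughly like:
#   with collection.watch(pipeline) as stream:
#       for change in stream:
#           relay(change["fullDocument"])  # push to SSE/WebSocket clients
```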
I am creating a lot of documents, and I am concerned that half of the server bandwidth is being spent on returning those new documents back to the caller.
I don't want the server to return the document to me. I just want to get an acknowledgement once the document has been saved.
I have been able to address this concern when updating a document, by using Model.updateOne() instead of Model.findOneAndUpdate().
But how can I do the same when creating a document?
So far I have tried:
Model.create(docData)
new Model(docData).save()
Model.collection.insert(docData)
Model.collection.insertMany([docData])
but in every case, mongoose (or the mongodb driver) returns a document to me.
I am not sure if MongoDB is actually sending the document back over the network, or if it is just sending back the new _id and that is being appended to the original data by the driver. But I fear the former.
How can I save the document without MongoDB sending it back to me?
According to the mongoose documentation for insertMany, the result is an insertWriteOpCallback which, according to the MongoDB driver documentation, returns the inserted documents.
On the other hand, the MongoDB documentation about insert describes a WriteResult object that does not return any document.
The answer is that the mongodb driver adds the inserted documents into its response.
What you can do is use the bulkWrite method from the native MongoDB driver, which allows you to perform an insert and get only the MongoDB response (without the documents).
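On the bandwidth fear specifically: drivers generate the ObjectId on the client before the insert is sent, so the "returned document" is normally your own input object with a locally generated _id attached, not a document echoed over the network by the server. Here is a stdlib-only sketch of that client-side generation; real drivers use bson.ObjectId, and `make_object_id` is invented here purely for illustration.

```python
import os
import struct
import time

def make_object_id(counter=0):
    """Build a 12-byte ObjectId-style value entirely on the client:
    4-byte big-endian timestamp + 5 random bytes + 3-byte counter,
    returned as a 24-character hex string (the usual ObjectId layout)."""
    ts = struct.pack(">I", int(time.time()))  # 4-byte timestamp
    rand = os.urandom(5)                      # 5 random bytes
    cnt = struct.pack(">I", counter)[1:]      # low 3 bytes of the counter
    return (ts + rand + cnt).hex()

doc = {"name": "example"}
doc["_id"] = make_object_id()  # assigned before anything hits the wire
```

Since the _id can be known before the write is sent, acknowledging the insert only requires the server's write result, not the document itself.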
I want to intercept chunk migration events of a particular (sharded) collection.
The changelog collection (in the config server database) is not capped, so we can't open a tailable cursor on it. One solution could be to copy the events of interest from the changelog collection into a capped one and query that with a tailable cursor. I would like to know if there is a better way to do this.
What is
Meteor.Collection
and
Meteor.Collection.Cursor
?
How do these two relate to each other? Does:
new Meteor.Collection("name")
create a MongoDB collection with the parameter name?
Does new Meteor.Collection("name") create a MongoDB collection with the parameter name?
Not exactly. A Meteor.Collection represents a MongoDB collection that may or may not exist yet; the actual MongoDB collection isn't created until you insert a document.
A Meteor.Collection.Cursor is a reactive data source that represents a changing subset of documents that exist within a MongoDB collection. This subset of documents is specified by the selector and options arguments you pass to the Meteor.Collection.find(selector, options) method. This find() method returns the cursor object. I think the Meteor Docs explain cursors well:
find returns a cursor. It does not immediately access the database or return documents. Cursors provide fetch to return all matching documents, map and forEach to iterate over all matching documents, and observe and observeChanges to register callbacks when the set of matching documents changes.
Collection cursors are not query snapshots. If the database changes between calling Collection.find and fetching the results of the cursor, or while fetching results from the cursor, those changes may or may not appear in the result set.
Cursors are a reactive data source. The first time you retrieve a cursor's documents with fetch, map, or forEach inside a reactive computation (eg, a template or autorun), Meteor will register a dependency on the underlying data. Any change to the collection that changes the documents in a cursor will trigger a recomputation. To disable this behavior, pass {reactive: false} as an option to find.
The reactivity of cursors is important. If I have a cursor object, I can retrieve the current set of documents it represents by calling fetch() on it. If the data changes in between calls, the fetch() method will actually return a different array of documents. Many things in Meteor natively understand the reactivity of cursors. This is why we can return a cursor object from a template helper function:
Template.foo.documents = function() {
return MyCollection.find(); // returns a cursor object, rather than an array of documents
};
Behind the scenes, Meteor's templating system knows to call fetch() on this cursor object. When the server sends the client updates telling it that the collection has changed, the cursor is informed of this change, which causes the template helper to be recomputed, which causes the template to be rerendered.
A Meteor.Collection is an object that you would define like this:
var collection = new Meteor.Collection("collection");
This object then lets you store data in your Mongo database. Note that just defining a collection this way does not create a collection in your Mongo database; the collection is created once you insert a document into it.
So you would not have a collection called name until you insert a document into it.
A cursor is the result of a .find() operation:
var cursor = collection.find()
You may have 1000s of documents, the cursor lets you go through them, one by one, without having to load all of them into your server's RAM.
You can then loop through using forEach, or use some of the other operations as specified in the docs : http://docs.meteor.com/#meteor_collection_cursor
A Cursor is also a reactive data source on the client, so if data changes, you can use the same query to update your DOM.
As Neil mentions, it's also worth knowing that Mongo is a NoSQL database, which means you don't really have to create tables/collections. You just define a collection as above and insert a document into it; the collection is created if it didn't exist, and otherwise the document is inserted into the existing collection.
Browsing your local database
You don't really need to concern yourself with MongoDB until you are publishing your app; you can interact with it using Meteor alone. In case you want to have a look at what it looks like:
While Meteor is running, use meteor mongo in the same directory to bring up a mongo shell, or use a tool like Robomongo (a GUI tool) to connect to localhost on port 3002 and have a peek at your Mongo database.
I have a Python script which needs to perform some actions whenever a new object is added to a collection.
Is there an efficient method to poll for the addition of new objects in a MongoDB collection?
Have a look at MongoDB's 'tailable' cursors:
http://www.mongodb.org/display/DOCS/Tailable+Cursors
Use the find method of your Python driver with tailable=True;
it will keep track of additions to the collection in real time, much like tail -f on a file in Linux.
tailable is False by default.
http://api.mongodb.org/python/current/api/pymongo/collection.html
find([spec=None[, fields=None[, skip=0[, limit=0[, timeout=True[, snapshot=False[, tailable=False[, sort=None[, max_scan=None[, as_class=None[, slave_okay=False[, **kwargs]]]]]]]]]]]])
tailable (optional): the result of this find call will be a tailable cursor - tailable cursors aren’t closed when the last data is retrieved but are kept open and the cursors location marks the final document’s position. if more data is received iteration of the cursor will continue from the last document received. For details, see the tailable cursor documentation.
Use a separate thread to poll the data. It is less efficient, but it works.
The alternative solution is to use Twisted and its async driver, but you still need to poll the data.
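A minimal sketch of that polling thread in Python. `fetch_since` and the in-memory `store` are stand-ins for a real query such as collection.find({"_id": {"$gt": last_id}}).sort("_id"); all names here are invented for illustration.

```python
import threading
import time

def poll(fetch_since, handle, interval, stop):
    """Repeatedly fetch documents newer than the last one seen
    and pass each to `handle`, until `stop` is set."""
    last = None
    while not stop.is_set():
        for doc in fetch_since(last):
            handle(doc)
            last = doc["_id"]
        time.sleep(interval)

# In-memory stand-in for the collection, to demonstrate the loop:
store = [{"_id": 1}, {"_id": 2}]

def fetch_since(last):
    return [d for d in store if last is None or d["_id"] > last]

seen = []
stop = threading.Event()
worker = threading.Thread(target=poll, args=(fetch_since, seen.append, 0.02, stop))
worker.start()
time.sleep(0.1)
store.append({"_id": 3})  # simulate a new insert
time.sleep(0.1)
stop.set()
worker.join()
```

Tracking the last `_id` (or a timestamp field) seen keeps each poll cheap when that field is indexed, which is why the question's index on the timestamp matters.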