How to encode/decode a MongoDB cursor?

I need to build a list of "pages", and part of that involves a cursor. The issue is that I can't find a way to encode (to a string) and decode the cursor. Any ideas? The Cursor interface has no "encoding" method (there is ID(), though it is undocumented), and there is no way to create a new cursor from a string (or an int).
type Cursor interface {
    // Get the ID of the cursor.
    ID() int64
    // Get the next result from the cursor.
    // Returns true if there were no errors and there is a next result.
    Next(context.Context) bool
    Decode(interface{}) error
    DecodeBytes() (bson.Reader, error)
    // Returns the error status of the cursor
    Err() error
    // Close the cursor.
    Close(context.Context) error
}
Why do I need the cursor encoded?
To provide pagination to the end client through an HTML or JSON API.

MongoDB does not provide a serializable cursor. The recommended workaround is to use a range query and sort on a field that generally changes in a consistent direction over time, such as _id.
function printStudents(startValue, nPerPage) {
    let endValue = null;
    db.students.find( { _id: { $lt: startValue } } )
        .sort( { _id: -1 } )
        .limit( nPerPage )
        .forEach( student => {
            print( student.name );
            endValue = student._id;
        } );
    return endValue;
}
There is a Go package, minquery, that tries to make the cursor query/serialization more convenient. You may find it helpful.
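With the range-query approach, the thing to hand to the client is not the driver cursor but the last sort key seen. A minimal sketch of turning that key into an opaque, URL-safe page token, using only the Go standard library; the PageToken type, its last_id field, and the function names are assumptions made for illustration and are not part of any MongoDB driver:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// PageToken is a hypothetical payload: the sort key of the last document
// on the previous page (here, the hex form of its _id).
type PageToken struct {
	LastID string `json:"last_id"`
}

// EncodeToken serializes the token to a URL-safe string.
func EncodeToken(t PageToken) (string, error) {
	raw, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

// DecodeToken reverses EncodeToken and rejects malformed input.
func DecodeToken(s string) (PageToken, error) {
	var t PageToken
	raw, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return t, fmt.Errorf("invalid page token: %w", err)
	}
	if err := json.Unmarshal(raw, &t); err != nil {
		return t, fmt.Errorf("invalid page token: %w", err)
	}
	return t, nil
}

func main() {
	tok, err := EncodeToken(PageToken{LastID: "51a91ff3cc93742c1607ce28"})
	if err != nil {
		panic(err)
	}
	back, err := DecodeToken(tok)
	if err != nil {
		panic(err)
	}
	fmt.Println(tok, back.LastID)
}
```

The decoded LastID would then feed the $lt (or $gt) bound of the next range query. Since the token comes back from the client, it should be treated as untrusted input and validated before use.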

A mongo.Cursor object isn't something you can encode and put away for later use, which is what you intend to do with it.
A mongo.Cursor is something you use to iterate over a "live query", a stream of documents. You can't use it to return a batch of documents which you send to your client, and then, when the client requests more documents (the next page), decode the stored cursor and continue where you left off. A cursor has a server-side resource under the hood, which is kept for 10 minutes (configurable, see cursorTimeoutMillis) or until the cursor is closed (explicitly via Close(), or when it is exhausted). You do not want to keep the cursor "alive" while waiting to see whether the client needs more documents, especially in an application with heavy traffic. Your MongoDB would quickly run out of resources. If the cursor is closed by timeout, any attempt to read from it will result in the error "Cursor not found, cursor id: #####".
The Cursor.Decode() method is not to decode a cursor from some encoded form. It is to decode the next document the cursor designates into a Go value.
That's why there is no magic mongo.NewCursor() or mongo.ParseCursor() or mongo.DecodeCursor() function. A mongo.Cursor is handed to you by executing queries, e.g. with Collection.Find():
func (coll *Collection) Find(ctx context.Context, filter interface{},
opts ...findopt.Find) (Cursor, error)

Related

Efficient paging in MongoDB using mgo

I've searched and found no Go solution to the problem, not with or without using mgo.v2, not on StackOverflow and not on any other site. This Q&A is in the spirit of knowledge sharing / documenting.
Let's say we have a users collection in MongoDB modeled with this Go struct:
type User struct {
ID bson.ObjectId `bson:"_id"`
Name string `bson:"name"`
Country string `bson:"country"`
}
We want to sort and list users based on some criteria, but have paging implemented due to the expected long result list.
To achieve paging of the results of some query, MongoDB and the mgo.v2 driver package have built-in support in the form of Query.Skip() and Query.Limit(), e.g.:
session, err := mgo.Dial(url) // Acquire Mongo session, handle error!
c := session.DB("").C("users")
q := c.Find(bson.M{"country" : "USA"}).Sort("name", "_id").Limit(10)
// To get the nth page:
q = q.Skip((n-1)*10)
var users []*User
err = q.All(&users)
This however becomes slow as the page number increases, because MongoDB can't just "magically" jump to the xth document in the result; it has to iterate over all the result documents and omit (not return) the first x that need to be skipped.
MongoDB provides the right solution: If the query operates on an index (it has to work on an index), cursor.min() can be used to specify the first index entry to start listing results from.
This Stack Overflow answer shows how it can be done using a mongo client: How to do pagination using range queries in MongoDB?
Note: the required index for the above query would be:
db.users.createIndex(
{
country: 1,
name: 1,
_id: 1
}
)
There is one problem though: the mgo.v2 package has no support for specifying this min().
How can we achieve efficient paging that uses MongoDB's cursor.min() feature using the mgo.v2 driver?
Unfortunately the mgo.v2 driver does not provide API calls to specify cursor.min().
But there is a solution. The mgo.Database type provides a Database.Run() method to run any MongoDB commands. The available commands and their documentation can be found here: Database commands
Starting with MongoDB 3.2, a new find command is available which can be used to execute queries, and it supports specifying the min argument that denotes the first index entry to start listing results from.
Good. What we need to do is after each batch (documents of a page) generate the min document from the last document of the query result, which must contain the values of the index entry that was used to execute the query, and then the next batch (the documents of the next page) can be acquired by setting this min index entry prior to executing the query.
This index entry (let's call it cursor from now on) may be encoded to a string and sent to the client along with the results, and when the client wants the next page, it sends back the cursor, saying it wants results starting after this cursor.
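The idea of resuming from the last index entry can be sketched without a database. In the following Go example a sorted slice stands in for the index and a binary search stands in for the index seek; the entry type and the nextPage function are hypothetical. Unlike MongoDB's inclusive min (which needs the extra skip of 1 mentioned later), this sketch searches for the first entry strictly after the cursor:

```go
package main

import (
	"fmt"
	"sort"
)

// entry mimics one index entry on (name, id); the type is hypothetical.
type entry struct {
	Name string
	ID   int
}

// less orders entries by name, then id, like an index on {name: 1, _id: 1}.
func less(a, b entry) bool {
	if a.Name != b.Name {
		return a.Name < b.Name
	}
	return a.ID < b.ID
}

// nextPage returns up to limit entries that sort strictly after the cursor.
// A nil cursor means "start from the beginning". index must be sorted.
func nextPage(index []entry, cursor *entry, limit int) []entry {
	start := 0
	if cursor != nil {
		// Binary search stands in for the index seek:
		// find the first entry that sorts after the cursor.
		start = sort.Search(len(index), func(i int) bool {
			return less(*cursor, index[i])
		})
	}
	end := start + limit
	if end > len(index) {
		end = len(index)
	}
	return index[start:end]
}

func main() {
	index := []entry{{"Ann", 1}, {"Ann", 4}, {"Bob", 2}, {"Cid", 3}, {"Dee", 5}}
	var cursor *entry
	for {
		page := nextPage(index, cursor, 2)
		if len(page) == 0 {
			break
		}
		fmt.Println(page)
		last := page[len(page)-1]
		cursor = &last // the last entry of this page is the next cursor
	}
}
```

The cost of fetching a page depends on the seek plus the page size, not on how many pages came before it, which is the advantage over Skip().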
Doing it manually (the "hard" way)
The command to be executed can be in different forms, but the command name (find) must be first in the marshaled result, so we'll use bson.D (which preserves order in contrast to bson.M):
limit := 10
cmd := bson.D{
    {Name: "find", Value: "users"},
    {Name: "filter", Value: bson.M{"country": "USA"}},
    // sort is a single ordered document, so bson.D (not []bson.D)
    {Name: "sort", Value: bson.D{
        {Name: "name", Value: 1},
        {Name: "_id", Value: 1},
    }},
    {Name: "limit", Value: limit},
    {Name: "batchSize", Value: limit},
    {Name: "singleBatch", Value: true},
}
// min is the index entry of the last document of the previous page,
// or nil when requesting the first page
if min != nil {
    // min is inclusive, must skip first (which is the previous last)
    cmd = append(cmd,
        bson.DocElem{Name: "skip", Value: 1},
        bson.DocElem{Name: "min", Value: min},
    )
}
The result of executing a MongoDB find command with Database.Run() can be captured with the following type:
var res struct {
    OK       int `bson:"ok"`
    WaitedMS int `bson:"waitedMS"`
    Cursor   struct {
        ID         interface{} `bson:"id"`
        NS         string      `bson:"ns"`
        FirstBatch []bson.Raw  `bson:"firstBatch"`
    } `bson:"cursor"`
}
db := session.DB("")
if err := db.Run(cmd, &res); err != nil {
    // Handle error (abort)
}
We now have the results, but in a slice of type []bson.Raw, whereas we want them in a slice of type []*User. This is where Collection.NewIter() comes in handy. It can transform (unmarshal) a value of type []bson.Raw into any type we usually pass to Query.All() or Iter.All(). Let's see it:
firstBatch := res.Cursor.FirstBatch
var users []*User
err = db.C("users").NewIter(nil, firstBatch, 0, nil).All(&users)
We now have the users of the next page. Only one thing left: generating the cursor to be used to get the subsequent page should we ever need it:
var cursorData bson.D
if len(users) > 0 {
    lastUser := users[len(users)-1]
    // Same fields, in the same order, as the index entry
    cursorData = bson.D{
        {Name: "country", Value: lastUser.Country},
        {Name: "name", Value: lastUser.Name},
        {Name: "_id", Value: lastUser.ID},
    }
} else {
    // No more users found, use the last cursor
}
This is all good, but how do we convert a cursorData to string and vice versa? We may use bson.Marshal() and bson.Unmarshal() combined with base64 encoding; the use of base64.RawURLEncoding will give us a web-safe cursor string, one that can be added to URL queries without escaping.
Here's an example implementation:
// CreateCursor returns a web-safe cursor string from the specified fields.
// The returned cursor string is safe to include in URL queries without escaping.
func CreateCursor(cursorData bson.D) (string, error) {
    // bson.Marshal() never returns error, so I skip a check and early return
    // (but I do return the error if it would ever happen)
    data, err := bson.Marshal(cursorData)
    return base64.RawURLEncoding.EncodeToString(data), err
}
// ParseCursor parses the cursor string and returns the cursor data.
func ParseCursor(c string) (cursorData bson.D, err error) {
    var data []byte
    if data, err = base64.RawURLEncoding.DecodeString(c); err != nil {
        return
    }
    err = bson.Unmarshal(data, &cursorData)
    return
}
And we finally have our efficient, but not so short MongoDB mgo paging functionality. Read on...
Using github.com/icza/minquery (the "easy" way)
The manual way is quite lengthy; it can be made general and automated. This is where github.com/icza/minquery comes into the picture (disclosure: I'm the author). It provides a wrapper to configure and execute a MongoDB find command, allowing you to specify a cursor, and after executing the query, it gives you back the new cursor to be used to query the next batch of results. The wrapper is the MinQuery type which is very similar to mgo.Query but it supports specifying MongoDB's min via the MinQuery.Cursor() method.
The above solution using minquery looks like this:
q := minquery.New(session.DB(""), "users", bson.M{"country": "USA"}).
    Sort("name", "_id").Limit(10)
// If this is not the first page, set cursor:
// getLastCursor() represents your logic how you acquire the last cursor.
if cursor := getLastCursor(); cursor != "" {
    q = q.Cursor(cursor)
}
var users []*User
newCursor, err := q.All(&users, "country", "name", "_id")
And that's all. newCursor is the cursor to be used to fetch the next batch.
Note #1: When calling MinQuery.All(), you have to provide the names of the cursor fields; these will be used to build the cursor data (and ultimately the cursor string).
Note #2: If you're retrieving partial results (by using MinQuery.Select()), you have to include all the fields that are part of the cursor (the index entry) even if you don't intend to use them directly, else MinQuery.All() will not have all the values of the cursor fields, and so it will not be able to create the proper cursor value.
Check out the package doc of minquery here: https://godoc.org/github.com/icza/minquery, it is rather short and hopefully clean.

Mongo with Express: what does find return

I am developing an application in Express with Mongo, in which I have to query whether a document exists in the collection or not. I am doing this:
var dept = req.body.dept;
var name = req.body.name;
mongoose.model('User').find({'dept': dept, 'name': name}, function(err, user){
    if(err){
        res.render('user/test1');
    }else{
        res.redirect('/');
    }
});
What I want to do is check whether that document exists in the collection and, if it does, redirect to another page; otherwise, render the current page. But when I pass the wrong input, it still goes to the else branch and redirects.
Well, you are actually using mongoose, so there is in fact a distinction between what is returned by this specific abstraction and what the base driver itself returns.
In the case of mongoose, find() returns an array of results, which is empty if nothing is found. If you are looking for a singular match, then you probably really want findOne() instead, which returns either the document or null.
Either way, no match does not set err; a non-null err is about actual failures such as database connection errors rather than simply that nothing was found.
The base driver on the other hand would return a "Cursor" ( or optionally considered a promise ) instead, which can either be iterated or converted to an array like mongoose does via the .toArray() method. The .findOne() method of the base driver similarly returns either a result or a null document where the conditions do not match.
mongoose.model('User').findOne({'dept': dept, 'name': name}, function(err, user){
    if(err){
        // if an error was actually returned
    } else if ( !user ) {
        // Nothing matched, in REST you would 404
    } else {
        // this is okay
    }
});
In short, if you want to test that you actually found something, then look at the user value for a value that is not null in order to determine it matched. If it is null then nothing was matched.
So no match is not an "error", and an "error" is in fact a different thing.
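The same three-way distinction (failure, nothing matched, match) can be sketched in Go, the language used for the other added examples here. Everything in this block is hypothetical: findOne searches an in-memory slice rather than a database, and ErrNotFound is a made-up sentinel, since Go code typically signals "no match" with a dedicated error value rather than a null result:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel: the query ran, nothing matched.
var ErrNotFound = errors.New("no matching document")

// User is a stand-in for the stored document.
type User struct{ Dept, Name string }

// findOne is a stand-in for a database lookup over an in-memory slice.
func findOne(users []User, dept, name string) (*User, error) {
	for i := range users {
		if users[i].Dept == dept && users[i].Name == name {
			return &users[i], nil
		}
	}
	return nil, ErrNotFound
}

// route decides what the handler should do for each of the three outcomes.
func route(users []User, dept, name string) string {
	_, err := findOne(users, dept, name)
	switch {
	case errors.Is(err, ErrNotFound):
		return "render user/test1" // nothing matched: stay on the page
	case err != nil:
		return "500" // a real failure, e.g. the database is unreachable
	default:
		return "redirect /" // the document exists
	}
}

func main() {
	users := []User{{Dept: "eng", Name: "ann"}}
	fmt.Println(route(users, "eng", "ann"))
	fmt.Println(route(users, "eng", "bob"))
}
```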

Use allowDiskUse in criteria query with Grails and the MongoDB plugin?

In order to iterate over all the documents in a MongoDB (2.6.9) collection using Grails (2.5.0) and the MongoDB Plugin (3.0.2) I created a forEach like this:
class MyObjectService {
    def forEach(Closure func) {
        def criteria = MyObject.createCriteria()
        def ids = criteria.list { projections { id() } }
        ids.each { func(MyObject.get(it)) }
    }
}
Then I do this:
class AnalysisService {
    def myObjectService
    @Async
    def analyze() {
        MyObject.withStatelessSession {
            myObjectService.forEach { myObject ->
                doSomethingAwesome(myObject)
            }
        }
    }
}
This works great...until I hit a collection that is large (>500K documents) at which point a CommandFailureException is thrown because the size of the aggregation result is greater than 16MB.
Caused by CommandFailureException: { "serverUsed" : "foo.bar.com:27017" , "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)" , "code" : 16389 , "ok" : 0.0}
In reading about this, I think that one way to handle this situation is to use the option allowDiskUse in the aggregation function that runs on the MongoDB side so that the 16MB memory limit won't apply and I can get a larger aggregation result.
How can I pass this option to my criteria query? I've been reading the docs and the Javadoc for the Grails MongoDB plugin, but I can't seem to find it. Is there another way to approach the generic problem (iterating over all members of a large collection of domain objects)?
This is not possible with the current implementation of MongoDB Grails plugin. https://github.com/grails/grails-data-mapping/blob/master/grails-datastore-gorm-mongodb/src/main/groovy/org/grails/datastore/mapping/mongo/query/MongoQuery.java#L957
If you look at the above line, then you will see that the default options are being used for building AggregationOptions instance so there is no method to provide an option.
But there is another, hackish way to do it using Groovy's metaclass. Let's do it. :-)
Store the original method reference of builder() method before writing criteria in your service:
MetaMethod originalMethod = AggregationOptions.metaClass.static.getMetaMethod("builder", [] as Class[])
Then, replace the builder method to provide your implementation.
AggregationOptions.metaClass.static.builder = { ->
    def builderInstance = new AggregationOptions.Builder()
    builderInstance.allowDiskUse(true) // solution to your problem
    return builderInstance
}
Now, when your service method is called with the criteria query, it should not result in the aggregation error you are getting, since we have now set the allowDiskUse property to true.
Now, reset the original method back so that it should not affect any other call (optional).
AggregationOptions.metaClass.static.addMetaMethod(originalMethod)
Hope this helps!
Apart from this, why are you pulling all IDs in the forEach method and then re-fetching each instance using the get() method? You are wasting database queries, which will impact performance. Also, if you follow the approach below, you don't need the above changes.
An example with the same: (UPDATED)
class MyObjectService {
    void forEach(Closure func) {
        List<MyObject> instanceList = MyObject.createCriteria().list {
            // Your criteria code
            eq("status", "ACTIVE") // an example
        }
        // Don't do any of this
        // println(instanceList)
        // println(instanceList.size())
        // *** explained below
        instanceList.each { myObjectInstance ->
            func(myObjectInstance)
        }
    }
}
(I'm not adding the code of AnalysisService since there is no change)
*** This is the main point. Whenever you write a criteria query on a domain class (without projections, and in Mongo), Grails/gmongo will not immediately fetch the records from the database after executing the criteria code unless you call methods like toString(), size() or dump() on the result.
Now, when you apply each on that instance list, you are not actually loading all instances into memory; instead you are iterating over a Mongo cursor behind the scenes. In MongoDB, a cursor pulls records from the database in batches, which is very memory-friendly. So it is safe to call each directly on your criteria result; it will not blow up the JVM unless you call one of the methods that trigger loading all records from the database.
You can confirm this behaviour even in the code: https://github.com/grails/grails-data-mapping/blob/master/grails-datastore-gorm-mongodb/src/main/groovy/org/grails/datastore/mapping/mongo/query/MongoQuery.java#L1775
Whenever you write any criteria without projection, you will get an instance of MongoResultList and there is a method named initializeFully() which is being called on toString() and other methods. But, you can see the MongoResultList is implementing iterator which is in turn calling MongoDB cursor method for iterating over the large collection which is again, memory safe.
Hope this helps!

MongoDB: Retrieving the first document in a collection

I'm new to Mongo, and I'm trying to retrieve the first document from a find() query:
> db.scores.save({a: 99});
> var collection = db.scores.find();
[
{ "a" : 99, "_id" : { "$oid" : "51a91ff3cc93742c1607ce28" } }
]
> var document = collection[0];
JS Error: result is undefined
This is a little weird, since a collection looks a lot like an array. I'm aware of retrieving a single document using findOne(), but is it possible to pull one out of a collection?
The find method returns a cursor. This works like an iterator over the result set. If you have too many results and try to display them all on the screen, the shell will display only the first 20, and the cursor will now point to the 20th result of the result set. If you type the it command, the next 20 results will be displayed, and so on.
In your example I think that you have hidden from us one line in the shell.
This command
> var collection = db.scores.find();
will just assign the result to the collection variable and will not print anything in the screen. So, that makes me believe that you have also run:
> collection
Now, what is really happening. If you indeed have used the above command to display the content of the collection, then the cursor will have reached the end of the result set (since you have only one document in your collection) and it will automatically close. That's why you get back the error.
There is nothing wrong with your syntax. You can use it any time you want. Just make sure that your cursor is still open and has results. You can use the collection.hasNext() method for that.
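The exhaustion behaviour described above can be illustrated with a tiny forward-only iterator. This Go sketch is only an analogy: the cursor type and its HasNext/Next methods are hypothetical and merely mimic how a shell cursor is consumed once its results have been displayed:

```go
package main

import "fmt"

// cursor is a hypothetical forward-only iterator over a result set.
type cursor struct {
	docs []string
	pos  int
}

// HasNext reports whether any unread documents remain.
func (c *cursor) HasNext() bool { return c.pos < len(c.docs) }

// Next returns the current document and advances.
// Callers are expected to check HasNext first.
func (c *cursor) Next() string {
	d := c.docs[c.pos]
	c.pos++
	return d
}

func main() {
	c := &cursor{docs: []string{`{"a": 99}`}}
	// Displaying the cursor in the shell does the equivalent of this loop.
	for c.HasNext() {
		fmt.Println(c.Next())
	}
	// Nothing is left to read afterwards, so check before accessing.
	fmt.Println(c.HasNext())
}
```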
Is that the Mongo shell? What version? When I try the commands you type, I don't get any extra output:
MongoDB shell version: 2.4.3
connecting to: test
> db.scores.save({a: 99});
> var collection = db.scores.find();
> var document = collection[0];
In the Mongo shell, find() returns a cursor, not an array. In the docs you can see the methods you can call on a cursor.
findOne() returns a single document and should work for what you're trying to accomplish.
So you have several options.
Using Java as the example language, one option is to get a DB cursor and iterate over the elements that are returned, or simply grab the first one and run.
DBCursor cursor = db.getCollection(COLLECTION_NAME).find();
List<DOCUMENT_TYPE> retVal = new ArrayList<DOCUMENT_TYPE>(cursor.count());
while (cursor.hasNext()) {
    retVal.add(cursor.next());
}
return retVal;
If you're looking for a particular object within the document, you can write a query and search all the documents for it. You can use the findOne method or simply find and get a list of objects matching your query. See below:
DBObject query = new BasicDBObject();
query.put(SOME_ID, ID);
DBObject result = db.getCollection(COLLECTION_NAME).findOne(query); // for a single object
DBCursor cursor = db.getCollection(COLLECTION_NAME).find(query); // for a cursor of multiple objects

MongoDB Find using Array

I have a MongoDB collection which contains a list of documents which contain (among other items) a time stamp and an address for each different type of device. The device address is in hexadecimal (e.g. 0x1001, 0x2001, 0x3001, etc.).
On the server side, I'm trying to query the collection to see what documents exist within a certain date range, and for a list of devices:
Collection.find(
    {"addr": data.devices.Controls, "time": {$gte:d0, $lte:d1}},{},
    function (err, docs) {
        if( err|| !docs) console.log("No data found");
        else {
            //I've simplified the code here...
        }
    }
);
d0 and d1 are my start and end dates... and the data.devices.Controls is a list of device addresses. If I add the line:
console.log("Controls: " + JSON.stringify(data.devices.Controls));
I can see on the server side that it prints out a list of addresses that I'm looking for (the actual print statement looks like: Controls: ["0x1001", "0x2001", "0x3001"].)
However, this find statement doesn't seem to return any data from the query. There's no error (as I don't see the "No Data Found" message)... It just doesn't seem to return any data. What's strange is that if I specify a specific element out of the Controls array (something like data.devices.Controls[0]...), then it works fine. I can specify any element in the array and it works... but by passing an entire array in the argument, it doesn't seem to work. Does anyone know why this happens (and how to fix it)?
You need to use the $in operator to match against an array of values; like this:
Collection.find(
{"addr": {$in: data.devices.Controls}, "time": {$gte:d0, $lte:d1}}, ...
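For comparison, here is the shape of that filter built as plain Go maps and printed as JSON, so the nesting of $in is easy to see. This is only a sketch using the standard library: buildFilter is a hypothetical helper, and the time bounds are plain integers standing in for the d0 and d1 dates:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildFilter returns a query document that matches any of the given
// addresses within the inclusive time range [from, to].
func buildFilter(addrs []string, from, to int64) map[string]interface{} {
	return map[string]interface{}{
		// $in matches when the field equals any element of the list.
		"addr": map[string]interface{}{"$in": addrs},
		"time": map[string]interface{}{"$gte": from, "$lte": to},
	}
}

func main() {
	f := buildFilter([]string{"0x1001", "0x2001", "0x3001"}, 1000, 2000)
	out, err := json.Marshal(f)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Without $in, passing the array directly asks for documents whose addr is exactly that array, which is why the original query matched nothing.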