How do I set `skip` (offset) for the ReactiveMongo MongoDB driver?

I need to skip a number of documents (offset) in a query and return only the next limit documents. I know the following naive approach:
collection.find(BSONDocument())
.cursor[T].collect[List](offset+limit).map(_.drop(offset))
But this is not really desirable because it loads offset+limit documents into JVM memory, whereas I'd like to skip them on the database side.

Solution: use QueryOpts. Example:
collection.find(BSONDocument())
.options(QueryOpts(skipN = offset))
.cursor[T].collect[List](limit)
Note that skip is not very efficient: MongoDB has no built-in support for efficient offset pagination, so it simply iterates over the skipped documents before returning any results.
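One common way to sidestep the cost of a large skip is range-based (keyset) pagination: remember the last `_id` of the previous page and ask only for documents after it. The following is a minimal sketch in Python rather than Scala, assuming pages are ordered by `_id`. The helper names `keyset_page_spec` and `fetch_page` are hypothetical, and `fetch_page` is only an in-memory stand-in for the server so the example runs without MongoDB.

```python
from typing import Any, List, Optional


def keyset_page_spec(last_id: Optional[Any], limit: int) -> dict:
    """Describe one page of a range-based scan ordered by _id (hypothetical helper)."""
    # The first page has no lower bound; later pages start after the last seen _id.
    query_filter = {} if last_id is None else {"_id": {"$gt": last_id}}
    return {"filter": query_filter, "sort": [("_id", 1)], "limit": limit}


def fetch_page(docs: List[dict], spec: dict) -> List[dict]:
    """In-memory stand-in for the database, used only to make the sketch runnable."""
    lower = spec["filter"].get("_id", {}).get("$gt")
    matching = [d for d in docs if lower is None or d["_id"] > lower]
    matching.sort(key=lambda d: d["_id"])
    return matching[: spec["limit"]]


docs = [{"_id": i} for i in range(1, 11)]
first = fetch_page(docs, keyset_page_spec(None, 3))
# The next page starts right after the last _id of the previous one.
second = fetch_page(docs, keyset_page_spec(first[-1]["_id"], 3))
```

Against a real collection, the same filter, sort and limit would be passed to the driver's find call. With an index on the sort field the server can seek straight to the start of the page instead of walking over skipped documents.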

VasyaNovikov's answer is certainly correct, but ReactiveMongo also offers a more intuitive API:
collection.find(BSONDocument())
.skip(offset)
.cursor[T]
.collect[List](limit, Cursor.FailOnError[List[T]]())

Related

How to use mongodb change stream instead of periodic query?

I want to count the documents in my collection that satisfy a query, without polling the collection. How can I do this with a MongoDB change stream?
For example, there are documents in the database that all have some property, such as {"destination": "Target1"}, and I want to know how many documents satisfy this requirement.
I don't want to run a query on every change to the collection, because the documents change very often.
I am looking for something similar to Oracle's CQN.
You can use a change stream and watch for changes as follows:
watchCursor = db.getSiblingDB("mydatabase").mycollection.watch()
while (!watchCursor.isExhausted()){
    if (watchCursor.hasNext()){
        printjson(watchCursor.next());
    }
}
changeStream docs
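To turn the watched events into a running count without re-querying, one option is to track the `_id`s of the matching documents and update that set per event. This is a rough sketch in Python, not a definitive implementation. It assumes events shaped like change stream documents (`operationType`, `documentKey`, `fullDocument`), that update events were requested with the full document included, and that the set is seeded once with an initial query. The names `apply_change` and `is_target1` are hypothetical.

```python
from typing import Any, Callable, Set


def apply_change(matching_ids: Set[Any], event: dict,
                 matches: Callable[[dict], bool]) -> int:
    """Update the tracked set of matching _ids for one event and return the new count."""
    doc_id = event["documentKey"]["_id"]
    op = event["operationType"]
    if op == "delete":
        # Delete events carry no document body, so rely on the tracked ids.
        matching_ids.discard(doc_id)
    elif op in ("insert", "update", "replace"):
        doc = event.get("fullDocument")
        if doc is not None and matches(doc):
            matching_ids.add(doc_id)
        else:
            matching_ids.discard(doc_id)
    return len(matching_ids)


def is_target1(doc: dict) -> bool:
    return doc.get("destination") == "Target1"


ids: Set[Any] = set()
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "destination": "Target1"}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "destination": "Target2"}},
    {"operationType": "update", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "destination": "Target1"}},
    {"operationType": "delete", "documentKey": {"_id": 1}},
]
counts = [apply_change(ids, e, is_target1) for e in events]
```

Tracking ids rather than a bare counter is a deliberate choice here: a delete event does not say whether the removed document matched, so the set is what makes the decrement safe.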
But perhaps you could simply run a query backed by a good index?
It seems you can just execute:
db.collection.count({destination:"Target1"})
If you have an index on the "destination" field, it will be pretty quick.

how to get the max value of a field in MongoDB

For example:
{id:4563214321,updateTime:long("124354354")}
New documents are constantly entering the db, so I would like to always get the most recently updated document, i.e. the one with the largest update time. How should I write the shell query? Thanks in advance.
You can use a combination of limit and sort to achieve this goal.
db.collectionName.find({}).sort({"updateTime" : -1}).limit(1)
This will sort all of your documents by update time in descending order and then return only the one with the largest value.
I would recommend adding an index to this field to improve performance.
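For readers using pymongo rather than the shell, the same sort-and-limit idea might look like the sketch below. The helper name `latest_document` is hypothetical, and `collection` is assumed to be a pymongo `Collection` (or anything with a compatible `find_one`).

```python
def latest_document(collection, field="updateTime"):
    """Return the document with the largest value of `field`, or None if there is none."""
    # Sort descending on the field and let find_one return the first result.
    return collection.find_one({}, sort=[(field, -1)])
```

It would be called as `latest_document(db.collectionName)`; as in the shell version, an index on the field keeps the sort cheap.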
This is a duplicate question; you can find an answer here:
Using findOne in mongodb to get element with max id
Use it like this:
db.collection.find().sort({updateTime:-1}).limit(1).pretty()
With findOne you can do it with this syntax (note that $query and $orderby are deprecated in newer versions of the shell):
db.collection.findOne({$query:{},$orderby:{updateTime:-1}})

Is it faster to use with_limit_and_skip=True when counting query results in pymongo

I'm doing a query where all I want to know is whether there is at least one row in the collection that matches the query, so I pass limit=1 to find(). All I care about is whether the count() of the returned cursor is > 0. Would it be faster to use count(with_limit_and_skip=True) or just count()? Intuitively it seems to me like I should pass with_limit_and_skip=True, because if there are a whole bunch of matching records then the count could stop at my limit of 1.
Maybe this merits an explanation of how limits and skips work under the covers in mongodb/pymongo.
Thanks!
Your intuition is correct. That's the whole point of the with_limit_and_skip flag.
With with_limit_and_skip=False, count() has to count all the matching documents, even if you use limit=1, which is pretty much guaranteed to be slower.
From the docs:
Returns the number of documents in the results set for this query. Does not take limit() and skip() into account by default - set with_limit_and_skip to True if that is the desired behavior.
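The same "stop at the first match" idea can be sketched with `count_documents`, assuming a pymongo version that provides it on the collection (it accepts a `limit` option). The helper name `has_match` is hypothetical.

```python
def has_match(collection, query: dict) -> bool:
    """Return True if at least one document matches `query`."""
    # limit=1 lets the server stop counting after the first matching document.
    return collection.count_documents(query, limit=1) > 0
```

A call such as `has_match(db.mycollection, {"status": "active"})` (collection and field names made up for illustration) then answers the existence question without counting every match.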

How can I exclude a mongo index from a query?

Does anyone know of a way to run a query in MongoDB, and specify that a named index NOT be used?
We have multiple indexes on our data and there are situations where mongo is making a poor choice about which index to use to satisfy some types of queries. But we don't necessarily want to declare that a specific index be used. Only that we know which one is definitely a poor choice.
Using a named index is easy:
db.users.find({....}).hint( "index_name" )
Excluding a named index might look something like this:
db.users.find({....}).hint( "index_name", false)
Any insight is appreciated.
You can't exclude indexes; you can only specify the use of one.
However, MongoDB empirically tests indexes against your query by checking how quickly the query runs with each candidate index, and then decides which index to use based on those results. Could you please run the query with .explain(true) to show all the query plans?
Regards,
Charlie

How can I filter by the length of an embedded document in MongoDB?

For example given the BlogPost/Comments schema here:
http://mongoosejs.com/
How would I find all posts with more than five comments? I have tried something along the lines of
where('comments').size.gte(5)
But I'm getting tripped up with the syntax.
MongoDB doesn't support range queries with the $size operator. The docs recommend creating a separate field that holds the size of the list, which you increment yourself:
You cannot use $size to find a range of sizes (for example: arrays with more than 1 element). If you need to query for a range, create an extra size field that you increment when you add elements.
Note that for some queries, it may be feasible to just list all the sizes you want included or excluded using $or / $nor conditions.
In your example, the following query will give all documents with more than 5 comments (using standard mongodb shell syntax, not mongoose):
db.col.find({"comments": {"$exists": true}, "$nor": [{"comments": {"$size": 5}}, {"comments": {"$size": 4}}, {"comments": {"$size": 3}}, {"comments": {"$size": 2}}, {"comments": {"$size": 1}}, {"comments": {"$size": 0}}]})
Obviously, this is very repetitive, so it only makes sense for small boundaries, if at all. Keeping a separate count variable, as recommended in the mongodb docs, is usually the better solution.
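The separate counter approach can be sketched as a pair of plain documents: one update that pushes the comment and bumps the counter in the same operation, and one filter that does the range query on the counter. This is only an illustration; the field name `commentCount` and both helper names are assumptions, not part of the schema in the question.

```python
def add_comment_update(comment: dict) -> dict:
    """Update document that appends a comment and increments the counter together."""
    # Doing both in one update keeps the counter in step with the array.
    return {"$push": {"comments": comment}, "$inc": {"commentCount": 1}}


def more_comments_than(n: int) -> dict:
    """Filter for posts whose counter field exceeds n."""
    return {"commentCount": {"$gt": n}}
```

With pymongo these would be used roughly as `collection.update_one({"_id": post_id}, add_comment_update(comment))` and `collection.find(more_comments_than(5))`, where `post_id` and `comment` come from the application.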
It's slow, but you could also use the $where clause:
db.Blog.find({$where:"this.comments.length > 5"}).exec(...);