MongoDB pagination - mongodb

I'm trying to implement MongoDB pagination based on a query that should always sort by a date field named updateTime.
I tried to implement paging with a next and a previous ObjectID, using the operators $gt and $lt and limiting the results to a page size.
In addition, the query may include multiple filters (matching values on distinct fields).
To build a consistent pagination approach for a query that may or may not include filters, I try to determine the first and last ObjectID regardless of the number of pages a search might generate.
@staticmethod
def paginate_reviews(gt_oid, lt_oid, account, location, rating, responded, start_datetime=None, end_datetime=None):
    query = {'location': {'$in': location}}
    if account is not None:
        query['account'] = account
    if start_datetime is not None and end_datetime is not None:
        query['updateTime'] = {'$gte': start_datetime, '$lte': end_datetime}
    if gt_oid is not None:
        query['_id'] = {'$gt': ObjectId(gt_oid)}
    elif lt_oid is not None:
        query['_id'] = {'$lt': ObjectId(lt_oid)}
    if rating is not None and len(rating) != 0:
        rating = [int(r) for r in rating]
        query['starRatingNumber'] = {'$in': rating}
    if responded is not None:
        responded = json.loads(responded)
        query['reviewReply'] = {'$exists': responded}
    reviews = Review.objects.filter(__raw__=query).order_by('-updateTime', '_id')[:25]
    response = {
        'reviews': reviews
    }
    if gt_oid is None and lt_oid is None:
        first = Review.objects().filter(__raw__=query).order_by('updateTime').first()
        last = Review.objects().filter(__raw__=query).order_by('-updateTime').first()
        first_oid = str(first.to_mongo().get('_id')) if first is not None else None
        last_oid = str(last.to_mongo().get('_id')) if last is not None else None
        response['first_oid'] = first_oid
        response['last_oid'] = last_oid
    return response
Here first_oid is supposed to be the first document of the first page, and last_oid the last element of the last page.
When I'm on the first page, I query the next one like this:
db.review.find({"_id" : {$gt: ObjectId("5a776c41c68932281c2240f1")}}).sort({'_id': 1, 'updateTime': -1}).limit(25)
where 5a776c41c68932281c2240f1 is the last element of the first page.
So far, it works properly when paginating forward. However, when paginating backwards something weird happens: even if I'm on the 3rd, 5th or any other page, a match using $lt with the first element of that page returns the first element of page 1.
This breaks my whole implementation. How should I adjust it?
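One possible direction, sketched here rather than taken from the thread, is to page on the same keys the results are sorted by (updateTime, then _id) instead of on _id alone, and to reverse the sort when going backwards. The helper name build_page_query and the (update_time, oid) cursor pair are hypothetical; the function only builds the filter and sort specification and does not talk to a database.

```python
def build_page_query(base_filter, cursor=None, direction="next"):
    """Return (filter, sort) for one page shown as updateTime desc, _id asc.

    cursor is a hypothetical (update_time, oid) pair: the last document of
    the current page for direction="next", its first document for "prev".
    """
    if direction not in ("next", "prev"):
        raise ValueError("direction must be 'next' or 'prev'")
    forward = direction == "next"
    # Going backwards walks the displayed order in reverse.
    sort = [("updateTime", -1 if forward else 1), ("_id", 1 if forward else -1)]
    query = dict(base_filter)
    if cursor is not None:
        update_time, oid = cursor
        time_op, id_op = ("$lt", "$gt") if forward else ("$gt", "$lt")
        boundary = {"$or": [
            {"updateTime": {time_op: update_time}},
            # Tie-breaker for documents sharing the same updateTime
            {"updateTime": update_time, "_id": {id_op: oid}},
        ]}
        # $and keeps any existing updateTime range filter intact
        query = {"$and": [query, boundary]}
    return query, sort
```

With this shape, the documents fetched for a "prev" request arrive in reverse display order, so they would need to be reversed before rendering. Whether this fits depends on updateTime being present on every document, which is an assumption here.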

Related

How to update a field in a MongoDB collection so it increments linearly

I've been struggling with MongoDB for some time now, and the idea is quite simple: I have a collection, and I want to add a new ID field. This field is controlled by our API, which auto-increments it for each new document inserted.
The thing is, the collection already has some documents, so I must initialize each document with a number sequentially, no matter the order:
collection: holiday {
'date': date,
'name': string
}
The collection has 12 documents, so each document should get an ID property, with values from 1 to 12. What kind of query or function should I use to do this? No restrictions so far, and performance is not a problem.
Maybe it is not optimal but works :)
var newId = 1;
var oldIds = [];
db.holiday.find().forEach(it => {
    const documentToMigrate = it;
    oldIds.push(it._id);
    documentToMigrate._id = newId;
    db.holiday.save(documentToMigrate);
    ++newId;
});
db.holiday.remove({_id: {$in: oldIds}});
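The snippet above replaces _id itself. If the goal is a separate field, a minimal sketch of the numbering step could look like the following; the helper name build_id_updates and the field name "ID" are assumptions, and the function only prepares filter/update pairs so it can be tried without a database.

```python
def build_id_updates(documents, start=1, field="ID"):
    """Return (filter, update) pairs that number the documents sequentially."""
    return [
        ({"_id": doc["_id"]}, {"$set": {field: number}})
        for number, doc in enumerate(documents, start=start)
    ]

# With PyMongo, each pair could then be applied one by one, for example:
# for flt, upd in build_id_updates(db.holiday.find({}, {"_id": 1})):
#     db.holiday.update_one(flt, upd)
```

Since the order does not matter for the 12 existing documents, the numbering simply follows whatever order the documents are read in.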

Skip and Limit for pagination for a Mongo aggregate

I am working on pagination in flask(Python framework) using flask-paginate (just for ref)
I am able to achieve pagination for just a find query as below:
from flask_paginate import Pagination
from flask_paginate import get_page_args
def starting_with_letter(letter):
    page, per_page, offset = get_page_args()
    collection_name=letter.lower()+'_collection'
    words=db[collection_name]
    data_db=words.find()
    data=data_db.limit(per_page).skip(offset)  # Here I have achieved the limit and skip
    pagination = Pagination(page=page, total=data.count(),per_page=per_page,offset=offset,record_name='words')
    return render_template('startingwords.html',data=data,pagination=pagination)
But I am not able to do the same for the aggregate here:
def test():
    page, per_page, offset = get_page_args()
    cursor_list=[]  # appending each cursor in iteration of for loop
    collections=db.collection_names()
    for collection in collections:
        cursor_objects = db[collection].aggregate([
            {
                "$match": {
                    "$expr": {"$eq": [{"$strLenCP": "$word"}, 6]}
                }
            },
            {"$skip": offset},
            {"$limit": per_page}
        ])
        for cursor in cursor_objects:
            cursor_list.append(cursor)
    pagination = Pagination(page=page, total=len(cursor_list),per_page=per_page,offset=offset,record_name='words')
    return render_template('lettersearch.html',data=cursor_list,pagination=pagination)
The results are displayed as:
Here all 39 results are shown on a single page.
On hitting page 2 it showed:
Note: By default, flask-paginate initially sets per_page to 10 and offset to 0.
After referring to many links, I have tried:
placing skip and limit above match, which is wrong anyway
I also learnt that limit is always followed by skip.
I am stuck with this. Any help is appreciated.
Your issue is not with the skip() and limit(); that is working fine. The issue is with your overall logic; you are iterating all 39 collections in the first loop and then appending each result of the aggregation to cursor_list.
I can't figure out the logic of what you are trying to do, as the first example is looking in a words collection and second is looking in all collections for a word field; with that said, you can likely simplify your approach to something like:
offset = 0
per_page = 10
collections = db.list_collection_names()
#
# Add some logic on the collections array to filter what is needed
#
print(collections[offset:offset+per_page])
EDIT to reflect comments. Full worked example of a function to perform this. No need for an aggregation query - this adds complexity.
from pymongo import MongoClient
from random import randint
db = MongoClient()['testdatabase1']
# Set up some data
for i in range(39):
    coll_name = f'collection{i}'
    db[coll_name].delete_many({}) # Be careful; testing only; this deletes your data
    for k in range (randint(0, 2)):
        db[coll_name].insert_one({'word': '123456'})
# Main function
def test(offset, per_page, word_to_find):
    found = []
    collections = db.list_collection_names()
    for collection in sorted(collections):
        if db[collection].find_one({word_to_find: { '$exists': True}}) is not None:
            found.append(collection)
    print(found[offset:offset+per_page])
test(offset=0, per_page=10, word_to_find='word')
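To make the point about the loop concrete, here is a small database-free sketch. It assumes, purely for illustration, 39 collections with one matching word each; plain lists stand in for the collections, and both function names are made up for this example.

```python
def paginate_each(collections, offset, per_page):
    """Apply skip/limit to every collection separately, then concatenate."""
    merged = []
    for docs in collections.values():
        merged.extend(docs[offset:offset + per_page])
    return merged

def paginate_merged(collections, offset, per_page):
    """Concatenate all collections in name order, then apply skip/limit once."""
    merged = [doc for name in sorted(collections) for doc in collections[name]]
    return merged[offset:offset + per_page]

# Hypothetical stand-in data: 39 collections holding one matching word each
collections = {f"collection{i:02d}": [f"word{i:02d}"] for i in range(39)}
first_try = paginate_each(collections, offset=0, per_page=10)
page_one = paginate_merged(collections, offset=0, per_page=10)
```

Under these assumptions, first_try contains every word because each collection contributes its own first page, while page_one is capped at per_page entries.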

Mongo -Select parent document with maximum child documents count, Faster way?

I'm quite new to Mongo and trying to get the following query to work. It works fine, but it takes a bit too long, so I think I'm doing something wrong.
There are about 6000 documents in the collection parents. Each document has a certain number of children (childs is another collection with 40000 documents in it). parents and childs are associated with each other by an attribute in the document called parent_id. Please see the following code. It takes approximately 1 minute to execute the queries. I don't think Mongo should take that much time.
function getChildMaxDocCount(){
    var maxLen = 0;
    var bigSizeParent = null;
    db.parents.find().forEach(function (parent){
        var currcount = db.childs.count({parent_id:parent._id});
        if(currcount > maxLen){
            maxLen = currcount;
            bigSizeParent = parent._id;
        }
    });
    printjson({"maxLen":maxLen, "bigSizeParent":bigSizeParent });
}
Is there any feasible/optimal way to achieve this?
If I got you right, you want to have the parent with the most children. This is easy to accomplish using the aggregation framework. When each child can only have one parent, the aggregation query would look like this:
db.childs.aggregate(
{ $group: { _id:"$parent_id", children:{$sum:1} } },
{ $sort: { "children":-1 } },
{ $limit : 1 }
);
Which should return a document like:
{ _id:"SomeParentId", children:15}
If a child can have more than one parent, what the query would look like depends heavily on the data modeling.
Have a look at the aggregation framework documentation for details.
Edit: Some explanation
The aggregation pipeline passes the documents through a series of steps: all documents are first processed by the first step, and the resulting documents are then fed into the next step.
Step 1: Grouping
We group all documents into new documents (virtual ones, if you want) and tell mongod to increment the field children by one for each document which has the same parent_id. Since we are referring to a field of the current document, we need to add a $ sign.
Step 2: Sorting
Now that we have a bunch of documents which hold the parent_id and the number of children this parent has, we sort it by the children field in descending (-1) order.
Step 3: Limiting
Since we are only interested in the parent_id which has the most children, we only let mongod return the first document after sorting.
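For readers who find it easier to follow in plain code, the three steps can be mirrored on an in-memory list. This is only an illustration of what the pipeline computes, not how mongod executes it; the function name is made up for the sketch.

```python
from collections import Counter

def parent_with_most_children(children):
    """Mirror $group / $sort / $limit on a list of child documents."""
    # Step 1: group by parent_id and add 1 per child document
    counts = Counter(child["parent_id"] for child in children)
    # Step 2: sort the groups by their count in descending order
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Step 3: keep only the first group, if there is one
    if not ranked:
        return None
    parent_id, total = ranked[0]
    return {"_id": parent_id, "children": total}
```

The returned dictionary has the same shape as the document shown above, with _id holding the parent_id and children holding the count.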

MongoDB: Retrieving the first document in a collection

I'm new to Mongo, and I'm trying to retrieve the first document from a find() query:
> db.scores.save({a: 99});
> var collection = db.scores.find();
[
{ "a" : 99, "_id" : { "$oid" : "51a91ff3cc93742c1607ce28" } }
]
> var document = collection[0];
JS Error: result is undefined
This is a little weird, since a collection looks a lot like an array. I'm aware of retrieving a single document using findOne(), but is it possible to pull one out of a collection?
The find method returns a cursor, which works like an iterator over the result set. If you have too many results and try to display them all on the screen, the shell will display only the first 20, and the cursor will then point to the 20th result of the result set. If you type it, the next 20 results will be displayed, and so on.
In your example I think that you have hidden from us one line in the shell.
This command
> var collection = db.scores.find();
will just assign the result to the collection variable and will not print anything in the screen. So, that makes me believe that you have also run:
> collection
Now, what is really happening. If you indeed have used the above command to display the content of the collection, then the cursor will have reached the end of the result set (since you have only one document in your collection) and it will automatically close. That's why you get back the error.
There is nothing wrong with your syntax. You can use it any time you want. Just make sure that your cursor is still open and has results. You can use the collection.hasNext() method for that.
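The one-pass behaviour described here can be illustrated without a database. In this sketch a plain Python iterator plays the role of the cursor; it is an analogy only and says nothing about how the shell implements cursors.

```python
documents = [{"a": 99}]

cursor = iter(documents)      # stand-in for db.scores.find()
shown = list(cursor)          # displaying the results consumes the cursor
has_next = next(cursor, None) is not None   # nothing is left afterwards

fresh = iter(documents)       # a new "cursor" that has not been displayed
first = next(fresh, None)     # the first document is still available
```

Once the first iterator has been displayed, asking it for another element yields nothing, while the fresh one still hands out the first document.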
Is that the Mongo shell? What version? When I try the commands you type, I don't get any extra output:
MongoDB shell version: 2.4.3
connecting to: test
> db.scores.save({a: 99});
> var collection = db.scores.find();
> var document = collection[0];
In the Mongo shell, find() returns a cursor, not an array. In the docs you can see the methods you can call on a cursor.
findOne() returns a single document and should work for what you're trying to accomplish.
You have several options.
Using Java as the language, one option is to get a DB cursor and iterate over the elements that are returned, or simply grab the first one.
DBCursor cursor = db.getCollection(COLLECTION_NAME).find();
List<DOCUMENT_TYPE> retVal = new ArrayList<DOCUMENT_TYPE>(cursor.count());
while (cursor.hasNext()) {
    retVal.add(cursor.next());
}
return retVal;
If you're looking for a particular object within the document, you can write a query and search all the documents for it. You can use the findOne method or simply find and get a list of objects matching your query. See below:
DBObject query = new BasicDBObject();
query.put(SOME_ID, ID);
DBObject result = db.getCollection(COLLECTION_NAME).findOne(query); // for a single object
DBCursor cursor = db.getCollection(COLLECTION_NAME).find(query); // for a cursor of multiple objects

Fetch Record from mongo db based on type and ancestry field

In MongoDB, records are stored like this:
{_id:100,type:"section",ancestry:nil,.....}
{_id:300,type:"section",ancestry:100,.....}
{_id:400,type:"problem",ancestry:100,.....}
{_id:500,type:"section",ancestry:100,.....}
{_id:600,type:"problem",ancestry:500,.....}
{_id:700,type:"section",ancestry:500,.....}
{_id:800,type:"problem",ancestry:100,.....}
I want to fetch the records in this order:
first, the record whose ancestry is nil
then all records whose parent is that first record and whose type is 'problem'
then all records whose parent is that first record and whose type is 'section'
The expected output is:
{_id:100,type:"section",ancestry:nil,.....}
{_id:400,type:"problem",ancestry:100,.....}
{_id:800,type:"problem",ancestry:100,.....}
{_id:300,type:"section",ancestry:100,.....}
{_id:500,type:"section",ancestry:100,.....}
{_id:600,type:"problem",ancestry:500,.....}
{_id:700,type:"section",ancestry:500,.....}
Try this MongoDB shell command:
db.collection.find().sort({ancestry:1, type: 1})
In languages where ordered dictionaries aren't available, the sort argument may take a list of 2-tuples instead. Something like this (Python):
collection.find({}).sort([('ancestry', pymongo.ASCENDING), ('type', pymongo.ASCENDING)])
@vinipsmaker's answer is good. However, it doesn't work properly if the _ids are random numbers or if there are documents that aren't part of the tree structure. In that case, the following code would work correctly:
function getSortedItems() {
    var sorted = [];
    var ids = [ null ];
    while (ids.length > 0) {
        var cursor = db.Items.find({ ancestry: ids.shift() }).sort({ type: 1 });
        while (cursor.hasNext()) {
            var item = cursor.next();
            ids.push(item._id);
            sorted.push(item);
        }
    }
    return sorted;
}
Note that this code is not fast, because db.Items.find() will be executed n times, where n is the number of documents in the tree structure.
If the tree structure is huge or you will do the sort many times, you can optimize this by using the $in operator in the query and sorting the result on the client side.
In addition, creating an index on the ancestry field will make the code quicker in either case.
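A rough sketch of the $in idea, assuming the documents form a proper tree (no cycles): fetch one whole level per lookup and sort it on the client by parent position, then by type. A list comprehension stands in for the database query so the example runs without MongoDB, and the function name sorted_items is made up for this illustration.

```python
def sorted_items(items):
    """Breadth-first ordering that fetches one tree level per lookup."""
    result = []
    parents = [None]                      # ancestry of the root is None
    while parents:
        rank = {pid: pos for pos, pid in enumerate(parents)}
        # Stand-in for: db.Items.find({"ancestry": {"$in": parents}})
        level = [item for item in items if item["ancestry"] in rank]
        # Client-side sort: by parent position first, then by type
        level.sort(key=lambda item: (rank[item["ancestry"]], item["type"]))
        result.extend(level)
        parents = [item["_id"] for item in level]
    return result
```

This issues one lookup per tree level instead of one per document. Documents with the same parent and type keep the order in which they were read, because Python's sort is stable.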