How to get from n to n items in mongodb - mongodb

I'm trying to create an Android app that pulls the first 10 documents (1-10) from a MongoDB collection and shows them in a list. Later, when the list reaches the end, I want to pull documents 11-20 from the collection, and so on.
def get_all_tips(from_item, max_items):
    db = client.MongoTip
    tips_list = db.tips.find().sort([['_id', -1]]).limit(max_items).skip(from_item)
    if tips_list.count() > 0:
        from bson import json_util
        return json_util.dumps(tips_list, default=json_util.default), response_ok
    else:
        return "Please move along nothing to see here", response_invalid
    pass
But the above code does not work the way I intended. Instead, it returns max_items documents starting from from_item.
Example: calling get_all_tips(3,4)
It returns:
Document 3, Document 4, Document 5, Document 6
I'm expecting:
Document 3, Document 4

In your code you are specifying two parameters.
from_item: which is the starting index of the documents to return
max_items: number of items to return
Therefore, calling get_all_tips(3,4) will return 4 documents starting from document 3, which is exactly what's happening.
Proposed fixes:
If you want it to return documents 3 and 4, call get_all_tips(3,2) instead, which means "return a maximum of two documents starting from 3".
If you'd rather specify the start and end indexes in your function, I recommend the following changes:
def get_all_tips(from_item, to_item):
    if to_item < from_item:
        return "to_item must not be less than from_item", bad_request
    db = client.MongoTip
    # Both indexes are inclusive, so the range holds (to_item - from_item + 1) documents.
    tips_list = db.tips.find().sort([['_id', -1]]).limit(to_item - from_item + 1).skip(from_item)
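For the paging use case in the question (items 1-10, then 11-20), the skip and limit values can also be derived from a page number. The sketch below is only an illustration: page_window and its 1-based page parameter are hypothetical names, not part of the original code.

```python
def page_window(page, page_size=10):
    """Return the (skip, limit) pair for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    # Page 1 skips nothing, page 2 skips one full page, and so on.
    return (page - 1) * page_size, page_size


skip, limit = page_window(2)  # second page: documents 11-20
# With PyMongo these values would feed the existing query, for example:
# db.tips.find().sort([('_id', -1)]).skip(skip).limit(limit)
```

The Android client then only needs to remember which page it last requested.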
That being said, I'd like to point out that the MongoDB documentation does not recommend using skip to paginate large collections, because the server has to walk past all the skipped documents before it starts returning results.
See: MongoDB 3.2 cursor.skip
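One common alternative to skip is range-based paging: remember the last _id of the page already shown and ask only for documents beyond it. The following is a minimal sketch under the assumption that the list stays sorted by _id descending, as in the question. The names page_filter and fetch_page are hypothetical, and collection is assumed to be a PyMongo collection such as db.tips.

```python
def page_filter(last_seen_id=None):
    """Build the filter for the next page when sorting by _id descending."""
    if last_seen_id is None:
        return {}  # first page: no lower bound yet
    # Only documents that sort after the last one already shown.
    return {"_id": {"$lt": last_seen_id}}


def fetch_page(collection, page_size, last_seen_id=None):
    """Fetch one page without skip(); `collection` is assumed to be a PyMongo collection."""
    cursor = (collection.find(page_filter(last_seen_id))
              .sort("_id", -1)
              .limit(page_size))
    return list(cursor)
```

The client would pass the _id of the last document it received as last_seen_id when it requests the next page.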

Related

MongoDB, retrieve specific field in array of objects

In my collection I have an array of objects. I'd like to share only a subset of those objects, but I can't figure out how to do this.
Here are a few things I tried:
db.collections.find({},
  { fields: {
    'myField': 1, // works
    'myArray': 1, // works
    'myArray.$': 1, // doesn't work
    'myArray.$.myNestedField': 1, // doesn't work
    'myArray.0.myNestedField': 1, // doesn't work
  }
});
Use 'myArray.myNestedField': 1 to project nested fields from the array.
I'll briefly explain all the variants you have.
'myField': 1 -- Projecting a field value
'myArray': 1 -- Projecting an array as a whole (its elements can be scalars or embedded documents)
The variants below work only when the query itself contains a condition on the array for the positional operator ($) to refer to, and they project only the first element matching that condition.
'myArray.$': 1
'myArray.$.myNestedField': 1
The following is not a valid projection operation:
'myArray.0.myNestedField': 1
More here on how to query & project documents
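To make the shape of the result concrete, here is a small plain-Python emulation of what the 'myArray.myNestedField': 1 projection returns for a single document. It is a sketch, not MongoDB itself: project_nested_field is a hypothetical helper, and it assumes every array element is an embedded document that carries the nested field.

```python
def project_nested_field(doc, array_field, nested_field):
    """Emulate the projection {'<array_field>.<nested_field>': 1} on one document.

    Assumption: every element of the array is a dict containing nested_field.
    """
    return {
        "_id": doc["_id"],  # _id is returned unless it is explicitly excluded
        array_field: [
            {nested_field: element[nested_field]} for element in doc[array_field]
        ],
    }


sample = {
    "_id": 1,
    "myField": "x",
    "myArray": [
        {"myNestedField": "a", "other": 10},
        {"myNestedField": "b", "other": 20},
    ],
}
projected = project_nested_field(sample, "myArray", "myNestedField")
```

Every element of the array is kept, but each one is trimmed down to the single nested field.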

In mongo, how to find for a set of items and then add more to fill the required item count

Let's say I have a list of items. I need to find (return a cursor for) exactly 8 items. First I need to see how many featured items there are. If I can get 8 featured items, there is no issue. But if the count is less than 8, I need to randomly add items until I get 8.
Is it possible to do this in mongodb?
If you sort the cursor by your featured field you can pick up the featured ones first and then fill in with others:
const noMoreThan8Docs = MyCollection.find({},{ sort: { featured: -1 }, limit: 8 });
This assumes that featured is a boolean key. Booleans sort false-then-true so you need to reverse the sort.
I'm not sure how random the documents that are selected after the featured ones will be. However, since you're using Meteor and Meteor uses random _ids (unlike MongoDB native) you can sort on that key as well.
const noMoreThan8Docs = MyCollection.find({},{ sort: { featured: -1, _id: 1 }, limit: 8 });
This is also not truly random, since the same non-featured documents will tend to sort first. If you want to really randomize the non-featured items, you'll want to do a random find of those and append them if you have fewer than 8 featured documents.
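The "featured first, then a random fill" idea from the last paragraph can be sketched on plain Python lists as follows. This is only an illustration of the selection logic: pick_items is a hypothetical helper, and items stands for documents that have already been fetched. Against a live database the two groups would come from two separate queries.

```python
import random


def pick_items(items, wanted=8, rng=random):
    """Return up to `wanted` items: featured ones first, then a random fill."""
    featured = [item for item in items if item.get("featured")]
    chosen = featured[:wanted]
    missing = wanted - len(chosen)
    if missing > 0:
        others = [item for item in items if not item.get("featured")]
        # sample() never repeats an element, so the fill has no duplicates.
        chosen += rng.sample(others, min(missing, len(others)))
    return chosen


catalog = [{"name": "item%d" % i, "featured": i < 3} for i in range(13)]
selection = pick_items(catalog)
```

If fewer than 8 documents exist in total, the result is simply shorter than 8.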
I think what you want to do is pad out the list of items to make sure you always return 8. You can do this in the helper method:
var rows = MyTable.find({search: "Something"}, {limit: 8}).fetch();
for (var i = rows.length; i < 8; i++) {
  rows.push({name: "Empty data row " + i});
}
return rows;

Selecting data from MongoDB where K of N criterias are met

I have documents with four fields: A, B, C, D. Now I need to find documents where at least three fields match. For example:
Query: A=a, B=b, C=c, D=d
Returned documents:
a,b,c,d (four of four met)
a,b,c (three of four met)
a,b,d (another three of four met)
a,c,d (another three of four met)
b,c,d (another three of four met)
So far I created something like:
`(A=a AND B=b AND C=c)
OR (A=a AND B=b AND D=d)
OR (A=a AND C=c AND D=d)
OR (B=b AND C=c AND D=d)`
But this is ugly and error prone.
Is there a better way to achieve it? Also, query performance matters.
I'm using Spring Data but I believe it does not matter. My current code:
Criteria c = new Criteria();
Criteria ca = Criteria.where("A").is(doc.getA());
Criteria cb = Criteria.where("B").is(doc.getB());
Criteria cc = Criteria.where("C").is(doc.getC());
Criteria cd = Criteria.where("D").is(doc.getD());
c.orOperator(
    new Criteria().andOperator(ca, cb, cc),
    new Criteria().andOperator(ca, cb, cd),
    new Criteria().andOperator(ca, cc, cd),
    new Criteria().andOperator(cb, cc, cd)
);
Query query = new Query(c);
return operations.find(query, Document.class, "documents");
Currently we cannot do this directly in MongoDB, since there is no functionality supporting permutations or combinations of the query parameters.
But we can simplify the query by breaking the condition into parts.
Use the aggregation pipeline:
$project the records with (A=a AND B=b). This gives the records that already match two conditions. (Our objective is to find the records that match 3 out of 4 or 4 out of 4 of the given conditions.)
Next in the pipeline, use an OR condition (C=c OR D=d) to find the final set of records, which yields the expected result.
Hope it helps!
The way you have it, you have to spell out every combination in your query. You can use the aggregation framework to do this without enumerating all combinations, and it is generic enough to work for any K. The downside is that I think you need MongoDB 3.2+, and Spring Data doesn't support these operations yet: $filter and $concatArrays.
But you can do it pretty easily with the Java driver.
[
  {
    $project: {
      totalMatched: {
        $size: {
          $filter: {
            input: {
              $concatArrays: [ ["$A"], ["$B"], ["$C"], ["$D"] ]
            },
            as: "attr",
            cond: {
              $eq: ["$$attr", "a"]
            }
          }
        }
      }
    }
  },
  {
    $match: {
      totalMatched: { $gte: 3 }
    }
  }
]
All you are doing is concatenating the values of all the fields you need to check into a single array. You then select the subset of those elements that are equal to the value you are looking for (or that satisfy any other condition you want, for that matter) and finally get the size of that array for each document.
Now all you need to do is $match the documents whose size is greater than or equal to what you want.
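The pipeline above compares every field against the same value. When each field has its own target value, as in the question (A=a, B=b, C=c, D=d), one possible variation is to add up a 0/1 flag per field and match on the sum. The sketch below builds such a pipeline as plain Python data. k_of_n_pipeline is a hypothetical helper, and the resulting list is meant to be passed to an aggregate call of whichever driver is in use.

```python
def k_of_n_pipeline(criteria, k):
    """Build a pipeline that keeps documents matching at least k of the criteria.

    criteria maps field names to the value each field must equal.
    """
    # One flag per field: 1 when the field equals its target value, else 0.
    flags = [
        {"$cond": [{"$eq": ["$" + field, value]}, 1, 0]}
        for field, value in criteria.items()
    ]
    return [
        # Keep the original document under "document" next to the match count.
        {"$project": {"document": "$$ROOT", "totalMatched": {"$add": flags}}},
        {"$match": {"totalMatched": {"$gte": k}}},
    ]


pipeline = k_of_n_pipeline({"A": "a", "B": "b", "C": "c", "D": "d"}, 3)
```

Changing K or adding a fifth field only changes the arguments, not the shape of the query.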

PyMongo updating array records with calculated fields via cursor

Basically the collection output of an elaborate aggregate pipeline for a very large dataset is similar to the following:
{
    "_id" : {
        "clienta" : NumberLong(460011766),
        "clientb" : NumberLong(2886729962)
    },
    "states" : [
        [
            "fixed", "fixed.rotated", "fixed.rotated.off"
        ]
    ],
    "VBPP" : [
        244,
        182,
        184,
        11,
        299
    ],
    "PPF" : 72.4
}
The intuitive, albeit slow, way to update these fields to be calculations of their former selves (length and variance of an array) with PyMongo before converting to arrays is as follows:
records_list = []
cursor = db.clientAgg.find({}, {'_id': 0,
                                'states': 1,
                                'VBPP': 1,
                                'PPF': 1})
for record in cursor:
    records_list.append(record)
for dicts in records_list:
    dicts['states'] = len(dicts['states'])
    dicts['VBPP'] = np.var(dicts['VBPP'])
I have written various forms of this basic flow to optimize for speed, but bringing in 500k dictionaries in memory to modify them before converting them to arrays to go through a machine learning estimator is costly. I have tried various ways to update the records directly via a cursor with variants of the following with no success:
cursor = db.clientAgg.find().skip(0).limit(50000)
def iter():
    for item in cursor:
        yield item
l = []
for x in iter():
    x['VBPP'] = np.var(x['VBPP'])
    # Or
    # db.clientAgg.update({'_id':x['_id']},{'$set':{'x.VBPS': somefunction as above }},upsert=False, multi=True)
I also unsuccessfully tried using Mongo's usual operators since the variance is as simple as subtracting the mean from each element of the array, squaring the result, then averaging the results.
If I could successfully modify the collection directly then I could utilize something very fast like Monary or IOPro to load data directly from Mongo and into a numpy array without the additional overhead.
Thank you for your time
At the time of writing, MongoDB has no way to update a document with values calculated from the document's fields; you can only use update to set values to constants that you pass in from your application. So you can set document.x to 2, but you can't set document.x to document.y + document.z or any other calculated value.
See https://jira.mongodb.org/browse/SERVER-11345 and https://jira.mongodb.org/browse/SERVER-458 for possible future features.
In the immediate future, PyMongo will release a bulk API that allows you to send a batch of distinct update operations in a single network round-trip which will improve your performance.
Addendum:
I have two other ideas. First, run some JavaScript server-side. E.g., to set all documents' b fields to 2 * a:
db.eval(function() {
  var collection = db.test_collection;
  collection.find().forEach(function(doc) {
    var b = 2 * doc.a;
    collection.update({_id: doc._id}, {$set: {b: b}});
  });
});
The second idea is to use the aggregation framework's $out operator, new in MongoDB 2.5.2, to transform the collection into a second collection that includes the calculated field:
db.test_collection.aggregate({
  $project: {
    a: '$a',
    b: {$multiply: [2, '$a']}
  }
}, {
  $out: 'test_collection2'
});
Note that $project must explicitly include all the fields you want; only _id is included by default.
For a million documents on my machine the former approach took 2.5 minutes, and the latter 9 seconds. So you could use the aggregation framework to copy your data from its source to its destination, with the calculated fields included. Then, if desired, drop the original collection and rename the target collection to the source's name.
My final thought on this is that MongoDB 2.5.3 and later can stream large result sets from an aggregation pipeline using a cursor. There's no reason Monary can't use that capability, so you might file a feature request there. That would allow you to get documents from a collection in the form you want, via Monary, without having to actually store the calculated fields in MongoDB.
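Applied to the fields in the question, the $out idea might look like the sketch below. The question describes the variance as the average of the squared deviations from the mean, and population_variance spells that out in plain Python as a reference. FEATURE_PIPELINE is a hypothetical pipeline that assumes MongoDB 3.2 or later, where $stdDevPop and $pow are available; it squares the population standard deviation to obtain the same quantity on the server. The output collection name clientAggFeatures is made up for the example.

```python
def population_variance(values):
    """Mean of the squared deviations from the mean (what np.var computes by default)."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / len(values)


# Hypothetical pipeline, assuming MongoDB 3.2+ for $stdDevPop and $pow.
FEATURE_PIPELINE = [
    {"$project": {
        "_id": 0,
        "states": {"$size": "$states"},                  # length of the array
        "VBPP": {"$pow": [{"$stdDevPop": "$VBPP"}, 2]},  # variance = stddev squared
        "PPF": 1,
    }},
    {"$out": "clientAggFeatures"},  # made-up target collection name
]

vbpp_variance = population_variance([244, 182, 184, 11, 299])
# With PyMongo the pipeline would be run as: db.clientAgg.aggregate(FEATURE_PIPELINE)
```

For the sample VBPP array in the question, the mean is 184 and the variance works out to 9351.6.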

Is it possible to refer to multiple documents in a mongo db query?

Suppose I have a collection containing the following documents:
...
{
    event_counter: 3,
    event_type: 50,
    event_data: "yaya"
}
{
    event_counter: 4,
    event_type: 100,
    event_data: "whowho"
}
...
Is it possible to ask for:
for each document, e where e.event_type == 100
get me any document f where
f.event_counter = e.event_counter+1
or equivalently:
find each f, where f.event_counter==e.event_counter+1 && e.event_type==100
I think the best way for you to approach this is on the application side, using multiple queries. You would want to run a query to match all documents with event_type = 100, like this one:
db.collection.find({"event_type" : 100});
Then, you'll have to write some logic to iterate through the results and run more queries to find documents with the right value of event_counter.
I am not sure it's possible to do this using MongoDB's aggregation framework. If it is possible, it will be quite a complicated query.
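The application-side approach can be reduced to two queries by collecting the counters first and then asking for all successors at once with $in. The sketch below is one way to write that in Python. successor_filter and find_successors are hypothetical names, and collection is assumed to be a PyMongo collection holding the event documents.

```python
def successor_filter(matched_events):
    """Build a filter for documents whose event_counter follows a matched event."""
    next_counters = sorted({event["event_counter"] + 1 for event in matched_events})
    return {"event_counter": {"$in": next_counters}}


def find_successors(collection, event_type=100):
    """Two-step lookup; `collection` is assumed to be a PyMongo collection."""
    # Step 1: the counters of every event e with the requested type.
    matched = collection.find({"event_type": event_type}, {"event_counter": 1})
    # Step 2: every document f with f.event_counter == e.event_counter + 1.
    return list(collection.find(successor_filter(matched)))


example_filter = successor_filter([{"event_counter": 4}, {"event_counter": 9}])
```

For the sample data, the event with event_counter 4 has type 100, so the second query looks for event_counter 5.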