Is there a way to return part of an array in a document in MongoDB? - mongodb

Pretend I have this document:
{
  "name": "Bob",
  "friends": ["Alice", "Joe", "Phil"],
  "posts": [12, 15, 55, 61, 525, 515]
}
All is good with only a handful of posts. However, let's say posts grows substantially (to the point of 10K+ posts). A friend mentioned that I might be able to keep the array in order (i.e. the first entry is the ID of the newest post, so I don't have to sort) and append new posts to the beginning. That way, I could fetch the first, say, 10 elements of the array to get the 10 newest items.
Is there a way to retrieve only n posts at a time? I don't need 10K posts returned when most of them won't even be looked at, but I still need to keep them around for the record.

You can use MongoDB's $slice operator in the projection to get n elements from an array, like the following:
db.collection.find({
  // add condition here
}, {
  "posts": {
    $slice: 3 // set the number of elements here;
              // a negative number slices from the end of the array
  }
})
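In pymongo the same projection is just a plain dict; $slice also accepts a [skip, limit] pair, which is handy for paging. A minimal sketch (the collection and filter names are illustrative):

```python
# Projection returning only the first 10 elements of "posts" -- the 10
# newest posts, if new IDs are pushed to the front of the array.
newest_ten = {"posts": {"$slice": 10}}

# $slice also takes [skip, limit]: elements 10..19, i.e. "page two".
page_two = {"posts": {"$slice": [10, 10]}}

# Passed as the second argument to find, e.g.:
# db.users.find({"name": "Bob"}, newest_ten)
```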

You can do this: build a list of the posts you want (say, the first 3) and return that list:
def first_posts(query, n=3):
    # collect the first n posts of the first matching document
    for doc in db.collection.find(query):
        temp = []
        for i in range(n):
            temp.append(doc['posts'][i])
        return temp

Related

MongoDB find lowest missing value

I have a collection in MongoDB that looks something like:
{ "foo": "something", "tag": 0 },
{ "foo": "bar", "tag": 1 },
{ "foo": "hello", "tag": 0 },
{ "foo": "world", "tag": 3 }
If we consider this example, there are entries in the collection with tag values 0, 1, or 3, and these aren't unique: a tag value can be repeated. My goal is to find that 2 is missing. Is there a way to do this with a query?
Query1
The upcoming MongoDB 5.2 will have sorting on arrays, which could make this query easier without set operations, but this will work too:
group and find the min, the max, and the set of all values
take the range (max - min)
the missing values are (setDifference range tags)
from those, take only the smallest => 2
Test code here
aggregate([
  {"$group": {
    "_id": null,
    "min": {"$min": "$tag"},
    "max": {"$max": "$tag"},
    "tags": {"$addToSet": "$tag"}}},
  {"$project": {
    "_id": 0,
    "missing": {
      "$min": {
        "$setDifference": [
          {"$range": [0, {"$subtract": ["$max", "$min"]}]},
          "$tags"]}}}}
])
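The pipeline's set logic can be sanity-checked by mirroring it in plain Python (a sketch; note that the $range above starts at 0, which works because min is 0 in the sample data, while the general version below ranges from min to max):

```python
def lowest_missing(tags):
    # Mirror of the $group/$setDifference pipeline: the smallest value
    # in [min, max) that is absent from the set of tags.
    lo, hi = min(tags), max(tags)
    missing = set(range(lo, hi)) - set(tags)
    return min(missing) if missing else None

print(lowest_missing([0, 1, 0, 3]))  # -> 2
```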
Query2
In MongoDB 5 (the current version) we can also use $setWindowFields:
sort by tag, add the dense rank (same values = same rank) and the min
then compute the difference tag - min
then keep those where this difference < rank
find the max of them (the max of the tags that are OK)
and add 1 to get the missing one
*Test it before using it to be sure; I tested it 3-4 times and it seemed OK. For a big collection with many different tags, I think this is better (the $addToSet above can cause memory problems).
Test code here
aggregate([
  {"$setWindowFields": {
    "output": {"rank": {"$denseRank": {}}, "min": {"$first": "$tag"}},
    "sortBy": {"tag": 1}}},
  {"$set": {"difference": {"$subtract": ["$tag", "$min"]}}},
  {"$match": {"$expr": {"$lt": ["$difference", "$rank"]}}},
  {"$group": {"_id": null, "last": {"$max": "$tag"}}},
  {"$project": {"_id": 0, "missing": {"$add": ["$last", 1]}}}
])
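The same dense-rank idea, sketched in plain Python for checking the expected result (the helper name is hypothetical):

```python
def lowest_missing_by_rank(tags):
    # Mirror of the $setWindowFields pipeline: over distinct sorted tags,
    # a tag is "ok" while (tag - min) < dense rank; the answer is the
    # largest ok tag plus one.
    distinct = sorted(set(tags))
    lo = distinct[0]
    last_ok = lo
    for rank, t in enumerate(distinct, start=1):
        if t - lo < rank:
            last_ok = t
        else:
            break
    return last_ok + 1

print(lowest_missing_by_rank([0, 1, 0, 3]))  # -> 2
```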

Does Indexing small arrays of subdocuments in Mongodb affect performance?

My MongoDB collection has this document structure:
{
  _id: 1,
  my_dict: {
    my_key: [
      {id: x, other_fields: other_values},
      ...
    ]
  },
  ...
}
I need to update the array subdocuments very often, so an index on the id field seems like a good idea. Still, I have many documents (millions), but the arrays inside them are small (max ~20 elements). Would indexing still improve performance a lot, compared to the cost of maintaining the index?
PS: I'm not using the id as a key (i.e. a dict instead of an array) because I also often need to get the number of elements in "the array" ($size only works on arrays). I cannot use count as I am using MongoDB 3.2.
Followup question: If it would make a very big difference, I could instead use a dict like so:
{id: {others_fields: other_values}}
and store the size myself in a field. What I dislike about this is that I would need another field and maintain it myself (with possible errors, as I would need to use $inc each time I add/delete an item) instead of relying on "real" values. I would also have to handle the possibility that a key could be called _my_size, which would conflict with my logic. It would then look like this:
{
  _id: 1,
  my_dict: {
    my_key: {
      id: {other_fields: other_values},
      _my_size: 1
    }
  }
}
Still not sure which is best for performance. I will need to update the subdocument (with id field) a lot, as well as computing the $size a lot (maybe 1/10 of the calls to update).
Which Schema/Strategy would give me a better performance? Or even more important, would it actually make a big difference? (possibly thousands of calls per second)
Update example:
update(
  {"_id": 1, "my_dict.my_key.id": update_data_id},
  {"$set": {"my_dict.my_key": update_data}}
)
Getting the size example:
aggregate([
  {"$match": {"_id": 1}},
  {"$project": {"_id": 0, "nb_of_sub_documents": {"$size": "$my_dict.my_key"}}}
])
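For reference, the two operations above written as pymongo-style dicts (a sketch with hypothetical placeholder values; note that targeting only the matched array element, rather than overwriting the field, is what the positional $ operator is for):

```python
update_data_id = 42                      # hypothetical id value
update_data = {"id": 42, "other": "x"}   # hypothetical replacement subdocument

# The positional $ operator updates only the array element matched by
# the filter, instead of replacing the whole array.
update_filter = {"_id": 1, "my_dict.my_key.id": update_data_id}
update_spec = {"$set": {"my_dict.my_key.$": update_data}}

# Size query: count the elements of the array inside one document.
size_pipeline = [
    {"$match": {"_id": 1}},
    {"$project": {"_id": 0, "nb_of_sub_documents": {"$size": "$my_dict.my_key"}}},
]

# With a live connection these would be passed to
# collection.update_one(update_filter, update_spec) and
# collection.aggregate(size_pipeline).
```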

Replace part of an array in a mongo db document

With a document structure like:
{
  _id: "1234",
  values: [
    1, 23, ... (~2000 elements)
  ]
}
where values represents some time series.
I need to update some elements in the values array, and I'm looking for an efficient way to do it. The number of elements and the positions to update vary.
I would not like to pull the whole array back to the client (application layer), so I'm doing something like:
db.coll.find({ "_id": 1234 })
db.coll.update(
  { "_id": 1234 },
  { $set: {
      "values.100": 123,
      "values.200": 124
  }}
)
To be more precise, I'm using pymongo and bulk operations:
dc = dict()
dc["values.100"] = 102
dc["values.200"] = 103
bulk = db.coll.initialize_ordered_bulk_op()
bulk.find({ "_id": 1234 }).update_one({ "$set": dc })
....
bulk.execute()
Would you know a better way to do it?
Would it be possible to indicate a range in the array, like (values from 100 to 110)?
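Building the positional $set document for a contiguous range can be automated. A small helper (hypothetical name) that produces the same kind of dict as dc above for values 100..110 or any other range:

```python
def range_set(field, start, new_values):
    # Build a $set document that overwrites field[start], field[start+1],
    # ... with new_values, e.g. {"$set": {"values.100": v0, "values.101": v1}}.
    return {"$set": {f"{field}.{start + i}": v for i, v in enumerate(new_values)}}

spec = range_set("values", 100, [102, 103])
print(spec)  # -> {'$set': {'values.100': 102, 'values.101': 103}}
```

With a live connection, the result is usable directly in an update: db.coll.update_one({"_id": 1234}, spec).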

MongoDB find if all array elements are in the other bigger array

I have an array of ids of LEGO parts in a LEGO building.
// building collection
{
  "name": "Gingerbird House",
  "buildingTime": 45,
  "rating": 4.5,
  "elements": [
    { "_id": 23, "requiredElementAmt": 14 },
    { "_id": 13, "requiredElementAmt": 42 }
  ]
}
and then
// elements collection
{ "_id": 23, "name": "blue 6 dots brick", "availableAmt": 20 }
{ "_id": 13, "name": "red 8 dots brick", "availableAmt": 50 }
{ "_id": 254, "name": "green 4 dots brick", "availableAmt": 12 }
How can I find out whether it's possible to build a building? I.e. the database should return a building only if every element in its "elements" array is available in the warehouse (the elements collection) in at least the required amount.
In SQL (which I came from recently) I would write something like:
SELECT * FROM building WHERE id NOT IN (SELECT fk_building FROM building_elemnt_amt WHERE fk_element NOT IN (1, 3))
Thank you in advance!
I won't pretend I understand how it works in SQL without any comparison, but in MongoDB you can do something like this:
db.buildings.find({/* building filter, if any */}).map(function(b){
    var ok = true;
    b.elements.forEach(function(e){
        ok = ok && 1 == db.elements.find({_id: e._id, availableAmt: {$gte: e.requiredElementAmt}}).count();
    });
    return ok ? b : false;
}).filter(function(b){ return b; });
or
db.buildings.find({/* building filter, if any */}).map(function(b){
    var condition = [];
    b.elements.forEach(function(e){
        condition.push({_id: e._id, availableAmt: {$gte: e.requiredElementAmt}});
    });
    return db.elements.find({$or: condition}).count() == b.elements.length ? b : false;
}).filter(function(b){ return b; });
The last one should be a bit quicker, but I did not test it. If performance is key, it may be better to mapReduce it so the subqueries run in parallel.
Note: The examples above work with assumption that buildings.elements have no elements with the same id. Otherwise the array of elements needs to be pre-processed before b.elements.forEach to calculate total requiredElementAmt for non-unique ids.
EDIT: How it works:
Select all/some documents from buildings collection with find:
db.buildings.find({/* building filter, if any */})
returns a cursor, which we iterate with map applying the function to each document:
map(function(b){...})
The function itself iterates over elements array for each buildings document b:
b.elements.forEach(function(e){...})
and find number of documents in elements collection for each element e
db.elements.find({_id:e._id, availableAmt:{$gte:e.requiredElementAmt}}).count();
which match a condition:
elements._id == e._id
and
elements.availableAmt >= e.requiredElementAmt
until the first request that returns 0.
Since elements._id is unique, this subquery returns either 0 or 1.
The first 0 in the expression ok = ok && 1 == 0 turns ok to false, so the rest of the elements array is iterated without touching the db.
The function returns either current buildings document, or false:
return ok ? b : false
So the result of the map function is an array containing the full buildings documents that can be built, or false for those that lack at least one resource.
Then we filter this array to get rid of false elements, since they hold no useful information:
filter(function(b){return b})
It returns a new array with all elements for which function(b){return b} doesn't return false, i.e. only full buildings documents.
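The whole check can also be mirrored client-side in Python. A sketch (can_build is a hypothetical helper) applying the same availableAmt >= requiredElementAmt rule against a stock dict built from the elements collection:

```python
def can_build(building, stock):
    # stock maps element _id -> availableAmt; a building is buildable when
    # every required element is stocked in at least the required quantity.
    return all(stock.get(e["_id"], 0) >= e["requiredElementAmt"]
               for e in building["elements"])

# Data from the question: warehouse stock and the Gingerbird House.
stock = {23: 20, 13: 50, 254: 12}
gingerbird = {"name": "Gingerbird House",
              "elements": [{"_id": 23, "requiredElementAmt": 14},
                           {"_id": 13, "requiredElementAmt": 42}]}

print(can_build(gingerbird, stock))  # -> True
```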

How to incorporate $lt and $divide in mongodb

I am a newbie with MongoDB, but I am trying to write a query that identifies whether a field meets certain requirements.
Consider the following:
I have a collection where EACH document is formatted as:
{
  "nutrition": [
    {
      "name": "Energy",
      "unit": "kcal",
      "value": 150.25,
      "_id": ObjectId("fdsfdslkfjsdf")
    }
    {--- then there's more in the array ---}
  ],
  "serving": 4,
  "id": "Food 1"
}
My current code looks something like this:
db.recipe.find(
{"nutrition": {$elemMatch: {"name": "Energy", "unit": "kcal", "value": {$lt: 300}}}},
{"id":1, _id:0}
)
Under the nutrition array, there's an element whose name is "Energy", with a numeric value. The query checks whether that value is less than 300 and outputs all documents that meet this requirement (I made it output only the field called id).
Now my question is the following:
1) For each document, I have another field called "serving", and I need to find out whether "value"/"serving" is still less than 300 (as in: divide value by serving and see if the result is still less than 300).
2) Since I am using .find, I am guessing I can't use the $divide operator from aggregation?
3) I have been trying to play around with aggregation operators like $divide + $cond, but no luck so far.
4) Normally in other languages, I would just create a variable a = value/serving and run it through an if statement to check whether it's less than 300, but I am not sure if that's possible in MongoDB.
Thank you.
In case anyone is struggling with a similar problem, I figured out how to do this.
db.database.aggregate([
  {$unwind: "$nutrition"}, // break the nutrition array open into one document per element
  {$match: {"nutrition.name": "Energy", "nutrition.unit": "kcal"}}, // keep only the Energy (kcal) entries
  {$project: {
    "Calories per serving": {$divide: ["$nutrition.value", "$serving"]}, // compute kcal per serving
    "id": 1, _id: 0}}, // show only the food id and the computed value
  {$match: {"Calories per serving": {$lt: 300}}} // drop documents with 300 or more calories per serving
])
So basically, you open up the array, filter out any sub-fields you don't want, and then display the result using $project along with whatever math evaluation needs to be done. Finally, you filter on your condition, which for me was that I don't want to see any foods with 300 or more calories per serving.
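The same filter can be mirrored in plain Python against the document shape from the question, to check the expected result (150.25 kcal / 4 servings ≈ 37.6, which passes the < 300 check; the helper name is hypothetical):

```python
def low_calorie_ids(docs, limit=300):
    # Return ids of documents whose Energy (kcal) value divided by the
    # serving count is below limit -- the client-side equivalent of the
    # $unwind / $match / $divide / $match pipeline.
    out = []
    for d in docs:
        for n in d["nutrition"]:
            if n["name"] == "Energy" and n["unit"] == "kcal":
                if n["value"] / d["serving"] < limit:
                    out.append(d["id"])
    return out

docs = [{"nutrition": [{"name": "Energy", "unit": "kcal", "value": 150.25}],
         "serving": 4, "id": "Food 1"}]
print(low_calorie_ids(docs))  # -> ['Food 1']
```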