I have a collection in MongoDB that looks something like:
{
"foo": "something",
"tag": 0,
},
{
"foo": "bar",
"tag": 1,
},
{
"foo": "hello",
"tag": 0,
},
{
"foo": "world",
"tag": 3,
}
If we consider this example, there are entries in the collection with tag of value 0, 1 or 3 and these aren't unique values, tag value can be repeated. My goal is to find that 2 is missing. Is there a way to do this with a query?
Query1
In the upcoming MongoDB 5.2 we will be able to sort arrays, which would make this query easier without a set operation, but the approach below works too:
group all documents and collect the min, the max and the set of distinct tag values
build the range from min to max
the missing values are the set difference between that range and the tags
from those, take only the smallest => 2
Test code here
aggregate(
[{"$group":
{"_id":null,
"min":{"$min":"$tag"},
"max":{"$max":"$tag"},
"tags":{"$addToSet":"$tag"}}},
{"$project":
{"_id":0,
"missing":
{"$min":
{"$setDifference":
[{"$range":["$min", "$max"]}, "$tags"]}}}}])
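As a cross-check, the steps described above (min, max, range, set difference, smallest) can be mirrored in plain Python without a database. This is only a sketch: `smallest_missing_tag` is a hypothetical helper name, and it assumes a non-empty list of integer tags.

```python
def smallest_missing_tag(tags):
    """Return the smallest integer between min(tags) and max(tags) that is absent, or None."""
    present = set(tags)                      # plays the role of $addToSet
    lo, hi = min(present), max(present)      # plays the role of $min / $max
    missing = set(range(lo, hi)) - present   # $setDifference over the range from min to max
    return min(missing) if missing else None


print(smallest_missing_tag([0, 1, 0, 3]))  # 2
```

Running it against a small sample of the real tag values is a cheap way to confirm what the aggregation should return.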
Query2
In MongoDB 5 (the current version) we can also use $setWindowFields:
sort by tag, add the dense rank (same values get the same rank) and the min
then compute the difference tag - min
keep only the documents where this difference < rank
take the max tag among them (the last tag before the first gap)
add 1 to get the missing value
*Test it before relying on it; I tested it 3-4 times and it seemed OK.
For a big collection with many distinct tags I think this is the better option, because the $addToSet above can cause memory problems.
Test code here
aggregate(
[{"$setWindowFields":
{"output":{"rank":{"$denseRank":{}}, "min":{"$first":"$tag"}},
"sortBy":{"tag":1}}},
{"$set":{"difference":{"$subtract":["$tag", "$min"]}}},
{"$match":{"$expr":{"$lt":["$difference", "$rank"]}}},
{"$group":{"_id":null, "last":{"$max":"$tag"}}},
{"$project":{"_id":0, "missing":{"$add":["$last", 1]}}}])
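The dense-rank idea can also be traced by hand in plain Python. The sketch below is not the MongoDB implementation, only an illustration of the same comparison (tag - min < rank); `first_gap_by_dense_rank` is a hypothetical name and the input is assumed to be a non-empty list of integers.

```python
def first_gap_by_dense_rank(tags):
    """Return (last tag before the first gap) + 1, using a dense rank over the sorted tags."""
    ordered = sorted(tags)
    lowest = ordered[0]          # like "min": {"$first": "$tag"} after sorting
    rank = 0
    previous = None
    last_ok = lowest
    for tag in ordered:
        if tag != previous:      # dense rank: equal values share a rank
            rank += 1
            previous = tag
        if tag - lowest < rank:  # no gap has appeared up to this tag
            last_ok = tag
    return last_ok + 1


print(first_gap_by_dense_rank([0, 1, 0, 3]))  # 2
```

Note that when the tags have no gap at all, this approach returns max + 1 rather than an empty result.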
Related
I would like to perform a complex merge:
e.g.
[
{
"one.two.three": 4,
"number.two": "B"
},
{
"one.two.three": 7,
"number.two": "A"
},
{
"one.two.three": 10,
"number.two": "B"
}
]
where the result is:
{
"one.two.three": 10,
"number.two": "A"
}
because those are the maximum values…I could have any N+ number of arbitrary KV pairs, so I can’t just sort on a specific field
Query
Field names containing $ or . are problematic and hard to use; I think you should rename your fields so they do not contain those characters. The query below assumes they were renamed to one and number.
Group and take the max and the min. I assume you want the min for the letters, because your expected result contains the letter A.
Playmongo
aggregate(
[{"$group":
{"_id": null,
"max-one": {"$max": "$one"},
"min-number": {"$min": "$number"}}}])
My Mongodb collection has this document structure:
{
_id: 1,
my_dict: {
my_key: [
{id: x, other_fields: other_values},
...
]
},
...
},
I need to update the array subdocuments very often, so an index on the id field seems like a good idea. Still, I have many documents (millions) but the arrays inside them are small (max ~20 elements). Would indexing still improve performance a lot, compared to the cost of maintaining the index?
PS: I'm not using the id as a key (a dict instead of an array), as I also often need to get the number of elements in "the array" ($size only works on arrays). I cannot use count as I am using MongoDB 3.2.
Followup question: If it would make a very big difference, I could instead use a dict like so:
{id: {others_fields: other_values}}
and store the size myself in a field. What I dislike about this is that I would need another field and update it myself (possible errors maybe, as I would need to use $inc each time I add/delete an item) instead of relying on "real" values. I would also have to manage the possibility that a key could be called _my_size, which would conflict with my logic. It would look then like this:
{
_id: 1,
my_dict: {
my_key: {
id: {other_fields: other_values},
_my_size: 1
},
},
},
Still not sure which is best for performance. I will need to update the subdocument (with id field) a lot, as well as computing the $size a lot (maybe 1/10 of the calls to update).
Which Schema/Strategy would give me a better performance? Or even more important, would it actually make a big difference? (possibly thousands of calls per second)
Update example:
update(
{_id: 1, "my_dict.my_key.id": update_data_id},
{$set: {"my_dict.my_key.$": update_data}}
)
Getting the size example:
aggregate([
{$match: {_id: 1}},
{$project: {_id: 0, nb_of_sub_documents: {$size: "$my_dict.my_key"}}}
])
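For the update side, one way to touch only the matched array element is the positional `$` operator. The sketch below just builds the filter and update documents as Python dicts; `build_subdoc_update` is a hypothetical helper, and the field paths are taken from the document structure in the question.

```python
def build_subdoc_update(doc_id, sub_id, new_fields):
    """Return (filter, update) that changes only the array element whose id matches."""
    query = {"_id": doc_id, "my_dict.my_key.id": sub_id}
    update = {
        "$set": {
            "my_dict.my_key.$." + key: value  # "$" stands for the matched element
            for key, value in new_fields.items()
        }
    }
    return query, update


query, update = build_subdoc_update(1, "x", {"status": "done"})
print(query)
print(update)
# With pymongo this pair could be passed as collection.update_one(query, update).
```

Because `_id` is part of the filter, the document should be located through the default `_id` index, so matching the element only has to scan that one document's small array. Whether an extra index on the id field pays off is something to measure on real data.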
With a document structure like:
{
_id:"1234",
values : [
1,23,... (~ 2000 elements)
]
}
where values represent some time series
I need to update some elements in the values array and I'm looking for an efficient way to do it. The number of elements and the positions to update vary.
I would not like to get the whole array back to the client (application layer), so I'm doing something like:
db.coll.find({ "_id": "1234" })
db.coll.update(
{"_id": "1234" },
{$set: {
"values.100": 123,
"values.200": 124
}})
To be more precise, I'm using pymongo and bulk operations:
dc = {}
dc["values.100"] = 102
dc["values.200"] = 103
# note: initialize_ordered_bulk_op() is deprecated since PyMongo 3.5 and removed in 4.0
bulk = db.coll.initialize_ordered_bulk_op()
bulk.find({"_id": "1234"}).update_one({"$set": dc})
....
bulk.execute()
Do you know a better way to do it?
Would it be possible to specify a range in the array (e.g. values from 100 to 110)?
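One client-side option for the range case is to generate the individual "values.N" keys in a loop rather than typing them out. This is a sketch only; `build_range_set` is a hypothetical helper and it does not rely on any range syntax in MongoDB itself.

```python
def build_range_set(field, start, new_values):
    """Build a $set document that overwrites consecutive array positions from `start`."""
    return {
        "$set": {
            f"{field}.{start + offset}": value
            for offset, value in enumerate(new_values)
        }
    }


update = build_range_set("values", 100, [7, 8, 9])
print(update)  # {'$set': {'values.100': 7, 'values.101': 8, 'values.102': 9}}
```

The resulting dict can be used wherever the hand-built `dc` dict was used, so only the changed positions are sent to the server.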
Pretend I have this document:
{
"name": "Bob",
"friends": [
"Alice",
"Joe",
"Phil"
],
"posts": [
12,
15,
55,
61,
525,
515
]
}
All is good with only a handful of posts. However, let's say posts grows substantially (and gets to the point of 10K+ posts). A friend mentioned that I might be able to keep the array in order (i.e. the first entry is the ID of the newest post so I don't have to sort) and append new posts to the beginning. This way, I could get the first, say, 10 elements of the array to get the 10 newest items.
Is there a way to only retrieve posts n at a time? I don't need 10K posts being returned, when most of them won't even be looked at, but I still need to keep around for records.
You can use MongoDB's $slice operator in a projection to get n elements from an array, like the following:
db.collection.find({
//add condition here
}, {
"posts": {
$slice: 3 //set number of element here
//negative number slices from end of array
}
})
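To page through the array n posts at a time, $slice in a projection also accepts a [skip, limit] pair. The sketch below only builds the projection document; `posts_page_projection` is a hypothetical helper and pages are assumed to be numbered from 0.

```python
def posts_page_projection(page, page_size=10):
    """Projection returning one page of the posts array (page 0 is the first page)."""
    skip = page * page_size
    return {"posts": {"$slice": [skip, page_size]}}


print(posts_page_projection(0))  # {'posts': {'$slice': [0, 10]}}
print(posts_page_projection(2))  # {'posts': {'$slice': [20, 10]}}
```

If the newest post IDs are kept at the front of the array, page 0 would then hold the 10 newest posts.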
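You can do this:
build a list of the posts you want (say the first 3 posts) and return that list. Note that this still fetches the whole array from the server and slices it on the client.
def first_posts(db, query, n=3):
    # collect the first n posts of every matching document
    result = []
    for doc in db.collection.find(query):
        result.append(doc["posts"][:n])
    return result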
I am a newbie in MongoDB but I am trying to query to identify if any of my field meets the requirements.
Consider the following:
I have a collection where EACH document is formatted as:
{
"nutrition" : [
{
"name" : "Energy",
"unit" : "kcal",
"value" : 150.25,
"_id" : ObjectId("fdsfdslkfjsdf")
},
{---then there's more in the array---}
],
"serving" : 4,
"id": "Food 1"
}
My current code looks something like this:
db.recipe.find(
{"nutrition": {$elemMatch: {"name": "Energy", "unit": "kcal", "value": {$lt: 300}}}},
{"id":1, _id:0}
)
In the nutrition array there is an element whose name is Energy and whose value is a number. The query checks whether that value is less than 300 and outputs all the documents that meet this requirement (I made it output only the field called id).
Now my question is the following:
1) Each document has another field called "serving", and I need to find out whether "value"/"serving" is still less than 300 (that is, divide value by serving and check whether the result is still less than 300).
2) Since I am using .find, I am guessing I can't use the $divide operator from aggregation?
3) I have been trying to play around with aggregation operators like $divide + $cond, but no luck so far.
4) Normally, in other languages, I would just create a variable a = value/serving and run it through an if statement to check whether it's less than 300, but I am not sure if that's possible in MongoDB.
Thank you.
In case anyone is struggling with a similar problem, I figured out how to do this.
db.recipe.aggregate([
{$unwind: "$nutrition"}, // breaks open the nutrition array: one document per array element
{$match:{"nutrition.name": "Energy", "nutrition.unit": "kcal"}}, // filters out anything not named Energy with unit kcal, so 'nutrition' now holds just the calories in kcal
{$project: {"Calories per serving": {$divide: ["$nutrition.value", "$serving"]}, // divides the calories by the number of servings
"id": 1, _id:0}}, // shows only the food id
{$match:{"Calories per serving": {$lt: 300}}} // filters out any documents whose calories per serving are equal to or greater than 300
])
So basically, you open the array and filter out any elements you don't want, then use $project to output the fields you need along with any math that has to be done. Finally you filter on your condition, which for me was that I don't want to see any foods whose calories per serving are 300 or more.
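The same unwind, match, divide, match sequence can be traced in plain Python to check the expected output on sample data. This is an illustrative sketch, not the aggregation itself: `foods_under_limit` is a hypothetical helper, it assumes the per-document field is called `serving` as in the question's document, and "Food 2" is made-up sample data.

```python
def foods_under_limit(docs, limit=300):
    """Return id and calories per serving for foods below `limit` kcal per serving."""
    result = []
    for doc in docs:
        for item in doc["nutrition"]:                                # like $unwind
            if item["name"] == "Energy" and item["unit"] == "kcal":  # first $match
                per_serving = item["value"] / doc["serving"]         # like $divide
                if per_serving < limit:                              # second $match
                    result.append({"id": doc["id"], "Calories per serving": per_serving})
    return result


sample = [
    {"id": "Food 1", "serving": 4,
     "nutrition": [{"name": "Energy", "unit": "kcal", "value": 150.25}]},
    {"id": "Food 2", "serving": 1,  # hypothetical second document
     "nutrition": [{"name": "Energy", "unit": "kcal", "value": 450}]},
]
print(foods_under_limit(sample))  # [{'id': 'Food 1', 'Calories per serving': 37.5625}]
```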