Find document with array containing the maximum occurrence of a specific value - mongodb

I have documents like this:
{ "_id" : ObjectId("5755d81e2935fe65f5d167aa"), "prices" : [ 23, 11, 2, 3, 4, 1, 232 ] },
{ "_id" : ObjectId("5755d81e2935fe65f5d167ab"), "prices" : [ 99, 3, 23, 23, 12 ] },
{ "_id" : ObjectId("5755d81e2935fe65f5d167ac"), "prices" : [ 999, 12, 3, 4, 4, 4, 4, 4, 123 ] },
{ "_id" : ObjectId("5755d81e2935fe65f5d167ad"), "prices" : [ 24, 3, 4, 5, 6, 7, 723 ] }
and I want to find the document whose 'prices' array contains the most occurrences of the value 4, which in my case is the third document. Is there any way to query for it?

Starting from MongoDB 3.2, we can $project our documents and use the $size and $filter operators to return the count of the number 4 in each array. From there we $group by that count and use the $push accumulator to collect the documents that share the same count. Finally, $sort the groups by _id (which now holds the count) in descending order and use $limit to return only the group with the maximum occurrence of 4.
db.collection.aggregate(
[
{ "$project": {
"prices": 1,
"counter": {
"$size": {
"$filter": {
"input": "$prices",
"as": "p",
"cond": { "$eq": [ "$$p", 4 ] }
}
}
}
}},
{ "$group": {
"_id": "$counter",
"docs": { "$push": "$$ROOT" }
}},
{ "$sort": { "_id": -1 } },
{ "$limit": 1 }
]
)
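To check what the pipeline should return, the same steps can be mimicked in plain JavaScript on the sample documents. This is only an in-memory sketch, not something you would run against the server; the helper names countOccurrences and docsWithMostOccurrences are made up for illustration, and the _id values are shortened.

```javascript
// Sample documents from the question (ids shortened for readability).
const docs = [
  { _id: "aa", prices: [23, 11, 2, 3, 4, 1, 232] },
  { _id: "ab", prices: [99, 3, 23, 23, 12] },
  { _id: "ac", prices: [999, 12, 3, 4, 4, 4, 4, 4, 123] },
  { _id: "ad", prices: [24, 3, 4, 5, 6, 7, 723] },
];

// Mirrors $project with $filter + $size: count how often `target` occurs.
function countOccurrences(values, target) {
  return values.filter((v) => v === target).length;
}

// Mirrors $group + $sort + $limit: keep every document that ties for the maximum count.
function docsWithMostOccurrences(documents, target) {
  const groups = new Map(); // counter -> documents with that counter
  for (const doc of documents) {
    const counter = countOccurrences(doc.prices, target);
    if (!groups.has(counter)) groups.set(counter, []);
    groups.get(counter).push({ ...doc, counter });
  }
  const maxCounter = Math.max(...groups.keys());
  return { _id: maxCounter, docs: groups.get(maxCounter) };
}

const result = docsWithMostOccurrences(docs, 4);
console.log(JSON.stringify(result));
```

Grouping by the count rather than sorting the documents directly is what lets ties come back together in a single "docs" array.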

Related

Counting numbers greater than a certain number in an array in MongoDb

{
"_id" : ObjectId(""),
"CustomerId" : 13038,
"AT" : ISODate("2021-12-01T04:00:00.000Z"),
"dwell" : [
7,
6,
12,
6 ]
},
{
"_id" : ObjectId(""),
"CustomerId" : 12036,
"AT" : ISODate("2021-12-01T04:00:00.000Z"),
"dwell" : [
15,
3,
12
]
}
In these documents, I only want to get the count of the numbers in the dwell array that are greater than 10.
For example:
{"CustomerId": 13038, "Count": 1} // because only 12 is bigger than 10
{"CustomerId": 12036, "Count": 2}
You could do something like this using $size and $filter:
db.collection.aggregate([
{
$project: {
_id: 0,
CustomerId: 1,
Count: {
"$size": {
"$filter": {
"input": "$dwell",
"as": "num",
"cond": {
$gt: [
"$$num",
10
]
}
}
}
}
}
}
])
Example MongoPlayground
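If the array field or the threshold varies, the stage above can be generated instead of hard-coded. The following is a minimal sketch in plain JavaScript; buildCountAbovePipeline is a hypothetical helper name, and it only builds the pipeline array that you would then pass to db.collection.aggregate().

```javascript
// Hypothetical helper: builds the $project stage for any array field and threshold.
function buildCountAbovePipeline(arrayField, threshold) {
  return [
    {
      $project: {
        _id: 0,
        CustomerId: 1,
        Count: {
          $size: {
            $filter: {
              input: "$" + arrayField, // field path, e.g. "$dwell"
              as: "num",
              cond: { $gt: ["$$num", threshold] },
            },
          },
        },
      },
    },
  ];
}

const pipeline = buildCountAbovePipeline("dwell", 10);
console.log(JSON.stringify(pipeline, null, 2));
```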

How to sort by a foreign field, the foreign field not using alphabetical/numerical order? [duplicate]

Following this question, which @NeilLunn has graciously answered, here is my problem in more detail.
This is the set of documents; some have a user_id and some don't. The user_id represents the user who created the document:
{ "user_id" : 11, "content" : "black", "date": somedate }
{ "user_id" : 6, "content" : "blue", "date": somedate }
{ "user_id" : 3, "content" : "red", "date": somedate }
{ "user_id" : 4, "content" : "black", "date": somedate }
{ "user_id" : 4, "content" : "blue", "date": somedate }
{ "user_id" : 90, "content" : "red", "date": somedate }
{ "user_id" : 7, "content" : "orange", "date": somedate }
{ "content" : "orange", "date": somedate }
{ "content" : "red", "date": somedate }
...
{ "user_id" : 4, "content" : "orange", "date": somedate }
{ "user_id" : 1, "content" : "orange", "date": somedate }
{ "content" : "red", "date": somedate }
{ "user_id" : 90, "content" : "purple", "date": somedate }
The front end pulls pages of 10 items each, which I do with limit and skip, and that works very well.
When there is a logged-in user, I would like to show that user the documents they may find more interesting first, based on the users they have interacted with.
The list of users the current user may find interesting is sorted by score and lives outside of Mongo. The first element is the most important user, whose documents I would like to show first, and the last user on the list is the least important.
The list is a simple array which looks like this:
[4,7,90,1].
The system which created this user score is not located within mongo, but I can copy the data if that will help. I can also change the array to include a score number.
What I would like to accomplish is the following:
Get the documents sorted by importance of the user_id from the list, so that documents from user_id 4 will be the first to show up, documents from user_id 7 second and so on. When there are no users left on the list I would like to show the rest of the documents. Like this:
all documents with user_id:4
all documents with user_id:7
all documents with user_id:90
all documents with user_id:1
all the rest of the documents
How should I accomplish this? Am I asking too much from mongo?
Given the array [4,7,90,1] what you want in your query is this:
db.collection.aggregate([
    { "$project": {
        "user_id": 1,
        "content": 1,
        "date": 1,
        "weight": { "$or": [
            { "$eq": ["$user_id", 4] },
            { "$eq": ["$user_id", 7] },
            { "$eq": ["$user_id", 90] },
            { "$eq": ["$user_id", 1] }
        ]}
    }},
    { "$sort": { "weight": -1, "date": -1 } }
])
What that does is test the user_id field against each value supplied in the $or condition. Each $eq returns true or false, and $or returns true as soon as any of them matches, so sorting on weight in descending order puts the documents from listed users first.
In your code, you build the array for $or from the items in your list: create a hash structure for each equality condition, push it onto an array, and plug that array in as the value of the $or condition.
I probably should have left the $cond operator out of the previous code so this part would have been clearer.
Here's some code for the Ruby Brain:
userList = [4, 7, 90, 1]
orCond = []
userList.each do |userId|
  # The field path needs the "$" prefix; a bare 'user_id' would be compared as a literal string
  orCond.push({ '$eq' => [ '$user_id', userId ] })
end
pipeline = [
{ '$project' => {
'user_id' => 1,
'content' => 1,
'date' => 1,
'weight' => { '$or' => orCond }
}},
{ '$sort' => { 'weight' => -1, 'date' => -1 } }
]
If you want individual weights, and we'll assume they come as key/value pairs, then you need to nest $cond operators:
db.collection.aggregate([
    { "$project": {
        "user_id": 1,
        "content": 1,
        "date": 1,
        "weight": { "$cond": [
            { "$eq": ["$user_id", 4] },
            10,
            { "$cond": [
                { "$eq": ["$user_id", 7] },
                9,
                { "$cond": [
                    { "$eq": ["$user_id", 90] },
                    7,
                    { "$cond": [
                        { "$eq": ["$user_id", 1] },
                        8,
                        0
                    ]}
                ]}
            ]}
        ]}
    }},
    { "$sort": { "weight": -1, "date": -1 } }
])
Note that each weight is just a return value, so the conditions do not need to be listed in order of weight.
For generating this structure see here:
https://stackoverflow.com/a/22213246/2313887
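As a rough illustration of that generation step, here is a minimal JavaScript sketch that folds a list of pairs into the nested $cond expression. The helper name buildWeightExpression and the { userId, weight } shape are assumptions made for this example, not part of the linked answer.

```javascript
// Hypothetical helper: turns [{ userId, weight }, ...] into nested $cond operators.
function buildWeightExpression(weights, defaultWeight = 0) {
  // Fold from the right so the first pair ends up as the outermost $cond
  // and the default weight sits in the innermost "else" branch.
  return weights.reduceRight(
    (elseBranch, { userId, weight }) => ({
      $cond: [{ $eq: ["$user_id", userId] }, weight, elseBranch],
    }),
    defaultWeight
  );
}

const weights = [
  { userId: 4, weight: 10 },
  { userId: 7, weight: 9 },
  { userId: 90, weight: 7 },
  { userId: 1, weight: 8 },
];

const pipeline = [
  {
    $project: {
      user_id: 1,
      content: 1,
      date: 1,
      weight: buildWeightExpression(weights),
    },
  },
  { $sort: { weight: -1, date: -1 } },
];

console.log(JSON.stringify(pipeline, null, 2));
```

The resulting pipeline array has the same shape as the hand-written one and can be passed to aggregate() as is.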
$filter has been available since MongoDB 3.2, and it makes this much easier to maintain when there are more than 4 scores. Note that the $addFields and $set stages used below require MongoDB 3.4 and 4.2 respectively:
db.collection.aggregate([
{
$addFields: {
weight: [
{key: 4, score: 10}, {key: 7, score: 9}, {key: 90, score: 8}, {key: 1, score: 7}
]
}
},
{
$addFields: {
weight: {
$filter: {
input: "$weight",
as: "item",
cond: {$eq: ["$$item.key", "$user_id"]}
}
}
}
},
{
$set: {
weight: {
$cond: [{$eq: [{$size: "$weight"}, 1]}, {$arrayElemAt: ["$weight", 0]}, {score: 1}]
}
}
},
{$set: {weight: "$weight.score"}},
{$sort: {weight: -1, date: -1}}
])
See how it works on the playground example
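Since the question's list is only ordered and carries no scores, the key/score array can be derived from the positions. A minimal sketch, assuming a hypothetical helper weightsFromRanking and assuming that every listed user should score above the default of 1 used in the pipeline above:

```javascript
// Hypothetical helper: earlier users in the ranking get higher scores.
// Scores start at 2 for the last user so they all stay above the default score of 1.
function weightsFromRanking(rankedUserIds) {
  const n = rankedUserIds.length;
  return rankedUserIds.map((key, index) => ({ key, score: n - index + 1 }));
}

const weight = weightsFromRanking([4, 7, 90, 1]);
console.log(JSON.stringify(weight));

// The array can then be used as the literal in the first stage.
const firstStage = { $addFields: { weight } };
```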

MongoDB sort vs aggregate $sort on array index

With a MongoDB collection test containing the following documents:
{ "_id" : 1, "color" : "blue", "items" : [ 1, 2, 0 ] }
{ "_id" : 2, "color" : "red", "items" : [ 0, 3, 4 ] }
If I sort them in descending order based on the second element in the items array, using
db.test.find().sort({"items.1": -1})
they will be correctly sorted as:
{ "_id" : 2, "color" : "red", "items" : [ 0, 3, 4 ] }
{ "_id" : 1, "color" : "blue", "items" : [ 1, 2, 0 ] }
However, when I attempt to sort them using the aggregate function:
db.test.aggregate([{$sort: {"items.1": -1} }])
they are not sorted correctly, even though the query is accepted as valid:
{
"result" : [
{
"_id" : 1,
"color" : "blue",
"items" : [
1,
2,
0
]
},
{
"_id" : 2,
"color" : "red",
"items" : [
0,
3,
4
]
}
],
"ok" : 1
}
Why is this?
The aggregation framework does not handle arrays in the same way that .find() queries do. This is true not only of operations like .sort() but also of other operators, notably $slice, though that one is about to get a fix (more on that later).
So it is pretty much impossible to use "dot notation" with an array index, as you have done, inside the aggregation pipeline. But there is a way around this.
What you can do is work out the value of the "nth" array element and return it as a field that can be sorted on:
db.test.aggregate([
{ "$unwind": "$items" },
{ "$group": {
"_id": "$_id",
"items": { "$push": "$items" },
"itemsCopy": { "$push": "$items" },
"first": { "$first": "$items" }
}},
{ "$unwind": "$itemsCopy" },
{ "$project": {
"items": 1,
"itemsCopy": 1,
"first": 1,
"seen": { "$eq": [ "$itemsCopy", "$first" ] }
}},
{ "$match": { "seen": false } },
{ "$group": {
"_id": "$_id",
"items": { "$first": "$items" },
"itemsCopy": { "$push": "$itemsCopy" },
"first": { "$first": "$first" },
"second": { "$first": "$itemsCopy" }
}},
{ "$sort": { "second": -1 } }
])
It's a horrible, iterative approach where you essentially "step through" the array: you take the $first element per document after processing with $unwind, then $unwind again and test whether each array element is the same as the one(s) already "seen" at the earlier positions. Because the test compares values, it breaks down when the array repeats the value of an earlier element.
It's terrible, and it gets worse the more positions you want to move along, but it does get the result:
{ "_id" : 2, "items" : [ 0, 3, 4 ], "itemsCopy" : [ 3, 4 ], "first" : 0, "second" : 3 }
{ "_id" : 1, "items" : [ 1, 2, 0 ], "itemsCopy" : [ 2, 0 ], "first" : 1, "second" : 2 }
{ "_id" : 3, "items" : [ 2, 1, 5 ], "itemsCopy" : [ 1, 5 ], "first" : 2, "second" : 1 }
Fortunately, upcoming releases of MongoDB (currently available as development releases) get a "fix" for this. It may not be the "perfect" fix that you desire, but it does solve the basic problem.
There is a new $slice operator available for the aggregation framework there, and it returns the required element(s) of the array from the indexed positions:
db.test.aggregate([
{ "$project": {
"items": 1,
"slice": { "$slice": [ "$items",1,1 ] }
}},
{ "$sort": { "slice": -1 } }
])
Which produces:
{ "_id" : 2, "items" : [ 0, 3, 4 ], "slice" : [ 3 ] }
{ "_id" : 1, "items" : [ 1, 2, 0 ], "slice" : [ 2 ] }
{ "_id" : 3, "items" : [ 2, 1, 5 ], "slice" : [ 1 ] }
Note that as a "slice" the result is still an array; however, $sort in the aggregation framework has always used the "first position" of the array to sort the contents. That means that with a single value extracted from the indexed position (just as in the long procedure above), the result will be sorted as you expect.
In the end, that is just how it works. Either live with the kind of operations shown above to work with an indexed position of the array, or wait until a newer version comes to your rescue with better operators.
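To see what "extract the element, then sort on it" amounts to, the idea can be mimicked in plain JavaScript on the two documents from the question. This is an in-memory sketch only; sortByArrayPosition is a made-up name, and it assumes every items array actually has an element at the requested index.

```javascript
const docs = [
  { _id: 1, color: "blue", items: [1, 2, 0] },
  { _id: 2, color: "red", items: [0, 3, 4] },
];

// Mirrors the $project with $slice followed by a descending $sort.
function sortByArrayPosition(documents, index) {
  return documents
    // Like { $slice: ["$items", index, 1] }: a one-element array.
    .map((doc) => ({ ...doc, slice: doc.items.slice(index, index + 1) }))
    // Sort descending on the single extracted value.
    .sort((a, b) => b.slice[0] - a.slice[0]);
}

const sorted = sortByArrayPosition(docs, 1);
console.log(sorted.map((doc) => doc._id));
```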

MongoDB min/max aggregation

I've got documents with this simplified schema:
{
    position: 10,
    value: 5,
    count: 3
}
What I'd like to do is group those documents by position and find the maximum value where the count is greater than 4, but with a value less than the minimum value where the count is less than 4.
Here is what I've done, but it does not work:
{ $group: {
_id: {
position: "$position",
},
result: {$max: { $cond: [ {$and: [ {$gte: ["$count", 4]},
{$lt: ["$value", {$min: { $cond: [ {$lt: ["$count", 4]},
{ value: "$value" },
10]
}
}]
}]},
{ value: "$value", nb: "$count"},
0]
}
}
}
}
I am told that $min is an invalid operator and I can't figure out how to write the right aggregation function. Would it be better to run a mapReduce?
If, for example, I have these documents:
{Position: 10, value: 1, count: 5}
{Position: 10, value: 3, count: 3}
{Position: 10, value: 4, count: 5}
{Position: 10, value: 7, count: 4}
I'd like the result to be
{Position: 10, value: 1, count: 4}
as it is the maximum 'value' where count is greater than 4, and because there is a value of 3 that has only 3 counts, the value 4 is not what I'm looking for.
That is a bit of a mouthful to say the least but I'll have another crack at explaining it:
You want:
For each "Position" value, find the document with the largest "value" that is still less than the smallest "value" among the documents with a "count" of less than 4, and whose own "count" is greater than 4.
Which reads like a math exam problem designed to confuse you with the logic. But once you catch the meaning, you perform the aggregation with the following steps:
db.positions.aggregate([
// Separate the values greater than and less than 4 by "Position"
{ "$group": {
"_id": "$Position",
"high": { "$push": {
"$cond": [
{ "$gt": ["$count", 4] },
{ "value": "$value", "count": "$count" },
null
]
}},
"low": { "$push": {
"$cond": [
{ "$lt": ["$count", 4] },
{ "value": "$value", "count": "$count" },
null
]
}}
}},
// Unwind the "low" counts array
{ "$unwind": "$low" },
// Find the "$min" value from the low counts
{ "$group": {
"_id": "$_id",
"high": { "$first": "$high" },
"low": { "$min": "$low.value" }
}},
// Unwind the "high" counts array
{ "$unwind": "$high" },
// Compare the value to the "low" value to see if it is less than
{ "$project": {
"high": 1,
"lower": { "$lt": [ "$high.value", "$low" ] }
}},
// Sorting, $max won't work over multiple values. Want the document.
{ "$sort": { "lower": -1, "high.value": -1 } },
// Group, get the highest order document which was on top
{ "$group": {
"_id": "$_id",
"value": { "$first": "$high.value" },
"count": { "$first": "$high.count" }
}}
])
So from the set of documents:
{ "Position" : 10, "value" : 1, "count" : 5 }
{ "Position" : 10, "value" : 3, "count" : 3 }
{ "Position" : 10, "value" : 4, "count" : 5 }
{ "Position" : 10, "value" : 7, "count" : 4 }
Only the first is returned in this case, as its value is less than that of the "count of three" document and its own count is greater than 4.
{ "_id" : 10, "value" : 1, "count" : 5 }
Which I am sure is what you actually meant.
So the application of $min and $max really only applies when getting discrete values from documents out of a grouping range. If you are interested in more than one value from the document or indeed the whole document, then you are sorting and getting the $first or $last entries on the grouping boundary.
And aggregate is much faster than mapReduce as it uses native code without invoking a JavaScript interpreter.
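For checking the logic by hand, the same selection can be written in plain JavaScript over the sample documents. This is an in-memory sketch with a made-up helper name, bestPerPosition, and it makes two assumptions the pipeline does not: a position with no low-count documents has no upper bound, and a position with no qualifying document is skipped.

```javascript
const docs = [
  { Position: 10, value: 1, count: 5 },
  { Position: 10, value: 3, count: 3 },
  { Position: 10, value: 4, count: 5 },
  { Position: 10, value: 7, count: 4 },
];

function bestPerPosition(documents) {
  // Group the documents by Position, like the first $group stage.
  const byPosition = new Map();
  for (const doc of documents) {
    if (!byPosition.has(doc.Position)) byPosition.set(doc.Position, []);
    byPosition.get(doc.Position).push(doc);
  }

  const results = [];
  for (const [position, group] of byPosition) {
    // Smallest value among the low-count documents (Infinity if there are none).
    const lowMin = Math.min(...group.filter((d) => d.count < 4).map((d) => d.value));
    // High-count documents whose value stays below that minimum.
    const candidates = group.filter((d) => d.count > 4 && d.value < lowMin);
    if (candidates.length === 0) continue;
    // Keep the whole document with the largest value, not just the value.
    const best = candidates.reduce((a, b) => (b.value > a.value ? b : a));
    results.push({ _id: position, value: best.value, count: best.count });
  }
  return results;
}

const result = bestPerPosition(docs);
console.log(JSON.stringify(result));
```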
