Here is the sample data that I am working on:
{
"_id": 1,
"user": "A",
"nums":[1,2,3,4]
}
{
"_id": 2,
"user": "B",
"nums":[1,2,4]
}
{
"_id": 3,
"user": "B",
"nums":[4,5,7]
}
What I am trying to get is the number of logs for each user and the distinct "nums" list for each user. So the result is something like this:
[
{
"user": "A",
"total": 1,
"nums" : [1,2,3,4]
},
{
"user": "B",
"total": 2,
"nums" : [1,2,4,5,7]
}
]
Is it possible to achieve that in one aggregate query? I am currently using two:
db.test.aggregate([{ $group: { _id:"$user", total:{$sum:1}}}])
db.test.aggregate([{$unwind:"$nums"}, { $group: { _id:"$user", nums:{$addToSet:"$nums"}}}])
Also, would one query be faster than two separate queries on a large data set, or should I just stay with two queries?
You can do this by collecting the original _id values of the docs in the $group that follows the $unwind. The size of that set then gives you the total count in a final $project:
db.test.aggregate([
{$unwind: '$nums'},
{$group: {
_id: '$user',
ids: {$addToSet: '$_id'},
nums: {$addToSet: '$nums'}
}},
{$project: {
_id: 0,
user: '$_id',
total: {$size: '$ids'},
nums: 1
}}
])
Result:
[
{
"nums": [
7,
5,
4,
2,
1
],
"user": "B",
"total": 2
},
{
"nums": [
4,
3,
2,
1
],
"user": "A",
"total": 1
}
]
I would expect that doing it all in one aggregate pipeline instead of two will be faster, but it's always best to test it in your own environment to be sure.
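If it helps to see why collecting the _id values yields the right count, here is a minimal plain-JavaScript sketch of the same logic. It is not MongoDB code: the helper name summarizeByUser and the variable userDocs are made up for illustration, and it assumes the user values are the strings "A" and "B".

```javascript
// Sample documents from the question (user values assumed to be strings).
const userDocs = [
  { _id: 1, user: "A", nums: [1, 2, 3, 4] },
  { _id: 2, user: "B", nums: [1, 2, 4] },
  { _id: 3, user: "B", nums: [4, 5, 7] },
];

// Hypothetical helper that mirrors $unwind + $group ($addToSet) + $project ($size).
function summarizeByUser(documents) {
  const groups = new Map();
  for (const doc of documents) {
    for (const num of doc.nums) {            // $unwind: one row per array element
      if (!groups.has(doc.user)) {
        groups.set(doc.user, { ids: new Set(), nums: new Set() });
      }
      const group = groups.get(doc.user);
      group.ids.add(doc._id);                // $addToSet: '$_id'
      group.nums.add(num);                   // $addToSet: '$nums'
    }
  }
  return [...groups].map(([user, g]) => ({
    user,
    total: g.ids.size,                       // $size: '$ids'
    nums: [...g.nums].sort((a, b) => a - b), // sorted only for readable output
  }));
}

console.log(summarizeByUser(userDocs));
```

One thing the sketch makes visible: a document whose nums array is empty never reaches the grouping step, so it is not counted in total. $unwind drops such documents by default as well, so this is worth checking against your data.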
1st collection
stocks = [
{"userId" : 1, "groupId": 1, "stockId": 1},
{"userId": 2, "groupId": 1, "stockId": 2},
{"userId": 3, "groupId": 4, "stockId": 3}
]
2nd collection:
items = [
{"userid": 1, "groupId": 1, "itemId": 1},
{"userid": 1, "groupId": 3, "itemId": 2},
{"userid": 1, "groupId": 4, "itemId": 3}
]
I have a user collection from which I get the user id; here I have filtered it down to user id 1. I tried the lookup below and I am getting all the data for that user id, but when I add a condition to exclude the group, it does not work. Can someone help or suggest where I am going wrong?
{
from: "stocks",
localField: "user.id",
foreignField: "userId",
let: {group_id: "$groupId", user_id: "$userId" },
pipeline: [{ "$unionWith": { coll: "items", pipeline: [{$match: {$userid: "$$user_id", "$groupId": { $nin: ["$$group_id"]}}}]}}],
as: "stock_items"
}
I need list of data where userId and groupId should not be same, i need all the data from stocks and items excluding item[0], since both have same user id and group id.
I'm not entirely sure that I understand what you are asking for. I'm also confused by the sample aggregation that has been provided, because fields like "user.id" don't exist in the sample documents. But at the end you specifically mention:
i need all the data from stocks and items excluding item[0], since both have same user id and group id
Based on that, this answer assumes that you are looking for all documents in both collections where the value of the groupId field is different from the value of the user id field. If that is correct, then the following query should work:
db.stocks.aggregate([
{
$match: {
$expr: {
$ne: [
"$groupId",
"$userId"
]
}
}
},
{
"$unionWith": {
"coll": "items",
"pipeline": [
{
$match: {
$expr: {
$ne: [
"$groupId",
"$userid" // the sample items documents spell this field "userid" (lowercase i)
]
}
}
}
]
}
}
])
Playground demonstration here.
This works by using the $ne aggregation operator to compare the two fields in each document. The comparison has to be wrapped in $expr, because a plain $match compares a field against a constant rather than against another field of the same document.
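To double-check which documents should survive, here is a small plain-JavaScript sketch of the same filter-then-union idea. It is not MongoDB code, and keepMismatched is a made-up helper name. It takes the user field name as a parameter because the sample data spells it "userId" in stocks but "userid" in items, and field names are case-sensitive.

```javascript
// Sample data from the question, copied with its original field spelling.
const stocks = [
  { userId: 1, groupId: 1, stockId: 1 },
  { userId: 2, groupId: 1, stockId: 2 },
  { userId: 3, groupId: 4, stockId: 3 },
];
const items = [
  { userid: 1, groupId: 1, itemId: 1 },
  { userid: 1, groupId: 3, itemId: 2 },
  { userid: 1, groupId: 4, itemId: 3 },
];

// Hypothetical helper: keep documents whose group id differs from their user id,
// which is what {$match: {$expr: {$ne: [...]}}} does inside each collection.
function keepMismatched(documents, userField) {
  return documents.filter((doc) => doc.groupId !== doc[userField]);
}

// $unionWith appends the filtered items to the filtered stocks.
const combined = [
  ...keepMismatched(stocks, "userId"),
  ...keepMismatched(items, "userid"),
];

console.log(combined);
```

With the sample data, this leaves two stocks documents and two items documents; the first document of each collection is excluded.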
I will try to keep this concise with the input, the result and the desired/expected result. I need to find the minimum and maximum number of rows/records between occurrences of the same "winCode", and how many records ago it last occurred in the ordered data. My first step is to group the documents by "winCode", which works perfectly, but I cannot come up with something that would display how many records it took for the same "winCode" to appear the last time, along with the minimum and maximum. Check the desired output for more details. Below is the paste from: https://mongoplayground.net/p/bCzTO8ZLxNi
Input/collection
[
{
code: "1",
results: {
winCode: 3
}
},
{
code: "10",
results: {
winCode: 3
}
},
{
code: "8",
results: {
winCode: 2
}
},
{
code: "5",
results: {
winCode: 5
}
},
{
code: "5",
results: {
winCode: 4
}
},
{
code: "6",
results: {
winCode: 4
}
},
{
code: "7",
results: {
winCode: 5
}
},
{
code: "3",
results: {
winCode: 3
}
},
{
code: "9",
results: {
winCode: 2
}
},
{
code: "2",
results: {
winCode: 2
}
}
]
Current query
db.collection.aggregate([
{
$sort: {
code: -1
}
},
{
$group: {
_id: "$results.winCode",
count: {
$sum: 1
},
lastTimeOccurredCode: {
$first: "$code" // Any way to get it to display a count from the start to this point on how many records it went through to get the $first result?
},
}
},
{
$sort: {
_id: -1
}
},
])
Current output
[
{
"_id": 5,
"count": 2,
"lastTimeOccurredCode": "5"
},
{
"_id": 4,
"count": 2,
"lastTimeOccurredCode": "5"
},
{
"_id": 3,
"count": 3,
"lastTimeOccurredCode": "1"
},
{
"_id": 2,
"count": 3,
"lastTimeOccurredCode": "2"
}
]
Desired output
[
{
"_id": 5,
"count": 2,
"lastTimeOccurredRecordsCount": 4,
"minRecordsBetween": 3,
"maxRecordsBetween": 3
},
{
"_id": 4,
"count": 2,
"lastTimeOccurredRecordsCount": 5,
"minRecordsBetween": 1,
"maxRecordsBetween": 1
},
{
"_id": 3,
"count": 3,
"lastTimeOccurredRecordsCount": 1,
"minRecordsBetween": 1,
"maxRecordsBetween": 6
},
{
"_id": 2,
"count": 3,
"lastTimeOccurredRecordsCount": 3,
"minRecordsBetween": 1,
"maxRecordsBetween": 6
}
]
I have tried to add an $accumulator function, but I would need the $first function's result in it, and that is not available within the same $group stage. I feel like I am missing something here.
You can use $setWindowFields (available since MongoDB 5.0) to define an index and $reduce to find the differences between consecutive indexes. If you want the index to follow {$sort: {code: -1}}, keep the $setWindowFields sortBy as in this example and remove the redundant {$sort: {code: -1}} step. If you want the index to follow a different sort order, then only update the $setWindowFields sortBy.
Use $setWindowFields to define the index
$sort according to what you need (if it is different from the previous sort)
$group by $results.winCode and keep all the index data
Calculate the diff
Format
db.collection.aggregate([
{$setWindowFields: {
sortBy: {code: -1},
output: {index: {$sum: 1, window: {documents: ["unbounded", "current"]}}}
}},
{$sort: {code: -1}},
{$group: {
_id: "$results.winCode",
count: {$sum: 1},
lastTimeOccurredCode: {$first: "$code"},
index: {$push: "$index"}
}},
{$project: {
count: 1,
lastTimeOccurredCode: 1,
diff: {
$reduce: {
input: {$range: [1, {$size: "$index"}]},
initialValue: [],
in: {$concatArrays: [
"$$value",
[{$subtract: [
{$arrayElemAt: ["$index", "$$this"]},
{$arrayElemAt: ["$index", {$subtract: ["$$this", 1]}]}
]}]
]
}
}
}
}},
{$set: {
minRecordsBetween: {$min: "$diff"},
maxRecordsBetween: {$max: "$diff"},
diff: "$$REMOVE"
}},
{$sort: {_id: -1}}
])
See how it works on the playground example
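For anyone who wants to trace the index and diff logic by hand, here is a rough plain-JavaScript sketch of the same steps. It is not MongoDB code, and gapsByWinCode is a made-up helper name. As an assumption for illustration, it treats the winCode values in the order they appear in the question as the already-sorted order instead of sorting by code.

```javascript
// winCode values in the order the documents appear in the question,
// treated here as the already-sorted order (an assumption for illustration).
const winCodes = [3, 3, 2, 5, 4, 4, 5, 3, 2, 2];

// Hypothetical helper mirroring the index / $group / $reduce / $min / $max steps.
function gapsByWinCode(orderedCodes) {
  const positions = new Map();
  orderedCodes.forEach((code, i) => {
    if (!positions.has(code)) positions.set(code, []);
    positions.get(code).push(i + 1);   // running 1-based index, like the window $sum
  });

  const result = [];
  for (const [code, index] of positions) {
    // Difference between each position and the previous one (the $reduce step).
    const diff = index.slice(1).map((pos, i) => pos - index[i]);
    result.push({
      _id: code,
      count: index.length,
      // A winCode that occurs only once has no gaps; this sketch reports null.
      minRecordsBetween: diff.length ? Math.min(...diff) : null,
      maxRecordsBetween: diff.length ? Math.max(...diff) : null,
    });
  }
  return result.sort((a, b) => b._id - a._id);  // final {$sort: {_id: -1}}
}

console.log(gapsByWinCode(winCodes));
```

On this ordering, the minRecordsBetween and maxRecordsBetween values come out the same as in the desired output from the question. A different sortBy will of course give different positions and therefore different gaps.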
I have a collection that looks like this:
{
"id": "id1",
"tags": ['a', 'b']
},
{
"id": "id2",
"tags": ['b', 'c']
},
{
"id": "id3",
"tags": ['a', 'c']
}
How can I make a query that groups by every element in the "tags" array, so the result looks like this?:
{'a': 2},
{'b': 2},
{'c': 2}
(where 2 is the number of times it appears, the count).
Thanks for your help!
You can use this aggregation query:
First, $unwind the array to deconstruct it, so that each tag becomes its own document.
Then $group by tags and $sum 1 to get the total.
Finally, use $replaceRoot with $arrayToObject to get the desired output.
db.collection.aggregate([
{
"$unwind": "$tags"
},
{
"$group": {
"_id": "$tags",
"count": {
"$sum": 1
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$arrayToObject": [
[
{
"k": "$_id",
"v": "$count"
}
]
]
}
}
}
])
Example here
As an addition, if you want sorted values (a, b, c...), you can add a $sort stage like this example
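The three stages can also be followed in plain JavaScript. The sketch below is not MongoDB code, and countTags and tagDocs are made-up names for illustration. It counts every tag occurrence, sorts by tag name, and emits one single-key object per tag, which is what the $arrayToObject step produces.

```javascript
// Sample documents from the question.
const tagDocs = [
  { id: "id1", tags: ["a", "b"] },
  { id: "id2", tags: ["b", "c"] },
  { id: "id3", tags: ["a", "c"] },
];

// Hypothetical helper mirroring $unwind + $group ($sum: 1) + $sort + $replaceRoot.
function countTags(documents) {
  const counts = new Map();
  for (const doc of documents) {
    for (const tag of doc.tags) {                    // $unwind: "$tags"
      counts.set(tag, (counts.get(tag) || 0) + 1);   // $group with $sum: 1
    }
  }
  return [...counts]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)) // optional $sort on the tag
    .map(([tag, count]) => ({ [tag]: count }));       // {k: tag, v: count} as an object
}

console.log(countTags(tagDocs));
```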
I have the following schema in my taxon collection:
{
"_id": 1,
"na": [ "root_1",
"root_2",
"root_3" ],
"pa": 1
},{
"_id": 2,
"na": [ "name_1",
"name_2",
"name_3"],
"pa": 1
},{
"_id": 4,
"na": [ "otherName_1",
"otherName_2",
"otherName_3"],
"pa": 2
}
Each document is related to another by the parent field (pa), which corresponds to the _id of its parent.
I would like to perform a recursive search to get the following result:
{ "_id": 4,
"nameList": [ "otherName_1",
"name_1",
"root_1"]
}
Starting from the document with a given _id, get the first item of the na array of each parent until the document with _id: 1 is reached.
I currently get this result by performing X queries (one per parent document, here 3 for example), but I'm pretty sure that this can be achieved using a single query. I already looked at the new $graphLookup operator, but couldn't manage to get my way with it...
Is it possible to achieve this in a single query using MongoDB 3.4.1?
Edit
I would run this for 50 documents each time, so the optimal solution would be to combine everything in a single query.
For example, it would look like this:
var listId = [ 4, 128, 553, 2728, ...];
var cursor = db.taxon.aggregate([
{$match:
{ _id: {$in: listId}}
}, ...
]);
and would output:
[{ "_id": 4,
"nameList": [ "otherName_1",
"name_1",
"root_1"]
}, { "_id": 128,
"nameList": [ "some_other_ame_1",
"some_name_1",
"root_1"]
}, { "_id": 553,
"nameList": [ "last_other_ame_1",
"last_name_1",
"root_1"]
} ... ]
try it online: mongoplayground.net/p/Gfp-L03Ub0Y
You can try the aggregation below.
The stages are $match, $graphLookup and $project.
$reduce picks the first element from the na array of each document in the $graphLookup nameList.
db.taxon.aggregate([{
$match: {
_id: {
$in: listId
}
}
}, {
$graphLookup: {
from: "taxon",
startWith: "$_id",
connectFromField: "pa",
connectToField: "_id",
as: "nameList"
}
}, {
$project: {
nameList: {
$reduce: {
input: "$nameList",
initialValue: [],
in: {
"$concatArrays": ["$$value", {
$slice: ["$$this.na", 1]
}]
}
}
}
}
}])
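To make the parent walk concrete, here is a small plain-JavaScript sketch of what the lookup is meant to collect. It is not MongoDB code, and nameChain is a made-up helper name. It follows pa from the starting document up to the root and takes the first na entry at each step. One caveat, as far as I know from the $graphLookup documentation: the documents in the as array are not guaranteed to come back in any particular order, so if the leaf-to-root order matters, adding a depthField and sorting on it is one way to restore it. The sketch below walks the chain explicitly, so its order is fixed.

```javascript
// Sample documents from the question.
const taxon = [
  { _id: 1, na: ["root_1", "root_2", "root_3"], pa: 1 },
  { _id: 2, na: ["name_1", "name_2", "name_3"], pa: 1 },
  { _id: 4, na: ["otherName_1", "otherName_2", "otherName_3"], pa: 2 },
];

// Hypothetical helper: follow pa from startId until the root (or a cycle) is hit.
function nameChain(collection, startId) {
  const byId = new Map(collection.map((doc) => [doc._id, doc]));
  const names = [];
  const seen = new Set();            // guards against the root pointing at itself
  let current = byId.get(startId);
  while (current && !seen.has(current._id)) {
    seen.add(current._id);
    names.push(current.na[0]);       // first item of na, like $slice: ["$$this.na", 1]
    current = byId.get(current.pa);  // move to the parent document
  }
  return { _id: startId, nameList: names };
}

console.log(nameChain(taxon, 4));
```

Running it for a list of ids is then just a map over that list, for example [4, 2].map((id) => nameChain(taxon, id)).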