MongoDB Grouping Queries

I want to be able to retrieve a list of documents matching a list of ids and trace each result back to a specific key.
Example:
// I have these objects:
[{_id: 1, group: 'a'}, {_id: 2, group: 'b'}, {_id: 1, group: 'c'}]
// and these database documents:
{_id: 1}, {_id:2}
I want to create a query that would return essentially {_id: 1}, {_id: 2}, {_id: 1}, or any other result where I can map the find results back to the 'group' property of the original objects.
The problem I run into is that if the list of ids contains duplicates, or keys not found in the target collection, I lose ordering and can't map the query results back to my objects.
I thought $group could achieve this, but I haven't managed to make it work so far.

To get a list of all documents matching an id, wouldn't a simple find work?
db.collection.find({"_id":1})
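A plain find with $in returns each matching document only once and in no guaranteed order, so the duplicate and missing ids are probably easiest to handle client-side after a single query. A minimal sketch (shell/Node.js style, with hypothetical names):

// the original objects, possibly with duplicate ids
const objects = [{_id: 1, group: 'a'}, {_id: 2, group: 'b'}, {_id: 1, group: 'c'}];

// one query for the distinct ids...
const found = db.collection.find({_id: {$in: objects.map(o => o._id)}}, {_id: 1}).toArray();
const foundIds = new Set(found.map(d => d._id));

// ...then map each object back to its 'group'; duplicates are kept, missing ids dropped
const matched = objects.filter(o => foundIds.has(o._id));
// matched: [{_id: 1, group: 'a'}, {_id: 2, group: 'b'}, {_id: 1, group: 'c'}]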

Related

MongoDB Query generating same result set for multiple pages

Sample data in the MongoDB collection is attached as an image.
The query uses sort and limit.
The same data gets populated in multiple result sets.
Page 1 -> Query
db.user_profile.find({}, {email: 1}).skip(0).limit(3).sort({isEnabled: -1, firstName: 1})
Page 2 -> Query
db.user_profile.find({}, {email: 1}).skip(3).limit(3).sort({isEnabled: -1, firstName: 1})
Page 3 -> Query
db.user_profile.find({}, {email: 1}).skip(6).limit(3).sort({isEnabled: -1, firstName: 1})
You need to project the fields you are sorting on. One way to do it is:
db.user_profile.find({}, {email: 1, isEnabled: 1, firstName: 1}).skip(3).limit(3).sort({isEnabled: -1, firstName: 1})
Notice the projection of isEnabled and firstName, which were missing before. This is a simple query, but it adds two more fields to the output.
If you want to return only the email, you can use an aggregation pipeline that drops those fields in a later stage:
db.user_profile.aggregate([
    {$sort: {isEnabled: -1, firstName: 1}},
    {$skip: 0},
    {$limit: 3},
    {$project: {email: 1}} // or {$project: {email: 1, _id: 0}}
])
You can see the aggregation example working on the playground
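For later pages only the $skip value changes; for example, page 2 of the pipeline version (mirroring the skip(3) in the find version above) would be:

db.user_profile.aggregate([
    {$sort: {isEnabled: -1, firstName: 1}},
    {$skip: 3},
    {$limit: 3},
    {$project: {email: 1}}
])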

MongoDB - Safely sort inner array after group

I'm trying to look up all records that match a certain condition, in this case fk being one of certain values, and then return only the top 2 results for each, sorted by the name field.
This is what I have:
db.getCollection('col1').aggregate([
    {$match: {fk: {$in: [1, 2]}}},
    {$sort: {fk: 1, name: -1}},
    {$group: {_id: "$fk", items: {$push: "$$ROOT"}}},
    {$project: {items: {$slice: ["$items", 2]}}}
])
and it works, BUT it's not guaranteed. According to this Mongo thread, $group does not guarantee document order.
This would also mean that all of the suggested solutions here and elsewhere, which recommend using $unwind, followed by $sort, and then $group, would also not work, for the same reason.
What is the best way to accomplish this with Mongo (any version)? I've seen suggestions that this could be accomplished in the $project phase, but I'm not quite sure how.
You are correct in saying that the result of $group is never sorted.
$group does not order its output documents.
Hence doing a
{$sort: {fk: 1}}
then grouping with
{$group: {_id: "$fk", ... }},
will be a wasted effort.
But there is a silver lining with sorting by name: -1 before the $group stage. Since you are using $push (not $addToSet), the pushed documents retain, inside the newly created items array of the $group result, the order they had when they entered the stage. You can see this behaviour here (copy of your pipeline).
The items array will always have
"items": [
    {
        ..
        "name": "Michael"
    },
    {
        ..
        "name": "George"
    }
]
in the same order, therefore your nested array sort is a non-issue! Though I am unable to find an exact quote in the documentation to confirm this behaviour, you can check;
this,
or this where it is confirmed.
Also see the accumulator operator list for $group, where $addToSet has "Order of the array elements is undefined." in its description, whereas the similar operator $push does not, which might be indirect evidence? :)
A simple modification of your pipeline, moving the fk: 1 sort from the pre-$group stage to a post-$group stage:
db.getCollection('col1').aggregate([
    {$match: {fk: {$in: [1, 2]}}},
    {$sort: {name: -1}},
    {$group: {_id: "$fk", items: {$push: "$$ROOT"}}},
    {$sort: {_id: 1}},
    {$project: {items: {$slice: ["$items", 2]}}}
])
should be sufficient to have the main result array order fixed as well. Check it on mongoplayground
$group doesn't guarantee the output document order, but it does keep the pushed documents of each bucket in the order they arrived. So in your case, even though the documents after the $group stage are not sorted by fk, each group's items array will be sorted by name descending. If you would like to keep the documents sorted by fk, you can just add {$sort: {fk: 1}} after the $group stage.
You could also sort by the order of the values passed in your match query, should you need to, by adding an extra field to each document. Something like:
db.getCollection('col1').aggregate([
    {$match: {fk: {$in: [1, 2]}}},
    {$addFields: {ifk: {$indexOfArray: [[1, 2], "$fk"]}}},
    {$sort: {ifk: 1, name: -1}},
    {$group: {_id: "$ifk", items: {$push: "$$ROOT"}}},
    {$sort: {_id: 1}},
    {$project: {items: {$slice: ["$items", 2]}}}
])
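For example, if the match used fk: {$in: [2, 1]} and the same [2, 1] array were passed to $indexOfArray, documents with fk: 2 would get ifk: 0 and documents with fk: 1 would get ifk: 1, so the groups would come back in the order 2, 1.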
Update, to allow an array sort without the $group operator: I've found the JIRA ticket that is going to allow sorting arrays.
You could try the below $project stage to sort the array. There may be various ways to do it; this one should sort names descending. It works, but it is a slower solution.
{"$project":{"items":{"$reduce":{
"input":"$items",
"initialValue":[],
"in":{"$let":{
"vars":{"othis":"$$this","ovalue":"$$value"},
"in":{"$let":{
"vars":{
//return index as 0 when comparing the first value with initial value (empty) or else return the index of value from the accumlator array which is closest and less than the current value.
"index":{"$cond":{
"if":{"$eq":["$$ovalue",[]]},
"then":0,
"else":{"$reduce":{
"input":"$$ovalue",
"initialValue":0,
"in":{"$cond":{
"if":{"$lt":["$$othis.name","$$this.name"]},
"then":{"$add":["$$value",1]},
"else":"$$value"}}}}
}}
},
//insert the current value at the found index
"in":{"$concatArrays":[
{"$slice":["$$ovalue","$$index"]},
["$$othis"],
{"$slice":["$$ovalue",{"$subtract":["$$index",{"$size":"$$ovalue"}]}]}]}
}}}}
}}}}
A simple example demonstrating how each iteration works:
db.b.insert({"items": [2, 5, 4, 7, 6, 3]});

othis  ovalue       index  concat arrays (parts with counts)  return value
2      []           0      [],0   [2]  [],0                   [2]
5      [2]          0      [],0   [5]  [2],-1                 [5,2]
4      [5,2]        1      [5],1  [4]  [2],-1                 [5,4,2]
7      [5,4,2]      0      [],0   [7]  [5,4,2],-3             [7,5,4,2]
6      [7,5,4,2]    1      [7],1  [6]  [5,4,2],-3             [7,6,5,4,2]
3      [7,6,5,4,2]  4      [7,6,5,4],4  [3]  [2],-1           [7,6,5,4,3,2]
Reference - Sorting Array with JavaScript reduce function
There is a bit of a red herring in the question: $group does guarantee that it processes incoming documents in order (which is why you have to sort them before $group to get ordered arrays). But there is an issue with the way you propose doing it, since pushing all the documents into a single grouping is (a) inefficient and (b) could potentially exceed the maximum document size.
Since you only want the top two for each of the unique fk values, the most efficient way to accomplish it is via a "subquery" using $lookup, like this:
db.coll.aggregate([
    {$match: {fk: {$in: [1, 2]}}},
    {$group: {_id: "$fk"}},
    {$sort: {_id: 1}},
    {$lookup: {
        from: "coll",
        as: "items",
        let: {fk: "$_id"},
        pipeline: [
            {$match: {$expr: {$eq: ["$fk", "$$fk"]}}},
            {$sort: {name: -1}},
            {$limit: 2},
            {$project: {_id: 0, fk: 1, name: 1}}
        ]
    }}
])
Assuming you have an index on {fk: 1, name: -1}, as you must in order to get an efficient sort in your proposed code, the first two stages here will use that index via a DISTINCT_SCAN plan, which is very efficient, and for each unique value the $lookup will use that same index to filter by a single value of fk and return results already sorted and limited to the first two. This will be the most efficient way to do this, at least until https://jira.mongodb.org/browse/SERVER-9377 is implemented by the server.
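For reference, the index assumed above would be created with something like this (collection name as in the example):

db.coll.createIndex({fk: 1, name: -1})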

MongoDB aggregation: Get samples at specific intervals

I have a MongoDB collection containing timestamped documents. The important part of their shape is:
{
    receivedOn: {
        date: ISODate("2018-10-01T07:50:06.836Z")
    }
}
They are indexed on the date.
These documents relate to and contain data from UDP packets constantly arriving at a server. The rate of the packets varies, but there are usually around 20 per second.
I'm trying to take samples from this collection. I have a list of timestamps, and I want to get the documents closest to these timestamps in the past.
For example, if I have the following documents
{_id: 1, "receivedOn.date": ISODate("2018-10-01T00:00:00.000Z")}
{_id: 2, "receivedOn.date": ISODate("2018-10-01T00:00:02.000Z")}
{_id: 3, "receivedOn.date": ISODate("2018-10-01T00:00:04.673Z")}
{_id: 4, "receivedOn.date": ISODate("2018-10-01T00:00:05.001Z")}
{_id: 5, "receivedOn.date": ISODate("2018-10-01T00:00:09.012Z")}
{_id: 6, "receivedOn.date": ISODate("2018-10-01T00:00:10.065Z")}
and the timestamps
new Date("2018-10-01T00:00:05.000Z")
new Date("2018-10-01T00:00:10.000Z")
I want the result to be
[
    {_id: 3, "receivedOn.date": ISODate("2018-10-01T00:00:04.673Z")},
    {_id: 5, "receivedOn.date": ISODate("2018-10-01T00:00:09.012Z")}
]
Using aggregation, I made this work. The following code gives the correct result, but it is slow and appears to have complexity O(n*m), where n is the number of matched documents and m is the number of timestamps:
const timestamps = [
    new Date("2018-10-01T00:00:00.000Z"),
    new Date("2018-10-01T00:00:05.000Z"),
    new Date("2018-10-01T00:00:10.000Z")
];
collection.aggregate([
    {$match: {
        $and: [
            {"receivedOn.date": {$lte: new Date("2018-10-01T00:00:10.000Z")}},
            {"receivedOn.date": {$gte: new Date("2018-10-01T00:00:00.000Z")}}
        ]
    }},
    {$project: ...},
    {$sort: {"receivedOn.date": -1}},
    {$bucket: {
        groupBy: "$receivedOn.date",
        boundaries: timestamps,
        output: {
            docs: {$push: "$$CURRENT"}
        }
    }},
    // The buckets contain sorted arrays. The first element is the newest
    {$project: {
        doc: {
            $arrayElemAt: ["$docs", 0]
        }
    }},
    // Lift the document out of its bucket wrapper
    {$replaceRoot: {newRoot: "$doc"}}
]);
Is there a way to make this faster? Like somehow telling $bucket that the data is sorted? I assume what is taking most time here is $bucket trying to figure out which bucket to put the document in. Or is there another, better way to do this?
I've also tried running one findOne query per timestamp in parallel. That also gives the correct result, and is much faster, but having a few thousand timestamps is not uncommon. I don't want to do thousands of queries each time I need to do this.
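For what it's worth, a minimal sketch of that per-timestamp approach with the Node.js driver (collection and timestamps as in the example above; findOne with a sort option returns the newest document at or before each timestamp):

// one indexed point query per timestamp, run in parallel;
// each query walks the receivedOn.date index backwards from ts
const samples = await Promise.all(timestamps.map(ts =>
    collection.findOne(
        {"receivedOn.date": {$lte: ts}},
        {sort: {"receivedOn.date": -1}}
    )
));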

mongo find limit each match

I have a mongo collection which looks something like this:
{
    title: String,
    category: String
}
I want to write a query that selects various categories, similar to this:
Collection.find({category: {$in: ['Books', 'Cars', 'People']}});
But I want to only select a limited number of each category, for example 5 of each book, car, people. How do I write such a query? Can I do it one query or must I use multiple ones?
You can do it using MongoDB aggregation. Take a look at this pipeline:
Filter all documents by category (using $match).
Group the data by category and create an array of the items sharing each category (using $group and $push).
Get a subset of each array with a limited maximum length (using $project and $slice).
Try the following query:
db.collection.aggregate([
    {$match: {category: {$in: ['Books', 'Cars', 'People']}}},
    {$group: {_id: "$category", titles: {$push: "$title"}}},
    {$project: {titles: {$slice: ["$titles", 5]}}}
])
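On matching data this returns one document per category with at most five titles each; an illustrative output shape (the titles here are hypothetical):

{"_id": "Books",  "titles": ["b1", "b2", "b3", "b4", "b5"]}
{"_id": "Cars",   "titles": ["c1", "c2"]}
{"_id": "People", "titles": ["p1", "p2", "p3"]}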

With MongoDB, how can I transform a mixed single/array value into an array before $unwind?

Assuming I have the following documents in a collection:
{id: 1, value: 1},
{id: 2, value: [2, 3]},
{id: 3, value: 2}
if I want to do aggregation on the whole collection on the field 'value', how can I guarantee the 'value' field is always an array before it reaches the $unwind stage in the pipeline (without rebuilding the whole collection, for sure)?
Please ask for clarification if needed
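One common approach (a sketch, not the only way): since MongoDB 3.2, $unwind itself treats a non-array value as a single-element array, so the pipeline may already work as-is. If you want the field itself normalized to an array, whether for other stages or just to be explicit, a $project stage with $cond and $isArray (both available since 3.2) can wrap single values:

db.collection.aggregate([
    // wrap non-array values in a single-element array so $unwind sees a uniform shape
    {$project: {
        id: 1,
        value: {$cond: [{$isArray: "$value"}, "$value", ["$value"]]}
    }},
    {$unwind: "$value"}
])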