Aggregating Temporal Data in a Collection of Nested Documents - mongodb

I'm new to MongoDB and I'm having trouble writing two aggregate queries that perform the operations I need. I have a collection of documents formatted as follows (the data below is made up):
{'_id': 1234,
'name': 'name',
'difficulty': 2345,
'chart': [{'time': 0.0, 'step': '1010'},
{'time': 0.115, 'step': '0101'},
{'time': 0.346, 'step': '0011'},
{'time': 0.404, 'step': '1000'},
{'time': 0.462, 'step': '0100'},
{'time': 0.521, 'step': '0001'},
{'time': 0.579, 'step': '0010'},
{'time': 0.618, 'step': '1000'},
{'time': 0.657, 'step': '0001'},
{'time': 0.696, 'step': '0110'},
{'time': 0.813, 'step': '1101'},
{'time': 0.929, 'step': '1010'},
{'time': 0.968, 'step': '0100'},
{'time': 1.007, 'step': '0001'},
{'time': 1.046, 'step': '0110'},
{'time': 1.104, 'step': '1000'},
{'time': 1.163, 'step': '0001'},
{'time': 1.221, 'step': '0100'},
{'time': 1.28, 'step': '1011'},
{'time': 1.513, 'step': '0001'},
{'time': 1.571, 'step': '1000'},
{'time': 1.63, 'step': '0110'},
{'time': 1.688, 'step': '0001'},
{'time': 1.746, 'step': '1010'},
{'time': 1.863, 'step': '1010'},
{'time': 1.98, 'step': '1010'},
{'time': 2.097, 'step': '1010'},
{'time': 2.213, 'step': '0111'},
{'time': 2.447, 'step': '0110'},
{'time': 2.486, 'step': '0001'},
{'time': 2.525, 'step': '1000'},
{'time': 2.593, 'step': '0100'},
{'time': 2.641, 'step': '0001'},
{'time': 2.68, 'step': '1010'},
{'time': 2.797, 'step': '1011'},
{'time': 2.914, 'step': '1011'},
{'time': 3.03, 'step': '1011'},
{'time': 3.147, 'step': '0100'},
{'time': 3.206, 'step': '1000'},
{'time': 3.264, 'step': '0010'},
{'time': 3.322, 'step': '0001'},
{'time': 3.381, 'step': '1000'},
{'time': 3.42, 'step': '0010'},
{'time': 3.458, 'step': '0001'},
{'time': 3.497, 'step': '0100'},
{'time': 3.536, 'step': '1000'},
{'time': 3.575, 'step': '0001'},
{'time': 3.614, 'step': '0010'},
{'time': 3.673, 'step': '0100'},
{'time': 3.731, 'step': '1000'},
{'time': 3.789, 'step': '0010'},
{'time': 3.848, 'step': '0101'},
{'time': 3.906, 'step': '1000'},
{'time': 3.964, 'step': '0010'},
{'time': 4.018, 'step': '0100'},
{'time': 4.062, 'step': '1011'},
{'time': 4.296, 'step': '1100'},
{'time': 4.412, 'step': '1010'},
{'time': 4.471, 'step': '0101'},
{'time': 4.529, 'step': '1010'},
{'time': 4.646, 'step': '0001'},
{'time': 4.675, 'step': '0010'},
{'time': 4.704, 'step': '0100'},
{'time': 4.762, 'step': '1000'},
{'time': 4.792, 'step': '0010'},
{'time': 4.821, 'step': '0100'},
{'time': 4.879, 'step': '0001'},
{'time': 4.908, 'step': '0010'},
{'time': 4.938, 'step': '0100'},
{'time': 4.996, 'step': '1001'},
{'time': 5.229, 'step': '0100'},
{'time': 5.268, 'step': '0010'},
{'time': 5.307, 'step': '0001'},
{'time': 5.346, 'step': '1000'},
{'time': 5.385, 'step': '0100'},
{'time': 5.424, 'step': '0010'},
{'time': 5.463, 'step': '1001'},
{'time': 5.58, 'step': '0010'},
{'time': 5.696, 'step': '0100'},
{'time': 5.755, 'step': '0010'},
{'time': 5.813, 'step': '1001'},
{'time': 5.871, 'step': '0010'},
{'time': 5.93, 'step': '1100'},
{'time': 6.046, 'step': '1110'},
{'time': 6.267, 'step': '1101'},
{'time': 6.515, 'step': '1011'},
{'time': 6.737, 'step': '0111'},
{'time': 6.854, 'step': '1111'},
{'time': 7.087, 'step': '1000'},
{'time': 7.116, 'step': '0001'},
{'time': 7.145, 'step': '0100'},
{'time': 7.175, 'step': '0010'},
{'time': 7.204, 'step': '1000'},
{'time': 7.233, 'step': '0001'},
{'time': 7.262, 'step': '0100'},
{'time': 7.291, 'step': '0010'},
{'time': 7.32, 'step': '1000'}],
'index': 3456}
For the first aggregate query, my goal is to build out an output formatted as follows:
[0.404, 0.346, 0.347, 0.231, ...] # length = 144 (number of 1's in chart)
Explanation: Going into chart, we look at all "1" in step (from left to right), identify the next time where the "1" appears in the same index, and compute the difference of the two times.
First example: at time 0.0, '1' appears in the first index. The next time "1" appears as the first index is at time 0.404, so the first element of the expected result is computed as 0.404 - 0.
Second example: at time 0.115, '1' appears in the second index. The next time "1" appears as the second index is at time 0.462, so the third element of the expected result is computed as 0.462 - 0.115.
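A minimal sketch of this first computation in plain Python (outside MongoDB), to pin down the intended semantics. The function name `next_same_index_gaps` is hypothetical, and emitting `None` for a "1" that has no later match is an assumption, since the question does not say what should happen at the end of the chart.

```python
def next_same_index_gaps(chart):
    """For every '1' in every step, time until the next '1' at the same index."""
    gaps = []
    for i, entry in enumerate(chart):
        for pos, flag in enumerate(entry["step"]):
            if flag != "1":
                continue
            # Scan forward for the next step that has a '1' in the same position.
            later = (e["time"] for e in chart[i + 1:] if e["step"][pos] == "1")
            nxt = next(later, None)
            # Assumption: a '1' with no later match yields None.
            gaps.append(None if nxt is None else round(nxt - entry["time"], 3))
    return gaps


# First five chart entries from the sample document.
chart = [
    {"time": 0.0, "step": "1010"},
    {"time": 0.115, "step": "0101"},
    {"time": 0.346, "step": "0011"},
    {"time": 0.404, "step": "1000"},
    {"time": 0.462, "step": "0100"},
]
gaps = next_same_index_gaps(chart)
```

On this sample the first four values agree with the expected output listed above; the remaining entries are `None` only because the sample is truncated.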
For the second aggregate query, using:
partition = [-0.117, -0.083, -0.050, 0.050, 0.118]
weights = [0.1, 0.5, 1, 0.5]
my goal is to build out an output formatted as follows:
[3, 3, 2.2, 2.2, ...] # length = 144 (number of 1's in chart)
Explanation: Going into chart, we look at all "1" in step (from left to right), put partition on the corresponding time to see which elements to consider in the computation. Count the number of "1" in the filtered result using weights.
First example: at time 0.0, '1' appears in the first index. At time 0.0, we apply the partition as a mask so that all steps between times -0.117 and -0.083 will receive +0.1 points, -0.083 to -0.050 will receive +0.5 points, -0.050 to 0.050 will receive +1 points, and 0.050 to 0.118 will receive +0.5 points. In this case: 1 + 1 + 0.5 + 0.5 = 3.
Second example: at time 0.115, '1' appears in the second index. At time 0.115, we apply the partition as a mask so that all steps between times -0.002 and 0.032 will receive +0.1 points, 0.032 to 0.065 will receive +0.5 points, 0.065 to 0.165 will receive +1 points, and 0.165 to 0.233 will receive +0.5 points. In this case: 0.1 + 0.1 + 1 + 1 = 2.2
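The weighted count can be sketched the same way in plain Python. This is an illustration under assumptions, not a MongoDB pipeline: the name `weighted_density` is hypothetical, and the partition intervals are treated as half-open (lower bound included, upper bound excluded), which the question does not specify.

```python
from bisect import bisect_right

PARTITION = [-0.117, -0.083, -0.050, 0.050, 0.118]
WEIGHTS = [0.1, 0.5, 1, 0.5]


def weighted_density(chart):
    """One score per '1' in the chart: weighted count of nearby '1's."""
    scores = []
    for entry in chart:
        total = 0.0
        for other in chart:
            offset = other["time"] - entry["time"]
            # Index of the partition interval containing the offset
            # (assumption: intervals are half-open, [low, high)).
            bucket = bisect_right(PARTITION, offset) - 1
            if 0 <= bucket < len(WEIGHTS):
                total += WEIGHTS[bucket] * other["step"].count("1")
        # Every '1' in the same step shares the same score.
        scores.extend([round(total, 3)] * entry["step"].count("1"))
    return scores


# First three chart entries from the sample document.
chart = [
    {"time": 0.0, "step": "1010"},
    {"time": 0.115, "step": "0101"},
    {"time": 0.346, "step": "0011"},
]
scores = weighted_density(chart)
```

The first four scores reproduce the two worked examples (3 and 2.2, each repeated once per "1" in the step).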
Can MongoDB perform these advanced queries? I tried using windows but I haven't been successful in doing so. Any help would be greatly appreciated :) Thanks!

Related

Mongodb sort and group by

I'm not sure that my question is correct, but it seems so:
I have a set of rows in my Mongodb, like:
[{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}},
{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]
And I want to sort them by date and combine because for the same date there are multiple rows from different countries.
So the result should look something like ...
[{'2018-07-16T00:00:00.000Z': [{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}}]},
{'2018-10-30T00:00:00.000Z': [{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]}]
I tried:
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {'ccode': 1}}, # ccode or date?
{'$sort': {"date": 1}},
])
But got an error
The field * must be an accumulator object
I googled the error and its meaning is pretty clear, but it doesn't seem related to my case. I don't need any sum, avg, or similar functions.
Query
sort by date (ascending here; use -1 if you need descending)
group by date and collect the $$ROOT documents
replace the root so that the date becomes the key
*This assumes your dates are stored as strings, which is a bad idea. If you convert them to date objects, you can still use the query, but use this for the key:
"k":{"$dateToString" : {"date" :"$_id"}}
Test code here
aggregate(
[{"$sort":{"date":1}},
{"$group":{"_id":"$date", "docs":{"$push":"$$ROOT"}}},
{"$replaceRoot":
{"newRoot":{"$arrayToObject":[[{"k":"$_id", "v":"$docs"}]]}}}])
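To make the shape of the result concrete, here is a rough plain-Python equivalent of the sort, group, and re-key steps. It is only a sketch: `group_by_date` is a hypothetical helper, the documents are trimmed to two fields, and it assumes the dates are ISO-8601 strings (which sort correctly as text).

```python
from itertools import groupby
from operator import itemgetter


def group_by_date(docs):
    """Sort by date, then emit one {date: [docs...]} object per distinct date."""
    ordered = sorted(docs, key=itemgetter("date"))  # like $sort
    return [
        {date: list(group)}  # like $group + $arrayToObject
        for date, group in groupby(ordered, key=itemgetter("date"))
    ]


docs = [
    {"ccode": "RU", "date": "2018-07-16T00:00:00.000Z"},
    {"ccode": "US", "date": "2018-10-30T00:00:00.000Z"},
    {"ccode": "US", "date": "2018-07-16T00:00:00.000Z"},
    {"ccode": "RU", "date": "2018-10-30T00:00:00.000Z"},
]
grouped = group_by_date(docs)
```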
When using $group, you need an _id
From the docs
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
In your case...
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {
'_id': "$ccode",
'rates': { $addToSet: '$rates' },
'date': { $first: '$date' }
}},
{'$sort': {"date": 1}},
{'$project': { "_id": 0, "country": "$_id", "rates": 1, "date": 1 }}
])
Playground: https://mongoplayground.net/p/B31XLS9p-6W

Group by name, then select one document of each name with highest arbitrary field value

Let's say we have a collection containing the following documents:
[
{'_id': ..., 'name': 'Type A', 'version': 1, ...},
{'_id': ..., 'name': 'Type B', 'version': 1, ...},
{'_id': ..., 'name': 'Type B', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 3, ...},
{'_id': ..., 'name': 'Type C', 'version': 1, ...},
{'_id': ..., 'name': 'Type C', 'version': 2, ...},
{'_id': ..., 'name': 'Type A', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 4, ...},
{'_id': ..., 'name': 'Type A', 'version': 3, ...},
{'_id': ..., 'name': 'Type B', 'version': 5, ...},
]
I want to return a list containing the documents with the highest version for their respective name, such that the return would look like this, essentially returning the $$ROOT for each distinct name with the highest version:
[
{'_id': ..., 'name': 'Type A', 'version': 3, ...},
{'_id': ..., 'name': 'Type C', 'version': 2, ...},
{'_id': ..., 'name': 'Type B', 'version': 5, ...},
]
I know that I need to use the aggregation pipeline, using group sort and limit, but I can't seem to get what I'm trying to achieve.
$sort by version in descending order
$group by name and get first root document from grouped
(optional) $replaceRoot to promote the stored root document back to the top level
pipeline = [
{ $sort: { version: -1 } },
{
$group: {
_id: "$name",
root: { $first: "$$ROOT" }
}
},
{ $replaceRoot: { newRoot: "$root" } }
]
result = db.collection.aggregate(pipeline)
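For comparison, the same "highest version per name" selection can be sketched in plain Python. The helper name `latest_per_name` is hypothetical and the documents are trimmed to the two relevant fields.

```python
def latest_per_name(docs):
    """Keep, for each name, the document with the highest version."""
    best = {}
    for doc in docs:
        current = best.get(doc["name"])
        if current is None or doc["version"] > current["version"]:
            best[doc["name"]] = doc
    return list(best.values())


docs = [
    {"name": "Type A", "version": 1},
    {"name": "Type B", "version": 1},
    {"name": "Type B", "version": 5},
    {"name": "Type C", "version": 2},
    {"name": "Type A", "version": 3},
    {"name": "Type C", "version": 1},
]
latest = sorted(latest_per_name(docs), key=lambda d: d["name"])
```

Sorting first and taking `$first` in the pipeline plays the same role as the comparison inside the loop here.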

How to do mongodb inner join with nested array?

Warehouses schema:
{_id: 1, name: 'A'}
{_id: 2, name: 'B'}
{_id: 3, name: 'C'}
Stocks schema:
{_id: 11, productId: 1, instock: [{warehouse: 'A', qty: 20}, {warehouse: 'B', qty: 5}, {warehouse: 'C', qty: 8}]}
{_id: 12, productId: 2, instock: [{warehouse: 'A', qty: 30}]}
I am new to MongoDB, but I would like to have one row per record showing each product's available qty in each of the A, B, C warehouses:
Desired array output:
instock: [
{_id: 11, productId: 1, warehouse: 'A', qty: 20},
{_id: 11, productId: 1, warehouse: 'B', qty: 5},
{_id: 11, productId: 1, warehouse: 'C', qty: 8},
{_id: 12, productId: 2, warehouse: 'A', qty: 30},
{_id: 12, productId: 2, warehouse: 'B', qty: 0},
{_id: 12, productId: 2, warehouse: 'C', qty: 0}
]
I read about $lookup, $unwind, and $project, and tried something like the code below, but it is nowhere near what I want:
Warehouse.aggregate([
{
$lookup:
{
from: "stocks",
pipeline: [
{ $project: { _id: 0, instock: {qty: 1, warehouse: 1} }},
{ $replaceRoot: { newRoot: { newStock : '$instock' } } }
],
as: "instock"
}
} ,
]);
Hi Anthony Winzlet, your advice works partially. For example:
{_id: 12, productId: 2, instock: [{warehouse: 'A', qty: 30}]}
From your solution:
Result show only for warehouse A:
[{_id: 12, productId: 2, warehouse: 'A', qty: 30}]
Can I get for warehouse B & C as well? (will default qty to 0 if not defined)
[{_id: 12, productId: 2, warehouse: 'A', qty: 30},
{_id: 12, productId: 2, warehouse: 'B', qty: 0},
{_id: 12, productId: 2, warehouse: 'C', qty: 0}]
Not sure if above is possible to achieve ... thank you
Solution from Anthony Winzlet:
Warehouse.aggregate([
{ "$unwind": "$instock" },
{ "$replaceRoot": { "newRoot": { "$mergeObjects": ["$$ROOT", "$instock"] } }},
{ "$project": { "instock": 0 } }
])
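The follow-up request (a row for every warehouse, defaulting qty to 0) is not covered by the pipeline above. As a hedged illustration of the intended result only, here is a plain-Python sketch that pairs every stock document with every warehouse; `flatten_stock` is a hypothetical helper, not part of any MongoDB driver.

```python
def flatten_stock(stocks, warehouses):
    """One row per (stock, warehouse) pair; missing warehouses get qty 0."""
    rows = []
    for stock in stocks:
        qty_by_wh = {item["warehouse"]: item["qty"] for item in stock["instock"]}
        for wh in warehouses:
            rows.append({
                "_id": stock["_id"],
                "productId": stock["productId"],
                "warehouse": wh["name"],
                "qty": qty_by_wh.get(wh["name"], 0),  # default when not stocked
            })
    return rows


warehouses = [{"_id": 1, "name": "A"}, {"_id": 2, "name": "B"}, {"_id": 3, "name": "C"}]
stocks = [
    {"_id": 11, "productId": 1, "instock": [
        {"warehouse": "A", "qty": 20}, {"warehouse": "B", "qty": 5}, {"warehouse": "C", "qty": 8}]},
    {"_id": 12, "productId": 2, "instock": [{"warehouse": "A", "qty": 30}]},
]
rows = flatten_stock(stocks, warehouses)
```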

Mongodb: Combine result queries from multiple collections

Collection-1 'Office-Encounters'
{'_id': '1111', 'type': 'OE', 'pcode': 'P1212', 'rank': 11}
{'_id': '2222', 'type': 'OE', 'pcode': 'P2323', 'rank': 25}
{'_id': '3333', 'type': 'OE', 'pcode': 'P1212', 'rank': 18}
{'_id': '4444', 'type': 'OE', 'pcode': 'P2323', 'rank': 10}
Collection-2 'LabEncounters'
{'_id': '5555', 'type': 'LE', 'pcode': 'P1212', 'rank': 9}
{'_id': '6666', 'type': 'LE', 'pcode': 'P2323', 'rank': 7}
{'_id': '7777', 'type': 'LE', 'pcode': 'P1212', 'rank': 15}
{'_id': '8888', 'type': 'LE', 'pcode': 'P2323', 'rank': 3}
I would like to get all documents where 'pcode' is P2323 and sorted by 'rank'. The final result should look like below:
[
{'_id': '8888', 'type': 'LE', 'pcode': 'P2323', 'rank': 3},
{'_id': '6666', 'type': 'LE', 'pcode': 'P2323', 'rank': 7},
{'_id': '4444', 'type': 'OE', 'pcode': 'P2323', 'rank': 10},
{'_id': '2222', 'type': 'OE', 'pcode': 'P2323', 'rank': 25}
]
What's the best way to get the result like above? Thanks!
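One possible approach, assuming MongoDB 4.4 or later, is the $unionWith aggregation stage, which appends the documents of a second collection to the pipeline. The sketch below writes such a pipeline as a Python list (it would be run against 'Office-Encounters', which is an assumption about where the query starts) and emulates the result in plain Python; `combined` is a hypothetical helper.

```python
# Pipeline to run on the 'Office-Encounters' collection (assumes MongoDB 4.4+).
pipeline = [
    {"$unionWith": {"coll": "LabEncounters"}},  # append the second collection
    {"$match": {"pcode": "P2323"}},
    {"$sort": {"rank": 1}},
]


def combined(office, lab, pcode):
    """Plain-Python emulation: union, filter by pcode, sort by rank."""
    merged = [d for d in office + lab if d["pcode"] == pcode]
    return sorted(merged, key=lambda d: d["rank"])


office = [
    {"_id": "1111", "type": "OE", "pcode": "P1212", "rank": 11},
    {"_id": "2222", "type": "OE", "pcode": "P2323", "rank": 25},
    {"_id": "3333", "type": "OE", "pcode": "P1212", "rank": 18},
    {"_id": "4444", "type": "OE", "pcode": "P2323", "rank": 10},
]
lab = [
    {"_id": "5555", "type": "LE", "pcode": "P1212", "rank": 9},
    {"_id": "6666", "type": "LE", "pcode": "P2323", "rank": 7},
    {"_id": "7777", "type": "LE", "pcode": "P1212", "rank": 15},
    {"_id": "8888", "type": "LE", "pcode": "P2323", "rank": 3},
]
result = combined(office, lab, "P2323")
```

On large collections it may be worth filtering each side before the union (a `$match` first, and a `pipeline` option inside `$unionWith`) so that indexes on pcode can be used.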

mongodb how to apply pagination and total photo count in aggregate query

Following is my collection architecture.
I don't know how to apply pagination and get the total photo count in an aggregate query. I know it should be possible, and I have tried a lot, but I still haven't solved it. Please guide me on the following issue.
If there is a more optimized solution, please point me to it.
Photo:
{_id: 1, photo_name: '1.jpg', photo_description: 'description 1', album_id: 1},
{_id: 2, photo_name: '2.jpg', photo_description: 'description 2', album_id: 1},
{_id: 3, photo_name: '3.jpg', photo_description: 'description 3', album_id: 1},
{_id: 4, photo_name: '4.jpg', photo_description: 'description 4', album_id: 2},
{_id: 5, photo_name: '5.jpg', photo_description: 'description 5', album_id: 2},
{_id: 6, photo_name: '6.jpg', photo_description: 'description 6', album_id: 2}
Album:
{_id: 1, album_name: "my album 1", album_description: "album description 1", emoji_id: 1},
{_id: 2, album_name: "my album 2", album_description: "album description 2", emoji_id: 2},
{_id: 3, album_name: "my album 3", album_description: "album description 3", emoji_id: 3},
{_id: 4, album_name: "my album 4", album_description: "album description 4", emoji_id: 4},
{_id: 5, album_name: "my album 5", album_description: "album description 5", emoji_id: 5}
Emoji:
{_id: 1, emoji_name: "1.jpg"},
{_id: 2, emoji_name: "2.jpg"},
{_id: 3, emoji_name: "3.jpg"},
{_id: 4, emoji_name: "4.jpg"},
{_id: 5, emoji_name: "5.jpg"},
{_id: 6, emoji_name: "6.jpg"},
{_id: 7, emoji_name: "7.jpg"},
{_id: 8, emoji_name: "8.jpg"}
Testing record pagination :
2
Expected output:
[
{
album_id: 1,
album_name: "my album 1",
album_emoji: "1.jpg",
total_photos: 3, (total count of all photos in the particular album)
photos: [
{_id: 1, photo_name: '1.jpg', photo_description: 'description 1'},
{_id: 2, photo_name: '2.jpg', photo_description: 'description 2'}
]
}
]
Present query:
db
.album
.aggregate([
{
$lookup:{
from:"photo",
localField:"_id",
foreignField:"album_id",
as:"photo"
}
},
{
$lookup:{
from:"emoji",
localField:"album_emoji",
foreignField:"_id",
as:"emoji"
}
},
{
$project:{
album_name:"$album_name",
album_description:"$album_description",
album_emoji:"$emoji.image_name",
photo:"$photo"
}
},
{
$match:{
_id: 1
}
}
])
.toArray();
Present output:
[{
"_id" : 1,
"album_name" : "my album 1",
"album_emoji" : [
"1.png"
],
"photo" : [
{_id: 1, photo_name: '1.jpg', photo_description: 'description 1', album_id: 1},
{_id: 2, photo_name: '2.jpg', photo_description: 'description 2', album_id: 1},
{_id: 3, photo_name: '3.jpg', photo_description: 'description 3', album_id: 1},
]
}]
You may want to check $slice and adjust your $project stage somewhat like this
...
$project:{
album_name:"$album_name",
album_description:"$album_description",
album_emoji:"$emoji.image_name",
photo: { $slice: [ "$photo", 0, 2 ] }
}
...
To get a different page, change the second argument to $slice (the number of elements to skip); the third argument is the page size.
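As a sketch of how the page arguments and the total count could fit together, the helper below builds a $project stage as a Python dict, using $slice for the page and $size for the total. The names `paged_project_stage` and `page_of`, and the 1-based page number, are assumptions for illustration; `page_of` only emulates the result on a plain list.

```python
def paged_project_stage(page, page_size):
    """Build a $project stage returning one page of photos plus the total."""
    skip = (page - 1) * page_size  # assumption: pages are numbered from 1
    return {"$project": {
        "album_name": 1,
        "total_photos": {"$size": "$photo"},
        "photos": {"$slice": ["$photo", skip, page_size]},
    }}


def page_of(photos, page, page_size):
    """Plain-Python emulation of the stage above for a single album."""
    skip = (page - 1) * page_size
    return {"total_photos": len(photos), "photos": photos[skip:skip + page_size]}


photos = ["1.jpg", "2.jpg", "3.jpg"]
first_page = page_of(photos, page=1, page_size=2)
second_page = page_of(photos, page=2, page_size=2)
stage = paged_project_stage(page=2, page_size=2)
```

Placing the $match on _id before the $lookup stages would also avoid joining photos for albums that are filtered out afterwards.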