Unable to access date value after lookup in MongoDB

Main collection
{
  "_id": ObjectId("5ea1a07bfd7e4965408a5171"),
  "data": [],
  "history_id": ObjectId("5e4e755b380054797d9db627"),
  "sender_id": ObjectId("5e4e74eb380054797d9db623"),
  "text": "Hi tester",
  "date": 1587650683434
}
History collection
{
  "_id": ObjectId("5ea1afd4f4151402efd234e3"),
  "user_id": [
    "5e4a8d2d3952132a08ae5764"
  ],
  "dialog_id": ObjectId("5e4e755b380054797d9db627"),
  "date": 1587549034211,
  "__v": 1
}
const messages = await MainModal.aggregate([
  {
    $lookup: {
      from: 'history',
      localField: 'history_id',
      foreignField: 'history_id',
      as: 'History'
    }
  },
  { $unwind: '$History' },
  { $match: { date: { $gt: "History.date" } } } // this is not working
])
I am getting values inside $History, but I am unable to fetch the matched records. I don't know why; I read somewhere that $gt works on numbers, and my date is a number too.
When I use a literal value instead of "History.date" it works, but with the field reference it does not.
Basically, my idea is to hide those documents whose date is less than the date in the history collection, and where the user is not in 5e4a8d2d3952132a08ae5764.

You are mixing the query language with aggregation operators. In order to use $match to compare two fields from the same document, you will need to use $expr along with aggregation operators.
For that final match stage, use
{ "$match":{ "$expr":{ "$gt":[ "$date", "$History.date" ] } } }
Edit
Adding an additional comparison between other fields would need to use $and or $or, like:
{ "$match":{
"$expr":{
"$and":[
{"$gt":[ "$date", "$History.date" ] },
{"$not": {"$eq":[ "$sender_id", "$History.user_id" ]}}
]
}
}}
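Putting the stages together, the whole pipeline would look something like the sketch below. Two things here are assumptions on my part: your History sample stores the link in dialog_id (your $lookup uses foreignField: 'history_id', which the sample doesn't have), and History.user_id holds strings while sender_id is an ObjectId, so the membership check converts with $toString (MongoDB 4.0+):
const messages = await MainModal.aggregate([
  {
    $lookup: {
      from: 'history',
      localField: 'history_id',
      foreignField: 'dialog_id', // assumption: the History sample links via dialog_id
      as: 'History'
    }
  },
  { $unwind: '$History' },
  {
    $match: {
      $expr: {
        $and: [
          { $gt: ['$date', '$History.date'] },
          // assumption: user_id holds string ids, sender_id is an ObjectId
          { $not: { $in: [{ $toString: '$sender_id' }, '$History.user_id'] } }
        ]
      }
    }
  }
])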

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
  { "_id": "42.abc", "ts_utc": "2019-05-27T23:43:16.963Z" },
  { "_id": "42.def", "ts_utc": "2019-05-27T23:43:17.055Z" },
  { "_id": "69.abc", "ts_utc": "2019-05-27T23:43:17.147Z" },
  { "_id": "69.def", "ts_utc": "2019-05-27T23:44:02.427Z" }
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
  { "_id": /^42\..*/ }
).sort(
  { "ts_utc": -1.0 }
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the _id into two parts (on the dot character) and use aggregation to find the max per group, keyed by the first (numeric) part.
That way you can do it in one shot instead of iterating over each group.
db.foo.aggregate([
  { $project: { id_parts: { $split: ["$_id", "."] }, ts_utc: 1 } },
  { $group: { _id: { $arrayElemAt: ["$id_parts", 0] }, max: { $max: "$ts_utc" } } }
])
As @danh mentioned in the comments, the best approach is probably to add an auxiliary field to indicate the grouping. You can then index the auxiliary field to boost performance.
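For example, here is a minimal sketch of persisting and indexing such a field (assuming MongoDB 4.2+ for pipeline-style updates; the field name group is illustrative):
db.collection.updateMany(
  {},
  [
    // derive the group prefix from the string _id once and store it
    { $set: { group: { $arrayElemAt: [{ $split: ["$_id", "."] }, 0] } } }
  ]
)
// a compound index then serves "latest per group" queries efficiently
db.collection.createIndex({ group: 1, ts_utc: -1 })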
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
  {
    "$addFields": {
      "group": {
        "$arrayElemAt": [
          { "$split": ["$_id", "."] },
          0
        ]
      }
    }
  },
  { "$sort": { "ts_utc": -1 } },
  {
    "$group": {
      "_id": "$group",
      "doc": { "$first": "$$ROOT" }
    }
  },
  { "$replaceRoot": { "newRoot": "$doc" } }
])

Multi-stage aggregation pipeline matching data based on fields retrieved through $lookup

I'm trying to build a complex, nested aggregation pipeline in MongoDB (4.4.9 Community Edition, using the pymongo driver for Python 3.10).
There are relevant data points in different collections which I want to aggregate into one new view (ideally) or, failing that, a new collection.
The collections, and the relevant fields therein, follow a hierarchy. There is members, which contains the top-level key on which other data is to be merged, membershipNumber.
> members.find_one()
{'_id': ObjectId('61153299af6122XXXXXXXXXXXXX'), 'membershipNumber': 'N03XXXXXX'}
Then, there's a different collection, which contains membershipNumber, but also a different, linked field, an_user_id. an_user_id is used in other collections to denote records/fields in arrays that pertain to that particular user.
I 'join' members and an_users like so:
result = members.aggregate([
    {
        '$lookup': {
            'from': 'an_users',
            'localField': 'membershipNumber',
            'foreignField': 'memref',
            'as': 'an_users'
        }
    },
    { '$unwind': '$an_users' },
    {
        '$project': {
            '_id': 1,
            'membershipNumber': 1,
            'an_user_id': '$an_users.user_id'
        }
    }
])
So far so good; this returns the desired, aggregated record:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX'}
Now, I have a third collection, which contains an_user_id as a string inside arrays. Each record is an email, and the an_user_ids in the click array are users that clicked a link in that email.
{'_id': ObjectId('blah'),
 'email_id': '407XXX',
 'actions_count': 17,
 'administrative_title': 'test',
 'bounce': ['3440XXXX'],
 'click': ['38294CCC',
           '418FFFF',
           '48XXXXXX',
           '38eGGGG']}
I want to count the number of occurrences of a given an_user_id (which I've obtained from the aggregation above) in arrays (e.g. click, bounce, open) in the emails collection, and include the counts in the .aggregate call, to retrieve something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12
}
Further, I might want to also attach counts of an_user_id in other collections in my DB.
Consider, e.g., this collection called events:
{
  "_id": "617ffa96ee11844e143a63dd",
  "id": "12345",
  "administrative_title": "my_event",
  "created_at": {
    "$date": "2020-01-15T16:28:50.000Z"
  },
  "event_creator_id": "123456",
  "event_title": "my_event",
  "group_id": "123456",
  "permalink": "event_id",
  "rsvp_count": 54,
  "rsvps": [{
    "rsvp_id": "56789",
    "display_name": "John Doe",
    "rsvp_user_id": "48XXXXXX",
    "rsvp_created_at": {
      "$date": "2020-01-28T15:38:50.000Z"
    },
    "rsvp_updated_at": {
      "$date": "2020-01-28T15:38:50.000Z"
    },
    "first_name": "John",
    "last_name": "Doe"
  }, {
    "rsvp_id": "543895",
    "display_name": "James Appleslice",
    "rsvp_user_id": "N03XXXXXX",
    "rsvp_created_at": {
      "$date": "2020-02-05T13:15:14.000Z"
    },
    "rsvp_updated_at": {
      "$date": "2020-02-05T13:15:14.000Z"
    },
    "first_name": "James",
    "last_name": "Appleslice"
  }]
}
So, the end-product would look something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12,
'n_rsvps' : 12
}
My idea was to use the $lookup stage; however, I only know how to use it for matching on fields that exist in the parent collection I'm aggregating on, not on fields that were generated in the process of the aggregation.
Any help would be hugely appreciated!
You could use the pipeline form of $lookup. First $lookup the user id, followed by another $lookup to check whether the user id exists in emails. Lastly, a few more stages collect the results and format them to your needs. Furthermore, you can add an $out stage if you would like to write the results into another collection.
db.members.aggregate([
  {
    $lookup: {
      from: "an_users",
      let: { membershipNumber: "$membershipNumber" },
      pipeline: [
        {
          $match: {
            $expr: { $eq: ["$memref", "$$membershipNumber"] }
          }
        },
        {
          "$lookup": {
            "from": "emails",
            "localField": "user_id",
            "foreignField": "click",
            "as": "clicks"
          }
        },
        {
          "$project": {
            "_id": 1,
            "membershipNumber": 1,
            "an_user_id": "$user_id",
            "n_email_clicks": { $size: "$clicks" }
          }
        }
      ],
      as: "details"
    }
  },
  {
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [
          { $arrayElemAt: ["$details", 0] },
          "$$ROOT"
        ]
      }
    }
  },
  {
    $project: { details: 0 }
  }
])
Working example - https://mongoplayground.net/p/yrFsNp44hpi
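If you also want the n_email_bounces and n_rsvps counts from the question, the same pattern extends with more inner lookups. Here is a sketch of the additional stages for the inner pipeline (collection and field names are taken from the question's samples; this replaces the $project above):
{
  "$lookup": {
    "from": "emails",
    "localField": "user_id",
    "foreignField": "bounce",
    "as": "bounces"
  }
},
{
  "$lookup": {
    "from": "events",
    "localField": "user_id",
    "foreignField": "rsvps.rsvp_user_id",
    "as": "rsvp_events"
  }
},
{
  "$project": {
    "_id": 1,
    "membershipNumber": 1,
    "an_user_id": "$user_id",
    "n_email_clicks": { $size: "$clicks" },
    "n_email_bounces": { $size: "$bounces" },
    // counts matching events; if a user could RSVP more than once per
    // event you would need to $filter the rsvps arrays instead
    "n_rsvps": { $size: "$rsvp_events" }
  }
}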

MongoDB: get document that has max value for each subdocument

I have some data looking like this:
{'Type':'A',
'Attributes':[
{'Date':'2021-10-02', 'Value':5},
{'Date':'2021-09-30', 'Value':1},
{'Date':'2021-09-25', 'Value':13}
]
},
{'Type':'B',
'Attributes':[
{'Date':'2021-10-01', 'Value':36},
{'Date':'2021-09-15', 'Value':14},
{'Date':'2021-09-10', 'Value':18}
]
}
For each document, I would like to get the attribute with the newest date. With the data above, the desired result would be:
{'Type':'A', 'Date':'2021-10-02', 'Value':5}
{'Type':'B', 'Date':'2021-10-01', 'Value':36}
I managed to find queries that return only the global max across all subdocuments, but not the max for each document.
Thanks a lot for your help.
Storing dates as strings is generally considered bad practice; I suggest changing your date field to the date type. Fortunately for your case, you are using the ISO date format, so some effort can be saved.
You can do this in an aggregation pipeline (sketched after the list):
use $max to find out the max date
use $filter to filter the Attributes array so it contains only the latest element
$unwind the array
$project to your expected output
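A minimal sketch of that pipeline, using the field names from the question's sample:
db.collection.aggregate([
  // 1. compute the newest date across the Attributes array
  { $addFields: { maxDate: { $max: "$Attributes.Date" } } },
  // 2. keep only the element(s) carrying that date
  { $addFields: {
      Attributes: {
        $filter: {
          input: "$Attributes",
          cond: { $eq: ["$$this.Date", "$maxDate"] }
        }
      }
  } },
  // 3. unwind the now single-element array
  { $unwind: "$Attributes" },
  // 4. shape the output
  { $project: { _id: 0, Type: 1, Date: "$Attributes.Date", Value: "$Attributes.Value" } }
])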
This keeps only one member from Attributes, the one with the max date.
If you want to keep multiple ones, use @ray's solution, which keeps all members that have the max date.
*Note: mongoplayground can lose the order of fields in a document; if you see a wrong result, test it on your driver. It is a bug of the mongoplayground tool.
Query 1 (local way)
aggregate([
  {
    "$project": {
      "maxDateValue": {
        "$max": {
          "$map": {
            "input": "$Attributes",
            "in": { "Date": "$$this.Date", "Value": "$$this.Value" }
          }
        }
      },
      "Type": 1
    }
  },
  {
    "$project": {
      "Date": "$maxDateValue.Date",
      "Value": "$maxDateValue.Value"
    }
  }
])
Query 2 (unwind way)
aggregate([
  { "$unwind": { "path": "$Attributes" } },
  {
    "$group": {
      "_id": "$Type",
      "maxDate": {
        "$max": {
          "Date": "$Attributes.Date",
          "Value": "$Attributes.Value"
        }
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "Type": "$_id",
      "Date": "$maxDate.Date",
      "Value": "$maxDate.Value"
    }
  }
])

MongoDB aggregation - nested group

I'm trying to perform a nested group. I have an array of documents that has two keys (invoiceIndex and proceduresIndex), and I need the documents to be arranged like so:
invoices (parent) -> procedures (children)
invoices: [ // array of invoices
  {
    .....
    "procedures": [{}, ...] // array of procedures
  }
]
Here is a sample document
{
  "charges": 226.09000000000003,
  "currentBalance": 226.09000000000003,
  "insPortion": "",
  "currentInsPortion": "",
  "claim": "notSent",
  "status": "unpaid",
  "procedures": {
    "providerId": "9vfpjSraHzQFNTtN7",
    "procedure": "21111",
    "description": "One surface",
    "category": "basicRestoration",
    "surface": [
      "m"
    ],
    "providerName": "B Dentist",
    "proceduresIndex": "0"
  },
  "patientId": "mE5vKveFArqFHhKmE",
  "patientName": "Silvia Waterman",
  "invoiceIndex": "0",
  "proceduresIndex": "0"
}
Here is what I have tried
https://mongoplayground.net/p/AEBGmA32n8P
Can you try the following:
db.collection.aggregate([
  {
    $group: {
      _id: "$invoiceIndex",
      procedures: { $push: "$procedures" },
      invoice: { $first: "$$ROOT" }
    }
  },
  {
    $addFields: { "invoice.procedures": "$procedures" }
  },
  {
    "$replaceRoot": { "newRoot": "$invoice" }
  }
])
I retain the invoice fields with invoice: { $first: "$$ROOT" } and keep the $push logic for procedures as a separate field. Then, with $addFields, I move that array of procedures into the new invoice object, and finally replace the root with it.
You shouldn't use proceduresIndex as part of the _id in $group, because then you wouldn't get a set of procedures per invoiceIndex. With the $group logic above it works as expected.
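For the sample document above, the pipeline yields the invoice with its procedures collected into an array, roughly like this (a sketch; field order may vary):
{
  "charges": 226.09000000000003,
  "currentBalance": 226.09000000000003,
  ...
  "invoiceIndex": "0",
  "procedures": [
    {
      "providerId": "9vfpjSraHzQFNTtN7",
      "procedure": "21111",
      ...
      "proceduresIndex": "0"
    }
  ]
}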

Aggregation with update in MongoDB

I have a collection with many similarly structured documents; two of the documents look like this:
Input:
{
  "_id": ObjectId("525c22348771ebd7b179add8"),
  "cust_id": "A1234",
  "score": 500,
  "status": "A",
  "clear": "No"
}
{
  "_id": ObjectId("525c22348771ebd7b179add9"),
  "cust_id": "A1234",
  "score": 1600,
  "status": "B",
  "clear": "No"
}
By default, clear is "No" for every document.
Requirement: I have to add up the scores of all documents with the same cust_id, provided they have status "A" or status "B". If the total exceeds 2000, then I have to update the clear attribute to "Yes" for all documents with that cust_id.
Expected output:
{
  "_id": ObjectId("525c22348771ebd7b179add8"),
  "cust_id": "A1234",
  "score": 500,
  "status": "A",
  "clear": "Yes"
}
{
  "_id": ObjectId("525c22348771ebd7b179add9"),
  "cust_id": "A1234",
  "score": 1600,
  "status": "B",
  "clear": "Yes"
}
"Yes" because 500 + 1600 = 2100, and 2100 > 2000.
My approach:
I was only able to get the sum with the aggregation framework, but failed at updating.
db.aggregation.aggregate([
  {
    $match: {
      $or: [
        { status: 'A' },
        { status: 'B' }
      ]
    }
  },
  {
    $group: {
      _id: '$cust_id',
      total: { $sum: '$score' }
    }
  },
  {
    $match: {
      total: { $gt: 2000 }
    }
  }
])
Please suggest how I should proceed.
After a lot of trouble experimenting in the mongo shell, I've finally got a solution to my question.
Pseudocode:
# To get the list of customers whose score is greater than 2000
cust_to_clear = db.col.aggregate(
    {$match: {$or: [{status: 'A'}, {status: 'B'}]}},
    {$group: {_id: '$cust_id', total: {$sum: '$score'}}},
    {$match: {total: {$gt: 2000}}}
)
# To loop through the result fetched from the above code and update clear
cust_to_clear.result.forEach(
    function(x) {
        db.col.update({cust_id: x._id}, {$set: {clear: 'Yes'}}, {multi: true});
    }
)
Please comment if you have a different solution to the same question.
With MongoDB 4.2 it is now possible to do this using update with an aggregation pipeline. The second example in the documentation shows how to do conditional updates:
db.runCommand({
  update: "students",
  updates: [
    {
      q: {},
      u: [
        { $set: { average: { $avg: "$tests" } } },
        { $set: { grade: { $switch: {
          branches: [
            { case: { $gte: ["$average", 90] }, then: "A" },
            { case: { $gte: ["$average", 80] }, then: "B" },
            { case: { $gte: ["$average", 70] }, then: "C" },
            { case: { $gte: ["$average", 60] }, then: "D" }
          ],
          default: "F"
        } } } }
      ],
      multi: true
    }
  ],
  ordered: false,
  writeConcern: { w: "majority", wtimeout: 5000 }
})
Another example:
db.c.update({}, [
  { $set: { a: { $cond: {
    if: {},   // some condition
    then: {}, // val1
    else: {}  // val2, or "$$REMOVE" to not set the field, or "$a" to leave the existing value
  } } } }
]);
You need to do this in two steps:
Identify customers (cust_id) with a total score greater than 2000
For each of these customers, set clear to "Yes"
You already have a good solution for the first part. The second part should be implemented as separate update() calls to the database.
Pseudocode:
# Get list of customers using the aggregation framework
cust_to_clear = db.col.aggregate(
    {$match: {$or: [{status: 'A'}, {status: 'B'}]}},
    {$group: {_id: '$cust_id', total: {$sum: '$score'}}},
    {$match: {total: {$gt: 2000}}}
)
# Loop over customers and update "clear" to "Yes"
# (match on cust_id: the aggregation's _id holds the cust_id)
for customer in cust_to_clear:
    id = customer['_id']
    db.col.update(
        {"cust_id": id},
        {"$set": {"clear": "Yes"}},
        {"multi": true}
    )
This isn't ideal because you have to make a database call for every customer. If you need to do this kind of operation often, you might revise your schema to include the total score in each document. (This would have to be maintained by your application.) In this case, you could do the update with a single command:
db.col.update(
  {"total_score": {"$gt": 2000}},
  {"$set": {"clear": "Yes"}},
  {"multi": true}
)
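If you'd rather not maintain a derived field, a middle ground is one aggregation plus a single multi-document update. Here is a sketch in the mongo shell, reusing the pipeline above (the 2000 threshold comes from the question):
var ids = db.col.aggregate([
  { $match: { status: { $in: ["A", "B"] } } },
  { $group: { _id: "$cust_id", total: { $sum: "$score" } } },
  { $match: { total: { $gt: 2000 } } }
]).toArray().map(function (d) { return d._id; });

// one round trip flags every document of every qualifying customer
db.col.updateMany({ cust_id: { $in: ids } }, { $set: { clear: "Yes" } });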
Short answer: to avoid looping over a database query, just add $merge to the end and specify your collection, like so:
db.aggregation.aggregate([
  {
    $match: {
      $or: [
        { status: 'A' },
        { status: 'B' }
      ]
    }
  },
  {
    $group: {
      _id: '$cust_id',
      total: { $sum: '$score' }
    }
  },
  {
    $match: {
      total: { $gt: 2000 }
    }
  },
  { $merge: "<collection name here>" }
])
Elaboration: the accepted solution loops over a database query, which is inefficient time-wise and also a lot more code.
Mitar's answer is not updating through an aggregation but the opposite: using an aggregation within Mongo's update. If you are wondering what the advantage of doing it this way is, you can use the whole aggregation pipeline, as opposed to being restricted to the few stages allowed in update pipelines, as specified in the documentation.
Here is an example of an aggregate that won't work with Mongo's update:
db.getCollection('foo').aggregate([
  {
    $addFields: {
      testField: {
        $in: ["someValueInArray", '$arrayFieldInFoo']
      }
    }
  },
  { $merge: "foo" }
])
This will output the updated collection with a new field, testField, that will be true if "someValueInArray" is in "arrayFieldInFoo" and false otherwise. This is NOT currently possible with Mongo's update, since $in cannot be used inside an update aggregation.
Update: changed from $out to $merge, since $out would only work when rewriting the entire collection, as $out replaces the entire collection with the result of the aggregate. $merge only overwrites documents the aggregate matches (much safer).
Since MongoDB 2.6, it is possible to write the output of an aggregation query to a collection with the same command.
More information here : http://docs.mongodb.org/master/reference/operator/aggregation/out/
The solution I found is using $out.
*) e.g. adding a field:
db.socios.aggregate([
  {
    $lookup: {
      from: 'cuotas',
      localField: 'num_socio',
      foreignField: 'num_socio',
      as: 'cuotas'
    }
  },
  {
    $addFields: { codigo_interno: 1001 }
  },
  {
    $out: 'socios' // collection to modify
  }
])
*) e.g. modifying a field:
db.socios.aggregate([
  {
    $lookup: {
      from: 'cuotas',
      localField: 'num_socio',
      foreignField: 'num_socio',
      as: 'cuotas'
    }
  },
  {
    $set: { codigo_interno: 1001 }
  },
  {
    $out: 'socios' // collection to modify
  }
])