Filter $lookup results - mongodb

I have 2 collections (with example documents):
reports
{
id: "R1",
type: "xyz",
}
reportfiles
{
id: "F1",
reportid: "R1",
time: ISODate("2016-06-13T14:20:25.812Z")
},
{
id: "F14",
reportid: "R1",
time: ISODate("2016-06-15T09:20:29.809Z")
}
As you can see one report may have multiple reportfiles.
I'd like to perform a query matching a report id and returning the report document as is, plus an additional key storing as a subdocument the reportfile with the most recent time (even better without reportid, as it would be redundant), e.g.
{
id: "R1",
type: "xyz",
reportfile: {
id: "F14",
reportid: "R1",
time: ISODate("2016-06-15T09:20:29.809Z")
}
}
My problem here is that every report type has its own set of properties, so using $project in an aggregation pipeline is not the best way.
So far I got
db.reports.aggregate([{
$match : { id : 'R1' }
}, {
$lookup : {
from : 'reportfiles',
localField : 'id',
foreignField : 'reportid',
as : 'reportfile'
}
}
])
returning of course as 'reportfile' the list of all files with the given reportid. How can I efficiently filter that list to get the only element I need?
By "efficiently" I mean: I tried using $unwind as the next pipeline step, but the resulting document was frighteningly and pointlessly long.
Thanks in advance for any suggestion!

You need to add another $project stage to your aggregation pipeline after the $lookup stage.
{ "$project": {
"id": 1,
"type": 1,
"reportfile": {
"$let": {
"vars": {
"obj": {
"$arrayElemAt": [
{ "$filter": {
"input": "$reportfile",
"as": "report",
"cond": { "$eq": [ "$$report.time", { "$max": "$reportfile.time" } ] }
}},
0
]
}
},
"in": { "id": "$$obj.id", "time": "$$obj.time" }
}
}
}}
The $filter operator filters the $lookup result and returns an array containing only the documents that satisfy your condition. The condition here is $eq, which returns true when the document's "time" equals the $max value of the array.
The $arrayElemAt operator takes the $filter result and returns the element at the given index, which you then assign to a variable using the $let operator. From there, you can easily access the fields you want in your result with dot notation.
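To see what these operators do together, here is a plain-JavaScript sketch of the same logic. It is only an illustration of the semantics, not how the server evaluates the expression:

```javascript
// Assume the $lookup has already produced a "reportfile" array on the document.
const doc = {
  id: "R1",
  type: "xyz",
  reportfile: [
    { id: "F1",  reportid: "R1", time: new Date("2016-06-13T14:20:25.812Z") },
    { id: "F14", reportid: "R1", time: new Date("2016-06-15T09:20:29.809Z") }
  ]
};

// $max over the array's "time" values
const maxTime = Math.max(...doc.reportfile.map(f => f.time.getTime()));

// $filter keeps only elements whose time equals the maximum,
// then $arrayElemAt: [ ..., 0 ] takes the first (and only) survivor
const obj = doc.reportfile.filter(f => f.time.getTime() === maxTime)[0];

// the "in" part of $let shapes the final subdocument without "reportid"
const result = { ...doc, reportfile: { id: obj.id, time: obj.time } };
console.log(result.reportfile.id); // "F14"
```

Note how dropping "reportid" happens for free in the "in" expression: only the fields you name end up in the subdocument.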

What you would require is to run the aggregation operation on the reportfiles collection, do the "join" on the reports collection with $lookup, flatten the joined array (with $unwind), and order the documents (with $sort). The preceding result can then be grouped by the reportid, and the desired output produced using the $first accumulator operator.
The following demonstrates this approach:
db.reportfiles.aggregate([
{ "$match": { "reportid": "R1" } },
{
"$lookup": {
"from": 'reports',
"localField" : 'reportid',
"foreignField" : 'id',
"as": 'report'
}
},
{ "$unwind": "$report" },
{ "$sort": { "time": -1 } },
{
"$group": {
"_id": "$reportid",
"type": { "$first": "$report.type" },
"reportfile": {
"$first": {
"id": "$id",
"reportid": "$reportid",
"time": "$time"
}
}
}
}
])
Sample Output:
{
"_id" : "R1",
"type" : "xyz",
"reportfile" : {
"id" : "F14",
"reportid" : "R1",
"time" : ISODate("2016-06-15T09:20:29.809Z")
}
}
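The sort-then-$first logic is easy to model in plain JavaScript, which may make it clearer why the descending sort matters. This is an illustrative sketch with the sample data, not the server's actual execution:

```javascript
// After $unwind, each reportfile row carries its joined report document.
const rows = [
  { id: "F1",  reportid: "R1", time: new Date("2016-06-13T14:20:25.812Z"), report: { type: "xyz" } },
  { id: "F14", reportid: "R1", time: new Date("2016-06-15T09:20:29.809Z"), report: { type: "xyz" } }
];

rows.sort((a, b) => b.time - a.time); // $sort: { time: -1 }

// $group with $first: the first row seen per reportid is the most recent one
const groups = {};
for (const r of rows) {
  if (!(r.reportid in groups)) {
    groups[r.reportid] = {
      _id: r.reportid,
      type: r.report.type,
      reportfile: { id: r.id, reportid: r.reportid, time: r.time }
    };
  }
}
console.log(groups["R1"].reportfile.id); // "F14"
```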

Performing $lookup based on matching object attribute in other collection's array

I am trying to perform a $lookup on a collection with conditions. The problem I am facing is that I would like to match the text field of all objects inside an array (the accounts array) in the other (plates) collection.
I have tried using $map as well as $in and $setIntersection, but nothing seems to work, and I am unable to find a way to match the text fields of each of the objects in the array.
My document structures are as follows:
plates collection:
{
"_id": "Batch 1",
"rego" : "1QX-WA-123",
"date" : 1516374000000.0,
"accounts": [{
"text": "Acc1",
"date": 1516374000000
},{
"text": "Acc2",
"date": 1516474000000
}]
}
accounts collection:
{
"_id": "Acc1",
"date": 1516374000000,
"createdAt" : 1513810712802.0
}
I am trying to achieve something like this:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$eq': [ '$account.text', '$$accountId' ] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
}
}],
as: 'cusips'
}
},
The output I am trying to get is:
{
"_id": "Acc1",
"date": 1516374000000,
"createdAt" : 1513810712802.0,
"plates": [{
"_id": "Batch 1",
"rego": "1QX-WA-123"
}]
}
Personally I would be initiating the aggregation from the "plates" collection instead where the initial $match conditions can filter the date range more cleanly. Getting your desired output is then a simple matter of "unwinding" the resulting "accounts" matches and "inverting" the content.
Easy enough with MongoDB 3.6 features which you must have in order to use $lookup with $expr. We even don't need that form for $lookup here:
db.plates.aggregate([
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
}
}},
{ "$lookup": {
"from": "accounts",
"localField": "accounts.text",
"foreignField": "_id",
"as": "accounts"
}},
{ "$unwind": "$accounts" },
{ "$group": {
"_id": "$accounts",
"plates": { "$push": { "_id": "$_id", "rego": "$rego" } }
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": ["$_id", { "plates": "$plates" }]
}
}}
])
This of course is an "INNER JOIN", which would only return "accounts" entries where matching "plates" actually exist.
Doing the "join" from the "accounts" collection means you need additional handling to remove the non-matching entries from the "accounts" array within the "plates" collection:
db.accounts.aggregate([
{ "$lookup": {
"from": "plates",
"let": { "account": "$_id" },
"pipeline": [
{ "$match": {
"date": {
"$gte": new Date("2016-01-01").getTime(),
"$lte": new Date("2019-01-01").getTime()
},
"$expr": { "$in": [ "$$account", "$accounts.text" ] }
}},
{ "$project": { "_id": 1, "rego": 1 } }
],
"as": "plates"
}}
])
Note that the $match on the "date" properties should be expressed as a regular query condition instead of within the $expr block for optimal performance of the query.
The $in is used to compare the "array" of "$accounts.text" values to the local variable defined for the "_id" value of the "accounts" document being joined to. So the first argument to $in is the "single" value and the second is the "array" of just the "text" values which should be matching.
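A rough plain-JavaScript equivalent of that membership test, using the sample documents (illustrative only, the server evaluates $in internally):

```javascript
const account = { _id: "Acc1" };
const plate = {
  _id: "Batch 1",
  rego: "1QX-WA-123",
  accounts: [
    { text: "Acc1", date: 1516374000000 },
    { text: "Acc2", date: 1516474000000 }
  ]
};

// { "$in": [ "$$account", "$accounts.text" ] }
// "$accounts.text" resolves to the array of just the "text" values,
// and the single "_id" value is checked for membership in it.
const texts = plate.accounts.map(a => a.text); // ["Acc1", "Acc2"]
const matches = texts.includes(account._id);
console.log(matches); // true
```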
This is also notably a "LEFT JOIN" which returns all "accounts" regardless of whether there are any matching "plates" to the conditions, and therefore you can possibly end up with an empty "plates" array in the results returned. You can filter those out if you didn't want them, but where that was the case the former query form is really far more efficient than this one since the relation is defined and we only ever deal with "plates" which would meet the criteria.
Either method returns the same response from the data provided in the question:
{
"_id" : "Acc1",
"date" : 1516374000000,
"createdAt" : 1513810712802,
"plates" : [
{
"_id" : "Batch 1",
"rego" : "1QX-WA-123"
}
]
}
Which direction you actually take that from really depends on whether the "LEFT" or "INNER" join form is what you really want and also where the most efficient query conditions can be made for the items you actually want to select.
Hmm, not sure how you tried $in, but it works for me:
{
$lookup: {
from: 'plates',
let: { 'accountId': '$_id' },
pipeline: [{
'$match': {
'$expr': { '$and': [
{ '$in': [ '$$accountId', '$accounts.text'] },
{ '$gte': [ '$date', ISODate ("2016-01-01T00:00:00.000Z").getTime() ] },
{ '$lte': [ '$date', ISODate ("2019-01-01T00:00:00.000Z").getTime() ] }
]}
},
}],
as: 'cusips'
}
}

Concat String by Group

I want to group records by _id and create a string by combining client_id values.
Here are examples of my documents:
{
"_id" : ObjectId("59e955e633d64c81875bfd2f"),
"tag_id" : 1,
"client_id" : "10001"
}
{
"_id" : ObjectId("59e955e633d64c81875bfd30"),
"tag_id" : 1,
"client_id" : "10002"
}
I'd like to have this output:
{
"_id" : 1
"client_id" : "10001,10002"
}
You can do it with the aggregation framework as a "two step" operation: first accumulate the items into an array via $push within a $group pipeline, and then use $concat with $reduce on the produced array in a final projection:
db.collection.aggregate([
{ "$group": {
"_id": "$tag_id",
"client_id": { "$push": "$client_id" }
}},
{ "$addFields": {
"client_id": {
"$reduce": {
"input": "$client_id",
"initialValue": "",
"in": {
"$cond": {
"if": { "$eq": [ "$$value", "" ] },
"then": "$$this",
"else": {
"$concat": ["$$value", ",", "$$this"]
}
}
}
}
}
}}
])
We also apply $cond here to avoid concatenating an empty string with a comma in the results, so it looks more like a delimited list.
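The $reduce expression behaves much like Array.prototype.reduce with an empty string as the initial value; the $cond is just there to avoid a leading comma. A plain-JavaScript sketch of the same logic:

```javascript
// What $group/$push produced for tag_id 1
const clientIds = ["10001", "10002"];

// $reduce with initialValue "" and the $cond guard against a leading comma
const joined = clientIds.reduce(
  (value, current) => (value === "" ? current : value + "," + current),
  "" // initialValue
);
console.log(joined); // "10001,10002"
```

Without the conditional, the first concatenation would produce ",10001" instead of "10001".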
FYI, there is a JIRA issue, SERVER-29339, which asks for $reduce to be implemented as an accumulator expression to allow its use directly in a $group pipeline stage. Not likely to happen any time soon, but it would theoretically replace $push in the above and make the operation a single pipeline stage. Sample proposed syntax is on the JIRA issue.
If you don't have $reduce ( requires MongoDB 3.4 ) then just post process the cursor:
db.collection.aggregate([
{ "$group": {
"_id": "$tag_id",
"client_id": { "$push": "$client_id" }
}},
]).map( doc =>
Object.assign(
doc,
{ "client_id": doc.client_id.join(",") }
)
)
Which then leads to the other alternative of doing this using mapReduce if you really must:
db.collection.mapReduce(
function() {
emit(this.tag_id,this.client_id);
},
function(key,values) {
return [].concat.apply([],values.map(v => v.split(","))).join(",");
},
{ "out": { "inline": 1 } }
)
Which of course outputs in the specific mapReduce form of _id and value keys, but it is basically the same output.
We use [].concat.apply([],values.map(...)) because the output of the "reducer" can itself be a "delimited string": mapReduce works incrementally with large results, so the output of the reducer can become "input" on another pass. We need to expect that this can happen and treat it accordingly.
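A small plain-JavaScript demonstration of why the reducer splits and re-joins, simulating a re-reduce pass (illustrative, not the server's mapReduce machinery):

```javascript
// Same shape as the mapReduce reducer from the answer
function reduceValues(values) {
  return [].concat.apply([], values.map(v => v.split(","))).join(",");
}

// First pass: plain ids
const pass1 = reduceValues(["10001", "10002"]);

// Second pass: a previous result fed back in, mixed with a fresh id.
// The split/flatten keeps it from becoming a nested or doubled string.
const pass2 = reduceValues([pass1, "10003"]);
console.log(pass2); // "10001,10002,10003"
```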
Starting Mongo 4.4, the $group stage has a new aggregation operator $accumulator allowing custom accumulations of documents as they get grouped:
// { "tag_id" : 1, "client_id" : "10001" }
// { "tag_id" : 1, "client_id" : "10002" }
// { "tag_id" : 2, "client_id" : "9999" }
db.collection.aggregate([
{ $group: {
_id: "$tag_id",
client_id: {
$accumulator: {
accumulateArgs: ["$client_id"],
init: function() { return [] },
accumulate: function(ids, id) { return ids.concat(id) },
merge: function(ids1, ids2) { return ids1.concat(ids2) },
finalize: function(ids) { return ids.join(",") },
lang: "js"
}
}
}}
])
// { "_id" : 2, "client_id" : "9999" }
// { "_id" : 1, "client_id" : "10001,10002" }
The accumulator:
accumulates on the field client_id (accumulateArgs)
is initialised to an empty array (init)
accumulates by concatenating new ids to the already seen ones (accumulate and merge)
and finally joins all ids as a string (finalize)
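The lifecycle of those four callbacks can be walked through in plain JavaScript, including a merge of two partial states as would happen when groups are combined across shards (a sketch of the semantics, not the server's implementation):

```javascript
// The four $accumulator callbacks from the pipeline above
const init = () => [];
const accumulate = (ids, id) => ids.concat(id);
const merge = (ids1, ids2) => ids1.concat(ids2);
const finalize = ids => ids.join(",");

// Two partial accumulations for the same group key (tag_id 1)...
const partialA = accumulate(init(), "10001");
const partialB = accumulate(init(), "10002");

// ...merged, then finalized into the delimited string
const result = finalize(merge(partialA, partialB));
console.log(result); // "10001,10002"
```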

Removing duplicates in mongodb with aggregate query

db.games.aggregate([
{ $unwind : "$rounds"},
{ $match: {
"rounds.round_values.gameStage": "River",
"rounds.round_values.decision": "BetPlus" }
},
{ $project: {"FinalFundsChange":1, "GameID":1}
}])
The resulting output is:
{ "_id" : ObjectId("57cbce66e281af12e4d0731f"), "GameID" : "229327202", "FinalFundsChange" : 0.8199999999999998 }
{ "_id" : ObjectId("57cbe2fce281af0f34020901"), "FinalFundsChange" : -0.1599999999999997, "GameID" : "755030199" }
{ "_id" : ObjectId("57cbea3ae281af0f340209bc"), "FinalFundsChange" : 0.10000000000000009, "GameID" : "231534683" }
{ "_id" : ObjectId("57cbee43e281af0f34020a25"), "FinalFundsChange" : 1.7000000000000002, "GameID" : "509975754" }
{ "_id" : ObjectId("57cbee43e281af0f34020a25"), "FinalFundsChange" : 1.7000000000000002, "GameID" : "509975754" }
As you can see the last element is a duplicate, that's because the unwind creates two elements of it, which it should. How can I (while keeping the aggregate structure of the query) keep the first element of the duplicate or keep the last element of the duplicate only?
I have seen that the ways to do it seem to be related to either $addToSet or $setUnion (any details how this works exactly are appreciated as well), but I don't understand how I can choose the 'subset' by which I want to identify the duplicates (in my case that's the 'GameID', other values are allowed to be different) and how I can select whether I want the first or the last element.
You could group by _id via $group and then use the $last and $first operator respectively to keep the last or first values.
db.games.aggregate([
{ $unwind : "$rounds"},
{ $match: {
"rounds.round_values.gameStage": "River",
"rounds.round_values.decision": "BetPlus" }
},
{ $group: {
_id: "$_id",
"FinalFundsChange": { $first: "$FinalFundsChange" },
"GameID": { $last: "$GameID" }
}
}
])
My problem was to find all users who purchased the same product, where a user can purchase a product multiple times.
https://mongoplayground.net/p/UTuT4e_N6gn
db.payments.aggregate([
{
"$lookup": {
"from": "user",
"localField": "user",
"foreignField": "_id",
"as": "user_docs"
}
},
{
"$unwind": "$user_docs",
},
{
"$group": {
"_id": "$user_docs._id",
"name": {
"$first": "$user_docs.name"
},
}
},
{
"$project": {
"_id": 0,
"id": "$_id",
"name": "$name"
}
}
])

How to $slice a $filter result in MongoDB?

I have a collection with the following format:
{
"_id": 123,
"items": [{
"status" : "inactive",
"created" : ISODate("2016-03-16T10:39:28.321Z")
},
{
"status" : "active",
"created" : ISODate("2016-03-16T10:39:28.321Z")
},
{
"status" : "active",
"created" : ISODate("2016-03-16T10:39:28.321Z")
}
],
"status" : "active"
}
I want to query on the status field of items so that only the objects with status 'active' are returned in the array, and only the last 2 of them.
At present I am using $filter for this operation, but I am not able to use $slice along with $filter(which I think is needed for what I desire). The following is how my query looks right now:
db.collection('collection').aggregate([
{$match: {'status': 'active'}},
{
$project: {
'items': {
$filter: {
input: '$items',
as: 'item',
cond: {$eq: ['$$item.status', 'active']}
}
}
}
}]);
What I am getting right now is correct result, its just that it returns all the objects in the items field and I just want the last 2 objects.
To get the last two elements, use the $slice operator and set the position operand to -2. Of course, the first operand to $slice is the $filter expression, which resolves to an array.
db.collection.aggregate([
{ "$match": { "items.status": "active" } },
{ "$project": {
"items": {
"$slice": [
{ "$filter": {
"input": "$items",
"as": "item",
"cond": { "$eq": [ "$$item.status", "active" ] }
}},
-2
]
}
}}
])
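The composition of the two operators maps directly onto plain JavaScript array methods, which may help show why $filter must run first (an illustrative sketch using the sample document):

```javascript
const items = [
  { status: "inactive", created: new Date("2016-03-16T10:39:28.321Z") },
  { status: "active",   created: new Date("2016-03-16T10:39:28.321Z") },
  { status: "active",   created: new Date("2016-03-16T10:39:28.321Z") }
];

const lastTwoActive = items
  .filter(item => item.status === "active") // $filter on status
  .slice(-2);                               // $slice: [ <array>, -2 ]
console.log(lastTwoActive.length); // 2
```

Filtering first guarantees the -2 window is taken from the active elements only, not from the raw array.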

Get Distinct list of two properties using MongoDB 2.4

I have an article collection:
{
_id: 9999,
authorId: 12345,
coAuthors: [23456,34567],
title: 'My Article'
},
{
_id: 10000,
authorId: 78910,
title: 'My Second Article'
}
I'm trying to figure out how to get a list of distinct author and co-author ids out of the database. I have tried push, concat, and addToSet, but can't seem to find the right combination. I'm on 2.4.6 so I don't have access to setUnion.
Whilst $setUnion would be the "ideal" way to do this, there is another way that basically involves "switching" on a "type" value to alternate which field is picked:
db.collection.aggregate([
{ "$project": {
"authorId": 1,
"coAuthors": { "$ifNull": [ "$coAuthors", [null] ] },
"type": { "$const": [ true,false ] }
}},
{ "$unwind": "$coAuthors" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"$cond": [
"$type",
"$authorId",
"$coAuthors"
]
}
}},
{ "$match": { "_id": { "$ne": null } } }
])
And that is it. You may know the $const operator as the $literal operator from MongoDB 2.6. It has always been there, but it was only documented and given an "alias" at the 2.6 release.
Of course the $unwind operations in both cases produce more "copies" of the data, but this is grouping for "distinct" values, so it does not matter. Depending on the true/false alternating value of the projected "type" field (once unwound), you simply pick one field or the other.
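A plain-JavaScript model of the double-$unwind trick may make the alternation easier to follow. Each (coAuthor, type) pair picks either authorId or the coAuthor, and grouping collapses the result to distinct values (illustrative only):

```javascript
const docs = [
  { authorId: 12345, coAuthors: [23456, 34567] },
  { authorId: 78910 } // no coAuthors field
];

const ids = new Set(); // $group on _id collapses duplicates
for (const doc of docs) {
  const coAuthors = doc.coAuthors || [null]; // $ifNull: [ "$coAuthors", [null] ]
  for (const co of coAuthors) {              // $unwind: "$coAuthors"
    for (const type of [true, false]) {      // $unwind: "$type"
      ids.add(type ? doc.authorId : co);     // $cond: [ "$type", ... ]
    }
  }
}
ids.delete(null); // $match: { "_id": { "$ne": null } }

console.log([...ids].sort((a, b) => a - b)); // [12345, 23456, 34567, 78910]
```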
Also this little mapReduce does much the same thing:
db.collection.mapReduce(
function() {
emit(this.authorId,null);
if ( this.hasOwnProperty("coAuthors"))
this.coAuthors.forEach(function(id) {
emit(id,null);
});
},
function(key,values) {
return null;
},
{ "out": { "inline": 1 } }
)
For the record, $setUnion is of course a lot cleaner and more performant:
db.collection.aggregate([
{ "$project": {
"combined": {
"$setUnion": [
{ "$map": {
"input": ["A"],
"as": "el",
"in": "$authorId"
}},
{ "$ifNull": [ "$coAuthors", [] ] }
]
}
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined"
}}
])
So there the only real concerns are converting the singular "authorId" to an array via $map and feeding an empty array where the "coAuthors" field is not present in the document.
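In plain JavaScript the same shape looks like this: wrap the single authorId in an array (the $map step), default a missing coAuthors to an empty array (the $ifNull step), and union the two (a sketch of the semantics only):

```javascript
const docs = [
  { authorId: 12345, coAuthors: [23456, 34567] },
  { authorId: 78910 } // no coAuthors field
];

const distinct = new Set(); // the final $unwind + $group
for (const doc of docs) {
  // $setUnion of [authorId] and (coAuthors || [])
  const combined = new Set([doc.authorId, ...(doc.coAuthors || [])]);
  combined.forEach(id => distinct.add(id));
}
console.log(distinct.size); // 4
```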
Both output the same distinct values from the sample documents:
{ "_id" : 78910 }
{ "_id" : 23456 }
{ "_id" : 34567 }
{ "_id" : 12345 }