MongoDB select best matched document - mongodb

I have a collection of documents like this:
[{
"_id" : ObjectId("6347e5aa0c009a37b81da700"),
"testField1" : "1000",
"testField2" : "2000",
"testField3" : NumberInt(1)
},
{
"_id" : ObjectId("6347e5890c009a37b81da701"),
"testField2" : 2000,
"testField3" : NumberInt(2)
},
{
"_id" : ObjectId("6347e5960c009a37b81da702"),
"testField3" : NumberInt(3)
}]
I need to retrieve a single document according to the following precedence:
If testField1 and testField2 both exist and match the given values, the query should return that document.
Otherwise, if testField2 exists and matches its value, the query should return that document.
Otherwise, it should return the last document, where testField1 and testField2 do not exist.
I tried the below query, but it returns all the documents.
db.getCollection("TEST_COLLECTION").aggregate([
{
$match: {
$expr: {
$cond: {
if: {
$and: {"testField1": "1000", "testField2": "2000"}
},
then: {
$and: {"testField1": "1000", "testField2": "2000"}
},
else : {
$cond: {
if: {
$and: {"testField1": null, "testField2": "2000"}
},
then: {
$and: {"testField1": null, "testField2": "2000"}
},
else : {
$and: {"testField1": null, "testField2": null}
}
}
}
}
}
}
}
])

There are definitely still some open questions from the comments. @ray has an interesting approach linked in there that uses $setWindowFields, which may be appropriate depending on exactly what you're looking for.
I took a different approach (and perhaps interpretation) and built out the following aggregation that uses $unionWith:
db.collection.aggregate([
{
$match: {
testField1: "1000",
testField2: "2000"
}
},
{
"$addFields": {
sortOrder: 1
}
},
{
"$unionWith": {
"coll": "collection",
"pipeline": [
{
$match: {
testField2: "2000"
}
},
{
"$addFields": {
sortOrder: 2
}
}
]
}
},
{
"$unionWith": {
"coll": "collection",
"pipeline": [
{
$match: {
testField1: {
$exists: false
},
testField2: {
$exists: false
}
}
},
{
"$addFields": {
sortOrder: 3
}
},
]
}
},
{
$sort: {
sortOrder: 1
}
},
{
$limit: 1
},
{
"$unset": "sortOrder"
}
])
Basically, the aggregation will internally issue three queries, one for each of the three precedence conditions. Similar to @ray's solution, it creates a field to sort on (sortOrder in mine), since the ordering of $unionWith is otherwise unspecified per the documentation. After the $sort we can $limit to a single result and $unset the temporary sorting field before returning the result to the client. Depending on the version you are running, you could consider adding an inline $limit to each of the subpipelines to reduce the amount of work being done. Along with appropriate indexes (perhaps just { testField2: 1, testField1: 1 }), this operation should be reasonably efficient.
Here is the playground link.
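To make the precedence logic easier to follow, here is a minimal sketch in plain JavaScript that runs against an in-memory array instead of MongoDB. The helper name bestMatch, the short _id values, and the choice to store testField2 as a string in every document are assumptions made for illustration; the sketch only mirrors the idea of tagging each match with a sortOrder, sorting, and keeping the first result.

```javascript
// In-memory stand-in for the sample collection. Assumption: testField2 is
// stored as a string in every document so that it matches "2000".
const docs = [
  { _id: "a", testField1: "1000", testField2: "2000", testField3: 1 },
  { _id: "b", testField2: "2000", testField3: 2 },
  { _id: "c", testField3: 3 },
];

// One predicate per precedence level, mirroring the three $match stages.
const rules = [
  (d) => d.testField1 === "1000" && d.testField2 === "2000",
  (d) => d.testField2 === "2000",
  (d) => !("testField1" in d) && !("testField2" in d),
];

// Hypothetical helper: tag every match with its rule number (the sortOrder),
// sort by that number, and keep the first document ($sort + $limit + $unset).
function bestMatch(collection) {
  const tagged = [];
  rules.forEach((rule, i) => {
    for (const d of collection.filter(rule)) {
      tagged.push({ sortOrder: i + 1, doc: d });
    }
  });
  tagged.sort((x, y) => x.sortOrder - y.sortOrder);
  return tagged.length > 0 ? tagged[0].doc : null;
}

console.log(bestMatch(docs));
```

Removing the first document from docs makes the second rule win, and removing the first two leaves only the fallback document.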

If there are several groups and you need to return the wanted document per group, I would go with @ray's answer. If there is only one group (as implied by your comment and by @user20042973's nice answer), I would like to point out another obvious option:
db.collection.aggregate([
{$facet: {
op1: [{$match: {testField1: "1000", testField2: "2000"}}],
op2: [{$match: {testField1: null, testField2: "2000"}}],
op3: [{$match: {testField1: null, testField2: null}},
{$sort: {timestamp: -1}}, {$limit: 1}]
}},
{$project: {res: {$ifNull: [{$first: "$op1"}, {$first: "$op2"}, {$first: "$op3"}]}}},
{$replaceRoot: {newRoot: "$res"}}
])
See how it works on the playground example
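The fall-through performed by $ifNull over the three facets can be sketched in plain JavaScript, with no MongoDB involved. The names nullOrMissing and pickByPrecedence and the sample array are hypothetical; the sketch assumes that a query such as { testField1: null } matches documents where the field is null or missing, and it leaves out the timestamp sort because the sample documents have no such field.

```javascript
// A query like { field: null } matches documents where the field is
// either null or missing, so both cases are treated alike here.
const nullOrMissing = (v) => v === null || v === undefined;

// Hypothetical helper: build the three candidate lists (op1, op2, op3)
// and return the first element of the first non-empty one.
function pickByPrecedence(collection) {
  const op1 = collection.filter(
    (d) => d.testField1 === "1000" && d.testField2 === "2000"
  );
  const op2 = collection.filter(
    (d) => nullOrMissing(d.testField1) && d.testField2 === "2000"
  );
  const op3 = collection.filter(
    (d) => nullOrMissing(d.testField1) && nullOrMissing(d.testField2)
  );
  // ?? plays the role of $ifNull: move on when the previous value is absent.
  return op1[0] ?? op2[0] ?? op3[0] ?? null;
}

// Assumed sample data with string values for testField2.
const sample = [
  { _id: "a", testField1: "1000", testField2: "2000" },
  { _id: "b", testField2: "2000" },
  { _id: "c" },
];

console.log(pickByPrecedence(sample));
```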

Related

Is there a way to project max value in a range then finding documents within a new range starting at this max value in just one aggregate?

Given the following data in a Mongo collection:
{
_id: "1",
dateA: ISODate("2021-12-31T00:00.000Z"),
dateB: ISODate("2022-01-11T00:00.000Z")
},
{
_id: "2",
dateA: ISODate("2022-01-02T00:00.000Z"),
dateB: ISODate("2022-01-08T00:00.000Z")
},
{
_id: "3",
dateA: ISODate("2022-01-03T00:00.000Z"),
dateB: ISODate("2022-01-05T00:00.000Z")
},
{
_id: "4",
dateA: ISODate("2022-01-09T00:00.000Z"),
dateB: null
},
{
_id: "5",
dateA: ISODate("2022-01-11T00:00.000Z"),
dateB: ISODate("2022-01-11T00:00.000Z")
},
{
_id: "6",
dateA: ISODate("2022-01-12T00:00.000Z"),
dateB: null
}
And given the range below:
ISODate("2022-01-01T00:00.000Z") .. ISODate("2022-01-10T00:00.000Z")
I want to find all documents with dateA within the given range, then narrow the range so that it starts from the max dateB value of those documents, and finally fetch all documents in the narrowed range that don't have a dateB.
In summary:
I'll start with range
ISODate("2022-01-01T00:00.000Z") .. ISODate("2022-01-10T00:00.000Z")
Then change to range
ISODate("2022-01-08T00:00.000Z") .. ISODate("2022-01-10T00:00.000Z")
Then find with
dateB: null
Finally, the result would be the document with
_id: "4"
Is there a way to find the document with _id: "4" in just one aggregate?
I know how to do it programmatically using 2 queries, but the main goal is to have just one request to the database.
You can use $max to find the maxDateB first. Then perform a self $lookup to apply the $match and find doc _id: "4".
db.collection.aggregate([
{
$match: {
dateA: {
$gte: ISODate("2022-01-01"),
$lt: ISODate("2022-01-10")
}
}
},
{
"$group": {
"_id": null,
"maxDateB": {
"$max": "$dateB"
}
}
},
{
"$lookup": {
"from": "collection",
"let": {
start: "$maxDateB",
end: ISODate("2022-01-10")
},
"pipeline": [
{
$match: {
$expr: {
$and: [
{
$gte: [
"$dateA",
"$$start"
]
},
{
$lt: [
"$dateA",
"$$end"
]
},
{
$eq: [
"$dateB",
null
]
}
]
}
}
}
],
"as": "result"
}
},
{
"$unwind": "$result"
},
{
"$replaceRoot": {
"newRoot": "$result"
}
}
])
Here is the Mongo Playground for your reference.
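As a cross-check of the logic, the same three steps (filter by range, take the max dateB, filter again from that max) can be sketched in plain JavaScript over an in-memory copy of the sample data. The function name findAfterMaxDateB is hypothetical, and returning an empty array when no document in the range has a dateB is an assumption of this sketch, not something the pipeline is claimed to do.

```javascript
const day = (s) => new Date(s + "T00:00:00.000Z");

// In-memory copy of the sample documents from the question.
const docs = [
  { _id: "1", dateA: day("2021-12-31"), dateB: day("2022-01-11") },
  { _id: "2", dateA: day("2022-01-02"), dateB: day("2022-01-08") },
  { _id: "3", dateA: day("2022-01-03"), dateB: day("2022-01-05") },
  { _id: "4", dateA: day("2022-01-09"), dateB: null },
  { _id: "5", dateA: day("2022-01-11"), dateB: day("2022-01-11") },
  { _id: "6", dateA: day("2022-01-12"), dateB: null },
];

// Hypothetical helper mirroring $match -> $group($max) -> second match.
function findAfterMaxDateB(collection, start, end) {
  // Step 1: documents whose dateA lies in [start, end).
  const inRange = collection.filter((d) => d.dateA >= start && d.dateA < end);
  // Step 2: the largest non-null dateB among them.
  const times = inRange
    .filter((d) => d.dateB !== null)
    .map((d) => d.dateB.getTime());
  if (times.length === 0) return []; // assumption: nothing to narrow by
  const maxDateB = new Date(Math.max(...times));
  // Step 3: documents in [maxDateB, end) that have no dateB.
  return collection.filter(
    (d) => d.dateA >= maxDateB && d.dateA < end && d.dateB === null
  );
}

const result = findAfterMaxDateB(docs, day("2022-01-01"), day("2022-01-10"));
console.log(result.map((d) => d._id)); // [ '4' ]
```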
Assuming the initial set of documents matched by the dateA range is not huge, here is an alternate approach that exploits $push and $filter and avoids the hit of a $lookup stage:
db.foo.aggregate([
{$match: {dateA: {$gte: new ISODate("2022-01-01"), $lt: new ISODate("2022-01-10")} }},
// Kill 2 birds with one stone here. Get the max dateB AND prep
// an array to filter later. The items array will be as large
// as the match above but the output of this stage is a single doc:
{$group: {_id: null,
maxDateB: {$max: "$dateB" },
items: {$push: "$$ROOT"}
}},
{$project: {X: {$filter: {
input: "$items",
cond: {$and: [
// Each element of 'items' is passed as $$this so use
// dot notation to get at individual fields. Note that
// all other peer fields to 'items' like 'maxDateB' are
// in scope here and addressable using '$':
{$gt: [ "$$this.dateA", "$maxDateB"]},
{$eq: [ "$$this.dateB", null ]}
]}
}}
}}
]);
This yields a single doc result (I added an additional doc _id 41 to test the null equality for more than 1 doc):
{
"_id" : null,
"X" : [
{
"_id" : "4",
"dateA" : ISODate("2022-01-09T00:00:00Z"),
"dateB" : null
},
{
"_id" : "41",
"dateA" : ISODate("2022-01-09T00:00:00Z"),
"dateB" : null
}
]
}
It is possible to $unwind and $replaceRoot after this but there is little need to do so.

$group after $lookup is taking way too long

I have following mongo collection:
{
"_id" : "22pTvYLd7azAAPL5T",
"plate" : "ABC-123",
"company": "AMZ",
"_portfolioType" : "account"
},
{
"_id" : "22pTvYLd7azAAPL5T",
"plate" : "ABC-123",
"_portfolioType" : "sale",
"price": 87.3
},
{
"_id" : "22pTvYLd7azAAPL5T",
"plate" : "ABC-123",
"_portfolioType" : "sale",
"price": 88.9
}
And I am trying to aggregate all documents which have same value in plate field. Below is the query I have written so far:
db.getCollection('temp').aggregate([
{
$lookup: {
from: 'temp',
let: { 'p': '$plate', 't': '$_portfolioType' },
pipeline: [{
'$match': {
'_portfolioType': 'sale',
'$expr': { '$and': [
{ '$eq': [ '$plate', '$$p' ] },
{ '$eq': [ '$$t', 'account' ] }
]}
}
}],
as: 'revenues'
},
},
{
$project: {
plate: 1,
company: 1,
totalTrades: { $arrayElemAt: ['$revenues', 0] },
},
},
{
$addFields: {
revenue: { $add: [{ $multiply: ['$totalTrades.price', 100] }, 99] },
},
},
{
$group: {
_id: '$company',
revenue: { $sum: '$revenue' },
}
}
])
The query works fine if I remove the $group stage; however, as soon as I add the $group stage, mongo starts processing indefinitely. I tried adding $match as the first stage to limit the number of documents to process, but without any luck. E.g.:
{
$match: { $or: [{ _portfolioType: 'account' }, { _portfolioType: 'sale' }] }
},
I also tried using { explain: true } but it doesn't return anything helpful.
As Neil Lunn noticed, you very likely don't need the lookup to reach your "end goal", which is still quite vague.
Please read comments and adjust as needed:
db.temp.aggregate([
{$group:{
// Get unique plates
_id: "$plate",
// Not clear what you expect if there are documents with
// different company, and the same plate.
// Assuming "it never happens"
// You may need to $cond it here with {$eq: ["$_portfolioType", "account"]}
// but you never voiced it.
company: {$first:"$company"},
// Not exactly all documents with _portfolioType: sale,
// but rather price from all documents for this plate.
// Assuming price field is available only in documents
// with "_portfolioType" : "sale". Otherwise add a $cond here.
// If you really need "all documents", push $$ROOT instead.
prices: {$push: "$price"}
}},
{$project: {
company: 1,
// Apply your math here, or on the previous stage
// to calculate revenue per plate
revenue: "$prices"
}},
{$group: {
// Get document for each "company"
_id: "$company",
// Revenue associated with plate
revenuePerPlate: {$push: {"k":"$_id", "v":"$revenue"}}
}},
{$project:{
_id: 0,
company: "$_id",
// Count of unique plate
platesCnt: {$size: "$revenuePerPlate"},
// arrayToObject if you wish plate names as properties
revenuePerPlate: {$arrayToObject: "$revenuePerPlate"}
}}
])
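The two-level grouping (first by plate, then by company) can be sketched in plain JavaScript over an in-memory array. The function name pricesPerCompany is hypothetical, the _id values are omitted, and unlike $first this sketch takes the first company value that is actually present for a plate, which is an assumption about the intended behavior.

```javascript
// In-memory copy of the sample documents (without _id).
const docs = [
  { plate: "ABC-123", company: "AMZ", _portfolioType: "account" },
  { plate: "ABC-123", _portfolioType: "sale", price: 87.3 },
  { plate: "ABC-123", _portfolioType: "sale", price: 88.9 },
];

// Hypothetical helper: group by plate, then regroup by company.
function pricesPerCompany(collection) {
  // First $group: one entry per plate, collecting its company and prices.
  const byPlate = new Map();
  for (const d of collection) {
    const entry = byPlate.get(d.plate) || { company: null, prices: [] };
    // Assumption: use the first company value present for this plate.
    if (entry.company === null && d.company !== undefined) {
      entry.company = d.company;
    }
    if (d.price !== undefined) entry.prices.push(d.price);
    byPlate.set(d.plate, entry);
  }
  // Second $group plus $arrayToObject: plates as properties per company.
  const byCompany = {};
  for (const [plate, entry] of byPlate) {
    if (!byCompany[entry.company]) byCompany[entry.company] = {};
    byCompany[entry.company][plate] = entry.prices;
  }
  return byCompany;
}

console.log(JSON.stringify(pricesPerCompany(docs)));
```

Because every document is visited once, the work grows linearly with the collection size, which is the main reason a plain grouping is attractive compared with a self-join.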

MongoDB - multiple queries based on condition

I have a query that looks something like this.
employees.aggregate(
[{ "$match":
{"$and": [
{"$or": [
{name : { "$regex": param, "$options":"i"}},
{title : { "$regex": param, "$options":"i"}},
]},
{ tenure : true }
]}
},
{"$sort":{experience : -1}},
{"$limit" : 100}
])
I would like to update this query to something like this.
Search the employees collection where name = param and tenure = true.
If data exists, then sort the results by experience and limit the results to 100.
If no results are found, then search the same collection using title; there is no need to sort the results.
Can someone please help with this?
You need to apply conditional sorting in the query. Another important thing is that the value of $regex must be a string (in quotes). You can't pass it like
$regex: User
you need to pass it as a string.
db.hotspot.aggregate(
[{
"$match": {
"$and": [{
"$or": [{
name: {
"$regex": "SD",
"$options": "i"
}
},
{
radius: {
"$regex": "250",
"$options": "i"
}
},
]
},
{
infinite: false
}
]
}
},
{
$project: {
sort: {
$cond: {
if: {
$eq: ["$radius", 250]
},
then: "$name",
else: "$_id"
}
}
}
},
{
$sort: {
sort: 1
}
}
])
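For comparison, the fallback described in the question (search by name first, and only search by title when nothing is found) can be sketched in plain JavaScript over an in-memory array. The function name searchEmployees and the sample employees are hypothetical, and keeping the tenure condition in the title search is an assumption, since the question does not say either way.

```javascript
// Hypothetical helper: name search first, title search only as a fallback.
function searchEmployees(employees, param) {
  const re = new RegExp(param, "i"); // case-insensitive, like $options: "i"
  const matches = (value) => typeof value === "string" && re.test(value);

  const byName = employees.filter((e) => e.tenure === true && matches(e.name));
  if (byName.length > 0) {
    // Sort by experience (descending) and keep at most 100 results.
    return byName
      .slice()
      .sort((a, b) => b.experience - a.experience)
      .slice(0, 100);
  }
  // Assumption: the tenure filter still applies to the title search.
  return employees.filter((e) => e.tenure === true && matches(e.title));
}

// Made-up sample data for illustration only.
const employees = [
  { name: "Ann Lee", title: "Engineer", tenure: true, experience: 5 },
  { name: "Bob Ann", title: "Manager", tenure: true, experience: 9 },
  { name: "Cara", title: "Annual planner", tenure: true, experience: 2 },
  { name: "Dan Ann", title: "Clerk", tenure: false, experience: 20 },
];

console.log(searchEmployees(employees, "ann").map((e) => e.name));
console.log(searchEmployees(employees, "planner").map((e) => e.name));
```

Against a real database this would usually mean two round trips, or a single pipeline that tags and merges both searches.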

How can I get max value in nested documents?

I have a collection(named menucategories) in MongoDB 3.2.11:
{
"_id" : ...
"menus" : [
{
"code":0
},
{
"code":1
},
{
"code":2
},
{
"code":3
}
]
},
{
"_id" : ...
"menus" : [
{
"code":4
},
{
"code":5
},
{
"code":6
},
{
"code":7
}
]
},
{
"_id" : ...
"menus" : [
{
"code":8
},
{
"code":9
},
{
"code":10
},
{
"code":11
}
]
}
Every menucategory has an array named menus, and every menu (element of the array) has a code. The code is unique across all menus. I want to get the maximum value of the menus' code (in this case, 11). How can I achieve this?
If you want to find the maximum value of code across all menus, a possible query is as follows:
db.menucategories.aggregate([
{ $unwind: '$menus' },
{ $group: { _id: null, max: { $max: '$menus.code' } } },
{ $project: { max: 1, _id:0 } }
])
Click below links for more information regarding different operators:
$unwind, $group, $project
You don't need to use the $unwind aggregation pipeline operator here because starting from MongoDB 3.2, some accumulator expressions are available in the $project stage.
db.collection.aggregate([
{"$project": {"maxPerDoc": {"$max": "$menus.code"}}},
{"$group": {"_id": null, "maxValue": {"$max": "$maxPerDoc"}}}
])
Responding to a previous, now-deleted comment: you don't need to put your pipeline in an array, so the following query will work as well.
db.collection.aggregate(
{"$project": {"maxPerDoc": {"$max": "$menus.code"}}},
{"$group": {"_id": null, "maxValue": {"$max": "$maxPerDoc"}}}
)
Try with aggregation:
db.collection.aggregate({ $group : { _id: 1, max: { $max: {$max : "$menus.code"}}}});
There is no need for any $unwind if you only need to find the maximum value.
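The two-step idea without $unwind (a maximum per document, then a maximum of those maxima) can be sketched in plain JavaScript. The function name maxMenuCode is hypothetical, the _id fields are omitted, and the sketch assumes every document has a non-empty menus array.

```javascript
// In-memory copy of the sample documents (without _id).
const menucategories = [
  { menus: [{ code: 0 }, { code: 1 }, { code: 2 }, { code: 3 }] },
  { menus: [{ code: 4 }, { code: 5 }, { code: 6 }, { code: 7 }] },
  { menus: [{ code: 8 }, { code: 9 }, { code: 10 }, { code: 11 }] },
];

// Hypothetical helper. Assumes every document has a non-empty menus array.
function maxMenuCode(categories) {
  // Like the $project stage: one maximum per document.
  const maxPerDoc = categories.map((c) =>
    Math.max(...c.menus.map((m) => m.code))
  );
  // Like the $group stage: the maximum over all per-document maxima.
  return Math.max(...maxPerDoc);
}

console.log(maxMenuCode(menucategories)); // 11
```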

Get Distinct list of two properties using MongoDB 2.4

I have an article collection:
{
_id: 9999,
authorId: 12345,
coAuthors: [23456,34567],
title: 'My Article'
},
{
_id: 10000,
authorId: 78910,
title: 'My Second Article'
}
I'm trying to figure out how to get a list of distinct author and co-author ids out of the database. I have tried push, concat, and addToSet, but can't seem to find the right combination. I'm on 2.4.6 so I don't have access to setUnion.
Whilst $setUnion would be the "ideal" way to do this, there is another way that basically involves "switching" between a "type" to alternate which field is picked:
db.collection.aggregate([
{ "$project": {
"authorId": 1,
"coAuthors": { "$ifNull": [ "$coAuthors", [null] ] },
"type": { "$const": [ true,false ] }
}},
{ "$unwind": "$coAuthors" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"$cond": [
"$type",
"$authorId",
"$coAuthors"
]
}
}},
{ "$match": { "_id": { "$ne": null } } }
])
And that is it. You may know the $const operation as the $literal operator from MongoDB 2.6. It has always been there, but was only documented and given an "alias" at the 2.6 release.
Of course the $unwind operations in both cases produce more "copies" of the data, but this is grouping for "distinct" values so it does not matter. Just depending on the true/false alternating value for the projected "type" field ( once unwound ) you just pick the field alternately.
Also this little mapReduce does much the same thing:
db.collection.mapReduce(
function() {
emit(this.authorId,null);
if ( this.hasOwnProperty("coAuthors"))
this.coAuthors.forEach(function(id) {
emit(id,null);
});
},
function(key,values) {
return null;
},
{ "out": { "inline": 1 } }
)
For the record, $setUnion is of course a lot cleaner and more performant:
db.collection.aggregate([
{ "$project": {
"combined": {
"$setUnion": [
{ "$map": {
"input": ["A"],
"as": "el",
"in": "$authorId"
}},
{ "$ifNull": [ "$coAuthors", [] ] }
]
}
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined"
}}
])
So there the only real concerns are converting the singular "authorId" to an array via $map and feeding an empty array where the "coAuthors" field is not present in the document.
Both output the same distinct values from the sample documents:
{ "_id" : 78910 }
{ "_id" : 23456 }
{ "_id" : 34567 }
{ "_id" : 12345 }
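For readers who just want to check the expected result, the same distinct list can be sketched in plain JavaScript with a Set, with no MongoDB involved. The function name distinctAuthorIds is hypothetical; a missing coAuthors field is treated as an empty list, as in the pipelines above.

```javascript
// In-memory copy of the sample articles.
const articles = [
  { _id: 9999, authorId: 12345, coAuthors: [23456, 34567], title: "My Article" },
  { _id: 10000, authorId: 78910, title: "My Second Article" },
];

// Hypothetical helper: collect author and co-author ids into one Set.
function distinctAuthorIds(docs) {
  const ids = new Set();
  for (const doc of docs) {
    ids.add(doc.authorId);
    // A missing coAuthors field is treated as an empty list.
    for (const id of doc.coAuthors || []) ids.add(id);
  }
  return [...ids];
}

console.log(distinctAuthorIds(articles));
```

A Set keeps insertion order, whereas the order of $group output is not guaranteed, so only the set of values should be compared.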