MongoDB indexes are ignored inside the $facet pipeline - mongodb

I have all combinations of compound indexes for this collection. The aggregation query I used is:
db.products.aggregate([
  {
    $facet: {
      "categorizedByColor": [
        {
          $match: {
            size: { $in: [50, 60, 70] },
            brand: { $in: ["Raymond", "Allen Solly", "Van Heusen"] }
          }
        },
        {
          $bucket: {
            groupBy: "$color",
            default: "Other",
            output: { "count": { $sum: 1 } }
          }
        }
      ],
      "categorizedBySize": [
        {
          $match: {
            color: { $in: ["Red", "Green", "Blue"] },
            brand: { $in: ["Raymond", "Allen Solly", "Van Heusen"] }
          }
        },
        {
          $bucket: {
            groupBy: "$size",
            default: "Other",
            output: { "count": { $sum: 1 } }
          }
        }
      ],
      "categorizedByBrand": [
        {
          $match: {
            color: { $in: ["Red", "Green", "Blue"] },
            size: { $in: [50, 60, 70] }
          }
        },
        {
          $bucket: {
            groupBy: "$brand",
            default: "Other",
            output: { "count": { $sum: 1 } }
          }
        }
      ],
      "productResults": [
        {
          $match: {
            color: { $in: ["Red", "Green", "Blue"] },
            size: { $in: [50, 60, 70] },
            brand: { $in: ["Raymond", "Allen Solly", "Van Heusen"] }
          }
        }
      ]
    }
  }
]);
This query took around 6s to populate the results. Is there any alternative approach available to use MongoDB indexing?
Note: This aggregation query has more than 14 facet pipelines. For better understanding, I have provided only 4 facet pipelines.

Sometimes 14 queries can do the job, and sometimes not.
If the $facet is the first step in the aggregation pipeline, 14 separate queries are a more efficient option, but if this $facet follows a complex pipeline that creates or filters these documents, there are alternatives to this $facet's $match. Sometimes one needs a snapshot of the db, which 14 queries cannot give, since the db may change in between.
Since we don't have any data about earlier stages in this pipeline, and the question asks for alternatives that will allow the indexes to make the rest of the query faster, I can offer one option as an example. It is hard to tell whether it will be faster than other options given the data we have here, but it will allow the use of the indexes, which is the main idea of the question:
The first step follows both @Takis's and @Gibbs's smart suggestion.
The second phase will use the indexes to make the $facet's $match much easier, by marking in advance which document belongs to which $facet pipeline.
db.collection.aggregate([
  {
    $match: {
      $or: [
        { size: { $in: [50, 60, 70] } },
        { color: { $in: ["Red", "Green", "Blue"] } },
        { brand: { $in: ["Raymond", "Allen Solly", "Van Heusen"] } }
      ]
    }
  },
  {
    $addFields: {
      categorizedByColor: {
        $cond: [
          { $and: [
            { $in: ["$size", [50, 60, 70]] },
            { $in: ["$brand", ["Raymond", "Allen Solly", "Van Heusen"]] }
          ] },
          true, false
        ]
      },
      categorizedBySize: {
        $cond: [
          { $and: [
            { $in: ["$color", ["Red", "Green", "Blue"]] },
            { $in: ["$brand", ["Raymond", "Allen Solly", "Van Heusen"]] }
          ] },
          true, false
        ]
      },
      categorizedByBrand: {
        $cond: [
          { $and: [
            { $in: ["$color", ["Red", "Green", "Blue"]] },
            { $in: ["$size", [50, 60, 70]] }
          ] },
          true, false
        ]
      },
      productResults: {
        $and: [
          { $in: ["$color", ["Red", "Green", "Blue"]] },
          { $in: ["$size", [50, 60, 70]] },
          { $in: ["$brand", ["Raymond", "Allen Solly", "Van Heusen"]] }
        ]
      }
    }
  },
  {
    $facet: {
      "categorizedByColor": [
        { $match: { categorizedByColor: true } },
        {
          $bucket: {
            groupBy: "$color",
            default: "Other",
            output: { "count": { $sum: 1 } }
          }
        }
      ],
      "categorizedBySize": [
        { $match: { categorizedBySize: true } },
        {
          $bucket: {
            groupBy: "$size",
            default: "Other",
            output: { "count": { $sum: 1 } }
          }
        }
      ],
      "categorizedByBrand": [
        { $match: { categorizedByBrand: true } },
        {
          $bucket: {
            groupBy: "$brand",
            default: "Other",
            output: { "count": { $sum: 1 } }
          }
        }
      ],
      "productResults": [{ $match: { productResults: true } }]
    }
  }
])
Playground example
Going a step further, there is even a way to get these results in one query without the $facet step at all, by using $group with $push and $cond instead. This should iterate over the documents once, instead of 14 times, but may result in a large document (with duplicates of data per each categorization). The main idea of such a solution can be seen on this MongoDB playground. It is important to say that these methods are not necessarily better or worse than the others. The "right" solution depends on your specific case and data, which we can't see here. You asked for alternative approaches that allow the use of indexes, so I'm pointing out some directions.
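In plain JavaScript, the single-pass idea looks roughly like this (a sketch with made-up sample documents, not the actual playground pipeline):

```javascript
// One pass over the documents, pushing each into every category it
// belongs to -- the same idea as $group + $push + $cond.
// The sample documents below are invented for illustration.
const docs = [
  { color: "Red", size: 50, brand: "Raymond" },
  { color: "Black", size: 60, brand: "Raymond" },
  { color: "Green", size: 90, brand: "Zara" }
];

const colors = ["Red", "Green", "Blue"];
const sizes = [50, 60, 70];
const brands = ["Raymond", "Allen Solly", "Van Heusen"];

const acc = {
  categorizedByColor: [],
  categorizedBySize: [],
  categorizedByBrand: [],
  productResults: []
};

for (const d of docs) {
  const c = colors.includes(d.color);
  const s = sizes.includes(d.size);
  const b = brands.includes(d.brand);
  if (s && b) acc.categorizedByColor.push(d); // bucketed by color later
  if (c && b) acc.categorizedBySize.push(d);  // bucketed by size later
  if (c && s) acc.categorizedByBrand.push(d); // bucketed by brand later
  if (c && s && b) acc.productResults.push(d);
}

console.log(acc.categorizedByColor.length); // -> 2
console.log(acc.productResults.length);     // -> 1
```

Each document is visited once, but it may be duplicated across the category arrays, which is exactly the trade-off mentioned above.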

The $facet stage cannot use indexes and will perform a COLLSCAN (full collection scan) when executed.
Because of that, you should apply filtering (and sorting) much earlier in your pipeline, in order to get the "common data" for all the sub-pipelines in $facet.
So, in your case, the filters:
$match: {
  color: { $in: ["Red", "Green", "Blue"] },
  size: { $in: [50, 60, 70] },
  brand: { $in: ["Raymond", "Allen Solly", "Van Heusen"] }
}
should be used as the first stage in the pipeline, followed by $facet.
Hope I was clear enough. :)

Related

MongoDB select best matched document

I have a collection of documents like this:
[{
"_id" : ObjectId("6347e5aa0c009a37b81da700"),
"testField1" : "1000",
"testField2" : "2000",
"testField3" : NumberInt(1)
},
{
"_id" : ObjectId("6347e5890c009a37b81da701"),
"testField2" : 2000,
"testField3" : NumberInt(2)
},
{
"_id" : ObjectId("6347e5960c009a37b81da702"),
"testField3" : NumberInt(3)
}]
I need to retrieve documents in the below precedence.
if testField1 and testField2 exist and match their values, the query should return that document.
Otherwise, if testField2 exists and matches its value, the query should return that document,
Otherwise it should return the last document, where testField1 & testField2 do not exist.
I tried the below query, but it returns all the documents.
db.getCollection("TEST_COLLECTION").aggregate([
{
$match: {
$expr: {
$cond: {
if: {
$and: {"testField1": "1000", "testField2": "2000"}
},
then: {
$and: {"testField1": "1000", "testField2": "2000"}
},
else : {
$cond: {
if: {
$and: {"testField1": null, "testField2": "2000"}
},
then: {
$and: {"testField1": null, "testField2": "2000"}
},
else : {
$and: {"testField1": null, "testField2": null}
}
}
}
}
}
}
}
])
There are definitely still some open questions from the comments. @ray has an interesting approach linked in there that uses $setWindowFields, which may be appropriate depending on exactly what you're looking for.
I took a different approach (and perhaps interpretation) and built out the following aggregation that uses $unionWith:
db.collection.aggregate([
{
$match: {
testField1: "1000",
testField2: "2000"
}
},
{
"$addFields": {
sortOrder: 1
}
},
{
"$unionWith": {
"coll": "collection",
"pipeline": [
{
$match: {
testField2: "2000"
}
},
{
"$addFields": {
sortOrder: 2
}
}
]
}
},
{
"$unionWith": {
"coll": "collection",
"pipeline": [
{
$match: {
testField1: {
$exists: false
},
testField2: {
$exists: false
}
}
},
{
"$addFields": {
sortOrder: 3
}
},
]
}
},
{
$sort: {
sortOrder: 1
}
},
{
$limit: 1
},
{
"$unset": "sortOrder"
}
])
Basically the aggregation will internally issue three queries, one corresponding to each of the three precedence conditions. Similar to @ray's solution, it creates a field to sort on (sortOrder in mine), since the ordering of $unionWith is otherwise unspecified per the documentation. After the $sort we can $limit to a single result and $unset the temporary sorting field prior to returning the result to the client. Depending on the version you are running, you could consider adding a couple of inline $limits for each of the sub-pipelines to reduce the amount of work being done. Along with appropriate indexes (perhaps just { testField2: 1, testField1: 1 }), this operation should be reasonably efficient.
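The precedence logic of that pipeline can be sketched in plain JavaScript, with hypothetical in-memory documents standing in for the collection and array operations standing in for the three queries:

```javascript
// Sketch of the $unionWith precedence: run the three "queries",
// tag each result with a sortOrder, sort, take the first, drop the tag.
const docs = [
  { _id: 1, testField1: "1000", testField2: "2000" },
  { _id: 2, testField2: "2000" },
  { _id: 3 }
];

const tiers = [
  d => d.testField1 === "1000" && d.testField2 === "2000", // sortOrder 1
  d => d.testField2 === "2000",                            // sortOrder 2
  d => !("testField1" in d) && !("testField2" in d)        // sortOrder 3
];

const union = [];
tiers.forEach((pred, i) =>
  docs.filter(pred).forEach(d => union.push({ ...d, sortOrder: i + 1 }))
);
union.sort((a, b) => a.sortOrder - b.sortOrder); // $sort
const { sortOrder, ...best } = union[0];         // $limit: 1 + $unset
console.log(best); // -> { _id: 1, testField1: "1000", testField2: "2000" }
```

Note that the first document appears in both the first and second tiers, just as it would be returned by both sub-pipelines of the real $unionWith; the sort and limit are what resolve the precedence.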
Here is the playground link.
If there are several groups and you need to return the wanted document per group, I would go with @ray's answer. If there is only one group (as implied by your comment, and by @user20042973's nice answer), I would like to point out another obvious option:
db.collection.aggregate([
{$facet: {
op1: [{$match: {testField1: "1000", testField2: "2000"}}],
op2: [{$match: {testField1: null, testField2: "2000"}}],
op3: [{$match: {testField1: null, testField2: null}},
{$sort: {timestamp: -1}}, {$limit: 1}]
}},
{$project: {res: {$ifNull: [{$first: "$op1"}, {$first: "$op2"}, {$first: "$op3"}]}}},
{$replaceRoot: {newRoot: "$res"}}
])
See how it works on the playground example

How to aggregate two collections and match field with array

I need to group the results of two collections, candidatos and ofertas, and then "merge" those groups to return an array with matched values.
I've created this example with the aggregate and similar data to make this easier to test:
https://mongoplayground.net/p/m0PUfdjEye4
This is the explanation of the problem that I'm facing.
I can get both groups with the desired results independently:
ofertas collection:
db.getCollection('ofertas').aggregate([
{"$group" : {_id:"$ubicacion_puesto.provincia", countProvinciaOferta:{$sum:1}}}
]);
This is the result...
candidatos collection:
db.getCollection('candidatos').aggregate([
{"$group" : {_id:"$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato:{$sum:1}}}
]);
This is the result...
What I need to do is aggregate those groups to merge their results based on their _id coincidence. I think I'm going the right way with the next aggregate, but the field countOfertas always returns 0.0. I think there is something wrong in my $project $cond, but I don't know what it is. This is the aggregate:
db.getCollection('candidatos').aggregate([
{"$group" : {_id:"$que_busco.ubicacion_puesto_trabajo.provincia", countProvinciaCandidato:{$sum:1}}},
{
$lookup: {
from: 'ofertas',
let: {},
pipeline: [
{"$group" : {_id:"$ubicacion_puesto.provincia", countProvinciaOferta:{$sum:1}}}
],
as: 'ofertas'
}
},
{
$project: {
_id: 1,
countProvinciaCandidato: 1,
countOfertas: {
$cond: {
if: {
$eq: ['$ofertas._id', "$_id"]
},
then: '$ofertas.countProvinciaOferta',
else: 0,
}
}
}
},
{ $sort: { "countProvinciaCandidato": -1}},
{ $limit: 20 }
]);
And this is the result, but as you can see, the field countOfertas is always 0.
Any kind of help will be welcome
What you have tried is much appreciated, but in $project you need to use $reduce, which loops through the array and applies the condition.
Here is the code
db.candidatos.aggregate([
{
"$group": {
_id: "$que_busco.ubicacion_puesto_trabajo.provincia",
countProvinciaCandidato: { $sum: 1 }
}
},
{
$lookup: {
from: "ofertas",
let: {},
pipeline: [
{
"$group": {
_id: "$ubicacion_puesto.provincia",
countProvinciaOferta: { $sum: 1 }
}
}
],
as: "ofertas"
}
},
{
$project: {
_id: 1,
countProvinciaCandidato: 1,
countOfertas: {
"$reduce": {
"input": "$ofertas",
initialValue: 0,
"in": {
$cond: [
{ $eq: [ "$$this._id", "$_id" ] },
{ $add: [ "$$value", 1 ] },
"$$value"
]
}
}
}
}
},
{ $sort: { "countProvinciaCandidato": -1 } },
{ $limit: 20 }
])
Working Mongo playground
Note: If you need to do this with aggregations only, this is fine. But I personally feel this approach is not good. My suggestion is to call the two group aggregations concurrently from your service and combine them programmatically, because $lookup is expensive: when you get massive data, performance will suffer.
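As a side note, the $reduce above maps closely onto Array.prototype.reduce. A plain-JavaScript sketch of the same counting logic, with made-up sample values:

```javascript
// Mimic the $reduce: start at 0 (initialValue) and add 1 whenever the
// looked-up group's _id equals the outer document's _id.
// "Madrid" and the ofertas array are invented sample values.
const outerId = "Madrid"; // stands in for the outer $_id
const ofertas = [         // stands in for the $lookup result array
  { _id: "Madrid", countProvinciaOferta: 7 },
  { _id: "Sevilla", countProvinciaOferta: 3 }
];

const countOfertas = ofertas.reduce(
  (value, item) => (item._id === outerId ? value + 1 : value), // $cond
  0 // initialValue
);
console.log(countOfertas); // -> 1
```

Since each _id appears at most once in the looked-up groups, adding 1 per match yields at most 1; if the goal is to carry the looked-up count through instead, something like { $add: ["$$value", "$$this.countProvinciaOferta"] } in the $reduce would sum it.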
The $eq in the $cond is comparing an array to an ObjectId, so it never matches.
The $lookup stage results will be in the ofertas field as an array of documents, so '$ofertas._id' will be an array of all the _id values.
You will probably need to use $unwind or $reduce after the $lookup.

How to make mongodb work with countDocuments() and $addFields

I have a document with the following structure:
{
'_id': '',
'a': '',
'b': '',
'c': [
{
'_id': '',
'd': '',
'f': [
{
'orderDate': 12345,
'orderProfit': 12,
},
{
'orderDate': 67891,
'orderProfit': 12341,
},
{
'orderDate': 23456,
'orderProfit': 474,
},
],
},
{
'_id': '',
'd': '',
'f': [
{
'orderDate': 14232,
'orderProfit': 12222,
},
{
'orderDate': 643532,
'orderProfit': 4343,
},
{
'orderDate': 33423,
'orderProfit': 5555,
},
],
},
],
}
orderDate is an int64 that represents the date that an order was made
orderProfit is an int64 that represents the profit of an order
I needed to return the document that had the biggest "orderDate" and check whether the "orderProfit" was the one I was looking for.
For that matter I used a query like this (in an aggregate query):
[
{
'$addFields': {
'orders': {
'$map': {
'input': '$c',
'as': 'c',
'in': {
'profit': {
'$filter': {
'input': '$$c.f',
'cond': {
'$eq': [
{
'$max': '$$c.f.orderDate',
},
'$$this.orderDate',
],
},
},
},
},
},
},
},
},
{
'$match': {
'$or': [
{ 'orders.profit.orderProfit': 500 },
],
},
},
];
It is working properly.
The issue comes when trying to add this query to a countDocuments() query in order to fetch the total number of documents.
It is a requirement to use the countDocuments().
I just can't seem to make it work...
$addFields throws as an unknown top-level operator. If I remove the $addFields, then I can't add to countDocuments() the query that finds the max date; if I remove it entirely, $match is an unknown operator.
db.getCollection('orders').countDocuments(
{
"orders.profit.orderProfit" : {"Query that was shown previously"}
})
You can't.
countDocuments receives a find query, you can't attach an entire pipeline to it.
However countDocuments is just a wrapper.
Returns the count of documents that match the query for a collection or view. The method wraps the $group aggregation stage with a $sum expression to perform the count and is available for use in Transactions.
Basically this method just executes the following aggregation:
db.collection.aggregate([
{
$match: yourQuery
},
{
$group: {
_id: null,
sum: {$sum: 1}
}
}
])
And then returns results[0].sum or 0 depending on result.
So you can just use your own pipeline and add this stage at the end; it would literally be the same, complexity-wise.
db.collection.aggregate([
...
your entire pipeline
...
{
$group: {
_id: null,
sum: {$sum: 1}
}
}
])
If there's any other specific reason you want to not use the aggregation framework let me know, maybe there's a workaround.
countDocuments() does not accept an aggregation pipeline query.
You can use aggregation operators in the match query, but it causes performance issues; use this only when you don't have any other way.
$let binds variables for use in the specified expression and returns the result of the expression.
vars creates a variable, orders, holding the result of your $addFields operation.
in: $map iterates over the $$orders.profit.orderProfit nested array and checks the $in condition; if your profit amount is found, it returns true, otherwise false.
$anyElementTrue checks whether any returned value is true; if so the condition is true, otherwise false.
db.getCollection('orders').countDocuments({
$expr: {
$let: {
vars: {
orders: {
"$map": {
"input": "$c",
"as": "c",
"in": {
"profit": {
"$filter": {
"input": "$$c.f",
"cond": {
"$eq": [{ "$max": "$$c.f.orderDate" }, "$$this.orderDate"]
}
}
}
}
}
}
},
in: {
$anyElementTrue: {
$map: {
input: "$$orders.profit.orderProfit",
in: { $in: [500, "$$this"] }
}
}
}
}
}
})
Playground
Second option: another way to handle this condition with fewer operators.
$filter matches both conditions, including orderProfit.
$size gets the total number of results from the above $filter.
$map returns the array of sizes produced by $filter.
$sum adds up those numbers; if the sum is greater than 0, the condition is true, otherwise false.
db.getCollection('orders').countDocuments({
$expr: {
$sum: {
"$map": {
"input": "$c",
"as": "c",
"in": {
$size: {
"$filter": {
"input": "$$c.f",
"cond": {
$and: [
{ "$eq": [{ "$max": "$$c.f.orderDate" }, "$$this.orderDate"] },
{ "$eq": ["$$this.orderProfit", 500] }
]
}
}
}
}
}
}
}
})
Playground
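In plain JavaScript the second option's check reads like this (the sample values are made up, since the question's sample data has no order with profit 500 on the max date):

```javascript
// For each sub-array, count elements whose orderDate is the max AND
// whose orderProfit is 500 ($filter + $size), then sum the counts
// ($map + $sum). The document is counted if the sum is > 0.
const c = [
  { f: [{ orderDate: 12345, orderProfit: 12 },
        { orderDate: 67891, orderProfit: 500 }] },
  { f: [{ orderDate: 14232, orderProfit: 12222 },
        { orderDate: 33423, orderProfit: 5555 }] }
];

const total = c
  .map(sub => {
    const maxDate = Math.max(...sub.f.map(o => o.orderDate)); // $max
    return sub.f.filter(                                      // $filter + $size
      o => o.orderDate === maxDate && o.orderProfit === 500
    ).length;
  })
  .reduce((a, b) => a + b, 0);                                // $sum

console.log(total > 0); // -> true, so this document would be counted
```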

How to use $elemMatch query in mongodb

I tried to filter the data using $elemMatch, but it's not working correctly.
My Scenario
Prod Table
[
  {
    id: 1,
    product: [
      { id: 1, name: true },
      { id: 2, name: true },
      { id: 3, name: false }
    ]
  }
]
Query
db.Prod.find(
  { "product.name": true },
  { _id: 0, product: { $elemMatch: { name: true } } }
);
The output I got:
[
  {
    id: 1,
    product: [
      { id: 1, name: true }
    ]
  }
]
Expected output:
[
  {
    id: 1,
    product: [
      { id: 1, name: true },
      { id: 2, name: true }
    ]
  }
]
How can I achieve this scenario? I referred to this link: Retrieve only the queried element in an object array in MongoDB collection, and tried all the answers mentioned there, but it's still not working. Can you give an example query?
Hmm, you probably didn't try the aggregation provided in the referenced link, because it works perfectly.
db.collection.aggregate([
{
$match: {
"product.name": true
}
},
{
$project: {
product: {
$filter: {
input: "$product",
as: "prod",
cond: {
$eq: [
"$$prod.name",
true
]
}
}
}
}
}
])
This will output:
[
{
"_id": 1,
"product": [
{
"id": 1,
"name": true
},
{
"id": 2,
"name": true
}
]
}
]
Here's the example.
EDIT :
From the doc :
Usage Considerations
Both the $ operator and the $elemMatch operator project the first
matching element from an array based on a condition.
Considering this, you cannot achieve what you need with a find query, only with an aggregation query.
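The difference is easy to see in plain JavaScript, where $elemMatch (and the positional $) behaves like find, returning only the first match, while $filter behaves like filter (using the product array from the question):

```javascript
const product = [
  { id: 1, name: true },
  { id: 2, name: true },
  { id: 3, name: false }
];

// $elemMatch / positional $: projects only the first matching element
const first = product.find(p => p.name === true);

// $filter: keeps every matching element
const all = product.filter(p => p.name === true);

console.log(first.id);   // -> 1
console.log(all.length); // -> 2
```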

MongoDB - multiple queries based on condition

I have a query that looks something like this.
employees.aggregate(
[{ "$match":
{"$and": [
{"$or": [
{name : { "$regex": param, "$options":"i"}},
{title : { "$regex": param, "$options":"i"}},
]},
{ tenure : true }
]}
},
{"$sort":{experience : -1}},
{"$limit" : 100}
])
I would like to update this query to something like this:
search the employees collection where name = param and tenure = true;
if data exists, sort the results by experience and limit the results to 100;
if no results are found, then search the same collection using title, with no need to sort the results.
Can someone please help with this?
You need to apply conditional sorting in the query. Another important thing: the value of $regex must be a string (" "). You can't pass it like
$regex: User
you need to pass it as a string.
db.hotspot.aggregate(
[{
"$match": {
"$and": [{
"$or": [{
name: {
"$regex": "SD",
"$options": "i"
}
},
{
radius: {
"$regex": "250",
"$options": "i"
}
},
]
},
{
infinite: false
}
]
}
},
{
$project: {
sort: {
$cond: {
if: {
$eq: ["$radius", 250]
},
then: "$name",
else: "$_id"
}
}
}
},
{
$sort: {
sort: 1
}
}
])
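For completeness, the asker's fallback requirement (query by name first; only if nothing matches, query by title) can be sketched application-side in plain JavaScript. A real implementation would issue two separate find queries against the employees collection; the documents below are hypothetical:

```javascript
// Hypothetical sample documents standing in for the employees collection.
const employees = [
  { name: "Sam Doe", title: "Engineer", tenure: true, experience: 5 },
  { name: "Alex Roe", title: "Sam lead", tenure: true, experience: 9 }
];

function search(param) {
  const re = new RegExp(param, "i"); // case-insensitive, like $options: "i"
  // First attempt: name match + tenure, sorted by experience desc, top 100
  const byName = employees
    .filter(e => re.test(e.name) && e.tenure)
    .sort((a, b) => b.experience - a.experience)
    .slice(0, 100);
  if (byName.length > 0) return byName;
  // Fallback: title match, no sorting required
  return employees.filter(e => re.test(e.title) && e.tenure);
}

console.log(search("sam").length);   // matches "Sam Doe" by name -> 1
console.log(search("lead")[0].name); // falls back to title -> "Alex Roe"
```

Issuing the fallback query only when the first returns nothing is often simpler and cheaper than encoding the branching inside a single aggregation.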