Using $match after a $lookup in MongoDB - mongodb

I have two collections and I want to get fields from both, so I'm using $lookup in an aggregation pipeline.
This works fine and returns all the documents with an extra field: an array with 0 or 1 elements (an object). If it has 0 elements, the JOIN (in SQL terms) matched nothing; if it has 1 element, there was a match and the element is an object carrying the fields from the second collection.
Now that I have those results, I'd like to use $match in order to filter some of the results.
In order to use $match I first want to $unwind that new extra field to extract the array elements. The problem is that once I insert the $unwind stage, the query returns only a single document.
Why is this happening? How can I $unwind and then $match all the documents I got from the $lookup stage?

Assume we have these documents after the $lookup:
{doc:{_id:1, lookupArray:[{doc:1},{doc:2}]}}
and
{doc:{_id:2, lookupArray:[/* empty */]}}
when we $unwind without any options we will get:
{doc:{_id:1, lookupArray:{doc:1}}}
{doc:{_id:1, lookupArray:{doc:2}}}
and the document with the empty array is dropped entirely: it produces no output at all.
and when we specify
{ $unwind: { path: "$lookupArray", preserveNullAndEmptyArrays: true } }
then we will get:
{doc:{_id:1, lookupArray:{doc:1}}}
{doc:{_id:1, lookupArray:{doc:2}}}
{doc:{_id:2}} (the empty lookupArray field is simply omitted from the output)
So when you want to match on the value of doc inside the unwound lookupArray, $match will look like this:
{$match:{'lookupArray.doc':2}}
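To see why the empty-array document disappears, here is a plain-JavaScript sketch of the $unwind semantics described above (an illustration only, not the server's implementation; the document shapes mirror the example):

```javascript
// Minimal sketch of $unwind semantics on in-memory documents.
// Without preserveNullAndEmptyArrays, a document whose array is empty or
// missing is dropped; with it, the document passes through (the empty
// array field is omitted).
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const arr = doc[field];
    if (!Array.isArray(arr) || arr.length === 0) {
      if (preserveNullAndEmptyArrays) {
        const copy = { ...doc };
        delete copy[field]; // empty array: field is dropped from the output
        out.push(copy);
      }
      continue; // default behaviour: document is dropped entirely
    }
    for (const el of arr) out.push({ ...doc, [field]: el }); // one doc per element
  }
  return out;
}

const docs = [
  { _id: 1, lookupArray: [{ doc: 1 }, { doc: 2 }] },
  { _id: 2, lookupArray: [] },
];

console.log(unwind(docs, "lookupArray").length);       // 2 -> _id 2 is gone
console.log(unwind(docs, "lookupArray", true).length); // 3 -> _id 2 survives

// The subsequent $match is then just a filter over the flattened documents:
const matched = unwind(docs, "lookupArray").filter(d => d.lookupArray.doc === 2);
console.log(matched.length); // 1
```

After the unwind, $match is only a filter over the flattened documents, which is why the non-preserving variant can silently drop rows you expected to keep.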
Any comments welcome!

Related

How to sum a particular number field present in all objects inside an array in mongodb document

This is a sample of my MongoDB document (you can use jsonformatter.com to analyse it):
{"_id":"6278686","playerName":"Rohit Lal","tournamentId":"197831","score":[{"_id":"1611380","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":["Mohit Mishra"],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1602732","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536514","runsScored":1,"ballFaced":3,"fours":0,"sixes":0,"strikeRate":33.33,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"run out Sameer Baveja","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536474","runsScored":2,"ballFaced":7,"fours":0,"sixes":0,"strikeRate":28.57,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c Rajesh b Prasad Naik","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536467","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1500825","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1461428","runsScored":18,"ballFaced":6,"fours":1,"sixes":2,"strikeRate":300,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"not out","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1461408","runsScored":0,"ballFaced":1,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c Sudhir b Vinay Kasat 
*vk*","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1451175","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1451146","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1392796","runsScored":0,"ballFaced":1,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c †Vinay Kedia b Lalit","catches":[],"stumping":[],"runout":[],"participatedRunout":[]}],"__v":0}
I want to sum the runsScored field of all objects inside the score array. I know I can achieve it using the aggregation framework, but I am a beginner in MongoDB and don't know many aggregation operators.
To avoid $unwind, if you want the total for each document you can use this aggregation stage:
db.collection.aggregate([
  {
    "$project": {
      "sum": {
        "$sum": "$score.runsScored"
      }
    }
  }
])
The trick here is that "$score.runsScored" resolves to an array of all the runsScored values, so you only have to $sum those values.
The other way is to use $unwind and $group. Note that grouping with _id: null sums all values across the collection; to get the total for each document, group on _id: "$_id" instead.
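To make the trick concrete, here is a plain-JavaScript sketch of what { $sum: "$score.runsScored" } computes: the path resolves to the array of runsScored values, and $sum adds them up (an illustration only, using a trimmed version of the sample document above):

```javascript
// Trimmed version of the sample player document.
const player = {
  _id: "6278686",
  score: [
    { _id: "1536514", runsScored: 1 },
    { _id: "1536474", runsScored: 2 },
    { _id: "1461428", runsScored: 18 },
  ],
};

// "$score.runsScored" resolves to the array of runsScored values...
const runsScoredValues = player.score.map(s => s.runsScored); // [1, 2, 18]

// ...and $sum adds them together.
const total = runsScoredValues.reduce((sum, r) => sum + r, 0);
console.log(total); // 21
```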

How to filter to check if any array element to match a condition in MongoDb?

Currently I could filter with this MongoDb query
db.getCollection("entity").find(
{
"NameDetails.Name.0.NameValue.0.EntityName" : /ABC/
}
);
How do I loop through all the Name entries and then all the NameValue entries to search for /ABC/? If any of them matches, the document should be returned.
You need either $elemMatch or $unwind.
If you just need the documents where any element matches, use $elemMatch.
If you want to extract all the matching elements from the array, go for $unwind and then $group.
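The "match if any nested element matches" logic can be sketched in plain JavaScript: loop over every Name, then every NameValue, and test EntityName (an illustration of the semantics, not a MongoDB query):

```javascript
// Returns true if ANY Name/NameValue combination has an EntityName
// matching the pattern, mirroring what the nested-array query should do.
function matchesAnyEntityName(entity, pattern) {
  return (entity.NameDetails.Name || []).some(name =>
    (name.NameValue || []).some(nv => pattern.test(nv.EntityName || ""))
  );
}

const entity = {
  NameDetails: {
    Name: [
      { NameValue: [{ EntityName: "XYZ Corp" }] },
      { NameValue: [{ EntityName: "ABC Holdings" }] },
    ],
  },
};

console.log(matchesAnyEntityName(entity, /ABC/)); // true
```

Note that in MongoDB itself, dropping the hard-coded positional indexes and querying { "NameDetails.Name.NameValue.EntityName": /ABC/ } should already match against every element at each array level, since dot notation traverses arrays.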

Mongo Query Optimization for collection over 10 Million records

I have a collection with over 10 million records. I need to match on a particular field and get the distinct _ids of the matching records.
After the $match stage the result set is under 5 million documents.
If I then group by id to get the unique ids, the execution time on my local environment is over 20 seconds.
db.getCollection('viewscounts').aggregate([
  {
    $match: { MODULE_ID: 4 }
  },
  {
    $group: { _id: '$ITEM_ID' }
  }
], { allowDiskUse: true })
If I keep only one of the $match or $group stages in the pipeline, the execution time is under 0.1 seconds.
I'm okay with limiting the _ids, but they should be unique.
Can anyone suggest a better way to get the results faster?
You have already implemented the best Aggregation pipelines possible for the query to get your desired output.
The reason the query is faster with only one of the stages is that it can start returning partial output immediately instead of materializing the entire 5-million-document result: when both stages are present, the entire output of the $match stage has to be processed by the $group stage before anything is returned, which takes more time.
The main way to optimize this aggregation query is to add indexes on the MODULE_ID and ITEM_ID keys:
db.viewscounts.createIndex({MODULE_ID: 1}, { sparse: true })
db.viewscounts.createIndex({ITEM_ID: 1})
It should be faster after you perform the above two indexes on your viewscounts collection.
Alternatively, you can get the desired output from MongoDB's distinct command. Give the query below a try and see if it helps.
db.getCollection('viewscounts').distinct("ITEM_ID", {"MODULE_ID": 4})
Note: the above query returns a flat array of the unique values instead of documents like the aggregation does.
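What the $match + $group pair (or distinct) computes can be sketched in plain JavaScript: filter on MODULE_ID, then collect the unique ITEM_IDs with a Set (an illustration on a toy dataset, not the server's execution):

```javascript
// Toy stand-in for the viewscounts collection.
const viewscounts = [
  { MODULE_ID: 4, ITEM_ID: 10 },
  { MODULE_ID: 4, ITEM_ID: 11 },
  { MODULE_ID: 4, ITEM_ID: 10 }, // duplicate ITEM_ID, must be collapsed
  { MODULE_ID: 7, ITEM_ID: 99 },
];

// $match -> filter, $group on _id -> deduplicate with a Set.
const uniqueItemIds = [
  ...new Set(viewscounts.filter(d => d.MODULE_ID === 4).map(d => d.ITEM_ID)),
];
console.log(uniqueItemIds); // [10, 11]
```

As an aside, a single compound index such as { MODULE_ID: 1, ITEM_ID: 1 } can typically cover this match-then-distinct pattern entirely, so it is worth benchmarking against the two separate indexes above.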
Hope this helps

MongoDb: Combine two find operations together

When I search for documents containing title 'Apple' as such:
db.getCollection('items').find({title: {$regex: 'Apple'}})
The documents with title containing 'Apple' are returned as expected
When I search for documents where priceA is less than priceB as such:
db.getCollection('items').find({"$where":"this.priceA < this.priceB"})
The documents where priceA is less than priceB are returned as expected.
However, when I try to do both together as such:
db.getCollection('items').find({title: {$regex: 'Apple'}},
{"$where":"this.priceA < this.priceB"})
However, only the _id field is returned, and in fact all the documents in my collection come back: the two filters above are not applied at all.
How do I apply both filters?
Your query is wrong: you are passing the second condition as the projection argument, so it is never applied as a filter. find() takes the query first, then the projection (and, in the drivers, an options object):
Query: both conditions go into the one query document, which is an implicit AND: {title: {$regex: 'Apple'}, "$where": "this.priceA < this.priceB"}
Projection: by default, if you don't specify anything, the entire document is returned; if you want only specific fields, pass something like {name: 1, email: 1}
Options: for example sort and limit settings
db.getCollection('items').find({title: {$regex: 'Apple'},"$where":"this.priceA < this.priceB"}, {})
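The combined filter's semantics can be sketched in plain JavaScript: both predicates are ANDed over each document, which is exactly what putting them in one query document expresses (an illustration on toy data; the item shapes are assumed):

```javascript
// Toy stand-in for the items collection.
const items = [
  { title: "Apple iPhone", priceA: 5, priceB: 10 },  // matches both conditions
  { title: "Apple Watch", priceA: 20, priceB: 10 },  // fails the price test
  { title: "Banana", priceA: 1, priceB: 2 },         // fails the title test
];

// One filter, two conditions: title regex AND priceA < priceB.
const matches = items.filter(i => /Apple/.test(i.title) && i.priceA < i.priceB);
console.log(matches.map(i => i.title)); // ["Apple iPhone"]
```

As a side note, on MongoDB 3.6+ the field comparison can be written without $where, as { $expr: { $lt: ["$priceA", "$priceB"] } }, which avoids server-side JavaScript evaluation.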

MongoDB, using group by aggregate framework to get unique strings

I'm trying to use the aggregation framework to group a lot of strings together to identify the unique ones. I must also keep some information from the rest of the fields. This is analogous to using the * operator in MySQL with a GROUP BY statement.
SELECT *
FROM my_table
GROUP BY field1
I have tried using the aggregation framework, and it works fine just to get unique fields.
db.mycollection.aggregate({
$group : { _id : "$field1"}
})
What if I want the other fields that go with that? MySQL would give me just the first row in each group (which I'm fine with). That's what I thought the $first operator did.
db.mycollection.aggregate({
$group : {
_id : "$field1",
another_field : {$first : "$field2"}
}})
This way it groups by field1 but still gives me back the other fields attached to the document. When I try this I get:
exception: aggregation result exceeds maximum document size (16MB)
Which I have a feeling is because the whole aggregation result is being returned as one document. Can I return it as another JSON array instead?
thanks in advance
You're doing the aggregation correctly, but as the error message indicates, the full result of the aggregate call cannot be larger than 16 MB.
Work-arounds would be to either add a filter to reduce the size of the result or use map-reduce instead and output the result to another collection.
If the number of unique values does not exceed 20,000 you could use the group() function like
db.mycollection.group({key: {field1: 1, field2: 1}, reduce: function(curr, result) {}, initial: {}})
Last option would be map reduce:
db.mycollection.mapReduce(
  function() { emit({field1: this.field1, field2: this.field2}, 1); },
  function(key, values) { return 1; },
  {out: {replace: "unique_field1_field2"}}
)
and your result would be in "unique_field1_field2" collection
Another alternative is to use the distinct function:
db.mycollection.distinct('field1')
This function accepts a second argument, a query, with which you can filter the documents.
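The $group-with-$first pattern from the question can be sketched in plain JavaScript: keep the first document seen for each distinct field1 (an illustration only; field1/field2 are the placeholder names from the question):

```javascript
// For each distinct field1, keep the field2 of the FIRST document seen,
// mirroring { $group: { _id: "$field1", another_field: { $first: "$field2" } } }.
function groupFirst(docs) {
  const seen = new Map();
  for (const d of docs) {
    if (!seen.has(d.field1)) {
      seen.set(d.field1, { _id: d.field1, another_field: d.field2 });
    }
  }
  return [...seen.values()];
}

const docs = [
  { field1: "a", field2: 1 },
  { field1: "b", field2: 2 },
  { field1: "a", field2: 3 }, // duplicate key: its field2 is ignored
];
console.log(groupFirst(docs)); // two groups, keeping field2 = 1 and 2
```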