This is a sample of my MongoDB document (try jsonformatter.com to analyse it):
{"_id":"6278686","playerName":"Rohit Lal","tournamentId":"197831","score":[{"_id":"1611380","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":["Mohit Mishra"],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1602732","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536514","runsScored":1,"ballFaced":3,"fours":0,"sixes":0,"strikeRate":33.33,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"run out Sameer Baveja","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536474","runsScored":2,"ballFaced":7,"fours":0,"sixes":0,"strikeRate":28.57,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c Rajesh b Prasad Naik","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1536467","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1500825","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1461428","runsScored":18,"ballFaced":6,"fours":1,"sixes":2,"strikeRate":300,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"not out","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1461408","runsScored":0,"ballFaced":1,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c Sudhir b Vinay Kasat 
*vk*","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1451175","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1451146","runsScored":0,"ballFaced":0,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"-","catches":[],"stumping":[],"runout":[],"participatedRunout":[]},{"_id":"1392796","runsScored":0,"ballFaced":1,"fours":0,"sixes":0,"strikeRate":0,"oversBowled":0,"runsConceded":0,"economyRate":0,"wickets":0,"maiden":0,"howToOut":"c †Vinay Kedia b Lalit","catches":[],"stumping":[],"runout":[],"participatedRunout":[]}],"__v":0}
I want to sum the runsScored field of all objects inside the score array. I know I can achieve this with the aggregation framework, but I am a beginner in MongoDB and do not know many of the aggregation operators.
To get the total for each document without $unwind, you can use this aggregation stage:
db.collection.aggregate([
    {
        "$project": {
            "sum": {
                "$sum": "$score.runsScored"
            }
        }
    }
])
The trick here is the path "$score.runsScored": it produces an array containing the runsScored value of every element of score, so you only have to $sum those values.
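For intuition, the effect of "$score.runsScored" followed by $sum can be mimicked in plain JavaScript. This is only a sketch of the semantics, not how the server implements it; the helper name sumRunsScored and the trimmed sample document are assumptions made for illustration.

```javascript
// Trimmed copy of the sample document: only the fields needed here.
const doc = {
  _id: "6278686",
  score: [
    { _id: "1536514", runsScored: 1 },
    { _id: "1536474", runsScored: 2 },
    { _id: "1461428", runsScored: 18 },
    { _id: "1461408", runsScored: 0 },
  ],
};

// Hypothetical helper mimicking { $sum: "$score.runsScored" }.
function sumRunsScored(document) {
  // "$score.runsScored" resolves to an array of the field's values.
  const values = document.score.map((entry) => entry.runsScored);
  // $sum ignores non-numeric values, so drop them before adding.
  return values
    .filter((value) => typeof value === "number")
    .reduce((total, value) => total + value, 0);
}

console.log(sumRunsScored(doc)); // 21
```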
The other way is to use $unwind and $group. Note that with _id: null in the $group stage, the sum covers all values in the collection; to get the total for each document, you have to use _id: "$_id" instead.
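A sketch of the $unwind and $group variant, written as plain pipeline arrays so the two _id choices can be compared side by side. The variable names and the output field name total are assumptions, and collection stands in for your collection name.

```javascript
// Total across the whole collection: _id: null puts every unwound
// score entry into a single group.
const totalForCollection = [
  { $unwind: "$score" },
  { $group: { _id: null, total: { $sum: "$score.runsScored" } } },
];

// Total per document: grouping on the original _id keeps one result
// per player document.
const totalPerDocument = [
  { $unwind: "$score" },
  { $group: { _id: "$_id", total: { $sum: "$score.runsScored" } } },
];

// In the mongo shell these would be run as, for example:
//   db.collection.aggregate(totalPerDocument)
```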
I have a collection with over 10 million records. I need to match on a particular field and get
the distinct _ids of the resulting record set.
After the $match stage, the result set is under 5 million records.
If I group by ID to get the unique IDs, the execution time on my local environment is over 20 seconds.
db.getCollection('viewscounts').aggregate(
    [
        {
            $match: {
                MODULE_ID: 4,
            }
        },
        {
            $group: {
                _id: '$ITEM_ID',
            }
        }
    ], { allowDiskUse: true })
If I get rid of either $match or $group and keep only one stage in the pipeline, the execution time is less than 0.1 seconds.
I'm okay with limiting the _ids, but they should be unique.
Can anyone suggest a better way to get the results faster?
Your aggregation pipeline is already about as good as it can be for the output you want.
The query appears faster with only one of the stages because it then returns a partial result instead of working through the entire 5 million records. When you use both stages, the entire output of the $match stage has to be processed by the $group stage, which takes more time.
The way to optimize your aggregation query is to create indexes on the MODULE_ID and ITEM_ID keys:
db.viewscounts.createIndex({MODULE_ID: 1}, { sparse: true })
db.viewscounts.createIndex({ITEM_ID: 1})
The query should be faster after you create the two indexes above on your viewscounts collection.
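One variation that may be worth testing, and which is not part of the suggestion above, is a single compound index on both keys. Because the query filters on MODULE_ID and only needs ITEM_ID, such an index can potentially serve both stages. Whether the planner actually uses it depends on the server version and the data, so treat this as an assumption to check with explain(). The variable names are illustrative.

```javascript
// Assumed alternative: one compound index, with the filter key first.
const compoundIndexKeys = { MODULE_ID: 1, ITEM_ID: 1 };

// The pipeline from the question, unchanged.
const pipeline = [
  { $match: { MODULE_ID: 4 } },
  { $group: { _id: "$ITEM_ID" } },
];

// In the mongo shell:
//   db.viewscounts.createIndex(compoundIndexKeys)
//   db.viewscounts.explain("executionStats").aggregate(pipeline)
```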
Alternatively, you can get the same output from the MongoDB distinct command. Give the query below a try and see if it helps.
db.getCollection('viewscounts').distinct("ITEM_ID", {"MODULE_ID": 4})
Note: The above query returns an array of unique values rather than an array of objects, unlike the aggregation query.
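The difference in shape can be illustrated in plain JavaScript. The sample values are made up; the point is only that the aggregation yields one object per group, while distinct yields the bare values.

```javascript
// Made-up example of what the $group stage returns: one object per ITEM_ID.
const aggregateResult = [{ _id: 101 }, { _id: 102 }, { _id: 103 }];

// distinct() returns just the values, which corresponds to:
const distinctStyle = aggregateResult.map((group) => group._id);

console.log(distinctStyle); // [ 101, 102, 103 ]
```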
Hope this helps
When I search for documents containing title 'Apple' as such:
db.getCollection('items').find({title: {$regex: 'Apple'}})
The documents whose title contains 'Apple' are returned as expected.
When I search for documents where priceA is less than priceB as such:
db.getCollection('items').find({"$where":"this.priceA < this.priceB"})
The documents where priceA is less than priceB are returned as expected.
However, when I try to do both together as such:
db.getCollection('items').find({title: {$regex: 'Apple'}},
{"$where":"this.priceA < this.priceB"})
Only the _id field is returned, for what looks like every document in my collection, and the two filters above are not applied at all.
How do I apply both filters?
Your query is wrong: you are passing the second condition as the projection argument of find() instead of as part of the filter.
MongoDB's find() method takes 3 arguments:
Condition:
The filter that documents must match. Both conditions belong here: {title: {$regex: 'Apple'}, "$where": "this.priceA < this.priceB"}
Projection:
If you don't specify anything in the projection, the entire document is returned. If you want only specific fields, you can pass something like {name: 1, email: 1}.
Options:
For example, {sort: {title: 1}, limit: 1}.
So the query becomes:
db.getCollection('items').find({title: {$regex: 'Apple'},"$where":"this.priceA < this.priceB"}, {})
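To make the difference concrete, here are the arguments written out as named values. The variable names are illustrative. The last filter is an alternative that assumes MongoDB 3.6 or newer, where $expr can compare two fields without running JavaScript through $where.

```javascript
// What the question's call passed to find():
const originalFilter = { title: { $regex: "Apple" } };
// Second argument, so it was read as a projection rather than a filter.
const accidentalProjection = { $where: "this.priceA < this.priceB" };

// Both conditions belong in the first argument:
const combinedFilter = {
  title: { $regex: "Apple" },
  $where: "this.priceA < this.priceB",
};

// Assumed alternative (MongoDB 3.6+): compare the fields with $expr.
const exprFilter = {
  title: { $regex: "Apple" },
  $expr: { $lt: ["$priceA", "$priceB"] },
};

// In the mongo shell:
//   db.getCollection('items').find(combinedFilter)
//   db.getCollection('items').find(exprFilter)
```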
I'm trying to use the aggregation framework to group a lot of strings together to identify the unique ones. I must also keep some information about the rest of the fields. This would be analogous to using the * operator in MySQL with a GROUP BY statement.
SELECT *
FROM my_table
GROUP BY field1
I have tried using the aggregation framework, and it works fine just to get unique fields.
db.mycollection.aggregate({
$group : { _id : "$field1"}
})
What if I want the other fields that went with it? MySQL would only give me the first one that appeared in the group (which I'm fine with). That's what I thought the $first operator did.
db.mycollection.aggregate({
$group : {
_id : "$field1",
another_field : {$first : "$field2"}
}})
This way it groups by field1 but still gives me back the other fields attached to the document. When I try this I get:
exception: aggregation result exceeds maximum document size (16MB)
Which I have a feeling is because it is returning the whole aggregation back as one document. Can I return it as another json array?
thanks in advance
You're doing the aggregation correctly, but as the error message indicates, the full result of the aggregate call cannot be larger than 16 MB.
Work-arounds would be to either add a filter to reduce the size of the result or use map-reduce instead and output the result to another collection.
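A sketch of the first work-around: put a $match stage ahead of $group so fewer documents reach it. The filter below (a range on field1) is purely hypothetical; use whatever condition fits your data. On MongoDB 2.6 or newer, appending an $out stage writes the groups to a collection instead of returning them; the collection name unique_field1 is also an assumption.

```javascript
// Hypothetical filter: only process values of field1 in one range.
const filteredPipeline = [
  { $match: { field1: { $gte: "a", $lt: "n" } } },
  { $group: { _id: "$field1", another_field: { $first: "$field2" } } },
];

// Optional (MongoDB 2.6+): write the result to a collection.
const pipelineWithOut = [...filteredPipeline, { $out: "unique_field1" }];

// In the mongo shell:
//   db.mycollection.aggregate(filteredPipeline)
//   db.mycollection.aggregate(pipelineWithOut)
```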
If the number of unique values in the result does not exceed 20,000, you could use the group() function (deprecated in MongoDB 3.4 and removed in 4.2), like this:
db.mycollection.group({ key: {field1: 1, field2: 1}, reduce: function(curr, result) {}, initial: {} })
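The last option would be map-reduce:
db.mycollection.mapReduce(
    // Emit each document's own field values as the key, so every distinct pair yields one result.
    function() { emit({field1: this.field1, field2: this.field2}, 1); },
    function(key, values) { return 1; },
    {out: {replace: "unique_field1_field2"}})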
and your result would be in "unique_field1_field2" collection
Another alternative is to use the distinct function:
db.mycollection.distinct('field1')
This function accepts a second argument, a query, which you can use to filter the documents.
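What distinct with a query computes can be sketched on in-memory data. The query and the sample documents below are hypothetical and only illustrate the idea: filter first, then collect the unique values of field1.

```javascript
// Hypothetical query: only consider documents where field2 equals "x".
const query = { field2: "x" };
// In the mongo shell:
//   db.mycollection.distinct('field1', query)

// Plain-JavaScript sketch of the same idea on made-up documents.
const docs = [
  { field1: "a", field2: "x" },
  { field1: "a", field2: "x" },
  { field1: "b", field2: "y" },
  { field1: "c", field2: "x" },
];

const distinctField1 = [
  ...new Set(
    docs
      .filter((d) => d.field2 === query.field2) // apply the query
      .map((d) => d.field1) // keep only field1
  ),
];

console.log(distinctField1); // [ 'a', 'c' ]
```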