I somehow created duplicates of every single entry in my database. Currently there are 176039 documents and counting, and half of them are duplicates. Each document is structured like so:
_id : 5b41d9ccf10fcf0014fe8917
originName : "Hartsfield Jackson Atlanta International Airport"
destinationName : "Antigua"
totalDuration : 337
Inside the MongoDB Compass Community app for Mac, under the Aggregations tab, I was able to find the duplicates using this pipeline:
[
{$group: {
_id: {originName: "$originName", destinationName: "$destinationName"},
count: {$sum: 1}}},
{$match: {count: {"$gt": 1}}}
]
I'm not sure how to move forward and delete the duplicates at this point. I'm assuming it has something to do with $out.
Edit: Something I didn't notice until now is that the values for totalDuration on each double are actually different.
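Add these stages to the end of your pipeline:
{$project: {_id: 0, "originName": "$_id.originName", "destinationName": "$_id.destinationName"}},
{$out: "collectionname"}
This will replace the documents in your current collection with the documents produced by the aggregation pipeline. Because $out overwrites the whole target collection, remove the {$match: {count: {"$gt": 1}}} stage if you also want to keep entries that have no duplicate. If you need totalDuration in the collection, add that field to both the $group stage (with an accumulator such as $first) and the $project stage before running the pipeline.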
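As a rough sketch of the complete pipeline described above, assuming the collection is called `flights` (a hypothetical name) and that keeping one of the two `totalDuration` values per pair is acceptable:

```javascript
// Hypothetical collection name; substitute your own.
const collectionName = "flights";

const dedupePipeline = [
  // One group per (originName, destinationName) pair.
  // $first keeps the totalDuration of whichever document the stage sees first.
  { $group: {
      _id: { originName: "$originName", destinationName: "$destinationName" },
      totalDuration: { $first: "$totalDuration" }
  } },
  // Flatten the grouped key back into top-level fields.
  { $project: {
      _id: 0,
      originName: "$_id.originName",
      destinationName: "$_id.destinationName",
      totalDuration: 1
  } },
  // Overwrite the collection with the de-duplicated documents.
  { $out: collectionName }
];

// In the mongo shell:
//   db.getCollection(collectionName).aggregate(dedupePipeline)
```

Since the two `totalDuration` values differ, you may prefer `$min`, `$max` or `$avg` over `$first`. It is probably worth writing to a scratch collection first, because `$out` cannot be undone.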
Related
As stated in the title, I'm having some problems querying from MongoDB Compass using the aggregate method. I have a collection of documents in this form:
{"Array":[{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},...]}
Using the mongo shell or Studio 3T, I query it with the aggregate method. Here is an example:
db.collection.aggregate([
{ $match: {"Array.field": "val"}},
{ $unwind: "$Array"},
{ $match: {"Array.field": "val"}},
{ $group: {_id: null, count: {$sum:NumberInt(1)}, Array: {$push: "$Array"}}},
{ $project: {"N. Hits": "$count", Array:1}}
])
Here I look for the elements of Array whose field value is "val" and count them. This works perfectly, but I don't know how to do the same in MongoDB Compass.
In the query bar I have 'filter', 'project' and 'sort', and I can run the usual queries, but I don't know how to use the aggregate method.
Thanks
You are looking at the Documents tab, which is limited to querying documents.
Take a look at the second tab, called Aggregations, where you can build your aggregation pipelines as usual.
For further information please visit the Aggregation Pipeline Builder documentation.
The following is my MongoDB query to show the organization listing along with the user count per organization. In my data model, the "users" collection has an array userOrgMap which holds the organizations (by orgId) to which the user belongs. The "organizations" collection doesn't store the list of assigned users. The "users" collection has 11,200 documents and "organizations" has 10,500 documents.
db.organizations.aggregate([
{$lookup : {from:"users",localField:"_id", foreignField:"userOrgMap.orgId",as:"user" }},
{ $project : {_id:1,name:1,"noOfUsers":{$size:"$user"}}},
{$sort:{noOfUsers:-1}},
{$limit : 15},
{$skip : 0}
]);
Without the sort, the query is fast. With the sort, it is very slow and takes around 200 seconds.
I also tried another approach, which is slow as well.
db.organizations.aggregate([
{$lookup : {from:"users",localField:"_id", foreignField:"userOrgMap.orgId",as:"user" }},
{$unwind:"$user"},
{$group :{_id:"$_id",name:{$first:"$name"},userCount:{$sum:1}}},
{$sort:{userCount:-1}},
{$limit : 15},
{$skip : 0}
]);
This second query is slow even without the $sort.
I need help solving this issue.
Get the aggregation to use an index that begins with noOfUsers as I do not see a $match stage here.
The problem is resolved. I created an index on "userOrgMap.orgId". The query is fast now.
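For reference, a minimal sketch of that fix with the first pipeline from the question. The index covers the `foreignField` of the `$lookup`, so each organization can presumably be matched against "users" through the index rather than by scanning all 11,200 documents:

```javascript
// Key specification for the index on the field that $lookup matches against.
const userOrgIndex = { "userOrgMap.orgId": 1 };

// The first pipeline from the question, with the $sort stage closed properly.
const orgUserCounts = [
  { $lookup: { from: "users", localField: "_id", foreignField: "userOrgMap.orgId", as: "user" } },
  { $project: { _id: 1, name: 1, noOfUsers: { $size: "$user" } } },
  { $sort: { noOfUsers: -1 } },
  { $limit: 15 }
];

// In the mongo shell:
//   db.users.createIndex(userOrgIndex)
//   db.organizations.aggregate(orgUserCounts)
```

The sort has to see the user count of every organization before it can pick the top 15, which is likely why the slowness only showed up once `$sort` was added.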
I have 5 million MongoDB documents like this:
{
_id: xxx,
devID: 123,
logLevel: 5,
logTime: 1468464358697
}
indexes:
devID
my aggregate:
[
{$match: {devID: 123}},
{$group: {_id: {level: "$logLevel"}, count: {$sum: 1}}}
]
aggregate result:
{ "_id" : { "level" : 5 }, "count" : 5175872 }
{ "_id" : { "level" : 1 }, "count" : 200000 }
aggregate explain:
numYields:42305
29399ms
Question:
When MongoDB is not writing (saving) data, the aggregate takes 29 seconds.
When MongoDB is writing (saving) data, it takes 2 minutes.
The aggregate result has to be returned to a web client, so 29 seconds or 2 minutes is too long.
How can I solve this? Preferably it should take 10 seconds or less.
Thanks all
In your example, the aggregation query for {devID: 123, logLevel:5} returns a count of 5,175,872 which looks like it counted all the documents in your collection (since you mentioned you have 5 million documents).
In this particular example, I'm guessing that the {$match: {devID: 123}} stage matches pretty much every document, hence the aggregation is doing what is essentially a collection scan. Depending on your RAM size, this could have the effect of pushing your working set out of memory, and slow down every other query your server is doing.
If you cannot provide more selective criteria for the $match stage (e.g. by using a range of logTime as well as devID), then a pre-aggregated report may be your best option.
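A minimal sketch of the more selective `$match`, assuming `logTime` is stored as epoch milliseconds as in the sample document and that a compound index on `devID` and `logTime` is acceptable (the index and the helper name are assumptions, not part of the question):

```javascript
// Build a pipeline that counts log levels for one device inside a time window.
// fromMs and toMs are epoch milliseconds, matching the logTime field.
function levelCountsPipeline(devID, fromMs, toMs) {
  return [
    { $match: { devID: devID, logTime: { $gte: fromMs, $lt: toMs } } },
    { $group: { _id: { level: "$logLevel" }, count: { $sum: 1 } } }
  ];
}

// Assumed compound index so that the $match can use both fields.
const logIndex = { devID: 1, logTime: 1 };

// Example window: one hour starting at the logTime shown in the question.
const oneHourMs = 60 * 60 * 1000;
const hourPipeline = levelCountsPipeline(123, 1468464358697, 1468464358697 + oneHourMs);

// In the mongo shell (collection name "log" is hypothetical):
//   db.log.createIndex(logIndex)
//   db.log.aggregate(hourPipeline)
```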
In general terms, a pre-aggregated report is a document that contains the aggregated information you require, and you update this document every time you insert into the related collection. For example, you could have a single document in a separate collection that looks like:
{log:
{devID: 123,
levelCount: [
{level: 5, count: 5175872},
{level: 1, count: 200000}
]
}}
where that document is updated with the relevant details every time you insert into the log collection.
Using a pre-aggregated report, you don't need to run the aggregation query anymore. The aggregated information you require is available from a single find() query instead.
For more examples on pre-aggregated reports, please see https://docs.mongodb.com/ecosystem/use-cases/pre-aggregated-reports/
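One way the per-insert update might look, as a sketch only. It assumes a report collection called `logReports` (hypothetical) and, to keep the update to a single `$inc`, stores the counts in a sub-document keyed by level instead of the array layout shown above:

```javascript
// Build the filter, update and options for one newly inserted log entry.
// Resulting report shape (assumed): { devID: 123, levelCount: { "5": 10, "1": 2 } }
function reportUpdate(logEntry) {
  return {
    filter: { devID: logEntry.devID },
    // $inc creates the counter field if it does not exist yet.
    update: { $inc: { ["levelCount." + logEntry.logLevel]: 1 } },
    // upsert creates the report document for a device seen for the first time.
    options: { upsert: true }
  };
}

const op = reportUpdate({ devID: 123, logLevel: 5, logTime: 1468464358697 });

// In the mongo shell, next to the insert into the log collection:
//   db.logReports.updateOne(op.filter, op.update, op.options)
// Reading the report back:
//   db.logReports.find({ devID: 123 })
```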
I have a document with an array (which should be denormalised, but can't be because the reactive events will fire "add" too many times at client startup).
I need to be able to push a document to that array, and keep it in sorted (or roughly sorted) order. I've tried this query:
{ $push: {
    'events': {
      $each: [{'id': new Mongo.ObjectID, 'start':startDate,...}],
      $sort: {'start': 1},
      $slice: -1
    }
  }
}
But it requires the $slice operator to be present... I don't want to delete all my old data, I just want to be able to insert data into an array, and then have that array be sorted so that I can query the array later and say "slice greater than or equal to time X".
Is this possible?
Edit:
This mongo aggregate query nearly works, except for one level of document in the result array, but aggregating is not reactive (probably because they're expensive computations). Here is the aggregate query if anyone can see how to translate it to a find, or why it can't be translated:
Coll.aggregate({$unwind: '$events'},
{$sort: {'events.start':1}},
{$match: {'events.start': {$gte: new Date()}}},
{$group: {_id: '$_id', 'events': {$push: '$events'} }})
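On the first part of the question: in MongoDB 2.6 and later the `$sort` modifier no longer requires `$slice`, so the array can be kept sorted without discarding anything. A minimal sketch, assuming such a server version; the helper name and the sample event are made up for illustration:

```javascript
// Build a $push update that appends one event and re-sorts the whole array.
function pushSortedEvent(event) {
  return {
    $push: {
      events: {
        $each: [event],      // $sort is only accepted together with $each
        $sort: { start: 1 }  // ascending by start; no $slice, so nothing is dropped
      }
    }
  };
}

const sortedPush = pushSortedEvent({ id: "evt-1", start: new Date(0) });

// Applied like any other update, e.g.:
//   Coll.update({ _id: docId }, sortedPush)   // docId is a placeholder
```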
I have a MongoDB document as follows:
{
"_id":1,
"name":"XYZ",
ExamScores:[
{ExamName:"Maths", UnitTest:1, Score:100},
{ExamName:"Maths", UnitTest:2, Score:80},
{ExamName:"Science", UnitTest:1, Score:90}
]
}
I need to retrieve this document so that it shows only the Maths entries of the array, as follows:
{
"_id":1,
"name":"XYZ",
ExamScores:[
{ExamName:"Maths", UnitTest:1, Score:100},
{ExamName:"Maths", UnitTest:2, Score:80}
]
}
How can I do that?
As @karin states, there is no normal in-query method of doing this.
In version 2.2 you can use $elemMatch to project the first matching result from ExamScores but you cannot get multiple.
That being said, the aggregation framework can do this:
db.col.aggregate([
{$unwind: '$ExamScores'},
{$match: {'ExamScores.ExamName':"Maths"}},
{$group: {_id: '$_id', name: {$first: '$name'}, ExamScores: {$push: '$ExamScores'}}}
])
Something like that anyway.
This has been asked before in "MongoDB query to limit values based on condition"; the only answer there says it is not possible, but that there is a request to implement it.
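On MongoDB 3.2 and later, the `$filter` aggregation operator can return all matching array elements without `$unwind` and `$group`. A minimal sketch, assuming that server version and the `col` collection name used above:

```javascript
// Keep only the ExamScores entries whose ExamName is "Maths".
const mathsOnly = [
  { $project: {
      name: 1,
      ExamScores: {
        $filter: {
          input: "$ExamScores",
          as: "score",                                   // loop variable name
          cond: { $eq: ["$$score.ExamName", "Maths"] }   // keep matching entries
        }
      }
  } }
];

// In the mongo shell:
//   db.col.aggregate(mathsOnly)
```

Unlike the `$unwind` version, this keeps documents that have no Maths entries, with an empty `ExamScores` array.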