$project multiple fields into one field - mongodb

I'm working with MongoDB 3.0 (we won't be upgrading until next year.) I have a requirement to get a list of unique values across multiple fields in a collection. The fields have the same value most of the time. This can be accomplished in version 3.2 by something like this:
db.mydata.aggregate([
{'$project': {'combined_users': ['$user1', '$user2']}},
{'$unwind': '$combined_users'},
{'$group': {_id: 1, 'combined_users': {'$addToSet': '$combined_users'}}}
])
The issue is that in version 3.0 we get "disallowed field type Array in..." at combined_users.
How do I accomplish the same thing in Mongo 3.0?

You need to use the $setUnion operator (note that it expects each operand to be an array):
db.mydata.aggregate([
{'$project': { 'combined_users': { "$setUnion": ['$user1', '$user2'] }}}
])
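If user1 and user2 hold scalar values rather than arrays, one workaround sometimes used on 3.0 is to build the array with $map over a $literal list of labels. The following is only a minimal sketch under that assumption and has not been run against a 3.0 server. The sample documents and the helper uniqueUsers are hypothetical; the helper just mirrors in plain JavaScript what the pipeline is meant to compute.

```javascript
// Pipeline sketch for MongoDB 3.0 (assumes user1 and user2 are scalars).
// $map walks a literal list of labels and picks the matching field for each.
const pipeline = [
  { $project: {
      combined_users: {
        $map: {
          input: { $literal: ['user1', 'user2'] },
          as: 'label',
          in: { $cond: [{ $eq: ['$$label', 'user1'] }, '$user1', '$user2'] }
        }
      }
  } },
  { $unwind: '$combined_users' },
  { $group: { _id: null, users: { $addToSet: '$combined_users' } } }
];
// In the mongo shell: db.mydata.aggregate(pipeline)

// Plain-JavaScript mirror of the intended result (hypothetical helper).
function uniqueUsers(docs) {
  const seen = new Set();
  for (const doc of docs) {
    seen.add(doc.user1);
    seen.add(doc.user2);
  }
  return Array.from(seen);
}

// Hypothetical sample data, not taken from the question.
const sample = [
  { user1: 'ann', user2: 'ann' },
  { user1: 'bob', user2: 'cat' }
];
console.log(uniqueUsers(sample));
```

Note that $addToSet does not guarantee any particular order for the resulting array.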

Related

Mongo Aggregation How to match an array inside lookup without using $expr

I have an aggregation pipeline query (I've removed unnecessary stuff) that works when using $expr but doesn't work without it. However, I want to avoid $expr for better performance, so that the indexes will be used. Logically there is a many-to-many relation here between rule and resource. I want to sum the cost of the resources per rule. The problem is matching the resources inside the grouped array without using an expression.
with $expr:
db.collection.aggregate([
{'$group': {'_id': {'rule_id': '$rule_id'}, 'rule_id': {'$first': '$rule_id'}, 'resources_ids': {'$push': '$resource_id'}}},
{'$lookup':
{'from': 'other_collection',
'let': {'resources_ids': '$resources_ids'},
'pipeline': [
{'$match':
{'$expr':
{'$and': [
{'$in':['$resource_id', '$$resources_ids']}
]}
}
},
{'$group': {'_id': {}, 'total_cost': {'$sum': '$cost'}}}], 'as': 'results'}}])
without $expr:
db.collection.aggregate([
{'$group': {'_id': {'rule_id': '$rule_id'}, 'rule_id': {'$first': '$rule_id'}, 'resources_ids': {'$push': '$resource_id'}}},
{'$lookup':
{'from': 'cost_data',
'let': {'resources_ids': '$resources_ids'},
'pipeline': [
{'$match':
{'$and': [
{'resource_id': {'$in': '$$resources_ids'}},
]}
},
{'$group': {'_id': {}, 'total_cost': {'$sum': '$cost'}}}], 'as': 'results'}}])
I think @rickhg12hs's comment gives the right answer. There shouldn't be any need for $expr here. The localField/foreignField syntax will work correctly when the localField is an array without needing to $unwind (or use $expr) as documented here. Therefore the matching component of your $lookup can effectively look like this:
$lookup: {
from: "foreign",
localField: "resources_ids",
foreignField: "resource_id",
as: "result"
}
You can compare the output of the syntax above with that of the more verbose pipeline/$expr version to see that they are the same.
A few other thoughts come to mind. The first is that you can combine the localField/foreignField syntax with the pipeline syntax so that the second $group can still be nested inside of the $lookup. This would make the final version of the $lookup stage structured as follows:
{
$lookup: {
from: "foreign",
localField: "resources_ids",
foreignField: "resource_id",
pipeline: [
{
"$group": {
"_id": {},
"total_cost": {
"$sum": "$cost"
}
}
}
],
as: "result"
}
}
Playground demonstration of that component is here.
The second thing is that using an index to perform the $lookups is important and will likely improve performance, but it may not make this operation "fast". As written, this aggregation will perform a full collection scan to process all of the documents in the source collection. (You will still see that COLLSCAN in the explain output from this source collection even if the index in the other collection is used for the $lookup).
Finally, the index on the other collection should probably be { resource_id: 1, cost: 1 }. This should allow the database to cover the query when doing the $lookups and avoid fetching those documents altogether.
Edit to address this comment:
$expr is not required inside the match. I've done this already. However, here because of the array I am building through the pipeline it doesn't let me use it in the match without an $expr.
This is not correct. Specifically the source of the array isn't relevant here. Whether the array is a field in the source document directly or generated in an earlier pipeline stage doesn't matter to the $lookup stage at all. In fact, it won't even know where that array comes from, just that it is a field in the generated document that is passed to it.
Rather, what you are describing is the behavior of $match itself. From the documentation:
$match takes a document that specifies the query conditions. The query syntax is identical to the read operation query syntax; i.e. $match does not accept raw aggregation expressions. Instead, use a $expr query expression to include aggregation expression in $match.
Said another way, you presently cannot reference any fields from the document (regardless of where they come from or what type and value they have) without $expr.
But that fact should mostly be irrelevant for your use case. You can use the localField/foreignField syntax for this array matching. If you need to match on additional filters then you can also leverage the let/pipeline syntax in the same $lookup. Here is an arbitrary demonstration of that (note the _id: 4 document doesn't match due to the mismatched otherVal).
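As a rough illustration of combining the two syntaxes, the stage below matches the array through localField/foreignField and applies one extra filter through let/pipeline. It is a sketch under assumptions: the combined form needs MongoDB 5.0 or later, and the otherVal field, the sample documents and the helper matchForeign are hypothetical stand-ins that only mirror the intended matching in plain JavaScript.

```javascript
// $lookup sketch: array matching via localField/foreignField plus an
// additional correlated filter on a hypothetical field named otherVal.
const lookupStage = {
  $lookup: {
    from: 'foreign',
    localField: 'resources_ids',
    foreignField: 'resource_id',
    let: { otherVal: '$otherVal' },
    pipeline: [
      { $match: { $expr: { $eq: ['$otherVal', '$$otherVal'] } } }
    ],
    as: 'result'
  }
};
// In the mongo shell, lookupStage would be placed after the $group stage.

// Plain-JavaScript mirror of which foreign documents should match.
function matchForeign(local, foreignDocs) {
  return foreignDocs.filter(f =>
    local.resources_ids.includes(f.resource_id) && f.otherVal === local.otherVal);
}

// Hypothetical sample data.
const local = { rule_id: 'r1', resources_ids: [1, 2, 4], otherVal: 'x' };
const foreign = [
  { _id: 1, resource_id: 1, otherVal: 'x', cost: 10 },
  { _id: 2, resource_id: 2, otherVal: 'x', cost: 5 },
  { _id: 3, resource_id: 3, otherVal: 'x', cost: 7 },
  { _id: 4, resource_id: 4, otherVal: 'y', cost: 9 }
];
console.log(matchForeign(local, foreign).map(f => f._id));
```

In this sample, document 3 is excluded because its resource_id is not in the array, and document 4 because its otherVal differs.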
It is also worth noting that $expr itself does not preclude the usage of indexes in general. One current exception, unfortunately, seems to be with $in (reference). Again though, that shouldn't matter for you if you place that part of the $lookup matching into the localField/foreignField parameters.

How to delete duplicates using MongoDB Aggregations in MongoDB Compass Community

I somehow created duplicates of every single entry in my database. Currently, there are 176039 documents and counting, half are duplicates. Each document is structured like so
_id : 5b41d9ccf10fcf0014fe8917
originName : "Hartsfield Jackson Atlanta International Airport"
destinationName : "Antigua"
totalDuration : 337
Inside the MongoDB Compass Community App for Mac under the Aggregations tab, I was able to find duplicates using this pipeline
[
{$group: {
_id: {originName: "$originName", destinationName: "$destinationName"},
count: {$sum: 1}}},
{$match: {count: {"$gt": 1}}}
]
I'm not sure how to move forward and delete the duplicates at this point. I'm assuming it has something to do with $out.
Edit: Something I didn't notice until now is that the values for totalDuration on each double are actually different.
Add:
{$project:{_id:0, "originName":"$_id.originName", "destinationName":"$_id.destinationName"}},
{ $out : "collectionname" }
This will replace the documents in your current collection with the documents from the aggregation pipeline. If you need totalDuration in the collection, add that field to both the $group and $project stages before running the pipeline.
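A possible shape for the full pipeline, carrying totalDuration through, is sketched below. It assumes that keeping whichever totalDuration the server encounters first is acceptable, and it writes to a hypothetical collection named flights_deduped rather than overwriting the source. The dedupe helper and the sample documents are also hypothetical and only mirror the idea in plain JavaScript.

```javascript
// Pipeline sketch: one document per (originName, destinationName) pair.
// $first keeps the first totalDuration the server sees for each pair.
const dedupePipeline = [
  { $group: {
      _id: { originName: '$originName', destinationName: '$destinationName' },
      totalDuration: { $first: '$totalDuration' }
  } },
  { $project: {
      _id: 0,
      originName: '$_id.originName',
      destinationName: '$_id.destinationName',
      totalDuration: 1
  } },
  { $out: 'flights_deduped' } // hypothetical target collection name
];

// Plain-JavaScript mirror: keep the first document seen for each pair.
function dedupe(docs) {
  const byKey = new Map();
  for (const d of docs) {
    const key = JSON.stringify([d.originName, d.destinationName]);
    if (!byKey.has(key)) {
      byKey.set(key, {
        originName: d.originName,
        destinationName: d.destinationName,
        totalDuration: d.totalDuration
      });
    }
  }
  return Array.from(byKey.values());
}

// Hypothetical sample data modelled on the document shown in the question.
const flights = [
  { originName: 'Hartsfield Jackson Atlanta International Airport', destinationName: 'Antigua', totalDuration: 337 },
  { originName: 'Hartsfield Jackson Atlanta International Airport', destinationName: 'Antigua', totalDuration: 340 },
  { originName: 'Hartsfield Jackson Atlanta International Airport', destinationName: 'Aruba', totalDuration: 250 }
];
console.log(dedupe(flights).length);
```

Writing to a separate collection first makes it possible to inspect the result before dropping or renaming anything.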

Aggregate method in MongoDB Compass?

As stated in the title, I'm having some problems querying from MongoDB Compass using the aggregate method. I have a collection of documents in this form:
{"Array":[{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},{"field":"val","field2":"val2"},...]}
Using the mongo shell or Studio 3T, I query it with the aggregate method, for example:
db.collection.aggregate([
{ $match: {"Array.field": "val"}},
{ $unwind: "$Array"},
{ $match: {"Array.field": "val"}},
{ $group: {_id: null, count: {$sum:NumberInt(1)}, Array: {$push: "$Array"}}},
{ $project: {"N. Hits": "$count", Array:1}}
])
Here I look for the elements of Array whose field value is "val" and count them. This works perfectly, but I don't know how to do the same in MongoDB Compass.
In the query bar I have 'filter', 'project' and 'sort' and I can run the usual queries, but I don't know how to use the aggregate method.
Thanks
You are looking at the Documents tab, which is restricted to querying documents.
Take a look in the second tab called Aggregations where you can do your aggregation pipelines, as usual.
For further information please visit the Aggregation Pipeline Builder documentation.

Mongodb aggregation sort on array primitive

Given documents like
{
...
name:'whatever',
games: [122, 199, 201, 222]
}
db.col.aggregate({$match:{}},
{$sort:{'games.0': -1}})
This doesn't sort. There are no errors; it just doesn't sort on the first element of the games array.
A query with the same syntax, however, works fine:
col.find({}).sort({'games.0':-1})
if I change the collection so games is an array of objects like
[ {game1:198}, {game2:201} ...]
then the aggregation works using
{$sort:{'games.game1': -1}})
what am I missing to get this to work with an array of numbers?
Try unwinding the array first by applying the $unwind operator on the array, then use $sort on the deconstructed array and finally use $group to get the original documents structure:
db.coll.aggregate([
{"$unwind": "$games"},
{"$sort": {"games": 1}},
{
"$group": {
"_id": "$_id",
"name": {"$first": "$name"},
"games": {"$push": "$games"}
}
}
])
Try this:
db.coll.aggregate([
{"$unwind": "$games"},
{"$sort": {"games": -1}}
])
I hope this will work for you as you expected.
In MongoDB 3.4, a find sort such as db.col.find({}).sort({'games.0':-1}) works as expected, whereas the aggregation sort doesn't.
In MongoDB 3.6, both find and aggregation sorts work as expected.
Jira issue for that: https://jira.mongodb.org/browse/SERVER-19402
I would recommend updating your MongoDB version, after which your aggregation query will work fine.
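If upgrading is not an option, one possible workaround is to copy the first array element into its own field and sort on that. The sketch below assumes a server that supports $arrayElemAt (3.2 or later); the field name firstGame, the helper sortByFirstGame and the sample documents are hypothetical, and the helper only mirrors the intended ordering in plain JavaScript.

```javascript
// Pipeline sketch: expose games[0] as a scalar field, then sort on it.
const sortPipeline = [
  { $project: { name: 1, games: 1, firstGame: { $arrayElemAt: ['$games', 0] } } },
  { $sort: { firstGame: -1 } }
];
// In the mongo shell: db.col.aggregate(sortPipeline)

// Plain-JavaScript mirror: descending order by the first array element.
function sortByFirstGame(docs) {
  return docs.slice().sort((a, b) => b.games[0] - a.games[0]);
}

// Hypothetical sample data.
const players = [
  { name: 'a', games: [122, 199] },
  { name: 'b', games: [300, 1] },
  { name: 'c', games: [150] }
];
console.log(sortByFirstGame(players).map(d => d.name));
```

Unlike the $unwind approach, this keeps one output document per input document and leaves the games array untouched.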

Retrieve Array Of Documents in MongoDB

I have a MongoDB document as follows:
{
"_id":1,
"name":"XYZ",
ExamScores:[
{ExamName:"Maths", UnitTest:1, Score:100},
{ExamName:"Maths", UnitTest:2, Score:80},
{ExamName:"Science", UnitTest:1, Score:90}
]
}
I need to retrieve this document so that it shows only the Maths entries of the array, as follows:
{
"_id":1,
"name":"XYZ",
ExamScores:[
{ExamName:"Maths", UnitTest:1, Score:100},
{ExamName:"Maths", UnitTest:2, Score:80},
]
}
How can I do that?
As @karin states, there is no normal in-query method of doing this.
In version 2.2 you can use $elemMatch to project the first matching element of ExamScores, but you cannot get multiple.
That being said, the aggregation framework can do this:
db.col.aggregate([
{$unwind: '$ExamScores'},
{$match: {'ExamScores.ExamName':"Maths"}},
{$group: {_id: '$_id', name: {$first: '$name'}, ExamScores: {$push: '$ExamScores'}}}
])
Something like that anyway.
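On newer servers the same result can probably be reached without $unwind and $group by using the $filter operator inside $project. This is a sketch that assumes MongoDB 3.2 or later, which is newer than the versions discussed in this answer; the helper keepExam is hypothetical and only mirrors the filtering in plain JavaScript.

```javascript
// Pipeline sketch: keep only the ExamScores entries whose ExamName is "Maths".
const filterPipeline = [
  { $project: {
      name: 1,
      ExamScores: {
        $filter: {
          input: '$ExamScores',
          as: 'score',
          cond: { $eq: ['$$score.ExamName', 'Maths'] }
        }
      }
  } }
];
// In the mongo shell: db.col.aggregate(filterPipeline)

// Plain-JavaScript mirror of the filtering (hypothetical helper).
function keepExam(doc, examName) {
  return Object.assign({}, doc, {
    ExamScores: doc.ExamScores.filter(s => s.ExamName === examName)
  });
}

// The document from the question.
const student = {
  _id: 1,
  name: 'XYZ',
  ExamScores: [
    { ExamName: 'Maths', UnitTest: 1, Score: 100 },
    { ExamName: 'Maths', UnitTest: 2, Score: 80 },
    { ExamName: 'Science', UnitTest: 1, Score: 90 }
  ]
};
console.log(keepExam(student, 'Maths').ExamScores.length);
```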
This has been asked before: "MongoDB query to limit values based on condition". The only answer there says it is not possible, but that there is a request to implement it.