Mongo group and push: pushing all fields - mongodb

Is there an easy way to "$push" all fields of a document?
For example:
Say I have a Mongo collection of books:
{author: "tolstoy", title:"war & peace", price:100, pages:800}
{author: "tolstoy", title:"Ivan Ilyich", price:50, pages:100}
I'd like to group them by author - for each author, list his entire book objects:
{ author: "tolstoy",
books: [
{author: "tolstoy", title:"war & peace", price:100, pages:800},
{author: "tolstoy", title:"Ivan Ilyich", price:50, pages:100}
]
}
I can achieve this by explicitly pushing all fields:
{$group: {
_id: "$author",
books:{$push: {author:"$author", title:"$title", price:"$price", pages:"$pages"}},
}}
But is there any shortcut, something along the lines of:
// Fictional syntax...
{$group: {
_id: "$author",
books:{$push: "$.*"},
}}

You can use $$ROOT
{ $group : {
_id : "$author",
books: { $push : "$$ROOT" }
}}
Found here: how to use mongodb aggregate and retrieve entire documents
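To make the effect of $push with "$$ROOT" concrete, here is a plain JavaScript sketch that mimics the stage on an in-memory array. The helper name groupPushRoot is hypothetical and not part of MongoDB; it only illustrates that every whole input document is appended to its group's array.

```javascript
// Hypothetical in-memory emulation of:
//   { $group: { _id: "$author", books: { $push: "$$ROOT" } } }
function groupPushRoot(docs, keyField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) {
      groups.set(key, { _id: key, books: [] });
    }
    // "$$ROOT" refers to the whole document currently being processed.
    groups.get(key).books.push(doc);
  }
  return Array.from(groups.values());
}

const books = [
  { author: "tolstoy", title: "war & peace", price: 100, pages: 800 },
  { author: "tolstoy", title: "Ivan Ilyich", price: 50, pages: 100 },
];

const grouped = groupPushRoot(books, "author");
console.log(JSON.stringify(grouped, null, 2));
```

In the real pipeline the pushed documents also carry their own _id field, since $$ROOT is the complete stored document.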

Actually, you can't achieve what you are describing at all; you need $unwind:
db.collection.aggregate([
{$unwind: "$books"},
{$group: {
_id: "$author",
books:{$push: {
author:"$books.author",
title:"$books.title",
price:"$books.price",
pages:"$books.pages"
}},
}}
])
That is how you deal with arrays in aggregation.
The shortcut you are looking for, one that avoids typing all of the fields, does not exist yet.
Even if it did, you could not use it here, because you are in effect reshaping the document.

If the problem is that you don't want to write out all the fields explicitly (your documents have many fields and you need all of them in the result), you could also try Map-Reduce:
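db.books.mapReduce(
// Emit the same shape that reduce returns, so authors with a single book also work
// (reduce is not called for keys that have only one value).
function () { emit(this.author, { books: [this] }); },
function (key, values) {
var books = [];
values.forEach(function (v) { books = books.concat(v.books); });
return { books: books };
},
{
out: { inline: 1 },
finalize: function (key, reducedVal) { return reducedVal.books; }
}
)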

Related

How to filter array (of objects) inside one document in mongo db based on some condition

I have the below docs collection structure.
I'm able to filter the documents with various approaches, but I'm not able to filter the array inside the documents.
{
"_id": "",
"employee": {
"EmployeeAttributeValues": {
"EmployeeAttributeValue": [
{.....
},
{.....
},
{.....
},
{.....
}
]
}
}
}
Kindly help me with how to filter the EmployeeAttributeValue array based on some condition.
You can use the $where operator for custom filtering:
https://docs.mongodb.com/v4.2/reference/operator/query/where/
db.test.aggregate([
{ $match: {_id: <ID>}},
{ $unwind: '$<ARRAY>'},
{ $match: {'<ARRAY>.a': {$gt: 3}}},
{ $group: {_id: '$_id', list: {$push: '$<ARRAY>.a'}}}
])
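The pipeline above keeps only the array elements that pass the second $match. As a rough illustration, the same unwind / match / group logic applied to a single in-memory document looks like this; the field name items and the sample values are assumptions standing in for <ARRAY>.

```javascript
// Sample document; `items` is a hypothetical stand-in for <ARRAY>.
const doc = { _id: 1, items: [{ a: 1 }, { a: 4 }, { a: 7 }] };

// $unwind: one record per array element, each carrying the parent _id.
const unwound = doc.items.map((item) => ({ _id: doc._id, items: item }));

// $match: keep only records whose element satisfies a > 3.
const matched = unwound.filter((rec) => rec.items.a > 3);

// $group by _id with $push of '$items.a'.
const result = { _id: doc._id, list: matched.map((rec) => rec.items.a) };

console.log(result);
```

On MongoDB 3.2 or later, the $filter aggregation operator is another option that avoids $unwind; check the documentation for your server version.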

How to get all subdocuments _id into variable

I'm trying to get the _ids of the families subdocuments into a variable.
Here my schema:
families: [
{
_id: {
type: mongoose.Types.ObjectId
},
name: {
type: String
},
relation: {
type: String
}
}
]
The problem is that I can get the parent's _id into a variable, but when I try to get the families _ids, the console log shows undefined.
What is the proper query to get the families subdocument _ids into a variable?
Please try this:
db.yourCollection.aggregate([
{ $unwind: '$families' },
{ $project: { Ids: '$families._id' } },
{ $group: { '_id': '$_id', subDocumentsIDs: { $push: '$Ids' } } }
])
Output:
/* 1 */
{
"_id" : ObjectId("5d58d3205a0d22d3c85d16f1"),
"subDocumentsIDs" : [
ObjectId("5d570b350e2fb4f72533d512"),
ObjectId("5d570b350e2fb4f71533d510"),
ObjectId("5d570b350e2fb4172533d511")
]
}
/* 2 */
{
"_id" : ObjectId("5d58d3105a0d22d3c85d1591"),
"subDocumentsIDs" : [
ObjectId("5d570b350e2fb4f72533d312"),
ObjectId("5d570b350e2fb4f71533d310"),
ObjectId("5d570b350e2fb4172533d311")
]
}
Please consider this a basic example and enhance it as needed. An early $unwind stage can hurt performance on a large collection, but you can avoid that by making $match the first stage: since you said you can already get the parent _id, use it in $match to filter the documents before unwinding.
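If the parent document has already been loaded in application code, the ids can also be collected without a pipeline. A minimal sketch, assuming parent is a plain object shaped like the schema above; the sample ids and names are made-up strings, not real ObjectIds:

```javascript
// Made-up parent document following the schema in the question.
const parent = {
  _id: "p1",
  families: [
    { _id: "f1", name: "Ann", relation: "mother" },
    { _id: "f2", name: "Bob", relation: "father" },
  ],
};

// Guard against a missing array, then pick out each subdocument's _id.
const familyIds = (parent.families || []).map((member) => member._id);

console.log(familyIds);
```

A likely cause of the undefined in the question is reading parent.families._id directly: families is an array, so the ids live on its elements rather than on the array itself.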

mongodb find matches based on count aggregation

I have a mongodb collection like this:
{"uid": "01370mask4",
"title": "hidden",
"post": "hidden",
"postTime": "01-23, 2016",
"unixPostTime": "1453538601",
"upvote": [2, 3]}
and I'd like to select post records from the users with more than 5 posts. The structure should be the same; I just don't need the documents from users who don't have many posts.
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } }
]
)
Now I'm stuck at how to use the count values to find. I searched but didn't find any methods to add the count values back to the same collection by uid. Saving the aggregation output and joining them together seems not supported by mongodb. Please advise, thanks!
Update:
Sorry that I didn't make it clear earlier. Thanks for your prompt answers! I want a subset of the original collection, with post text, post timestamp, etc. I don't want a subset of the aggregation output.
If there aren't millions of documents, then you can try a shortcut way to achieve what you are trying using one aggregate and another find query,
Aggregate query:
var users = db.collection.aggregate(
[
{$group:{_id:'$uid', count:{$sum:1}}},
{$match:{count:{$gt:5}}},
{$group:{_id:null,users:{$push:'$_id'}}}
]
).toArray()[0]['users']
Then it's a straightforward query to find the posts of those users:
db.collection.find({uid: {$in: users}})
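The two-step idea (count per uid, then fetch the original documents for the qualifying uids) can be sketched on an in-memory array as follows. The sample posts are made up, and only the uid field matters for the counting step.

```javascript
// Made-up sample data: one user with six posts, one with a single post.
const posts = [];
for (let i = 0; i < 6; i++) {
  posts.push({ uid: "busy", title: "post " + i });
}
posts.push({ uid: "quiet", title: "only post" });

// Step 1 ($group + $match): count posts per uid, keep uids with count > 5.
const counts = new Map();
for (const post of posts) {
  counts.set(post.uid, (counts.get(post.uid) || 0) + 1);
}
const users = Array.from(counts)
  .filter(([, count]) => count > 5)
  .map(([uid]) => uid);

// Step 2 (find with $in): return the original documents for those uids.
const selected = posts.filter((post) => users.includes(post.uid));

console.log(users, selected.length);
```

Note that .toArray()[0]['users'] in the shell version throws if no user passes the $match, because the result array is then empty; a guard for that case is left out of this sketch.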
Just add the $match after your group with the correct query and it works :
db.collection.aggregate(
[
{ $group : { _id : "$uid", count: { $sum: 1 } } },
{ $match : { count : { $gt : 5 } } }
]
)
Please try this one to select users with more than 5 posts. It keeps the original fields by using $first; if the uid is unique, add each field as below.
db.collection.aggregate([
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
])
If there are multiple values for the same uid, you should $push them into an array in the $group instead.
If you want to save the result to the db, try the following:
var cur = db.collection.aggregate(
[
{$group: {
_id: '$uid',
title: {$first: '$title'},
post: {$first:'$post'},
postTime:{$first: '$postTime'},
unixPostTime:{$first: '$unixPostTime'},
upvote:{$first: '$upvote'},
count: {$sum: 1}
}},
{$match: {count: {$gt: 5}}}
]
)
cur.forEach(function(doc) {
// doc._id holds the uid from the $group stage, so match on uid rather than _id
db.collection.update({uid: doc._id}, {/* the fields to be updated */});
});

MongoDB: count the number of items in an array

I have a collection where every document in the collection has an array named foo that contains a set of embedded documents. Is there currently a trivial way in the MongoDB shell to count how many instances are within foo? something like:
db.mycollection.foos.count() or db.mycollection.foos.size()?
Each document in the array needs to have a unique foo_id and I want to do a quick count to make sure that the right amount of elements are inside of an array for a random document in the collection.
In MongoDB 2.6, the Aggregation Framework has a new array $size operator you can use:
> db.mycollection.insert({'foo':[1,2,3,4]})
> db.mycollection.insert({'foo':[5,6,7]})
> db.mycollection.aggregate([{$project: { count: { $size:"$foo" }}}])
{ "_id" : ObjectId("5314b5c360477752b449eedf"), "count" : 4 }
{ "_id" : ObjectId("5314b5c860477752b449eee0"), "count" : 3 }
If you are on a recent version of MongoDB (2.2 or later), you can use the aggregation framework:
db.mycollection.aggregate([
{$unwind: '$foo'},
{$group: {_id: '$_id', 'sum': { $sum: 1}}},
{$group: {_id: null, total_sum: {'$sum': '$sum'}}}
])
which will give you the total foos of your collection.
Omitting the last group will aggregate results per record.
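For intuition, the per-document count and the collection-wide total correspond to the following plain JavaScript on an in-memory copy of the two sample documents inserted above. This is only a sketch of what the pipelines compute, not how the server evaluates them.

```javascript
// The two sample documents from the $size example, without their _id.
const docs = [{ foo: [1, 2, 3, 4] }, { foo: [5, 6, 7] }];

// Per-document count, like { $project: { count: { $size: "$foo" } } }.
const perDoc = docs.map((d) => d.foo.length);

// Collection-wide total, like the final $group with $sum.
const total = perDoc.reduce((sum, n) => sum + n, 0);

console.log(perDoc, total);
```

Unlike this sketch, $size raises an error when the field is missing or is not an array, so documents without foo need to be filtered out or given a default first.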
Using Projections and Groups
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo_count:{$size:"$foo"},
}
},
{
$group: {
_id: null,
foo_total:{$sum:"$foo_count"},
}
}
]
)
Multiple child array counts can also be calculated this way
db.mycollection.aggregate(
[
{
$project: {
_id:0,
foo1_count:{$size:"$foo1"},
foo2_count:{$size:"$foo2"},
}
},
{
$group: {
_id: null,
foo1_total:{$sum:"$foo1_count"},
foo2_total:{$sum:"$foo2_count"},
}
}
]
)

mongodb aggregation framework group + project

I have the following issue:
This query returns one result, which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
"result" : [
{
"_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
"version" : 1.2000000000000002
}
],
"ok" : 1
}
This query (I just added a projection so I can later query for the entire document) returns multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
"result" : [
{
"_id" : ObjectId("5139310a3899d457ee000003")
},
{
"_id" : ObjectId("513931053899d457ee000002")
},
{
"_id" : ObjectId("513930fd3899d457ee000001")
}
],
"ok" : 1
}
Found the answer.
1. First I need to get all the _ids:
db.items.aggregate( [
{ '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
{ '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. Then I need to query the documents:
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
(I know it's late, but I'm still answering so that other people don't have to search for the right answer somewhere else.)
See Deka's answer; it will do the job.
Not all accumulators are available in $project stage. We need to consider what we can do in project with respect to accumulators and what we can do in group. Let's take a look at this:
db.companies.aggregate([{
$match: {
funding_rounds: {
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
funding: {
$push: {
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
}
}
}
}, ]).pretty()
The $match stage keeps only companies whose funding_rounds array is not empty. The array is then unwound, so $sort and the later stages see one document per element of the funding_rounds array for every company. The first thing we do with those documents is $sort them by:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the $group stage we group by company name and build the array with $push. $push must appear as the value of a field we name in the $group stage, and it accepts any valid expression. Here we push documents built from raised_amount and funded_year; each one is appended to the end of the array being accumulated. The $group stage therefore emits a stream of documents whose _id holds the company name.
Notice that $push is available in $group stages but not in $project stage. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.
$project on the other hand, works with one document at a time. So, we can calculate an average on an array within an individual document inside a project stage. But doing something like this where one at a time, we're seeing documents and for every document, it passes through the group stage pushing on a new value, well that's something that the $project stage is just not designed to do. For that type of operation we want to use $group.
Let's take a look at another example:
db.companies.aggregate([{
$match: {
funding_rounds: {
$exists: true,
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
first_round: {
$first: "$funding_rounds"
},
last_round: {
$last: "$funding_rounds"
},
num_rounds: {
$sum: 1
},
total_raised: {
$sum: "$funding_rounds.raised_amount"
}
}
}, {
$project: {
_id: 0,
company: "$_id.company",
first_round: {
amount: "$first_round.raised_amount",
article: "$first_round.source_url",
year: "$first_round.funded_year"
},
last_round: {
amount: "$last_round.raised_amount",
article: "$last_round.source_url",
year: "$last_round.funded_year"
},
num_rounds: 1,
total_raised: 1,
}
}, {
$sort: {
total_raised: -1
}
}]).pretty()
In the $group stage we use the $first and $last accumulators. As with $push, they cannot be used as accumulators in a $project stage, because $project stages are not designed to accumulate values across multiple documents; they reshape documents one at a time. The total number of rounds is calculated with the $sum operator: the value 1 adds one for every document grouped under a given _id value, which simply counts them. The $project stage may look complex, but it only tidies the output: it reshapes first_round and last_round and carries over num_rounds and total_raised from the $group output.
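To see why these accumulators need a stream of documents, here is a plain JavaScript sketch of the $sort and $group steps. The company name and round values are made up, and each record stands for one already-unwound funding round.

```javascript
// Made-up, already-unwound records: one per funding round.
const rounds = [
  { name: "Acme", round: { funded_year: 2010, raised_amount: 5 } },
  { name: "Acme", round: { funded_year: 2008, raised_amount: 1 } },
  { name: "Acme", round: { funded_year: 2012, raised_amount: 20 } },
];

// $sort: oldest round first (copy so the input is not mutated).
const sorted = [...rounds].sort(
  (a, b) => a.round.funded_year - b.round.funded_year
);

// $group: one accumulator state per company, updated by each incoming record.
const byCompany = new Map();
for (const rec of sorted) {
  let acc = byCompany.get(rec.name);
  if (!acc) {
    // $first: set once, from the first record seen for this company.
    acc = { first_round: rec.round, last_round: rec.round, num_rounds: 0, total_raised: 0 };
    byCompany.set(rec.name, acc);
  }
  acc.last_round = rec.round; // $last: overwritten by every record
  acc.num_rounds += 1; // { $sum: 1 }
  acc.total_raised += rec.round.raised_amount; // { $sum: "$funding_rounds.raised_amount" }
}

const acme = byCompany.get("Acme");
console.log(acme);
```

Each accumulator carries state from one record to the next, which is exactly what a per-document stage such as $project has no place to keep.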