How to query certain elements of an array of objects? (mongodb)

Say I have a MongoDB collection with records as follows:
{
email: "person1#gmail.com",
plans: [
{planName: "plan1", dataValue: 100},
{planName: "plan2", dataValue: 50}
]
},
{
email: "person2#gmail.com",
plans: [
{planName: "plan3", dataValue: 25},
{planName: "plan4", dataValue: 12.5}
]
}
I want a query that returns only the dataValue where the email is "person1#gmail.com" and the planName is "plan1". How would I approach this?

You can accomplish this using the Aggregation Pipeline.
The pipeline may look like this:
db.collection.aggregate([
{ $match: { "email" :"person1#gmail.com", "plans.planName": "plan1" }},
{ $unwind: "$plans" },
{ $match: { "plans.planName": "plan1" }},
{ $project: { "_id": 0, "dataValue": "$plans.dataValue" }}
])
The first $match stage retrieves documents where the email field equals person1#gmail.com and at least one element of the plans array has a planName of plan1.
The second stage, $unwind, outputs one document per element of the plans array. In each output document the plans field is no longer an array but a single plan object.
The third stage, another $match, filters the unwound documents down to those whose plans.planName is plan1. Finally, the $project stage excludes the _id field and projects a single dataValue field taken from plans.dataValue.
Note that with this approach, if the email field is not unique, you may get multiple documents, each consisting of just a dataValue field.
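To make the four stages concrete, here is a minimal plain-JavaScript sketch that mimics them on in-memory copies of the question's documents. It is not MongoDB code: the helper name dataValuesFor and the people array are made up for illustration, and array methods stand in for the pipeline stages.

```javascript
// In-memory sample data mirroring the question's documents (illustrative only).
const people = [
  { email: "person1#gmail.com", plans: [
    { planName: "plan1", dataValue: 100 },
    { planName: "plan2", dataValue: 50 } ] },
  { email: "person2#gmail.com", plans: [
    { planName: "plan3", dataValue: 25 },
    { planName: "plan4", dataValue: 12.5 } ] }
];

// Hypothetical helper: mimics $match -> $unwind -> $match -> $project.
function dataValuesFor(docs, email, planName) {
  return docs
    // first $match: email matches and at least one plan has the wanted name
    .filter(d => d.email === email && d.plans.some(p => p.planName === planName))
    // $unwind: one document per element of plans, with plans now a single object
    .flatMap(d => d.plans.map(p => ({ ...d, plans: p })))
    // second $match: keep only the unwound documents for the wanted plan
    .filter(d => d.plans.planName === planName)
    // $project: keep just dataValue
    .map(d => ({ dataValue: d.plans.dataValue }));
}

const planResult = dataValuesFor(people, "person1#gmail.com", "plan1");
console.log(planResult); // [ { dataValue: 100 } ]
```

Dropping the second filter would return both of person1's plans, which is why the pipeline needs a $match after $unwind as well as before it.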

Related

Get count of a value of a subdocument inside an array with mongoose

I have Collection of documents with id and contact. Contact is an array which contains subdocuments.
I am trying to get the count of contacts where isActive = Y. I also need to filter the collection by id. In SQL-like terms, the query would be something like
Select Count(contact.isActive=Y) where _id = '601ad0227b25254647823713'
I am using MongoDB and Mongoose for the first time. Please edit the question if I have not explained it properly.
You can use an aggregation pipeline like this:
First, $match to select only the document with the desired _id.
Then $unwind to get one document per element of the contact array.
$match again to keep only the elements whose isActive value is Y.
Finally, $group adds one for each remaining document (i.e. it counts the contacts with isActive = Y). The count is stored in the field total.
db.collection.aggregate([
{
"$match": {"id": 1}
},
{
"$unwind": "$contact"
},
{
"$match": {"contact.isActive": "Y"}
},
{
"$group": {
"_id": "$id",
"total": {"$sum": 1}
}
}
])
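As a rough illustration of what this pipeline computes, the following plain-JavaScript sketch runs the same steps over a made-up in-memory array. The sample data and the helper name countActive are assumptions for the demo; it does not talk to MongoDB or Mongoose.

```javascript
// Made-up sample documents; "id" mirrors the field used in the pipeline above.
const contactDocs = [
  { id: 1, contact: [ { isActive: "Y" }, { isActive: "N" }, { isActive: "Y" } ] },
  { id: 2, contact: [ { isActive: "Y" } ] },
  { id: 3, contact: [ { isActive: "N" } ] }
];

// Hypothetical helper: mimics $match -> $unwind -> $match -> $group.
function countActive(docs, id) {
  const groups = new Map();
  docs
    .filter(d => d.id === id)                                      // $match on id
    .flatMap(d => d.contact.map(c => ({ id: d.id, contact: c })))  // $unwind
    .filter(d => d.contact.isActive === "Y")                       // $match on isActive
    .forEach(d => groups.set(d.id, (groups.get(d.id) || 0) + 1));  // $group with $sum: 1
  return [...groups].map(([_id, total]) => ({ _id, total }));
}

const activeCounts = countActive(contactDocs, 1);
console.log(activeCounts); // [ { _id: 1, total: 2 } ]
```

One consequence worth noticing: when no contact is active (id 3 here), the result is an empty list rather than a document with total 0, because nothing survives the second filter to be grouped.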

Trying to select single documents from mongo collection

We have a rudimentary versioning system in a collection that uses a field (pageId) as a root key. Subsequent versions of this page have the same pageId. This allows us to very easily find all versions of a single page.
How do I go about running a query that returns only the most recently modified document (by lastModified) for each distinct pageId?
In pseudo-code you could say:
For each distinct pageId
sort documents based on lastModified descending
and return only the first document
You can use the aggregation pipelines for that.
$sort - Sorts all input documents and returns them to the pipeline in sorted order.
$group - Groups documents by some specified expression and outputs to the next stage a document for each distinct grouping.
$first - Returns the value that results from applying an expression to the first document in a group of documents that share the same group by key.
Example:
db.getCollection('t01').aggregate([
{
$sort: {'lastModified': -1}
},
{
$group: {
_id: "$pageId",
element1: { $first: "$element1" },
element2: { $first: "$element2" },
elementN: { $first: "$elementN" },
}
}
]);
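The sort-then-take-first idea can be sketched in plain JavaScript as follows. The pages array and the helper latestPerPage are invented for illustration, and lastModified is a plain number here instead of a date; the point is only to show why sorting before grouping makes $first pick the newest version.

```javascript
// Made-up versions of two pages; lastModified is a plain number for simplicity.
const pages = [
  { pageId: "a", lastModified: 1, title: "a v1" },
  { pageId: "a", lastModified: 3, title: "a v3" },
  { pageId: "b", lastModified: 2, title: "b v1" },
  { pageId: "a", lastModified: 2, title: "a v2" }
];

// Hypothetical helper: mimics $sort (descending) followed by $group with $first.
function latestPerPage(docs) {
  const firstSeen = new Map();
  [...docs] // copy so the caller's array is not reordered
    .sort((x, y) => y.lastModified - x.lastModified) // $sort: { lastModified: -1 }
    .forEach(d => {
      // $group by pageId with $first: only the first document per key is kept
      if (!firstSeen.has(d.pageId)) firstSeen.set(d.pageId, d);
    });
  return [...firstSeen.values()];
}

const latest = latestPerPage(pages);
console.log(latest.map(d => d.title));
```

This sketch keeps the whole document per group, which corresponds to using { $first: "$$ROOT" } in the $group stage instead of listing each field.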

Accelerate mongo update within two collections

I have a Payment collection with a playerId field, which is the _id key of the Person collection. I need to compute, once, each person's maximum payment and save the value to that person's document. This is how I do it now:
db.Person.find().forEach( function(person) {
var cursor = db.Payment.aggregate([
{$match: {playerId: person._id}},
{$group: {
_id:"$playerId",
maxp: {$max:"$amount"}
}}
]);
var maxPay = 0;
if (cursor.hasNext()) {
maxPay = cursor.next().maxp;
}
person.maxPay = maxPay;
db.Person.save(person);
});
I suppose computing maxPay over the Payment collection once for all persons should be faster, but I don't know how to write that in code. Could you help me please?
You can run a single aggregation pipeline that starts with a $lookup stage to do a "left join" on the Payment collection. This is necessary in order to get the data from the right collection (Payment) embedded within the resulting documents as an array called payments.
The following $unwind stage deconstructs the embedded payments array, i.e. it generates a new record for each element of the payments field. It basically flattens the data, which is useful for the next $group stage.
In this $group stage, you calculate your desired aggregates by applying the accumulator expression(s). If, for instance, your Person schema has other fields you wish to retain, then the $first accumulator operator should suffice in addition to the $max operator for the extra maxPay field.
UPDATE
Unfortunately, there is no operator to "include all fields" in the $group aggregation pipeline operation. This is because the $group pipeline step is mostly used to group and calculate/aggregate data from collection fields (sum, avg, etc.) and returning all the collection's fields is not the pipeline's intended purpose. The group pipeline operator is similar to the SQL's GROUP BY clause where you can't use GROUP BY unless you use any of the aggregation functions (accumulator operators in MongoDB). The same way, if you need to retain most fields, you have to use an aggregation function in MongoDB as well. In this case, you have to apply $first to each field you want to keep.
You can also use the $$ROOT system variable which references the root document. Keep all fields of this document in a field within the $group pipeline, for example:
{
"$group": {
"_id": "$_id",
"maxPay": { "$max": "$payments.amount" },
"doc": { "$first": "$$ROOT" }
}
}
The drawback with this approach is that you would need a further $project stage to reshape the fields so that they match the original schema, because the documents coming out of $group will have only three fields: _id, maxPay and the embedded doc field.
The final pipeline stage, $out, writes the resulting documents of the aggregation pipeline to the same collection, akin to updating the Person collection by atomically replacing the existing collection with the new results collection. Because the whole collection is replaced, any Person field that is not carried through the $group stage is lost. The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection:
db.Person.aggregate([
{
"$lookup": {
"from": "Payment",
"localField": "_id",
"foreignField": "playerId",
"as": "payments"
}
},
{ "$unwind": {
"path": "$payments",
"preserveNullAndEmptyArrays": true
} },
{
"$group": {
"_id": "$_id",
"maxPay": { "$max": "$payments.amount" },
/* extra fields for demo purposes
"firstName": { "$first": "$firstName" },
"lastName": { "$first": "$lastName" }
*/
}
},
{ "$out": "Person" }
])
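For intuition about what the pipeline produces, here is a small in-memory sketch in plain JavaScript. The sample persons and payments and the helper withMaxPay are hypothetical; the code only imitates the join and the maximum, it does not write anything back the way $out does.

```javascript
// Made-up sample data standing in for the Person and Payment collections.
const persons = [ { _id: 1, firstName: "Ann" }, { _id: 2, firstName: "Bob" } ];
const payments = [
  { playerId: 1, amount: 10 },
  { playerId: 1, amount: 40 },
  { playerId: 1, amount: 25 }
];

// Hypothetical helper: mimics $lookup + $unwind + $group with $max.
function withMaxPay(personDocs, paymentDocs) {
  return personDocs.map(person => {
    // $lookup: collect the amounts of this person's payments
    const amounts = paymentDocs
      .filter(p => p.playerId === person._id)
      .map(p => p.amount);
    // $max over the joined payments; null when the person has none
    const maxPay = amounts.length ? Math.max(...amounts) : null;
    // Spreading person plays the role of $first on every field to keep
    return { ...person, maxPay };
  });
}

const updated = withMaxPay(persons, payments);
console.log(updated);
```

A person without payments ends up with maxPay set to null here, whereas the original forEach loop defaulted to 0; if 0 is required, the value needs to be replaced explicitly.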

Can I use populate before aggregate in mongoose?

I have two models, one is user
userSchema = new Schema({
userID: String,
age: Number
});
and the other is the score recorded several times everyday for all users
ScoreSchema = new Schema({
userID: {type: String, ref: 'User'},
score: Number,
created_date: Date,
....
})
I would like to run some queries/calculations on the scores of users who meet a specific requirement. For example, I would like to calculate the average score, day by day, for all users older than 20.
My thought is to first populate Scores with the users' ages and then run the aggregate.
Something like
Score.
populate('userID','age').
aggregate([
{$match: {'userID.age': {$gt: 20}}},
{$group: ...},
{$group: ...}
], function(err, data){});
Is it OK to use populate before aggregate? Or should I first find all the userIDs meeting the requirement, save them in an array, and then use $in to match the score documents?
No you cannot call .populate() before .aggregate(), and there is a very good reason why you cannot. But there are different approaches you can take.
The .populate() method works "client side" where the underlying code actually performs additional queries ( or more accurately an $in query ) to "lookup" the specified element(s) from the referenced collection.
In contrast .aggregate() is a "server side" operation, so you basically cannot manipulate content "client side", and then have that data available to the aggregation pipeline stages later. It all needs to be present in the collection you are operating on.
A better approach here is available with MongoDB 3.2 and later, via the $lookup aggregation pipeline operation. Also probably best to handle from the User collection in this case in order to narrow down the selection:
User.aggregate(
[
// Filter first
{ "$match": {
"age": { "$gt": 20 }
}},
// Then join
{ "$lookup": {
"from": "scores",
"localField": "userID",
"foreignField": "userID",
"as": "score"
}},
// More stages
],
function(err,results) {
}
)
This is basically going to include a new field "score" within the User object as an "array" of items that matched on "lookup" to the other collection:
{
"userID": "abc",
"age": 21,
"score": [{
"userID": "abc",
"score": 42,
// other fields
}]
}
The result is always an array, as the general expected usage is a "left join" of a possible "one to many" relationship. If no result is matched then it is just an empty array.
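The "left join" behaviour can be illustrated with a short plain-JavaScript sketch over in-memory arrays. The sample users and scores and the helper lookupScores are assumptions for the demo, not Mongoose or MongoDB calls.

```javascript
// Made-up sample data standing in for the users and scores collections.
const users = [
  { userID: "abc", age: 21 },
  { userID: "def", age: 35 },
  { userID: "ghi", age: 18 }
];
const scores = [
  { userID: "abc", score: 42 },
  { userID: "abc", score: 17 }
];

// Hypothetical helper: mimics $match on age followed by $lookup into scores.
function lookupScores(userDocs, scoreDocs) {
  return userDocs
    .filter(u => u.age > 20) // $match: { age: { $gt: 20 } }
    .map(u => ({
      ...u,
      // $lookup: every score whose userID equals this user's userID
      score: scoreDocs.filter(s => s.userID === u.userID)
    }));
}

const joined = lookupScores(users, scores);
console.log(JSON.stringify(joined, null, 1));
```

User "def" is kept with an empty score array, which is the left-join behaviour described above, while "ghi" is removed by the age filter before the join happens.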
To use the content, just work with an array in any way. For instance, you can use the $arrayElemAt operator in order to just get the single first element of the array in any future operations. And then you can just use the content like any normal embedded field:
{ "$project": {
"userID": 1,
"age": 1,
"score": { "$arrayElemAt": [ "$score", 0 ] }
}}
If you don't have MongoDB 3.2 available, then your other option to process a query limited by the relations of another collection is to first get the results from that collection and then use $in to filter on the second:
// Match the user collection
User.find({ "age": { "$gt": 20 } },function(err,users) {
// Get id list
var userList = users.map(function(user) {
return user.userID;
});
Score.aggregate(
[
// use the id list to select items
{ "$match": {
"userID": { "$in": userList }
}},
// more stages
],
function(err,results) {
}
);
});
So fetching the list of valid users from the other collection to the client, and then feeding that list into a query on the second collection, is the only way to do this in earlier releases.

Include all existing fields and add new fields to document

I would like to define a $project aggregation stage where I can instruct it to add a new field and include all existing fields, without having to list all the existing fields.
My document looks like this, with many fields:
{
obj: {
obj_field1: "hi",
obj_field2: "hi2"
},
field1: "a",
field2: "b",
...
field26: "z"
}
I want to make an aggregation operation like this:
[
{
$project: {
custom_field: "$obj.obj_field1",
//the next part is that I don't want to do
field1: 1,
field2: 1,
...
field26: 1
}
},
... //group, match, and whatever...
]
Is there something like an "include all fields" keyword that I can use in this case, or some other way to avoid having to list every field separately?
In 4.2+, you can use the $set aggregation pipeline stage, which is simply an alias for $addFields, added in 3.4.
The $addFields stage is equivalent to a $project stage that explicitly specifies all existing fields in the input documents and adds the new fields.
db.collection.aggregate([
{ "$addFields": { "custom_field": "$obj.obj_field1" } }
])
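In plain JavaScript terms, the effect on a single document is comparable to an object spread plus one extra key. The sketch below is only an analogy on an in-memory object (inputDoc and addCustomField are made-up names), not a MongoDB call.

```javascript
// A trimmed-down copy of the question's document (illustrative only).
const inputDoc = {
  obj: { obj_field1: "hi", obj_field2: "hi2" },
  field1: "a",
  field2: "b"
};

// Hypothetical helper: keeps every existing field and adds custom_field,
// which is what the $addFields stage does for each document in the pipeline.
function addCustomField(doc) {
  return { ...doc, custom_field: doc.obj.obj_field1 };
}

const outputDoc = addCustomField(inputDoc);
console.log(outputDoc);
```

None of the existing fields has to be listed by name, which is the property the question asks for.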
You can use $$ROOT to reference the root document. Keep all fields of this document in a single field and read them back from it afterwards (how depends on your client system: Java, C++, ...)
[
{
$project: {
custom_field: "$obj.obj_field1",
document: "$$ROOT"
}
},
... //group, match, and whatever...
]
>>> There's something like "include all fields" keyword that I can use in this case or some another solution?
Unfortunately, there is no operator to "include all fields" in an aggregation operation. The reason is that aggregation was created mainly to group and calculate data from collection fields (sum, avg, etc.), and returning all of a collection's fields is not its direct purpose.
To add new fields to your document you can use $addFields,
and to keep all the fields of your document you can use $$ROOT:
db.collection.aggregate([
{ "$addFields": { "custom_field": "$obj.obj_field1" } },
{ "$group": {
_id : "$field1",
data: { $push : "$$ROOT" }
}}
])
As of version 2.6.4, Mongo DB does not have such a feature for the $project aggregation pipeline. From the docs for $project:
Passes along the documents with only the specified fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
and
The _id field is, by default, included in the output documents. To include the other fields from the input documents in the output documents, you must explicitly specify the inclusion in $project.
According to @Deka's reply, with the C# MongoDB driver 2.5 you can get the grouped document with all keys as shown below:
var group = new BsonDocument
{
{ "_id", "$groupField" },
{ "_document", new BsonDocument { { "$first", "$$ROOT" } } }
};
ProjectionDefinition<BsonDocument> projection = new BsonDocument{{ "document", "$_document"}};
var result = await col.Aggregate().Group(group).Project(projection).ToListAsync();
// For demo first record
var firstItemAsT = BsonSerializer.Deserialize<T>(result.ToArray()[0]["document"].AsBsonDocument);