Accelerate mongo update within two collections - mongodb

I have a Payment collection with a playerId field, which holds the _id of a document in the Person collection. I need a one-off calculation of each person's maximum payment, saved to that person's document. This is how I do it now:
db.Person.find().forEach( function(person) {
var cursor = db.Payment.aggregate([
{$match: {playerId: person._id}},
{$group: {
_id:"$playerId",
maxp: {$max:"$amount"}
}}
]);
var maxPay = 0;
if (cursor.hasNext()) {
maxPay = cursor.next().maxp;
}
person.maxPay = maxPay;
db.Person.save(person);
});
I suppose seeking maxPay on Payments collection once for all Persons should be faster, but I dunno how to write that in code. Could you help me please?

You can run just a single aggregation pipeline operation which has a $lookup pipeline initially to do a "left join" on the Payment collection. This is necessary in order to get the data from the right collection (payments) embedded within the resulting documents as an array called payments.
The subsequent $unwind stage deconstructs the embedded payments array, i.e. it generates a new document for each element of the payments field. This flattens the data, which is useful for the next $group stage.
In this $group pipeline stage, you calculate your desired aggregates by applying the accumulator expression(s). If for instance your Person schema has other fields you wish to retain, then the $first accumulator operator should suffice in addition to the $max operator for the extra maxPay field.
UPDATE
Unfortunately, there is no operator to "include all fields" in the $group aggregation pipeline operation. This is because the $group pipeline step is mostly used to group and calculate/aggregate data from collection fields (sum, avg, etc.) and returning all the collection's fields is not the pipeline's intended purpose. The group pipeline operator is similar to the SQL's GROUP BY clause where you can't use GROUP BY unless you use any of the aggregation functions (accumulator operators in MongoDB). The same way, if you need to retain most fields, you have to use an aggregation function in MongoDB as well. In this case, you have to apply $first to each field you want to keep.
You can also use the $$ROOT system variable which references the root document. Keep all fields of this document in a field within the $group pipeline, for example:
{
"$group": {
"_id": "$_id",
"maxPay": { "$max": "$payments.amount" },
"doc": { "$first": "$$ROOT" }
}
}
The drawback of this approach is that you need a further $project stage to reshape the fields so that they match the original schema, because the documents coming out of the $group stage have only three fields: _id, maxPay and the embedded doc field.
The final pipeline stage, $out, writes the resulting documents of the aggregation pipeline to the same collection, akin to updating the Person collection by atomically replacing the existing collection with the new results collection. The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection:
db.Person.aggregate([
{
"$lookup": {
"from": "Payment",
"localField": "_id",
"foreignField": "playerId",
"as": "payments"
}
},
{ "$unwind": {
"path": "$payments",
"preserveNullAndEmptyArrays": true
} },
{
"$group": {
"_id": "$_id",
"maxPay": { "$max": "$payments.amount" },
/* extra fields for demo purposes
"firstName": { "$first": "$firstName" },
"lastName": { "$first": "$lastName" }
*/
}
},
{ "$out": "Person" }
])
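As a rough sketch of how the $$ROOT idea could be combined with the pipeline above so that every Person field survives the $out, the function below only builds the stage array. The function name and its targetCollection parameter are hypothetical. The reshaping stages are an assumption on top of the answer: $replaceRoot needs MongoDB 3.4+ and $mergeObjects needs 3.6+. The $ifNull default of 0 mirrors the question's original loop, where people without payments get maxPay 0.

```javascript
// Hypothetical helper: returns the stages, it does not talk to MongoDB itself.
function buildMaxPayPipeline(targetCollection) {
  return [
    // "Left join" each person with their payments.
    { $lookup: { from: "Payment", localField: "_id", foreignField: "playerId", as: "payments" } },
    // One document per payment; people without payments are kept.
    { $unwind: { path: "$payments", preserveNullAndEmptyArrays: true } },
    // Keep the whole input document next to the computed maximum.
    { $group: { _id: "$_id", maxPay: { $max: "$payments.amount" }, doc: { $first: "$$ROOT" } } },
    // Assumption: lift the saved document back to the top level and add maxPay (0 when there were no payments).
    { $replaceRoot: { newRoot: { $mergeObjects: ["$doc", { maxPay: { $ifNull: ["$maxPay", 0] } }] } } },
    // Drop the helper field left over from $lookup/$unwind.
    { $project: { payments: 0 } },
    { $out: targetCollection },
  ];
}

// In the mongo shell this would be used as: db.Person.aggregate(buildMaxPayPipeline("Person"))
console.log(JSON.stringify(buildMaxPayPipeline("Person"), null, 2));
```

Writing to a scratch collection name first and inspecting the result before pointing $out at Person is probably the safer way to try this.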

Related

Get count of a value of a subdocument inside an array with mongoose

I have Collection of documents with id and contact. Contact is an array which contains subdocuments.
I am trying to get the count of contact where isActive = Y. Also need to query the collection based on the id. The entire query can be something like
Select Count(contact.isActive=Y) where _id = '601ad0227b25254647823713'
I am using mongo and mongoose for the first time. Please edit the question if I was not able to explain it properly.
You can use an aggregation pipeline like this:
First $match to get only the document with the desired _id.
Then $unwind to get one document per element of the contact array.
$match again to keep only the elements whose isActive value is Y.
Finally $group, adding one for each remaining document (i.e. counting the contacts with isActive = Y). The count is stored in the field total.
db.collection.aggregate([
{
"$match": {"id": 1}
},
{
"$unwind": "$contact"
},
{
"$match": {"contact.isActive": "Y"}
},
{
"$group": {
"_id": "$id",
"total": {"$sum": 1}
}
}
])
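A possible alternative that avoids $unwind is to count inside the document with $filter and $size. This is a different technique from the answer above, sketched under the assumption that the field names are _id, contact and isActive as in the question; the function name is hypothetical.

```javascript
// Hypothetical helper: builds a two-stage pipeline that counts active contacts.
function buildActiveContactCountPipeline(id) {
  return [
    { $match: { _id: id } },
    {
      $project: {
        total: {
          $size: {
            $filter: {
              // $ifNull guards against documents that have no contact array at all.
              input: { $ifNull: ["$contact", []] },
              as: "c",
              cond: { $eq: ["$$c.isActive", "Y"] },
            },
          },
        },
      },
    },
  ];
}

console.log(JSON.stringify(buildActiveContactCountPipeline("601ad0227b25254647823713"), null, 2));
```

Unlike the $group version, this returns a document with total 0 when no contact is active. With Mongoose, aggregation stages are not cast against the schema, so a string id would likely need converting to an ObjectId before being passed in.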

MongoDB aggregation query optimization: $match, $lookup and double $unwind

Let's say we have two collections:
devices: the objects from this collection have (among others) the fields name (string) and cards (array); each part from that array has the fields model and slot. The cards are not another collection, it's just some nested data.
interfaces: the objects from this collection have (among others) the fields name and owner.
Extra info:
for cards, I'm only interested in the ones where slot is a number
for each card of a device that matches the previous condition, there is an interface object in the other collection whose owner field holds the name of that device and whose name is s[slot]p1 (the character 's' + the slot of that card + 'p1')
My job is to create a query to generate a summary of all the existing cards in all of those devices, each entry being enriched with information from the interfaces collection. I also need to be able to parametrize the query (in case I'm interested only in a certain device with a certain name, only a certain model for cards etc.)
So far, I have this:
mongo_client.devices.aggregate([
# Retrieve all the devices having the cards field
{
"$match": {
# "name": "<device-name>",
"cards": {
"$exists": "true"
}
}
},
# Group current content with every cards object
{
"$unwind": "$cards"
},
# Only take the ones having "slot" a number
{
"$match": {
"cards.slot": {
"$regex": "^\d+$"
}
}
},
# Retrieve the device's interfaces
{
"$lookup": {
"from": "interfaces",
"let": {
"owner": "$name",
},
"as": "interfaces",
"pipeline": [{
"$match": {
"$expr": {
"$eq": ["$owner", "$$owner"]
},
},
}]
}
},
{
"$unwind": "$interfaces"
},
{
"$match": {
"$expr": {
"$eq": ["$interfaces.name", {
"$concat": ["s", "$cards.slot", "p1"]
}]
}
}
},
# Build the final object
{
"$project": {
# Card related fields
"slot": "$cards.slot",
"model": "$cards.model",
# Device related fields
"device_name": "$name",
# Fields from interfaces
"interface_field_x": "$interfaces.interface_field_x",
"interface_field_y": "$interfaces.interface_field_y",
}
},
])
The query works and it's quite fast, but I have a question:
Is there any way I can avoid the 2nd $unwind? If for every device there are 50-150 interface objects where owner is the name of that device, I feel that I'm slowing it down. Every device has a unique interface named s[slot]p1. How can I get that specific object in a better way? I tried to use two $eq expressions in the $match inside the $lookup or even $regex or $regexMatch, but I couldn't use the outside slot fields, even if I put it inside let.
If I want to parametrize my query to filter the data if needed, would you add match expressions as intermediary steps or just filter at the end?
Any other improvements to the query are welcome. I'm also interested in how to make it error-proof (if by mistake cards is missing or that s1p1 interface is not found).
Thanks!
Your question is missing sample data for the query, but:
Merge the third stage into the first stage, get rid of $exists
Instead of pipeline use localField+foreignField, pipeline is much slower
The number of unwinds in the query should correspond to what objects you want in the result set:
0 unwinds for devices
1 unwind for cards
2 unwinds for interfaces
To match the desired conditions no unwinds are needed.
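One way the advice above might translate into a parametrized pipeline is sketched below. It is written in JavaScript rather than the question's Python, and the function name and its deviceName and model options are hypothetical. Picking the s[slot]p1 interface with $filter and $arrayElemAt instead of a second $unwind is an assumption, not something the answer spells out.

```javascript
// Hypothetical builder: optional filters are added only when they are supplied.
function buildCardSummaryPipeline({ deviceName, model } = {}) {
  const numericSlot = { $regex: "^\\d+$" };

  // Stage 1 filter: devices with at least one numeric-slot card (no $exists needed).
  const deviceFilter = { "cards.slot": numericSlot };
  if (deviceName !== undefined) deviceFilter.name = deviceName;

  // Post-$unwind filter: the individual cards themselves.
  const cardFilter = { "cards.slot": numericSlot };
  if (model !== undefined) cardFilter["cards.model"] = model;

  return [
    { $match: deviceFilter },
    { $unwind: "$cards" },
    { $match: cardFilter },
    // Plain equality join instead of a sub-pipeline.
    { $lookup: { from: "interfaces", localField: "name", foreignField: "owner", as: "interfaces" } },
    // Pick the single interface named s<slot>p1; the field stays absent if none exists.
    {
      $addFields: {
        interface: {
          $arrayElemAt: [
            {
              $filter: {
                input: "$interfaces",
                as: "i",
                cond: { $eq: ["$$i.name", { $concat: ["s", "$cards.slot", "p1"] }] },
              },
            },
            0,
          ],
        },
      },
    },
    {
      $project: {
        slot: "$cards.slot",
        model: "$cards.model",
        device_name: "$name",
        interface_field_x: "$interface.interface_field_x",
        interface_field_y: "$interface.interface_field_y",
      },
    },
  ];
}

console.log(JSON.stringify(buildCardSummaryPipeline({ deviceName: "dev1" }), null, 2));
```

Filtering as early as possible keeps the number of documents reaching $lookup small, and a device without cards or without a matching interface simply produces no row or a row without the interface fields, rather than an error.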

How to query certain elements of an array of objects? (mongodb)

say I have a mongo DB collection with records as follows:
{
email: "person1#gmail.com",
plans: [
{planName: "plan1", dataValue: 100},
{planName: "plan2", dataValue: 50}
]
},
{
email: "person2#gmail.com",
plans: [
{planName: "plan3", dataValue: 25},
{planName: "plan4", dataValue: 12.5}
]
}
and I want to query such that only the dataValue returns where the email is "person1#gmail.com" and the planName is "plan1". How would I approach this?
You can accomplish this using the Aggregation Pipeline.
The pipeline may look like this:
db.collection.aggregate([
{ $match: { "email" :"person1#gmail.com", "plans.planName": "plan1" }},
{ $unwind: "$plans" },
{ $match: { "plans.planName": "plan1" }},
{ $project: { "_id": 0, "dataValue": "$plans.dataValue" }}
])
The first $match stage will retrieve documents where the email field is equal to person1#gmail.com and any of the elements in the plans array has a planName equal to plan1.
The second stage, $unwind, outputs one document per element in the plans array. The plans field will now be a single plan object rather than an array.
In the third stage, a second $match, the unwound documents are filtered to include only those with a plans.planName of plan1. Finally, the $project stage excludes the _id field and projects a single dataValue field with the value of plans.dataValue.
Note that with this approach, if the email field is not unique, you may get multiple documents, each consisting of just a dataValue field.
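To make the four stages concrete, here is a small in-memory imitation in plain JavaScript. It is only an analogy for what each stage does to the documents, not how MongoDB executes the pipeline, and the sample data (including the example.com addresses) is made up for illustration.

```javascript
// Made-up sample data shaped like the documents in the question.
const docs = [
  { email: "person1@example.com", plans: [{ planName: "plan1", dataValue: 100 }, { planName: "plan2", dataValue: 50 }] },
  { email: "person2@example.com", plans: [{ planName: "plan3", dataValue: 25 }, { planName: "plan4", dataValue: 12.5 }] },
];

// Stage 1 ($match): keep documents with the email and at least one plan1 element.
const matched = docs.filter(
  (d) => d.email === "person1@example.com" && d.plans.some((p) => p.planName === "plan1")
);

// Stage 2 ($unwind): one document per array element, plans becomes a single object.
const unwound = matched.flatMap((d) => d.plans.map((p) => ({ ...d, plans: p })));

// Stage 3 ($match): keep only the unwound documents whose plan is plan1.
const plan1Only = unwound.filter((d) => d.plans.planName === "plan1");

// Stage 4 ($project): emit just the dataValue.
const result = plan1Only.map((d) => ({ dataValue: d.plans.dataValue }));

console.log(result); // [ { dataValue: 100 } ]
```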

Get distinct records with specified fields that match a value, paginated

I'm trying to get all documents in my MongoDB collection
by distinct customer ids (custID)
where status code == 200
paginated (skipped and limit)
return specified fields
var Order = mongoose.model('Order', orderSchema());
My original thought was to use mongoose db query, but you can't use distinct with skip and limit as Distinct is a method that returns an "array", and therefore you cannot modify something that is not a "Cursor":
Order
.distinct('request.headers.custID')
.where('response.status.code').equals(200)
.limit(limit)
.skip(skip)
.exec(function (err, orders) {
callback({
data: orders
});
});
So then I thought to use Aggregate, using $group to get distinct customerID records, $match to return all unique customerID records that have status code of 200, and $project to include the fields that I want:
Order.aggregate(
[
{
"$project" :
{
'request.headers.custID' : 1,
//other fields to include
}
},
{
"$match" :
{
"response.status.code" : 200
}
},
{
"$group": {
"_id": "$request.headers.custID"
}
},
{
"$skip": skip
},
{
"$limit": limit
}
],
function (err, order) {}
);
This returns an empty array though. If I remove project, only $request.headers.custID field is returned when in fact I need more.
Any thoughts?
The thing you need to understand about aggregation pipelines is that the word "pipeline" means each stage only receives the input emitted by the preceding stage, in order of execution. The best analogy here is the Unix pipe |, where the output of one command is "piped" to the next:
ps aux | grep mongo | tee out.txt
Aggregation pipelines work in much the same way. The other main thing to consider is that both $project and $group emit only the fields you ask for, and no others. This takes a little getting used to compared to declarative approaches like SQL, but with a little practice it becomes second nature.
Another thing to get used to is that stages like $match are more important to place at the beginning of a pipeline than field selection. The primary reason is index selection and usage, which can speed things up immensely. Also, a $project followed by a $group is somewhat redundant, as both essentially select fields, so the two are usually best combined.
Hence, most optimally you do:
Order.aggregate(
[
{ "$match" : {
"response.status.code" : 200
}},
{ "$group": {
"_id": "$request.headers.custID", // the grouping key
"otherField": { "$first": "$otherField" },
// and so on for each field to select
}},
{ "$skip": skip },
{ "$limit": limit }
],
function (err, order) {}
);
The main thing to remember about $group is that all fields other than _id (the grouping key) require an accumulator, since there are in general multiple occurrences of values for each grouping key.
In this case we are using $first as the accumulator, which takes the first occurrence from the grouping boundary. Commonly this is used following a $sort, but it does not need to be, as long as you understand the behavior of what is selected.
Other accumulators like $max simply take the largest value of the field among the documents sharing the grouping key, and are therefore independent of the "current record/document", unlike $first or $last. So it all depends on your needs.
Of course, you can shortcut the selection in MongoDB 2.6 and later with the $$ROOT variable:
Order.aggregate(
[
{ "$match" : {
"response.status.code" : 200
}},
{ "$group": {
"_id": "$request.headers.custID", // the grouping key
"document": { "$first": "$$ROOT" }
}},
{ "$skip": skip },
{ "$limit": limit }
],
function (err, order) {}
);
Which would take a copy of all fields in the document and place them under the named key ( which is "document" in this case ). It's a shorter way to notate, but of course the resulting document has a different structure, being now all under the one key as sub-fields.
But as long as you understand the basic principles of a "pipeline" and don't exclude data you want to use in later stages by previous stages, then you generally should be okay.
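The empty result in the question can be reproduced with a small in-memory imitation. The project and match functions below are simplified stand-ins written for this illustration, not MongoDB's implementation, and the two sample orders are made up.

```javascript
// Read a dotted path such as "response.status.code" from a nested object.
function getPath(doc, path) {
  return path.split(".").reduce((v, k) => (v == null ? undefined : v[k]), doc);
}

// Simplified stand-in for an inclusion $project: keeps only the listed paths.
function project(docs, paths) {
  return docs.map((doc) => {
    const out = {};
    for (const path of paths) {
      const value = getPath(doc, path);
      if (value === undefined) continue;
      const keys = path.split(".");
      let target = out;
      for (const k of keys.slice(0, -1)) {
        if (!(k in target)) target[k] = {};
        target = target[k];
      }
      target[keys[keys.length - 1]] = value;
    }
    return out;
  });
}

// Simplified stand-in for an equality $match on one dotted path.
function match(docs, path, expected) {
  return docs.filter((doc) => getPath(doc, path) === expected);
}

const orders = [
  { request: { headers: { custID: "c1" } }, response: { status: { code: 200 } } },
  { request: { headers: { custID: "c2" } }, response: { status: { code: 500 } } },
];

// $project first: response.status.code is already gone when $match runs.
const projectFirst = match(project(orders, ["request.headers.custID"]), "response.status.code", 200);

// $match first: the filter still sees the field it needs.
const matchFirst = project(match(orders, "response.status.code", 200), ["request.headers.custID"]);

console.log(projectFirst); // []
console.log(JSON.stringify(matchFirst)); // [{"request":{"headers":{"custID":"c1"}}}]
```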

Include all existing fields and add new fields to document

I would like to define a $project aggregation stage where I can instruct it to add a new field and include all existing fields, without having to list all the existing fields.
My document looks like this, with many fields:
{
obj: {
obj_field1: "hi",
obj_field2: "hi2"
},
field1: "a",
field2: "b",
...
field26: "z"
}
I want to make an aggregation operation like this:
[
{
$project: {
custom_field: "$obj.obj_field1",
//the next part is that I don't want to do
field1: 1,
field2: 1,
...
field26: 1
}
},
... //group, match, and whatever...
]
Is there something like an "include all fields" keyword that I can use in this case, or some other way to avoid having to list every field separately?
In 4.2+, you can use the $set aggregation pipeline stage, which is simply an alias for $addFields, added in 3.4.
The $addFields stage is equivalent to a $project stage that explicitly specifies all existing fields in the input documents and adds the new fields.
db.collection.aggregate([
{ "$addFields": { "custom_field": "$obj.obj_field1" } }
])
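In plain JavaScript terms, the effect of that $addFields stage on a single document is roughly that of an object spread. This is only an analogy to show the resulting shape, with a shortened made-up version of the question's document.

```javascript
// Shortened version of the document from the question.
const doc = { obj: { obj_field1: "hi", obj_field2: "hi2" }, field1: "a", field2: "b" };

// Every existing field is carried over and custom_field is added on top.
const withCustomField = { ...doc, custom_field: doc.obj.obj_field1 };

console.log(Object.keys(withCustomField)); // [ 'obj', 'field1', 'field2', 'custom_field' ]
```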
You can use $$ROOT to reference the root document. Keep all fields of this document in a field and read them back afterwards (depending on your client system: Java, C++, ...)
[
{
$project: {
custom_field: "$obj.obj_field1",
document: "$$ROOT"
}
},
... //group, match, and whatever...
]
>>> There's something like "include all fields" keyword that I can use in this case or some another solution?
Unfortunately, there is no operator to "include all fields" in an aggregation operation. The reason is that aggregation was mostly designed to group and calculate data from collection fields (sum, avg, etc.), and returning all of the collection's fields is not its direct purpose.
To add new fields to your document you can use $addFields,
and to keep all the fields in your document you can use $$ROOT:
db.collection.aggregate([
{ "$addFields": { "custom_field": "$obj.obj_field1" } },
{ "$group": {
_id : "$field1",
data: { $push : "$$ROOT" }
}}
])
As of version 2.6.4, MongoDB does not have such a feature for the $project aggregation pipeline stage. From the docs for $project:
Passes along the documents with only the specified fields to the next stage in the pipeline. The specified fields can be existing fields from the input documents or newly computed fields.
and
The _id field is, by default, included in the output documents. To include the other fields from the input documents in the output documents, you must explicitly specify the inclusion in $project.
According to @Deka's reply, with the C# MongoDB driver 2.5 you can get the grouped document with all keys as below:
var group = new BsonDocument
{
{ "_id", "$groupField" },
{ "_document", new BsonDocument { { "$first", "$$ROOT" } } }
};
ProjectionDefinition<BsonDocument> projection = new BsonDocument{{ "document", "$_document"}};
var result = await col.Aggregate().Group(group).Project(projection).ToListAsync();
// For demo first record
var firstItemAsT = BsonSerializer.Deserialize<T>(result.ToArray()[0]["document"].AsBsonDocument);