MongoDB has recently added an option to perform an update operation by providing an aggregation pipeline rather than the standard modifier object. Check MongoDB's docs on this topic.
The ability to use an aggregation pipeline, whose expressions can refer to existing document properties, can be extremely useful in situations where certain fields need to be evaluated based on other fields, e.g. during data migration.
Moreover, most of the standard update operators like $set, $push, $inc, etc. can be successfully replicated with the aggregation expression language, so in some sense this new functionality generalizes the good old modifiers technique. Though, I must admit the pipeline can become quite verbose if one tries to do things like $addToSet. This of course brings up a whole bunch of performance-related questions, but let's ignore them for now.
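For example, here's a minimal sketch (field names are made up) of how $inc and $push can be mimicked with pipeline expressions:
// Illustrative only: mimicking $inc and $push with a pipeline update.
// $ifNull guards against the fields not existing yet.
db.test.updateOne(
  { _id: 1 },
  [
    {
      $set: {
        counter: { $add: [{ $ifNull: ["$counter", 0] }, 1] },          // like { $inc: { counter: 1 } }
        tags: { $concatArrays: [{ $ifNull: ["$tags", []] }, ["new"]] } // like { $push: { tags: "new" } }
      }
    }
  ]
);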
So far, there's been just one thing which I haven't been able to fully replicate with the aggregation pipeline update, namely the $setOnInsert operator. Let's assume that I want to perform an upsert:
db.test.update(selector, pipeline, { upsert: true });
My initial intuition was that the $$ROOT variable (which I can use in the pipeline) would equal null unless a document matching selector already exists. Unfortunately, but probably for a good reason, MongoDB developers decided that $$ROOT should be derived from selector by default. It makes sense when you think about how the normal $setOnInsert works, but it also makes it practically impossible to distinguish between an update and an insert within the pipeline.
I know what you're thinking. You can look at $$ROOT._id. This is a good idea, though if _id is part of the selector it doesn't work anymore. I have figured out that this can be bypassed by tricking MongoDB a little bit and doing things like:
selector = {
_id: { $in: [value, 'fake'] },
}
instead of the simpler { _id: value }, but this doesn't look clean. Please note that if $in only contains one element, then Mongo is actually clever enough to figure out what the identifier should be and it populates $$ROOT accordingly (sic!).
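For completeness, here is a rough sketch of how I branch on the insert vs. update case with that trick; the createdAt field is just an illustration, and I'm relying on $$ROOT._id showing up as "missing" on the insert path, which I haven't seen documented anywhere:
// Rough sketch (not authoritative): with the $in trick, $$ROOT._id is not
// pre-populated on the insert path, so its $type is reported as "missing".
db.test.update(
  { _id: { $in: [value, 'fake'] } },
  [
    {
      $set: {
        createdAt: {
          $cond: [
            { $eq: [{ $type: "$$ROOT._id" }, "missing"] }, // insert path
            "$$NOW",                                        // acts like $setOnInsert
            "$createdAt"                                    // update path: keep existing value
          ]
        }
      }
    }
  ],
  { upsert: true }
);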
I am wondering if anyone has a better idea how to approach this. Maybe there's some hidden variable that I could potentially use inside the pipeline itself to distinguish between update and insert (e.g. in $merge stage there's $$new variable which serves a similar purpose)?
If there are no matching documents, $$ROOT will have only the _id field. So you can transform $$ROOT into an array of its key/value pairs and check if the size of that array is equal to 1. If it is, create a new document; if it is not, do nothing.
$objectToArray and $size to convert $$ROOT to an array by its key/value pairs and to get the size of that array
$cond to check if the size of the array above is equal to 1. If it is, then merge the current $$ROOT (which is only the _id field) with the update object. If it is not, return the current $$ROOT. In both scenarios, put the result in the result field
$mergeObjects to merge $$ROOT and the update that you are sending, and put that in the result field
$replaceRoot to replace root to the result field from previous stage
db.collection.update({
  _id: 1
},
[
  {
    $set: {
      result: {
        $cond: {
          if: {
            $eq: [
              { $size: { $objectToArray: "$$ROOT" } },
              1
            ]
          },
          then: {
            $mergeObjects: [
              "$$ROOT",
              { key: 3 }
            ]
          },
          else: "$$ROOT"
        }
      }
    }
  },
  {
    $replaceRoot: {
      newRoot: "$result"
    }
  }
],
{
  upsert: true
})
Working example
Let's say you want to use mongoexport/import to update a collection (for reasons explained here). You should make sure the types in the collection are JSON-safe.
How can one determine all the types used in all documents of a collection, including within array elements, using the aggregation framework?
You can use $objectToArray in combination with $map and $type.
I think something like this should get you started:
db.collection.aggregate([
  { $project: {
      types: {
        $map: {
          input: { $objectToArray: "$$CURRENT" },
          in: { $type: [ "$$this.v" ] }
        }
      }
    }
  }
])
Note it is not recursive and it will not go deep into the values of arrays, since I am not sure how many levels deep you want to go or even what the desired output is. So hopefully that is a good start for you.
You can see the aggregation working here with sample input containing various types.
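For example, with a sample document like the one below (made up for illustration), the projection reports one type name per top-level field:
// Illustrative sample input and output (not from the original question):
db.collection.insertOne({ name: "x", count: 3, tags: [1, 2] });
// The pipeline above would then produce something like:
// { "_id": ObjectId("..."), "types": [ "objectId", "string", "double", "array" ] }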
The $match expression in MongoDB's Stitch application does not work properly.
I am trying to set up a simple update trigger that will only work on one field in a collection.
The trigger setup provides a $match aggregation which seems simple enough to set up.
For example if I want the trigger to only fire when the field "online" in a specified collection gets set to "true" I would do:
{"updateDescription.updatedFields":{"online":"true"}}
which for a stitch trigger is the same as:
{ $match: { "updateDescription.updatedFields": { "online": "true" } } }
The problem is when I try to match an update on a field that is an object (for example hours: { online: 40, offline: 120 }).
For some reason neither $exists nor $in works.
So doing:
{"updateDescription.updatedFields":{"hours":{"$exists":true}}
does not work, and neither does something like:
{"updateDescription.updatedFields":{"hours.online":{"$exists":true}}
The $match for the trigger is supposed to work exactly like a normal mongo $match.
They just provide one example:
{
  "updateDescription.updatedFields": {
    "status": "blocked"
  }
}
The example is from here:
https://docs.mongodb.com/stitch/triggers/database-triggers/
I have tried hundreds of variations but I can't seem to get it to work.
The trigger is working fine if the match is a specific value like:
{"updateDescription.updatedFields":{"hours.online":{"$numberInt\":"20"}}
and then i set the hours.online to 20 in the database.
I was able to get it to match items by using an explicit $expr operator, or by declaring it as a single field rather than an embedded object, i.e. "updateDescription.updatedFields.status": "blocked".
I struggled with this myself, trying to get a trigger to fire when a certain nested field was updated to any value (rather than just one specific one).
The issue appears to have to do with how change streams report updated fields.
With credit and thanks to MongoDB support, I can finally offer this as a potential solution, at least for simpler cases:
{
  "$expr": {
    "$not": {
      "$cmp": [
        {
          "$let": {
            "vars": { "updated": { "$objectToArray": "$updateDescription.updatedFields" } },
            "in": { "$arrayElemAt": [ "$$updated.k", 0 ] }
          }
        },
        "example.subdocument.nested_field"
      ]
    }
  }
}
Of course replace example.subdocument.nested_field with your own dot-notation field path.
I am using Mongo 3.2.14
I have a mongo collection that looks like this:
{
'_id':...
'field1':...
'field2':...
'field3':...
etc...
}
I want to aggregate this way:
db.collection.aggregate([
  { '$match': {} },
  { '$project': {
      'field1': 1,
      'field2': 1,
      'field3': 1,
      etc...(all fields)
    }
  }
])
Is there a way to include all fields in the $project without listing each field one by one? (I have around 30 fields, and growing...)
I have found info on this here:
MongoDB $project: Retain previous pipeline fields
Include all existing fields and add new fields to document
how to not write every field one by one in project
However, I'm using Mongo 3.2.14 and I don't need to create a new field, so I think I cannot use $addFields. But if I could, can someone show me how to use it?
Basically, if you want all attributes of your documents to be passed to the next pipeline stage, you can skip the $project stage altogether. But if you want all the attributes except the _id value, then you can pass
{ $project: { _id: 0 } }
which will return every value except the _id.
And if by any chance you have embedded lists or nested arrays that you want to flatten, you can use the $unwind stage.
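For example, a minimal sketch with a hypothetical items array field:
// Sketch only: "items" is a hypothetical array field; $unwind emits one
// document per array element, and the $project drops _id as above.
db.collection.aggregate([
  { $match: {} },
  { $unwind: "$items" },
  { $project: { _id: 0 } }
])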
You can use $replaceRoot:
db.collection.aggregate([
  { "$match": {} },
  {
    "$project": {
      "document": "$$ROOT"
    }
  },
  {
    "$replaceRoot": { "newRoot": "$document" }
  }
])
This way you get the exact document returned with all of its fields; you don't need to add each field one by one in the $project. Try using $replaceRoot at the end of the pipeline.
After performing an aggregation operation on a Mongo collection, my last step is to get the length of the array result. Now I have two options:
Use one more $group stage whose _id equals null:
db.col.aggregate([
  // ...,
  {
    $group: {
      _id: null,
      length: { $sum: 1 },
    },
  },
]);
Or use the .length method:
db.col.aggregate([
// ...
]).length;
Both of them work well and give me the expected result. I just wonder which way is better in term of performance. What do you think?
I would use the .length method as it's likely just an attribute of the JS Array object (it might depend on the JS engine your code is using).
I believe that using $group will make the Mongo engine process all the data and then count how many documents it returns, which would be much slower.
As felix said, you can run a small benchmark and see which option is faster.
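Something along these lines in the mongo shell could serve as a rough benchmark (the collection name and pipeline are placeholders, and timings will vary with data size and server load):
// Rough benchmark sketch, suitable for small collections only.
var t0 = Date.now();
var viaGroup = db.col.aggregate([
  // ... your pipeline ...
  { $group: { _id: null, length: { $sum: 1 } } }
]).toArray();
var t1 = Date.now();
var viaLength = db.col.aggregate([
  // ... your pipeline ...
]).toArray().length;
var t2 = Date.now();
print("$group:  " + (t1 - t0) + " ms");
print(".length: " + (t2 - t1) + " ms");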
Let's say I have four documents in my collection:
{u'a': {u'time': 3}}
{u'a': {u'time': 5}}
{u'b': {u'time': 4}}
{u'b': {u'time': 2}}
Is it possible to sort them by the field 'time' which is common in both 'a' and 'b' documents?
Thank you
No, you should put your data into a common format so you can sort it on a common field. It can still be nested if you want but it would need to have the same path.
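For example (the shapes below are made up for illustration), either of these layouts would let you sort on a single path:
// Flat common field:
{ "kind": "a", "time": 3 }
{ "kind": "b", "time": 4 }
// Or still nested, but under the same path in every document:
{ "kind": "a", "stats": { "time": 3 } }
{ "kind": "b", "stats": { "time": 4 } }
// then e.g. db.test.find().sort({ "stats.time": -1 })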
You can use aggregation; the following code has been tested.
db.test.aggregate({
  $project: {
    time: {
      "$cond": [
        { "$gt": ["$a.time", null] },
        "$a.time",
        "$b.time"
      ]
    }
  }
}, {
  $sort: {
    time: -1
  }
});
Or if you also want the original fields returned back: gist
Alternatively, you can sort once you get the result back, using a customized compare function (not tested, for illustration purposes only):
db.eval(function() {
  return db.mycollection.find().toArray().sort(function(doc1, doc2) {
    var time1 = doc1.a ? doc1.a.time : doc1.b.time,
        time2 = doc2.a ? doc2.a.time : doc2.b.time;
    return time1 - time2;
  });
});
You can, using the aggregation framework.
The trick here is to $project a common field to all the documents so that the $sort stage can use the value in that field to sort the documents.
The $ifNull operator can be used to check if a.time exists; if it does, then the record will be sorted by that value, else by b.time.
code:
db.t.aggregate([
  { $project: { "a": 1, "b": 1,
      "sortBy": { $ifNull: ["$a.time", "$b.time"] } } },
  { $sort: { "sortBy": -1 } },
  { $project: { "a": 1, "b": 1 } }
])
Consequences of this approach:
The aggregation pipeline won't be covered by any of the indexes you create.
The performance will be very poor for very large data sets.
What you could ideally do is to ask the source system that is sending you the data to standardize its format, something like:
{"a":1,"time":5}
{"b":1,"time":4}
That way your query can make use of the index if you create one on the time field.
db.t.ensureIndex({"time":-1});
code:
db.t.find({}).sort({"time":-1});