The MongoDB documentation on update change events says that the update description will have an array of removed fields, a document of updated fields, and an array of truncated arrays. The removed and updated fields are pretty straightforward, but I'm having trouble understanding what the truncated arrays are.
The documentation says
An array of documents which record array truncations performed with
pipeline-based updates using one or more of the following stages:
$addFields
$set
$replaceRoot
$replaceWith
But try as I might, I can't seem to figure out how to even cause an update event that includes truncated arrays.
Any help understanding what this field is for and / or an example of how to cause an update that includes it would be greatly appreciated.
I did not know that the change stream document had a truncatedArrays field, so I tried setting up a change stream in MongoDB versions 4 and 5.
MongoDB Enterprise rs0:PRIMARY> db.coll.find();
{ "_id" : ObjectId("63b2d783"), "a" : 1, "b" : [ { "c" : 1, "d" : "qwq" }, { "c" : 2, "d" : "mlo" } ] }
{ "_id" : ObjectId("63b2d784"), "a" : 2, "b" : [ { "c" : 4, "d" : "hyt" }, { "c" : 5, "d" : "nhw" } ] }
In another window,
MongoDB Enterprise rs0:PRIMARY> cs = db.coll.watch([], {"fullDocument": "updateLookup"});
MongoDB Enterprise rs0:PRIMARY> while(!cs.isExhausted()){
... if(cs.hasNext()){
... print(tojson(cs.next()));
... }
... }
Then I ran an update.
MongoDB Enterprise rs0:PRIMARY> db.coll.update({},{$set:{"a":3}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
There was no such field in the change stream.
{
"_id" : {"_data" : "8263B2D474000000012B022C0100296E5A100439C39"},
"operationType" : "update",
"clusterTime" : Timestamp(1672664180, 1),
"fullDocument" : {
"_id" : ObjectId("63b2d783"),
"a" : 3,
"b" : [
{"c" : 1,"d" : "qwq"},
{"c" : 2,"d" : "mlo"}
]
},
"ns" : {
"db" : "test","coll" : "coll"
},
"documentKey" : {"_id" : ObjectId("63b2d783")},
"updateDescription" : {
"updatedFields" : {
"a" : 3
},
"removedFields" : [ ]
}
}
Next, I updated the server to version 6 and executed this update query to slice the array.
db.coll.update(
{},
[
{$set: {"b": {$slice: ["$b",1]}}}
]
);
And there it was, showing the array field name and its new size.
{
"_id" : {"_data" : "8263B2D756000000012B022C0100296E5A1004A7FD82"},
"operationType" : "update",
"clusterTime" : Timestamp(1672664918, 1),
"wallTime" : ISODate("2023-01-02T13:08:38.584Z"),
"fullDocument" : {
"_id" : ObjectId("63b2d1d6"),
"a" : 1,
"b" : [
{"c" : 1,"d" : "qwq"}
]
},
"ns" : {
"db" : "test","coll" : "coll"
},
"documentKey" : {"_id" : ObjectId("63b2d1d6")},
"updateDescription" : {
"updatedFields" : {
},
"removedFields" : [ ],
"truncatedArrays" : [
{
"field" : "b",
"newSize" : 1
}
]
}
}
I couldn't find any other way to cause this using $replaceRoot/$replaceWith.
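To make the shape of that field concrete, here is a plain JavaScript sketch of how I read it. It runs without MongoDB, and it rests on my assumption (based only on the event above, not on the server source) that an entry is recorded when an array is cut down to a prefix of its previous contents. The helper name describeTruncations is hypothetical.

```javascript
// Hypothetical helper, not MongoDB code: reports arrays that were shortened
// to a prefix of their previous contents, using the { field, newSize } shape
// seen in the change event above.
function describeTruncations(before, after) {
  const truncatedArrays = [];
  for (const field of Object.keys(after)) {
    const oldValue = before[field];
    const newValue = after[field];
    if (!Array.isArray(oldValue) || !Array.isArray(newValue)) continue;
    if (newValue.length >= oldValue.length) continue;
    // Assumption: only a pure truncation counts, so every surviving
    // element must be unchanged.
    const isPrefix = newValue.every(
      (item, i) => JSON.stringify(item) === JSON.stringify(oldValue[i])
    );
    if (isPrefix) truncatedArrays.push({ field, newSize: newValue.length });
  }
  return truncatedArrays;
}

const before = { a: 1, b: [{ c: 1, d: "qwq" }, { c: 2, d: "mlo" }] };
// Same effect on "b" as {$set: {"b": {$slice: ["$b", 1]}}}
const after = { a: 1, b: before.b.slice(0, 1) };

console.log(describeTruncations(before, after)); // [ { field: 'b', newSize: 1 } ]
```

Under this reading, a plain {$set: {"a": 3}} never produces an entry because no array gets shorter.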
Related
I've got a collection with documents using a schema something like this (some members redacted):
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
2,
3,
5
],
"activity" : [
4,
4,
3
]
}
},
"media" : [
ObjectId("537ea185df872bb71e4df270"),
ObjectId("537ea185df872bb71e4df275"),
ObjectId("537ea185df872bb71e4df272")
]
}
In this schema, the first, second, and third positivity ratings correspond to the first, second, and third entries in the media array, respectively. The same is true for the activity ratings. I need to calculate statistics for the positivity and activity ratings with respect to their associated media objects across all documents in the collection. Right now, I'm doing this with MapReduce. I'd like to, however, accomplish this with the Aggregation Pipeline.
Ideally, I'd like to $unwind the media, answers.ratings.positivity, and answers.ratings.activity arrays simultaneously so that I end up with, for example, the following three documents based on the previous example:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 2,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df270")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 3,
"activity" : 4
}
},
"media" : ObjectId("537ea185df872bb71e4df275")
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : 5,
"activity" : 3
}
},
"media" : ObjectId("537ea185df872bb71e4df272")
}
]
Is there some way to accomplish this?
The current aggregation framework does not allow you to do this. Being able to unwind multiple arrays that are known to be the same size, creating one document for the ith value of each, would be a good feature.
If you want to use the aggregation framework, you will need to change your schema a little. For example, take the following document schema:
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : [
{k:1, v:2},
{k:2, v:3},
{k:3, v:5}
],
"activity" : [
{k:1, v:4},
{k:2, v:4},
{k:3, v:3}
],
}},
"media" : [
{k:1, v:ObjectId("537ea185df872bb71e4df270")},
{k:2, v:ObjectId("537ea185df872bb71e4df275")},
{k:3, v:ObjectId("537ea185df872bb71e4df272")}
]
}
By doing this you are essentially adding the index to the object inside the array. After this it's just a matter of unwinding all the arrays and matching on the key.
db.test.aggregate([{$unwind:"$media"},
{$unwind:"$answers.ratings.positivity"},
{$unwind:"$answers.ratings.activity"},
{$project:{"media":1, "answers.ratings.positivity":1,"answers.ratings.activity":1,
include:{$and:[
{$eq:["$media.k", "$answers.ratings.positivity.k"]},
{$eq:["$media.k", "$answers.ratings.activity.k"]}
]}}
},
{$match:{include:true}}])
And the output is:
[
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 1,
"v" : 2
},
"activity" : {
"k" : 1,
"v" : 4
}
}
},
"media" : {
"k" : 1,
"v" : ObjectId("537ea185df872bb71e4df270")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 2,
"v" : 3
},
"activity" : {
"k" : 2,
"v" : 4
}
}
},
"media" : {
"k" : 2,
"v" : ObjectId("537ea185df872bb71e4df275")
},
"include" : true
},
{
"_id" : ObjectId("539f41a95d1887b57ab78bea"),
"answers" : {
"ratings" : {
"positivity" : {
"k" : 3,
"v" : 5
},
"activity" : {
"k" : 3,
"v" : 3
}
}
},
"media" : {
"k" : 3,
"v" : ObjectId("537ea185df872bb71e4df272")
},
"include" : true
}
]
Doing this creates a lot of extra document overhead and may be slower than your current MapReduce implementation, so you would need to run tests to check. Keep in mind that the work grows cubically with array length: three arrays of n elements each unwind into n^3 intermediate documents per input document, of which only n survive the $match.
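The cost of the unwind-then-match approach can be checked with a plain JavaScript model of the pipeline. This is only a sketch that runs without MongoDB; the media values "m1" to "m3" are placeholder strings standing in for the ObjectIds, and the nested loops are my stand-in for the three $unwind stages.

```javascript
// Plain JavaScript model of the pipeline above, no MongoDB involved.
const doc = {
  media: [{ k: 1, v: "m1" }, { k: 2, v: "m2" }, { k: 3, v: "m3" }], // placeholder ids
  positivity: [{ k: 1, v: 2 }, { k: 2, v: 3 }, { k: 3, v: 5 }],
  activity: [{ k: 1, v: 4 }, { k: 2, v: 4 }, { k: 3, v: 3 }],
};

// Three successive $unwind stages behave like a cross product.
const crossProduct = [];
for (const media of doc.media) {
  for (const positivity of doc.positivity) {
    for (const activity of doc.activity) {
      crossProduct.push({ media, positivity, activity });
    }
  }
}

// The $project + $match pair keeps only rows whose keys all agree.
const aligned = crossProduct.filter(
  (row) => row.media.k === row.positivity.k && row.media.k === row.activity.k
);

console.log(crossProduct.length); // 27
console.log(aligned.length); // 3
```

Twenty-seven intermediate documents are built to keep three, which is why testing against the MapReduce version is worthwhile.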
I have a collection of the following data:
{
"_id" : ObjectId("51f1fcc08188d3117c6da351"),
"cust_id" : "abc123",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 25,
"items" : [{
"sku" : "ggg",
"qty" : 7,
"price" : 2.5
}, {
"sku" : "ppp",
"qty" : 5,
"price" : 2.5
}]
}
I am using the query:
cmd { "aggregate" : "orders" , "pipeline" : [
{ "$unwind" : "$items"} ,
{ "$match" : { "items" : { "$elemMatch" : { "qty" : { "$in" : [ 7]}}}}} ,
{ "$group" : { "price" : { "$first" : "$price"} , "items" : { "$push" : { "sku" : "$items.sku"}} , "_id" : { "items" : "$items"}}} ,
{ "$sort" : { "price" : -1}} ,
{ "$project" : { "_id" : 0 , "price" : 1 , "items" : 1}}
]}
I am not able to understand what is going wrong.
It's because you're doing $match after $unwind. $unwind generates a new stream of documents in which items is no longer an array (see docs), so $elemMatch has nothing to match against.
It emits each document as many times as there are elements in its items array.
If you want to select documents that contain the desired element and then process all of their items, you should call $match first:
db.orders.aggregate(
{ "$match" : { "items" : { "$elemMatch" : { "qty" : { "$in" : [ 7]}}}}},
{ "$unwind" : "$items"},
...
);
If you want to select the items to be processed after $unwind, you should remove $elemMatch:
db.orders.aggregate(
{ "$unwind" : "$items"},
{ "$match" : { "items.qty" : { "$in" : [7]}}},
...
);
In the first case you'll get two documents:
{
"price" : 25,
"items" : [
{"sku" : "ppp"}
]
},
{
"price" : 25,
"items" : [
{"sku" : "ggg"}
]
}
and in the second case you'll get one:
{
"price" : 25,
"items" : [
{"sku" : "ggg"}
]
}
Update. After $unwind your documents will look like:
{
"_id" : ObjectId("51f1fcc08188d3117c6da351"),
"cust_id" : "abc123",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 25,
"items" : {
"sku" : "ggg",
"qty" : 7,
"price" : 2.5
}
}
For a small number of documents, unwind and match is fine. For a large number of documents, it is better to match (with $elemMatch), unwind, and then match again.
db.orders.aggregate(
{ "$match" : { "items" : { "$elemMatch" : { "qty" : { "$in" : [ 7]}}}}},
{ "$unwind" : "$items"},
{ "$match" : { "items.qty" : { "$in" : [7]}}},
...
...
);
The first match will filter only documents that match qty criteria. Among the selected documents, the second match will remove the subdocuments not matching the qty criteria.
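The three stages can be modelled in plain JavaScript to see what each one contributes. This is a sketch that runs without MongoDB; the second order is a hypothetical document added only so the first filter has something to discard.

```javascript
const orders = [
  { _id: 1, price: 25, items: [{ sku: "ggg", qty: 7 }, { sku: "ppp", qty: 5 }] },
  { _id: 2, price: 10, items: [{ sku: "zzz", qty: 1 }] }, // hypothetical extra order
];

// First $match with $elemMatch: keep orders with at least one item of qty 7.
const candidates = orders.filter((order) => order.items.some((item) => item.qty === 7));

// $unwind: one output document per element of items.
const unwoundItems = candidates.flatMap((order) =>
  order.items.map((item) => ({ ...order, items: item }))
);

// Second $match on "items.qty": drop the unwound items that do not qualify.
const result = unwoundItems.filter((d) => d.items.qty === 7);

console.log(candidates.length, unwoundItems.length, result.length); // 1 2 1
```

The first filter is the one that can use an index in a real collection, so it keeps $unwind from expanding orders that could never match.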
model:
{
"_id" : "a62107e10f388c90a3eb2d7634357c8b",
"_appid" : [
{
"_id" : "1815aaa7f581c838",
"events" : [
{
"_id" : "_TB_launch",
"boday" : [
{
"VERSIONSCODE" : "17",
"NETWORK" : "cmwap",
"VERSIONSNAME" : "2.4.0",
"IMSI" : "460026319223205",
"PACKAGENAME" : "com.androidbox.astjxmjmmshareMM",
"CHANNELID" : "xmjmm17",
"CHANNELNAME" : "浠..?.M寰.俊?.韩?.?1.x锛.",
"eventid" : "_TB_launch",
"uuid" : "a62107e10f388c90a3eb2d7634357c8b",
"creattime" : "1366300799766",
"ts" : ISODate("2013-04-25T06:28:36.403Z")
}
],
"size" : 1
}
],
"size" : 1
}
],
"size" : 1
}
> db.events.update(
{
"_id":"039e569770cec5ff3811e7410233ed27",
"_appid._id":"e880db04064b03bc534575c7f831a83a",
"_appid.events._id":"_TB_launch"
},
{
"$push":{
"_appid.$.events.$.boday":{"111":"123123"}
}
}
);
Cannot apply the positional operator without a corresponding query field containing an array.
Why?!!
You are trying to reference multiple levels of embedding - you can only have one positional $ operator. You won't be able to do something like this until this feature request has been implemented.
Response Here
The short answer is, "no", but working with nested arrays gets
tricky. Here is an example:
db.foo.save({_id: 1, a1:[{_a1id:1, a2:[{a2id:1, a3:[{a3id:1, a4:"data"}]}]}]})
db.foo.find()
{ "_id" : 1, "a1" : [
{ "_a1id" : 1, "a2" : [
{ "a2id" : 1, "a3" : [
{ "a3id" : 1, "a4" : "data" }
] }
] }
] }
db.foo.update({_id:1}, {$push:{"a1.0.a2.0.a3":{a3id:2, a4:"other data"}}})
db.foo.find()
{ "_id" : 1, "a1" : [
{ "_a1id" : 1, "a2" : [
{ "a2id" : 1, "a3" : [
{ "a3id" : 1, "a4" : "data" },
{ "a3id" : 2, "a4" : "other data" }
] }
] }
] }
If you are unsure where one of your sub-documents lies within an
array, you may use one positional operator, and Mongo will update the
first sub-document which matches. For example:
db.foo.update({_id:1, "a1.a2.a2id":1}, {$push:{"a1.0.a2.$.a3":{a3id:2, a4:"other data"}}})
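To show what an explicit path such as "a1.0.a2.0.a3" addresses, here is a plain JavaScript sketch that runs without MongoDB. The helper pushAtPath is hypothetical and only models the numeric-index form of the path, not the positional $ operator.

```javascript
// Hypothetical helper: walks a dotted path such as "a1.0.a2.0.a3" and pushes
// a value onto the array found at the end of it.
function pushAtPath(doc, path, value) {
  let target = doc;
  for (const key of path.split(".")) {
    target = target[key]; // numeric segments index into arrays
    if (target === undefined) {
      throw new Error(`path segment "${key}" not found`);
    }
  }
  if (!Array.isArray(target)) {
    throw new Error(`${path} is not an array`);
  }
  target.push(value);
  return doc;
}

const foo = {
  _id: 1,
  a1: [{ _a1id: 1, a2: [{ a2id: 1, a3: [{ a3id: 1, a4: "data" }] }] }],
};

pushAtPath(foo, "a1.0.a2.0.a3", { a3id: 2, a4: "other data" });
console.log(foo.a1[0].a2[0].a3.length); // 2
```

Every array level needs its own index in the path, which is why a single positional $ cannot cover two levels of nesting.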
Given a set of questions that have linked survey and category id:
> db.questions.find().toArray();
[
{
"_id" : ObjectId("4fda05bc322b1c95b531ac25"),
"id" : 1,
"name" : "Question 1",
"category_id" : 1,
"survey_id" : 1,
"score" : 5
},
{
"_id" : ObjectId("4fda05cb322b1c95b531ac26"),
"id" : 2,
"name" : "Question 2",
"category_id" : 1,
"survey_id" : 1,
"score" : 3
},
{
"_id" : ObjectId("4fda05d9322b1c95b531ac27"),
"id" : 3,
"name" : "Question 3",
"category_id" : 2,
"survey_id" : 1,
"score" : 4
},
{
"_id" : ObjectId("4fda4287322b1c95b531ac28"),
"id" : 4,
"name" : "Question 4",
"category_id" : 2,
"survey_id" : 1,
"score" : 7
}
]
I can find the category average with:
db.questions.aggregate(
{ $group : {
_id : "$category_id",
avg_score : { $avg : "$score" }
}
}
);
{
"result" : [
{
"_id" : 1,
"avg_score" : 4
},
{
"_id" : 2,
"avg_score" : 5.5
}
],
"ok" : 1
}
How can I get the average of category averages (note this is different than simply averaging all questions)? I would assume I would do multiple group operations but this fails:
> db.questions.aggregate(
... { $group : {
... _id : "$category_id",
... avg_score : { $avg : "$score" },
... }},
... { $group : {
... _id : "$survey_id",
... avg_score : { $avg : "$score" },
... }}
... );
{
"errmsg" : "exception: the _id field for a group must not be undefined",
"code" : 15956,
"ok" : 0
}
>
It's important to understand that the operations in the argument to aggregate() form a pipeline. This means that the input to any element of the pipeline is the stream of documents produced by the previous element in the pipeline.
In your example, the first $group stage produces a stream of documents that look like this:
{
"_id" : 2,
"avg_score" : 5.5
},
{
"_id" : 1,
"avg_score" : 4
}
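This means that the second element of the pipeline sees a series of documents where the only keys are "_id" and "avg_score". The keys "category_id", "survey_id", and "score" no longer exist in this document stream, so "$survey_id" resolves to nothing and the second $group fails with the undefined _id error.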
If you want to further aggregate on this stream, you'll have to aggregate using the keys that are seen at this stage in the pipeline. Since you want to average the averages, you need to put in a single constant value for the _id field, so that all of the input documents get grouped into a single result.
The following code produces the correct result:
db.questions.aggregate(
{ $group : {
_id : "$category_id",
avg_score : { $avg : "$score" },
}
},
{ $group : {
_id : "all",
avg_score : { $avg : "$avg_score" },
}
}
);
When run, it produces the following output:
{
"result" : [
{
"_id" : "all",
"avg_score" : 4.75
}
],
"ok" : 1
}
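The same two-stage logic can be traced in plain JavaScript, which makes it easy to see what each stage receives. This is a sketch that runs without MongoDB; the average helper and the stage1/stage2 names are mine, and the data is the questions collection with only the relevant fields kept.

```javascript
const questions = [
  { id: 1, category_id: 1, survey_id: 1, score: 5 },
  { id: 2, category_id: 1, survey_id: 1, score: 3 },
  { id: 3, category_id: 2, survey_id: 1, score: 4 },
  { id: 4, category_id: 2, survey_id: 1, score: 7 },
];

const average = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

// Stage 1: group by category_id. The output keeps only _id and avg_score.
const scoresByCategory = new Map();
for (const q of questions) {
  if (!scoresByCategory.has(q.category_id)) scoresByCategory.set(q.category_id, []);
  scoresByCategory.get(q.category_id).push(q.score);
}
const stage1 = [...scoresByCategory].map(([categoryId, scores]) => ({
  _id: categoryId,
  avg_score: average(scores),
}));

// Stage 2: a constant _id puts every stage-1 document into a single group.
const stage2 = [{ _id: "all", avg_score: average(stage1.map((d) => d.avg_score)) }];

console.log(stage1); // [ { _id: 1, avg_score: 4 }, { _id: 2, avg_score: 5.5 } ]
console.log(stage2); // [ { _id: 'all', avg_score: 4.75 } ]
```

Note that stage1 documents carry no survey_id or score, which mirrors why the original second $group had nothing to group or average on.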
I have a collection with the following data:
{
"_id" : ObjectId("4e3951905e746b3805000000"),
"m" : "hello",
"r" : [{
"_id" : ObjectId("4e3951965e746b8007000000"),
"u" : 3,
"m" : "response1"
}, {
"_id" : ObjectId("4e39519d5e746bc00f000000"),
"u" : 3,
"m" : "response2"
}, {
"_id" : ObjectId("4e3953dc5e746b5c07000000"),
"u" : 3,
"m" : "response3"
}, {
"_id" : ObjectId("4e3953ea5e746bd40f000001"),
"u" : 3,
"m" : "response"
}],
"u" : 3,
"w" : 3
}
{
"_id" : ObjectId("4e3952c75e746bd807000001"),
"m" : "asdfa",
"r" : [{
"_id" : ObjectId("4e39544e5e746bc00f000001"),
"u" : 3,
"m" : "response5"
}],
"u" : 3,
"w" : 3
}
Can anyone suggest how to remove a subdocument from the 'r' array when I only have the id of the subdocument I want to delete?
For instance, I want to delete the subdocument with id 4e39519d5e746bc00f000000.
So this subdocument should be deleted:
{
"_id" : ObjectId("4e39519d5e746bc00f000000"),
"u" : 3,
"m" : "response2"
},
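It's easy, you just need to use the $pull operator. Match on "r._id" so the update hits the document that actually contains the subdocument; with an empty query and multi set to false, only the first document found would be touched:
db.items.update( {"r._id": ObjectId("4e39519d5e746bc00f000000")},
{ $pull : { r : {"_id": ObjectId("4e39519d5e746bc00f000000")} } }, false, false )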
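As a rough picture of what $pull does to one document, here is a plain JavaScript sketch that runs without MongoDB. The helper pullById is hypothetical, and the ids are plain strings standing in for ObjectId values.

```javascript
// Hypothetical helper modelling {$pull: {r: {_id: <id>}}} on a single document:
// every element of the array whose _id equals the given id is removed.
function pullById(doc, arrayField, id) {
  return {
    ...doc,
    [arrayField]: doc[arrayField].filter((sub) => sub._id !== id),
  };
}

const item = {
  _id: "4e3951905e746b3805000000", // strings stand in for ObjectId values
  m: "hello",
  r: [
    { _id: "4e3951965e746b8007000000", u: 3, m: "response1" },
    { _id: "4e39519d5e746bc00f000000", u: 3, m: "response2" },
    { _id: "4e3953dc5e746b5c07000000", u: 3, m: "response3" },
  ],
};

const updated = pullById(item, "r", "4e39519d5e746bc00f000000");
console.log(updated.r.map((sub) => sub.m)); // [ 'response1', 'response3' ]
```

The rest of the document, including the other subdocuments in r, is left as it was.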
dbh.users.update({"_id": ObjectId("4e39519d5e746bc00f000000")}, {"$unset":{"r":1}},False,False)
Try using unset
Reference: MongoDB : Update Modifier semantics of "$unset"