I have a collection of documents where a "tags" field was switched over from being a space separated list of tags to an array of individual tags. I want to update the previous space-separated fields to all be arrays like the new incoming data.
I'm also having problems with the $type selector because it is applying the type operation to individual array elements, which are strings. So filtering by type just returns everything.
How can I get every document that looks like the first example into the format for the second example?
{
"_id" : ObjectId("12345"),
"tags" : "red blue green white"
}
{
"_id" : ObjectId("54321"),
"tags" : [
"red",
"orange",
"black"
]
}
We can't use the $type operator to filter our documents here because the type of the elements in our array is "string" and as mentioned in the documentation:
When applied to arrays, $type matches any inner element that is of the specified BSON type. For example, when matching for $type : 'array', the document will match if the field has a nested array. It will not return results where the field itself is an array.
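Fortunately, MongoDB also provides the $exists operator, which can be used here with a numeric array index: "tags.0": { "$exists": false } matches only documents whose "tags" field has no element at index 0, which excludes the non-empty arrays.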
Now how can we update those documents?
In MongoDB 3.2 and earlier, the only option is mapReduce(), but first let's look at the alternative available from MongoDB 3.4.
Starting from MongoDB 3.4, we can $project our documents and use the $split operator to split our string into an array of substrings.
Note that to split only those "tags" that are strings, we need the logical $cond operator. The condition here is $eq, which evaluates to true when the $type of the field is equal to "string". The $type aggregation operator used here is new in 3.4.
Finally, we can overwrite the old collection using the $out pipeline stage. Because $project keeps only _id and the fields we name, any other fields must be explicitly included in the $project stage, or they will be lost when $out replaces the collection.
db.collection.aggregate(
[
{ "$project": {
"tags": {
"$cond": [
{ "$eq": [
{ "$type": "$tags" },
"string"
]},
{ "$split": [ "$tags", " " ] },
"$tags"
]
}
}},
{ "$out": "collection" }
]
)
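To make the conditional logic easier to follow, here is a minimal sketch of the same rule in plain JavaScript, with no MongoDB involved. The normalizeTags helper and the sample _id values are hypothetical and only mirror what the $cond / $split projection does to each document.

```javascript
// Plain-JavaScript model of the $cond / $split projection (not a MongoDB API).
// normalizeTags is a hypothetical helper used only for illustration.
function normalizeTags(doc) {
  const tags = typeof doc.tags === "string"
    ? doc.tags.split(" ") // mirrors { $split: [ "$tags", " " ] }
    : doc.tags;           // already an array: keep it unchanged
  return { _id: doc._id, tags: tags };
}

// Sample documents shaped like the two examples in the question.
const docs = [
  { _id: 1, tags: "red blue green white" },
  { _id: 2, tags: ["red", "orange", "black"] }
];

const normalized = docs.map(normalizeTags);
console.log(JSON.stringify(normalized));
```

Only the string-valued document is changed; the one that already holds an array passes through untouched, which is the behaviour the third argument of $cond provides in the pipeline.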
With mapReduce, we use Array.prototype.split() in the map function to emit the array of substrings, and we filter the documents using the "query" option. We then iterate the "results" array and $set the new value for "tags" with bulk operations, using the bulkWrite() method new in 3.2 or the now deprecated Bulk() API if we are on 2.6 or 3.0. The mapReduce call that produces the "results" array looks like this:
db.collection.mapReduce(
function() { emit(this._id, this.tags.split(" ")); },
function(key, value) {},
{
"out": { "inline": 1 },
"query": {
"tags.0": { "$exists": false },
"tags": { "$type": 2 }
}
}
)['results']
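The bulk update step described above could look something like the following sketch. It only builds the operations array in plain JavaScript; buildTagUpdates and the sample data are hypothetical, and the sketch assumes the inline results are { _id, value } pairs as emitted by the map function.

```javascript
// Hypothetical helper: turn inline mapReduce results into bulkWrite operations.
function buildTagUpdates(results) {
  return results.map(function (doc) {
    return {
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { tags: doc.value } } // doc.value is the split array
      }
    };
  });
}

// Sample data shaped like an inline mapReduce result (assumed shape).
const results = [
  { _id: "12345", value: ["red", "blue", "green", "white"] }
];

const operations = buildTagUpdates(results);
console.log(JSON.stringify(operations, null, 2));

// In the mongo shell (3.2+) the array would then be passed to:
//   db.collection.bulkWrite(operations)
```

For a large collection it would probably make sense to send the operations in batches rather than as one array.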
I am using MongoDB and the structure is as below:
{
"_id" : ObjectId("5693af5d62f0d4b6af5124f1"),
"uid" : 1,
"terms" : [
"sasasdsds",
"test",
"abcd"
]
}
I want to remove the last element from array if the length of it is greater than 10.
Is it possible to be done in a single query?
Yes, it is possible. You can use "dot notation" to test that the array contains at least 11 members and then apply the $pop update modifier to the array:
db.collection.update(
{ "terms.10": { "$exists": 1 } },
{ "$pop": { "terms": 1 } },
{ "multi": true }
)
The query tests for the presence of an element at index 10 (the eleventh element, since array indexes are zero-based), so the array must have at least 11 members to match with the $exists operator. This works where $size would not, because $size only matches an "exact" array length.
The $pop with a positive integer as its argument then removes the last entry from that array.
The alternative case is where you want to keep only the first 10 entries in the array, however long it is. There you can apply the $slice modifier to the $push update operator:
db.collection.update(
{ "terms.10": { "$exists": 1 } },
{ "$push": { "terms": { "$each": [], "$slice": 10 } } },
{ "multi": true }
)
The perhaps non-obvious part of this $push example is that when the $each modifier is given an empty array, nothing is pushed to the array, yet $slice is still applied to restrict the overall array length. Here a positive integer keeps the first n elements from the start of the array; a negative integer keeps the last n elements from the "end" of the array.
In each case the "multi" option is used to apply the statement to all documents matched by the query condition of the update. And in both cases it is still best to restrict the query so that only the documents that actually need changing are modified.
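The difference between the two updates can be sketched in plain JavaScript. These helpers are hypothetical models of what the server does to the "terms" array of each matched document, not driver calls.

```javascript
// Models { $pop: { terms: 1 } } guarded by { "terms.10": { $exists: 1 } }.
function popIfLongerThanTen(terms) {
  return terms.length > 10 ? terms.slice(0, -1) : terms;
}

// Models { $push: { terms: { $each: [], $slice: 10 } } } with the same guard.
function keepFirstTen(terms) {
  return terms.length > 10 ? terms.slice(0, 10) : terms;
}

// A 13-element array and the 3-element array from the question.
const thirteen = Array.from({ length: 13 }, function (_, i) { return "t" + i; });
const three = ["sasasdsds", "test", "abcd"];

const popped = popIfLongerThanTen(thirteen);
const sliced = keepFirstTen(thirteen);
const untouched = popIfLongerThanTen(three);

console.log(popped.length, sliced.length, untouched.length);
```

With 13 elements, $pop leaves 12 while $slice leaves exactly 10, so the two statements are only equivalent when the array has exactly 11 members.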
I have a "class" document as:
{
className: "AAA",
students: [
{name:"An", age:"13"},
{name:"Hao", age:"13"},
{name:"John", age:"14"},
{name:"Hung", age:"12"}
]
}
I want to get the student whose name is "An", returning only the matching element of the "students" array. I can do that with find():
>db.class.find({"students.name":"An"}, {"students.$":true})
{
"_id" : ObjectId("548b01815a06570735b946c1"),
"students" : [
{
"name" : "An",
"age" : "13"
}
]}
That works, but when I try the same with aggregation as follows, I get an error:
db.class.aggregate([
{$match:{"students.name":'An'}},
{$project:{"students.$":true}}
])
Error is:
uncaught exception: aggregate failed: {
"errmsg" : "exception: FieldPath field names may not start with '$'.",
"code" : 16410,
"ok" : 0
}
Why can't I use "$" for an array in the $project stage of aggregate() when I can use it in the projection of find()?
From the docs:
Use $ in the projection document of the find() method or the findOne()
method when you only need one particular array element in selected
documents.
The positional operator $ cannot be used in an aggregation pipeline projection stage. It is not recognized there.
This makes sense because, when you execute a projection along with a find query, the input to the projection is a single document that has matched the query. The context of the match is known even during projection. So for each document that matches the query, the projection is applied then and there, before the next match is found.
db.class.find({"students.name":"An"}, {"students.$":true})
In case of:
db.class.aggregate([
{$match:{"students.name":'An'}},
{$project:{"students.$":true}}
])
The aggregation pipeline is a set of stages. Each stage is completely unaware and independent of its previous or next stages. A set of documents pass a stage completely before being passed on to the next stage in the pipeline. The first stage in this case being the $match stage, all the documents are filtered based on the match condition. The input to the projection stage is now a set of documents that have been filtered as part of the match stage.
So a positional operator in the projection stage makes no sense, since in the current stage it doesn't know on what basis the fields had been filtered. Therefore, $ operators are not allowed as part of the field paths.
Why does the below work?
db.class.aggregate([
{ $match: { "students.name": "An" } },
{ $unwind: "$students" },
{ $project: { "students": 1 } }
])
As you see, the projection stage gets a set of documents as input, and projects the required fields. It is independent of its previous and next stages.
Try using the unwind operator in the pipeline: http://docs.mongodb.org/manual/reference/operator/aggregation/unwind/#pipe._S_unwind
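Your aggregation would look like this (the second $match after $unwind keeps only the matching student, since $unwind emits one document per array element):
db.class.aggregate([
{ $match: { "students.name": "An" } },
{ $unwind: "$students" },
{ $match: { "students.name": "An" } },
{ $project: { "students": 1 } }
])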
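To see why a filter on the unwound documents matters, here is a rough plain-JavaScript model of $unwind. The unwindStudents helper is hypothetical; it only imitates how the stage turns one document with an array into one document per element.

```javascript
// Hypothetical model of { $unwind: "$students" } for a single document.
function unwindStudents(doc) {
  return doc.students.map(function (student) {
    return { className: doc.className, students: student };
  });
}

const classDoc = {
  className: "AAA",
  students: [
    { name: "An", age: "13" },
    { name: "Hao", age: "13" },
    { name: "John", age: "14" },
    { name: "Hung", age: "12" }
  ]
};

// One output document per array element.
const unwound = unwindStudents(classDoc);

// Matching on the unwound documents isolates the student named "An".
const onlyAn = unwound.filter(function (d) {
  return d.students.name === "An";
});

console.log(unwound.length, JSON.stringify(onlyAn));
```

The first $match only selects class documents that contain "An" somewhere in the array; after unwinding, all four students are present as separate documents until they are filtered again.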
You can use $filter (available from MongoDB 3.2) to select a subset of an array based on a specified condition.
db.class.aggregate([
{
$match: {
"className": "AAA"
}
},
{
$project: {
students: {
$filter: {
input: "$students",
as: "stu",
cond: { $eq: [ "$$stu.name", "An" ] }
}
}
}
}
])
The example above filters the students array to only include elements that have a name equal to "An".
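For comparison, the effect of $filter on a single document is close to Array.prototype.filter in plain JavaScript. The sketch below is only an analogy with a hypothetical filterStudents helper; the document stays a single document and its array is narrowed in place.

```javascript
// Hypothetical model of $filter with cond { $eq: [ "$$stu.name", "An" ] }.
function filterStudents(doc, name) {
  return {
    className: doc.className,
    students: doc.students.filter(function (stu) {
      return stu.name === name;
    })
  };
}

const sampleClass = {
  className: "AAA",
  students: [
    { name: "An", age: "13" },
    { name: "Hao", age: "13" }
  ]
};

const filtered = filterStudents(sampleClass, "An");
console.log(JSON.stringify(filtered));
```

Unlike the $unwind approach, "students" remains an array here, which matches the shape returned by the find() projection in the question.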
I have the following json:
{
"a1": {"a": "b"},
"a2": {"a": "c"}
}
How can I request all documents where a1 and a2 are not equal in the same document?
You could use $where:
db.myCollection.find( { $where: "this.a1.a != this.a2.a" } )
However, be aware that this won't be very fast, because it has to spin up the JavaScript engine, iterate over every document, and check the condition for each one.
If you need to run this query on large collections, or very often, it's best to introduce a denormalized flag, like areEqual. Still, such low-selectivity fields don't yield good index performance, because the candidate set is still large.
Update
Using the $expr operator, available as of MongoDB 3.6, you can use aggregation expressions in a find query like this:
db.myCollection.find({$expr: {$ne: ["$a1.a", "$a2.a"] } });
Although this comment solves the problem, I think a better match for this use case would be to use $addFields operator available as of version 3.4 instead of $project.
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$addFields": {
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
]);
To avoid JavaScript use the aggregation framework:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aCmp": {"$cmp":["$a1.a","$a2.a"]}
}
},
{"$match":{"aCmp":0}}
])
On our development server the equivalent JavaScript query takes 7x longer to complete.
Update (10 May 2017)
I just realized my answer didn't answer the question, which wanted values that are not equal (sometimes I'm really slow). This will work for that:
db.myCollection.aggregate([
{"$match":{"a1":{"$exists":true},"a2":{"$exists":true}}},
{"$project": {
"a1":1,
"a2":1,
"aEq": {"$eq":["$a1.a","$a2.a"]}
}
},
{"$match":{"aEq": false}}
])
$ne could be used in place of $eq if the match condition was changed to true but I find using $eq with false to be more intuitive.
The $where operator evaluates its condition as JavaScript, where comparing two objects compares references, so
{"a": "b"} == {"a": "b"}
would be false.
So to compare the values you have to test a1.a == a2.a.
To do this in MongoDB you can use the $where operator:
db.myCollection.find({$where: "this.a1.a != this.a2.a"});
This assumes that each embedded document will have a property "a". If that isn't the case things get more complicated.
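The object comparison behaviour is easy to check in plain JavaScript. The differs function below is a hypothetical stand-in for the $where predicate, written over an ordinary object instead of "this".

```javascript
// Two distinct objects with identical contents.
const left = { a: "b" };
const right = { a: "b" };

// == on objects compares references, not contents.
const sameReference = left == right;   // false
// Comparing the primitive property values works as expected.
const sameValue = left.a == right.a;   // true

// Hypothetical stand-in for the $where predicate "this.a1.a != this.a2.a".
function differs(doc) {
  return doc.a1.a != doc.a2.a;
}

const matches = differs({ a1: { a: "b" }, a2: { a: "c" } });
const noMatch = differs({ a1: { a: "b" }, a2: { a: "b" } });

console.log(sameReference, sameValue, matches, noMatch);
```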
Starting in Mongo 4.4, for those that want to compare sub-documents and not only primitive values (since {"a": "b"} == {"a": "b"} is false), we can use the new $function aggregation operator that allows applying a custom javascript function:
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 1, "y" : 2 } }
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
db.collection.aggregate(
{ $match:
{ $expr:
{ $function: {
body: function(a1, a2) { return JSON.stringify(a1) != JSON.stringify(a2); },
args: ["$a1", "$a2"],
lang: "js"
}}
}
}
)
// { "a1" : { "x" : 1, "y" : 2 }, "a2" : { "x" : 3, "y" : 2 } }
$function takes 3 parameters:
body, which is the function to apply, whose parameter are the two fields to compare.
args, which contains the fields from the record that the body function takes as parameter. In our case, both "$a1" and "$a2".
lang, which is the language in which the body function is written. Only js is currently available.
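One caveat worth checking before relying on this body function: JSON.stringify is sensitive to key order, so two sub-documents with the same fields in a different order are reported as different. The sketch below runs the same function body in plain JavaScript on hypothetical sample values.

```javascript
// Same body as the $function example, as a standalone function.
function subDocsDiffer(a1, a2) {
  return JSON.stringify(a1) != JSON.stringify(a2);
}

// Identical sub-documents: not different.
const identical = subDocsDiffer({ x: 1, y: 2 }, { x: 1, y: 2 });

// Different values: different.
const changed = subDocsDiffer({ x: 1, y: 2 }, { x: 3, y: 2 });

// Same fields and values, different key order: also reported as different.
const reordered = subDocsDiffer({ x: 1, y: 2 }, { y: 2, x: 1 });

console.log(identical, changed, reordered);
```

Whether the last case should count as "different" depends on the data; if field order can vary between documents, a comparison that sorts keys first may be a better fit.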
Thanks all for solving my problem -- concerning the answers that use aggregate(), one thing that confused me at first is that $eq (or $in, or lots of other operators) has different meaning depending on where it is used. In a find(), or the $match phase of aggregation, $eq takes a single value, and selects matching documents:
db.items.aggregate([{$match: {_id: {$eq: ObjectId("5be5feb45da16064c88e23d4")}}}])
However, in the $project phase of aggregation, $eq takes an Array of 2 expressions, and makes a new field with value true or false:
db.items.aggregate([{$project: {new_field: {$eq: ["$_id", "$foreignID"]}}}])
In passing, here's the query I used in my project to find all items whose list of linked items (due to a bug) linked to themselves:
db.items.aggregate([{$project: {idIn: {$in: ["$_id","$header.links"]}, "header.links": 1}}, {$match: {idIn: true}}])
I have a document with an array field, similar to this:
{
"_id" : "....",
"Statuses" : [
{ "Type" : 1, "Timestamp" : ISODate(...) },
{ "Type" : 2, "Timestamp" : ISODate(...) },
//Etc. etc.
]
}
How can I update a specific Status item's Timestamp, by specifying its Type value?
From the mongo shell you can do this with:
db.your_collection.update(
{ _id: ObjectId("your_objectid"), "Statuses.Type": 1 },
{ $set: { "Statuses.$.Timestamp": "new timestamp" } }
)
The C# equivalent is:
var query = Query.And(
Query.EQ("_id", "your_doc_id"),
Query.EQ("Statuses.Type", 1)
);
var result = your_collection.Update(
query,
Update.Set("Statuses.$.Timestamp", "new timestamp"),
UpdateFlags.Multi,
SafeMode.True
);
This updates the specific document. To update every matching document in the collection, remove the _id filter (in the shell, also pass { multi: true }).
Starting with MongoDB 3.6, the $[<identifier>] positional operator may be used. Unlike the $ positional operator — which updates at most one array element per document — the $[<identifier>] operator will update every matching array element. This is useful for scenarios where a given document may have multiple matching array elements that need to be updated.
db.yourCollection.update(
{ _id: "...." },
{ $set: {"Statuses.$[element].Timestamp": ISODate("2021-06-23T03:47:18.548Z")} },
{ arrayFilters: [{"element.Type": 1}] }
);
The arrayFilters option matches the array elements to update, and the $[element] is used within the $set update operator to indicate that only array elements that matched the arrayFilter should be updated.
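The difference between the two positional operators can be illustrated with a small plain-JavaScript model. Both helpers are hypothetical and operate on an in-memory array; the timestamps are placeholder strings rather than real dates.

```javascript
// Models "Statuses.$.Timestamp": only the first matching element is updated.
function setFirstMatch(statuses, type, timestamp) {
  const index = statuses.findIndex(function (s) { return s.Type === type; });
  return statuses.map(function (s, i) {
    return i === index ? { Type: s.Type, Timestamp: timestamp } : s;
  });
}

// Models "Statuses.$[element].Timestamp" with arrayFilters on element.Type:
// every matching element is updated.
function setAllMatches(statuses, type, timestamp) {
  return statuses.map(function (s) {
    return s.Type === type ? { Type: s.Type, Timestamp: timestamp } : s;
  });
}

// Sample array with two elements of Type 1 (placeholder timestamps).
const statuses = [
  { Type: 1, Timestamp: "old" },
  { Type: 2, Timestamp: "old" },
  { Type: 1, Timestamp: "old" }
];

const firstOnly = setFirstMatch(statuses, 1, "new");
const allOfThem = setAllMatches(statuses, 1, "new");

console.log(JSON.stringify(firstOnly));
console.log(JSON.stringify(allOfThem));
```

If each Type value appears at most once per document, both operators give the same result; they only diverge when several elements match.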