mongodb: match either string or array [duplicate] - mongodb

I have a MongoDB collection containing 284.116 tweets. The problem is that the "author" field is an object in some documents, but an array in others. I want to filter which ones are arrays and which ones are objects.
For example:
The author field's type is object.
{
"_id" : ObjectId("55edfbd11a87d41d987a6dc1"),
"tweet" : "Back in my dorm, yay!",
"uri" : "https://twitter.com/natalylug0/status/640994018529181696",
"date" : "2015-09-08 00:04:17",
"country" : "U.S.A.",
"city" : "Texas",
"state" : "Dallas",
"author" : {
"username" : "Nataly",
"uri" : "https://twitter.com/natalylug0",
"screenname" : "natalylug0"
}
}
And the other one:
The author field's type is array.
{
"_id" : ObjectId("55ee3a00e11fbb1030d659fe"),
"author" : [
{
"username" : "Relapsed Shini",
"uri" : "https://twitter.com/iPictoraL",
"screenname" : "iPictoraL"
}
],
"tweet" : "#zumbiezuza 😍😍😍💚 ily zoeeeeeeee",
"uri" : "https://twitter.com/iPictoraL/status/641060812140900352",
"date" : "2015-09-08 01:29:42",
"country" : "U.S.A.",
"city" : "Texas",
"state" : "Dallas"
}
So I executed query like this:
db.getCollection('tweets').find({ author: { $type: 4} })
And what I get is
Fetched 0 record(s)
But if I execute $type:3 I get 284.116 documents, which is the same as the size of this collection.
So my question is: how can I filter the documents whose "author" field contains an array?

Actually there is a "gotcha" listed in the documentation for $type specifically about arrays:
When applied to arrays, $type matches any inner element that is of the specified type. Without projection this means that the entire array will match if any element has the right type. With projection, the results will include just those elements of the requested type.
So that means that rather than detecting whether the field itself is an array, what is actually being tested is the type of each "inner element" of the array.
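To make that behaviour concrete, here is a small in-memory sketch in plain JavaScript (no database involved). The helpers bsonTypeCode and legacyTypeMatches are hypothetical names made up for illustration; they only imitate the array rule quoted above and are not MongoDB's implementation. As far as I know, from MongoDB 3.6 onward $type: "array" does match the field itself, so this describes the older behaviour the question ran into.

```javascript
// Simplified BSON type codes used in the question: 2 = string, 3 = object, 4 = array.
function bsonTypeCode(value) {
  if (Array.isArray(value)) return 4;
  if (typeof value === "string") return 2;
  if (value !== null && typeof value === "object") return 3;
  return -1; // other types are not needed for this sketch
}

// Imitates the documented rule: for an array field, $type is tested
// against the inner elements rather than against the array itself.
function legacyTypeMatches(fieldValue, typeCode) {
  if (Array.isArray(fieldValue)) {
    return fieldValue.some((el) => bsonTypeCode(el) === typeCode);
  }
  return bsonTypeCode(fieldValue) === typeCode;
}

const objectAuthor = { username: "Nataly", screenname: "natalylug0" };
const arrayAuthor = [{ username: "Relapsed Shini", screenname: "iPictoraL" }];

console.log(legacyTypeMatches(objectAuthor, 3)); // true
console.log(legacyTypeMatches(arrayAuthor, 3));  // true: the inner element is an object
console.log(legacyTypeMatches(arrayAuthor, 4));  // false: no inner element is an array
```

This is why $type:3 matched every document in the question while $type:4 matched none.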
Now the documentation itself suggests this JavaScript test with $where:
.find({ "$where": "return Array.isArray(this.author)" })
But I think that's pretty horrible, as there is a better way.
The trick is in "dot notation", where you ask whether the 0 index element of the array $exists:
.find({ "author.0": { "$exists": true } })
This is just the basic case: if the "0th" element exists, then the field is present and the data is therefore an array.
Once you understand that logical premise, it is a pretty simple test. The only thing that cannot be matched this way is a "truly empty" array, in which case you can fall back to the JavaScript alternative if needed. But the $exists form can actually use an index, so it is the one to prefer wherever it applies.

Here's a better way to do what you originally asked; that is to actually check if a certain field holds an array type value:
.find({ "author": { "$gte": [] } })
MongoDB's $type functionality for arrays, though well documented, is IMO inconsistent with all the other $type checks, and obviously doesn't work for this use case, but since around 2.6, you can use the above query to check whether the value is an array (empty or not).
I say this is "better" than the currently selected answer, because executing code via $where is not recommended, unless standard query constructs truly cannot get the job done.
To elaborate, $where is not recommended for performance reasons: the executed code cannot use indexes. More detail: https://docs.mongodb.com/manual/reference/operator/query/where/#considerations
Also, if you'd like to check for non-empty arrays specifically, use this:
.find({ "author": { "$gt": [] } })
Technically, this one is also better than the current answer's corresponding $exists solution, as the field may have a non-array object with a field named "0", and that would match as a "non-empty array", which is wrong in that case.
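The differences between the three query forms can be laid out with a plain JavaScript sketch that runs without a database. The three functions are stand-ins I wrote for illustration (their names are hypothetical); each one only mirrors which "author" values the corresponding query is described as selecting in the answers above.

```javascript
// In-memory stand-ins for the three query forms discussed above.
// Each function answers: "would this author value be selected?"

// { "author.0": { "$exists": true } }: dot notation finds index 0 of an
// array, but also a key named "0" in an embedded document.
const hasZeroPath = (v) =>
  v !== null && typeof v === "object" && Object.prototype.hasOwnProperty.call(v, "0");

// { "author": { "$gte": [] } }: any array, empty or not.
const isAnyArray = (v) => Array.isArray(v);

// { "author": { "$gt": [] } }: only non-empty arrays.
const isNonEmptyArray = (v) => Array.isArray(v) && v.length > 0;

const samples = {
  nonEmptyArray: [{ screenname: "iPictoraL" }],
  emptyArray: [],
  plainObject: { screenname: "natalylug0" },
  objectWithZeroKey: { "0": "not an array" },
};

for (const [label, value] of Object.entries(samples)) {
  console.log(label, {
    "author.0 $exists": hasZeroPath(value),
    "$gte []": isAnyArray(value),
    "$gt []": isNonEmptyArray(value),
  });
}
```

The only rows where the forms disagree are the empty array and the object with a "0" key, which are exactly the two edge cases raised above.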

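Since MongoDB version 3.2 we have $isArray for aggregation pipelines, which allows something like the following (the $set stage itself needs MongoDB 4.2 or later; $addFields is the equivalent on older versions):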
db.tweets.aggregate([
{$set: {isArray: {$cond: [{ $isArray: "$author" }, 1, 0]}}},
{$match: {isArray: 1}}
])
Or even:
db.tweets.aggregate([
{$match: {$expr: {$isArray: "$author"}}}
])
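Since the question asks for both groups (arrays and non-arrays), here is a hedged sketch that builds the two complementary $match stages as plain JavaScript objects. The helper names matchArrayField and matchNonArrayField are my own, not part of any MongoDB API, and $expr inside $match assumes MongoDB 3.6 or later.

```javascript
// Builds a $match stage that keeps documents whose field is an array.
function matchArrayField(field) {
  return { $match: { $expr: { $isArray: "$" + field } } };
}

// Builds the complementary stage: the field is anything but an array.
// In aggregation expressions, $not takes its operand wrapped in an array.
function matchNonArrayField(field) {
  return { $match: { $expr: { $not: [{ $isArray: "$" + field }] } } };
}

const arrayAuthors = [matchArrayField("author")];
const otherAuthors = [matchNonArrayField("author")];

// In the mongo shell these would be passed to aggregate(), for example:
//   db.tweets.aggregate(arrayAuthors)
//   db.tweets.aggregate(otherAuthors)
console.log(JSON.stringify(arrayAuthors));
console.log(JSON.stringify(otherAuthors));
```

Note that the second pipeline would presumably also return documents where "author" is missing altogether, so add an $exists check if that matters for your data.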

Related

Update Unique for One Field in Array

A user collection has an array of sites. Each site has a name. I want all site names to be unique per user.
user-1 with two sites called "a" and "b", and user-2 with a site called "a" is ok. But user-1 with two sites called "a" and "a" is not ok.
Here is an attempt to update a site's name but only if it's unique per user. Unfortunately it doesn't work:
collection.update({
_id: userId,
'sites._id' : req.params.id,
'sites' : {$not : {$elemMatch : {name: req.body.name}}}
},
{ $set: { 'sites.$.name': req.body.name } });
Data example:
db.users.insert({
"_id" : ObjectId("59f1ebf8090f072ee17438f5"),
"email" : "user.email#gmail.com",
"sites" : [
{
"_id" : ObjectId("59f5ebf8190f074ee17438f3"),
"name" : "site 1"
},
{
"_id" : ObjectId("59f5bbf8194f074be17438f2"),
"name" : "site 2"
}
]});
So the problem here is that you need to update by a specific _id value in the array but only where the "name" does not already exist in another element. The particular problem is the positional $ operator and how that applies to using the "matched" element.
Point is you want the "position" from the _id but the other constraint on the whole array without actually affecting either the document match or the position.
You cannot use $elemMatch since the "name" test applies to ALL elements of the array, regardless of _id. And you cannot "dot notate" both fields in the query, since the match conditions could be found on different positions. For example "site 3" is not matched on the first array element, but a provided _id value may be matched on the second array element.
Presently, the positional $ operator only works with the first match found. So for now we can only really keep the position from the _id match, even though you still want a constraint that checks the "name" is not already present in another element before altering.
Current MongoDB
The only way you currently get to keep the position but be able to test the array for not having the "name" is to use $where with a JavaScript expression which looks at the array. The trick being that the normal query conditions match the "position", but the $where affects document selection without having any effect on the "position" which is matched.
So trying to write "site 2" to the first array entry:
db.users.update(
{
"_id" : ObjectId("59f1ebf8090f072ee17438f5"),
"sites._id": ObjectId("59f5ebf8190f074ee17438f3"),
"$where": '!this.sites.some(s => s.name === "site 2")'
},
{
"$set": { "sites.$.name": "site 2" }
}
)
Updates nothing because the constraint of:
!this.sites.some(s => s.name === "site 2")
Shows that at least one of the array elements via Array.some() actually matched the criteria, so the document is rejected from update.
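The predicate itself is ordinary JavaScript, so it can be tried outside the database. Below is a minimal sketch under a couple of assumptions: the ObjectId values are replaced by plain strings, and the function name nameIsFree is hypothetical. Inside $where the document is available as this; here it is passed in as an argument instead.

```javascript
// The sample user document from above, reduced to the relevant fields.
const user = {
  sites: [
    { _id: "59f5ebf8190f074ee17438f3", name: "site 1" },
    { _id: "59f5bbf8194f074be17438f2", name: "site 2" },
  ],
};

// Same test as the $where string, written as a function.
function nameIsFree(doc, name) {
  return !doc.sites.some((s) => s.name === name);
}

console.log(nameIsFree(user, "site 2")); // false: already taken, document rejected
console.log(nameIsFree(user, "site 3")); // true: document would be selected for update
```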
When you change to "site 3" on the second element:
db.users.update(
{
"_id" : ObjectId("59f1ebf8090f072ee17438f5"),
"sites._id": ObjectId("59f5bbf8194f074be17438f2"),
"$where": '!this.sites.some(s => s.name === "site 3")'
},
{
"$set": { "sites.$.name": "site 3" }
}
)
None of the array elements are matched by the $where clause, but of course the _id does so this position is used in the update. So the update applies correctly to the matched element and not just the "first element" which did not have the name:
/* 1 */
{
"_id" : ObjectId("59f1ebf8090f072ee17438f5"),
"email" : "user.email#gmail.com",
"sites" : [
{
"_id" : ObjectId("59f5ebf8190f074ee17438f3"),
"name" : "site 1"
},
{
"_id" : ObjectId("59f5bbf8194f074be17438f2"),
"name" : "site 3"
}
]
}
MongoDB 3.6
From MongoDB 3.6 we can do this differently and not rely on JavaScript evaluation as a "workaround". The new addition is the positional filtered $[<identifier>] operator which allows us to "turn around" those conditions as above and instead "query" natively on the "name" but "filter" on the "_id" value:
db.users.update(
{
"_id": ObjectId("59f1ebf8090f072ee17438f5"),
"sites.name": { "$ne": "site 4" }
},
{ "$set": { "sites.$[el].name": "site 4" } },
{ "arrayFilters": [{ "el._id": ObjectId("59f5bbf8194f074be17438f2") }] }
)
So here we get the correct result of the second array element still being updated, despite the query condition actually meeting a "first" match on the very first element of the array. This is because what takes over after the "document selection" from the query is how the arrayFilters are defined along with the positional filtered operator. Here we alias the matched element from the "sites" array as [el], and look for its _id sub-property within the arrayFilters definition.
This actually makes a lot more sense to read and is easier to construct than a JavaScript statement representing the condition. So it's a really useful boon being added into MongoDB for updating.
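The split between "document selection" and "element selection" can be imitated in memory. This is only a sketch of the semantics described above, not how the server works internally; renameSite is a hypothetical helper and the IDs are plain strings rather than ObjectId values.

```javascript
// In-memory imitation of the arrayFilters update above.
function renameSite(user, siteId, newName) {
  // Query part: { "sites.name": { "$ne": newName } } selects the document
  // only when no element of "sites" already carries that name.
  const selected = !user.sites.some((s) => s.name === newName);
  if (!selected) return { matched: 0, user };

  // Update part: "sites.$[el].name" touches every element that passes
  // the arrayFilters condition { "el._id": siteId }.
  const sites = user.sites.map((s) =>
    s._id === siteId ? { ...s, name: newName } : s
  );
  return { matched: 1, user: { ...user, sites } };
}

const before = {
  sites: [
    { _id: "59f5ebf8190f074ee17438f3", name: "site 1" },
    { _id: "59f5bbf8194f074be17438f2", name: "site 3" },
  ],
};

const ok = renameSite(before, "59f5bbf8194f074be17438f2", "site 4");
const rejected = renameSite(before, "59f5bbf8194f074be17438f2", "site 1");

console.log(ok.user.sites.map((s) => s.name).join(", ")); // site 1, site 4
console.log(rejected.matched);                            // 0
```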
Conclusion
So right now, code like this has to use $where to match positionally yet still put a constraint on the array. From MongoDB 3.6 this will be much easier to code and a bit more efficient, since the document exclusion can be done fully in native operators whilst still retaining a method for matching the position.

Add object to object array if an object property is not given yet

Use Case
I've got a collection band_profiles and I've got a collection band_profiles_history. The history collection is supposed to store a band_profile snapshot every 24 hours, and therefore I am using MongoDB's recommended format for historical tracking: each month+year is its own document, and in an object array I store the bandProfile snapshot along with the current day of the month.
My models:
A document in band_profiles_history looks like this:
{
"_id" : ObjectId("599e3bc406955db4cbffe0a8"),
"month" : 7,
"tag_lowercased" : "9yq88gg",
"year" : 2017,
"values" : [
{
"_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "example name1",
},
"day" : 1
},
{
"_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "new name",
},
"day" : 2
}
]
}
And a document in band_profiles:
{
"_id" : ObjectId("5989a6190f39d9fd70cddeb1"),
"tag" : "9V9LRGU",
"name_normalized" : "example name",
"tag_lowercased" : "9v9lrgu",
}
This is how I upsert my documents into band_profiles_history at the moment:
BandProfileHistory.update(
{ tag_lowercased: tag, year, month},
{ $push: {
values: { day, profile }
}
},
{ upsert: true }
)
My problem:
I only want to insert ONE snapshot for every day. Right now it would always push a new object into the object array values no matter if I already have an object for that day or not. How can I achieve that it would only push that object if there is no object for the current day yet?
Putting mongoose aside for a moment:
There is an operation $addToSet that will add an element to an array if it doesn't already exist.
Caveat:
If the value is a document, MongoDB determines that the document is a duplicate if an existing document in the array matches the to-be-added document exactly; i.e. the existing document has the exact same fields and values and the fields are in the same order. As such, field order matters and you cannot specify that MongoDB compare only a subset of the fields in the document to determine whether the document is a duplicate of an existing array element.
Since you are trying to add an entire document you are subjected to this restriction.
So I see the following solutions for you:
Solution 1:
Read in the array, see if it contains the element you want, and if not push it to the values array with $push.
This has the disadvantage of NOT being an atomic operation, meaning that you could end up with duplicates anyway. This could be acceptable if you ran a periodic cleanup job to remove duplicates from this field on each document.
It's up to you to decide if this is acceptable.
Solution 2:
Assuming you are putting the field _id in the subdocuments of your values field, stop doing it. Assuming mongoose is doing this for you (because it does, from what I understand) stop it from doing it like it says here: Stop mongoose from creating _id for subdocument in arrays.
Next you need to ensure that the fields in the document always have the same order, because order matters when comparing documents in the addToSet operation as stated in the citation above.
Solution 3
Change the schema of your band_profiles_history to something like:
{
"_id" : ObjectId("599e3bc406955db4cbffe0a8"),
"month" : 7,
"tag_lowercased" : "9yq88gg",
"year" : 2017,
"values" : {
"1": { "_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "example name1"
}
},
"2": {
"_id" : ObjectId("599e3bc41c073a7418fead91"),
"profile" : {
"_id" : ObjectId("5989a65d0f39d9fd70cde1fe"),
"tag" : "9YQ88GG",
"name_normalized" : "new name"
}
}
}
Notice that the day field became the key for the subdocuments on the values. Notice also that values is now an Object instead of an Array.
Now you can run an update query that would update values.<day> only if values.<day> didn't exist.
Personally I don't like this as it is using the fact that JSON doesn't allow duplicate keys to support the schema.
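For what it's worth, such an update for Solution 3 could be assembled roughly like this. It is a sketch only: buildDailySnapshotUpdate is a hypothetical helper, and the filter and update it returns are plain objects that you would hand to your driver or the shell yourself.

```javascript
// Hypothetical helper: builds the filter and update for the object-keyed
// schema, so that values.<day> is written only when it is absent.
function buildDailySnapshotUpdate(tag, year, month, day, profile) {
  const path = "values." + day;
  return {
    filter: { tag_lowercased: tag, year, month, [path]: { $exists: false } },
    update: { $set: { [path]: { profile } } },
  };
}

const { filter, update } = buildDailySnapshotUpdate("9yq88gg", 2017, 7, 2, {
  tag: "9YQ88GG",
  name_normalized: "new name",
});

// With a driver or the shell this pair would go to an update call, e.g.
//   db.band_profiles_history.update(filter, update)
console.log(JSON.stringify(filter));
console.log(JSON.stringify(update));
```

I have left upsert out on purpose: with upsert enabled, a call for a day that already exists would not match the filter and would therefore try to insert a new month document instead of doing nothing.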
First of all, sadly MongoDB does not support enforcing uniqueness of a field inside an array. You can see there is a major bug that has been open for 7 years and is not closed yet (that is a shame in my opinion).
What you can do from here is limited, and all of it is at the application level. I had the same problem and solved it at the application level. Do something like this:
1. First read your document by its _id and values.day.
2. If the read in step 1 returns null, there is no entry in the values array for the given day, so you can push the new value (I assume band_profiles_history already has a document with that _id).
3. If the read in step 1 returns a document, the values array already has an entry for the given day. In that case you can use a $set operation with the positional $ operator.
Like others said, this will not be atomic, but since you are dealing with the problem at the application level, you can make that whole block of code synchronized. Of the 3 queries below, only 2 will run against MongoDB on any given call:
db.getCollection('band_profiles_history').findOne({"_id": "1", "values.day": 3})
if returns null:
db.getCollection('band_profiles_history').update({"_id": "1"}, {$push: {"values": {<your new band profile history for given day>}}})
if returns not null:
db.getCollection('band_profiles_history').update({"_id": "1", "values.day": 3}, {$set: {"values.$": {<your new band profile history for given day>}}})
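The branching between the two updates can be sketched as a small function. This assumes the first read is done with something like findOne, which yields null when nothing matches; chooseUpdate is a hypothetical name, and the function only builds the filter and update objects without talking to a database.

```javascript
// Decides which update to send, given the result of the first read.
// `existing` is what the read for { _id, "values.day": day } returned, or null.
function chooseUpdate(existing, id, day, snapshot) {
  if (existing === null) {
    // No entry for that day yet: append one.
    return {
      filter: { _id: id },
      update: { $push: { values: { day, ...snapshot } } },
    };
  }
  // An entry exists: overwrite it through the positional $ operator.
  return {
    filter: { _id: id, "values.day": day },
    update: { $set: { "values.$": { day, ...snapshot } } },
  };
}

const snapshot = { profile: { tag: "9YQ88GG", name_normalized: "new name" } };
console.log(JSON.stringify(chooseUpdate(null, "1", 3, snapshot)));
console.log(JSON.stringify(chooseUpdate({ _id: "1" }, "1", 3, snapshot)));
```

As noted above, the read and the write are still two separate operations, so this does not make the sequence atomic.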
To check if the field is missing:
{ field: {$exists: false} }
or if it is an empty array:
{ field: {$eq: []} }
Mongoose also supports field: {type: Date}, so you can store a date instead of counting days, and do updates only for the current date.

removing object from nested array of objects mongodb

I've got a collection with volunteer information in it, and it lists the volunteers as an array of objects. I can display all the shifts for each volunteer, but removing one from the array is proving difficult for me:
Sample data:
"_id" : ObjectId("59180305c19dbaa4ecd9ee59"),
"where" : "Merchandise tent",
"description" : "Sell gear at the merchandise tent.",
"shifts" : [
{
"dateNeeded" : ISODate("2017-06-23T00:00:00Z"),
"timeslot" : "8:00 - NOON",
"needed" : 2,
"_id" : ObjectId("591807546a71c3a57d1a2105"),
"volunteers" : [
{
"fullname" : "Mary Mack",
"phone" : "1234567890",
"email" : "mary#gmail.com",
"_id" : ObjectId("591ce45bc7e8a8c7b742474c")
}
]
},
The data I have available for this is:
_id, where, shifts.timeslot, shifts.dateNeeded, volunteers.email
Can someone help me? Let's say Mary Mack wants to unVolunteer for the 8 - Noon shift at the merchandise tent. She may be listed under other shifts as well, but we only want to remove her from this shift.
You can do this by specifying something to match the "document" and then the required "shifts" array entry as the query expression for an .update(). Then apply the positional $ operator for the matched array index with $pull:
db.collection.update(
{ "_id": ObjectId("59180305c19dbaa4ecd9ee59"), "shifts.timeslot": "8:00 - NOON" },
{ "$pull": { "shifts.$.volunteers": { "fullname": "Mary Mack" } } }
)
That is okay in this instance since you are only trying to "match" on the "outer" array in the nested structure, and the $pull has query arguments of its own to identify the array entry to remove.
You really should be careful using "nested arrays" though. Whilst a $pull operation like this works, positional updates to the "inner" array are not really possible, since the positional $ operator will only match the "first" element that meets the condition. So your example of "Mary Mack" in multiple shifts would only ever match in the first "shifts" array entry found.
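To illustrate what that update does and does not touch, here is an in-memory imitation in plain JavaScript. It is a sketch under assumptions: pullVolunteer is a hypothetical helper, the _id fields are omitted, and the second shift ("NOON - 4:00") is invented purely so there is another entry containing the same volunteer.

```javascript
// In-memory imitation of the $pull update above.
// Only the first shift matching the timeslot is touched, which mirrors
// how the positional $ operator resolves to the first matching element.
function pullVolunteer(doc, timeslot, fullname) {
  const index = doc.shifts.findIndex((s) => s.timeslot === timeslot);
  if (index === -1) return doc; // no matching shift: nothing to update

  const shifts = doc.shifts.map((shift, i) =>
    i === index
      ? { ...shift, volunteers: shift.volunteers.filter((v) => v.fullname !== fullname) }
      : shift
  );
  return { ...doc, shifts };
}

const station = {
  where: "Merchandise tent",
  shifts: [
    { timeslot: "8:00 - NOON", volunteers: [{ fullname: "Mary Mack" }] },
    { timeslot: "NOON - 4:00", volunteers: [{ fullname: "Mary Mack" }] }, // invented shift
  ],
};

const updated = pullVolunteer(station, "8:00 - NOON", "Mary Mack");
console.log(updated.shifts.map((s) => s.volunteers.length).join(", ")); // 0, 1
```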
Try this
db.example.update(
{},
{ $unset: {"Mary Mack":1}},
false, true
)

matching fields internally in mongodb

I have the following documents in MongoDB:
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"students" : [
{
"id" : 23,
"class" : "a"
},
{
"id" : 55,
"class" : "b"
}
]
}
{
"_id" : ObjectId("517b9d05254e385a07fc4e71"),
"studentId" : 55,
"students" : [
{
"id" : 33,
"class" : "c"
}
]
}
Note: This is not actual data, but the schema is exactly the same.
Requirement: Find the documents where studentId matches a students.id (the id inside the students array) using a single query.
I have tried code like the below:
db.data.aggregate({$match:{"students.id":"$studentId"}},{$group:{_id:"$student"}});
Result: an empty array. If I replace {"students.id":"$studentId"} with {"students.id":33}, it returns the second document shown in the JSON above.
Is it possible to get the documents for this scenario using a single query?
If possible, I'd suggest that you set the condition while storing the data so that you can do a quick truth check (isInStudentsList). It would be super fast to do that type of query.
Otherwise, there is a relatively complex way of using the Aggregation framework pipeline to do what you want in a single query:
db.students.aggregate(
{$project:
{studentId: 1, studentIdComp: "$students.id"}},
{$unwind: "$studentIdComp"},
{$project : { studentId : 1,
isStudentEqual: { $eq : [ "$studentId", "$studentIdComp" ] }}},
{$match: {isStudentEqual: true}})
Given your input example the output would be:
{
"result" : [
{
"_id" : ObjectId("517b88decd483543a8bdd95b"),
"studentId" : 23,
"isStudentEqual" : true
}
],
"ok" : 1
}
A brief explanation of the steps:
1. Build a projection of the document with just studentId and a new field with an array containing just the ids (so for the first document it would contain [23, 55]).
2. Using that structure, $unwind. That creates a new temporary document for each array element in the studentIdComp array.
3. Now, take those documents and create a new document projection, which continues to have the studentId and adds a new field called isStudentEqual that compares the equality of two fields, the studentId and studentIdComp. Remember that at this point there is a single temporary document that contains those two fields.
4. Finally, check that the comparison value isStudentEqual is true and return those documents (which will contain the original document _id and the studentId).
If the student was in the list multiple times, you might need to group the results on studentId or _id to prevent duplicates (but I don't know that you'd need that).
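The four pipeline stages can be mirrored one-to-one with plain array operations, which may help when reasoning about what each stage produces. This is an in-memory sketch only (no database, _id values as plain strings), using the two sample documents from the question.

```javascript
const docs = [
  { _id: "517b88decd483543a8bdd95b", studentId: 23,
    students: [{ id: 23, class: "a" }, { id: 55, class: "b" }] },
  { _id: "517b9d05254e385a07fc4e71", studentId: 55,
    students: [{ id: 33, class: "c" }] },
];

const result = docs
  // $project: keep studentId and collect the ids into studentIdComp
  .map((d) => ({ _id: d._id, studentId: d.studentId, studentIdComp: d.students.map((s) => s.id) }))
  // $unwind: one temporary document per array element
  .flatMap((d) => d.studentIdComp.map((id) => ({ ...d, studentIdComp: id })))
  // $project with $eq: compare the two fields
  .map((d) => ({ _id: d._id, studentId: d.studentId, isStudentEqual: d.studentId === d.studentIdComp }))
  // $match: keep only the documents where the comparison was true
  .filter((d) => d.isStudentEqual);

console.log(JSON.stringify(result));
```

Only the first sample document survives, matching the aggregation output shown above.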
Unfortunately it's impossible ;( To solve this problem it is necessary to use a $where statement (example: Finding embeded document in mongodb?), but $where cannot be used with the aggregation framework.
db.data.find({students: {$elemMatch: {id: 23}} , studentId: 23});