MongoDB nested Document Validation for sub-documents

I have a document structured like the following. My question is how to do validation of the nested "roles" array on the database side. My requirements are:
the roles array may have 0 elements or more than 1, but never exactly 1;
name and created_by must be present on every role that is created.
{
  "_id": "123456",
  "name": "User Name",
  "roles": [
    {
      "name": "mobiles_user",
      "last_usage_at": {
        "$date": 1457000592991
      },
      "created_by": "987654",
      "created_at": {
        "$date": 1457000592991
      }
    },
    {
      "name": "webs_user",
      "last_usage_at": {
        "$date": 1457000592991
      },
      "created_by": "987654",
      "created_at": {
        "$date": 1457000592991
      }
    }
  ]
}
At the moment, I am only validating the non-nested attributes like this:
db.createCollection( "users",
{ "validator" : {
"_id" : {
"$type" : "string"
},
"email" : {
"$regex" : /#gmail\.com$/
},
"name" : {
"$type" : "string"
}
}
} )
Could anyone please advise how to do the nested document validation?

Yes, you can validate all sub-documents in a document by negating $elemMatch, and you can ensure that the size is not 1. It's sure not pretty though! And not exactly obvious either.
> db.createCollection('users', {
... validator: {
... name: {$type: 'string'},
... roles: {$exists: true},
... $nor: [
... {roles: {$size: 1}},
... {roles: {$elemMatch: {
... $or: [
... {name: {$not: {$type: 'string'}}},
... {created_by: {$not: {$type: 'string'}}},
... ]
... }}}
... ],
... }
... })
{ "ok" : 1 }
This is confusing, but it works! What it means is: only accept documents where neither the size of roles is 1, nor does roles have an element whose name isn't a string or whose created_by isn't a string.
This is based upon the fact that in logic terms,
for all x: f(x) and g(x)
Is equivalent to
not exists x s.t.: not f(x) or not g(x)
We have to use the latter form, since MongoDB only gives us an exists-style operator ($elemMatch).
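Here x ranges over the elements of roles, f(x) is "x.name is a string", and g(x) is "x.created_by is a string"; the $nor wrapped around $elemMatch/$or is a direct encoding of the second form.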
Proof
Valid documents work:
> db.users.insert({
... name: 'hello',
... roles: [],
... })
WriteResult({ "nInserted" : 1 })
> db.users.insert({
... name: 'hello',
... roles: [
... {name: 'foo', created_by: '2222'},
... {name: 'bar', created_by: '3333'},
... ]
... })
WriteResult({ "nInserted" : 1 })
If a field is missing from roles, it fails:
> db.users.insert({
... name: 'hello',
... roles: [
... {name: 'foo', created_by: '2222'},
... {created_by: '3333'},
... ]
... })
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
If a field in roles has the wrong type, it fails:
> db.users.insert({
... name: 'hello',
... roles: [
... {name: 'foo', created_by: '2222'},
... {name: 'bar', created_by: 3333},
... ]
... })
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
If roles has size 1 it fails:
> db.users.insert({
... name: 'hello',
... roles: [
... {name: 'foo', created_by: '2222'},
... ]
... })
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
The only thing I can't figure out, unfortunately, is how to ensure that roles is an array. roles: {$type: 'array'} seems to fail everything; I presume that's because it actually checks that the elements are of type 'array'?
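Side note: on MongoDB 3.6+ (an assumption; that is newer than this answer) the validator can use $jsonSchema, which does let you assert that roles is really an array, and expresses the per-element rules directly. A sketch:
db.createCollection('users', {
  validator: {
    $jsonSchema: {
      bsonType: 'object',
      required: ['name', 'roles'],
      properties: {
        name: { bsonType: 'string' },
        roles: {
          bsonType: 'array',  // rejects non-array values outright
          // size 0 is fine, size >= 2 is fine, exactly 1 is not
          anyOf: [{ maxItems: 0 }, { minItems: 2 }],
          items: {
            bsonType: 'object',
            required: ['name', 'created_by'],
            properties: {
              name: { bsonType: 'string' },
              created_by: { bsonType: 'string' }
            }
          }
        }
      }
    }
  }
})
The anyOf clause encodes the "size 0 or at least 2" requirement without needing the $nor double negation.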

Edit: this answer is not correct, it is possible to validate all sub-documents in the array. See answer: https://stackoverflow.com/a/43102783/200224
You can't really. You can do things like:
"roles.name": { "$type": "string" }
But all that really means is that "at least one" of those properties needs to match the specified type. That means this would actually be valid:
{
  "_id" : "123456",
  "name" : "User Name",
  "roles" : [
    {
      "name" : "mobiles_user",
      "last_usage_at" : ISODate("2016-03-03T10:23:12.991Z"),
      "created_by" : "987654",
      "created_at" : ISODate("2016-03-03T10:23:12.991Z")
    },
    {
      "name" : "webs_user",
      "last_usage_at" : ISODate("2016-03-03T10:23:12.991Z"),
      "created_by" : "987654",
      "created_at" : ISODate("2016-03-03T10:23:12.991Z")
    },
    {
      "name" : 1
    }
  ]
}
It is after all "document validation", and that is by nature not well suited to sub-documents in arrays, or really to any data in a contained array.
The core of the implementation relies on expressions available to query operators, and since MongoDB lacks anything in its standard query expressions that equates to "all array entries must match this value" without being directly specific, it's not possible to express this as a validator condition.
The only possibility for checking array content like that in a "query" expression is $where, and that is noted as not being an available option with document validation.
Even the $size operator available for queries must match a specific "size" value and cannot use an inequality condition. So you "could" verify a strict size, but not a minimum size, unless:
"roles.0": { "$exists": true }
This is a feature in its "infancy" and somewhat experimental, so there is the possibility that future releases may address this.
But for now, your better option is to do such "schema validation" in client-side code (where you will get much better exception reporting) instead. There are many existing libraries that take that approach.
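As an illustration, a minimal hand-rolled client-side check for the rules in this question could look like the following (a sketch only; validateUser is a hypothetical helper, and a real schema library would give far better error reporting):
// Returns true when the user document satisfies the question's rules:
// name is a string, roles is an array whose size is never exactly 1,
// and every role has string name and created_by fields.
function validateUser(user) {
  if (typeof user.name !== 'string') return false;
  if (!Array.isArray(user.roles) || user.roles.length === 1) return false;
  return user.roles.every(function (role) {
    return typeof role.name === 'string' &&
           typeof role.created_by === 'string';
  });
}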

Related

Complex MongoDB query?

I'm still pretty new to Mongo and queries, so that said, I'm trying to build a query that will find results matching any of three dog breeds and, in addition to that, check two additional specs, and finally sort everything by age. All the data comes from a csv file (scrnshot); there aren't any sub-categories to any of the entries.
db.animals.find({
"animal_id" : 1,
"breed" : "Labrador Retriever Mix",
"breed" : "Chesapeake Bay Retriever",
"breed" : "Newfoundland",
$and : [ { "age_upon_outcome_in_weeks" :{"$lt" : 156, "$gte" : 26} ],
$and: {"sex_upon_outcome" : "Intact Female"}}).sort({"age_upon_outcome_in_weeks" : 1})
This is throwing a number of errors, such as:
Error: error: {
"ok" : 0,
"errmsg" : "$and must be an array",
"code" : 2,
"codeName" : "BadValue"
}
What am I messing up? Or is there a better way to do it?
As mentioned by takis in the comments, you cannot repeat a key in a mongo query - you have to imagine that your query document becomes a JSON object, and each time a key is repeated it replaces the previous one. To get around this problem, mongodb supports the $or and $and operators. For complex queries like this one, I would recommend starting with a global $and whose elements each contain either a single constraint or a $or constraint. Your query becomes this:
db.coll.find({
  "$and": [
    { "animal_id": 1 },
    { "age_upon_outcome_in_weeks": { "$lt": 156, "$gte": 26 } },
    { "sex_upon_outcome": "Intact Female" },
    { "$or": [
        { "breed": "Labrador Retriever Mix" },
        { "breed": "Chesapeake Bay Retriever" },
        { "breed": "Newfoundland" }
      ]
    }
  ]
})
.sort({"age_upon_outcome_in_weeks" : 1})
--- edit
You can also consider using the $in instead of the $or:
db.coll.find({
  "animal_id": 1,
  "age_upon_outcome_in_weeks": { "$lt": 156, "$gte": 26 },
  "sex_upon_outcome": "Intact Female",
  "breed": { "$in": [
    "Labrador Retriever Mix",
    "Chesapeake Bay Retriever",
    "Newfoundland"
  ] }
})
.sort({"age_upon_outcome_in_weeks" : 1})

What's the most economical alternative to multiple positional identifiers in MongoDB?

I have a collection named authors with the following schema:
authors: {
  _id,
  firstName,
  lastName,
  posts: [
    post 1: {...},
    post 2: {...},
    post 3: {
      _id,
      title,
      content,
      tags: [
        tag 1: {...},
        tag 2: {...},
        tag 3: {
          _id,
          name,
          description,
        },
      ],
    },
  ],
}
As can be seen, posts is an array of objects inside the authors collection. Each object inside this array, in turn, has tags, another array of objects. And each of these tags objects has three fields: _id, name, and description.
I'm trying to write a GraphQL mutation to update this name field on matching documents in the collection.
const updatedTagInAuthor = await Author
  .updateMany({ 'posts.tags._id': args.newTagInfo._id },
    {
      $set: {
        'posts.$.tags.$.name': args.newTagInfo.name,
        'posts.$.tags.$.description': args.newTagInfo.description,
      },
    }, opts);
The above snippet obviously fails since MongoDB doesn't allow multiple positional elements ($) in a query. So is there any economical alternative to accomplish what I'm trying to do?
I tried the arrayFilters method as MongoDB suggests:
const updatedTagInAuthor = await Author.update(
{},
{ $set: { 'posts.$[].tags.$[tag].name': args.newTagInfo.name } },
{ arrayFilters: [{ 'tag._id': args.newTagInfo._id }], multi: true }
);
But this throws the following error:
Cannot read property 'castForQuery' of undefined
Still confused!
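For what it's worth, the castForQuery error typically comes from the driver layer - older Mongoose releases did not understand arrayFilters - rather than from the server. Assuming MongoDB 3.6+ and a Mongoose version that supports arrayFilters, the update itself would be shaped like this (a sketch reusing the same hypothetical args.newTagInfo input):
// Update every matching tag in every post: the all-positional operator $[]
// walks the outer posts array, the filtered positional operator $[tag]
// selects only the tags matched by the arrayFilters condition.
const updatedTagInAuthor = await Author.updateMany(
  { 'posts.tags._id': args.newTagInfo._id },
  {
    $set: {
      'posts.$[].tags.$[tag].name': args.newTagInfo.name,
      'posts.$[].tags.$[tag].description': args.newTagInfo.description,
    },
  },
  { arrayFilters: [{ 'tag._id': args.newTagInfo._id }] }
);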
These are the documents I'm updating with the kind of query I have given:
{"name" : "Steve","details" : [ {
"work" : [ {
"Company" : "Byjus",
"id" : 1,
"country" : "India"
},
{
"Company" : "Vodafone",
"id" : 2,
"country" : "UK"
}]
}]},{"name" : "Virat","details" : [ {
"work" : [ {
"Company" : "Byjus",
"id" : 1,
"country" : "India"
},
{
"Company" : "Verizon",
"id" : 3,
"country" : "US"
}]
}]}
QUERY:
db.getCollection('Users').update(
  { "details": { "$elemMatch": { "work.id": 1 } } },
  { '$set': { 'details.$[].work.$.Company': 'Boeing', 'details.$[].work.$.country': 'US' } },
  { multi: true }
);
It's similar to what you asked, right?
Try inserting those two documents in a collection called Users and run the above query in the Mongo console directly, not in a GUI. Use the query in full, not just the $set part.
Try this,
Author.update({"posts": { "$elemMatch": { "tags.id": args.newTagInfo._id } }},
{'$set': {'posts.$[].tags.$.name': args.newTagInfo.name, 'posts.$[].tags.$.description': args.newTagInfo.description} },
{multi: true});

mongodb ensure uniqueness on two fields both ways

Say I have the fields a and b. I want a compound uniqueness constraint so that if a: 1, b: 2 exists, I would not be able to insert a: 2, b: 1.
The reason I want this is because I'm making a "friends list" kind of collection, where if a is connected to b, then it's automatically the reverse as well.
Is this possible at the schema level, or do I need to run queries to check?
If you don't need to differentiate between requester and requestee, you could sort the values before saving or querying so that your two fields a and b have a predictable order for any pair of friend IDs (and you can take advantage of the unique index constraint).
For example, using the mongo shell:
Create a helper function to return friend pairs in predictable order:
function friendpair (friend1, friend2) {
  if (friend1 < friend2) {
    return ({a: friend1, b: friend2});
  } else {
    return ({a: friend2, b: friend1});
  }
}
Add a compound unique index:
> db.friends.createIndex({a:1, b:1}, {unique: true});
{
"createdCollectionAutomatically" : true,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
Insert unique pairs (should work)
> db.friends.insert(friendpair(1,2))
WriteResult({ "nInserted" : 1 })
> db.friends.insert(friendpair(1,3))
WriteResult({ "nInserted" : 1 })
Insert non-unique pair (should return duplicate key error):
> db.friends.insert(friendpair(2,1))
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "E11000 duplicate key error collection: test.friends index: a_1_b_1 dup key: { : 1.0, : 2.0 }"
}
})
Search should work in either order:
db.friends.find(friendpair(3,1)).pretty()
{ "_id" : ObjectId("5bc80ed11466009f3b56fa52"), "a" : 1, "b" : 3 }
db.friends.find(friendpair(1,3)).pretty()
{ "_id" : ObjectId("5bc80ed11466009f3b56fa52"), "a" : 1, "b" : 3 }
Instead of handling duplicate key errors or insert versus update, you could also use findAndModify with an upsert since this is expected to be a unique pair:
> var pair = friendpair(2,1)
> db.friends.findAndModify({
query: pair,
update: {
$set: {
a : pair.a,
b : pair.b
},
$setOnInsert: { status: 'pending' },
},
upsert: true
})
{
"_id" : ObjectId("5bc81722ce51da0e4118c92f"),
"a" : 1,
"b" : 2,
"status" : "pending"
}
It doesn't seem like you can put a unique constraint on the entire array's values, so I'm doing a kind of workaround, using $jsonSchema as follows:
{
  $jsonSchema: {
    bsonType: "object",
    required: ["status", "users"],
    properties: {
      status: {
        enum: ["pending", "accepted"],
        bsonType: "string"
      },
      users: {
        bsonType: "array",
        description: "references two user_id",
        items: { bsonType: "objectId" },
        maxItems: 2,
        minItems: 2
      }
    }
  }
}
then I will use $all to find the connected users, e.g.
db.collection.find( { users: { $all: [ ObjectId1, ObjectId2 ] } } )
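Bear in mind that $jsonSchema only constrains the shape of each individual document; it does not make the pair unique across the collection, so to get the both-ways uniqueness you would still need something like the sorted-pair plus unique compound index approach from the answer above.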

Update existing mongodb data into an embedded document

I am new to MongoDB so this is probably a basic question (hopefully). I currently have 10 million records with 410 fields loaded in a mongodb collection like so:
{
  "_id" : ObjectId("........"),
  "AddressID" : 123455,
  "IndividualId" : 1,
  "personfirstname" : "FirstName",
  "personmiddleinitial" : "M",
  "personlastname" : "LastName",
  "etc" : "....."
}
I need to wrap all of this data into an embedded document like so:
{
  "_id" : ObjectId("........"),
  "data" : {
    "AddressID" : 123455,
    "IndividualId" : 1,
    "personfirstname" : "FirstName",
    "personmiddleinitial" : "M",
    "personlastname" : "LastName",
    "etc" : "....."
  }
}
I don't necessarily need to update this data in-place but that would be nice. If I need to export this data somehow specifying the new format and then re-import the new, updated data that is fine. Performing this via the MongoDB shell would be ideal.
As suggested by chridam in the comments, you can execute the following aggregation pipeline:
db.collectionName.aggregate([
{ $project: { _id: "$_id", data: "$$ROOT" } },
{ $out: "newCollectionName" }
]);
This way you have the _id field both at root level and in the data object. Thus, you can execute a massive update to unset the second one:
db.newCollectionName.updateMany(
{},
{ $unset: { "data._id": "" } }
);
Finally, you can drop the first collection and rename the second to restore the original name on the updated collection:
db.collectionName.drop();
db.newCollectionName.rename("collectionName");
This approach works entirely within the database, avoiding fetching any of your 10 million documents.
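If you are on MongoDB 4.2 or newer (an assumption; this answer predates it), the aggregation $unset stage can drop data._id within the same pipeline, skipping the second pass:
db.collectionName.aggregate([
  { $project: { data: "$$ROOT" } },  // _id is kept at the root by default
  { $unset: "data._id" },            // aggregation $unset needs MongoDB 4.2+
  { $out: "newCollectionName" }
]);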
You can simply do this in the shell with the following:
db.test.find().forEach(function(doc){
doc = { _id: doc._id, data: doc };
delete doc.data._id;
db.test.save(doc);
});
For example, if we insert the following documents:
> db.test.insertMany([
... {
... _id: ObjectId("5a91af8908e17c5997e03b7e"),
... field1: false,
... field2: 0,
... field3: "No"
... },
... {
... _id: ObjectId("5a91afbc08e17c5997e03b7f"),
... field1: true,
... field2: 1,
... field3: "Yes"
... }])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("5a91af8908e17c5997e03b7e"),
ObjectId("5a91afbc08e17c5997e03b7f")
]
}
Then run:
db.test.find().forEach(function(doc){
doc = { _id: doc._id, data: doc };
delete doc.data._id;
db.test.save(doc);
});
Our documents now look like this:
> db.test.find().pretty()
{
"_id" : ObjectId("5a91af8908e17c5997e03b7e"),
"data" : {
"field1" : false,
"field2" : 0,
"field3" : "No"
}
}
{
"_id" : ObjectId("5a91afbc08e17c5997e03b7f"),
"data" : {
"field1" : true,
"field2" : 1,
"field3" : "Yes"
}
}
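With 10 million documents, one save() round trip per document will be slow. A batched variant using bulkWrite should behave much better - a sketch, assuming a shell new enough to have Object.assign and bulkWrite:
var ops = [];
db.test.find().forEach(function (doc) {
  var data = Object.assign({}, doc);  // shallow copy so we can drop _id
  delete data._id;
  ops.push({
    replaceOne: {
      filter: { _id: doc._id },
      replacement: { _id: doc._id, data: data }
    }
  });
  if (ops.length === 1000) {          // flush in batches of 1000
    db.test.bulkWrite(ops);
    ops = [];
  }
});
if (ops.length > 0) db.test.bulkWrite(ops);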

Mongodb Update/Upsert array exact match

I have a collection :
gStats : {
  "_id" : "id1",
  "criteria" : ["key1":"value1", "key2":"value2"],
  "groups" : [
    {"id":"XXXX", "visited":100, "liked":200},
    {"id":"YYYY", "visited":30, "liked":400}
  ]
}
I want to be able to update a document in the stats array for a given array of criteria (exact match).
I try to do this in 2 steps:
Pull the stat document from the array for a given "id":
db.gStats.update({
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2096955"},{"value1" : "2015610"}]}
},
{
$pull : {groups : {"id" : "XXXX"}}
}
)
Push the new document
db.gStats.findAndModify({
query : {
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2015610"}, {"key2" : "2096955"}]}
},
update : {
$push : {groups : {"id" : "XXXX", "visited" : 29, "liked" : 144}}
},
upsert : true
})
The Pull query works perfectly.
The Push query gives an error:
2014-12-13T15:12:58.571+0100 findAndModifyFailed failed: {
  "value" : null,
  "errmsg" : "exception: Cannot create base during insert of update. Caused by :ConflictingUpdateOperators Cannot update 'criteria' and 'criteria' at the same time",
  "code" : 12,
  "ok" : 0
} at src/mongo/shell/collection.js:614
Neither query is actually working. You cannot use a key name like "criteria" more than once unless it is under an operator such as $and. You are also specifying different fields (i.e. groups) and querying elements that do not exist in your sample document.
So it is hard to tell what you really want to do here. But the error is essentially caused by the first issue I mentioned, with a little something extra: your { "$size": 2 } condition is being ignored and only the second condition is applied.
A valid query form should look like this:
query: {
"$and": [
{ "criteria" : { "$size" : 2 } },
{ "criteria" : { "$all": [{ "key1": "2015610" }, { "key2": "2096955" }] } }
]
}
As each set of conditions is specified within the array provided to $and, the document structure of the query is valid and no hash key overwrites another. That is the proper way to write your two conditions. But there is a trick to making this work where the "upsert" fails because those conditions do not match a document: we need to override what happens when it tries to apply the $all arguments on creation:
update: {
"$setOnInsert": {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
},
"$push": { "stats": { "id": "XXXX", "visited": 29, "liked": 144 } }
}
That uses $setOnInsert so that when the "upsert" is applied and a new document is created, the values specified here are used instead of the field values set in the query portion of the statement.
Of course, if what you are really looking for is truly an exact match of the content in the array, then just use that for the query instead:
query: {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
}
Then MongoDB will be happy to apply those values when a new document is created, and it does not get confused about how to interpret the $all expression.
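Putting the pieces together, the upsert step could then be written like this (a sketch using the exact-match form; with an equality query the criteria values are copied into a newly created document automatically, so $setOnInsert is not needed, and "groups" is used here to match the sample document):
db.gStats.findAndModify({
  query : {
    "criteria" : [{ "key1" : "2015610" }, { "key2" : "2096955" }]
  },
  update : {
    "$push" : { "groups" : { "id" : "XXXX", "visited" : 29, "liked" : 144 } }
  },
  upsert : true
})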