Multiple arrays of objects inside array of objects in MongoDB - mongodb

Fellow programmers.
Is it considered bad practice to use a MongoDB model like this?
{
companyId: '',
companyName: '',
companyDivisions: [
{
divisionId: '',
divisionName: '',
divisionDepartments: [
{
departmentId: '',
departmentName: ''
},
...
]
},
...
],
},
...
I ask because updating individual departments is getting complicated.
Thanks.

I don't think this is bad practice, generally speaking. If your domain model resembles this data structure, storing the data this way is a good choice, since it plays to the strengths of a document database: you can handle the data naturally, and you most likely have a direct mapping onto your data model.
Another choice would be to have three different collections:
companies;
divisions;
departments.
However, in this case you would end up storing data as you would do in a relational database. Thus, more than a general rule, it is a matter of data model and expected query profile on your database.
Edit: using MongoDb 3.6+
Using your document oriented approach, a single department can be granularly updated using the following update:
db.companies.findAndModify(
  {
    query: {
      "companyId": "yourCompanyId"
    },
    update: {
      // Set the name of the matched department inside the matched division
      $set : { "companyDivisions.$[element1].divisionDepartments.$[element2].departmentName": "yourNewName" }
    },
    arrayFilters: [
      { "element1.divisionId": "yourDivisionId" },
      { "element2.departmentId": "yourDepartmentId" }
    ]
  });
This update uses the filtered positional operator introduced in MongoDB v3.6. The $[<identifier>] syntax lets you select array entries based on a condition expressed in the arrayFilters option of the db.collection.findAndModify() method.
Note that in case the condition matches multiple array items, the update affects all such items, thus allowing for multiple updates as well.
Furthermore, note that I would apply such an optimization only when it is actually needed, since "premature optimization is the root of all evil" (D. Knuth).
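As a rough sketch, the same filtered positional update can be assembled by a small helper, which keeps the path and the arrayFilters identifiers in sync. The helper name buildDepartmentRename and the identifiers division and department are made up for illustration; the field names come from the model in the question, and the result is assumed to be passed to db.companies.updateOne(), which also accepts arrayFilters from MongoDB 3.6 on.

```javascript
// Hypothetical helper: builds the arguments for a filtered positional update.
// Field names follow the model in the question; nothing here talks to a server.
function buildDepartmentRename(companyId, divisionId, departmentId, newName) {
  return {
    filter: { companyId: companyId },
    update: {
      $set: {
        // $[division] and $[department] are bound by the arrayFilters below
        "companyDivisions.$[division].divisionDepartments.$[department].departmentName": newName
      }
    },
    options: {
      arrayFilters: [
        { "division.divisionId": divisionId },
        { "department.departmentId": departmentId }
      ]
    }
  };
}

const rename = buildDepartmentRename("yourCompanyId", "yourDivisionId", "yourDepartmentId", "yourNewName");
// In the shell this would be used as:
// db.companies.updateOne(rename.filter, rename.update, rename.options)
console.log(JSON.stringify(rename, null, 2));
```

Each identifier used in the update path has to appear in exactly one arrayFilters entry, which is the main reason for generating both from one place.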

Related

Mongodb create index for boolean and integer fields

user collection
[{
deleted: false,
otp: 3435,
number: '+919737624720',
email: 'Test@gmail.com',
name: 'Test child name',
coin: 2
},
{
deleted: false,
otp: 5659,
number: '+917406732496',
email: 'anand.satyan@gmail.com',
name: 'Nivaan',
coin: 0
}
]
I am using the command below to create the index. It looks like it is working for strings,
but I am not sure it is correct for the number and boolean fields.
db.users.createIndex({name:"text", email: "text", coin: 1, deleted: 1})
I am using this command to filter data:
db.users.find({$text:{$search:"anand.satya"}}).pretty()
db.users.find({$text:{$search:"test"}}).pretty()
db.users.find({$text:{$search:2}}).pretty()
db.users.find({$text:{$search:false}}).pretty()
Searches on the string fields work, but searches on the numeric and boolean fields do not.
How should I create an index for them?
The title and comments in this question are misleading. Part of the question is more focused on how to query with fields that contain boolean and integer fields while another part of the question is focused on overall indexing strategies.
Regarding indexing, the index that was shown in the question is perfectly capable of satisfying some queries that include predicates on coin and deleted. We can see that when looking at the explain output for a query of .find({$text:{$search:"test"}, coin:123, deleted: false}):
> db.users.find({$text:{$search:"test"}, coin:123, deleted: false}).explain().queryPlanner.winningPlan.inputStage
{
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
filter: {
'$and': [ { coin: { '$eq': 123 } }, { deleted: { '$eq': false } } ]
},
keyPattern: { _fts: 'text', _ftsx: 1, coin: 1, deleted: 1 },
indexName: 'name_text_email_text_coin_1_deleted_1',
isMultiKey: false,
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'backward',
indexBounds: {}
}
}
Observe here that the index scan stage (IXSCAN) is responsible for applying the filter for the coin and deleted predicates (as opposed to the database having to do that after FETCHing the full document).
Separately, you mentioned in the question that these two particular queries aren't working:
db.users.find({$text:{$search:2}}).pretty()
db.users.find({$text:{$search:false}}).pretty()
And by 'not working' you are referring to the fact that no results are being returned. This is also related to the following discussion in the comments which seemed to have a misleading takeaway:
You'll have to convert your coin and deleted fields to string, if you want it to be picked up by $search – Charchit Kapoor
So there is no way to search a boolean or integer field? – Kiran S youtube channel
Nope, not that I know of. – Charchit Kapoor
You can absolutely use boolean and integer values in your query predicate to filter data. This playground demonstrates that.
What @Charchit Kapoor is saying can't be done is using the $text operator to match and return results whose field values are not strings. Said another way, the $text operator is specifically used to perform a text search.
If what you are trying to achieve are direct equality matches for the field values, both strings and otherwise, then you can delete the text index as there is no need for using the $text operator in your query. A simplified query might be:
db.users.find({ name: "test"})
Demonstrated in this playground.
A few additional things come to mind:
Regarding indexing overall, databases will generally consider using an index if the first key is used in the query. You can read more about this for MongoDB specifically on this page. The takeaway is that you will want to create the appropriate set of indexes to align with your most commonly executed queries. If you have a query that just filters on coin, for example, then you may wish to create an index that has coin as its first key.
If you want to check if the exact string value is present in multiple fields, then you may want to do so using the $or operator (and have appropriate indexes for the database to use).
If you do indeed need more advanced text searching capabilities, then it would be appropriate to either continue using the $text operator or consider Atlas Search if the cluster is running in Atlas. Doing so does not prevent you from also having indexes that would support your other queries, such as on { coin: 2 }. It's simply that the syntax for performing such a query needs to be updated.
There is a lot going on here, but the big takeaway is that you can absolutely filter data based on any data type. Doing so simply requires using the appropriate syntax, and doing so efficiently requires an appropriate indexing strategy alongside the queries.
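To make the distinction concrete, here is a minimal sketch of how a filter could be assembled so that only string search terms go through $text, while numbers and booleans become ordinary equality predicates. The helper names buildUserQuery and exactMatchInAnyField and the criteria key search are assumptions for illustration; the field names are the ones from the question.

```javascript
// Hypothetical helper: strings to search go into $text, everything else
// becomes a plain equality predicate on the named field.
function buildUserQuery(criteria) {
  const query = {};
  for (const [field, value] of Object.entries(criteria)) {
    if (field === "search") {
      // $text performs a text search, so the search term must be a string
      if (typeof value !== "string") {
        throw new TypeError("$search expects a string; filter on the field directly instead");
      }
      query.$text = { $search: value };
    } else {
      query[field] = value;
    }
  }
  return query;
}

// Hypothetical helper: exact match of one value against several fields via $or.
function exactMatchInAnyField(value, fields) {
  return { $or: fields.map(function (field) { return { [field]: value }; }) };
}

const textAndFilters = buildUserQuery({ search: "test", coin: 2, deleted: false });
const anyField = exactMatchInAnyField("Nivaan", ["name", "email"]);
// In the shell: db.users.find(textAndFilters) or db.users.find(anyField)
console.log(JSON.stringify(textAndFilters), JSON.stringify(anyField));
```

Passing search: 2 to this helper throws instead of silently returning nothing, which mirrors the point above: a value like 2 or false belongs in a predicate such as { coin: 2 }, not in $search.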

MongoDB: Find document given field values in an object with an unknown key

I'm making a database on theses/arguments. They are related to other arguments, which I've placed in an object with a dynamic key, which is completely random.
{
_id : "aeokejXMwGKvWzF5L",
text : "test",
relations : {
cF6iKAkDJg5eQGsgb : {
type : "interpretation",
originId : "uFEjssN2RgcrgiTjh",
ratings: [...]
}
}
}
Can I find this document if I only know what the value of type is? That is I want to do something like this:
db.theses.find({relations['anything']: { type: "interpretation"}}})
This could've been done easily with the positional operator, if relations had been an array. But then I cannot make changes to the objects in ratings, as mongo doesn't support those updates. I'm asking here to see if I can keep from having to change the database structure.
You seem to have arrived at this structure because of a problem with updates on nested arrays, but in doing so you have only traded it for another problem: there is no "wildcard" concept for searching unspecified keys with the standard query operators, which are the ones that perform well.
The only way you can really search for such data is by using JavaScript code on the server to traverse the keys using $where. This is clearly not a really good idea as it requires brute force evaluation rather than using useful things like an index, but it can be approached as follows:
db.theses.find(function() {
  var relations = this.relations;
  // True if any dynamically keyed relation has the wanted type
  return Object.keys(relations).some(function(rel) {
    return relations[rel].type == "interpretation";
  });
})
While this will return those objects from the collection that contain the required nested value, it must inspect each object in the collection in order to do the evaluation. This is why such evaluation should really only be used alongside another query condition that can use an index directly, such as a fixed value stored in the document.
Still, the better solution is to consider remodelling the data to take advantage of indexes in search. Where it is necessary to update the "ratings" information, basically "flatten" the structure so that each "rating" element is an entry in the only array:
{
"_id": "aeokejXMwGKvWzF5L",
"text": "test",
"relationsRatings": [
{
"relationId": "cF6iKAkDJg5eQGsgb",
"type": "interpretation",
"originId": "uFEjssN2RgcrgiTjh",
"ratingId": 1,
"ratingScore": 5
},
{
"relationId": "cF6iKAkDJg5eQGsgb",
"type": "interpretation",
"originId": "uFEjssN2RgcrgiTjh",
"ratingId": 2,
"ratingScore": 6
}
]
}
Now searching is of course quite simple:
db.theses.find({ "relationsRatings.type": "interpretation" })
And of course the positional $ operator can now be used with the flatter structure:
db.theses.update(
{ "relationsRatings.ratingId": 1 },
{ "$set": { "relationsRatings.$.ratingScore": 7 } }
)
Of course this means duplication of the "related" data for each "ratings" value, but this is generally the cost of being able to update by matched position, as this is only supported with a single level of array nesting.
So you can force the logic to match with the way you have it structured, but it is not a great idea to do so and will lead to performance problems. If however your main need here is to update the "ratings" information rather than just append to the inner list, then a flatter structure will be of greater benefit and of course be a lot faster to search.
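If you do remodel, the existing documents have to be converted once. The following is a minimal sketch of that conversion in plain JavaScript. The function name flattenRelations is made up, and because the question elides the contents of ratings, the rating fields id and score are assumptions chosen only to line up with ratingId and ratingScore in the flattened example.

```javascript
// Hypothetical migration helper: turns the dynamically keyed "relations" object
// into the flat "relationsRatings" array.
// ASSUMPTION: each rating looks like { id, score }; the question does not show this.
function flattenRelations(thesis) {
  const relationsRatings = [];
  for (const [relationId, relation] of Object.entries(thesis.relations || {})) {
    // Note: a relation with no ratings produces no entries in this sketch
    for (const rating of relation.ratings || []) {
      relationsRatings.push({
        relationId: relationId,
        type: relation.type,
        originId: relation.originId,
        ratingId: rating.id,
        ratingScore: rating.score
      });
    }
  }
  return { _id: thesis._id, text: thesis.text, relationsRatings: relationsRatings };
}

const flattened = flattenRelations({
  _id: "aeokejXMwGKvWzF5L",
  text: "test",
  relations: {
    cF6iKAkDJg5eQGsgb: {
      type: "interpretation",
      originId: "uFEjssN2RgcrgiTjh",
      ratings: [{ id: 1, score: 5 }, { id: 2, score: 6 }]
    }
  }
});
console.log(JSON.stringify(flattened, null, 2));
```

Relations without any ratings would need their own treatment (for example one entry with no rating fields), which is left out here.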

JSON Schema with dynamic key field in MongoDB

I want to have i18n support for objects stored in a MongoDB collection.
Currently our schema looks like this:
{
_id: "id",
name: "name",
localization: [{
lan: "en-US",
name: "name_in_english"
}, {
lan: "zh-TW",
name: "name_in_traditional_chinese"
}]
}
but my thought is that field "lan" is unique, can I just use this field as a key, so the structure would be
{
_id: "id",
name: "name",
localization: {
"en-US": "name_in_english",
"zh-TW": "name_in_traditional_chinese"
}
}
which would be neater and easier to parse (just localization[language] would get the value i want for specific language).
But then the question is: Is this a good practice in storing data in MongoDB? And how to pass the json-schema check?
It is not good practice to have values as keys. The language codes are values and, as you say, you cannot validate them against a schema. It also makes querying against them impractical. For example, you can't easily figure out whether you have a translation for "nl-NL", as you can't compare against keys, nor is it possible to easily index this. You should always have descriptive keys.
However, as you say, having the languages as keys makes it a lot easier to pull the data out as you can just access it by ['nl-NL'] (or whatever your language's syntax is).
I would suggest an alternative schema:
{
your_id: "id_for_name",
lan: "en-US",
name: "name_in_english"
}
{
your_id: "id_for_name",
lan: "zh-TW",
name: "name_in_traditional_chinese"
}
Now you can :
set an index on { your_id: 1, lan: 1 } for speedy lookups
query for each translation individually and just get that translation:
db.so.find( { your_id: "id_for_name", lan: 'en-US' } )
query for all the versions for each id using this same index:
db.so.find( { your_id: "id_for_name" } )
and also update the translation for a specific language much more easily:
db.so.update(
{ your_id: "id_for_name", lan: 'en-US' },
{ $set: { name: "ooga" } }
)
None of these points is possible with your suggested schemas.
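Moving from the embedded localization array to one document per translation is a mechanical transformation. Here is a minimal sketch of it; the function name toTranslationDocs is hypothetical, and it assumes the source documents have the array form shown in the question (localization as a list of { lan, name } entries).

```javascript
// Hypothetical helper: splits one document with an embedded "localization"
// array into one document per language, in the { your_id, lan, name } shape.
function toTranslationDocs(doc) {
  return (doc.localization || []).map(function (entry) {
    return { your_id: doc._id, lan: entry.lan, name: entry.name };
  });
}

const translations = toTranslationDocs({
  _id: "id_for_name",
  name: "name",
  localization: [
    { lan: "en-US", name: "name_in_english" },
    { lan: "zh-TW", name: "name_in_traditional_chinese" }
  ]
});
// In the shell these could then be inserted with db.so.insertMany(translations)
console.log(JSON.stringify(translations, null, 2));
```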
Obviously the second schema example is much better for your task (of course, if lan field is unique as you mentioned, that seems true to me also).
Getting an element from a dictionary (associative array, mapping, or whatever it is called in your language) is much cheaper than scanning a whole array of values. In this case it is also more efficient in terms of storage size: remember that MongoDB stores field names as-is, so every record holds the whole key name for each field, not a compact representation or index of it.
My experience shows that MongoDB is mature enough to be used as the main storage for your application, even under high load (whatever that means ;) ). The main problem is how you fight database-level locks (we'll have to wait for the promised table-level locks, which I hope will speed MongoDB up a lot more). Data loss is possible, though, if your MongoDB cluster is built badly (dig into the docs and articles on the Internet for more information).
As for the schema check, you must do it in your programming language on the application side before inserting records. That is why Mongo is called schemaless.
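An application-side check for the keyed form can be quite small. The sketch below is only an illustration: the function name validateLocalization is made up, and the pattern for language tags (two lowercase letters, a hyphen, two uppercase letters, as in "en-US") is an assumption based on the two examples in the question, not a full language-tag grammar.

```javascript
// ASSUMPTION: language tags look like "en-US" or "zh-TW"; real tags can be richer.
const LANGUAGE_TAG = /^[a-z]{2}-[A-Z]{2}$/;

// Hypothetical validator: returns a list of problems, empty when the object is fine.
function validateLocalization(localization) {
  if (localization === null || typeof localization !== "object" || Array.isArray(localization)) {
    return ["localization must be an object keyed by language tag"];
  }
  const errors = [];
  for (const [key, value] of Object.entries(localization)) {
    if (!LANGUAGE_TAG.test(key)) {
      errors.push("invalid language tag: " + key);
    }
    if (typeof value !== "string" || value.length === 0) {
      errors.push("missing translation for: " + key);
    }
  }
  return errors;
}

console.log(validateLocalization({ "en-US": "name_in_english", "zh-TW": "name_in_traditional_chinese" }));
console.log(validateLocalization({ english: "" }));
```

The idea is to run this before every insert or update and reject the write when the returned list is not empty.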
There is a case where an object is necessarily better than an array: supporting upserts into a set. For example, if you want to update an item having name 'item1' to have val 100, or insert such an item if one doesn't exist, all in one atomic operation. With an array, you'd have to do one of two operations. Given a schema like
{ _id: 'some-id', itemSet: [ { name: 'an-item', val: 123 } ] }
you'd have commands
// Update:
db.coll.update(
{ _id: id, 'itemSet.name': 'item1' },
{ $set: { 'itemSet.$.val': 100 } }
);
// Insert:
db.coll.update(
{ _id: id, 'itemSet.name': { $ne: 'item1' } },
{ $addToSet: { 'itemSet': { name: 'item1', val: 100 } } }
);
You'd have to query first to know which is needed in advance, which can exacerbate race conditions unless you implement some versioning. With an object, you can simply do
db.coll.update(
  { _id: id },
  // itemSet is an object keyed by item name, so this sets or creates item1
  { $set: { 'itemSet.item1.val': 100 } }
);
If this is a use case you have, then you should go with the object approach. One drawback is that querying for a specific name requires scanning. If that is also needed, you can add a separate array specifically for indexing. This is a trade-off with MongoDB. Upserts would become
db.coll.update(
  { _id: id },
  {
    // itemSet is an object keyed by item name; itemNames is the array kept for indexing
    $set: { 'itemSet.item1.val': 100 },
    $addToSet: { itemNames: 'item1' }
  }
);
and the query would then simply be
db.coll.find({ itemNames: 'item1' })
(Note: the $ positional operator does not support array upserts.)
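As a sketch of the object-based upsert with the extra array for indexing, the update document can be generated from the item name. This assumes itemSet is stored as an object keyed by item name (for example { item1: { val: 100 } }); the helper name buildItemUpsert is hypothetical. Because the name becomes part of a dotted field path, the sketch rejects names that contain a dot or start with a dollar sign.

```javascript
// Hypothetical helper: one update document that sets or creates the item
// and records its name in the itemNames array used for querying.
// ASSUMPTION: itemSet is an object keyed by item name.
function buildItemUpsert(name, val) {
  if (typeof name !== "string" || name.length === 0 || name.includes(".") || name.startsWith("$")) {
    throw new Error("item name cannot be used as a field name: " + name);
  }
  return {
    $set: { ["itemSet." + name + ".val"]: val },
    $addToSet: { itemNames: name }
  };
}

const upsertItem1 = buildItemUpsert("item1", 100);
// In the shell: db.coll.update({ _id: id }, upsertItem1)
console.log(JSON.stringify(upsertItem1));
```

Both modifiers are applied in the same single-document update, so the object entry and the itemNames array stay consistent with each other.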

How to apply constraints in MongoDB?

I have started using MongoDB and I am fairly new to it.
Is there any way by which I can apply constraints on documents in MongoDB?
Like specifying a primary key or taking an attribute as unique?
Or specifying that a particular attribute is greater than a minimum value?
MongoDB 3.2 Update
Document validation is now supported natively by MongoDB.
Example from the documentation:
db.createCollection( "contacts",
{ validator: { $or:
[
{ phone: { $type: "string" } },
{ email: { $regex: /@mongodb\.com$/ } },
{ status: { $in: [ "Unknown", "Incomplete" ] } }
]
}
} )
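The same mechanism can cover the "greater than a minimum value" part of the question, since a validator is an ordinary query expression. Below is a minimal sketch; the helper name minValueValidator, the collection name people and the field name age are all made up for illustration.

```javascript
// Hypothetical helper: a validator expression requiring `field` to be at least `min`.
// Note: with this expression, documents that lack the field are rejected as well.
function minValueValidator(field, min) {
  if (typeof min !== "number" || Number.isNaN(min)) {
    throw new TypeError("min must be a number");
  }
  return { [field]: { $gte: min } };
}

const ageValidator = minValueValidator("age", 18);
// In the shell: db.createCollection("people", { validator: ageValidator })
console.log(JSON.stringify(ageValidator));
```

Uniqueness is still handled by unique indexes rather than by the validator.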
Original answer
To go beyond the uniqueness constraint available natively in indexes, you need to use something like Mongoose and its ability to support field-based validation. That will give you support for things like minimum value, but only when updates go through your Mongoose schemas/models.
Being a "schemaless" database, some of the things you mention must be constrained from the application side, rather than the db side. (such as "minimum value")
However, you can create indexes (keys to query on; remember that a query typically uses only one index at a time, so it's generally better to design your indexes around your queries, rather than just index each field you might query against):
http://www.mongodb.org/display/DOCS/Indexes#Indexes-Basics
And you can also create unique indexes, which will enforce uniqueness similar to a unique constraint (it does have some caveats, such as with array fields):
http://www.mongodb.org/display/DOCS/Indexes#Indexes-unique%3Atrue

mongoDB: unique index on a repeated value

I'm pretty new to MongoDB, so this could be a misunderstanding of general usage. Bear with me.
I have a document schema I'm working with, like this:
{
name: "bob",
email: "bob@gmail.com",
logins: [
{ u: 'a', p: 'b', public_id: '123' },
{ u: 'x', p: 'y', public_id: 'abc' }
]
}
My problem is that I need to ensure that the public ids are unique within a document and across the collection.
Furthermore, there are some existing records being migrated from a MySQL DB that don't have public ids, and these will all be replaced by null values in Mongo.
I figure it's either an index
db.users.ensureIndex({ "logins.public_id": 1 }, { unique: true });
which isn't working because of the missing keys and is throwing an E11000 duplicate key error,
or this is a more fundamental schema problem, in that I shouldn't be nesting objects in an array structure like that. In which case, what? A separate collection for the user_logins? That seems to go against the idea of an embedded document.
If you expect u and p to always have the same values on each insert (as in your example snippet), you might want to use the $addToSet operator on inserts to ensure the uniqueness of your public_id field. Otherwise I think it's quite difficult to make them unique across a whole collection without relying on external maintenance or JS functions.
If not, I would possibly store them in their own collection and use the public_id as _id field to ensure their cross-document uniqueness inside a collection. Maybe that would contradict the idea of embedded docs in a doc database, but according to different requirements I think that's negligible.
Furthermore, there are some existing records being migrated from a MySQL DB that don't have public ids, and these will all be replaced by null values in Mongo.
So you want to apply a unique index on a data set that's not truly unique. I think this is just a modeling problem.
If a null logins.public_id is going to violate your uniqueness constraint, then just don't write it at all:
{
logins: [
{ u: 'a', p: 'b' },
{ u: 'x', p: 'y' }
]
}
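During the migration, dropping the missing ids can be done with a small transformation before each document is written. This is only a sketch: the function name stripNullPublicIds is hypothetical, and it assumes the migrated rows arrive as plain objects with a logins array like the one in the question.

```javascript
// Hypothetical migration helper: returns a copy of the user in which logins
// with a null or undefined public_id simply do not carry that key.
function stripNullPublicIds(user) {
  const logins = (user.logins || []).map(function (login) {
    const copy = Object.assign({}, login);
    if (copy.public_id === null || copy.public_id === undefined) {
      delete copy.public_id;
    }
    return copy;
  });
  return Object.assign({}, user, { logins: logins });
}

const migrated = stripNullPublicIds({
  name: "bob",
  logins: [
    { u: "a", p: "b", public_id: null },
    { u: "x", p: "y", public_id: "abc" }
  ]
});
console.log(JSON.stringify(migrated));
```

The input object is left untouched, so the original row can still be logged or retried if the write fails.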
Thanks all.
In the end I opted to separate this into two collections, one for users and one for logins.
This looked a little like:
userDocument = {
...
logins: [
DBRef('loginsCollection', loginDocument._id),
DBRef('loginsCollection', loginDocument2._id),
]
}
loginDocument = {
...
user: new DBRef('userCollection', userDocument._id)
}
Although this is not what I was originally after (a single collection), it is working nicely, and by utilising the MongoId uniqueness there is now a constraint built in at the database level rather than implemented at the application level.