Embedded and references in a data model - mongodb

Embedded and references in a data model - mongodb - mongodb

I want to create a mongodb database, and use embedded structur. For exemple, consider that the size of each document of the persons 's collection is 16MB. It means that i can not add the sub-document contacts in the person's collection.
1- In this case what should i do ?
2- If i create the collection of contact, it will be an obligation to reference to the a person.
Can we have embeded and reference stuctur in a mongodb database ?
Thank you.
{
nom:'Kox',
prenom:'Karl',
gender:'M',
addres:
{
rue: '123 Fake Street',
appt:108,
city:'mycity',
zip_code:'GGG23'
},
class:
{
name:'CLASS ONE',
group:'C',
section:'SECTION ONE'
}
}

One of the strengths of MongoDB is the flexible schema.
You can certainly have contacts embedded for some of person documents, referenced for others, or a single person document that has some of its contacts embedded and some referenced.
One possible use of this is to have recently or frequently used contacts embedded for quick access (similar to a per-person cache) and all contacts available for lookup via reference.
The natural extension of this is that if a person's entire contact list fits within the person document, there is never a need to do a separate contact lookup for that person.
The tradeoff is:
A referenced approach allows contact lists to be arbitrarily large, but requires a separate contact lookup aside from the person lookup.
An embedded approach reduces the load on the database server by requiring only 1 lookup for both person and contacts, but limits the size of the person + contact list to 16MB.
A hybrid embedded/referenced approach requires a bit more complexity in the application code, but provides a reduction in query load on the database server while still allowing a contact list to be extremely large.

Related

Performance difference between storing the asset as subdocument vs single document in Mongoose

I have an API for synchronizing contacts from the user's phone to our database. The controller essentially iterates the data sent in the request body and if it passes validation a new contact is saved:
const contact = new Contact({ phoneNumber, name, surname, owner });
await contact.save();
Having a DB with 100 IOPS and considering the average user has around 300 contacts, when the server is busy this API takes a lot of time.
Since the frontend client is made in a way that a contact ID is necessary for other operations (edit, delete), I was thinking about changing the data structure to subdocuments, and instead of saving each Contact as a separate document, the idea is to save one document with many contacts inside:
const userContacts = new mongoose.Schema({
owner: //the id of the contacts owner,
contacts: [new mongoose.Schema({
name: { type: String },
phone: { type: String }
})]
});
This way I have to do just one save. But since Mongo has to generate an ID for each subdocument, is this really that much faster than the original approach?

Summary
This really depends on your exact usage scenarios:
are contacts often updated?
what is the max / average quantity of contacts per user
are they ever partially loaded, or are they always fetched all together?
But for a fairly common collection such as contacts, I would not recommend storing them in subdocuments.
Instead you should be able to use insertMany for your initial sync scenario.
Explanation
Storing as subdocuments makes a bulk-write easier will make querying and updating contacts slower and more awkward than as regular documents.
For example, if I have 100 contacts, and I want to view and edit 1 of them, it needs to load the full 100 contacts. I can make the change via a partial update using $set or $update, so the update will be OK. But when I add a new contact, I will have to add a new contact subDocument to you Contacts document. This makes it a growing document, meaning your database will suffer from fragmentation which can slow things down a lot (see this answer)
You will have to use aggregate with $ projection or $unwind to search through contacts in MongoDB. If you want to apply a specific sort order, this too would have to be done via aggregate or in code.
Matching via projection can also lead to problems with duplicate contacts being difficult to find.
And this won't scale. What if you get users with 1000s of contacts later? Then this single document will grow large and querying it will become very slow.
Alternatives
If your contacts for sync are in the 100s, you might get away with a splitting them into groups of ~50-100 and calling insertMany for each batch.
If they grow into the thousands, then I would suggest uploading all contacts, saving them as JSON / CSV files to disk, then slowly processing these in the background in batches.

MongoDB Data Model Design for Meteor.js App

I'm not much of a backend guy and even worse when it comes to MongoDB, however, I've been taken with Meteor.js so I'm giving it a try as I play around.
I'm creating a project management/ticketing app and would like your opinion on the data model design. In my app you create a ticket, assign other team members to the ticket and allow people to access it and manipulate the data like a todo list, attachments, comments, etc. Pretty basic.
From my research, it appears that a normalized data model with references makes sense. In that case, is a good model:
A collection for all my users.
A collection for tickets (each ticket/project its own document) with a field for team members in which I insert them into an array using a reference. Then I'd have fields for comments, todos, etc.
Or would this be best:
A collection for all my users.
A unique collection for each ticket with a field for team members kept in an array.
Sorry if this seems rather basic. I'm taking the MongoDB University classes for Node, so I hope I don't have to rely on too many basic questions for too long.
Thanks everyone!

You should store each ticket/project in its own document in a single collection (the first option).
If you give each ticket its own collection you have no effective way to index and query tickets.

Links vs References in Document databases

I am confused with the term 'link' for connecting documents
In OrientDB page http://www.orientechnologies.com/orientdb-vs-mongodb/ it states that they use links to connect documents, while in MongoDB documents are embedded.
Since in MongoDB http://docs.mongodb.org/manual/core/data-modeling-introduction/, documents can be referenced as well, I can not get the difference between linking documents or referencing them.

The goal of Document Oriented databases is to reduce "Impedance Mismatch" which is the degree to which data is split up to match some sort of database schema from the actual objects residing in memory at runtime. By using a document, the entire object is serialized to disk without the need to split things up across multiple tables and join them back together when retrieved.
That being said, a linked document is the same as a referenced document. They are simply two ways of saying the same thing. How those links are resolved at query time vary from one database implementation to another.
That being said, an embedded document is simply the act of storing an object type that somehow relates to a parent type, inside the parent. For example, I have a class as follows:
class User
{
string Name
List<Achievement> Achievements
}
Where Achievement is an arbitrary class (its contents don't matter for this example).
If I were to save this using linked documents, I would save User in a Users collection and Achievement in an Achievements collection with the List of Achievements for the user being links to the Achievement objects in the Achievements collection. This requires some sort of joining procedure to happen in the database engine itself. However, if you use embedded documents, you would simply save User in a Users collection where Achievements is inside the User document.
A JSON representation of the data for an embedded document would look (roughly) like this:
{
"name":"John Q Taxpayer",
"achievements":
[
{
"name":"High Score",
"point":10000
},
{
"name":"Low Score",
"point":-10000
}
]
}
Whereas a linked document might look something like this:
{
"name":"John Q Taxpayer",
"achievements":
[
"somelink1", "somelink2"
]
}
Inside an Achievements Collection
{
"somelink1":
{
"name":"High Score",
"point":10000
}
"somelink2":
{
"name":"High Score",
"point":10000
}
}
Keep in mind these are just approximate representations.
So to summarize, linked documents function much like RDBMS PK/FK relationships. This allows multiple documents in one collection to reference a single document in another collection, which can help with deduplication of data stored. However it adds a layer of complexity requiring the database engine to make multiple disk I/O calls to form the final document to be returned to user code. An embedded document more closely matches the object in memory, this reduces Impedance Mismatch and (in theory) reduces the number of disk I/O calls.
You can read up on Impedance Mismatch here: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch
UPDATE
I should add, that choosing the right database to implement for your needs is very important from the start. If you have a lot of questions about each database, it might make sense to contact each supplier and get some of their training material. MongoDB offers 2 free courses you can take to learn more about their product and best uses at MongoDB University. OrientDB does offer training, however it is not free. It might be best to try contacting them directly and getting some sort of pre-sales training (if you are looking to license the db), usually they will put you in touch with some sort of pre-sales consultant to help you evaluate their product.

MongoDB works like RDBMS where the object id is like a foreign key. This means a "JOIN" that is run-time expensive. OrientDB, instead, has direct links that are created only once and have a very low run-time cost.

Retrieve records in mongoDB using bidirectional query

I have two collections - Tickets and Users. Where a user can have one to many tickets. The ticket collection is defined as follows
Ticket = {_id, ownerId, profile: {name}}
The ownerId is used to find all tickets that belong to a specific person. I need to write a query that gets me all users with no tickets.
How can i write this query without having to loop through all users, checking if the userID shows up in any Tickets?
Would a bidirectional storage cause me any performance problems ? For example, if i were to change my users collection and add an array of tickets: [ticketID, ticketID2, ...]?

I'd go with the array of tickets being stored in users. As far as I know, Mongo doesn't really have a way to query one collection based on the (lack of) elements in another collection. With the array, though, you can simply do db.users.find({tickets:[]}).

How do document databases deal with changing relationships between objects (or do they at all)?

Say, at the beginning of a project, I want to store a collection of Companies, and within each company, a collection of Employees.
Since I'm using a document database (such as MongoDB), my structure might look something like this:
+ Customers[]
+--Customer
+--Employees[]
+--Employee
+--Employee
+--Customer
+--Employees[]
+--Employee
What happens if, later down the track, a new requirement is to have some Employees work at multiple Companies?
How does one manage this kind of change in a document database?
Doesn't the simplicity of a document database become your worse enemy, since it creates brittle data structures which can't easily be modified?
In the example above, I'd have to run modify scripts to create a new 'Employees' collection, and move every employee into that collection, while maintaining some sort of relationship key (e.g. a CompanyID on each employee).
If I did the above thoroughly enough, I'd end up with many collections, and very little hierarchy, and documents being joined by means of keys.
In that case, am I still using the document database as I should be?
Isn't it becoming more like a relational database?

Speaking about MongoDB specifically...because the database doesn't enforce any relationships like a relational database, you're on the hook for maintaining any sort of data integrity such as this. It's wonderfully helpful in many cases, but you end up writing more application code to handle these sorts of things.
Having said all of that, they key to using a system like MongoDB is modeling your data to fit MongoDB. What you have above makes complete sense if you're using MySQL...using Mongo you'd absolutely get in trouble if you structure your data like it's a relational database.
If you have Employees who can work at one or more Companies, I would structure it as:
// company records
{ _id: 12345, name : 'Apple' }
{ _id: 55555, name : 'Pixar' }
{ _id: 67890, name : 'Microsoft' }
// employees
{ _id : ObjectId('abc123'), name : "Steve Jobs", companies : [ 12345, 55555 ] }
{ _id : ObjectId('abc456'), name : "Steve Ballmer", companies : [ 67890 ] }
You'd add an index on employees.companies, which would make is very fast to get all of the employees who work for a given company...regardless of how many companies they work for. Maintaining a short list of companies per employee will be much easier than maintaining a large list of employees for a company. To get all of the data for a company and all of it's employees would be two (fast) queries.
Doesn't the simplicity of a document
database become your worse enemy,
since it creates brittle data
structures which can't easily be
modified?
The simplicity can bite you, but it's very easy to update and change at a later time. You can script changes via Javascript and run them via the Mongo shell.

My recent answer for this question covers this in the RavenDb context:
How would I model data that is heirarchal and relational in a document-oriented database system like RavenDB?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse