Which is the best design for a MongoDB database model? - mongodb

I feel like the MVP of my current database needs some design changes. The number of users is growing quite fast and some requests are performing badly. I also want to get rid of all the DBRefs we used.
Our current model can be summarized as follows:
A company can have multiple employees (thousands)
A company can have multiple teams (hundreds)
An employee can be part of a team
A company can have multiple devices (thousands)
An employee is assigned to multiple devices
Our application displays the following on different pages:
The company data
The users
The devices
The teams
I guess I have different options, but I'm not familiar enough with MongoDB to make the best decision.
Option 1
Do not embed and use lists of ids for one-to-many relationships.
// Company document
{
"companyName": "ACME",
"users": [ObjectId(user1), ObjectId(user2)],
"teams": [ObjectId(team1), ObjectId(team2)],
"devices": [ObjectId(device1), ObjectId(device2)]
}
// User Document
{
"userName": "Foo",
"devices": [ObjectId(device2)]
}
// Team Document
{
"teamName": "Foo",
"users": [ObjectId(user1)]
}
// Device Document
{
"deviceName": "Foo"
}
Option 2
Embed data and duplicate information.
// User Document
{
"companyName": "ACME",
"userName": "Foo",
"team": {
"teamName": "Foo"
},
"device": {
"deviceName": "Foo"
}
}
// Team Document
{
"teamName": "Foo",
"companyName": "ACME",
"users": [
{
"userName": "Foo"
}
]
}
// Device Document
{
"deviceName": "Foo",
"companyName": "ACME",
"user": {
"userName": "Foo"
}
}
Option 3
Do not embed and store a single parent id on the child side of each relationship.
// Company document
{
"companyName": "ACME"
}
// User Document
{
"userName": "Foo",
"company": ObjectId(company),
"team": ObjectId(team1)
}
// Team Document
{
"teamName": "Foo",
"company": ObjectId(company)
}
// Device Document
{
"deviceName": "Foo",
"company": ObjectId(company),
"user": ObjectId(user1)
}
MongoDB recommends embedding data as much as possible, but I don't think it is possible to embed all the data in the company document. A company can have thousands of devices or users, and I believe the document would grow too big.
I'm switching from SQL to NoSQL and I don't think I have figured it out yet!
Thanks!

MongoDB is designed to handle unstructured data.
Every database contains collections, which in turn contain documents.
Moreover, MongoDB has no SQL-style joins (the $lookup aggregation stage, available since version 3.2, is the closest equivalent). So storing the information in one company model is a better choice, because you won't need a join in that scenario.
One more thing: you don't need to embed all the models. For example, you can get both users and devices from the company collection, so why embed users and devices elsewhere as well?
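For comparison, here is a minimal sketch of how Option 3 from the question (each child document stores its parent's id) could serve the per-page listings. It runs in plain Python against in-memory lists, so `find` is only a hypothetical stand-in for a real collection query, and the ids and sample names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data shaped like Option 3: children point at their parents.
users = [
    {"_id": "u1", "userName": "Foo", "company": "c1", "team": "t1"},
    {"_id": "u2", "userName": "Bar", "company": "c1", "team": None},
]
devices = [
    {"_id": "d1", "deviceName": "Laptop", "company": "c1", "user": "u1"},
    {"_id": "d2", "deviceName": "Phone", "company": "c1", "user": "u1"},
]


def find(collection, **criteria):
    """Stand-in for collection.find(criteria): exact match on every given field."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


def devices_by_user(company_id):
    """Group a company's device names by the user each device is assigned to."""
    grouped = defaultdict(list)
    for device in find(devices, company=company_id):
        grouped[device["user"]].append(device["deviceName"])
    return dict(grouped)


# The "users" page and the "devices" page each need one filtered query.
company_users = find(users, company="c1")
company_devices = devices_by_user("c1")
```

With this shape the company document never grows as users or devices are added. In a real deployment the filtered fields (`company`, `user`) would probably each need an index for these queries to stay fast.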

Related

what is the best model to save data on elasticsearch?

I have a Rails application that uses Elasticsearch as its search engine. The app collects data from mobile applications and can collect from any kind of mobile app. A mobile app sends two types of data: user profile details and user action details. My app's admins can search over this data with multiple conditions and operations and fetch specific results, which are user profile details. After that, the admins can communicate with those profiles, for example by sending an email or SMS, or even chatting online. I have two options for saving user data. In the first, I save user profile details and user action details in separate documents with the following structure. Profile doc:
POST profilee-2022-06-09/_doc
{
"profile": {
"app_id": "abbccddeeff",
"profile_id": "2faae1d6-5875-4b36-b119-74a14589c841",
"whatsapp_number": "whatsapp:+61478421940",
"phone": "+61478421940",
"email": "user@mail.com",
"first_name": "john",
"last_name": "doe"
}
}
User action details:
POST events_app_id_2022-05-17/_doc
{
"app_id": "9vlgwrr6rg",
"event": "Email_Sign_Up",
"profile_id": "2faae1d6-5875-4b36-b119-74a14589c840",
"media": "x1z1",
"date_time": "2022-05-17T11:48:02.511Z",
"device_id": "2faae1d6-5875-4b36-b119-74a14589c840",
"lib": "android",
"lib_version": "1.0.0",
"os": "Android",
"os_version": "12",
"manufacturer": "Google",
"brand": "google",
"model": "sdk_gphone64_arm64",
"google_play_services": "available",
"screen_dpi": 440,
"screen_height": 2296,
"screen_width": 1080,
"app_version_string": "1.0",
"app_build_number": 1,
"has_nfc": false,
"has_telephone": true,
"carrier": "T-Mobile",
"wifi": true,
"bluetooth_version": "ble",
"session_id": "b1ad31ab-d440-435f-ac12-3d03c30ac44f",
"insert_id": "1e285b51-abcf-46ae-8359-9a9d58970cdf"
}
As I said before, app admins search over these documents to fetch specific profiles and use the result to communicate with them. The problem in this case is that a mobile user can create a profile and then generate actions a few days or even a few months later, so the profile details and the action details end up in indices for different days. If an admin writes a complex query to fetch specific results from this data, my application has to run at least two queries against Elasticsearch. That does not work for my app, because each query must be saved for later use by the admin. In some cases I would also need a join query, which according to the Elasticsearch documentation is costly, so that is not an option either. In the second scenario, I decided to save both the user profile and the actions in one document, something like this:
POST profilee-2022-06-09/_doc
{
"profile": {
"app_id": "abbccddeeff",
"profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
"whatsapp_number": "whatsapp:+61478421940",
"phone": "+61478421940",
"email": "user@mail.com",
"first_name": "john",
"last_name": "doe",
"events": [
{
"app_id": "abbccddeeff",
"event": "sign_in",
"profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
"media": "x1z1",
"date_time": "2022-06-06T11:52:02.511Z"
},
{
"app_id": "abbccddeeff",
"event": "course_begin",
"profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
"media": "x1z1",
"date_time": "2022-06-06T11:56:02.511Z"
},
{
"app_id": "abbccddeeff",
"event": "payment",
"profile_id": "urm-2faae1d6-5875-4b36-b119-74a14589c841",
"media": "x1z1",
"date_time": "2022-06-06T11:58:02.511Z"
}
]
}
}
In this case I still have to do the same as before: generate a profile index per day and append user actions to it, which means continuous updates every day. Assume I have 100,000 profiles and each one has 50 actions: that is 100,000 * 50 = 5,000,000 updates per day, which puts a heavy load on my server, so this is not feasible either. Could you please help me find the best model for saving my data in Elasticsearch, based on my description?
Update: Is Elasticsearch suitable for my requirements? Would switching to another database such as MongoDB, or adding Hadoop, be more useful in my case?

MongoDB - Rename keys in different collections with different structures

I am trying to refactor all the keys over a bunch of collections that match a certain regex or have the same name. The issue is that in different documents or collections, the keys in question may appear at different nesting levels in different locations.
For example, let's say we need to rename the key "sound" to "noise" in the following:
Animals collection:
{
"_id": "4ebb8fd68f7aaffc5d287383",
"animal": "cat",
"name": "fluffy",
"type": "long-haired",
"sound": "meow"
}
Events collection:
{
"_id": "4ebb8fd68f7abac75d287341",
"event": "thunder",
"description": {
"type": "natural",
"sound": "boom"
}
}
How would you go about doing it? Ideally via raw mongo queries, or pymongo if necessary.
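One way to approach the pymongo route is a small recursive helper that renames the key wherever it appears in a document. This is only a sketch: `rename_key` is a hypothetical name, it works on plain Python dicts and lists (which is how pymongo returns documents), and it does nothing about the case where a document already contains both the old and the new key at the same level.

```python
def rename_key(value, old, new):
    """Return a copy of `value` with every dict key `old` renamed to `new`.

    Walks nested dicts and lists, so the key is found at any depth.
    If a dict already holds both `old` and `new`, one of them is overwritten.
    """
    if isinstance(value, dict):
        return {(new if key == old else key): rename_key(item, old, new)
                for key, item in value.items()}
    if isinstance(value, list):
        return [rename_key(item, old, new) for item in value]
    return value


event = {
    "_id": "4ebb8fd68f7abac75d287341",
    "event": "thunder",
    "description": {"type": "natural", "sound": "boom"},
}
renamed = rename_key(event, "sound", "noise")
```

With pymongo, each document read from `collection.find()` could be passed through the helper and written back with `collection.replace_one({"_id": doc["_id"]}, renamed)`. When the key's path is known in advance, the `$rename` update operator is the raw-query alternative, for example `{"$rename": {"description.sound": "description.noise"}}`.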

MongoDB - how to properly model relations

Let's assume we have the following collections:
Users
{
"id": MongoId,
"username": "jsloth",
"first_name": "John",
"last_name": "Sloth",
"display_name": "John Sloth"
}
Places
{
"id": MongoId,
"name": "Conference Room",
"description": "Some longer description of this place"
}
Meetings
{
"id": MongoId,
"name": "Very important meeting",
"place": <?>,
"timestamp": "1506493396",
"created_by": <?>
}
Later on, we want to return (e.g. from a REST web service) a list of upcoming events like this:
[
{
"id": MongoId(Meetings),
"name": "Very important meeting",
"created_by": {
"id": MongoId(Users),
"display_name": "John Sloth"
},
"place": {
"id": MongoId(Places),
"name": "Conference Room"
}
},
...
]
It's important to return the basic information that needs to be displayed on the main page of the web UI (so no additional calls are needed to render the table). That's why each entry contains the display_name of the user who created it and the name of the place. I think that's a pretty common scenario.
Now my question is: how should I store this information in the db (the question-mark values in the Meeting document)? I see 2 options:
1) Store references to other collections:
place: MongoId(Places)
(+) data is always consistent
(-) additional calls to db have to be made in order to construct the response
2) Denormalize data:
"place": {
"id": MongoId(Places),
"name": "Conference room"
}
(+) no need for additional calls (response can be constructed based on one document)
(-) data must be updated each time related documents are modified
What is the proper way of dealing with such a scenario?
If I use option 1), how should I query the other documents? Querying each related document separately seems like overkill. How about getting the last 20 meetings, aggregating the list of related document ids, and then performing a query like db.users.find({_id: { $in: <id list> }})?
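To make the batching idea concrete, here is a rough sketch in plain Python. The helper names are hypothetical, the documents are assumed to use `_id` as their key, and the two lists passed to `build_response` stand for the results of the two batched `$in` queries (one for users, one for places).

```python
def collect_ids(meetings, field):
    """Distinct ids referenced by `field`, in first-seen order."""
    return list(dict.fromkeys(meeting[field] for meeting in meetings))


def in_filter(ids):
    """Filter document for one batched lookup, e.g. db.users.find(in_filter(ids))."""
    return {"_id": {"$in": ids}}


def build_response(meetings, users, places):
    """Stitch the results of the two batched lookups back onto each meeting."""
    users_by_id = {user["_id"]: user for user in users}
    places_by_id = {place["_id"]: place for place in places}
    return [
        {
            "id": meeting["_id"],
            "name": meeting["name"],
            "created_by": {
                "id": meeting["created_by"],
                "display_name": users_by_id[meeting["created_by"]]["display_name"],
            },
            "place": {
                "id": meeting["place"],
                "name": places_by_id[meeting["place"]]["name"],
            },
        }
        for meeting in meetings
    ]
```

That makes three round trips in total (meetings, users, places) no matter how many meetings are on the page, instead of two extra queries per meeting.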
If I go for option 2), how should I keep the data in sync?
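For option 2, one possible approach (an assumption on my side, not an established recipe) is to fan the change out whenever the source document is modified: update the place itself, then update every meeting that embeds a copy of it. The sketch below only builds the filter and update documents; `rename_place_ops` is a hypothetical helper, and the collection names follow the question.

```python
def rename_place_ops(place_id, new_name):
    """Return (collection, filter, update) triples to run when a place is renamed.

    The first targets the source document; the second targets every
    meeting holding a denormalized copy and is meant for an update-many call.
    """
    return [
        ("places", {"_id": place_id}, {"$set": {"name": new_name}}),
        ("meetings", {"place.id": place_id}, {"$set": {"place.name": new_name}}),
    ]


ops = rename_place_ops("p1", "Board Room")
```

The two writes are not atomic, so readers can briefly see the old name in meetings. Whether that is acceptable depends on how often places change compared to how often meetings are listed.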
Thanks in advance for any advice!
You can keep the DB model you already have and still do only a single query, as MongoDB introduced the $lookup aggregation stage in version 3.2. It is similar to a join in an RDBMS.
$lookup
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.
So instead of storing a full reference to the other collections, just store the referenced document's ID.
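As a rough illustration, a pipeline for the meetings list could be built like this in Python and passed to pymongo's `db.meetings.aggregate(...)`. The function name and the temporary field names `creator` and `place_doc` are made up for this sketch, and it assumes `created_by` and `place` hold the `_id` values of the referenced documents.

```python
def last_meetings_pipeline(limit=20):
    """Aggregation pipeline: newest meetings joined with their user and place."""
    return [
        {"$sort": {"timestamp": -1}},
        {"$limit": limit},
        # Each $lookup adds an array of matching documents under "as".
        {"$lookup": {"from": "users", "localField": "created_by",
                     "foreignField": "_id", "as": "creator"}},
        {"$lookup": {"from": "places", "localField": "place",
                     "foreignField": "_id", "as": "place_doc"}},
        # $unwind turns the one-element arrays into plain sub-documents.
        # Meetings whose user or place no longer exists are dropped here.
        {"$unwind": "$creator"},
        {"$unwind": "$place_doc"},
        # Keep only the fields the listing needs.
        {"$project": {
            "name": 1,
            "created_by": {"id": "$creator._id",
                           "display_name": "$creator.display_name"},
            "place": {"id": "$place_doc._id", "name": "$place_doc.name"},
        }},
    ]


pipeline = last_meetings_pipeline(limit=20)
```

Sorting and limiting before the lookups keeps the joins down to the 20 meetings that are actually returned.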

Mongo - What is best design: store menu documents or menu item documents?

I want to store website menus in Mongo for the navigation of my CMS, but since I'm new to Mongo and the concept of documents, I'm trying to figure out what would be best:
a) Should I store menu documents, containing children and those having more children, or
b) Should I store menu item documents with parent_id and child_ids ?
Both would appear to have benefits, since in case A it's normal to load an entire menu at once as you'll need everything to display, but B might be easier to update single items?
I'm using Spring data mongo.
PS: If I asked this question in a wrong way, please let me know. I'm sure this question can be expanded to any general parent-child relationship, but I was having trouble finding the right words.
Since menus are typically going to be very small (well under the 16MB document limit, I hope), the embedded form should give you the best performance:
{
"topItem1": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" }
],
"topItem2": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" },
{
"name": "sub-menu",
"type": "sub",
"items": [
{ "name": "item1", "link": "linkValue" },
{ "name": "item2", "link": "linkValue" }
]
}
]
}
The only possible issue there is with updating content inside nested arrays, as MongoDB can only "match" the first found array index. See the positional $ operator documentation for this.
But as long as you know the positions then this should not be a problem, using "dot notation" concepts:
db.menu.update({}, {
"$set": {
"topItem2.2.items.1": { "name": "item3", "link": "linkValue" }
}
})
But adding items in general should be simple:
db.menu.update(
{ "topItem2.name": "sub-menu" },
{
"$push": {
"topItem2.2.items": { "name": "item4", "link": "linkValue" }
}
}
)
So that is a perspective on how to use the inherent embedded structure rather than associating "parent" and "child" items.
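To see how a dotted path such as "topItem2.2.items.1" addresses the embedded structure, here is a small plain-Python imitation of what `$set` does with it. `set_path` is a hypothetical helper written for this illustration, and it is simplified: an out-of-range list index raises an error here instead of padding the array.

```python
def set_path(doc, path, value):
    """Set `value` at a dotted path; numeric segments index into lists."""
    *parents, last = path.split(".")
    target = doc
    for segment in parents:
        # Inside a list the segment is a position, inside a dict it is a key.
        target = target[int(segment)] if isinstance(target, list) else target[segment]
    if isinstance(target, list):
        target[int(last)] = value
    else:
        target[last] = value
    return doc


menu = {
    "topItem2": [
        {"name": "item1", "link": "linkValue"},
        {"name": "item2", "link": "linkValue"},
        {"name": "sub-menu", "type": "sub", "items": [
            {"name": "item1", "link": "linkValue"},
            {"name": "item2", "link": "linkValue"},
        ]},
    ]
}
set_path(menu, "topItem2.2.items.1", {"name": "item3", "link": "linkValue"})
```

Here "2" selects the third entry of topItem2 (the sub-menu) and "1" selects the second entry of its items, which is why the positions have to be known in advance.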
After long hard thinking I believe I would use:
{
_id: {},
submenu1: [
{label: "Whatever", url: "http://localhost/whatever"}
]
}
I thought about using related documents with IDs all sitting in a collection but then you would have to shoot off multiple queries to get the parent and its range, possibly even sub-sub ranges too. With this structure you have only one query for all.
This structure is not infallible, however: if you change your menu items regularly you might start to notice fragmentation. You can remedy this a little with usePowerOf2Sizes allocation (a setting of the older MMAPv1 storage engine): http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes
But yes, with careful planning you should be able to use one single document for every parent menu item.

mongodb best practice: nesting

Is this example of nesting generally accepted as good or bad practice (and why)?
A collection called users:
user
  basic
    name : value
    url : value
  contact
    email
      primary : value
      secondary : value
    address
      en-gb
        address : value
        city : value
        state : value
        postalcode : value
        country : value
      es
        address : value
        city : value
        state : value
        postalcode : value
        country : value
Edit: From the answers in this post I've updated the schema applying the following rules (the data is slightly different from above):
Nest, but only one level deep
Remove unnecessary keys
Make use of arrays to make objects more flexible
{
"_id": ObjectId("4d67965255541fa164000001"),
"name": {
"0": {
"name": "Joe Bloggs",
"il8n": "en"
}
},
"type": "musician",
"url": {
"0": {
"name": "joebloggs",
"il8n": "en"
}
},
"tags": {
"0": {
"name": "guitar",
"points": 3,
"il8n": "en"
}
},
"email": {
"0": {
"address": "joe.bloggs@example.com",
"name": "default",
"primary": 1,
"il8n": "en"
}
},
"updates": {
"0": {
"type": "news",
"il8n": "en"
}
},
"address": {
"0": {
"address": "1 Some street",
"city": "Somecity",
"state": "Somestate",
"postalcode": "SOM STR",
"country": "UK",
"lat": 49.4257641,
"lng": -0.0698241,
"primary": 1,
"il8n": "en"
}
},
"phone": {
"0": {
"number": "+44 (0)123 4567 890",
"name": "Home",
"primary": 1,
"il8n": "en"
},
"1": {
"number": "+44 (0)098 7654 321",
"name": "Mobile",
"il8n": "en"
}
}
}
Thanks!
In my opinion the above schema is not 'generally accepted', but it looks great. Still, I suggest some improvements that will help you query your documents in the future:
User
  Name
  Url
  Emails {email, emailType (primary, secondary)}
  Addresses {address, city, state, postalcode, country, language}
Nesting is good, but nesting two or three levels deep can cause additional trouble when querying and updating.
I hope my suggestions help you make the right choice of schema design.
You may want to take a look at schema design in MongoDB, and specifically the advice on embedding vs. references.
Embedding is preferred as "Data is then colocated on disk; client-server turnarounds to the database are eliminated". If the parent object is in RAM, then access to the nested objects will always be fast.
In my experience, I've never found any "best practices" for what a MongoDB record actually looks like. The question to really answer is, "Does this MongoDB schema allow me to do what I need to do?"
For example, if you had a list of addresses and needed to update one of them, it'd be a pain, since you'd need to iterate through all of them or know the position at which a particular address is located. You're safe from that, since each address has its own key.
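The difference can be sketched in a few lines of Python. Both helpers are hypothetical, and the `address.<language>.city` path assumes a flattened layout with the addresses keyed by language directly under `address`.

```python
def update_city_in_list(addresses, language, city):
    """Addresses stored as a list: scan for the entry, then change it.

    Returns the position that was updated.
    """
    for index, address in enumerate(addresses):
        if address["language"] == language:
            address["city"] = city
            return index
    raise KeyError(language)


def set_city_update(language, city):
    """Addresses keyed by language: the update path is known up front."""
    return {"$set": {f"address.{language}.city": city}}


addresses = [
    {"language": "en-gb", "city": "Somecity"},
    {"language": "es", "city": "Algunaciudad"},
]
position = update_city_in_list(addresses, "es", "Madrid")
update = set_city_update("es", "Madrid")
```

With the keyed layout the update document can be built without reading the record first, which is the point the paragraph above is making.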
However, I'd say nix the basic and contact keys. What do these really give you? If you index name, it'd be basic.name rather than just name. AFAIK, there are some performance impacts to long vs. short key names.
Keep it simple enough to do what you need to do. Try something out and iterate on it...you won't get it right the first time, but the nice thing about mongo is that it's relatively easy to rework your schema as you go.
That is acceptable practice. There are some problems with nesting an array inside of an array. See SERVER-831 for one example. However, you don't seem to be using arrays in your collection at all.
Conversely, if you were to break this up into multiple collections, you would have to deal with a lack of transactions (multi-document transactions only arrived in MongoDB 4.0) and the resulting race conditions in your data access code.