MongoDB - how to properly model relations - mongodb

Let's assume we have the following collections:
Users
{
"id": MongoId,
"username": "jsloth",
"first_name": "John",
"last_name": "Sloth",
"display_name": "John Sloth"
}
Places
{
"id": MongoId,
"name": "Conference Room",
"description": "Some longer description of this place"
}
Meetings
{
"id": MongoId,
"name": "Very important meeting",
"place": <?>,
"timestamp": "1506493396",
"created_by": <?>
}
Later on, we want to return (e.g. from REST webservice) list of upcoming events like this:
[
{
"id": MongoId(Meetings),
"name": "Very important meeting",
"created_by": {
"id": MongoId(Users),
"display_name": "John Sloth",
},
"place": {
"id": MongoId(Places),
"name": "Conference Room",
}
},
...
]
It's important to return basic information that need to be displayed on the main page in web ui (so no additional calls are needed to render the table). That's why, each entry contains display_name of the user who created it and name of the place. I think that's a pretty common scenario.
Now my question is: how should I store this information in db (question mark values in Metting document)? I see 2 options:
1) Store references to other collections:
place: MongoId(Places)
(+) data is always consistent
(-) additional calls to db have to be made in order to construct the response
2) Denormalize data:
"place": {
"id": MongoId(Places),
"name": "Conference room",
}
(+) no need for additional calls (response can be constructed based on one document)
(-) data must be updated each time related documents are modified
What is the proper way of dealing with such scenario?
If I use option 1), how should I query other documents? Asking about each related document separately seems like an overkill. How about getting last 20 meetings, aggregate the list of related documents and then perform a query like db.users.find({_id: { $in: <id list> }})?
If I go for option 2), how should I keep the data in sync?
Thanks in advance for any advice!

You can keep the DB model you already have and still only do a single query as MongoDB introduced the $lookup aggregation in version 3.2. It is similar to join in RDBMS.
$lookup
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.
So instead of storing a reference to other collections, just store the document ID.

Related

MongoDB - Rename keys in different collections with different structures

I am trying to refactor all the keys over a bunch of collections that match a certain regex or have the same name. The issue is that in different documents or collections, the keys in question may appear at different nesting levels in different locations.
For example, let's say we need to replace the key "sound" to "noise" in the following:
Animals collection:
{
"_id": "4ebb8fd68f7aaffc5d287383",
"animal": "cat",
"name": "fluffy",
"type": "long-haired",
"sound": "meow"
}
Events collection:
{
"_id": "4ebb8fd68f7abac75d287341",
"event": "thunder",
"description": {
"type": "natural",
"sound": "boom"
}
}
How would you go about doing it? Via raw mongo queries ideally, or pymongo if necessary

Which is the best design for a MongoDB database model?

I feel like the MVP of my current database needs some design changes. The number of users is growing quite fast and we are having bad performances in some requests. I also want to get rid of all the DBRef we used.
Our current model can be summarized as follow :
A company can have multiple employees (thousands)
A company can have multiple teams (hundreds)
An employee can be part of a team
A company can have multiple devices (thousands)
An employee is affected to multiple devices
Our application displays in different pages :
The company data
The users
The devices
The teams
I guess I have different options, but I'm not familiar enough with MongoDB to make the best decision.
Option 1
Do not embed and use list of ids for one to many relationships.
// Company document
{
"companyName": "ACME",
"users": [ObjectId(user1), ObjectId(user2)],
"teams": [ObjectId(team1), ObjectId(team2)],
"devices": [ObjectId(device1), ObjectId(device2)]
}
// User Document
{
"userName": "Foo",
"devices": [ObjectId(device2)]
}
// Team Document
{
"teamName": "Foo",
"users": [ObjectId(user1)]
}
// Device Document
{
"deviceName": "Foo"
}
Option 2
Embed data and duplicate informations.
// User Document
{
"companyName": "ACME",
"userName": "Foo",
"team": {
"teamName": "Foo"
},
"device": {
"deviceName": "Foo"
}
}
// Team Document
{
"teamName": "Foo"
"companyName": "ACME",
"users": [
{
"userName": "Foo"
}
]
}
// Device Document
{
"deviceName": "Foo",
"companyName": "ACME",
"user": {
"userName": "Foo"
}
}
Option 3
Do not embed and use id for one to one relationship.
// Company document
{
"companyName": "ACME"
}
// User Document
{
"userName": "Foo",
"company": ObjectId(company),
"team": ObjectId(team1)
}
// Team Document
{
"teamName": "Foo",
"company": ObjectId(company)
}
// Device Document
{
"deviceName": "Foo",
"company": ObjectId(company),
"user": ObjectId(user1)
}
MongoDB recommends to embed data as much as possible but I don't think it can be possible to embed all data in the company document. A company can have multiple devices or users and I believe it can grow too big.
I'm switching from SQL to NoSQL and I think I haven't figured it out by myself yet !
Thanks !
MongodB provides you with a feature which is handling unstructured data.
Every database can contain collection which in turn can contain documents.
Moreover, you cannot use joins in mongodB. So, storing information in one company model is a better choice because you wont be needed join in that scenario.
One more thing, You dont need to embed all the models For example : You can get user and device both from company table, so why embedding users and device as well?

Generate a JSON schema from an existing MongoDB collection

I have a MongoDB collection that contains a lot of documents. They are all roughly in the same format, though some of them are missing some properties while others are missing other properties. So for example:
[
{
"_id": "SKU14221",
"title": "Some Product",
"description": "Product Description",
"salesPrice": 19.99,
"specialPrice": 17.99,
"marketPrice": 22.99,
"puchasePrice": 12,
"currency": "USD",
"color": "red",
},
{
"_id": "SKU14222",
"title": "Another Product",
"description": "Product Description",
"salesPrice": 29.99,
"currency": "USD",
"size": "40",
}
]
I would like to automatically generate a schema from the collection. Ideally it would not which properties are present in all the documents and mark those as required. Detecting unique columns would also be nice, though not really all that necessary. In any event I would be modifying the schema after it's automatically generated.
I've noticed that there are tools that can do this for JSON. But short of downloading the entire collection as JSON, is it possible to do this using the MongoDb console or a CLI tool directly from the collection?
You could try this tool out. It appears to do exactly what you want.
Extract (and visualize) schema from Mongo database, including foreign
keys. Output is simple json file or html with dagre/d3.js diagram
(depending on command line options).
https://www.npmjs.com/package/extract-mongo-schema

Only query field if value has been passed from API in MongoDB

I have created an API in Mule and have a number of queryParameters specified in the RAML file that I would like to use to query my store MongoDB collection.
A snippet of the collection looks like this:
{
"stores": [{
"storeId": 1234,
"storeName": "Shop Around Ltd",
"opens": "09:00:00.000Z",
"closes": "17:00:00.000Z",
"departments": ["clothing", "computers", "toys", "kitchen and home"],
"address": {
"street": "street1",
"city": "New York",
"state": "New York",
"zipCode": "10002"
}
}]
}
My problem is that the query parameters used to query the collection may be different each time so how can I write a dynamic query for MongoDB so it only queries on the fields that have been passed in with values instead of writing a query for each combination of query parameters? E.g. I may want to query based on zip code for one query so all the other fields will be blank and the next time I may want to query on the departments and whether the store will be open at 11am.
The query will be called from Mule.
Thanks

MongoDB manual reference

In MongoDB there is a nice tutorial on Manual Reference:
Example:
original_id = ObjectId()
db.places.insert({
"_id": original_id,
"name": "Broadway Center",
"url": "bc.example.net"
})
db.people.insert({
"name": "Erin",
"places_id": original_id,
"url": "bc.example.net/Erin"
})
Note, that this two documents are created at same time.
Question. I need to reference a Customer with an Order, therefore the Customer has been created long time before the Order, how can I add this reference of customer and insert in Order?
Wouldn't you need to do a findOne() on your customer first and add it's id property to your new document?