MongoDB design speculation - mongodb

I am digging the MongoDB related questions/answers but one thing is still not obvious.
Let's consider the following Product collection:
{
"manufacturer": "Man1",
"model": "Model1"
}
Let's say we have 1.000.000 products, and I would like to create
a dropdown of manufacturers (which would be max 50 options).
In this case every time I have to use the .distinct() function on that huge product collection.
Is this the right way to do it ?
I am a bit concerned about the performance.
Or should I create a separate collection for manufacturers and keep it synced ?
UPDATE
Thanks for all the answers, I still considering them.
And what if I do the following:
Manufacturer:
{
"name": "Man1",
"models": [
{
"name": "Model1",
"products": [Product1, Product2]
}
],
}
and Product
{
"manufcturer": "Man1",
"model": "Model1"
"manufacturer_id": Manufacturer1,
"model_id", Model1
}

First. If you've large number of records, you'd never want to load all the data in just one request to populate the list, dropdown or whatever it is. Rather, implementing something like load more options suits more. Just like pagination.
And you can manage to get like 20,40 records per request and do any optimization on those small chunk of data.

You can create a separate collection for manufacturers. You just have to keep it in sync, after every addition/update/deletion of product from products collection.

I think you can think of designing your Product collections like this:
{
"manufacturer": "Man1",
"model": "Model1"
"Product" :[Product1,Product2]
}
And having an index on "manufacturer" will optimize your query to get list of manufacturers

Related

Mongoose model data deeper than 1 level

I'm trying to design my schema in Mongoose which contains the following collections:
Customers
Expenses
Categories
Instead of embedding the expenses within the customer, I have decided to create its own collection as a customer could potentially have thousands of expenses.
My first question is, is that the best way to model this? i.e. In the expenses collection, would it be best to have a single document for each expense that has a reference to the customer it belongs to?
My second question is about modeling the expenses.
I want to have something like this (oversimplified it here):
{
"_id": Object,
"customer_ref": Object,
"details": {
"amount": 20.00,
"description": "An expense"
},
"date_created": "April 5th..."
}
I have been able to create schema models where the data is one level deep but I'm not sure how you achieve something like the above where I have nested data.
Is mongoose designed to only deal with one level deep data, so I should avoid nesting altogether?
If not how do I achieve something like the above?

Best way to represent multilingual database on mongodb

I have a MySQL database to support a multilingual website where the data is represented as the following:
table1
id
is_active
created
table1_lang
table1_id
name
surname
address
What's the best way to achieve the same on mongo database?
You can either design a schema where you can reference or embed documents. Let's look at the first option of embedded documents. With you above application, you might store the information in a document as follows:
// db.table1 schema
{
"_id": 3, // table1_id
"is_active": true,
"created": ISODate("2015-04-07T16:00:30.798Z"),
"lang": [
{
"name": "foo",
"surname": "bar",
"address": "xxx"
},
{
"name": "abc",
"surname": "def",
"address": "xyz"
}
]
}
In the example schema above, you would have essentially embedded the table1_lang information within the main table1document. This design has its merits, one of them being data locality. Since MongoDB stores data contiguously on disk, putting all the data you need in one document ensures that the spinning disks will take less time to seek to a particular location on the disk. If your application frequently accesses table1 information along with the table1_lang data then you'll almost certainly want to go the embedded route. The other advantage with embedded documents is the atomicity and isolation in writing data. To illustrate this, say you want to remove a document which has a lang key "name" with value "foo", this can be done with one single (atomic) operation:
db.table.remove({"lang.name": "foo"});
For more details on data modelling in MongoDB, please read the docs Data Modeling Introduction, specifically Model One-to-Many Relationships with Embedded Documents
The other design option is referencing documents where you follow a normalized schema. For example:
// db.table1 schema
{
"_id": 3
"is_active": true
"created": ISODate("2015-04-07T16:00:30.798Z")
}
// db.table1_lang schema
/*
1
*/
{
"_id": 1,
"table1_id": 3,
"name": "foo",
"surname": "bar",
"address": "xxx"
}
/*
2
*/
{
"_id": 2,
"table1_id": 3,
"name": "abc",
"surname": "def",
"address": "xyz"
}
The above approach gives increased flexibility in performing queries. For instance, to retrieve all child table1_lang documents for the main parent entity table1 with id 3 will be straightforward, simply create a query against the collection table1_lang:
db.table1_lang.find({"table1_id": 3});
The above normalized schema using document reference approach also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of table_lang documents per give table entity, embedding has so many setbacks in as far as spacial constraints are concerned because the larger the document, the more RAM it uses and MongoDB documents have a hard size limit of 16MB.
The general rule of thumb is that if your application's query pattern is well-known and data tends to be accessed only in one way, an embedded approach works well. If your application queries data in many ways or you unable to anticipate the data query patterns, a more normalized document referencing model will be appropriate for such case.
Ref:
MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database By Rick Copeland

How to Model a "likes" voting system with MongoDB

Currently I am working on a mobile app. Basically people can post their photos and the followers can like the photos like Instagram. I use mongodb as the database. Like instagram, there might be a lot of likes for a single photos. So using a document for a single "like" with index seems not reasonable because it will waste a lot of memory. However, I'd like a user add a like quickly. So my question is how to model the "like"? Basically the data model is much similar to instagram but using Mongodb.
No matter how you structure your overall document there are basically two things you need. That is basically a property for a "count" and a "list" of those who have already posted their "like" in order to ensure there are no duplicates submitted. Here's a basic structure:
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3")
"photo": "imagename.png",
"likeCount": 0
"likes": []
}
Whatever the case, there is a unique "_id" for your "photo post" and whatever information you want, but then the other fields as mentioned. The "likes" property here is an array, and that is going to hold the unique "_id" values from the "user" objects in your system. So every "user" has their own unique identifier somewhere, either in local storage or OpenId or something, but a unique identifier. I'll stick with ObjectId for the example.
When someone submits a "like" to a post, you want to issue the following update statement:
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": { "$ne": ObjectId("54bb2244a3a0f26f885be2a4") }
},
{
"$inc": { "likeCount": 1 },
"$push": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
Now the $inc operation there will increase the value of "likeCount" by the number specified, so increase by 1. The $push operation adds the unique identifier for the user to the array in the document for future reference.
The main important thing here is to keep a record of those users who voted and what is happening in the "query" part of the statement. Apart from selecting the document to update by it's own unique "_id", the other important thing is to check that "likes" array to make sure the current voting user is not in there already.
The same is true for the reverse case or "removing" the "like":
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": ObjectId("54bb2244a3a0f26f885be2a4")
},
{
"$inc": { "likeCount": -1 },
"$pull": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
The main important thing here is the query conditions being used to make sure that no document is touched if all conditions are not met. So the count does not increase if the user had already voted or decrease if their vote was not actually present anymore at the time of the update.
Of course it is not practical to read an array with a couple of hundred entries in a document back in any other part of your application. But MongoDB has a very standard way to handle that as well:
db.photos.find(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
},
{
"photo": 1
"likeCount": 1,
"likes": {
"$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") }
}
}
)
This usage of $elemMatch in projection will only return the current user if they are present or just a blank array where they are not. This allows the rest of your application logic to be aware if the current user has already placed a vote or not.
That is the basic technique and may work for you as is, but you should be aware that embedded arrays should not be infinitely extended, and there is also a hard 16MB limit on BSON documents. So the concept is sound, but just cannot be used on it's own if you are expecting 1000's of "like votes" on your content. There is a concept known as "bucketing" which is discussed in some detail in this example for Hybrid Schema design that allows one solution to storing a high volume of "likes". You can look at that to use along with the basic concepts here as a way to do this at volume.

MongoDB design for scalability

We want to design a scalable database. If we have N users with 1 Billion user responses, from the 2 options below which will be a good design? We would want to query based on userID as well as Reponse ID.
Having 2 Collections one for the user information and another to store the responses along with user ID. Each response is stored as a document so we will have 1 billion documents.
User Collection
{
"userid" : "userid1",
"password" : "xyz",
,
"City" : "New York",
},
{
"userid" : "userid2",
"password" : "abc",
,
"City" : "New York",
}
responses Collection
{
"userid": "userid1",
"responseID": "responseID1",
"response" : "xyz"
},
{
"userid": "userid1",
"responseID": "responseID2",
"response" : "abc"
},
{
"userid": "userid2",
"responseID": "responseID3",
"response" : "mno"
}
Having 1 Collection to store both the information as below. Each response is represented by a new key (responseIDX).
{
"userid" : "userid1",
"responseID1" : "xyz",
"responseID2" : "abc",
,
"responseN"; "mno",
"city" : "New York"
}
If you use your first options, I'd use a relational database (like MySQL) opposed to MongoDB. If you're heartfelt on MongoDB, use it to your advantage.
{
"userId": n,
"city": "foo"
"responses": {
"responseId1": "response message 1",
"responseId2": "response message 2"
}
}
As for which would render a better performance, run a few benchmark tests.
Between the two options you've listed - I would think using a separate collection would scale better - or possibly a combination of a separate collection and still using embedded documents.
Embedded documents can be a boon to your schema design - but do not work as well when you have an endlessly growing set of embedded documents (responses, in your case). This is because of document growth - as the document grows - and outgrows the allocated amount of space for it on disk, MongoDB must move that document to a new location to accommodate the new document size. That can be expensive and have severe performance penalties when it happens often or in high concurrency environments.
Also, querying on those embedded documents can become troublesome when you are looking to selectively return only a subset of responses, especially across users. As in - you can not return only the matching embedded documents. Using the positional operator, it is possible to get the first matching embedded document however.
So, I would recommend using a separate collection for the responses.
Though, as mentioned above, I would also suggest experimenting with other ways to group those responses in that collection. A document per day, per user, per ...whatever other dimensions you might have, etc.
Group them in ways that allow multiple embedded documents and compliments how you would query for them. If you can find the sweet spot between still using embedded documents in that collection and minimizing document growth, you'll have fewer overall documents and smaller index sizes. Obviously this requires benchmarking and testing, as the same caveats listed above can apply.
Lastly (and optionally), with that type of data set, consider using increment counters where you can on the front end to supply any type of aggregated reporting you might need down the road. Though the Aggregation Framework in MongoDB is great - having, say, the total response count for a user pre-aggregated is far more convenient then trying to get a count by running a aggregate query on the full dataset.

Mongodb architecture required [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I am creating a project that requires data storage and I am considering useing MongoDB but am having trouble finding the logical / optimal way of organising the data
My simplified data needs to store Place information like so:
{place_city : "London",
place_owner: "Tim",
place_name: "Big Ben"}
{place_city : "Paris",
place_owner: "Tim",
place_name: "Eifel Tower"}
{place_city : "Paris",
place_owner: "Ben",
place_name: "The Louvre"}
And here are the main operations I need
Retrieve all my places
Retrieve all my friends places
Retrieve all my friends cities
If I use mongoDB a collection document max size is 16meg right? If that is correct then I can't store all the information in a PLACES similar to my example above right?
I would probably need to create a "OWNER" collection? like so:
{
owner: "Tim",
cities: [ {
name: "London",
places:[ {name:"Big Ben"}]
},
{
name: "Paris",
places:[ {name:"Eifel Tower"}, {name: "The Louvre"}]
}
]
}
but the problem now is that Retrieving my friends places becomes cumbersome and my friends cities even more so....
any advice from a cunning DB architect woudl be much appreciated.
The size limit of 16MB is per document, not per collection.
{place_city : "London", place_owner: "Tim", place_name: "Big Ben"}
is a very little document, so don't worry. The design of your collections depends heavily on how you query your data.
The data size limitation is per document and not per collection. Collections can easily become several GB (or even TB) large.
I would suggest you keep your data as simple as you have, like:
{place_city : "London",
place_owner: "Tim",
place_name: "Big Ben"}
{place_city : "Paris",
place_owner: "Tim",
place_name: "Eifel Tower"}
{place_city : "Paris",
place_owner: "Ben",
place_name: "The Louvre"}
I am thinking that friends are stored like this:
{
username: "Ben",
friends: [ "Tim", "Bob" ]
}
Then your three queries can be done as:
All your places: db.places.find( { place_owner: "Ben" } );
All your friends' places with two queries (pseudo code):
friends = db.friends.find( { username: "Ben" } );
// friends = [ "Tim", "Bob" ], you do need to do some code to make this change
db.places.find( { place_owner: { $in: [ "Tim", "Bob" ] } } );
All your friends' cities with two queries (pseudo code):
friends = db.friends.find( { username: "Ben" } );
db.so.distinct( 'name', { place_owner: { $in: [ "Tim", "Bob" ] } } );
Even with millions of documents, this should work fine, providing you have an index on the fields that you query for: { place_owner: 1 } and { username: 1 }.
I love MongoDB, but this data is not a good candidate for MongoDB. MongoDB does not support relationships, and that is basically all you are tracking here. Use a relational database to store the relationships.
Think of it like this: under the skin of the DBMS, MongoDB or SQL, an index is an index, a table is a table (basically). You get more performance from MongoDB not because it does the same things faster, but because you are able to use it to get your DB server to do less. (E.g. pull an entire document containing nested arrays and subdocs instead of joining a bunch of tables together). There are some fundamental differences in the way MongoDB handles updates, but for querying simple data sets most systems are going to be relatively similar. One big difference between the two rooted in the way MongoDB works is that it cannot use data in a collection as parameters for a another query, which is basically the whole point of a relational database. Since 2 of your use cases require "joins" (to "all my friends"), you need two queries.
So what you're doing with two queries is the same thing as a join, except relational databases are optimized to do this extremely efficiently; I promise you it will be slower to do this join manually, plus you're sending all data (friends' IDs) over the wire and making an extra DB connection. Now, if you could store all your friends' cities and places in a single document, MongoDB will probably be (slightly) faster than joining, but now you've got a new problem, because you now have to manage all this, anytime anyone adds a city or place all their friends have to be modified--this is unrealistic.
But there is even more to the story than that, because SQL DBMS are extremely mature applications with lots of features to improve query performance. They let you do things like create "materialize views" that store all your friends cities and places in memory and update themselves automatically any time one of their source tables is updated so you don't have to do anything special, you'd just query and you'd get your data without actually executing any joins. (A materialized table is not the right fit here, but just as an example, it is possible if you needed it.)
ALSO, when you are using MongoDB, a guideline I've found helpful is this, anytime you are asking yourself whether your document will be large enough to store all the data you will EVENTUALLY have, you probably have a design problem on your hands. If a document's growth is unbound, it should probably be enumerated within a collection instead. Or put another way, your collections should grow as your application is used, not your document's size (much).
If breaking apart your schema like this means that for primary operations you are doing a lot of manual joins, it is worth considering the question of whether or not you should be using a relational database instead.