MongoDB: Embed only the id, or both id and name?

I'm new to MongoDB. Please suggest how to correctly design a schema for the situation below:
I have a User collection and a Product collection. A Product contains info like id, title, description, price... A User can bookmark or like a Product. Currently, in the User collection, I store one array for liked products and one array for bookmarked products. So when I need to view info about one user, I have to read out these two arrays, then search the Product collection to get the titles of the liked and bookmarked products.
//User collection
{
  _id: 12345,
  name: "John",
  liked: [123, 456, 789],
  bkmark: [123, 125]
}
//Product collection
{
  _id: 123,
  title: "computer",
  desc: "awesome computer",
  price: 12
}
Now I think I can speed up this process by embedding both the product id and title in the User collection, so that I don't have to search the Product collection at all; I can just read the arrays out and display them. But if I choose this way, then whenever a Product's title gets updated, I have to search and update the User collection too. I can't evaluate the update cost of the second way, so I don't know which way is correct. Please help me choose between them.
Thanks & Regards.

You should consider what happens more often: a product gets renamed, or the information of a user is requested.
You should also consider which is the bigger problem: a time lag during which users see an outdated product name (we are talking about seconds, maybe minutes when you have a really large number of users), or a consistently longer response time when requesting a user profile.
Without knowing your actual usage patterns and requirements, I would guess that it's the latter in both cases, so you should optimize for that situation.
In general it is not recommended to normalize a MongoDB database as radically as you would normalize a relational database. The reason is that MongoDB cannot perform JOINs. So it's usually not such a bad idea to duplicate some relevant information in multiple documents, while accepting a higher cost for updates and a potential risk of inconsistencies.
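For illustration, here is a minimal sketch of that denormalized variant (collection and field names are assumed from the question; the arrayFilters option requires MongoDB 3.6 or newer):
//User document with the product title duplicated into the liked array
{
  _id: 12345,
  name: "John",
  liked: [ { _id: 123, title: "computer" }, { _id: 456, title: "another product" } ]
}
//propagating a hypothetical product rename into every user's liked array
db.users.updateMany(
  { "liked._id": 123 },
  { $set: { "liked.$[p].title": "new title" } },
  { arrayFilters: [ { "p._id": 123 } ] }
);
This turns the rename into a single multi-document update instead of a per-user search, at the cost of the duplication discussed above.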

Related

Schema on mongodb for reducing API calls with two collections

I'm not quite sure what the best practice is if I have two collections, a user collection and a picture collection - I do not want to embed all my pictures into my user collection.
My client searches for pictures under certain criteria. Let's say he gets 50 pictures back from the search (i.e. one single MongoDB query). Each picture is associated with one user, and I want the user name displayed as well. I assume there is no performant way to do a single search on the user collection that returns the name of each picture's user, i.e. I would have to do 50 searches. Does that mean I can only avoid this extra performance load by duplicating data (next to the user_id, also the user_name) in my pictures collection?
Same question the other way around. If my client searches for users and, say, 50 users are returned through one single query, and I want the last associated picture + title displayed next to the user data, I would again have to add that to the users collection; otherwise I assume I need to do 50 queries to return the picture data?
Let's say the schema for your picture collection is as follows:
Picture Document
{
  _id: ObjectId(123),
  url: 'img1.jpg',
  title: 'img_one',
  userId: ObjectId(342)
}
1) Your picture query will return documents that look like the above. You don't have to make 50 calls to get the users associated with the images. You can simply make one other query to the Users Collection, using the user ids taken from the picture documents, like so:
db.users.find({ _id: { $in: [userId_1, userId_2, userId_3, ..., userId_n] } })
You will receive an array of user documents with the user information. You'll have to handle their display on the client afterwards. At most you'll need 2 calls.
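As a rough mongo-shell sketch of that two-call pattern (the pictures collection name and the search filter here are assumptions, not from the question):
// call 1: fetch the pictures matching whatever the client searched for
var pictures = db.pictures.find({ title: /img/ }).toArray();
// collect the owner ids from the results
var userIds = pictures.map(function (p) { return p.userId; });
// call 2: one query returning every owner at once
var users = db.users.find({ _id: { $in: userIds } }).toArray();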
Alternatively
You could design the schema like this:
Picture Document
{
  _id: ObjectId(123),
  url: 'img1.jpg',
  title: 'img_one',
  userId: ObjectId(342),
  user_name: "user associated"
}
If you design it this way, you only require 1 call, but the username won't stay in sync with the user collection's documents. For example, say a user changes their name: a picture that was saved before may still carry the old user name.
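If you accept that trade-off, the fan-out update after a rename might look like this sketch (collection name and variables are assumptions; on servers older than 3.2 you would use update with {multi: true} instead of updateMany):
// propagate a user's new name to their denormalized picture documents
db.pictures.updateMany(
  { userId: renamedUserId },          // the _id of the user who changed their name
  { $set: { user_name: newName } }
);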
2) You could design your User Collection like this:
User Document
{
  _id: ObjectId(342),
  name: "Steve Jobs",
  last_assoc_img: {
    img_id: ObjectId(123),
    url: 'img1.jpg',
    title: 'last image title'
  }
}
You could use the same principles as mentioned above.
Assuming that you have a user id associated with every user and you're also storing that id in the picture document, your user <=> picture relationship is loosely coupled.
In order not to have to make 50 separate calls, you can use the $in operator, given that you can pull those ids out into a list to run the second query. In plain English the query says: "Look at the collection; if a document's id is in the list of ids, give it back to me."
If you intend on doing this a lot and need it to scale, I'd recommend either a relational database or a NoSQL database that can handle joins, so you're not forced into an embedded document schema.

MongoDB document model size/performance limits? A collection with an object that possibly houses 100k+ names?

I'm trying to build an event website that will host videos and such. I've set up a collection with the event name, event description, and an object with some friendly info about the people "attending". If things go well there might be 100-200k people attending, and those people should have access to whoever else is in the event (clicking on the friendly name will find the user's id and subsequently their full profile). Is that asking too much of Mongo? Or is there a better way to go about doing something like that? It seems like it could get rather large rather quickly.
{
  _id: ...., // event id
  'name': ..., // event name
  'description': ..., // event description
  'attendees': [
    { 'username': "user's friendly name", 'avatarlink': "avatar url" },
    { 'username': "user's friendly name", 'avatarlink': "avatar url" },
    { 'username': "user's friendly name", 'avatarlink': "avatar url" },
    { 'username': "user's friendly name", 'avatarlink': "avatar url" }
  ]
}
Thanks for the suggestions!
In MongoDB, many-to-many (or one-to-many) modeling in general should take a different approach depending on whether the "many" are few (up to a few dozen, usually) or "really" many, as in your case.
It will be better for you not to use embedding here, and instead to normalize. If you embed users in your events collection, adding attendees to an event will keep growing the array. Since documents are updated in place, once a document no longer fits in its allocated space on disk it has to be moved, a very expensive operation that also causes fragmentation. There are a few techniques to deal with moves, but none is ideal.
Having an array of ObjectId as attendees is better in that documents grow much less dramatically, but it still poses a few problems. How will you find all the events a user has participated in? You can have a multikey index on attendees, but once a document moves, the index has to be updated for each user entry (the index contains a pointer to the document's place on disk). In your case, where you plan on up to 200K users, that will be very painful.
Embedding is a very cool feature of MongoDB, or any other document-oriented database, but it's naive to think it always comes without a price.
I think you should really rethink your schema: have an events collection, a users collection and a user_event collection with a structure similar to this:
{
  _id: ObjectId(),
  user_id: ObjectId(),
  event_id: ObjectId()
}
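As a sketch of how that link collection gets queried (the index choices and variable names are my assumptions):
// index each side of the relationship
db.user_event.createIndex({ user_id: 1 });
db.user_event.createIndex({ event_id: 1 });
// all events a given user attended: fetch the link docs, then the events
var links = db.user_event.find({ user_id: someUserId }).toArray();
var eventIds = links.map(function (l) { return l.event_id; });
db.events.find({ _id: { $in: eventIds } });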
Normalization is not a dirty word
Perhaps you should consider modeling your data in two collections, with the attendees field in an event document being an array of user ids.
Here's a sample of the schema:
db.events
{
  _id: ...., // event id
  'name': ..., // event name
  'description': ..., // event description
  'attendees': [ObjectId('userId1'), ObjectId('userId2'), ...]
}
db.users
{
  _id: ObjectId('userId1'),
  username: 'user friendly name',
  avatarLink: 'url to avatar'
}
Then you could do 2 separate queries:
db.events.find({_id: ObjectId('eventId')});
db.users.find({ _id: { $in: [ObjectId('userId1'), ObjectId('userId2')] } });
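Stitching the two result sets together is then a small client-side step, for example (a sketch; the variable names are mine, and the ids are placeholders as above):
var event = db.events.findOne({ _id: ObjectId('eventId') });
var users = db.users.find({ _id: { $in: event.attendees } }).toArray();
// build a lookup table; ObjectId keys coerce to the same string on both sides
var byId = {};
users.forEach(function (u) { byId[u._id] = u; });
var names = event.attendees.map(function (id) { return byId[id].username; });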

mongodb - add column to one collection find based on value in another collection

I have a posts collection which stores post-related info and author information. The posts form a nested tree.
Then I have a postrating collection which stores which user has rated a particular post up or down.
When a request is made to get the nested tree for a particular post, I also need to return whether the current user has voted, and if yes, up or down, for each of the posts being returned.
In SQL this would be something like: SELECT posts.*, postrating.vote FROM posts JOIN postrating ON posts.postID = postrating.postID AND postrating.memberID = currentUser.
I know MongoDB does not support joins. What are my options with MongoDB?
Use map-reduce - what is the performance for a simple query?
Store the ratings in the post document - what about the BSON size limit?
Get the list of all required posts, get the list of all votes by the current user, then loop over the posts and, if the user has voted, add that to the output?
Is there any other way? Can this be done using aggregation?
NOTE: I started on MongoDB last week.
In MongoDB, the simplest way is probably to handle this with application-side logic and not to try this in a single query. There are many ways to structure your data, but here's one possibility:
user_document = {
  name: "User1",
  postsIhaveLiked: [ "post1", "post2", ... ]
}
post_document = {
  postID: "post1",
  content: "my awesome blog post"
}
With this structure, you would first query for the user's user_document. Then, for each post returned, you could check if the post's postID is in that user's "postsIhaveLiked" list.
The main idea with this is that you get your data in two steps, not one. This is different from a join, but based on the same underlying idea of using one key (in this case, the postID) to relate two different pieces of data.
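In code, that two-step lookup might look like this sketch (the collection names and the empty post filter are assumptions):
// step 1: one query for the current user's liked-post list
var user = db.users.findOne({ name: "User1" });
// step 2: fetch the posts, then annotate each with an in-memory membership check
db.posts.find({}).forEach(function (post) {
  post.likedByCurrentUser = user.postsIhaveLiked.indexOf(post.postID) !== -1;
  // ...render or collect the annotated post here...
});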
In general, try to avoid using map-reduce for performance reasons. And for this simple use case, aggregation is not what you want.

best possible schema design for log analysis database in mongodb

I have to store the following data in MongoDB: uid, gender, country, city, date_of_visit, url_of_visit.
I would like to store uid, gender, country and city in one collection, because this information will never change for a particular user.
In the other collection I would like to store uid, date_of_visit and url_of_visit.
I want to know the best practice for storing uid, date_of_visit and url_of_visit. There are two things in my mind:
(a) { uid: 100, date: xxxxxxxxxxxxxxx, url: abc.php }
    { uid: 100, date: xxxxxx, url: ref.php }
    { uid: 200, date: xxxxxxxxx, url: ref.php }
(b) { uid: 100, visit: [ { date: xxxxxxx, url: abc.php },
                         { date: xxxx, url: def.php },
                         { ... } ] }
I want to have the following index: date:1, uid:1, url:1. The problem with approach (a) is that with each row inserted, the database size and index size will grow, and there will come a point when the index will not fit into RAM.
The problem with approach (b) is that at some point each document will exceed the 16 MB limit, and the approach will fail at that point.
Please suggest what the best schema design for this scenario would be. I will also have queries that include uid, gender, country, date_of_visit and url_of_visit.
I know this thread is a bit old, but I'm wondering if you've decided on a structure and whether it works well.
My idea was, instead of risking creating too-large documents, to structure it similarly to your second approach, but include the date in the main collection. That way each document would hold one user's activity within one day. It would be indexed by user and date, easy to update and query, and it would keep things organized.
Something like:
{ uid: 100, date: xxxxxxx, event: [ { time: xxxxxxx, url: abc.php },
                                    { time: xxxx, url: def.php },
                                    { ... } ] }
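Appending a visit to such a per-user-per-day bucket is then a single upsert, for example (a sketch; the collection name is an assumption, and on servers older than 3.2 you would use update with {upsert: true} instead of updateOne):
// push the visit into the user's daily document, creating it if absent
db.visits.updateOne(
  { uid: 100, date: ISODate("2013-01-15T00:00:00Z") },
  { $push: { event: { time: new Date(), url: "abc.php" } } },
  { upsert: true }
);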
I think the second approach is better than the first because it corresponds to the idea of grouping similar data together. As for exceeding the 16 MB document limit: you can reach it, but the user would have to be very active. :)
You can also pull some data out into another collection and reference it with an ObjectId or a DBRef.
See more info at http://www.mongodb.org/display/DOCS/Database+References#DatabaseReferences-DBRef
Your second approach will force you to fetch a huge amount of embedded data that cannot be filtered by Mongo. In other words, if you have a million entries stored inside the "event" field for a particular user, then when you fetch those embedded documents with dot notation, the entire document, including the parent, is returned; there is no way to filter the results.
I would recommend the first approach, which makes the data easier to retrieve and work with.

Many to many in MongoDB

I decided to give MongoDB a try and see how well we get along. I do have some questions though.
Premise
I have users (id, name, address, password, email, etc.)
I have stamps (id, type, value, price, etc.)
Users browse through a stamp archive and filter it in various ways (pagination, filter by price, type, name, etc.), select a stamp, then add it to their collection.
Users can add more than one stamp to their collection (one piece of mint and one used, or just two pieces of used).
Users can flag some of their stamps for sale or trade, and perhaps specify a price.
So far
Here's what I have so far:
{
  _id: objectid,
  Name: "bob",
  Email: "bob@bob.com",
  ...
  Stamps: [stampid-1, stampid-543, ..., stampid-23]
}
Questions
How should I add the state of an owned stamp - the quantity and condition?
What would be some sample queries for the situations described earlier?
As far as I know, ensureIndex makes it so you reduce the number of "scanned" entries.
The accepted answer here keeps changing the index. Is that just for the purpose of explaining it, or is this the way to do it? I mean, it does make sense somehow, but I keep thinking of it in SQL terms and... it does not make ANY sense...
The only change I would make is in how you store the stamps a user owns. I would store an array of objects representing the stamps, duplicating the values that are accessed most often.
For example, something like this:
{
  _id: objectid,
  Name: "bob",
  Email: "bob@bob.com",
  ...
  Stamps: [
    {
      _id: id,
      type: 'type',
      price: 20,
      forSale: true/false,
      quantity: 2
    },
    {
      _id: id2,
      type: 'type2',
      price: 5,
      forSale: false,
      quantity: 10
    }
  ]
}
You can see that some data is duplicated between the stamps collection and the Stamps array in the user collection. You do that for the properties you access most often, because otherwise you would have to do a findOne for each stamp, and in MongoDB it is better to read the data directly than to do that. This way you can also add other properties, such as quantity and forSale, here.
The goal of the duplication here is to avoid running a query for each stamp in the array.
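For instance, the embedded summaries make queries like these possible without touching the stamps collection at all (a sketch; the users collection name and someStampId are placeholders):
// users who have flagged anything for sale
db.users.find({ "Stamps.forSale": true });
// which users own a given stamp, reading only user documents
db.users.find({ "Stamps._id": someStampId }, { Name: 1 });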
Here is a link to a video that discusses MongoDB design and also explains what I tried to explain here:
http://lacantine.ubicast.eu/videos/3-mongodb-deployment-strategies/
Coming from a SQL background, I'm struggling with NoSQL as well. It seems to me that a lot hinges on how unchanging the types of data may or may not be. One thing that puzzles me in RDBMS systems is why it is not possible to declare a particular column/field "immutable". If you know a field is immutable (or nearly so), then in a NoSQL context it seems more acceptable to duplicate the info. Is it complete heresy to suggest that in many contexts you might actually want a combination of SQL and NoSQL structures?