MongoDB historical data storage - best practice?

Assuming I have a user who has an ID and I want to store a historical record (document) on this user every day, what is better:
create a new document for each record and search by the user ID; or
keep updating and embedding that data into one single user document which keeps growing over time?
Mostly I want to retrieve only the current document for the user, but all records should be accessible at any time without a super long search/query.

There are a lot of variables that can affect such a decision. One big document seems most obvious, provided it doesn't grow to impractically large or even disallowed sizes (mind you, a document can be at most 16MB in size).
Using a document per entry is also perfectly viable and, provided you create the appropriate indexes, should not result in slow queries.

There is a limit to how big a document can be. It's (as of v1.8) 16MB. So you can simply run out of room if you update & embed. Also, mongo allocates document space based on the average document size in a collection. If you keep adjusting/resizing, this might have negative performance implications.
I think it's much safer to create a new document for each record, and if/when you want to collate that data, you do it in a map/reduce job.
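A minimal sketch of the document-per-record approach, in mongo shell syntax (the collection and field names are illustrative, not from the question):

// One history document per user per day; a compound index makes both
// "latest record for a user" and "all records for a user" cheap.
db.user_history.createIndex({userId: 1, date: -1});
db.user_history.insert({userId: 42, date: ISODate("2013-05-01T00:00:00Z"), stats: {logins: 3}});

// Current (most recent) record for the user:
db.user_history.find({userId: 42}).sort({date: -1}).limit(1);

// Full history for the user, newest first:
db.user_history.find({userId: 42}).sort({date: -1});

Both queries walk the same index, so retrieval stays fast no matter how many daily records accumulate.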

Related

MongoDB for a user tagging system

The requirement for this system is to store information about users and report on it. So...it makes sense for a User to be an individual document, and perhaps have an "event" or "tag" array on that user, and a query could be performed that returned all users that had a specific event...that's fine. But - I'm worried about performance here. After a while this data is going to get very big, very quickly.
Let's say we have a really active user - it has billions of events and that particular user document is approaching gigabytes in size. In this instance, would the simple act of pulling that document down take a while...and would updating it and sending it back take a while as well (although I guess individual properties could be updated individually...)?
What are the ways of managing this?
A document approaching gigabytes in size is technically impossible anyway, because MongoDB puts a 16MB limit on documents.
When you have documents which grow over time, it is usually better to put the growing data in a separate collection as individual documents. The reason is that when a document grows beyond its initially allocated size, MongoDB needs to move it to another file location from time to time, which greatly slows down updates.
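As an illustration of that separate-collection approach (the events collection and its fields are hypothetical):

// User documents stay small and fixed-size; events live elsewhere,
// so the user document never grows and never has to be moved on disk.
db.users.insert({_id: 1, name: "alice"});
db.events.insert({userId: 1, event: "clicked_signup", at: new Date()});

// Index by event, so "all users that had a specific event" is one indexed query:
db.events.createIndex({event: 1, userId: 1});
db.events.distinct("userId", {event: "clicked_signup"});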

Storing visiting IP addresses in document array or separate collection

I have a collection Items. Each document in this collection has a view counter that increments every time a user who hasn't viewed the item before visits its page.
Currently, I am storing an array of IP addresses in each item document, so that I can keep track of who has viewed it, and only increment the view counter when a new user visits.
I am however concerned that this may affect performance, since I have no way of retrieving the item document without also getting the IP array.
I expect this array to range between 1 and 5000 entries.
Would I be better off having a separate collection with an item ID and the array, or am I overblowing the potential performance risks?
Quoting the official documentation:
In general, embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. Embedded data models make it possible to update related data in a single atomic write operation.
However, embedding related data in documents may lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation.
Since your array will keep growing, embedding it in the item document is not a good option.
You may want to go for One-to-One Relationships instead.
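A sketch of that split (the item_views collection and the upsert pattern are my suggestion, not from the documentation): a unique index guarantees each (item, IP) pair is stored exactly once, and the counter is bumped only when the upsert actually inserted a new pair.

db.item_views.createIndex({itemId: 1, ip: 1}, {unique: true});

var res = db.item_views.updateOne(
  {itemId: ObjectId("5f0000000000000000000001"), ip: "203.0.113.7"},
  {$setOnInsert: {firstSeen: new Date()}},
  {upsert: true}
);

// upsertedId is only set when a previously unseen (item, IP) pair was inserted:
if (res.upsertedId) {
  db.items.updateOne({_id: ObjectId("5f0000000000000000000001")}, {$inc: {views: 1}});
}

This also keeps the item document itself at a fixed size, so reading an item never drags up to 5000 IP strings along with it.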

Aggregate collection that has an aggregate collection

I am having some trouble deciding which schema design to pick. I have a document which holds user info; each user has a very big set of items, which can be up to 20k items.
An item has a date, an ID, and 19 other fields, and also an internal array which can have 20-30 items; items can be modified, deleted, and of course newly inserted, and queried by any property they hold.
So I came up with 2 possible schemas:
1. Putting everything into a single document:
{_id:ObjectId("") type:'user' name:'xxx' items:[{.......,internalitems:[]},{.......,internalitems:[]},...]}
{_id:ObjectId("") type:'user' name:'yyy' items:[{.......,internalitems:[]},{.......,internalitems:[]},...]}
2. Separating the items from the user and letting each item have its own document:
{_id:ObjectId(""), type:'user', username:'xxx'}
{_id:ObjectId(""), type:'user', username:'yyy'}
{_id:ObjectId(""), type:'useritem' username:'xxx' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'xxx' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'yyy' item:{.......,internalitems:[]}]}
{_id:ObjectId(""), type:'useritem' username:'yyy' item:{.......,internalitems:[]}]}
As I explained before, a single user can have thousands of items and I have tens of users; internalitems can have 20-30 entries, each with 9 fields.
Consider also that a single item can be queried by different users but modified only by the owner and another process.
If performance is really important, which design would you pick?
If you would pick neither of them, what schema can you suggest?
On a side note, I will be sharding and I have a single collection for everything.
I wouldn't recommend the first approach; there is a limit to the maximum document size:
"The maximum BSON document size is 16 megabytes.
The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. See mongofiles and the documentation for your driver for more information about GridFS."
Source: http://docs.mongodb.org/manual/reference/limits/
There is also a performance implication if you exceed the currently allocated document space when updating (see "Document Growth" at http://docs.mongodb.org/manual/core/write-performance/).
Your first solution is susceptible to both of these issues.
The second one (disclaimer: in the case of 20-30 internal items) is less susceptible to reaching the limit, but might still require reallocation when doing updates. I haven't had this issue in a similar scenario, so this might be the way to go. You might also want to look into Record Padding (http://docs.mongodb.org/manual/core/record-padding/) for some more details.
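To make the second schema concrete, a sketch with the indexes it would likely need (field names beyond those in the question are assumptions):

// One document per item; the user document stays small and static.
// Since everything lives in one collection, prefix indexes with type:
db.data.createIndex({type: 1, username: 1, 'item.date': -1});

// Fetch one user's items, newest first:
db.data.find({type: 'useritem', username: 'xxx'}).sort({'item.date': -1});

// Modify one field of one item without rewriting a huge embedded array:
db.data.updateOne({_id: someItemId}, {$set: {'item.status': 'done'}});
// (someItemId and item.status are placeholders, not real field names)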
And, if all else fails, you can always split the internal items out as well.
Hope this helps!

Will I have problems using MongoDb regarding the size of documents?

I'm trying to develop a professional social network and I'm using MongoDB for the database. I wanted to ask whether I will have problems with the database regarding the size of documents, knowing that we plan to have a large number of users in the social network. I hope to get useful feedback from you.
'Large number of users' is somewhat vague; having a rough estimate helps. Anyway, the document size limit in MongoDB is 16MB, which looks like enough for storing a user's profile details. However, in your use case of 'networking', you might be planning to keep followers/friends. Whether to store them in the same document as the user-profile document or not is a different question in itself. You might want to check these out:
What is a good MongoDB document structure for most efficient querying of user followers/followees?
http://www.10gen.com/events/common-mongodb-use-cases
http://docs.mongodb.org/manual/use-cases/
http://nosql.mypopescu.com/post/316345119/mongodb-usecases
One issue you might run into is that MongoDB stores the text of every field name in every document. So if you have a field called "Name" or "Address" that you want on a set of documents, that text will appear in every single document, taking up space. This is different from a relational database, which has a schema where the name of a column is stored only once.
A few years ago I worked on a project where the engineers had a bit of a surprise at the size of their data set when they simulated millions of users, because they had not taken this into consideration. They had optimized the data values for size (i.e. "loc1" instead of "Location 1") but had not done the same for the field names. It's the kind of problem that arises when developers used to RDBMS development make assumptions about NoSQL solutions: they had only counted the size of their data, not field name plus field value.
They were glad they found this out in a test before they went live, otherwise they would have had to migrate every live document in order to implement the changes they wanted.
It isn't a big deal, and certainly not a reason not to use MongoDB (being schemaless and treating each document as a unique item is, after all, a feature rather than a bug or design flaw). Just something to keep in mind.
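To see the effect yourself, the legacy mongo shell provides Object.bsonsize(), which reports a document's BSON size (the documents below are made up):

var verbose = {customerName: "A", customerAddress: "B", customerLocation: "C"};
var terse = {n: "A", a: "B", l: "C"};

Object.bsonsize(verbose); // noticeably larger...
Object.bsonsize(terse);   // ...than this, purely because of the key names

// Multiplied across millions of documents, the difference adds up.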

MongoDB -- large number of documents

This is related to my last question.
We have an app where we are storing large amounts of data per user. Because of the nature of the data, we previously decided to create a new database for each user. This would have required a large no. of databases (probably millions) -- and as someone pointed out in a comment, this indicated wrong design.
So we changed the design and now we are thinking about storing each user's entire information in one collection. This means one collection exactly maps to one user. Since there are 12,000 collections available per database, we can store 12,000 users per DB (and this limit could be increased).
But now my question is -- is there any limit on the no. of documents a collection can have? Because of the way we need to store data per user, we expect to have a huge (tens of millions in extreme cases) no. of documents per collection. Is that OK for MongoDB, design-wise?
EDIT
Thanks for the answers. I guess then it's OK to use a large no. of documents per collection.
The app is a specialized inventory control system. Each user has a large no. of little pieces of information related to them. Each piece of information has a category and some related stuff under that category. Moreover, no two collections need to see each other's data -- hence an index that touches more than one collection is not needed.
To adjust the number of collections/indexes you can have (~24k is the limit -- ~12k is what they say for collections, because you have the _id index by default; but keep in mind that if you have more indexes on the collections, they will use up namespaces as well), you can use the --nssize option when you start up mongod.
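For example (the value is illustrative; --nssize is given in MB and, on those MMAPv1-era versions, was capped at 2047):

mongod --dbpath /data/db --nssize 2047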
There are plenty of deployments around with billions of documents in a collection (and I'm sure there are several with trillions), so "tens of millions" should be fine. There are some numbers, such as returned counts, that are constrained to 64 bits, so after you hit 2^64 documents you might find some issues.
What sort of query and update load are you going to be looking at?
Your design still doesn't make much sense. Why store each user in a separate collection?
What indexes do you have on the data? If you are indexing by some field whose content is common across all the users, you'll get a significant saving in total index size by having a single collection with one index.
Index size is often the limiting factor, not total database size, when it comes to performance.
Why do you have so many documents per user? How large are they?
Craigslist put 2+ billion documents in MongoDB, so that shouldn't be an issue if you have the hardware to support it and aren't being inefficient with your indexes.
If you posted more of your schema here you'd probably get better advice.
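For what it's worth, that single-collection alternative could be sketched like this (userId and category are assumed field names based on the question's edit):

// One collection for all users; one compound index serves every user's queries.
db.inventory.createIndex({userId: 1, category: 1});
db.inventory.insert({userId: 7, category: "widgets", detail: {qty: 3}});

// All of one user's data, or one category of it, off the same index:
db.inventory.find({userId: 7});
db.inventory.find({userId: 7, category: "widgets"});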