I have a project that I'm working on that will require me to store a large number of objects in an array linked to a parent object, akin to the storing of social media comments to their original post. What is the best way for me to organize the data for the array of child documents/comments?
Is it considered best practice to have the child objects under a different collection and reference to their parent or would it be more ideal just to put them all within the parent object directly?
I discuss this a little here; read this first:
https://stackoverflow.com/a/27285313/68567
For your case, Option 3 (keeping some of the data in your primary model) is probably the best. The key is to avoid unbounded array growth.
This has to do with how MongoDB allocates documents. http://docs.mongodb.org/manual/core/storage/
"Every document in MongoDB is stored in a record which contains the document itself and extra space, or padding, which allows the document to grow as the result of updates."
When MongoDB allocates space for a new document, it bases the allocation on the size of the inserted document and the sizes of documents already in your collection. (Read more in the link above.) If some documents are orders of magnitude larger than others, this will likely lead to fragmentation.
The way to cap the number of entries in your 'comments' sub-document array is to use the $push operator with the $each and $slice modifiers.
http://docs.mongodb.org/manual/reference/operator/update/slice/
So store the 'most recent 5' and display those when the item first loads. (Or oldest, or whatever other sorting criteria you want to use.) Then provide a way for the user to load more which will do a separate round-trip to the collection that has all of them.
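As a rough sketch of the capped-array idea above: the first function builds the update document ($push with $each and a negative $slice), and the second is a pure-Python emulation of what the server would do with it, included only so the effect can be seen without a database. The field name `comments`, the comment shape, and both function names are assumptions made for illustration.

```python
from copy import deepcopy


def capped_push_update(comment, keep=5, field="comments"):
    """Build an update that appends `comment` and keeps only the last `keep`."""
    return {"$push": {field: {"$each": [comment], "$slice": -keep}}}


def apply_capped_push(doc, update):
    """Emulate the update in memory (illustration only, not a MongoDB API)."""
    result = deepcopy(doc)
    for field, spec in update["$push"].items():
        items = result.get(field, []) + spec["$each"]
        limit = spec["$slice"]
        # Negative $slice keeps the last |limit| items, positive keeps the
        # first `limit` items, and zero empties the array.
        result[field] = items[limit:] if limit < 0 else items[:limit]
    return result


# Hypothetical post document with an embedded "recent comments" array.
post = {"_id": 1, "title": "Hello", "comments": []}
for n in range(1, 8):
    comment = {"n": n, "text": f"comment {n}"}
    post = apply_capped_push(post, capped_push_update(comment))

recent = [c["n"] for c in post["comments"]]
print(recent)  # [3, 4, 5, 6, 7]
```

With a driver such as pymongo, the same update document would presumably be passed to `update_one` on the posts collection, while the full comment is also inserted into the separate collection that holds all of them.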
Related
The vast majority of the fields will not need to be indexed and will never be queried (think display ONLY). There are maybe 20-30 fields that need to be queried.
Likewise, the vast majority of the fields will be simple field/value pairs (not embedded docs).
Finally, there will be some fields that will store embedded docs, but not a lot and the embedded docs will not be large.
I was thinking of maybe breaking up the collection into two collections:
A collection with fields that need to be indexed/queried and fields that need to be displayed in larger result sets (really anything more than 1 single document).
A collection with: _id, related_id and data (where data is an embedded doc with all the extra data in it). This collection would only ever be accessed when viewing the "detailed" display of the document.
Also, there will be 100s of millions of documents (eventually).
Say I want to mirror a social media news feed by storing it in a mongo collection, and then periodically syncing it to fetch updates.
Multiple users will then be able to interact with this feed at a time (both reads and writes).
Also, let's assume that I will initially be storing between 500 and 1000 entries, but that I might consider increasing this later on.
My question is: would I be better off storing these activities in an embedded array, or in a separate collection?
As I understand it, storing them in an embedded array will allow for quick access, but can quickly hurt performance due to memory allocation.
On the other hand, storing each entry as its own document means I'll have to go fetch every single one of them, which will slow down read performance.
Any suggestion as to what might fit my use case best is much appreciated.
Thanks
Use a collection. Queries return matching documents, not matching array elements, so the things you are searching for should logically be your collection documents. You can reshape a document to contain just the first matching array element when a query matches against elements of an array, but not, e.g. the first 4 matching elements. You would need to use aggregation for simple queries, which would hurt performance.
I have a collection Items. Each document in this collection has a view counter that increments every time a user who hasn't viewed the item before visits its page.
Currently, I am storing an array of IP addresses in each item document, so that I can keep track of who has viewed it and only increment the view counter when a new user visits.
I am, however, concerned that this may affect performance, since I have no way of retrieving the item document without also getting the IP array.
I expect this array to hold between 1 and 5000 entries.
Would I be better off having a separate collection with an item id and the array, or am I overblowing the potential performance risks?
Quoting the official documentation.
In general, embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. Embedded data models make it possible to update related data in a single atomic write operation.
However, embedding related data in documents may lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation.
Since your array will keep growing, embedding it in the item document is not a good option.
You may want to go for a One-to-One Relationship instead: a separate document per item that holds the array.
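A minimal sketch of how the one-to-one layout could work, assuming a hypothetical `item_views` collection whose documents share the item's `_id` and hold the `ips` array, while the counter stays on the item as `views`. The builder functions return the filter and update documents; `register_view` is only an in-memory emulation of the flow, not a driver call.

```python
def new_view_filter(item_id, ip):
    """Match the tracking document only if `ip` is not already in `ips`."""
    return {"_id": item_id, "ips": {"$ne": ip}}


def record_view_update(ip):
    """Record the address; $addToSet never stores a duplicate."""
    return {"$addToSet": {"ips": ip}}


def count_view_update():
    """Increment the counter on the item document."""
    return {"$inc": {"views": 1}}


def register_view(tracking, item, ip):
    """Emulate the flow in memory: count the view only for a new address."""
    if ip in tracking["ips"]:
        return False  # the filter above would match nothing
    tracking["ips"].append(ip)
    item["views"] += 1
    return True


# Hypothetical item and its one-to-one tracking document.
item = {"_id": 7, "name": "widget", "views": 0}
tracking = {"_id": 7, "ips": []}

results = [register_view(tracking, item, ip)
           for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]]
print(results, item["views"])  # [True, True, False] 2
```

Against a real database, the idea would be to run the tracking update with `new_view_filter` and apply `count_view_update` to the item only when that first update actually modified a document. This assumes the tracking document is created together with the item, since combining this filter with an upsert would try to insert a second document with the same `_id`.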
In short: If you have a large number of documents with varying sizes, where relatively few documents hit the maximum object size, what are the best practices to store those documents in MongoDB?
I have set of documents like:
{_id: ...,
values: [12, 13, 434, 5555 ...]
}
The length of the values list varies hugely from one document to another. For the majority of documents, it will have a few elements, for a few it will have tens of millions of elements, and I will hit the maximum object size limit in MongoDB. The trouble is any special solution I come up with for those very large (and relatively few) documents might have an impact on how I store the small documents which would, otherwise, live happily in a MongoDB collection.
As far as I see, I have the following options. I would appreciate any input on pros and cons of those, and any other option that I missed.
1) Use another datastore: That seems too drastic. I like MongoDB, and it's not like I hit the size limit for many objects. In the worst case, my application could treat the very large objects and the rest differently. It just doesn't seem elegant.
2) Use GridFS to store the values: Like a blob in a traditional DB, I could keep the first few thousand elements of values in document and if there are more elements in the list, I could keep the rest in a GridFS object as a binary file. I wouldn't be able to search in this part, but I can live with that.
3) Abuse GridFS: I could keep every document in gridFS. For the majority of the (small) documents the binary chunk would be empty because the files collection would be able to keep everything. For the rest I could keep the excess elements in the chunks collection. Does that introduce an overhead compared to option #2?
4) Really abuse GridFS: I could use the optional fields in the files collection of GridFS to store all elements in the values. Does GridFS do smart chunking also for the files collection?
5) Use an additional "relational" collection to store the one-to-many relation, but the number of documents in this collection would easily exceed a hundred billion.
If you have large documents, try to store some metadata about them in MongoDB, and put the rest of the data (the part you will not be querying on) outside.
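One possible reading of this advice, combined with option 2 from the question, is sketched below: keep a bounded head of `values` plus a little metadata in the MongoDB document, and hand the remainder to whatever external store is chosen (GridFS, a file, an object store). The function name, the `total_count` and `has_overflow` fields, and the default limit are all hypothetical.

```python
def split_for_storage(doc_id, values, inline_limit=1000):
    """Return (metadata_doc, overflow).

    metadata_doc is small enough to live in a regular collection;
    overflow is the unqueried tail that would be stored elsewhere.
    """
    head = values[:inline_limit]
    overflow = values[inline_limit:]
    metadata_doc = {
        "_id": doc_id,
        "values": head,                  # still searchable in MongoDB
        "total_count": len(values),      # lets readers know the full size
        "has_overflow": bool(overflow),  # signals that more data lives outside
    }
    return metadata_doc, overflow


# A small document is unaffected; a large one is split.
small_doc, small_rest = split_for_storage("a", [12, 13, 434, 5555])
large_doc, large_rest = split_for_storage("b", list(range(2500)))

print(small_doc["has_overflow"], len(small_rest))  # False 0
print(large_doc["has_overflow"], len(large_rest))  # True 1500
```

The small documents keep their natural shape, so the special handling only costs two extra fields per document.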
My database has a users collection,
each user has multiple documents,
each document has multiple sections,
each section has multiple works.
Users work with the works collection very often (adding, updating, and deleting works). So my question is: what collection structure should I use? There are 100-200 works per section.
Should I make a single works collection for all users, keyed by user _id, or is there a better solution?
Depends on what kind of queries you have. The guideline is to arrange documents so that you can fetch all you need in ideally one query.
On the other hand, what you probably want to avoid is having mongo reallocate documents because there's not enough space for an in-place update. You can do that by preallocating enough space, or by extracting the frequently changing part into its own collection.
As you can read in MongoDB docs,
Generally, for "contains" relationships between entities, embedding should be chosen. Use linking when not using linking would result in duplication of data.
So if each user only has access to their own documents, I think you're good. Just keep in mind that documents have a size limit (16 MB), which you should be careful about since you're embedding lots of stuff.