Storing user activity in embedded array versus separate collection?

Say I want to mirror a social media site's news feed by storing it in a Mongo collection, and then periodically syncing it to fetch updates.
Multiple users will then be able to interact with this feed at a time (both reads and writes).
Also, let's assume that I will initially be storing between 500 and 1000 entries, but that I might consider increasing this later on.
My question is: would I be better off storing these activities in an embedded array, or in a separate collection?
As I understand it, storing them in an embedded array allows for quick access, but can quickly degrade performance due to memory allocation.
On the other hand, storing each entry as its own document means I'll have to go fetch every single one of them, which will slow down read performance.
Any suggestion as to what might fit my use case best is much appreciated.
Thanks

Use a collection. Queries return matching documents, not matching array elements, so the things you are searching for should logically be your collection's documents. When a query matches against elements of an embedded array, you can reshape the result to contain just the first matching array element, but not, e.g., the first 4 matching elements; for that you would need the aggregation pipeline even for simple queries, which would hurt performance.
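For illustration, here is roughly how the two shapes compare in the shell (the feeds/activities collection names, the entries array, and the feedId field are all assumptions for this sketch, not from the question):

// Embedded array: a projection with $elemMatch returns at most ONE matching element.
db.feeds.find(
  { "entries.type": "comment" },
  { entries: { $elemMatch: { type: "comment" } } }
)

// Getting, say, the first 4 matching elements already requires aggregation with $filter + $slice.
db.feeds.aggregate([
  { $project: { entries: { $slice: [
      { $filter: { input: "$entries", as: "e", cond: { $eq: [ "$$e.type", "comment" ] } } },
      4
  ] } } }
])

// Separate collection: matching entries simply come back as documents.
// (feedId is a placeholder for the parent feed's _id.)
db.activities.find({ feedId: feedId, type: "comment" }).limit(4)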

Related

Storing visiting IP addresses in document array or separate collection

I have a collection Items. Each document in this collection has a view counter, that increments every time a user who hasn't viewed the item earlier, visits its page.
Currently, I am storing an array of IP addresses in each item document, so that I can keep track of who has viewed it, and only increment the view counter when a new user visits.
I am however concerned that this may affect performance since I have no way of retrieving the item document, without also getting the IP array.
I expect this array to range between 1 and 5000 entries.
Would I be better off having a separate collection with an item id and the array, or am I overblowing the potential performance risks?
Quoting the official documentation.
In general, embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation. Embedded data models make it possible to update related data in a single atomic write operation.
However, embedding related data in documents may lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation.
Since your array will keep growing, embedding it in the item document is not a good option.
You may want to go for a one-to-one relationship with a separate collection instead.
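A minimal sketch of that split, assuming an items collection and a separate itemViews collection keyed by the item's _id (collection and field names are made up):

// $addToSet only modifies the views document when the IP is not already present,
// so the write result tells us whether this is a new viewer.
var res = db.itemViews.updateOne(
  { _id: itemId },                     // itemId: placeholder for the item's _id
  { $addToSet: { ips: clientIp } },    // clientIp: placeholder for the visitor's IP
  { upsert: true }
);

// Increment the counter on the item only for genuinely new viewers.
if (res.modifiedCount === 1 || res.upsertedId != null) {
  db.items.updateOne({ _id: itemId }, { $inc: { views: 1 } });
}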

Best way to organize subdocuments in Mongo?

I have a project that I'm working on that will require me to store a large number of objects in an array linked to a parent object, akin to the storing of social media comments to their original post. What is the best way for me to organize the data for the array of child documents/comments?
Is it considered best practice to have the child objects under a different collection and reference to their parent or would it be more ideal just to put them all within the parent object directly?
I discuss this a little here; read this first:
https://stackoverflow.com/a/27285313/68567
For your case, Option 3 (keeping some of the data in your primary model) is probably the best. The key is to avoid unbounded array growth.
This has to do with how MongoDB allocates documents: http://docs.mongodb.org/manual/core/storage/
"Every document in MongoDB is stored in a record which contains the document itself and extra space, or padding, which allows the document to grow as the result of updates."
When MongoDB allocates new documents, it allocates space based on the size of the inserted document and the sizes of documents already in your collection. (Read more in the link above.) If you have some documents that are orders of magnitude larger than others, this will likely lead to fragmentation.
The way to avoid having too many documents in your 'comments' sub-document array is the $push operator with the $slice modifier.
http://docs.mongodb.org/manual/reference/operator/update/slice/
So store the 'most recent 5' and display those when the item first loads. (Or oldest, or whatever other sorting criteria you want to use.) Then provide a way for the user to load more which will do a separate round-trip to the collection that has all of them.
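A sketch of that pattern, assuming a posts collection with an embedded recentComments array plus a separate comments collection for the full history (all names made up):

// Keep only the 5 most recent comments embedded in the parent post.
db.posts.updateOne(
  { _id: postId },   // postId: placeholder for the parent post's _id
  { $push: {
      recentComments: {
        $each: [ { author: "alice", text: "Nice post!", createdAt: new Date() } ],
        $slice: -5   // a negative slice keeps the last 5 elements
      }
  } }
);

// The full comment also goes to the separate collection used by the "load more" round-trip.
db.comments.insertOne({ postId: postId, author: "alice", text: "Nice post!", createdAt: new Date() });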

How to do global search in MongoDB?

What I mean by global search is searching for documents in specified collections, for example, searching for a name in both the User and Organization collections, and returning both user and organization documents that match the criteria.
Is it possible to simply copy the documents in User and Organization into another collection and do a search in it?
No, it is not possible to do a multi-collection search automatically. There's no reason however that you couldn't perform the same query on multiple collections and combine the results.
While you could duplicate the data into another collection for query purposes, if you need to guarantee that the source collections' values match the "index" collection exactly, you'll need to implement your own multi-phase transaction (example), as MongoDB doesn't have a multi-collection atomic commit. Or, you can accept the fact that the "index" collection may be out of sync; of course, it could be periodically updated through custom code. Further, it means your working set has grown, since you're storing the data twice. And if you then need to grab data from the individual collections anyway (to fetch more of the source document), you've likely gained nothing and made things worse compared to doing multiple queries in the first place.
You could store related documents in the same collection and take advantage of the built-in indexing. Of course, this comes with the caveat that if your documents are of different types, you may find it more challenging to build MongoDB indexes that are efficient. Every new or changed document must go through the indexing pipeline, which may introduce significant overhead.
Without knowing your requirements more deeply: if it's only a few collections, I'd just do multiple searches. The second-best option would be to combine the documents into a single collection. The last choice would be to copy the data.
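A minimal sketch of the multiple-searches approach in the shell, running the same filter against each collection and merging in application code (the pluralized collection names and the name field are assumptions):

// The same name filter, run once per collection.
var pattern = /smith/i;

var users = db.users.find({ name: pattern }).toArray();
var orgs = db.organizations.find({ name: pattern }).toArray();

// Tag each result with its source so the caller can tell them apart.
var results = users.map(function (u) { return { type: "user", doc: u }; })
  .concat(orgs.map(function (o) { return { type: "organization", doc: o }; }));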

Storing very large documents in MongoDB

In short: If you have a large number of documents with varying sizes, where relatively few documents hit the maximum object size, what are the best practices to store those documents in MongoDB?
I have set of documents like:
{
  _id: ...,
  values: [12, 13, 434, 5555, ...]
}
The length of the values list varies hugely from one document to another. For the majority of documents, it will have a few elements, for a few it will have tens of millions of elements, and I will hit the maximum object size limit in MongoDB. The trouble is any special solution I come up with for those very large (and relatively few) documents might have an impact on how I store the small documents which would, otherwise, live happily in a MongoDB collection.
As far as I see, I have the following options. I would appreciate any input on pros and cons of those, and any other option that I missed.
1) Use another datastore: That seems too drastic. I like MongoDB, and it's not like I hit the size limit for many objects. In the worst case, my application could treat the very large objects and the rest differently. It just doesn't seem elegant.
2) Use GridFS to store the values: Like a blob in a traditional DB, I could keep the first few thousand elements of values in document and if there are more elements in the list, I could keep the rest in a GridFS object as a binary file. I wouldn't be able to search in this part, but I can live with that.
3) Abuse GridFS: I could keep every document in GridFS. For the majority of the (small) documents the binary chunk would be empty, because the files collection would be able to keep everything. For the rest, I could keep the excess elements in the chunks collection. Does that introduce an overhead compared to option #2?
4) Really abuse GridFS: I could use the optional fields in the files collection of GridFS to store all elements in the values. Does GridFS do smart chunking also for the files collection?
5) Use an additional "relational" collection to store the one-to-many relation, but the number of documents in this collection would easily exceed a hundred billion rows.
If you have large documents, try to store some metadata about them in MongoDB, and put the rest of the data --the part you will not be querying on-- outside.
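A rough sketch of what that split could look like for the values documents above (the series collection name and the count, valuesHead, and overflowRef fields are invented; the overflow could live in GridFS or any external blob store):

// Small documents keep everything inline:
db.series.insertOne({ _id: 1, values: [12, 13, 434, 5555] });

// Very large documents keep only a searchable head inline, plus metadata and a
// pointer to where the rest of the list lives:
db.series.insertOne({
  _id: 2,
  count: 25000000,                  // metadata you might still query on
  valuesHead: [ 12, 13, 434 ],      // first few (thousand) elements, queryable
  overflowRef: "series-2-values"    // GridFS filename / external blob key (assumed convention)
});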

mongodb document structure

My database has a users collection,
each user has multiple documents,
each document has multiple sections,
each section has multiple works.
Users work with the works collection very often (adding new works, updating works, deleting works). So my question is: what structure of collections should I use? The works collection has 100-200 records per section.
Should I make one works collection for all users, referencing the user _id, or is there a better solution?
Depends on what kind of queries you have. The guideline is to arrange documents so that you can fetch all you need in ideally one query.
On the other hand, what you probably want to avoid is having Mongo reallocate documents because there's not enough space for an in-place update. You can do that by preallocating enough space, or by extracting the frequently changing part into its own collection.
As you can read in the MongoDB docs,
Generally, for "contains" relationships between entities, embedding should be chosen. Use linking when embedding would result in duplication of data.
So if each user only has access to his own documents, I think you're good. Just keep in mind there's a size limit for documents (16MB, I think) which you should be careful about, since you're embedding lots of stuff.
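If the frequently changing works are extracted into their own collection, a minimal sketch could look like this (the works collection name and its fields are assumptions):

// One document per work, referencing the owning user and section, so frequent
// inserts/updates/deletes never grow or rewrite the user document itself.
db.works.insertOne({
  userId: userId,        // placeholder for the owning user's _id
  sectionId: sectionId,  // placeholder for the owning section's _id
  title: "My work",
  createdAt: new Date()
});

// All works for one section of one user, in a single query:
db.works.find({ userId: userId, sectionId: sectionId });

// Supporting index for that access pattern:
db.works.createIndex({ userId: 1, sectionId: 1 });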