MongoDB collections and documents

I get the point about its dynamic schema architecture. But this raises a question: if it's schemaless, why would we need more than one collection (other than for collection size issues)? Couldn't we just include all sorts of documents in one collection?

Related

Is the Firestore collection write limit (imposed by sequentially-updated indexed fields) affected by collection-group queries?

From how I understand it, if a collection has a monotonically-increasing indexed field, a write limit is imposed on that collection. If that collection is split into two separate collections, each collection would have its own write limit. However, if we split that collection into two separate collections but give them the same name (putting them under different documents), would they still have their own independent write limits if the monotonically-increasing indexed field was part of a collection-group query that queried them both together?
No, that's not the way it works. A collection group query requires its own index, and the limit you're talking about is the write rate of the index itself, not the collection. Each collection automatically indexes fields from documents for just that specific collection, but that does not apply to collection group queries that span collections.
Note that the documentation states the limit as:
Maximum write rate to a collection in which documents contain sequential values in an indexed field
On a related note, disabling indexing for a specific field on a collection lets you bypass the normal monotonic write limits for that one field on that collection, because it is no longer being indexed.
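For illustration, here is a minimal sketch using the google-cloud-firestore Python client (the collection name "events" and field name "created_at" are hypothetical, as is the data layout). The point is that this query spans every collection named "events", wherever it sits in the document hierarchy, and is served by its own collection-group-scoped index rather than the per-collection single-field indexes:

```python
from google.cloud import firestore

db = firestore.Client()  # assumes default project credentials

# Spans every collection named "events" across the database; served by
# a collection-group-scoped index on "created_at", whose write rate is
# what the monotonic limit applies to.
query = db.collection_group("events").order_by("created_at").limit(50)
for snapshot in query.stream():
    print(snapshot.reference.path, snapshot.to_dict())
```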

How does MongoDB scale when you have relationships between collections?

I have a MongoDB database that links documents between collections (the data cannot be embedded).
Does a mongos cluster (http://docs.mongodb.org/manual/core/sharding-introduction/) support sharding when the documents are linked?
How does this impact performance?
Thanks!
Considering there is nothing special about referenced documents (it is just a logical relationship inferred by the application layer, not by MongoDB itself), sharding is supported. This applies to "manual" references as well as DBRefs. You can even shard on a DBRef property, although I'm not sure why you'd want to, considering a DBRef should have inherently low cardinality.
There is an impact on performance for both manual references and DBRefs, in that multiple queries must be performed to "join" the data. From the docs:
To resolve DBRefs, your application must perform additional queries to return the referenced documents. Many drivers have helper methods that form the query for the DBRef automatically. The drivers do not automatically resolve DBRefs into documents.
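For illustration, a minimal pymongo sketch of that two-query pattern, assuming hypothetical orders/customers collections and a manual customer_id reference:

```python
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod; adjust the URI as needed
db = client["shop"]     # hypothetical database

# First query: fetch the referencing document.
order = db.orders.find_one({"order_no": 1042})

# Second query: resolve the reference in the application layer,
# because MongoDB itself does not follow it.
customer = db.customers.find_one({"_id": order["customer_id"]})
```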
There is no such thing as a "document link" in MongoDB, just fields in documents of collection A which happen to have the same values as fields of documents in collection B. DBRefs are just a convention on the application layer and get no special treatment whatsoever from the database.
What matters for sharding efficiency is how you define the shard key for the referenced collection. When the field you search by is part of the shard key of the collection, mongos can accelerate the query by routing it to the correct shard.
You likely want all documents of collection A which belong to the same document of collection B to reside on the same shard. That means the shard key of A should include the field of A which is a unique identifier of B (ObjectId, name, or whatever).
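As a hedged sketch of that layout (same hypothetical names as above; assumes a sharded cluster reached through mongos), the shard key of the referencing collection would start with the identifier of the referenced document:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connect via mongos

client.admin.command("enableSharding", "shop")

# Sharding orders on customer_id keeps all orders of one customer on
# the same shard, so queries by customer_id can be routed to a single
# shard. MongoDB creates the supporting index if the collection is empty.
client.admin.command(
    "shardCollection",
    "shop.orders",
    key={"customer_id": 1, "_id": 1},
)
```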

Mapping datasets to NoSQL (MongoDB) collections

What do I have?
I have data for 'n' departments.
Each department has more than 1,000 datasets.
Each dataset has more than 10,000 CSV files (each larger than 10 MB), each with a different schema.
This data will grow even more in the future.
What do I want to do?
I want to map this data into MongoDB.
What approaches have I considered?
I can't map each dataset to a single document, since a document is limited to 4-16 MB.
I can't create a collection for each dataset, as the maximum number of collections is also limited (< 24,000).
So finally I thought of creating a collection for each department, with one document in that collection for each record in a CSV file belonging to that department.
I want to know from you:
Will there be a performance issue if we map each record to a document?
Is there any maximum limit on the number of documents?
Is there any other design I could use?
Will there be a performance issue if we map each record to a document?
Mapping each record to a document in MongoDB is not a bad design. Have a look at the FAQ on the MongoDB site:
http://docs.mongodb.org/manual/faq/fundamentals/#do-mongodb-databases-have-tables
It says:
...Instead of tables, a MongoDB database stores its data in collections, which are the rough equivalent of RDBMS tables. A collection holds one or more documents, which correspond to a record or a row in a relational database table...
Along with the limitation on BSON document size (16 MB), there is also a maximum of 100 levels of document nesting:
http://docs.mongodb.org/manual/reference/limits/#BSON-Document-Size
...Nested Depth for BSON Documents Changed in version 2.2. MongoDB supports no more than 100 levels of nesting for BSON documents...
So it's better to go with one document for each record.
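As an illustrative sketch of that design (pymongo; the department, file, and field names are made up), loading one CSV file so that every row becomes its own document could look like this:

```python
import csv
from pymongo import MongoClient

db = MongoClient()["datasets"]         # hypothetical database
collection = db["physics_department"]  # one collection per department

# One document per CSV record. Documents in the same collection may
# have different fields, so the files' differing schemas are fine.
with open("run_42.csv", newline="") as f:
    rows = [dict(row) for row in csv.DictReader(f)]

if rows:
    collection.insert_many(rows)
```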
Is there any maximum limit on the number of documents?
No. It's mentioned in the MongoDB reference manual:
...Maximum Number of Documents in a Capped Collection Changed in version 2.4. If you specify a maximum number of documents for a capped collection using the max parameter to create, the limit must be less than 2^32 documents. If you do not specify a maximum number of documents when creating a capped collection, there is no limit on the number of documents...
Is there any other design I could use?
If your documents are too large, you can consider partitioning them at the application level, but that puts a high computation requirement on the application layer.
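For illustration only, here is a sketch of what such application-level partitioning might look like (all collection and field names are hypothetical): an oversized logical record is split into chunk documents sharing a parent_id, and the application reassembles them:

```python
from bson import ObjectId
from pymongo import MongoClient

db = MongoClient()["datasets"]  # hypothetical database

def save_partitioned(fields, chunk_size=1000):
    # Split one oversized logical record into several documents that
    # share a parent_id and carry a sequence number.
    parent_id = ObjectId()
    items = list(fields.items())
    for seq, start in enumerate(range(0, len(items), chunk_size)):
        db.record_chunks.insert_one({
            "parent_id": parent_id,
            "seq": seq,
            "fields": dict(items[start:start + chunk_size]),
        })
    return parent_id

def load_partitioned(parent_id):
    # Reassembly happens in the application; MongoDB only sees chunks.
    merged = {}
    for chunk in db.record_chunks.find({"parent_id": parent_id}).sort("seq"):
        merged.update(chunk["fields"])
    return merged
```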
Will there be a performance issue if we map each record to a document?
That depends entirely on how you search them. If you use a lot of queries which each affect only one document, it is likely even faster that way. If the finer document granularity results in a lot of document-spanning queries, it will get slower, because MongoDB can't do that for you itself.
Is there any maximum limit on the number of documents?
No.
Is there any other design I could use?
Maybe, but that depends on how you want to query your data. If you are content with treating the files as BLOBs which are retrieved as a whole but not searched or analyzed at the database level, you could consider storing them in GridFS. It's a way to store files larger than 16 MB in MongoDB.
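A minimal GridFS sketch with pymongo (the file name is hypothetical):

```python
import gridfs
from pymongo import MongoClient

db = MongoClient()["datasets"]
fs = gridfs.GridFS(db)  # backed by the fs.files and fs.chunks collections

# Store a CSV larger than 16 MB as an opaque blob.
with open("huge_dataset.csv", "rb") as f:
    file_id = fs.put(f, filename="huge_dataset.csv")

# Retrieve it as a whole; the contents are not queryable field by field.
data = fs.get(file_id).read()
```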
In general, MongoDB database design doesn't depend so much on what and how much data you have, but rather on how you want to work with it.

MongoDB - using email_id as an identifier across collections

I have a user collection which holds email_id and _id as unique fields. I want to store user data across various collections, and I would like to use email_id as the identifier in those collections, because it is easier to query against them in the shell with an email_id than with a complex ObjectId.
Is this the right way? Will it cause any performance problems when creating indexes on big email_ids?
Also, don't consider this option if you plan to allow email_id changes in the future.
While relational databases encourage you to normalize your data and spread it over many tables, this approach is usually not the best for MongoDB. MongoDB doesn't support JOINs over multiple collections or even multiple documents from the same collection. So you should try to design your database documents in a way that each query can be satisfied by searching for a single document. That means it is usually a good idea to store all information about a user in one document.
An exception to this is when certain pieces of the user's data grow indefinitely (like the posts made by a user in a forum). First, MongoDB documents have a size limit, and second, when the size of a document increases, the database needs to reallocate its hard drive space frequently. This slows down writes and leads to fragmentation in the database. In that case it's better to put each entity in a different collection.
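To illustrate that split (a sketch; the collection and field names are made up): bounded profile data stays embedded in a single user document, while the unbounded posts get their own collection referencing the user:

```python
from pymongo import MongoClient

db = MongoClient()["forum"]  # hypothetical database

# Bounded per-user data: keep it all in one document.
user_id = db.users.insert_one({
    "email_id": "alice@example.com",
    "name": "Alice",
    "settings": {"theme": "dark"},  # an embedded document
}).inserted_id

# Unbounded data: one document per post, referencing the user.
db.posts.insert_one({"author_id": user_id, "body": "First post"})
```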
The size of the fields covered by an index doesn't matter when you search for equality. When you have a unique index on email_id, it should be just as fast as searching by _id.
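As a small pymongo sketch of that point (names hypothetical), an equality lookup on a unique email_id index works the same way as a lookup by _id:

```python
from pymongo import MongoClient

db = MongoClient()["app"]

# One-time setup: a unique index on email_id.
db.users.create_index("email_id", unique=True)

# Equality search; the key length does not change the lookup strategy.
user = db.users.find_one({"email_id": "alice@example.com"})
```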

Why does MongoDB have collections

MongoDB being document-oriented, the structure of collections seems to be a special case of documents. By that I mean one can define a document to contain other documents, so a collection is just a document containing other documents.
So why do we need collections after all?
Logically yes, you could design a database system like that, but practically speaking no.
A collection has indexes on the documents in it.
A collection requires the documents in it to have unique ids.
A document is limited in size.
Object ids (the top-level _id document attribute) must be unique within a collection. Multiple collections may contain the same _id value, just as in an RDBMS, where the key constraint is per table, yet multiple tables may contain the same value for a key.
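A quick pymongo sketch of that scoping (the database and collection names are made up):

```python
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

db = MongoClient()["demo"]

db.users.insert_one({"_id": 1})
db.orders.insert_one({"_id": 1})     # fine: uniqueness is per collection

try:
    db.users.insert_one({"_id": 1})  # same collection: rejected
except DuplicateKeyError:
    print("duplicate _id within one collection")
```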
A collection is a container for documents. Saying that "a document contains other documents" is not quite right, because a document can already have embedded (inner) documents.
A collection is the unit where you group documents together. Be aware that, due to the schema-free design, you can put anything into one collection, but that is not good design. So a collection is a logical container for documents, much like a table in the relational world.