How does MongoDB manage data after inserting?

After data is inserted into the db, I know that MongoDB stores the data in files; however, I'm confused about memory.
Suppose I insert 50 million records into the db: will this data be loaded into memory? If not, how does MongoDB behave to keep its performance?

In that case, documents are loaded into memory on request in blocks; that means the collection is split into chunks, and the most frequently used chunks reside in memory.
To gain performance, Mongo uses indexes, and there is a special kind of query called a covered query, which means that all the data needed is available in the index itself, which is smaller than the collection.
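A minimal sketch of a covered query in the mongo shell (the posts collection and its fields are hypothetical): with a compound index on title and date, a query that filters and projects only those fields, and excludes _id, can be answered from the index alone.

    // Hypothetical collection; field names are illustrative.
    db.posts.createIndex({ title: 1, date: 1 })
    // Covered: filter and projection use only indexed fields, _id excluded,
    // so MongoDB never has to fetch the full documents from disk.
    db.posts.find({ title: "Hello" }, { _id: 0, title: 1, date: 1 })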

Related

Cache only high-usage keys in MongoDB

Suppose we have a simple blog with many posts.
And we regularly execute a query to get a list of post titles and dates from the posts collection.
So, what will be cached in RAM in this scenario (apart from the indexes): the whole documents, or only the _ids, titles, and dates?
The documentation doesn't make this clear:
MongoDB keeps most recently used data in RAM. If you have created
indexes for your queries and your working data set fits in RAM,
MongoDB serves all queries from memory
DB version is 4.2.8.
It works as below:
Let's say you have 4 collections. Each contains 2 GB of data, and each collection has 500 MB of data in its index.
In total: 8 GB of data, 2 GB of indexes. Suppose you query one collection frequently, with one particular query on that collection. You can assume that it keeps the data related to the query you execute frequently in the cache. That includes the index data and the actual documents it points to on disk.
It doesn't keep the query result in the index. So it keeps the whole documents, plus the index data for the frequent query, in the cache.
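If you want to see how much data is actually sitting in the WiredTiger cache, you can inspect the server statistics from the shell (a sketch; the exact counter names can vary slightly between versions):

    // WiredTiger cache usage; counter names may differ by version.
    var cache = db.serverStatus().wiredTiger.cache
    print("bytes currently in cache: " + cache["bytes currently in the cache"])
    print("max bytes configured:     " + cache["maximum bytes configured"])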

Using nested document structure in mongodb

I am planning to use a nested document structure for my MongoDB schema design, rather than a flat design, because in my case I need to fetch my result in a single query.
But MongoDB has a size limit for a document.
MongoDB Limits and Thresholds
A MongoDB document has a size limit of 16 MB (an amount of data). If your subcollection can grow without limits, go flat.
I don't need to fetch my nested data; I will only need it for filtering and querying purposes.
I want to know whether I will still be bound by the MongoDB size limit even if I use my embedded data only for querying and filtering, and never for fetching the nested data, because as per my understanding, in this case MongoDB won't load the complete document into memory but only the selected fields?
Nested schema design example
{
  clinicName: "XYZ Hospital",
  clinicAddress: "ABC place.",
  doctorsWorking: {
    doctorId1: {
      doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
    },
    doctorId2: {
      doctorJoined: ISODate("2017-04-15T10:47:47.647Z")
    },
    doctorId3: {
      doctorJoined: ISODate("2017-05-15T10:47:47.647Z")
    }
    ...
    // up to 30000-40000 more entries, suppose
  }
}
I don't think your understanding is correct when you say "because as per my understanding, in this case, MongoDB won't load the complete document in memory but only the selected fields?".
If we look at the MongoDB docs, they read:
The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.
So there is a clear 16 MB limit on document size; Mongo will stop you from saving a document larger than this.
Suppose, for argument's sake, your understanding were right: MongoDB allows a document of any size to be saved, but will not allow more than 16 MB of it into RAM. On the other hand, while storing the data it cannot know what queries will later be run against it, so ultimately you would be inserting big documents that can never be used (when inserting we don't declare the query pattern, and we could even try to fetch the full document in a single shot later).
If the limit were (hypothetically) on transmission, there are lots of ways software developers could bring data into RAM in chunks via code and never cross the 16 MB limit; that's how I/O is done on large files. They would make a mockery of the limit and render it useless. I expect MongoDB's creators knew this and didn't want it to happen.
Also, if the limit were on transmission, there would be no need for separate collections. We could put everything in a single collection, write smart queries, and fetch the data; if the fetched data crossed 16 MB, we would fetch it in parts and forget the limit. But it doesn't work this way.
So the limit must be on document size; otherwise it would create many issues.
In my opinion, if you just need the "doctorsWorking" data for filtering or querying purposes (and you think "doctorsWorking" could push the document past the 16 MB limit), then it's good to keep it in a separate collection.
Ultimately everything depends on the query and data patterns. If a doctor can serve shifts in multiple hospitals, it will be better to keep doctors in a separate collection.
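A minimal sketch of that separate-collection design (the doctors collection, its fields, and the values are illustrative, not from the question):

    // One document per doctor-clinic relation, instead of nesting.
    db.doctors.insertOne({
      doctorId: "doctorId1",
      clinicName: "XYZ Hospital",
      doctorJoined: ISODate("2017-03-15T10:47:47.647Z")
    })
    // An index on clinicName keeps "all doctors at a clinic" queries fast.
    db.doctors.createIndex({ clinicName: 1 })
    db.doctors.find({ clinicName: "XYZ Hospital" })

Each doctor document stays tiny, so the 16 MB limit never comes into play no matter how many doctors a clinic has.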

MongoDB Internal implementation of indexing?

I've learned a lot of things about indexing and found some useful material here:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the
query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
But I still have some questions:
While creating an index using createIndex(), is the record always stored in RAM?
Do I need to create the index every time my application restarts?
What will happen in the case of the default id (_id)? Is it always stored in RAM?
_id is a default index; does that mean all records are always stored in RAM for a particular collection?
Please correct me if I am wrong.
Thanks.
I think you are under the impression that indexes are stored in RAM. What if I told you they are not?
First of all, we need to understand what indexes are: an index is basically a pointer that tells where on disk a document is. Just like the index in a book, it lets us see, for faster access, which topic is on which page number.
So when indexes are created, they too are stored on disk. While an application is running, frequently used indexes get loaded into RAM for even faster access, but there is a difference between loaded and created.
Also, loading an index is not the same as loading a collection or its records into RAM. If we have the index loaded, we know exactly which documents to pick up from disk, unlike loading every document and checking each one. So indexes avoid collection scans.
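You can see this difference with explain(); a quick sketch (the collection and field names are hypothetical):

    // Without an index, the winning plan stage is COLLSCAN (full scan).
    db.books.find({ topic: "indexes" }).explain("queryPlanner")
    // After creating an index, the same query plans an IXSCAN instead.
    db.books.createIndex({ topic: 1 })
    db.books.find({ topic: "indexes" }).explain("queryPlanner")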
Creating an index is a one-time process, but each write to a document can potentially alter the index, so some part of it may need to be recalculated as records shift with the changed data. That's why indexing makes writes slower and reads faster.
Again, think of a book: if you add a new topic of, say, 2 pages in the middle of the book, all the index entries after that point need to be recalculated accordingly.
While creating an index using createIndex(), is the record always stored in RAM?
No, the records are not stored in RAM. While creating the index, MongoDB processes all the documents in the collection and builds an index structure; understandably this is time-consuming when there are many documents, which is why there is an option to build the index in the background.
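For example (the background option applies to MongoDB versions before 4.2; from 4.2 onward it is ignored and all builds use an optimized process; the collection and field are placeholders):

    // Build the index without blocking other operations on the database.
    db.users.createIndex({ email: 1 }, { background: true })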
Do I need to create the index every time my application restarts?
No. Indexes are created once; you can drop one and create it again, but it is not recreated on an application or DB restart. That would be insane for a huge collection in a sharded environment.
What will happen in the case of the default id (_id)? Is it always stored in RAM?
Again, that's not true. _id comes as an indexed field, so the index already exists for an empty collection, and each write updates it. Since it's a unique index, the processing is faster.
_id is a default index; does that mean all records are always stored in RAM for a particular collection?
All records would only be stored in RAM if you were using MongoDB's in-memory storage engine, which I believe comes with the Enterprise edition. Indexing alone does not automatically load the records into RAM.
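You can check which storage engine a running instance uses from the shell (a sketch):

    // Reports the engine name, e.g. "wiredTiger" or "inMemory".
    db.serverStatus().storageEngine.name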
To answer the question from the title:
MongoDB indexes use a B-tree data structure.
source: https://docs.mongodb.com/manual/indexes/index.html#b-tree

mongoDB: does a huge collection affect the performance of other collections?

In my application I'm about to save some files in the DB.
I've seen the debate over whether to save files on the filesystem or in the db, and chose to save the files in the database.
My database for the project is mongoDB.
I would like to know: if I have, let's say, 20 collections in my mongoDB, and exactly one of them is extremely big,
will I see a performance impact when I work on the other (less large) collections?
If so, should I separate this collection from the other collections (create another DB for this huge collection alone)?
Does MySQL suffer from the same effect?
Thanks.
There are two key considerations here:
Ensure that your working set fits in memory. This means your available memory should exceed at least the total size of the indexes you use for your reads.
MongoDB has a database-level write lock as of v2.2. This means that during any write operation, the entire database is locked for reads. So a large bulk insert into a single collection that takes a while will lock all other collections in that database for the duration of the insert. Therefore, if you move your large collection into a separate database, the key advantage is that inserts to that collection will not block reads on collections in other databases.
I'd suggest firstly ensuring that you have enough memory for your working set, and secondly separating the large collection into its own DB if you intend to write to it a lot.
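To size the working set, you can check index and data sizes from the shell (a sketch; the collection name is a placeholder):

    // Total size of all indexes on one collection, in bytes.
    db.mycollection.totalIndexSize()
    // Per-database totals: stats() reports dataSize and indexSize
    // across all collections in the current database.
    db.stats()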

mongodb got slow when the document count reached around 100,000. Any performance optimizations?

I run a single mongodb instance that receives log inserts from an app server. The current insert rate in production is 10 inserts per second, into a capped collection. I DON'T USE ANY INDEXES. Queries ran faster when there was a small number of records, and only one collection has that amount of data, yet even querying a collection that has very few rows has become very slow. Is there any way to improve the performance?
-Avinash
This is a very difficult question to answer because we don't know much about your configuration or your document structure.
One thing that immediately pops into my head is that you are running out of memory. 10 inserts per second doesn't mean much on its own, because we do not know how big the inserted documents are.
If you are inserting larger documents at 10 per second, you could be eating up memory, causing the operating system to push some of your records to disk.
When you query without using an index, you are forced to scan every document. If your documents have been pushed to disk by the OS, you will begin having page faults. Mongo will need to fetch pages of data off the hard disk, and load them into memory so that they can be scanned. Before doing this, the operating system will need to make room for that data in memory by flushing other parts of memory out to disk.
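You can check whether this is happening by watching the page-fault counter in the server status (a sketch; on Linux the counter lives under extra_info):

    // Cumulative page faults since this mongod process started;
    // a rapidly growing number suggests the working set exceeds RAM.
    db.serverStatus().extra_info.page_faults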
It sounds like you are I/O bound, and the two biggest things you can do to fix this are:
Add more memory to the machine running mongod
Start using indexes so that the database does not need to do full collection scans (see the sketch below)
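A minimal sketch of the second point (the logs collection and ts field are hypothetical; index whatever fields your queries actually filter or sort on):

    // Index the timestamp field most log queries filter and sort on.
    db.logs.createIndex({ ts: -1 })
    // This query can now use the index instead of scanning the collection.
    db.logs.find({ ts: { $gte: ISODate("2020-01-01T00:00:00Z") } }).sort({ ts: -1 })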
Use proper indexes, though that will have some effect on the efficiency of insertion in a capped collection.
It would be better if you could share the collection structure and the query you are using.