What is a persistent disk? - mongodb

I have been doing an online course in MongoDB, however I am frequently coming across a term called "Persistent disk". I have googled it out, but did not get a satisfying answer. Could you please help me out ?

When your application runs (or Mongo in this case), it does all its work in RAM. RAM is not persistent because when you turn off your computer you permanently lose all the information stored in RAM.
Hard drives and SSDs on the otherhand are persistent. If you write something to them, it sticks for good.
So when you tell Mongo to insert a document, it first gets written into RAM, and then eventually gets written to the (persistent) disk. From an AppDev perspective is this all hidden from you and you (generally) just assume its written straight to desk
This is a good video from what I guess to be the Mongo course you're taking that talks about the transition from memory to disk.

Related

Possible big mistake. What exactly does "db.repairDatabase()" do? MONGODB

I have a mongodb database with several million users.
I wanted to free space and I created a bot to remove inactive users of more than 6 months.
I have been looking at the disk for several minutes
and I have seen that it varied but it will not release large space, not even 1 mb. That's weird.
I've read that "remove" does not actually delete the disc if it does not simply mark that it can be deleted or overwritten. It is true?
That seemed to make a lot of sense to me. So, I've looked for something that forces space to really free up...
I've applied repairDatabase() and I think I've done wrong.
Everything has been blocked!
I have tried the luck and I have restarted the server.
There is a MongoDB service working but its status is maintained in "Starting" (not Running).
I'm reading from other sites that repairDatabase() requires twice as much space as the original size of the database, it does not have it.
I do not know, what is doing, and this could in several hours, days ...
Is the database lost? I think I will stop all services and delete the database.
repairDatabase is similar to fsck. That is, it attempts to clean up the database of any corrupt documents which may be preventing MongoDB to start up. How it works in detail is different depending on your storage engine, but repairDatabase could potentially remove documents from the database.
The details of what the command does is outlined quite clearly (with all the warnings) in the MongoDB documentation page: https://docs.mongodb.com/manual/reference/command/repairDatabase/
I would suggest that next time it's better to read the official documentation first rather than reading what people said in forums. Second-hand information like these could be outdated, or just plain wrong.
Having said that, you should leave the process running until completion, and perform any troubleshooting if the database cannot be started. It may require 2x the disk space of your data, but it's also possible that the command just needs time to finish.

mongodb performance for large document

I have a document that holds a big data structure in certain fields inside an array, it is slowing down my application due to frequent hits to read such data. am thinking on few solutions to implement but I need advice before i proceed and possibly even a better solution, here are my thoughts/questions:
would it help to cache data?
should I use memcached or redis as a caching engine and why?
would it help to read single fields from this document instead of reading it all every time?
should I do something else?!
Caching will help because it would avoid your db to be hit too often
Memcache or redis it's up to you. I prefere redis but if you already have a memcache it's fine.
If you have a cluster of servers, think if you need a centralized cache or not
Caching a full document won't help for getting a single field because you cache the result of a query without knowing what it contains.
your question need more clarification. for example how big is the data that you are speaking of is it couple of megabytes or gigabytes. All these factors change the solution. But if we consider that you have couple of megabytes and you want to prevent to call database every time the best solution is cache. How to choose a cache is also completely depends on what is your situation. If your web application runs on one server you can use the in-memory cache like ASP.Net cache which is very quick and fast for in-memory cache. this cache is stored in your heap so you can put all your object in the cache without serialization.But consider that whenever your application is restarted like most of deployments. your heap will be deleted and all the cache is cleared inside the heap.
if you have more than one server then you can start to think about an out-of-memory cache because two servers are not sharing heap memory and using all in-memory cache are useless because it duplicate the data and invalidating is nightmare. However, this is more reliable cache while it is not in the heap and in term of persistence is more than in-memory cache. But whatever you want to put in this kind of cache should be serializable while you are transferring the object over network connection. So you cannot put all your object in cache. Both Redis and memcached can be used for this purpose. Redis is more complicated with more functionality than Memcached but for your purpose memcached is quite good.
Whatever caching system you choose, approach it in a wide perspective. Design a caching system in your application while over time you need to put more things in cache. so its better to prepare everything for that time from now.
another things which is very important in cache is that whenever you set something in cache you have to consider when you are going to invalidate it.
Whether or not caching will help depends on the accession of the document. If the document is being accessed multiple times then caching will not help due to how MongoDB to memory caching actually works.
First, you need to understand your data accession patterns.

is memcache relative to database

I have been browsing through a lot of websites. I need experts advice on this one.
can anyone please explain me what exactly is memcache ?
From what I understand that it is a distributed memory caching system used for dynamic web apps but my main question is do we need a database when we say 'memcache' or the term 'memcache' doesnt need a database ?
please answer. Thank you
No, you don't require a traditional database when you say memcache, it's an in memory hash table(dictionary) with key,value storage and so it resides in RAM as a lookup table.For this fact, it is not persistent, so whenever you restart your server, memcache gets reset.
memcached is a specific program that runs a server that other programs can use to keep things in memory. It's something like an in-memory database, depending on your definition of database.
Caching something in memory can also be done generically, without memcached (pronounced memcache dee).

In Mongodb how does the cache get warmed up?

I'm wondering how the cache (memory) gets warmed up. I understand that MongoDB uses memory mapped files and the OS's virtual memory to swap pages in and out as needed. What I don't understand is how it gets warmed up on startup.
Upon startup does mongod map all of the pages in the database to virtual memory or is there some other mechanism to load pages that are not yet mapped which get mapped as queries are run against the database?
Similarly, is the size of the database limited to the amount of virtual memory available to the system. I understand that on a 64-bit system this is a lot. Is there another mechanism other than memory mapping for pages to moved to and from disk?
Memory mapping means that there is a representation of all the on disk files available but only a portion of these files may be present in RAM. When a given page is needed (and it is not in RAM) it can be read from disk into RAM to be accessed.
Regarding limitations, you can see them on the MongoDB limits page
MongoDB does not do any specific "warming" of pages on startup, as it does not have any concept of which pages would be useful and which not.
If you wish to "Warm" certain collections manually before using them you should look at the touch command.

Is running postgresql in memory a good idea?

Recently we are working on migrate our software from general PC server to a kind of embedded system which use Disk on module (DOM) instead of hard disk drive.
My colleague insist that as DOM could only support about 1 million times of write operation, we should running our database entirely in a RAM disk and backup the database to DOM.
There 3 ways to trigger the backup :
User trigger
Every 30 minutes
Every time when there is some add/update/delete operation in database
As we expecte that user will only modify the database when system is installed, I think maybe postgresql would not write that often.
But I don't know much about postgresql, I can not judge if it worth all this trouble and which approach is better.
What do you think about it?
The problem of wearing out SSDs can be alleviated by whatever firmware the SSD has. Sometimes those chipsets don't do it well, or leave the responsibility to someone else. In this case, you can use a filesystem designed to do wear levelling by itself. UBIFS or LogFS are suitable filesystems.
Assuming that the claim about the DOM write cycles is true, which I can't comment on, then this won't work very well. PostgreSQL assumes that it can write whatever it wants whenever it wants (even if no logical updates are happening), and you have no real chance of making it go along with the 3 triggers that you mention.
What you could do is have the entire thing run on a RAM disk and have some operating system process back this up atomically to permanent storage. This needs careful file system and kernel support. This could work if your device is on most of the time, but probably not so well if it's the sort of thing that you switch on and off like a TV, because the recovery times could be annoying.
Alternatives are using either a more embedded-like RDBMS such as SQLite, or using a storage system that can handle PostgreSQL, like the recent solid state drives, although some SSDs have bogus cache settings that might make them unsuitable for PostgreSQL.