MongoDB: Queries running twice as slow on NEW server compared to OLD server

I migrated the currently running DB from the OLD server to a new standalone MongoDB server. To do this, I performed the following (sketched below):
Took dump of data from OLD server
Restored data from the generated dump into NEW server
Configured the server for authentication
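Roughly, the dump/restore looked like this (the hostnames and the dump path are placeholders):
# on the OLD server: dump all databases to a directory
mongodump --host old-server.example.com --port 27017 --out /backup/dump
# on the NEW server: restore from that dump directory
mongorestore --host new-server.example.com --port 27017 /backup/dump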
Issue:
I noticed that after performing the above, a few queries on the NEW server were running slowly, taking almost twice as long as they did on the OLD server.
Configurations:
The configurations of both servers are the same, except that the NEW server has 32 GB of RAM while the OLD server had 28 GB. The OLD server also had other applications and services running on it, whereas the NEW server is dedicated solely to this DB.
CPU consumption is similar; however, RAM is heavily used on the OLD server and comparatively less used on the NEW server.
Therefore, the NEW server is better equipped in terms of hardware and available RAM, and it is a standalone server dedicated to only this DB.
Question:
Why could my NEW server, even though it is standalone, be slower than the OLD one? How can I correct this?

MongoDB keeps the most recently used data in RAM. If you have created indexes for your queries and your working data set fits in RAM, MongoDB serves all queries from memory.
So your OLD DB already has its working data cached in memory, and whenever you run a query, the results are likely served from RAM.
Also, by default WiredTiger reserves roughly 50% of (RAM - 1 GB) for its internal cache.
Hence the heavy memory consumption and the better performance of your OLD DB.
Since you just restored the dump onto your NEW DB, it doesn't have any recently used data cached in memory.
So whenever you run a query, it hits the disk and performs I/O operations, which hurts query performance.
If you run a set of queries a couple of times, MongoDB will cache the data they touch (provided it fits in memory), and those queries will then perform better on the NEW DB.
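As a quick way to watch this warm-up happen, you can re-run the slow query from the mongo shell and check how the WiredTiger cache fills up (the database name, collection name, and filter below are placeholders for your own):
# run the slow query a couple of times so the documents it touches get pulled into the cache
mongo mydb --eval 'printjson(db.mycoll.find({ status: "active" }).explain("executionStats"))'
# then check how much data WiredTiger currently holds in its internal cache
mongo mydb --eval 'printjson(db.serverStatus().wiredTiger.cache["bytes currently in the cache"])'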
Here are some docs that may help you get more information
https://docs.mongodb.com/manual/core/wiredtiger/#wiredtiger-ram
https://docs.mongodb.com/manual/faq/fundamentals/#does-mongodb-handle-caching

Related

Increase data insert speed of PostgreSQL

I am stuck on a problem: PostgreSQL data writes are very slow.
I developed my application in Java (using JDBC) to insert data into a PostgreSQL DB. It works well on our remote development server. However, after I deploy it to the production server, it causes a problem.
The insert speed of PostgreSQL on the production server is only ~150 records/s for 200000K records, while it is ~1000 records/s for the same data set on the development server.
Firstly, I tried to change the configuration in postgresql.conf as follows:
effective_cache_size = 4GB
max_wal_size = 2GB
work_mem = 128MB
shared_buffers = 512MB
After I changed the configuration and restarted, only the query speed was affected; the insert speed did not change (~150 records/s).
I have checked my server memory info: there is a lot of free memory (~4 GB). The inserter only uses 0.5% of 8 GB (~40 MB).
So my questions are:
Is this a problem of a storage disk, such as SSD and HDD or virtual
and physical etc.? Why is the insert speed still very slow, although I have changed the configuration? Is there any way
for increasing the insert speed?
Note: the problem is not related to the structure of the insert query.
I have used the same query under the same conditions elsewhere (I set up the environment on both servers in the same way). I do not know why the DEVELOPMENT server (4 GB) works better than the PRODUCTION server (8 GB).
The only one of your parameters that has an influence on INSERT performance is max_wal_size. High values prevent frequent checkpoints.
Use iostat -x 1 on the database server to see how busy your disks are. If they are quite busy, you are probably I/O bottlenecked. Maybe the I/O subsystem on your test server is better?
If you are running the INSERTs in many small transactions, you may be bottlenecked by fsync to the WAL. The symptom is a busy disk with not much I/O being performed.
In that case batch the INSERTs in larger transactions. The difference you observe could then be due to different configuration: Maybe you set synchronous_commit or (horribile dictu!) fsync to off on the test server.
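To illustrate the batching, here is a minimal sketch using psql; the database, table, and columns are made up and would need to match your schema. Wrapping many rows in one transaction means a single WAL flush at COMMIT instead of one fsync per row (a multi-row INSERT or COPY would be faster still):
psql -d mydb <<'SQL'
BEGIN;
-- many rows inside a single transaction: one WAL flush at COMMIT
INSERT INTO measurements (sensor_id, reading, taken_at) VALUES (1, 10.5, now());
INSERT INTO measurements (sensor_id, reading, taken_at) VALUES (2, 11.2, now());
-- ... thousands more rows ...
COMMIT;
SQL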

Postgres load balance with limited hardware resources

I've got a task to do and some limited hardware resources, as always.
I need to set up a Postgres server with a single database, containing a table of large objects (3 TB+) and a few small, heavily accessed tables (<10 GB).
I've got an old physical server with ~5 TB of hard disk space but limited CPU and RAM; I can also use a much faster (in CPU and RAM) virtual server, but one that is limited in storage.
I won't have many DELETE statements, and most SELECT statements will target recent data. There will be a single connection doing all the work, with the client on one host only.
I see a few scenarios:
Postgres on virtual machine with remote storage (single instance)
Postgres on old hardware with local storage (single instance)
Postgres on both, with some kind of replication (high speed virtual machine for new data, low speed for older data on the old hardware)
Any other ideas?
Is it even possible to replicate just the most recent part of the postgres database?
90% of SELECT queries will hit the most recent ~5-10 gigabytes of data, but I need seamless access to the remaining ~2.99 TB.
What should I do? (except buying appropriate hardware;)
It doesn't really matter as long as you have enough RAM to buffer the 10GB of heavily accessed data.
You'll need some additional RAM to read large objects without pushing the 10GB out of the cache, but that shouldn't be a problem on today's machines.
If all your work is done over a single connection, it sounds like there will be no high load on the database.
So I wouldn't really worry about scaling with requirements like that.
Your biggest worry should probably be how to backup 3TB of data in a reasonable time.
Edit: If you have much less memory, you should take the machine with the faster storage.
Finally I've checked several different scenarios and decided not to keep files/largeobjects in database.
Postgres with its data directory mounted over NFS (v4) had some lag: it was faster overall, but it would choke for a few seconds periodically. I decided to store plain files over NFS instead, which is significantly slower but more stable.
I'm sure there was a way to tune it, but this solution is fine too.
Postgres is used as the file index and keeps its own files on a local hard disk.

mongodb sudden spike in memory usage

There is a spike in the memory utilization of MongoDB on our CentOS 7 server. The server has 64 GB of RAM. This is a standalone MongoDB instance with no application running on it, and housekeeping scripts are enabled to keep only the relevant data. We haven't indexed the data. The total size of the data on disk is 81 GB. This issue was not seen before we tried enabling replication; after that the node kept using high memory, so replication was disabled and we brought up a fresh standalone Mongo instance. The memory usage hasn't come down since, and restarting the Mongo server hasn't helped. Is there any reason for MongoDB to use so much memory? Below is a link to a snapshot of the memory usage taken from the site server.
The mongo version is 2.6.5
Image link
This is not surprising. See the Memory Use section in the docs for the MMAPv1 storage engine (which is what MongoDB 2.6 uses):
With MMAPv1, MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server’s RAM, MongoDB will yield cached memory to the other process.
It is also not surprising that the usage spiked after enabling replication, as it sounds like you had a fully populated database and then added a replica member. This would mean that the replica member would need to perform an initial sync of the data from that node, which would require a read of every document which would "prime" MongoDB's cache as a result.
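If you want to see what that dynamic usage looks like on your instance, serverStatus exposes the resident/virtual/mapped figures used by MMAPv1, along with the page-fault counter (plain mongo shell, run against your standalone node):
# resident, virtual and mapped memory as seen by mongod (values in MB)
mongo --eval 'printjson(db.serverStatus().mem)'
# page faults show how often data has to be read from disk instead of served from RAM
mongo --eval 'printjson(db.serverStatus().extra_info)'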

Mongodb freezing

I'm using MongoDB on a cloud server with 10 GB of storage and 1 GB of RAM. After importing about 4.4 GB of data into a MongoDB database, whenever I type "mongo" on the command line to test some queries, the server freezes.
Is there a cap on the memory resource allocation to MongoDB that I can remove? Or is it simply a matter of increasing RAM?
MongoDB uses memory mapped files, which are allocated by the OS. This means that there is no specific resource that you can free up to make more room for a Mongo console to run.
There are a couple of things to note about your environment. Firstly, the amount of RAM you have for the amount of data you have loaded is on the small side. MongoDB is going to try and keep as much of the working set in memory as it can, to avoid page faults as the disc seeks are a real killer for performance. Secondly, there will be some initial work going on when the data is loaded which could affect performance.
You can check out the Wiki page Checking Server Memory Usage for information on how much memory Mongo is using up, and general information on the Memory Usage of Mongo.
Can you try connecting to the mongod from another machine, so as to remove this burden from the DB server?
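For example, from a second machine that has the mongo shell installed (the hostname is a placeholder, and the server must accept remote connections on port 27017):
# connect to the remote mongod instead of starting another shell on the overloaded DB server
mongo --host db-server.example.com --port 27017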

PostgreSQL In Memory Database

I want to run my PostgreSQL database server from memory. The reason is that on my new server, I have 24 GB of memory, and hardly any of it is used.
I know I can run this command to make a ramdisk:
mdmfs -s 1024m md2 /mnt
And I could theoretically have PostgreSQL store its data there. But the problem with this is that if the server crashes or reboots, the data will be gone.
Basically, I want the database to be loaded in memory at all times so that it does not have to go to the hard disk drive to read every record, since I have TONS of memory and since memory is faster than hard disk drives.
Is there a way to do this while also having PostgreSQL write to disk so I don't lose any data in case the server goes down? Or is there a way to cache all data in memory?
I'm now using streaming replication, which is asynchronous. This means my MASTER could run entirely in memory, with a separate SLAVE instance using traditional disk.
A machine restart would involve stopping the SLAVE, copying the PostgreSQL data back onto the ramdisk, and then restarting the MASTER followed by the SLAVE. This is an interesting possibility that compares well with something like Redis, but with the advantages of redundancy / hot standby / backup / SQL / a rich toolset, etc.
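A rough sketch of that restart sequence, assuming made-up paths (/mnt/ramdisk/pgdata for the MASTER, /var/lib/postgresql/data for the SLAVE) and pg_ctl-based management; in a real setup the copy taken from the standby would also need its recovery/standby configuration removed before it can start as a primary:
# stop the SLAVE so its on-disk data directory is a consistent copy
pg_ctl -D /var/lib/postgresql/data stop -m fast
# recreate the ramdisk and copy the data back onto it
mdmfs -s 16g md2 /mnt/ramdisk
rsync -a /var/lib/postgresql/data/ /mnt/ramdisk/pgdata/
# start the MASTER from the ramdisk, then bring the SLAVE back up
pg_ctl -D /mnt/ramdisk/pgdata start
pg_ctl -D /var/lib/postgresql/data start
In practice the SLAVE would usually have to be re-synced from the new MASTER afterwards (e.g. with pg_basebackup).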
Have you seen the Server Configuration chapter of the manual? Check it out, then google "postgresql memory tuning".
I have to believe that Postgres is written in such a way as to take full advantage of available RAM in the server. As you may have guessed by now, there's no reliable way to do this outside of Postgres.
Within Postgres, transactions assure that all operations are atomic, so if the power goes down while you are writing to a Postgres database, you will only lose that particular operation, and not the entire database.
The answer is caching. Look into adding memory to the server, then tune PostgreSQL to maximize memory usage. The file system cache will also help with this, doing some of the work automatically. You will get performance almost as if the data were in memory (except for the first hit), without having to manage it yourself, and you can still have a database larger than physical memory.
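As a starting point for that tuning, here is a hedged postgresql.conf sketch for a dedicated machine with 24 GB of RAM (the values are illustrative, not a recommendation for your specific workload):
shared_buffers = 6GB              # PostgreSQL's own cache, commonly ~25% of RAM
effective_cache_size = 18GB       # how much OS file system cache the planner may assume
work_mem = 64MB                   # per sort/hash operation, so multiply by concurrency
maintenance_work_mem = 1GB        # for VACUUM, CREATE INDEX and similar maintenance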