MongoDB can't access documents above a specific skip - mongodb

I have a MongoDB instance in the cloud on an AWS EC2 t2.micro (30GB storage, 1GB RAM) running in Docker, and in that database I have a single collection which stores 411 thousand documents; this takes ~700MB of disk space.
On my local computer, if I run this in mongo shell:
db.my_collection.find().skip(200000).limit(1)
then I get the correct results, but if I run this
db.my_collection.find().skip(220000).limit(1)
then MongoDB shuts down. Why? What should I do to access this data?

It appears that your system doesn't have enough RAM to satisfy MongoDB's demand. When a Linux system is critically low on memory, the kernel starts killing processes to avoid crashing the system itself.
I believe this is what's happening in your case too; MongoDB isn't even getting a chance to write a log entry. I'd recommend increasing RAM or, if that's not feasible, adding more swap space. This will prevent the system crash, though MongoDB will keep working very slowly.
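For example, adding a 2GB swap file on a typical Linux host looks something like this (the size is a placeholder; adjust for your instance):
# Create and enable a 2GB swap file
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Make it permanent across reboots
echo "/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab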
Please visit these excellent resources on Linux and its behavior:
https://unix.stackexchange.com/questions/136291/will-linux-start-killing-my-processes-without-asking-me-if-memory-gets-short
https://serverfault.com/questions/480266/how-to-know-if-the-server-runs-out-of-ram-before-crashing-down

Related

How to limit mongodb's RAM usage?

I am running tests on the server, and as a result MongoDB starts using a lot of RAM. Mongo version 3.6.8, ARM architecture on the server, no swap file. I've tried all the solutions from Is there any option to limit mongodb memory usage?. In particular, when using cgroups, mongo exceeds the limit and dies.
I will also say that the memory MongoDB uses for its cache is not released when it's needed elsewhere.
I will be glad of any help.

AWS RDS with Postgres: Is the OOM killer configured?

We are running load test against an application that hits a Postgres database.
During the test, we suddenly get an increase in error rate.
After analysing the platform and application behaviour, we notice that:
CPU of Postgres RDS is 100%
Freeable memory drops on this same server
And in the postgres logs, we see:
2018-08-21 08:19:48 UTC::#:[XXXXX]:LOG: server process (PID XXXX) was terminated by signal 9: Killed
After investigating and reading documentation, it appears one possibility is that the Linux OOM killer ran and killed the process.
But since we're on RDS, we cannot access the system logs (/var/log/messages) to confirm.
So can somebody:
confirm that the OOM killer really runs on AWS RDS for Postgres?
give us a way to check this?
give us a way to compute the max memory used by Postgres based on the number of connections?
I didn't find the answer here:
http://postgresql.freeideas.cz/server-process-was-terminated-by-signal-9-killed/
https://www.postgresql.org/message-id/CAOR%3Dd%3D25iOzXpZFY%3DSjL%3DWD0noBL2Fio9LwpvO2%3DSTnjTW%3DMqQ%40mail.gmail.com
https://www.postgresql.org/message-id/04e301d1fee9%24537ab200%24fa701600%24%40JetBrains.com
AWS maintains a page with best practices for their RDS service: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html
In terms of memory allocation, this is their recommendation:
An Amazon RDS performance best practice is to allocate enough RAM so
that your working set resides almost completely in memory. To tell if
your working set is almost all in memory, check the ReadIOPS metric
(using Amazon CloudWatch) while the DB instance is under load. The
value of ReadIOPS should be small and stable. If scaling up the DB
instance class—to a class with more RAM—results in a dramatic drop in
ReadIOPS, your working set was not almost completely in memory.
Continue to scale up until ReadIOPS no longer drops dramatically after
a scaling operation, or ReadIOPS is reduced to a very small amount.
For information on monitoring a DB instance's metrics, see Viewing DB Instance Metrics.
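For a quick way to sample ReadIOPS from the command line, something like this should work (a sketch assuming the AWS CLI is configured; the instance identifier mydb-instance and the time window are placeholders):
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name ReadIOPS \
  --dimensions Name=DBInstanceIdentifier,Value=mydb-instance \
  --start-time 2018-08-21T08:00:00Z \
  --end-time 2018-08-21T09:00:00Z \
  --period 60 \
  --statistics Average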
Also, this is their recommendation for troubleshooting possible OS issues:
Amazon RDS provides metrics in real time for the operating system (OS)
that your DB instance runs on. You can view the metrics for your DB
instance using the console, or consume the Enhanced Monitoring JSON
output from Amazon CloudWatch Logs in a monitoring system of your
choice. For more information about Enhanced Monitoring, see Enhanced
Monitoring
There's a lot of good recommendations there, including query tuning.
Note that, as a last resort, you could switch to Aurora, which is compatible with PostgreSQL:
Aurora features a distributed, fault-tolerant, self-healing storage
system that auto-scales up to 64TB per database instance. Aurora
delivers high performance and availability with up to 15 low-latency
read replicas, point-in-time recovery, continuous backup to Amazon S3,
and replication across three Availability Zones.
EDIT: talking specifically about your issue w/ PostgreSQL, check this Stack Exchange thread -- they had a long connection with auto commit set to false.
We had a long connection with auto commit set to false:
connection.setAutoCommit(false)
During that time we were doing a lot
of small queries and a few queries with a cursor:
statement.setFetchSize(SOME_FETCH_SIZE)
In JDBC you create a connection object, and from that connection you
create statements. When you execute the statements you get a result
set.
Now, every one of these objects needs to be closed, but if you close the
statement, the result set is closed, and if you close the connection
all the statements are closed along with their result sets.
We were used to short-lived queries with connections of their own, so
we never closed statements, assuming the connection would handle
things once it was closed.
The problem now was with this long transaction (~24 hours) which never
closed the connection. The statements were never closed. Apparently,
the statement object holds resources both on the server that runs the
code and on the PostgreSQL database.
My best guess as to what resources are left in the DB is those
related to the cursor. The statements that used the cursor were never
closed, so the result sets they returned were never closed either. This
meant the database didn't free the relevant cursor resources in the
DB, and since the cursor was over a huge table it took a lot of RAM.
Hope it helps!
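For completeness, a minimal sketch of the fix in JDBC (the URL, credentials, and table name are placeholders): wrapping statements and result sets in try-with-resources releases the server-side cursor even inside a long transaction:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorCleanupSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL; replace with your own.
        try (Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            connection.setAutoCommit(false);
            // try-with-resources closes the statement (and its result set),
            // freeing the DB-side cursor resources promptly.
            try (PreparedStatement stmt =
                     connection.prepareStatement("SELECT * FROM big_table")) {
                stmt.setFetchSize(1000); // stream rows instead of buffering them all
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        // process the row...
                    }
                }
            }
            connection.commit();
        }
    }
}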
TLDR: If you need PostgreSQL on AWS and you need rock-solid stability, run PostgreSQL on EC2 (for now) and do some kernel tuning for overcommit.
I'll try to be concise, but you're not the only one who has seen this and it is a known (internal to Amazon) issue with RDS and Aurora PostgreSQL.
OOM Killer on RDS/Aurora
The OOM killer does run on RDS and Aurora instances because they are backed by Linux VMs and the OOM killer is an integral part of the kernel.
Root Cause
The root cause is that the default Linux kernel configuration assumes that you have swap space (a swap file or partition), but EC2 instances (and the VMs that back RDS and Aurora) do not have swap configured by default. There is a single partition and no swap file is defined. When Linux thinks it has swap to fall back on, it uses a strategy called "overcommitting", which means that it allows processes to request, and be granted, more memory than the amount of RAM the system actually has. Two tunable parameters govern this behavior:
vm.overcommit_memory - governs whether the kernel allows overcommitting (0=yes=default)
vm.overcommit_ratio - what percentage of physical RAM (in addition to swap) counts toward the commit limit. If you have 8GB of RAM and 8GB of swap, and vm.overcommit_ratio = 75, the kernel will grant up to 8GB + 75% x 8GB = 14GB of memory to processes.
We set up an EC2 instance (where we could tune these parameters) and the following settings completely stopped PostgreSQL backends from getting killed:
vm.overcommit_memory = 2
vm.overcommit_ratio = 75
vm.overcommit_memory = 2 tells Linux not to overcommit (work within the constraints of system memory) and vm.overcommit_ratio = 75 tells Linux not to grant requests for more than 75% of memory (only allow user processes to get up to 75% of memory).
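Applied on an EC2 instance, that tuning looks something like this (a sketch; the sysctl.d file name is arbitrary):
# Apply immediately
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=75
# Persist across reboots
printf "vm.overcommit_memory = 2\nvm.overcommit_ratio = 75\n" | sudo tee /etc/sysctl.d/99-overcommit.conf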
We have an open case with AWS and they have committed to coming up with a long-term fix (using kernel tuning params or cgroups, etc) but we don't have an ETA yet. If you are having this problem, I encourage you to open a case with AWS and reference case #5881116231 so they are aware that you are impacted by this issue, too.
In short, if you need stability in the near term, use PostgreSQL on EC2. If you must use RDS or Aurora PostgreSQL, you will need to oversize your instance (at additional cost to you) and hope for the best as oversizing doesn't guarantee you won't still have the problem.

Memory issues with new version of Mongodb

I've been using MongoDB on my Windows Server 2012 machine for more than two years. Since the last update, some weird issues started to happen which in the end led to the entire RAM being used.
The service I've configured for MongoDB is as follows:
logpath=d:\data\log\mongod.log
dbpath=d:\data\db
storageEngine=wiredTiger
rest=true
#override port
port=27017
#configsvr = true
shardsvr = true
And in order to limit the cache memory usage I've added the following line:
wiredTigerCacheSizeGB=10
And this is where the weird stuff started happening. The task manager says that MongoDB is now really limited to the 10GB I defined in the service, but it is actually using a lot more than 10GB.
The first screenshot shows the memory consumption sorted by RAM usage, while in fact the machine I'm using has 28GB in total.
This crazy consumption leads to failures in the scripts I'm running, even the most basic ones, even when I only run simple queries like 'count' or 'distinct'; I believe this is a direct result of the memory consumption.
When I checked the log files I saw many open connections; even when a session ends, the log still shows the same number of connections open.
So in the end I have two major questions:
1. Is there a way of solving this issue without downgrading the MongoDB version?
2. Does the config file look right? Is everything in it necessary?
Memory usage in WiredTiger is governed by a two-level cache:
First is the WiredTiger cache as controlled by --wiredTigerCacheSizeGB
Second is the Operating System filesystem cache. MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes
See also WiredTiger memory usage
For the OS filesystem cache, MongoDB doesn't manage the memory it uses directly; it lets the OS manage it. Windows will try to use every last scrap of physical memory if it can, but much of it can and will be evicted if other processes request memory.
An alternative is to run mongod in a container (e.g. lxc, cgroups, Docker, etc.) that does not have access to all of the RAM available in a system.
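To check which level is actually holding memory, the serverStatus command reports WiredTiger cache usage (a quick sketch in the mongo shell; the field names come from the WiredTiger cache section of serverStatus):
// Compare WiredTiger cache usage against its configured cap
var cache = db.serverStatus().wiredTiger.cache;
print("maximum bytes configured:     " + cache["maximum bytes configured"]);
print("bytes currently in the cache: " + cache["bytes currently in the cache"]);
If the WiredTiger numbers stay near 10GB while total usage keeps climbing, the growth is in the filesystem cache or other processes, not the mongod cache.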
Having said the above:
You are also running another database on the server, i.e. mysqld. MongoDB, like most databases, will perform better on a dedicated server, which reduces memory contention.
Task Manager shows mongod using 10GB, although the machine is using up to ~28GB. That extra usage may or may not be due to mongod, as you have other processes running as well.
Useful resources:
FAQ: Memory diagnostics for WiredTiger
FAQ: MongoDB Cache Handling
MongoDB Production Notes

Running MongoDB and Redis on two different containers in the same host machine

I have read somewhere that a MongoDB and a Redis server shouldn't run on the same host, because the way Redis manages memory hurts MongoDB. That was before Docker.io. But now things seem pretty different, or are they? Is it sensible to run a Redis server and MongoDB in two different containers on the same host machine?
Docker does not change your hardware; it is still the (non-virtualized) OS that manages resources, so the same rules as on bare hardware apply here.
RAM
MongoDB and Redis don't share any memory, but the problem with running them on the same host is that the two processes together can exhaust your RAM. You can set a maximum memory size for Redis, and you can do the same for MongoDB's cache; doing both is essentially mandatory here (a sketch follows at the end of this section).
If your sizing is good (MongoDB RAM + Redis RAM < hardware RAM), you won't get any swapping to disk for Redis (which is absolutely what you want to prevent), but the MongoDB cache may not perform as well (not enough room for optimization). Less memory for Redis is always a challenge if your data grows: beware of running out of memory if the data size is unpredictable!
If you use backups with Redis, it uses more RAM than its dataset size to produce the dump, so beware of that. It also implies extra IO.
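As promised above, a minimal sketch of capping both processes (the sizes are placeholders; pick values that sum to less than your hardware RAM):
# redis.conf: cap Redis at 2GB and evict least-recently-used keys at the limit
maxmemory 2gb
maxmemory-policy allkeys-lru
# mongod.conf (YAML format): cap the WiredTiger cache at 4GB
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4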
IO
In this case (less RAM), Mongo will do a lot more IO to access data. Redis, depending on your backup policy, may or may not use IO (your choice). Worst case: if you use AOF on Redis, that is a lot of IO, so IO can become a bottleneck in this architecture. If you don't use backups with Redis, you won't have problems. Also, an SSD is a good choice for Mongo.
CPU
I don't know if MongoDB uses a lot of CPU, but Redis most of the time does not, except during backups. If you use backups with Redis, try to have two CPU cores available for it (one for Redis, one for the backup task).
Network
It depends on your number of clients. You should check the throughput / input load of your machine to see whether you are saturating it (using monit, for instance, with alerts). Sometimes the bottleneck is simply not enough network throughput on one machine!
Many of today's services, in particular databases, are very aggressive in consuming resources and are designed assuming they will (or should) be executed on a machine dedicated to them. MongoDB and Redis try to keep a lot of data in memory and will take as much memory as they can for themselves. To prevent these services from taking all the memory of your host machine, you can limit the maximum memory used by a container using -m="<number><optional unit>" in docker run. E.g.: docker run -d -m="2g" -p 27017:27017 --name mongodb dockerfile/mongodb
So you can control the resource limits of your services in an easy way, and run them on the same host with fine-grained control of resources. Anyway, it's important to consider that these services were designed on the assumption that the resources of the host machine would be fully available to them. For example, there are other databases, such as Cassandra, that consume a lot of memory and, furthermore, are designed around sequential writes to disk. In these cases Docker will let you run them with limited resources, but if you run multiple such services on the same host their performance will decrease severely.
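Putting that together, a sketch of running both services with explicit limits (the sizes are placeholders, and the redis image name is illustrative, following the dockerfile/mongodb image used above):
docker run -d -m="4g" -p 27017:27017 --name mongodb dockerfile/mongodb
docker run -d -m="2g" -p 6379:6379 --name redis dockerfile/redis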

How to scale MongoDB?

I know that MongoDB can scale vertically. But what if I am running out of disk?
I am currently using EC2 with EBS. As you know, I have to assign EBS for a fixed size.
What if MongoDB grows bigger than the EBS size? Do I have to create a larger EBS volume and copy the files over?
Or should we start more MongoDB instances, each connected to a different EBS disk? In that case, I could connect to a different instance for each database.
If you're running out of disk, you obviously need to get a bigger disk.
There are several ways to migrate your data; it really depends on the kind of uptime you need. The first steps of course involve bundling the machine and creating the new volume.
These tips go from easiest to hardest.
Can you take the database completely off-line for several minutes?
If so, do this (migration by copy):
Mount new EBS on the server.
Stop your app from connecting to Mongo.
Shut down mongod and wait for everything to write (check the logs)
Copy all of the data files (and probably the logs) to the new EBS volume.
While the copy is happening, update your mongod start script (or config file) to point to the new volume.
Start mongod and check connection
Restart your app.
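Sketched as shell commands (the device name, mount points, and paths are placeholders for your setup):
# Prepare and mount the new EBS volume (assumes it is attached as /dev/xvdf)
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /data/db-new
sudo mount /dev/xvdf /data/db-new
# Stop mongod cleanly so all writes are flushed
sudo mongod --dbpath /data/db --shutdown
# Copy the data files (and logs) to the new volume
sudo cp -a /data/db/. /data/db-new/
# Restart mongod pointing at the new volume
sudo mongod --dbpath /data/db-new --fork --logpath /var/log/mongod.log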
Can you take the database off-line for just a few minutes?
If so, do this (slaving and switch):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a --slave pointing at the current database. (you may need to re-start the current as --master)
The slave will do a fresh synchronization. Once the slave is up-to-date, you'll do a "switch" (next steps).
Turn off writes from the system.
Shut down the original mongod process.
Re-start the "new" mongod as a master instead of the slave.
Re-activate system writes pointing at the new master.
Done correctly those last three steps can happen in minutes or even seconds.
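Using the legacy master/slave flags this answer refers to (host names and paths are placeholders; these options predate replica sets):
# On the current server: restart mongod as a master
mongod --dbpath /data/db --master
# On the new server: start mongod as a slave, syncing from the current master
mongod --dbpath /data/db-new --slave --source current-server:27017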
Can you not afford any down-time?
If so, do this (master-master):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a master and a slave against the current database. (may need to re-start current as master, minimal down-time?)
The new computer should do a fresh synchronization.
Once the new computer is up-to-date, switch the system to point at the new server.
I know it seems like this last version is actually the best, but it can be a little dicey (as of this writing). The reason is simply that I've honestly had a lot of issues with "Master-Master" replication, especially if you don't start with both active.
If you plan on using this method, I highly suggest a smaller practice run first. If something bombs here, Mongo might simply wipe all of your data files which will have the effect of taking more stuff down.
If you get a good version of this please post the commands, I'd like to see it in action.
Doesn't the E in EBS stand for Elastic, meaning something like resizing on the fly?
Currently the MongoDB team is working on finishing sharding, which will give you horizontal scaling by partitioning data across different servers. Give it a month or two and it will work fine. The developers are quite good at keeping their promises.
http://api.mongodb.org/wiki/current/Sharding%20Introduction.html
http://api.mongodb.org/wiki/current/Sharding%20Limits.html
You could slave the bigger disk off the smaller until it's caught up
or
fsync+lock and take a file system snapshot and copy it onto the bigger disk.
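The fsync+lock step looks like this in the mongo shell (flush and block writes, take the filesystem or EBS snapshot, then unlock):
// Flush pending writes to disk and block new writes
db.fsyncLock()            // older shells: db.runCommand({ fsync: 1, lock: 1 })
// ... take the filesystem / EBS snapshot here ...
// Release the write lock
db.fsyncUnlock()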
Well, I am using MongoDB now. I am pretty amazed at the performance it delivers, especially on some simple sorting.
I believe it's a good tool for simple web application logic. My remaining concerns are how to scale and back up. I will continue to explore.
The only disadvantage I have is that I don't have any good tools to inspect the data stored inside. For example, I want to put my logging from MySQL into Mongo as well. However, it's pretty difficult for me to view the logs. Previously, I could use a MySQL query to fetch what I wanted easily.
Anyway, it's a good tool and I will continue to use it.