Mongo 3.4 on Amazon Linux - Maximum simultaneous connections stops at 4077

I'm running MongoDB 3.4 on a t2.micro EC2 instance (Amazon Linux 2.0 (2017.12)).
Following is the ulimit -a configuration on the instance:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 3867
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 50000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 3867
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
You can see that the open files limit is set to 50000.
So I expected that MongoDB would allow nearly 50,000 connections to the running mongod instance, but I'm unable to get more than 4077 connections simultaneously. In /var/log/mongodb/mongod.log I can see that the current open connections count is 4077 and new connections are getting rejected because mongod fails to create threads for the new connection requests.
I'm not even able to connect to the mongo shell from the terminal; it's unable to create the sockets. I can connect to the DB if I release the 4077 connections that are currently open.
How can I specify the maximum simultaneous connections within the mongo config file? Do I need to change any other parameters in the OS environment, like ulimit?

I can see that the current open connections count is 4077 and new connections are getting rejected because mongod fails to create threads for the new connection requests.
A t2.micro instance only provides 1GiB of RAM. Each database connection will use up to 1 MB of RAM, so with 4000+ connections you are likely to have exhausted the available resources of your server. Assuming you are using the default WiredTiger storage engine in MongoDB 3.4, you probably have 256MB of RAM allocated to the WiredTiger cache by default and the remaining memory has to be shared between connection threads, your O/S, and any other temporary memory allocation required by mongod.
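For a quick sanity check against those numbers, serverStatus reports both the current and available connection counts as well as overall memory usage. A minimal sketch from the command line (the exact output fields vary slightly by version):
# Compare current vs. available connections and resident memory on the instance
mongo --eval 'printjson(db.serverStatus().connections); printjson(db.serverStatus().mem)'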
How can I specify the maximum simultaneous connections within the mongo config file? Do I need to change any other parameters in the OS environment, like ulimit?
Resource limits are intended to impose a reasonable upper bound so a system administrator can intervene before the system becomes non-responsive. There are two general categories of limits for connections: those imposed by your MongoDB server configuration (in this case net.maxIncomingConnections) and those imposed by your operating system (ulimit -a).
In MongoDB 3.4, net.maxIncomingConnections defaults to 65,536 simultaneous connections, so ulimit settings for files or threads are typically reached before the connection limit.
For a server with more capacity than a t2.micro, it typically makes sense to increase limits from the default. However, given the limited resources of a t2.micro instance I would actually recommend reducing limits if you want your deployment to be stable.
For example, a more realistic limit would be to set net.maxIncomingConnections to 100 connections (or an expected max of 100MB of RAM for connections). In your case you are aiming for 50,000 connections, so you could either set that value explicitly or leave the default (65,536) and rely on the ulimit restrictions.
Your ulimit settings already allow more consumption than your instance can reasonably cope with, but the MongoDB manual has a reference if you'd like to Review and Set Resource Limits. You could consider increasing your -u value (max processes/threads) as this is likely the current ceiling you are hitting, but as with connections I would consider what is reasonable given available resources and your workload.
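If you do decide to cap connections in the server configuration, the option lives under net in the YAML config file. A minimal sketch (the values are illustrative, not a recommendation for every workload):
# /etc/mongod.conf
net:
  port: 27017
  maxIncomingConnections: 100
On the OS side, the MongoDB manual's recommended production values are 64000 for both nofile and nproc. A sketch of a per-user override (the file name under limits.d is arbitrary, and this assumes mongod runs as the mongod user):
# /etc/security/limits.d/99-mongodb.conf
mongod soft nofile 64000
mongod hard nofile 64000
mongod soft nproc 64000
mongod hard nproc 64000
On a t2.micro these ceilings are far beyond what the hardware can sustain, so treat them as upper bounds rather than targets.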

Related

AWS RDS Aurora Postgres freeable memory decreasing leads to db restart

I have an RDS Aurora Postgres instance crashing with an OOM error when the freeable memory gets close to 0. The error triggers a database restart, and once the freeable memory drops again over time, another restart happens. After each restart, the freeable memory goes back to around 16500MB.
Does anyone have any idea why this is happening?
Some information:
Number of ongoing IDLE connections: around 450
Instance class = db.r6g.2xlarge
vCPU = 8
RAM = 64 GB
Engine version = 11.9
shared_buffers = 43926128kB
work_mem = 4MB
temp_buffers = 8MB
wal_buffers = 16MB
max_connections = 5000
maintenance_work_mem = 1042MB
autovacuum_work_mem = -1
These connections are kept alive for application reuse, and when new queries go to the database, one of those connections is used by the application. This is not a connection pool implementation; there are just 100 instances of my application connected to the database.
It seems some of these connections/processes are "eating" memory over time. Checking the OS processes, I saw that some of them are increasing their RES memory. For example, one idle process had 920.9 MB as its RES metric, but now it has 3.96 GiB.
The RES metric refers to the physical memory used by the process, as per this AWS doc: https://docs.amazonaws.cn/en_us/AmazonRDS/latest/AuroraUserGuide/USER_Monitoring.OS.Viewing.html
I'm wondering if this issue is related to these idle connections as described here: https://aws.amazon.com/blogs/database/resources-consumed-by-idle-postgresql-connections/
Maybe I should reduce the number of connections to the database.
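To get a feel for how much of that load is actually idle, one quick check (a sketch; the endpoint, user, and database names are placeholders) is to count backends by state in pg_stat_activity:
# Count connections by state; a large "idle" bucket supports the idle-connection theory
psql -h <aurora-endpoint> -U <user> -d <db> \
  -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;"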
Attached screenshots: Freeable Memory on CloudWatch graph, general CloudWatch metrics, Enhanced Monitoring metrics, and OS processes (around 100 ongoing processes).

unable to resize Postgres 10 /dev/shm due to kubernetes limiting shared memory

Encountered the following error while reading from a PG 10 table with 10 parallel threads:
ERROR: could not resize shared memory segment "/PostgreSQL.1110214013" to 3158016 bytes: No space left on device
This seems to be the result of K8s limiting the maximum size of /dev/shm to 64MB; attempting to set it any higher still results in 64MB.
The parallel reads are being carried out by Spark tasks and are partitioned based on the hashed value of an identifying column. I'm wondering if unbalanced partitions could be causing a particular task to exceed the Postgres work_mem for the process, causing a write to disk.
I am seeing a corresponding error log for each of my threads, so this shared memory segment resize is occurring multiple times (presumably the requested resizes are pushing above the locked 64MB).
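The effective mount size can be confirmed from inside the container (a sketch, assuming you can exec into the pod):
# Inside the Postgres container: shows the shared memory tmpfs and its 64MB ceiling
df -h /dev/shm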
I have tried upping work_mem from 4MB to 32MB, 64MB and finally 256MB but have seen the error at each stage. Below is the full set of PG settings that I believe can be tweaked to avoid the problematic disk usage:
effective_cache_size: "750MB"
shared_buffers: "2GB"
min_wal_size: "80MB"
max_wal_size: "5GB"
work_mem: "4MB,32MB,64MB,128MB,256MB" (all tried)
random_page_cost: 4 (wondering if this setting could be of use?)
max_connections: 100
I have a potential workaround that involves mounting a directory to /dev/shm/, but I would prefer to avoid this solution as I would be left unable to limit the size the directory could grow to; ideally I would find a solution that works within the 64MB.
Thanks.
It seems that (according to this explanation) if you want to avoid the issue while leaving /dev/shm limited to 64MB, you'll need to set shared_buffers to less than 64MB. However, mounting an emptyDir volume to /dev/shm is probably the best option if there is more memory physically available to your Kubernetes node.
It's true that as of Kubernetes 1.21 you can't constrain the size of the emptyDir volume (unless you have access to configure feature gates: the new SizeMemoryBackedVolumes feature gate is still in alpha), but this probably doesn't matter for the Postgres use case.
If Postgres is the only application running in the pod, and you've configured shared_buffers to around 25% of available memory as recommended by the Postgres documentation, the current behavior of offering up to 50% of node memory to the emptyDir volume before eviction should be fine. You'd need to trigger some bug in Postgres for it to consume much more of that memory than the shared_buffers setting.
So the best solution is likely to set shared_buffers to ~25% of available node memory, then mount an emptyDir volume to /dev/shm.
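A minimal sketch of that workaround (the container and volume names are illustrative): mount a memory-backed emptyDir over /dev/shm in the pod spec.
# Pod spec fragment: a memory-backed emptyDir replaces the default 64MB /dev/shm
spec:
  containers:
    - name: postgres
      image: postgres:10
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
With medium: Memory the tmpfs is sized against node memory (up to the 50% mentioned above), which is why keeping shared_buffers at roughly 25% of available memory matters.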

Jenkins and PostgreSQL are consuming a lot of memory

We have a data warehouse server running on Debian Linux. We are using PostgreSQL, Jenkins and Python.
For a few days now, a lot of memory on the server has been consumed by Jenkins and Postgres. I tried to find and check all the ways from Google, but the issue is still there.
Can anyone give me a lead on how to reduce this memory consumption? It would be very helpful.
Below is the output from free -m:
total used free shared buff/cache available
Mem: 63805 9152 429 16780 54223 37166
Swap: 0 0 0
Below is the postgresql.conf file (screenshot).
Below is the system configuration (screenshot).
Results from htop (screenshot).
Please don't post text as images. It is hard to read and process.
I don't see your problem.
Your machine has 64 GB RAM, 16 GB are used for PostgreSQL shared memory like you configured, 9 GB are private memory used by processes, and 37 GB are free (the available entry).
Linux uses available memory for the file system cache, which boosts PostgreSQL performance. The low value for free just means that the cache is in use.
For Jenkins, run it with these Java options:
JAVA_OPTS=-Xms200m -Xmx300m -XX:PermSize=68m -XX:MaxPermSize=100m
For Postgres, start it with the option:
-c shared_buffers=256MB
These values are the ones I use on a small homelab with 8GB of memory; you might want to increase them to match your hardware.
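If you'd rather not pass the flag at startup, the same setting can go in postgresql.conf; a minimal sketch (the path below is the usual Debian layout, adjust the version directory to match your install):
# /etc/postgresql/<version>/main/postgresql.conf
shared_buffers = 256MB
# then restart the service for the change to take effect
sudo systemctl restart postgresql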

Amazon EMR: Unable to increase default memory settings

I am running a Spark job (spark-submit) and facing OutOfMemory and open-files limit issues a lot. I have searched all over but couldn't find anything helpful.
Can somebody help me increase the Amazon EMR default memory settings?
[hadoop@ip-10-0-52-76 emr]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31862
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31862
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I believe increasing the Java heap size and the open files limit will resolve my issue.
For more information: I am using r3.4xlarge EMR clusters. Thanks.
In EMR you can change the memory settings in the /etc/spark/conf/spark-defaults.conf file.
If tasks are running out of memory, you should increase your executor memory. Choose the executor memory based on your data size.
spark.executor.memory 5120M
In case the driver throws an OutOfMemory error, you can increase the driver memory.
spark.driver.memory 5120M
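The same values can also be overridden per job on the spark-submit command line, and the open files limit (1024 in the ulimit output above) can be raised via /etc/security/limits.conf. A sketch (the script name and the 65536 value are examples to tune, not EMR defaults):
# Per-job memory overrides instead of editing spark-defaults.conf
spark-submit \
  --driver-memory 5g \
  --executor-memory 5g \
  your_job.py

# /etc/security/limits.conf entries to raise the open files limit for the hadoop user
hadoop soft nofile 65536
hadoop hard nofile 65536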

Fork fails with "resource temporarily unavailable". Which resource?

I have inherited a Perl script that, depending on machine configuration, fails during calls to fork with $? == 11.
According to errno.h and various posts, 11 is EAGAIN, i.e. "try again", because some resource was temporarily unavailable.
Is there a way to determine which resource caused the fork to fail, other than increasing various system limits one by one (open file descriptors, swap space, or number of allowable threads)?
Assuming you mean $! is EAGAIN, the fork man page on my system says:
EAGAIN: fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
EAGAIN: It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.
Are you trying to create a ton of processes? Are you reaping your children when they are done?
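Since RLIMIT_NPROC counts threads as well as processes, a quick check (a sketch; run as the user that owns the Perl script) is to compare the limit against what that user is currently running:
# Per-user process/thread limit
ulimit -u
# Approximate number of processes and threads currently owned by this user
ps --no-headers -u "$USER" -L | wc -l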
The error occurs because the user has run out of its allowed number of processes (the nproc limit).
Check the security configuration file on the RHEL server:
[root@server1 webapps]# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 1024
root soft nproc unlimited
[root@server1 webapps]# vi /etc/security/limits.d/90-nproc.conf
[root@server1 webapps]#
In my case, the "test" user was receiving the message "-bash: fork: retry: Resource temporarily unavailable".
I resolved the issue by adding a user-specific nproc limit:
[root@server1 webapps]# vi /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 1024
test soft nproc 16384
root soft nproc unlimited
[root@server1 webapps]#
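After the user starts a new session (limits.d changes are applied at login), the new ceiling can be verified with a quick check; a sketch:
# Run as (or su to) the affected user; should now report 16384
su - test -c 'ulimit -u'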