I have a local app that connects to a local MongoDB. It has 2 databases and about 60 collections in total.
I open one connection and then get an object to access each collection.
I let the system run the whole afternoon and, checking the stats, I found this:
I don't understand why I have over 750k connections, and I also don't really understand the metrics. For example, the number below, total connections created, is hovering at 1770...
Can someone explain what is going on?
Total Connections Created refers to the number of times the server has accepted a connection since it started running, so if it has been running for many months, it is likely to have many. It doesn't mean they are still active (and most won't be).
You can choose to only show active connections under Current Connections by clicking on the menu icon and choosing In Use:
Here's more info on why you might be seeing a large number of available connections: MongoDB available connections
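The same distinction can be read from the `connections` section of the `serverStatus` command, which has `current`, `available` and `totalCreated` fields. Here is a minimal sketch that summarizes such a document. The helper name `summarize_connections` and the sample values are made up for illustration (only the 1770 comes from the question); with PyMongo the real document would come from `client.admin.command("serverStatus")["connections"]`.

```python
def summarize_connections(connections: dict) -> str:
    """Summarize the 'connections' section of a serverStatus document."""
    current = connections["current"]              # connections open right now
    available = connections["available"]          # unused slots the server could still accept
    total_created = connections["totalCreated"]   # every connection accepted since startup
    no_longer_open = total_created - current      # created at some point, closed since
    return (
        f"{current} open now, {available} still available, "
        f"{total_created} created since startup ({no_longer_open} no longer open)"
    )


# Hypothetical sample document; real values come from the server.
sample = {"current": 12, "available": 750000, "totalCreated": 1770}
summary = summarize_connections(sample)
print(summary)
```

A large `available` number is just headroom, and `totalCreated` only ever grows; `current` is the figure that reflects what the app is holding open.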
I have connection pools set up for my system to handle concurrent connections for my managed database clusters in DigitalOcean.
Each client I have has their own DB, and I create a pool for that connection to avoid the error:
FATAL: remaining connection slots are reserved for non-replication superuser connections
Yesterday I ran into connection issues with a default database that my system also uses; for whatever reason, I hadn't thought connection pooling was needed there. I started getting flooded with error emails and then fixed the system to use the correct pooling mechanism.
This is where my question comes in. With pooling on DigitalOcean, they give you a specific "size" depending on your subscription; my subscription has an available "size" of 97 for the clusters. As my clients grow I will be creating new pools and databases for them, so eventually I will run out of slots to assign to a pool. What does this "size" dictate?
For example, one client I have has an allotted size of 10 for their connection pool. Speaking to support:
The connection pool with a size of 1 will only allow 1 connection at a time. As for how you can estimate the number of simultaneous users, this is something you'll need to look over as your user and application grow. We don't have a way to give you that estimate from our back end.
So that client, with a size of 10 allotted to their pool, has 88 staff users that use the system simultaneously throughout the day, plus about 4,000 users that they manage, who could all theoretically sign in at once.
This is a lot more than 10 connections, and I get no errors on connection size, at least none that I've seen so far.
Given that I have a limited amount, how do I determine the appropriate size to use? Does anybody have experience with this in production?
For example, with the connections listed above, is 10 too much, too little, just right?
Update 2/14/23
I have tested the capabilities a bit because I was curious and couldn't get any semi-logical answer. When I use a connection pool of size 1 for my 4,000 user client (although all users would not hit their DB/pool at the same time), I get connection errors (specifically when running background tasks from django-celery and Celery in the middle of the night).
Here are those errors, overall just connection already closed from here:
File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/base.py", line 269, in create_cursor
    cursor = self.connection.cursor()
This issue happened on 2 nights, but never during the day during normal user activity.
Once I upped the connection pool for said 4,000 user client to 2 instead of 1, the "connection already closed" error never occurred again.
As written in HikariCP docs the formula for counting connection pool size is connections = ((core_count * 2) + effective_spindle_count). But which core count is this: my app server or database server?
For example: my app is running on 2 CPUs, but the database is running on 16 CPUs.
This is Kevin's formula for connection pool size, where cores and spindles (you can tell it is an old formula) are the database server's.
This assumes that the connections are kept fairly busy. If you have transactions with longer idle times, you might need to make the pool bigger.
In the end, only trial and error can find the ideal pool size.
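As a worked example of the formula, using the 16-CPU database server from the question and assuming a single effective spindle (an assumption, since the question does not say what storage it has):

```python
def pool_size(core_count: int, effective_spindle_count: int) -> int:
    """connections = (core_count * 2) + effective_spindle_count"""
    return (core_count * 2) + effective_spindle_count


# core_count is the DATABASE server's core count, not the app server's.
db_cores = 16        # from the question
spindles = 1         # assumed for illustration
starting_point = pool_size(db_cores, spindles)
print(starting_point)  # 33
```

Treat the result as a starting point for the trial and error mentioned above, not as a final answer.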
The quote is from the PostgreSQL wiki, and it refers to the database server's cores:
database server only has so many resources, and if you don't have enough connections active to use all of them, your throughput will generally improve by using more connections.
Note that this formula may be outdated (comment by @mustaccio):
That wiki page was last updated nearly 5 years ago, and the advice in question is even older. I/O queue depth might be more relevant today than the number of spindles, even if the latter are actually present
I am using mongodb as a database for my game servers. When I started them, they were bound to just one region, I used the free tier of Mongodb Atlas in that region, India, and it worked perfectly with around 20ms of latency even on high load.
Now when I'm trying to scale up my servers and reach other regions like US-East, the latency jumps up to 500ms.
Is there a way I could open up two mongodb servers on both my India and US instances, which would always be in complete sync with each other, while the game server process would just use localhost to connect to the specific replica. I'm using pyMongo.
The players have no connection to the database; it's the game server process that manages it.
Is there a way I could open up two mongodb servers on both my India and US instances, which would always be in complete sync with each other, while the game server process would just use localhost to connect to the specific replica.
Assuming you are using a replica set, I believe the closest you can come to achieving the stated requirement for reads is by:
Performing all writes with the write concern w=(# of nodes in the deployment), if you have two servers then use w=2.
Performing reads with read preference=nearest and read concern=majority.
I say "closest" because I suspect read concern=majority on a secondary could return stale data even with w=(# of nodes in deployment), since I imagine the majority commit point would necessarily lag behind the point when each secondary commits each document. To guarantee that you are reading current data you must read from the primary.
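A minimal sketch of those two settings expressed as connection string options (`w`, `readPreference` and `readConcernLevel` are standard MongoDB URI options). The helper `build_uri`, the host names and the replica set name `rs0` are hypothetical placeholders, not something from the question:

```python
from urllib.parse import urlencode


def build_uri(hosts, replica_set, node_count):
    """Build a MongoDB URI that waits for every node on writes
    and reads from the nearest member with majority read concern."""
    options = {
        "replicaSet": replica_set,
        "w": node_count,                 # write concern: all nodes must acknowledge
        "readPreference": "nearest",     # read from the lowest-latency member
        "readConcernLevel": "majority",  # only return majority-committed data
    }
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Hypothetical hosts for the India and US instances.
uri = build_uri(
    ["india.example.com:27017", "us-east.example.com:27017"],
    replica_set="rs0",
    node_count=2,
)
print(uri)
```

The resulting string can be passed to `pymongo.MongoClient(uri)` in each game server process.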
Note that such a setup has (at least) two additional major drawbacks:
If any of the servers becomes unavailable, your application ceases to be able to write any data to the database.
Each write will wait for all servers in the deployment to store it, hence all writes will be slow.
Achieving the same requirement for reads and writes is not physically possible. (You essentially want to be able to write a document in, say, India in 20ms and have it be "instantly" available in US, i.e. be able to retrieve it in US 20 ms later, but it takes a minimum of 500 ms for data to travel from India to US.)
I followed this tutorial, and it has a configuration option called connections per host.
What is this?
connectionsPerHost is the number of physical connections a single Mongo client instance (it's a singleton, so you usually have one per application) can establish to a mongod/mongos process. At the time of writing, the Java driver will eventually establish this number of connections even if the actual query throughput is low (in other words, you will see the "conn" statistic in mongostat rise until it hits this number per app server).
There is no need to set this higher than 100 in most cases but this setting is one of those "test it and see" things. Do note that you will have to make sure you set this low enough so that the total amount of connections to your server do not exceed
Found here: How to configure MongoDB Java driver MongoOptions for production use?
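To make the "total amount of connections" point concrete, here is a rough sketch of the arithmetic, assuming one client instance per app server and that each one eventually opens its full connectionsPerHost. The function name and the numbers are hypothetical:

```python
def worst_case_connections(app_servers: int, connections_per_host: int) -> int:
    """Upper bound on connections one mongod/mongos may see,
    assuming one client instance per app server."""
    return app_servers * connections_per_host


# Hypothetical deployment: 8 app servers, each with connectionsPerHost = 100.
total = worst_case_connections(app_servers=8, connections_per_host=100)
print(total)  # 800
```

Compare that total against what your server can actually accept before raising the setting.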
I was writing code to measure the speed of my database using a Perl script.
My intention was to make 4,000 database connections after each fork (which would act as 4,000 different clients) and sleep, then issue the update command when I get the signal,
but the system itself becomes very slow and almost hangs just making the connections, and I couldn't even send the signal from my terminal.
I am using the DBI module. My system has 4GB of RAM, and Postgres 8.3 is running on a different machine.
I'm not entirely clear on whether you're saying you wanted to a) Open 4,000 connections, fork, open 4,000 more connections, etc. or b) Fork 4,000 times and open one connection from each process, but 4,000 database connections or 4,000 processes is some pretty serious resource consumption either way. I'm not at all surprised that it's slowing your system to a crawl - I would expect that to be the end result regardless of the language used.
What are you actually attempting to achieve by creating all of these processes and/or connections? There's probably a better way to do it that won't be quite so resource-intensive.
I've seen pgpool in use on production systems where the number of postgres connections could not be limited to something reasonable. You may wish to look into using this yourself to mitigate against poor application design by your developers.
Essentially, pgpool acts as a proxy to postgres. It multiplexes queries on lots of connections to a much smaller (and manageable) number to the back-end database.
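The multiplexing idea can be sketched in a few lines. This is not how pgpool is implemented; it is only a toy illustration in which `FakeConnection` and `TinyPool` are made-up classes, showing 50 concurrent clients sharing 5 backend connections:

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def execute(self, statement):
        return f"conn {self.conn_id} ran: {statement}"


class TinyPool:
    """Hands a small, fixed set of connections to many callers."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak_in_use = 0

    def run(self, statement):
        conn = self._idle.get()  # blocks until a backend connection is free
        with self._lock:
            self._in_use += 1
            self.peak_in_use = max(self.peak_in_use, self._in_use)
        try:
            return conn.execute(statement)
        finally:
            with self._lock:
                self._in_use -= 1
            self._idle.put(conn)  # hand the connection to the next waiter


pool = TinyPool(size=5)
statements = [f"UPDATE t SET n = {i}" for i in range(200)]
with ThreadPoolExecutor(max_workers=50) as clients:
    results = list(clients.map(pool.run, statements))

print(len(results), "statements ran; peak backend connections:", pool.peak_in_use)
```

However many clients queue up, the database only ever sees the pool's fixed number of connections.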
That is, relatively speaking, a lot of connections to have at once, but not unheard of by any means. How much memory do you have on the database server? Each connection takes resources; if you don't have a database server set up to handle that volume of connections, it will be slow no matter what language you use to connect.
A simple analogy would be a Toyota Prius (in the old days I would have said Ford Pinto) pulling a semi trailer with 80,000 lbs (the typical legal weight in a lot of the states) in it. It would burn that little Prius up in a heartbeat, like you are seeing. To do it right you need to buy yourself a big rig and hook it to that trailer to move that amount of weight.
Ignoring the wisdom of doing 4000 connection forks, you should work through your performance issues with something akin to Devel::NYTProf.
Alternatively, I would set up persistent workers in gearman and do my gearman client requests. Persistence and your scheduled forks on demand.