I am using the PECL mongo 1.4.x driver (http://pecl.php.net/package/mongo/1.4.1), with the setup mentioned in the title, on a moderate-traffic service (5K - 10K requests per minute).
I've found that in MongoDB the auth command is taking a large chunk of the traffic, and the connection request rate is around 30-50 per second.
This seriously impacts performance (the lock ratio goes up, and memory management doesn't cope nicely).
And if I run netstat on a box (I have 5-8 boxes in total), I see 2-3K mongo connections per box (some in WAIT, some in ESTABLISHED).
My question is: how can I reduce the number of connections to MongoDB, and how do I set up persistent connections properly?
It seems the way persistent connections work in the PECL MongoDB driver has changed from 1.2 to 1.3, and it behaves slightly differently again in 1.4.
Here is the way I invoke the client with the driver:
$mongo = new MongoClient(
    "host1:11004,host2:11004",
    array(
        'replicaSet'     => MG_REPLICASET,
        'password'       => "superpasswd",
        'username'       => "myuser",
        'db'             => "mydb",
        'journal'        => true,
        'readPreference' => MongoClient::RP_SECONDARY_PREFERRED
    )
);
In the 1.4 version all connections are persistent unless you close them yourself - which you should never do. You will see one connection per IP/username/password/database combination from each PHP processing unit; in your case, from each PHP-FPM process. In order to reduce the number of connections, you need to have fewer username/password/database combinations. However, with 8 boxes, 50 FPM processes each and 3 nodes in your replica set, you're already at 1200 connections - without even taking into account database/username/password differences. There is not much that you can do about that, but it shouldn't have a big influence on performance. You are much more likely to hit RAM/slow-disk limitations.
I think I found a solution to avoid excessive mongo connection requests.
We need to set PHP_FCGI_MAX_REQUESTS (or pm.max_requests in php-fpm) to a larger number, so the worker processes won't be recycled too often.
I also need to make sure request_terminate_timeout is not too small, so workers won't be killed too often.
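For example, in the php-fpm pool configuration (the values are only illustrative; tune them to your own workload):
; php-fpm pool config (excerpt)
pm.max_requests = 10000            ; recycle a worker only after this many requests
request_terminate_timeout = 120s   ; let moderately slow requests finish instead of killing the worker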
Related
I have a db and a client app that does reads and writes. I need to handle a lot of concurrent reads but be sure that writes get priority, while also respecting my db's connection limit.
Long version:
I have a single instance pgSQL database which allows 100 connections.
My .net microservice uses Npgsql to connect to the db. It has to do read queries that can take 20-2000ms and writes that can take about 500-2000ms. Right now there are 2 instances of the app, connecting with the same user credentials. I am trusting Npgsql to manage my connection pooling, and am preparing my read queries as there are basically just 2 or 3 variants with different parameter values.
As user requests increased, I started having problems with the database’s connection limit. Errors like ‘Too many connections’ from the db.
To deal with this I introduced a simple gate system in my repo class:
private static readonly SemaphoreSlim _writeGate = new(20, 20);
private static readonly SemaphoreSlim _readGate = new(25, 25);
public async Task<IEnumerable<SomeDataItem>> ReadData(string query, CancellationToken ct)
{
    await _readGate.WaitAsync(ct);
    try
    {
        // run the read query and return the results (your existing data-access call)
        return await QueryDataAsync(query, ct);
    }
    finally
    {
        _readGate.Release();
    }
}
public async Task WriteData(IEnumerable<SomeDataItem> items, CancellationToken ct)
{
    await _writeGate.WaitAsync(ct);
    try
    {
        // execute the write (your existing data-access call)
        await WriteDataCoreAsync(items, ct);
    }
    finally
    {
        _writeGate.Release();
    }
}
I chose to have separate gates for read and write because I wanted to be confident that reads would not get completely blocked by concurrent writes.
The limits are hardcoded as above: a total limit of 45 on each of the 2 app instances, connecting to 1 db server instance.
It is more important that attempts to write data do not fail than attempts to read. I have some further safety here with a Polly retry pattern.
This was alright for a while, but as the concurrent read requests increase, I see that the response times start to degrade, as a backlog of read requests begins to accumulate.
So, for this question, assume my sql queries and db schema are optimized to the max, what can I do to improve my throughput?
I know that there are times when my _readGate is maxed out, but there is free capacity in the _writeGate. However I don’t dare reduce the hardcoded limits because at other times I need to support concurrent writes. So I need some kind of QoS solution that can allow more concurrent reads when possible, but will give priority to writes when needed.
Queue management is pretty complicated to me but is also quite well known to many, so is there a good nuget package that can help me out? (I’m not even sure what to google)
Is there a simple change to my code to improve on what I have above?
Would it help to have different conn strings / users for reads vs writes?
Anything else I can do with npgsql / connection string that can improve things?
I think PostgreSQL's default connection limit is 100; there's an SO thread on this here: How to increase the max connections in postgres?
There's always a limit to how many simultaneous queries you can run before performance stops improving and eventually drops off.
However I can see in my azure telemetry that my db server is not coming close to fully using cpu, ram or disk IO (cpu doesn't exceed 70% and is often less, memory the same, and IOPS under 30% of its capacity) so I believe there is more to be squeezed out somewhere :)
Maybe there are other places to investigate, but for the sake of this question I'd just like to focus on how to better manage connections.
First, if you're getting "Too many connections" on the PostgreSQL side, that means that the total number of physical connections being opened by Npgsql exceeds the max_connections setting in PG. You need to make sure that the aggregate total of Npgsql's Max Pool Size across all app instances doesn't exceed that, so if your max_connections is 100 and you have two app instances, each needs to run with Max Pool Size=50.
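For example, a sketch with NpgsqlConnectionStringBuilder, capping each of the two app instances at 50 (the host, database and credentials are placeholders):
using Npgsql;

var csb = new NpgsqlConnectionStringBuilder
{
    Host = "my-pg-server",   // placeholder
    Database = "mydb",       // placeholder
    Username = "app_user",   // placeholder
    Password = "...",
    MaxPoolSize = 50         // 2 app instances x 50 pooled connections = 100 = max_connections
};
var connString = csb.ConnectionString;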
Second, you can indeed have different connection pools for reads vs. writes, by having different connection strings (a good trick for that is to set the Application Name to different values). However, you may want to set up one or more read replicas (a primary/secondary setup); this would allow all read workload to be directed to the read replica(s), while keeping the primary for write operations only. This is a good load-balancing technique, and Npgsql 6.0 has introduced great support for it (https://www.npgsql.org/doc/failover-and-load-balancing.html).
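A sketch of what that page describes, assuming one primary and two read replicas (host names are placeholders; the Load Balance Hosts and Target Session Attributes keywords are the ones documented on that page):
Read connections (prefer the replicas and spread load across them):
    Host=pg-primary,pg-replica1,pg-replica2;Database=mydb;Username=app_user;Password=...;Load Balance Hosts=true;Target Session Attributes=prefer-standby
Write connections (always land on the primary):
    Host=pg-primary,pg-replica1,pg-replica2;Database=mydb;Username=app_user;Password=...;Target Session Attributes=primary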
Apart from that, you can definitely experiment with increasing max_connections on the PG side - and accordingly Max Pool Size on the clients' side - and load-test what this does to resource utilization.
I'm currently using the default connection pool in sequelize, which is as follows:
const defaultPoolingConfig = {
  max: 5,                  // max connections in the pool
  min: 0,                  // min connections kept open
  idle: 10000,             // ms a connection may sit idle before being released
  acquire: 10000,          // ms to wait for a free connection before a ResourceRequest timeout
  evict: 10000,            // ms between runs that evict stale connections
  handleDisconnects: true
};
Of late, I'm getting ResourceRequest timed out errors, which are due to the above DB configuration. According to some answers the max pool size should be kept at 5, but those who have faced the ResourceRequest timed out error suggest increasing the pool size to 30, along with increasing the acquire timeout.
I need to know what the optimum max pool size is for a web app.
Edit: 1. Let's say I have 200 concurrent users and 20 concurrent queries. What should the values be then?
2. My database is provided by GCP, with the following configuration:
vCPUs: 1
Memory: 3.75 GB
SSD storage: 10 GB
I'm adding some graphs for CPU utilization, Read / write operations per second and transactions per second.
My workload resources are as follows:
resources:
  limits:
    cpu: 500m
    memory: 600Mi
  requests:
    cpu: 200m
    memory: 500Mi
The number of concurrent connections should be large enough for the number of concurrent running queries or transactions you may have.
If you have a lower limit, then new queries/transactions will have to wait for an available connection.
You may want to monitor currently running queries (see pg_stat_activity for instance) to detect such issues.
However, your database server must be able to handle the number of connections. If you are using a server provided by a third party, it may have set limits. If you are using your own server, then it needs to be configured properly.
Note that to handle more connections, your database server will need more processes and more RAM. Also, if they are long running queries (as opposed to transactions), then you are most probably resource-constrained on the server (often I/O-bound), and adding more queries running at the same time usually won't help with overall performance. You may want to look at configuration of your DB server (buffers etc.), and of course, if you haven't already done so, optimise your queries (make sure they all use indexes). The other pg_stat_* views and EXPLAIN are your friends here.
If you have long-running transactions with lots of idle time, then more concurrent connections may help, though you may have to wonder why you have such long-running transactions.
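For reference, hedged starting points for those checks (standard PostgreSQL; the EXPLAIN target is a placeholder for one of your own queries):
-- what is running right now, longest-running first
SELECT pid, state, wait_event_type, now() - query_start AS running_for, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY running_for DESC NULLS LAST;

-- check whether a suspect query actually uses indexes
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;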
To summarise, your next steps should be to:
Check the immediate state of your database server using pg_stat_activity and friends.
If you don't already have that, set up monitoring of I/O, CPU, memory, swap, postgresql statistics over time. This will give you a clearer picture of what is going on on your server. If you don't have that, you're just running blind.
If you have long-running transactions, check that you always correctly release transactions/connections, including when errors occur. This is a pretty common issue with node.js-based web servers. Make sure you use try .. catch blocks wherever needed.
If there are any long-running queries, check that they are properly optimised (using indexes). If not, do your utmost to optimise them. This will be the single most useful step you can take if that's where the issue is.
If they are properly optimised and you have enough spare resources (RAM, I/O...), then you can consider raising the number of connections. Otherwise it's just pointless.
Edit
Since you are not operating the database yourself, you won't necessarily have all the visibility you could have on resource usage.
However, you can still:
Check pg_stat_activity. This alone will tell you a lot of things.
Check for connections/transactions that are kept around when they shouldn't be
Check queries are properly optimised
GCP has a default maximum concurrent connections limit set to 100 for instances with 3.75 GiB of RAM. So you could indeed increase the size of your pool. But if any of the above issues are present, you are just delaying or moving the issue a bit further, so start by checking those and fixing them if relevant.
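If those checks come back clean, a larger pool could look something like the sketch below; the numbers are purely illustrative, not a recommendation:
const poolingConfig = {
  max: 20,          // more concurrent connections, still well below the instance's limit of 100
  min: 0,
  idle: 10000,
  acquire: 30000,   // wait longer for a free connection before ResourceRequest times out
  evict: 10000,
  handleDisconnects: true
};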
I followed this tutorial, and there is a configuration option called connections per host.
What is this?
connectionsPerHost is the number of physical connections a single Mongo client instance (it's a singleton, so you usually have one per application) can establish to a mongod/mongos process. At the time of writing, the Java driver will eventually establish this number of connections even if the actual query throughput is low (in other words, you will see the "conn" statistic in mongostat rise until it hits this number per app server).
There is no need to set this higher than 100 in most cases, but this setting is one of those "test it and see" things. Do note that you will have to make sure you set it low enough so that the total number of connections to your server does not exceed the maximum your mongod/mongos can handle.
Found here: How to configure MongoDB Java driver MongoOptions for production use?
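For reference, a sketch of setting it with the legacy Java driver's MongoClientOptions builder (the host, port and the value 100 are placeholders):
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;

MongoClientOptions options = MongoClientOptions.builder()
        .connectionsPerHost(100)   // cap on physical connections per mongod/mongos
        .build();
MongoClient client = new MongoClient(new ServerAddress("dbhost", 27017), options);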
I am using httperf to benchmark web servers. My configuration: an i5 processor and 4GB RAM. How do I stress this configuration to get accurate results? I mean, I have to put 100% load on this server (a 12.04 LTS server).
You can use httperf like this:
$ httperf --server <server> --port <port> --wsesslog=200,0,urls.log --rate 10
Here urls.log contains the different URIs/paths to be requested. Check the documentation for details.
Now try changing the rate or session values and see how many requests per second you can achieve and what the reply time is. In the meantime, monitor the CPU and memory utilization using mpstat or top to see whether either is reaching 100%.
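For example (the server address and rates are placeholders):
# sweep the offered load and compare request rate, reply rate and reply time
httperf --server <server> --port 80 --wsesslog=200,0,urls.log --rate 20
httperf --server <server> --port 80 --wsesslog=200,0,urls.log --rate 50
httperf --server <server> --port 80 --wsesslog=200,0,urls.log --rate 100

# meanwhile, on the server being tested, watch CPU and memory
mpstat -P ALL 2
top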
What's tricky about httperf is that it often saturates the client first, because of 1) the per-process open-files limit, 2) the TCP port number limit (excluding the reserved ports 0-1024, there are only 64512 ports available for TCP connections, which means you can sustain only about 1075 new connections per second if each port is tied up for 1 minute, e.g. in TIME_WAIT), and 3) the socket buffer size. You probably need to tune these limits to avoid saturating the client.
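On a Linux client the usual knobs look roughly like this (a sketch; pick values appropriate for your machine):
ulimit -n 65535                                       # raise the per-process open files limit
sysctl -w net.ipv4.ip_local_port_range="1024 65535"   # widen the ephemeral port range
sysctl -w net.ipv4.tcp_tw_reuse=1                     # allow reusing sockets stuck in TIME_WAIT
sysctl -w net.core.rmem_max=16777216                  # allow larger socket receive buffers
sysctl -w net.core.wmem_max=16777216                  # allow larger socket send buffers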
To saturate a server with 4GB of memory, you would probably need multiple physical machines. I tried 6 clients, each issuing 300 req/s against a 4GB VM, and that saturates it.
However, there are still other factors impacting the result, e.g., the pages deployed on your Apache server and the workload access patterns. The general suggestions are:
1. Test the request workload that is closest to your target scenarios.
2. Add more physical clients and watch how the response rate, response time and error count change, to make sure you are not saturating the clients.
I was writing code to measure the speed of my database using a Perl script.
My intention was to open 4,000 database connections, one after each fork (which would act as 4,000 different clients), then sleep and issue the update command when I get the signal.
But the system becomes very slow and almost hangs just making the connections, and I couldn't even send the signal from my terminal.
I am using the DBI module; I have 4GB RAM in my system, and Postgres 8.3 is running on a different machine.
I'm not entirely clear on whether you're saying you wanted to a) Open 4,000 connections, fork, open 4,000 more connections, etc. or b) Fork 4,000 times and open one connection from each process, but 4,000 database connections or 4,000 processes is some pretty serious resource consumption either way. I'm not at all surprised that it's slowing your system to a crawl - I would expect that to be the end result regardless of the language used.
What are you actually attempting to achieve by creating all of these processes and/or connections? There's probably a better way to do it that won't be quite so resource-intensive.
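If the goal is simply "many concurrent clients each running an update", a bounded worker pool is far less punishing than 4,000 simultaneous forks. A sketch using Parallel::ForkManager (the DSN, table, credentials and the cap of 50 are placeholders):
use strict;
use warnings;
use DBI;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(50);   # cap concurrent clients at 50 instead of 4,000

for my $client (1 .. 1000) {
    $pm->start and next;                   # parent keeps looping; the child runs the block below
    my $dbh = DBI->connect(
        "dbi:Pg:dbname=testdb;host=dbhost", "user", "password",
        { RaiseError => 1, AutoCommit => 1 },
    );
    $dbh->do("UPDATE test_table SET counter = counter + 1 WHERE id = ?", undef, $client);
    $dbh->disconnect;
    $pm->finish;                           # child exits
}
$pm->wait_all_children;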
I've seen pgpool in use on production systems where the number of postgres connections could not be limited to something reasonable. You may wish to look into using this yourself to mitigate against poor application design by your developers.
Essentially, pgpool acts as a proxy to postgres. It multiplexes queries on lots of connections to a much smaller (and manageable) number to the back-end database.
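A sketch of the relevant pgpool.conf settings (host names and numbers are placeholders):
# pgpool.conf (excerpt)
listen_addresses = '*'
port = 9999                   # clients connect to pgpool here instead of to postgres directly
backend_hostname0 = 'pg-host'
backend_port0 = 5432
num_init_children = 90        # how many client connections pgpool will accept at once
max_pool = 1                  # cached backend connection pools per pgpool child process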
That is, relatively speaking, a lot of connections to have at once, but not unheard of by any means. How much memory do you have on the database server? Each connection takes resources; if your database server isn't set up to handle that volume of connections, it will be slow no matter what language you use to connect.
A simple analogy: it's as if you had a Toyota Prius (in the old days I would have said a Ford Pinto) pulling a semi trailer loaded with 80,000 lbs (the typical legal weight in a lot of states). It would burn that little Prius up in a heartbeat, just like you are seeing. To do it right you need to buy yourself a big rig and hook it to that trailer to move that amount of weight.
Ignoring the wisdom of doing 4000 connection forks, you should work through your performance issues with something akin to Devel::NYTProf.
Alternatively, I would set up persistent workers in Gearman and send my requests through Gearman clients: you get persistence, and your scheduled forks become on-demand jobs.