High CPU usage on Cloud SQL causing timeouts - postgresql

We have a Postgres database that has billions of records in it.
We have one client that uses our older API to query the database and fetch thousands of records once a day.
I would say close to the top end of the thousands.
The API currently runs on Compute Engine behind a load balancer, and during the allotted time I spin up 6 instances of it to help handle the load.
What I have found is that CPU usage on Cloud SQL maxes out at 100% while most of the other stats are fine - it's just the CPU.
This basically renders our API useless as we can't accept connections and it just shits itself.
What can we do to help this?
[Charts: CPU utilisation, connections, read/writes, memory usage]
You can see that in most of the other charts the readings are well within the normal range for what we expect.
I don't really want to have to beef up the CPU if it isn't actually the underlying problem.
A further thing to note: we have developed a new endpoint specifically for this client to use, but they don't have it in place yet, and there is no guarantee it will reduce the DB load.

High CPU usage can most definitely cause dropped or ignored connections. The database engine and underlying OS are fighting for resources and aren't able to respond to the connection in time.
While you can increase the CPU, it looks like the CPU you have is (usually) enough, except during the periods where it's pegged at 100%. I'd suggest instead finding out why the query is eating so much CPU and optimizing it.
You might be interested in something like Cloud SQL Insights to help debug the query.
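If something like Query Insights isn't available, pg_stat_statements gives a similar view. A minimal sketch, assuming Python with psycopg2, placeholder connection details, and that the pg_stat_statements extension is enabled on the instance:

```python
# Sketch: find the statements burning the most execution time via pg_stat_statements.
# Connection details are placeholders.
import psycopg2

conn = psycopg2.connect(host="10.0.0.3", dbname="mydb", user="postgres", password="...")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT query, calls, total_exec_time, mean_exec_time, rows
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC   -- columns are total_time/mean_time on PostgreSQL < 13
        LIMIT 10
    """)
    for row in cur.fetchall():
        print(row)
conn.close()
```

EXPLAIN (ANALYZE, BUFFERS) on the worst offenders then shows whether an index or a rewritten predicate would cut the CPU cost.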

Related

Locust eats CPU after 2-3 hours running

I have a simple HTTP server that I was testing. This server interacts with other HTTP servers and a Cassandra DB.
Currently I was using 100 users at 1 request/s each, so 100 tps in total on the server. What I noticed from the Docker stats was that CPU usage climbed higher and higher, and ~2-3 hours later it reached the 90% mark and beyond. After that I got a notice from Locust stating that the measurements may be inconsistent. But latencies did not increase, so I do not know why this has been happening.
Can you please suggest possible cause(s) of the problem? I think 100 tps should be handled by one vCPU.
Thanks,
AM
There's no way for us to know exactly what's wrong without at the very least seeing some code, and even then other things like the environment, the data, or the server you're running it on or against could introduce factors we wouldn't know about.
It's possible there's a problem with the code for your Locust users, such as a memory leak, or they're simply doing too much for a single worker to handle that many users. For users only making simple HTTP calls, a single CPU can typically handle upwards of thousands of requests per second; the more each user does beyond that, the less a single worker can handle. It's also possible you just need a more powerful CPU (or more RAM or bandwidth) to do what you want at the scale you want.
Do some profiling to see if you can find any inefficiencies in your code. Run smaller tests to see if the same behavior is evident with smaller loads. Run the same load but with additional Locust workers on other CPUs.
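For the "run the same load with additional workers" part, a minimal sketch of a deliberately lightweight user file (the /status endpoint and names are placeholders, not from the original post):

```python
# locustfile.py - minimal user doing only a simple HTTP call,
# useful for checking whether heavier user code is the culprit.
from locust import HttpUser, task, constant

class SimpleUser(HttpUser):
    wait_time = constant(1)  # roughly one request per second per user, as in the test above

    @task
    def get_status(self):
        # hypothetical endpoint; replace with whatever your server exposes
        self.client.get("/status")
```

Run it distributed with `locust -f locustfile.py --master` plus one `locust -f locustfile.py --worker --master-host=127.0.0.1` per spare CPU core; if CPU still climbs over hours with a user this small, the problem is less likely to be the user code itself.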
It's also just as possible your DB can't handle the load. The increasing CPU usage could be due to how your code handles waiting on the connection from the DB. Perhaps the DB could sustain, say, 80 users at an acceptable rate, but any additional users make it fall further and further behind, and your Locust users are then waiting longer and longer for the requested data.
For more suggestions, check out the Locust FAQ https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps

mongodb limit server memory consumption

I'm testing running MongoDB on the Kubernetes platform, where I can limit the resources used by the running container.
Say I set the memory limit to 256MB. The problem is that, for example, while making a backup, memory consumption increases up to the limit and the container gets restarted by Kubernetes.
So the question is: is there a way to limit MongoDB memory consumption in my case so that it would not cause the crash by exceeding the memory limit set by the platform?
I could of course increase the limit, but I'm interested in a principled solution and would like to understand this process better, because I don't really know how memory is consumed by MongoDB and the container OS. Is it possible to tune MongoDB/the underlying Linux OS to work inside the existing limits?
The limits that you have set are good enough for a MongoDB pod; these are the limits used by the community as well.
The only way I think you can get around this for backups is to increase the memory limit, but it might still fail: elsewhere on Stack Overflow people have reported OOM kills on VMs with gigabytes of memory. MongoDB basically tries to eat any and all memory that is made available to it.
There are also other ways to back up MongoDB: https://dba.stackexchange.com/questions/76130/how-to-backup-large-mongodb-database
I am not sure how this aligns with the k8s world.

What happens when AWS RDS CPU utilisation reaches 100%?

I would like to know what happens when AWS RDS CPU utilisation is at 100%.
Do the database requests fail or are the requests put on hold until the CPU utilisation drops below 100%?
I'm using RDS Postgres. Thanks in advance for your help.
Your query performance will degrade. Further queries will fail.
If your RDS is the sole database instance for your application, your entire application could come to a standstill.
You will need to figure out if the CPU is peaking due to high load or if there is one such query that is consuming all the resources.
If it's under heavy load, adding another replica might help if it's read-heavy. If it's write-heavy, you may need to scale up to the next instance size or start thinking about sharding your datasets.
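To get a first read on whether it's sustained load or an isolated spike, you can pull the CPUUtilization metric from CloudWatch. A minimal sketch, assuming Python with boto3 and a placeholder instance identifier:

```python
# Sketch: fetch recent CPUUtilization for an RDS instance from CloudWatch.
# "my-rds-instance" is a placeholder identifier.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-rds-instance"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=3),
    EndTime=datetime.now(timezone.utc),
    Period=300,                      # 5-minute buckets
    Statistics=["Average", "Maximum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), round(point["Maximum"], 1))
```

If the maximum tracks the average, it's sustained load (replicas or scaling help); a short spike points more towards one expensive query.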
This could lead to a lot of issues, namely:
There is a good chance that if the CPU remains at 100% consistently, your instance will crash. If it's a Multi-AZ instance, an automatic failover can reduce the downtime from any unexpected reboot to around 2 minutes; if it's a Single-AZ instance, the downtime can be significantly longer.
Your DB instance won't accept any new connections despite not hitting the 'max_connections' value. There is a good chance that some of the existing transactions will roll back due to performance degradation.
A continuous spike to 100% CPU may lead to memory pressure, i.e. very high swap usage and low freeable memory, and an eventual instance crash.
The workload will reach its threshold and read and write IOPS won't go any further.

Scaling DB2 to increase tps

I want to brainstorm with you all about the scaling options that DB2 has. I hope you can help me resolve the problem.
I need to scale my DB2 database to anticipate flash-crowd traffic to the database server. My database can only serve around 200+ transactions per second (in application terms, not database tps) before it completely stalls and runs out of CPU.
What do you think - if I want to reach 2000+, or 10 times the current rate, what options do I have to scale my database?
Recently I read about the pureScale feature. It looks promising, but it's not a flexible solution for us, in that it can only be deployed on IBM System x and ours is not. Are there other solutions like pureScale with a shared-everything approach?
The second option may be database partitioning. Can database partitioning, or the shared-nothing approach, help resolve my problem? Can it add processing power to my system?
Thanks and regards,
Fritz
Before you worry about how to scale up (more hardware in 1 server) or out (more servers), look at how to tune your database. Buying your way out of a performance problem is almost always more expensive than spending time to find and fix the performance problem.
Assuming that the process(es) consuming CPU on your database server are the database engine, then high CPU activity and low I/O activity indicates that you're doing a LOT of reads, but they are all in memory. Scanning a huge table is still inefficient, even if that table is stored completely in memory (buffer pools).
Find the SQL statements that are using the most CPU. Look at the explain plans, and figure out how to make them more efficient. There are LOTS of resources on the web for database performance tuning.
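One way to find those statements on DB2 LUW, assuming a version that has the MON_GET_PKG_CACHE_STMT monitoring function and the ibm_db Python driver (connection details below are placeholders):

```python
# Sketch: list the cached statements with the highest CPU time on DB2 LUW.
# Connection string values are placeholders.
import ibm_db

conn = ibm_db.connect("DATABASE=MYDB;HOSTNAME=db2host;PORT=50000;UID=db2inst1;PWD=...;", "", "")
sql = """
    SELECT TOTAL_CPU_TIME, NUM_EXECUTIONS, SUBSTR(STMT_TEXT, 1, 200) AS STMT
    FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
    ORDER BY TOTAL_CPU_TIME DESC
    FETCH FIRST 10 ROWS ONLY
"""
stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row["TOTAL_CPU_TIME"], row["NUM_EXECUTIONS"], row["STMT"])
    row = ibm_db.fetch_assoc(stmt)
ibm_db.close(conn)
```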

PostgreSQL consuming large amount of memory for persistent connection

I have a C++ application which is making use of PostgreSQL 8.3 on Windows. We use the libpq interface.
We have a multi-threaded app where each thread opens a connection and keeps using it without calling PQfinish on it.
We notice that for each query (especially the SELECT statements) postgres.exe memory consumption would go up. It goes up as high as 1.3 GB. Eventually, postgres.exe crashes and forces our program to create a new connection.
Has anyone experienced this problem before?
EDIT: shared_buffers is currently set to 128MB in our conf file.
EDIT2: a workaround that we have in place right now is to call PQfinish for every transaction. But then, this slows down our processing a bit since establishing a connection every time is quite slow.
In PostgreSQL, each connection has a dedicated backend. This backend not only holds connection and session state, but is also an execution engine. Backends aren't particularly cheap to leave lying around, and they cost both memory and synchronization overhead even when idle.
There's an optimum number of actively working backends for any given Pg server on any given workload, where adding more working backends slows things down rather than speeding it up. You want to find that point, and limit the number of backends to around that level. Unfortunately there's no magic recipe for this, it mostly involves benchmarking - on your hardware and with your workload.
If you need more connections than that, you should use a proxy or pooling system that allows you to separate "connection state" from "execution engine". Two popular choices are PgBouncer and PgPool-II. You can maintain light-weight connections from your app to the proxy/pooler, and let it schedule the workload to keep the database server working at its optimum load. If too many queries come in, some wait before being executed instead of competing for resources and slowing down all queries on the server.
See the PostgreSQL wiki.
Note that if your workload is read-mostly, and especially if it has items that don't change often for which you can determine a reliable cache invalidation scheme, you can also potentially use memcached or Redis to reduce your database workload. This requires application changes. PostgreSQL's LISTEN and NOTIFY will help you do sane cache invalidation.
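As a rough illustration of that cache-invalidation idea, a sketch in Python with psycopg2 and Redis (the "cache_invalidation" channel and the key naming are hypothetical):

```python
# Sketch: drop Redis cache entries when Postgres sends a NOTIFY.
# The channel name and payload convention are made up for this example.
import select
import psycopg2
import redis

cache = redis.Redis()
conn = psycopg2.connect("dbname=mydb user=app password=...")
conn.autocommit = True  # LISTEN/NOTIFY should run outside a long-lived transaction

with conn.cursor() as cur:
    cur.execute("LISTEN cache_invalidation;")

while True:
    # Wait until the connection has something to read, then drain notifications.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        # Payload is expected to be the cache key to drop, e.g. "item:42".
        cache.delete(note.payload)
```

On the database side, a trigger or the application itself would issue NOTIFY cache_invalidation, 'item:42' after a write.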
Many database engines have some separation of execution engine and connection state built in to the core database engine's design. Sybase ASE certainly does, and I think Oracle does too, but I'm not too sure about the latter. Unfortunately, because of PostgreSQL's one-process-per-connection model it's not easy for it to pass work around between backends, making it harder for PostgreSQL to do this natively, so most people use a proxy or pool.
I strongly recommend that you read PostgreSQL High Performance. I don't have any relationship/affiliation with Greg Smith or the publisher*, I just think it's great and will be very useful if you're concerned about your DB's performance.
* ... well, I didn't when I wrote this. I work for the same company now.
The memory usage is not necessarily a problem. PostgreSQL uses shared memory for some caching, and this memory does not count towards the process's memory usage until it's actually used. The more you use the process, the larger the portion of the shared buffers that will be active in its address space.
If you have a large value for shared_buffers, this will happen. If you have it too large, the process can run out of address space and crash, yes.
The problem is probably that you don't close the transaction.
In PostgreSQL, even if you only do SELECTs without any DML, they still run in a transaction which needs to be rolled back.
Adding a rollback at the end of each transaction will reduce your memory problem.
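The original app uses libpq from C++, but the same idea sketched in Python with psycopg2 (the connection details and the items table are placeholders):

```python
# Sketch: roll back after read-only work so the long-lived backend doesn't
# keep an open transaction (and its resources) around between queries.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=app password=...")  # kept open, never closed

def fetch_items():
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM items LIMIT 100")
            return cur.fetchall()
    finally:
        conn.rollback()  # end the implicit transaction even though we only read
```

In libpq terms that's simply a PQexec(conn, "ROLLBACK") after each batch of SELECTs.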