I have a FeathersJS and Apollo GraphQL app that uses Compose.io as the database. The portal servers have similar memory consumption, but the data servers are very different (see screenshot below). A few weeks ago I increased the memory allocation because I was experiencing memory errors. The errors have stopped, but one database is still at the memory limit and the other is at less than half.
Is this an indication of an underlying problem?
I have an AWS EC2 server running an application that is connected to a MongoDB Atlas sharded cluster. Periodically, the application slows down and I receive alerts from MongoDB about high CPU steal %. I am looking to upgrade my MongoDB server tier, and the only difference I see between the options is more storage space and more RAM; the number of vCPUs is the same. I'm wondering if anyone has insight on whether the increased RAM will help with the CPU steal % alerts I am receiving and whether it will help speed up the app. Otherwise, am I better off upgrading my AWS server tier to get more CPU that way?
Any help is appreciated! Thanks :)
I don't think more RAM will necessarily help if you're mostly CPU-bound. However, if you're using MongoDB Atlas, then the higher tiers definitely do provide more vCPUs as you go up the scaling options.
You can also enable auto scaling and set your minimum and maximum tiers to allow the database to scale as necessary: https://docs.atlas.mongodb.com/cluster-autoscaling/
However, be warned that MongoDB Atlas has a pretty aggressive scale-out and a pretty crappy scale-in. I think the scale-in only happens after 24 hours, so it can get costly.
I am having the same issue as in this question:
Blocking on idle connections on ClientRead for parametrized queries (bindings) during high traffic
Basically we are hitting high CPU on our RDS PostgreSQL (11) instances with a very minimal amount of traffic. All of the idle transactions show up with wait type Client:ClientRead. The only way the CPU drops is if I do the horrible thing of killing the transactions manually.
We are using framework-generated SQL to access an RDS PostgreSQL (11) database. The transactions remain open by design of our developers. We are having the same CPU issues on the instance as in the original question in this thread. I keep getting pushback when I explain that Postgres is not designed to work this way; as a newbie I need a better explanation of how this works than I am able to give, plus a tangible suggestion for a fix or alternative.
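Not a fix for the underlying design, but for reference, a minimal sketch (the database name and the 10-minute timeout are only illustrative values) of how those idle-in-transaction sessions can be inspected, terminated as a stopgap, and bounded with a server-side timeout on PostgreSQL 11:

    -- List sessions holding a transaction open while idle, waiting on the client
    SELECT pid,
           usename,
           state,
           wait_event_type || ':' || wait_event AS wait,
           now() - xact_start AS xact_age,
           left(query, 60) AS last_query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
    ORDER BY xact_age DESC;

    -- Stopgap: terminate a specific offender by pid (what "killing manually" amounts to)
    -- SELECT pg_terminate_backend(12345);

    -- Less manual safety net: have the server end transactions that stay idle too long
    -- (database name and timeout value are illustrative; on RDS this can also be set in the parameter group)
    ALTER DATABASE mydb SET idle_in_transaction_session_timeout = '10min';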
We have a production system which is an ASP.NET Web API (classic, not .NET Core) application published to Azure. Data storage is Azure SQL Database and we use Entity Framework to access the data. The API has a medium load, 10-60 requests per second, and upper_90 latency is 100-200 ms, which is the target latency in our case. Some time ago we noticed that approximately every 20-30 minutes our service stalls and latency jumps to approximately 5-10 seconds. All requests become slow for about a minute and then the system recovers by itself. No requests are dropped during that time; they all just take longer to execute for a short period (usually about a minute).
We see the following picture in our HTTP request telemetry (Azure):
We can also see a correlation with our Azure SQL Database metrics, such as DTU (drop) and connections (increase):
We've analyzed the server and didn't see any correlation with host CPU/memory usage (we have just one host); it's stable at a 20-30% CPU usage level and about 50% memory usage.
We also have an alternative source of telemetry which shows the same behavior. Our telemetry measures API latency and database metrics such as active connection count and pooled connection count (ADO.NET Connection Pool):
What is interesting is that every system stall is accompanied by a rise in the pooled connection count. And our tests show that the more connections are pooled, the longer you spend waiting on a connection from that pool to execute your next database operation. We analyzed a few suggestions but were unable to prove or disprove any of them:
ADO.NET connection leak (all our db access happens in a using statement with proper connection disposal/return to pool)
Socket/Port exhaustion - we were unable to properly track telemetry on that metric
CPU/Memory bottleneck - charts show there is none
DTU (Database Transaction Units) bottleneck - charts show there is none (a sketch of one way to cross-check this on the database side follows this list)
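For reference, a minimal sketch of the kind of DMV queries that can cross-check this directly on Azure SQL Database during a stall; sys.dm_db_resource_stats keeps roughly the last hour of resource consumption at ~15-second granularity, and sys.dm_exec_requests shows what active requests are waiting on. This is a generic diagnostic, not something specific to our telemetry:

    -- Recent resource consumption (one row per ~15 seconds, roughly the last hour)
    SELECT end_time,
           avg_cpu_percent,
           avg_data_io_percent,
           avg_log_write_percent,
           max_worker_percent,
           max_session_percent
    FROM sys.dm_db_resource_stats
    ORDER BY end_time DESC;

    -- What active requests are waiting on right now (run while a stall is happening)
    SELECT session_id,
           status,
           wait_type,
           wait_time,
           total_elapsed_time
    FROM sys.dm_exec_requests
    WHERE session_id <> @@SPID
    ORDER BY total_elapsed_time DESC;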
As of now we are trying to identify the possible culprit of this behavior. Unfortunately, we cannot identify the changes which led to it because of missing telemetry, so for now the only way to deal with the issue is to properly diagnose it. And, of course, we can only reproduce it in production, under sustained load (even when the load is not high, around 10 requests a second).
What are the possible causes for this behavior and what is the proper way to diagnose and troubleshoot it?
There can be several possible reasons:
The problem could be in your application code. Create a staging environment and re-run your test with profiler telemetry (e.g. using YourKit .NET Profiler) - this will allow you to detect the heaviest methods, largest objects, slowest DB queries, etc. Also do a load test on your API with JMeter.
I would also recommend trying the Kudu Process API to look at the list of currently running processes and get more info about them, such as their CPU time.
Articles on how to monitor CPU usage in Azure App Service are listed below:
https://azure.microsoft.com/en-in/documentation/articles/web-sites-monitor/
https://azure.microsoft.com/en-in/documentation/articles/app-insights-web-monitor-performance/
We ended up separating a few web apps hosted on a single App Service Plan. Even though the metrics were not showing any bottleneck with CPU on the app, there were other apps causing CPU usage spikes and, as a result, connection pool queue growth with huge latency spikes.
When we checked the App Service Plan usage and compared it to the database plan usage, it became clear that the bottleneck was in the App Service Plan. It's still hard to explain why a CPU bottleneck causes uneven latency spikes, but we decided to move the most loaded web app to a separate plan and deal with it in isolation. After the separation the app behaves normally, with no CPU or latency spikes, and it looks very stable (same picture as between spikes):
We will continue to analyze the other apps and eventually find the culprit, but at this point the mission-critical web app is in isolation and very stable. The lesson here is to monitor not only the Web App's resource usage but also the hosting App Service Plan, which could have other apps consuming resources (CPU, memory).
I am creating a database in FileMaker; the database is about 1 GB and includes around 500 photos.
FileMaker Server is having performance issues: it crashes and takes its time when searching through the database. My IT department recommended raising the cache memory.
I increased the cache memory (252 MB), but it's still struggling to give consistent performance. The database now shows peaks in CPU usage.
What can cause this problem?
Verify at FileMaker.com that your server meets the minimum requirements for your version.
For starters:
Increase the cache to 50% of the total memory available to FileMaker server.
Verify that the hard disk is unfragmented and has plenty of free space.
FM Server should be extremely stable.
FMS only does two things:
reads data from the disk and sends it to the network
takes data from the network and writes it to the disk
Performance bottlenecks are always disk and network. FMS is relatively easy on CPU and RAM unless Web Direct is being used.
Things to check:
Are users connecting through ethernet or wifi? (Wifi is slow and unreliable.)
Is FMS running in a virtual machine?
Is the machine running a supported operating system?
Is the database using Web Direct? (Use a 2-machine deployment for web direct.)
Is there anything else running on the machine? (Disable antivirus scanning and indexing.)
Make sure users are accessing the live databases through FMP client and not through file sharing.
How are the databases being backed up? NEVER let anything other than FMS see the live files. Only let OS-level backup processes see backup copies, never the live files.
Make sure all the energy saving options on the server are DISABLED. You do NOT want the CPU or disks sleeping or powering down.
Put the server onto an uninterruptible power supply (UPS). Bad power could be causing problems.
I have one standard web dyno and one worker dyno connected to the same Standard 0 database.
The worker dyno runs background jobs that insert a lot of data into the database. I feel like I have noticed slower response times while browsing my site when the workers are running.
I'm always well below the 120 connection limit. Am I imagining this or does it have an impact on read time? If so, how do people mitigate it?
From the database's perspective, there is no difference between connections originating from the web dynos and worker dynos; they're both just clients of the database.
If your worker dynos are doing heavy inserts all the time, then they could certainly impact query performance as this places a lot of load on the database; how this impacts your web response times is specific to your particular application.
I would recommend starting by looking at Heroku Postgres tools for database performance tuning.
https://devcenter.heroku.com/articles/heroku-postgres-database-tuning
Without knowing more about your application, I would say you could start by looking at the slowest queries related to your web requests and comparing their timings with and without the workers enabled.
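If the pg_stat_statements extension is available on your plan (Heroku's pg:outliers relies on it), here's a minimal sketch of that comparison: take a snapshot with the workers paused, reset the counters, then take another while they're running, and see how the timings of the web-facing queries shift. Column names assume Postgres 12 or earlier; on 13+ they are total_exec_time/mean_exec_time:

    -- Top statements by total time
    SELECT left(query, 80) AS query,
           calls,
           round(total_time::numeric, 1) AS total_ms,
           round(mean_time::numeric, 1) AS mean_ms
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;

    -- Clear the counters between the two runs so the comparison is clean
    SELECT pg_stat_statements_reset();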