Unpredictable API request latency spikes in my ASP.NET Web API published to Azure Web App

We have a production system, an ASP.NET Web API (classic, not .NET Core) application published to Azure. Data is stored in Azure SQL Database and we use Entity Framework to access it. The API has a medium load of 10-60 requests per second, and upper_90 latency is 100-200 ms, which is the target latency in our case. Some time ago we noticed that approximately every 20-30 minutes our service stalls and latency jumps to approximately 5-10 sec. All requests become slow for about a minute, and then the system recovers by itself. No requests are dropped during that time; they all just take longer to execute.
This is what we see in our HTTP request telemetry (Azure):
We can also see a correlation with our Azure SQL Database metrics, such as DTU (drop) and connections (increase):
We've analyzed the server and didn't see any correlation with the host's CPU/memory usage (we have just one host); it is stable at 20-30% CPU usage and 50% memory usage.
We also have an alternative source of telemetry which shows the same behavior. Our telemetry measures API latency and database metrics such as active connection count and pooled connection count (ADO.NET Connection Pool):
What is interesting is that every system stall is accompanied by a rise in the number of pooled connections. Our tests also show that the more connections are pooled, the longer you wait for a connection from that pool to execute your next database operation. We analyzed a few hypotheses but were unable to prove or disprove any of them:
ADO.NET connection leak (all our db access happens in a using statement with proper connection disposal/return to pool)
Socket/port exhaustion - we were unable to properly track telemetry for that metric
CPU/memory bottleneck - charts show there is none
DTU (database units) bottleneck - charts show there is none
As of now we are trying to identify the possible culprit of this behavior. Unfortunately, we cannot identify the changes which led to it because of missing telemetry, so the only way to deal with the issue is to properly diagnose it. And, of course, we can only reproduce it in production, under permanent load (even when the load is not high, like 10 requests a second).
What are the possible causes for this behavior and what is the proper way to diagnose and troubleshoot it?

There can be several possible reasons:
The problem could be in your application code. Create a staging environment and re-run your test with a profiler attached (e.g. YourKit .NET Profiler) - this will allow you to detect the heaviest methods, largest objects, slowest DB queries, etc. Also do a load test on your API with JMeter.
I would recommend trying the Kudu Process API to look at the list of currently running processes and get more info about them, such as their CPU time.
The following articles describe how to monitor CPU usage in Azure App Service:
https://azure.microsoft.com/en-in/documentation/articles/web-sites-monitor/
https://azure.microsoft.com/en-in/documentation/articles/app-insights-web-monitor-performance/

We ended up separating a few web apps hosted on a single App Service Plan. Even though the metrics were not showing any CPU bottleneck for the app itself, other apps on the plan caused CPU usage spikes and, as a result, Connection Pool queue growth with huge latency spikes.
When we checked the App Service Plan usage and compared it to the Database plan usage, it became clear that the bottleneck was in the App Service Plan. It's still hard to explain why a CPU bottleneck causes uneven latency spikes, but we decided to move the most loaded web app to a separate plan and deal with it in isolation. After the separation the app behaves normally, with no CPU or latency spikes, and it looks very stable (same picture as between spikes):
We will continue to analyze the other apps and will eventually find the culprit, but at this point the mission-critical web app is isolated and very stable. The lesson here is to monitor not only the Web App's resource usage but also the hosting App Service Plan, which could have other apps consuming resources (CPU, memory).
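To make that lesson concrete, here is a minimal sketch in Python (not part of our actual tooling) that lines up per-minute upper_90 latency with plan-level CPU. It assumes you can export request samples as (unix_seconds, latency_ms) pairs and plan CPU as a {minute: percent} mapping; the function names and the 5x spike factor are illustrative assumptions, not anything Azure provides.

```python
import math
from collections import defaultdict
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def p90_per_minute(samples):
    """Group (unix_seconds, latency_ms) samples by minute and return {minute: p90}."""
    buckets = defaultdict(list)
    for timestamp, latency_ms in samples:
        buckets[int(timestamp // 60)].append(latency_ms)
    return {minute: percentile(vals, 90) for minute, vals in sorted(buckets.items())}


def spikes_with_plan_cpu(p90_by_minute, plan_cpu_by_minute, factor=5.0):
    """Return (minute, p90_ms, plan_cpu_percent) for every minute whose p90
    exceeds `factor` times the median p90 (the spike factor is an assumption)."""
    baseline = median(p90_by_minute.values())
    return [
        (minute, p90, plan_cpu_by_minute.get(minute))
        for minute, p90 in p90_by_minute.items()
        if p90 > factor * baseline
    ]
```

If the flagged minutes consistently show high plan-level CPU while the app-level CPU chart stays flat, that points at a neighbouring app on the same plan rather than at the app itself.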

Related

Cadence - Identifying important Operation metrics

I am doing some metrics collection and want to do some aggregations based on Operation.
What would you say are the top 5 (or more or less) operations across all services that we should be focusing on? OR
Are there top 5 (or more or less) for individual services? If yes, can you list them.
Thanks in advance.
First of all, this question is quite vague. Here is what I would personally treat as a minimum set of monitors.
Server metrics
You should monitor availability & latency of all APIs for every service, as well as the persistence API.
You should monitor queue latency from the history service -- this is the key metric for understanding background task performance, which API availability & latency do not capture
You should build a dashboard of API counters for each service so that you can see how the load changes over time
Client metrics
You should monitor workflow failure/timeout
You should monitor activity task failure/timeout
You should monitor decision task failure/timeout
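As a rough illustration of aggregating by operation, the sketch below groups raw call records per (service, operation) and derives count, availability and latency figures. The record shape (service, operation, latency_ms, ok) is an assumption made for the example, not a format that Cadence emits, and any operation names you feed in are up to your own metrics pipeline.

```python
from collections import defaultdict


def aggregate_by_operation(records):
    """Summarise call records per (service, operation).

    Each record is a dict with the hypothetical keys
    'service', 'operation', 'latency_ms' and 'ok'.
    """
    groups = defaultdict(lambda: {"count": 0, "failures": 0, "latencies": []})
    for record in records:
        group = groups[(record["service"], record["operation"])]
        group["count"] += 1
        group["latencies"].append(record["latency_ms"])
        if not record["ok"]:
            group["failures"] += 1

    summary = {}
    for key, group in groups.items():
        summary[key] = {
            "count": group["count"],
            "availability": 1 - group["failures"] / group["count"],
            "avg_latency_ms": sum(group["latencies"]) / group["count"],
            "max_latency_ms": max(group["latencies"]),
        }
    return summary
```

The count column gives you the per-API load for a dashboard, while availability and latency are the values to alert on.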

How to identify the network performance issue?

I am a little confused about my message server's network bottleneck. I can clearly see that the problem is caused by a large number of network operations, but I am not sure why, or how to confirm it.
Currently we are using a GCP VM with 4 cores/8G RAM for our message server. Redis and Cassandra are on other servers in the same location. The problem happens during network operations to the Redis and Cassandra servers.
I need to handle 3000+ requests at once to save data to Redis and 12000+ requests to the Cassandra server.
My task was consuming all my CPU power, and CPU usage dropped right after I merged the Redis requests and Cassandra requests into batch requests. The penalty is that I have to delay saving my data.
What I want to know is how I can determine the network capacity of my system. How many requests per second is a reasonable load? From my testing it seems clear that the bottleneck is the network operations, but I can't prove it. I don't even know how to estimate a reasonable network usage for my system. Are there tools or other techniques that can help me confirm that the network is the problem? Or is this just a misconfiguration of my GCP system?
Thanks,
Eric
There is a "monitoring" label in each instance where you can check through graphs values like instance CPU, Network and RAM usage.
But to check the performance of your instance further you should use Stackdriver Logging and Monitoring. It stores a lot of information from the internal servers and about system performance. For that you will need to install the agent on the instance. It also stores information about your Load Balancer, in case you are using one with your web application, which is very advisable since it scales your resources up or down with intelligent autoscaling.
But in order to test your network you will need to use a third-party tool to load the network. There are multiple tools for this, such as JMeter.
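The batching described in the question is worth keeping, since it cuts the number of network round trips directly. A minimal sketch of the idea in Python follows; `send_batch` is a hypothetical callback standing in for whatever pipelined or batched write your Redis or Cassandra client offers, and the batch sizes are arbitrary.

```python
def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def write_in_batches(items, size, send_batch):
    """Send items in batches via the caller-supplied `send_batch` callback
    and return the number of round trips made."""
    round_trips = 0
    for batch in chunked(items, size):
        send_batch(batch)
        round_trips += 1
    return round_trips
```

With 3000 writes and a batch size of 100 this makes 30 round trips instead of 3000, which is consistent with the CPU drop seen after merging the requests. Measuring latency and throughput at a few batch sizes is a simple way to see where the network, rather than the CPU, starts to limit you.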

FileMaker 15 Pro Server performance

I am creating a database in FileMaker; the database is about 1GB and includes around 500 photos.
FileMaker Server is having performance issues: it crashes and takes its time when searching through the database. My IT department recommended raising the cache memory.
I raised the cache memory (252MB), but it is still struggling to give consistent performance. The database now shows peaks in the CPU.
What can cause this problem?
Verify at FileMaker.com that your server meets the minimum requirements for your version.
For starters:
Increase the cache to 50% of the total memory available to FileMaker server.
Verify that the hard disk is unfragmented and has plenty of free space.
FM Server should be extremely stable.
FMS only does two things:
reads data from the disk and sends it to the network
takes data from the network and writes it to the disk
Performance bottlenecks are always disk and network. FMS is relatively easy on CPU and RAM unless Web Direct is being used.
Things to check:
Are users connecting through ethernet or wifi? (Wifi is slow and unreliable.)
Is FMS running in a virtual machine?
Is the machine running a supported operating system?
Is the database using Web Direct? (Use a 2-machine deployment for Web Direct.)
Is there anything else running on the machine? (Disable virus scanning and indexing.)
Make sure users are accessing the live databases through FMP client and not through file sharing.
How is the database being backed up? NEVER let anything other than FMS see the live files. Only let OS-level backup processes see backup copies, never the live files.
Make sure all the energy saving options on the server are DISABLED. You do NOT want the CPU or disks sleeping or powering down.
Put the server onto an uninterruptible power supply (UPS). Bad power could be causing problems.

Will a worker dyno sharing a postgres instance degrade the performance for the web dyno?

I have one standard web dyno and worker dyno connected to the same standard 0 database.
The worker dyno runs background jobs that insert a lot of data into the database. I feel like I have noticed slower response times while browsing my site when the workers are running.
I'm always well below the 120 connection limit. Am I imagining this or does it have an impact on read time? If so, how do people mitigate it?
From the database's perspective, there is no difference between connections originating from the web dynos and worker dynos; they're both just clients of the database.
If your worker dynos are doing heavy inserts all the time, then they could certainly impact query performance as this places a lot of load on the database; how this impacts your web response times is specific to your particular application.
I would recommend starting by looking at Heroku Postgres tools for database performance tuning.
https://devcenter.heroku.com/articles/heroku-postgres-database-tuning
Without knowing more about your application I would say you could start with looking at the slowest queries related to your web requests and compare them to query time with and without the workers enabled.
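As a sketch of that comparison, assuming you have collected query durations (in ms) for the same web-request queries once with the workers stopped and once with them running, something like the following ranks queries by how much they slow down. The input shape and function name are assumptions for illustration only; the timings themselves would come from your own logs or the Heroku Postgres tooling.

```python
from statistics import median


def compare_query_times(baseline, loaded):
    """Return {query_label: slowdown_ratio}, worst first.

    `baseline` and `loaded` map a query label to a list of durations in ms,
    measured without and with the workers running respectively.
    """
    ratios = {}
    for label in baseline.keys() & loaded.keys():
        ratios[label] = median(loaded[label]) / median(baseline[label])
    return dict(sorted(ratios.items(), key=lambda item: item[1], reverse=True))
```

A ratio close to 1 means the workers barely affect that query; queries with a clearly higher ratio are the ones worth tuning first.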

What is the practical / hard limit on socket connections per server

I have a number of client devices that open socket connections to a service running on a Windows 2008 R2 server. I'm wondering what the hard limit is on the number of concurrent client connections.
According to this article, one hard limit is (was) 16,777,214. The practical limit depends on your application also: for example, if you create a thread per connection, then the practical limit comes from the limitation in the number of threads more than from the network stack. There is also a limit on the number of handles any process may have, and so on.
Assuming you select a sensible architecture for your server then the limit will be memory and cpu related. IMHO you'll never reach the hard limit that Martin mentions :)
So, rather than worrying about a theoretical limit that you'll never hit, you should, IMHO, be thinking about how you will design your application and how you will test it to determine the current maximum number of client connections that you can maintain for your application on given hardware. The important thing for me is to run your perf tests from Day 0 (see here for a blog posting where I explain this). Modern operating systems and hardware allow you to build very scalable systems, but simple day-to-day coding and design mistakes can easily squander that scalability, so you simply MUST run perf tests all the time so that you know when you are building in road blocks to your performance. You simply cannot go back and fix these kinds of mistakes at the end of the project.
As an aside, I ran some tests on Windows 2003 Server with a low spec VM and easily achieved more than 70,000 concurrent and active connections with a simple server based on an overlapped I/O (I/O completion port) based design. See this answer for more details.
My personal approach would be to get a shell of a server put together quickly using whatever technology you decide on (I favour unmanaged C++ using I/O Completion Ports and minimal threads), see this blog posting for more details. Then build a client or series of clients that can stress test the application and keep updating and running the test clients as you implement your server logic. You would expect to see a gradually declining curve of maximum concurrent clients as you add more complexity to your server; large drops in scalability should cause you to examine the latest check ins to look for unfortunate design decisions.
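To illustrate the "server shell plus stress client" idea, here is a minimal sketch using Python's asyncio in place of the C++ I/O Completion Port design favoured above. It is only meant to show the shape of such a test. The server echoes one line per connection, and the client opens many connections at once and counts how many got a correct reply while all of them were still open. Function names and the loopback setup are assumptions for the example.

```python
import asyncio


async def handle_client(reader, writer):
    """Echo one line back, then hold the connection until the client closes it."""
    try:
        line = await reader.readline()
        writer.write(line)
        await writer.drain()
        await reader.read()  # returns once the client closes (EOF)
    finally:
        writer.close()


async def open_and_ping(host, port, index):
    """Open one connection, send a line, and check that it is echoed back."""
    reader, writer = await asyncio.open_connection(host, port)
    message = f"ping {index}\n".encode()
    writer.write(message)
    await writer.drain()
    reply = await reader.readline()
    return writer, reply == message


async def stress(connection_count):
    """Start a throwaway echo server on a free loopback port and hit it."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    results = await asyncio.gather(
        *(open_and_ping("127.0.0.1", port, i) for i in range(connection_count))
    )
    # At this point every client connection is still open concurrently.
    alive = sum(1 for _, echoed in results if echoed)
    for writer, _ in results:
        writer.close()
        await writer.wait_closed()
    server.close()
    return alive


def run_stress(connection_count):
    """Return how many concurrent connections were served correctly."""
    return asyncio.run(stress(connection_count))
```

Re-running something like `run_stress(n)` with growing `n` after each server change gives the declining curve described above. On a real system the per-process handle limit and available memory will usually cap the number long before any theoretical socket limit does.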