PowerShell-based process-specific resource monitoring

I am using PowerShell for some benchmarking of Autodesk Revit, and I would like to add some resource monitoring. I know I can get historical CPU utilization and RAM utilization in general, but I would like to be able to poll Process specific CPU and RAM utilization every 5 seconds or so.
In addition, I would love to be able to poll how many cores a process is currently using, the clock speed of those specific cores, and the frame rate of the screen that process is currently displayed on.
Are those things even accessible via PowerShell/.NET? Or is that low level stuff I just can't get to with PS?

Related

Locust eats CPU after 2-3 hours running

I have a simple HTTP server that I was testing. This server interacts with other HTTP servers and Cassandra DB.
I was running 100 users at 1 request/s each, so the server was handling 100 tps in total. Docker stats showed the CPU usage climbing steadily, and after about 2-3 hours it reached 90% or more. At that point I got a notice from Locust stating that the measurements may be inconsistent. The latencies had not increased, though, so I do not know why this has been happening.
Can you please suggest possible cause(s) of the problem? I think 100 tps should be handled by one vCPU.
Thanks,
AM
There's no way for us to know exactly what's wrong without at the very least seeing some code, and even then the environment, the data, or the server you're running it on or against could contribute factors we wouldn't know about.
It's possible you have a problem with the code for your Locust users, such as a memory leak, or they're just doing too much for a single worker to handle that many users. For users only doing simple HTTP calls, a single CPU can typically handle upwards of thousands of requests per second. Do anything more than that and you should expect each worker to handle less. It's also possible you just need a more powerful CPU (or more RAM or bandwidth) to do what you want at the scale you want.
Do some profiling to see if you can find any inefficiencies in your code. Run smaller tests to see if the same behavior is evident with smaller loads. Run the same load but with additional Locust workers on other CPUs.
It's just as possible that your DB can't handle the load. The increasing CPU usage could be due to how your code handles waiting on the connection from the DB. Perhaps the DB could sustain, say, 80 users at an acceptable rate, but any additional users make it fall further and further behind, and your Locust users then wait longer and longer for the requested data.
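To make the "falls further and further behind" idea concrete, here is a minimal sketch of the arithmetic. It assumes constant arrival and service rates, which is a simplification, and the 100 req/s and 80 req/s figures are hypothetical values loosely based on the numbers mentioned above, not measurements.

```python
def backlog_after(seconds, arrival_rate, service_rate):
    """Requests still waiting after `seconds`, assuming constant rates (req/s)."""
    surplus = arrival_rate - service_rate
    return max(0.0, surplus * seconds)


def wait_for_new_request(seconds, arrival_rate, service_rate):
    """Approximate wait (s) for a request that arrives at time `seconds`."""
    return backlog_after(seconds, arrival_rate, service_rate) / service_rate


if __name__ == "__main__":
    # Hypothetical: 100 req/s offered, backend sustains only 80 req/s.
    for minutes in (1, 10, 60):
        t = minutes * 60
        queued = backlog_after(t, 100, 80)
        wait = wait_for_new_request(t, 100, 80)
        print(f"{minutes:>2} min: {queued:.0f} queued, ~{wait:.0f} s wait")
```

Under these assumptions the backlog grows linearly for as long as the offered load exceeds what the backend can serve, so a small shortfall can turn into a large queue over a multi-hour run.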
For more suggestions, check out the Locust FAQ https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps

CPU Utilization while using locust

We are planning to use locust for performance testing. I have started locust in distributed mode on Kubernetes, with 800 Users for a duration of 5 minutes. Hatch rate is 100 as well. After a couple of minutes, I can see the below warning on the worker log.
[2020-07-15 07:03:15,990] pipeline1-locust-worker-1-gp824/WARNING/root: Loadgen CPU usage above 90%! This may constrain your throughput and may even give inconsistent response time measurements!
I am unable to figure out what the 90% refers to, since I have not specified any resource limits. Is it 90% of node capacity? That seems unlikely, since we use beefy nodes with 16 vCPUs and 128 GB of memory. Can anyone give any insight?
It is 90% of one core, which is all a single Locust process can utilize because of the Python GIL. It is measured using https://psutil.readthedocs.io/en/latest/#psutil.Process.cpu_percent
If you have 16 vCPUs, you need 16 processes to utilize the whole node.
I guess we should clarify the message.
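As a rough illustration of why "90%" can look tiny at the node level, the sketch below converts a per-core percentage into a share of the whole machine. It also samples the current process's own CPU usage with the standard library. This is only in the spirit of a per-process measurement and is not Locust's or psutil's actual implementation; all function names here are hypothetical.

```python
import time


def cpu_percent(cpu_seconds, wall_seconds):
    """CPU time used, as a percentage of ONE core, over an interval."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds / wall_seconds


def share_of_node(one_core_percent, n_cores):
    """Convert a per-core percentage into a share of the whole machine."""
    return one_core_percent / n_cores


def sample_self(interval=0.2):
    """Measure this process's CPU percent over `interval` seconds."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    deadline = wall0 + interval
    while time.perf_counter() < deadline:
        pass  # busy loop so the sample has something to measure
    return cpu_percent(time.process_time() - cpu0,
                       time.perf_counter() - wall0)


if __name__ == "__main__":
    print(f"this process: {sample_self():.0f}% of one core")
    print(f"90% of one core on 16 cores = {share_of_node(90.0, 16)}% of the node")
```

So a worker that is pinned at 90% of its core accounts for only a few percent of a 16-vCPU node, which is why node-level dashboards can look idle while the warning fires.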

How to calculate the number of CPU, memory and storage that my Google Cloud SQL needs

My DB is reaching 100% CPU utilization, and increasing the number of CPUs no longer helps.
What kind of information should I consider to create my Google Cloud SQL? How do you set up the DB configuration?
Info I have:
For 10-50 minutes a day I receive 120 requests/second and CPU utilization reaches 100%
Memory usage peaks at 2.5 GB during this critical period
Storage usage is currently around 1.3GB
Current configuration:
vCPUs: 10
Memory: 10 GB
SSD storage: 50 GB
Unfortunately, there is no magic formula for determining the correct database size. This is because queries have variable load - some are small and simple and take no time at all, others are complex or huge and take lots of resources to complete.
There are generally two strategies to dealing with high load: Reduce your load (use connection pooling, optimize your queries, cache results), or increase the size of your database (add additional CPUs, Storage, or Read replicas).
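As one example of the "reduce your load" strategy, here is a minimal sketch of caching query results in the application. `run_query` is a hypothetical stand-in for a real database call, and `functools.lru_cache` is just one simple in-process option. It has no expiry, so it only suits data that can tolerate being stale.

```python
import functools

# Counts how often the (hypothetical) database is actually hit.
CALLS = {"count": 0}


def run_query(customer_id):
    """Hypothetical stand-in for an expensive database query."""
    CALLS["count"] += 1
    return (customer_id, customer_id * 2)


@functools.lru_cache(maxsize=1024)
def cached_query(customer_id):
    """Serve repeated lookups from memory instead of hitting the database."""
    return run_query(customer_id)


if __name__ == "__main__":
    for _ in range(120):          # 120 identical requests
        cached_query(42)
    print("database hits:", CALLS["count"])
```

Whether this helps depends on how repetitive the queries are. If most requests ask for different rows, query optimization or read replicas are more likely to pay off.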
Usually, when we see high CPU utilization, it is because the CPU is overloaded or there are too many database tables in the same instance. Here are some common issues and fixes from Google’s documentation:
If CPU utilization is over 98% for 6 hours, your instance is not properly sized for your workload, and it is not covered by the SLA.
If you have 10,000 or more database tables on a single instance, it could result in the instance becoming unresponsive or unable to perform maintenance operations, and the instance is not covered by the SLA.
When the CPU is overloaded, Google's documentation recommends viewing the percentage of available CPU your instance is using on the Instance details page in the Google Cloud Console.
To monitor your CPU usage and receive alerts at a specified threshold, set up a Stackdriver alert.
Increasing the number of CPUs for your instance should reduce the strain of your instance. Note that changing CPUs requires an instance restart. If your instance is already at the maximum number of CPUs, shard your database to multiple instances.
Google also has documentation on investigating high utilization and determining whether a system or user task is causing high CPU utilization. You could use it to troubleshoot your instance and find what's causing the high CPU utilization.

Multi core machine - cpu load metric

In a multi core machine what is the best metric to understand whether cpu is loaded or not ?
I have a web application that sends a POST request to an Apache CGI server. The CGI server loops over the POST data and launches a Perl process for each item in the loop. Since requests from all clients end up hitting a single endpoint, I am concerned that I may create more processes than my server can handle. Hence I want to understand which system metric I should check before launching a new process from the loop.
Note: I have a 20 core machine.
The reason the answer isn't easy to find, is that it depends on the nature of your processes, and which system constraint is your limiting factor.
For CPU-intensive work, the metric to look at is load average. Load average is a measure of processes in a runnable state; very roughly, if the load average equals the number of cores, you're running your CPUs at maximum.
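A minimal sketch of that check, using only the standard library. The helper names and the threshold of 1.0 per core are assumptions for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def load_per_core(load_average, n_cores):
    """Load average divided by core count; about 1.0 means fully busy."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return load_average / n_cores


def cpu_saturated(load_average, n_cores, threshold=1.0):
    """True when the load per core has reached the chosen threshold."""
    return load_per_core(load_average, n_cores) >= threshold


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    try:
        one_min, five_min, fifteen_min = os.getloadavg()
    except (AttributeError, OSError):
        print("load average is not available on this platform")
    else:
        print(f"1-min load {one_min:.2f} on {cores} cores "
              f"-> {load_per_core(one_min, cores):.2f} per core, "
              f"saturated: {cpu_saturated(one_min, cores)}")
```

On the 20-core machine from the question, a load average of 10 would come out at 0.5 per core, and 25 would count as saturated under this threshold.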
However, it's increasingly the case that CPU is not the limiting factor. You may have a finite amount of memory, and memory-hungry processes will consume it. 'Spare' memory is used for caching, so filling the whole lot up actually starts to slow things down (because you have a smaller cache). Overspilling the available memory will either cause swapping or trigger the OOM killer.
But as you mention apache and web, then chances are pretty good that your network pipe is a limiting factor - controlling bandwidth from the local host is actually surprisingly hard.
And then there's disk IO - which may also be a factor - I think that's unlikely for a web server, because your outbound network will usually be a tighter limit.
It all depends what your processes are doing - if they're lightweight 'helpers' that are mostly idle, or heavyweight 'grinders' that all introduce noticeable load.
So the best answer I can give is a very rough estimate: if your processes are CPU intensive, cap them at 2 per core. If your processes are memory intensive, aim to consume about 50% of your system RAM. If your processes are IO intensive, aim to consume about 50% of your IO capacity (either network or disk).
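Those rules of thumb can be turned into a simple cap that the launching loop consults before starting another process. This is only a sketch; the function names, the 64000 MB of RAM, and the 500 MB per process are hypothetical figures, not values from the question.

```python
def cpu_bound_cap(n_cores, per_core=2):
    """Rule of thumb from above: at most 2 CPU-intensive processes per core."""
    return n_cores * per_core


def memory_bound_cap(total_ram_mb, per_process_mb, budget_fraction=0.5):
    """How many processes fit in about 50% of system RAM."""
    return int(total_ram_mb * budget_fraction // per_process_mb)


def may_launch(running, cap):
    """True if one more process can start without exceeding the cap."""
    return running < cap


if __name__ == "__main__":
    cap = min(cpu_bound_cap(20), memory_bound_cap(64000, 500))
    print("cap:", cap)
    print("launch with 39 running?", may_launch(39, cap))
    print("launch with 40 running?", may_launch(40, cap))
```

Taking the minimum of the individual caps is a design choice here: whichever resource runs out first is the one that should stop new launches.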

What is the Overhead of matlabpool?

Could anyone tell me what the overhead of running a matlabpool is?
I started a matlabpool :
matlabpool open 132procs 100
Starting matlabpool using the '132procs' configuration ... connected to 100 labs.
And followed cpu usage on the nodes as :
pdsh -A ps aux |grep dmlworker
When I launch the matlabpool, it starts with ~35% cpu usage on average and when the pool is not being used it slowly (in 5-7 minutes) goes down to ~2% on average.
Is this normal ? What is the typical overhead ? Does that change if matlabpooljob is launched as a "batch" job ?
This is normal. ps aux reports the average CPU utilization since the process was started, not over a rolling window. This means that, although the workers initialize relatively quickly and then become idle, it will take longer for this to reflect in CPU%. This is different to the Linux top command, for example, which will reflect the utilization since the last screen update in %CPU.
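The difference between the two ways of reporting can be sketched with a little arithmetic. The numbers below are hypothetical, chosen only to resemble the figures in the question: a worker that burns 7 s of CPU during its first 20 s and is idle afterwards.

```python
def ps_style_cpu_percent(cpu_seconds, elapsed_seconds):
    """Lifetime average: total CPU time divided by time since process start."""
    return 100.0 * cpu_seconds / elapsed_seconds


def top_style_cpu_percent(cpu_delta, interval_seconds):
    """Windowed: CPU time used since the last refresh, divided by the interval."""
    return 100.0 * cpu_delta / interval_seconds


if __name__ == "__main__":
    startup_cpu = 7.0  # hypothetical CPU seconds spent initializing
    for elapsed in (20, 60, 180, 350):
        print(f"t={elapsed:>3}s  ps-style: "
              f"{ps_style_cpu_percent(startup_cpu, elapsed):5.1f}%")
    # After startup the worker is idle, so a windowed view drops at once.
    print(f"top-style over an idle 3 s window: "
          f"{top_style_cpu_percent(0.0, 3.0):.1f}%")
```

With these made-up figures the lifetime average only decays from 35% to 2% after 350 s, even though the worker has been idle since the 20 s mark.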
As for typical overhead, this depends on a number of factors: clearly the number of workers, the rate and data size of jobs submitted (as well as in maintaining the worker processes, there is some overhead in marshalling input and output, which is not part of "useful computation"), whether the Matlab pool is local or attached to a job manager, and the Matlab version and O/S.
From experience, as a rough guide on a modern *nix server, I would think an idle worker should not be consuming more than 20% of a single core (e.g. <~1% total CPU utilization on a 16-core box) after initialization, unless there is a configuration issue. I would not expect this to be influenced by what kind of jobs you are submitting (whether using "createJob" or "batch" or "parfor", for example): the workers and communication mechanisms underneath are essentially the same.