I am seeing almost 99% of the threads in a park/wait state and only 1% running, out of a thread pool of size 100. CPU usage is 25% and memory is just 2 GB out of the 6 GB allocated. Should I increase the thread pool size? I think so, because the system has enough resources and more threads would utilize them properly and increase throughput.
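For reference, a minimal sketch (using the standard java.lang.management API; the class name is illustrative, not from the original post) of how a thread-state breakdown like the 99%/1% above can be confirmed from inside the JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Counts live JVM threads by state; WAITING/TIMED_WAITING covers parked pool threads.
public class ThreadStateBreakdown {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) continue;     // threads that died during the dump
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        System.out.println(counts);         // e.g. {RUNNABLE=3, WAITING=55, TIMED_WAITING=42}
    }
}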
Here's an image taken from Grafana which shows the CPU usage, requests and limits, as well as throttling, for a pod running in a K8s cluster.
As you can see, the CPU usage of the pod is very low compared to the requests and limits. However, there is around 25% of CPU throttling.
Here are my questions:
How is the CPU usage (yellow) calculated?
How is the ~25% CPU throttling (green) calculated?
How is it possible that the CPU throttles when the allocated resources for the pod are so much higher than the usage?
Extra info:
The yellow is showing container_cpu_usage_seconds_total.
The green is container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total
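For reference, both metrics are cumulative counters, so panels like these are typically built by rating them over a window (an assumption about this particular dashboard; the pod label selector is illustrative):

# Yellow: CPU cores used by the container, averaged over the window
rate(container_cpu_usage_seconds_total{pod="my-pod"}[5m])

# Green: fraction of CFS periods in which the container was throttled
increase(container_cpu_cfs_throttled_periods_total{pod="my-pod"}[5m])
  / increase(container_cpu_cfs_periods_total{pod="my-pod"}[5m])

If the panels are built this way, a pod can show ~25% throttled periods while average usage stays low: short bursts can hit the CFS quota within individual 100 ms enforcement periods even though usage averaged over the whole window is well below the limit.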
I have a MongoDB replica set with one primary, one secondary and one arbiter.
The hardware specification of the primary and the secondary is the same: 8 cores, 32 GB RAM and a 700 GB SSD.
I have recently moved the database from the MMAPv1 storage engine to WiredTiger.
According to the MongoDB documentation, page eviction starts when:
The WiredTiger cache is 80% used.
The dirty cache percentage is more than 5%.
My resident memory is 13 GB. The dirty cache percentage is above 5% (around 7%) all the time, and WiredTiger cache usage is more than 11 GB, which is around 80% of the configured WiredTiger cache size.
I can see increased CPU usage because application threads are constantly being pulled into cache eviction.
I want to know: if I increase the box size to 16 cores and 64 GB, is that going to fix the issue?
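For reference, a minimal way to read the numbers behind those two thresholds from the mongo shell (a sketch; the field names are those exposed by serverStatus for WiredTiger, and the exact set varies by version):

// WiredTiger cache statistics from serverStatus
var cache = db.serverStatus().wiredTiger.cache;
var maxBytes   = cache["maximum bytes configured"];
var usedBytes  = cache["bytes currently in the cache"];
var dirtyBytes = cache["tracked dirty bytes in the cache"];
print("cache used  %: " + (100 * usedBytes  / maxBytes).toFixed(1));
print("cache dirty %: " + (100 * dirtyBytes / maxBytes).toFixed(1));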
From this page we know that .Q.w[] gives us for example:
used| 108432 / bytes malloced
heap| 67108864 / heap bytes available
peak| 67108864 / heap high-watermark in bytes
wmax| 0 / workspace limit from -w param
mmap| 0 / amount of memory mapped
syms| 537 / number of symbols interned
symw| 15616 / bytes used by 537 symbols
If I wanted to monitor the instance for memory issues (e.g. memory full), should I be looking at used, heap, or a combination?
If you want to monitor how much memory is currently being used, you would use used, but it's only a rough estimate of actual usage, as it doesn't take into account the memory used by interned symbols or by memory-mapped files.
Monitoring the heap is useful to get a sense of how your memory spikes (and peak gives the maximum spike), but it isn't necessarily ideal for telling you whether you're close to your limit: if you have a big memory spike and hit the limit, the process will die before you have a chance to observe that the spike got close to it.
Ultimately I would monitor both (and peak) and allow yourself a buffer in each case. Have a low-level alert if heap/peak reaches, say, 50% of the limit, with higher-level alerts at 60%, 70%, etc. Then also monitor used as a percentage of heap/peak. If used is a high percentage of heap, and heap is a high percentage of your limit, that could be alarming. Essentially your process could be in one of two states:
Behaviour 1: low-to-medium memory usage but spiking.
If used is generally a low-to-medium percentage of heap/peak, then your process is using a modest amount of memory but spiking. This is pretty harmless and expected if you're crunching a lot of data.
Behaviour 2: used is a high % of heap/peak, and heap/peak is a high % of the limit.
Here you might have a process that is holding more and more memory without releasing it, so used keeps growing and heap/peak keeps growing with it. This is a problem if left unchecked.
So essentially you want to capture behaviour 2 while allowing behaviour 1.
There are other behaviour patterns too, but this is the general gist. Whether or not automatic garbage collection is enabled also plays into it: if it isn't enabled and used is a lot less than heap, then the process is holding on to memory that it doesn't need.
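A minimal q sketch of that alerting idea (the memCheck name and the threshold numbers are illustrative; limitBytes would be wmax or whatever cap the process runs with):

/ compare heap against a byte limit, and used against heap
memCheck:{[limitBytes]
  s:.Q.w[];
  heapPct:100*s[`heap]%limitBytes;
  usedPct:100*s[`used]%s[`heap];
  if[heapPct>50;-1"WARN: heap is ",string[heapPct],"% of the limit"];
  if[(heapPct>70)&usedPct>80;-1"ALERT: used is ",string[usedPct],"% of a heap near the limit"];
  `heapPct`usedPct!(heapPct;usedPct)}

Which of the two alerts fires first gives a hint as to which of the two behaviours above you are seeing.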
Is the total CPU usage of a node the sum of the usage of all pods running on that node? What is the relation between millicores (millicpu) and the percentage of CPU used on a node? Do requests and limits control the CPU usage of pods? If so, when a pod's CPU usage reaches its limit, is it killed and moved to another node, or does it keep executing on the same node, capped at its limit?
Millicores are an absolute unit (one core divided by 1000). A given node typically has multiple cores, so the relation between a number of millicores and a percentage of the node varies. For example, 1000 millicores (one core) would be 25% of a four-core node, but 50% of a two-core node.
Request determines how much CPU a pod is guaranteed. The pod will not be scheduled onto a node unless the node can deliver that much.
Limit determines how much CPU a pod can get. A pod is not killed or moved for hitting its CPU limit; it is simply throttled so that it cannot exceed it.
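As an illustration only (the values are made up), this is how those requests and limits are expressed in a pod spec, in millicores:

resources:
  requests:
    cpu: "500m"    # guaranteed: the scheduler only places the pod where 0.5 core is available
  limits:
    cpu: "1000m"   # cap: usage above one core is throttled; the pod is not evicted for it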
I am wondering if there is any size limit to Spark executor memory?
Consider the case of running a badass job doing collect, unions, count, etc.
Just for a bit of context, let's say I have these resources (2 machines):
Cores: 40 per machine, Total = 80 cores
Memory: 156 GB per machine, Total = 312 GB
What's the recommendation: bigger or smaller executors?
The suggestion from the Spark development team is not to have executors larger than about 64 GB (this is often mentioned in Databricks training videos). The idea is that a larger JVM has a larger heap, which can lead to really slow garbage collection cycles.
I think it is good practice to keep your executors at 32 GB, or even 24 GB or 16 GB, so instead of one large executor you have 2-4 smaller ones.
This may add some coordination overhead, but it should be fine for the vast majority of applications.
If you have not read this post, please do.
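As an illustration only (the numbers are assumptions based on the 2 x 40-core / 156 GB machines above, leaving headroom for the OS and other daemons; my_job.py is a placeholder), the same cluster could be carved into several smaller executors via spark-submit, for example on YARN:

# 6 executors per machine, each with 5 cores and a 20 GB heap,
# instead of one huge executor per machine
spark-submit \
  --num-executors 12 \
  --executor-cores 5 \
  --executor-memory 20g \
  my_job.py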