Interpreting .Q.w[] for potential problems? - kdb

From this page we know that .Q.w[] gives us for example:
used| 108432 / bytes malloced
heap| 67108864 / heap bytes available
peak| 67108864 / heap high-watermark in bytes
wmax| 0 / workspace limit from -w param
mmap| 0 / amount of memory mapped
syms| 537 / number of symbols interned
symw| 15616 / bytes used by 537 symbols
If I wanted to monitor the instance for memory issues (eg. memory full) should I be looking at used or heap or a combination?

If you want to monitor how much memory is currently in use, look at used, but treat it as only a rough estimate of actual usage because it doesn't take into account the memory used by interned strings (symbols) or memory-mapped files.
Monitoring heap is useful for getting a sense of how your memory spikes (peak gives the largest spike so far), but it isn't ideal for telling you that you're close to your limit: if a big spike hits the limit, the process will die before your monitoring has a chance to record that the spike came close.
Ultimately I would monitor both (and peak) and allow yourself buffers in both cases. Have a low-level alert if the heap/peak reaches say 50% of the limit, higher levels at 60%, 70% etc. Then also monitor your used as a percentage of your heap/peak. If your used is a high percentage of your heap - and your heap is a high percentage of your limit - then this could be alarming. Essentially your process could either be:
1. Low-medium memory usage but spiking:
If used is generally a low-medium percentage of heap/peak then your process is using low-medium memory but spiking. This is pretty harmless and expected if it is crunching a lot of data.
2. used is a high % of heap/peak and heap/peak is a high % of max:
Here you might have a situation where a process is holding more and more memory without releasing it, so used is continually growing and heap/peak is continually growing with it. This is a problem if left unchecked.
So essentially you want to capture behaviour 2 while allowing behaviour 1.
There are some other behaviour patterns too, but this is the general gist. Whether or not automatic garbage collection is enabled also plays into it. If automatic garbage collection isn't enabled and used is a lot less than heap, then the process is hogging memory that it doesn't need.
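The alerting logic described above could be sketched roughly as follows. This is a minimal Python sketch rather than q code, assuming the .Q.w[] values have already been exported into a dictionary. The function name classify_memory, the limit_bytes argument and the 80% used-to-heap threshold are assumptions for illustration; only the 50% heap/peak alert level comes from the answer. Since wmax is 0 when no -w limit is set, the limit has to be supplied from elsewhere (for example the machine's RAM).

```python
def classify_memory(stats, limit_bytes, heap_alert=0.5, used_alert=0.8):
    """Classify a .Q.w[]-style snapshot as 'ok', 'spiking' or 'growing'.

    stats       -- dict with at least 'used', 'heap' and 'peak' (bytes)
    limit_bytes -- memory ceiling to compare against (assumed: -w limit or RAM)
    heap_alert  -- heap/peak fraction of the limit that triggers an alert
    used_alert  -- used/heap fraction treated as "high" (assumed threshold)
    """
    high_water = max(stats["heap"], stats["peak"])
    heap_ratio = high_water / limit_bytes
    used_ratio = stats["used"] / stats["heap"]

    # Behaviour 2: used tracks heap, and heap is close to the limit.
    if heap_ratio >= heap_alert and used_ratio >= used_alert:
        return "growing"
    # Behaviour 1: heap/peak is large but used is a small share of it.
    if heap_ratio >= heap_alert:
        return "spiking"
    return "ok"


# Values from the .Q.w[] output in the question, with an assumed 1 GiB limit.
sample = {"used": 108432, "heap": 67108864, "peak": 67108864, "wmax": 0}
print(classify_memory(sample, limit_bytes=2**30))
```

In practice the "spiking" result would map to the low-level alerts (50%, 60%, 70% and so on), while "growing" is the case worth escalating.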

Related

Spark Garbage Collection Tuning - Reduce Memory for Caching using spark.memory.fraction - Why?

I was going through the Garbage Collection Tuning section of the book Spark: The Definitive Guide, where it says that
If a full garbage collection is invoked multiple times before a task completes, it means that there isn’t enough memory available for executing tasks, so you should decrease the amount of memory Spark uses for caching i.e. spark.memory.fraction
Also the Spark documentation says,
If the OldGen is close to being full, reduce the amount of memory used for caching by lowering spark.memory.fraction; it is better to cache fewer objects than to slow down task execution
(https://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning)
Question -
Why should we reduce spark.memory.fraction to reduce the memory for caching?
Shouldn't we reduce spark.memory.storageFraction which is the amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by spark.memory.fraction?
There is a relationship between the two:
spark.memory.fraction expresses the size of M as a fraction of the
(JVM heap space - 300MB) (default 0.6). The rest of the space (40%)
is reserved for user data structures, internal metadata in Spark, and
safeguarding against OOM errors in the case of sparse and unusually
large records.
spark.memory.storageFraction expresses the size of R
as a fraction of M (default 0.5). R is the storage space within M
where cached blocks immune to being evicted by execution.
So this is really like the coarse tuning knob and the fine tuning knob. But in practice, for performance tuning, I would start by tuning the code, then the number of partitions, and only then consider tuning configuration settings. I hope you are following that path first before digging into the minutiae of these settings. They come in handy, but only once you've done the rest of the work to get to them.
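To make the relationship between the two settings concrete, here is a small arithmetic sketch in Python. It only restates the formula quoted above (M as a fraction of heap minus 300MB, R as a fraction of M); the function name memory_regions and the 4300 MB heap size are assumptions chosen for illustration.

```python
RESERVED_MB = 300  # fixed amount subtracted from the JVM heap, per the quote


def memory_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return the sizes (in MB) of M, R and the space left outside M."""
    usable = heap_mb - RESERVED_MB
    m = usable * memory_fraction   # spark.memory.fraction
    r = m * storage_fraction       # spark.memory.storageFraction
    return {"M": m, "R": r, "outside_M": usable - m}


# Defaults, then a lower memory.fraction, then a lower storageFraction.
for fraction, storage in [(0.6, 0.5), (0.4, 0.5), (0.6, 0.3)]:
    regions = memory_regions(4300, fraction, storage)
    print(fraction, storage, {k: round(v) for k, v in regions.items()})
```

Lowering spark.memory.fraction shrinks M and R together and enlarges the space outside M, whereas lowering spark.memory.storageFraction leaves M unchanged and only moves the boundary of R inside it.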

How can I increase CPU/RAM available to VSCode?

Was playing around with some larger data sets and noticed that VSCode only uses around 30% CPU and RAM.
Is there some way to increase it? Probably some configurations? Thanks
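You can increase or decrease the memory VS Code itself may use when opening large files in its Settings. Go to File -> Preferences -> Settings, search for files.maxMemoryForLargeFilesMB, and set the value (in MB) to your desired maximum. Note that this affects the editor when it opens large files, not the programs you run from it.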
Not sure which programming language you are using, but let's break your question into two parts:
How to use more CPU? (Can increase performance)
Use multiprocessing APIs, which divide a large data set into smaller units to be processed by different CPU cores. It is like a master-worker architecture, where each subprocess executes on a separate core, so the parallelism is capped by the total number of CPU cores.
If the number of data units is greater than the number of CPU cores, the cores will context switch between them.
How to use more RAM? (Can degrade performance)
Why do you need to increase RAM usage? It depends on the amount of data the program allocates anyway.
You could create multiple copies of the data so that each thread has its own snapshot and needs no mutex or lock, but that is generally not good practice.
Finally:
CPU and RAM are used by the process that is executing, which depends on your program and its programming language, not on VS Code, which is just an editor.
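As the language in use isn't stated, here is a minimal sketch of the multiprocessing idea in Python, using the standard library's multiprocessing.Pool. The helper names and the sum-of-squares workload are placeholders chosen for illustration, not anything from the question.

```python
from multiprocessing import Pool, cpu_count


def split_into_chunks(data, n_chunks):
    """Split data into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """CPU-bound work applied to one chunk (placeholder workload)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=None):
    """Fan the chunks out to worker processes and combine the partial results."""
    workers = workers or cpu_count()
    with Pool(processes=workers) as pool:
        return sum(pool.map(sum_of_squares, split_into_chunks(data, workers)))
```

On platforms that start worker processes with spawn (Windows and macOS), parallel_sum_of_squares should be called from under an if __name__ == "__main__": guard. Using one chunk per worker keeps the number of data units at or below the number of cores, which avoids the extra context switching mentioned above.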

NVMe SSD's bandwidth decreases when increasing the number of I/O queues

As far as I have learned from all the relevant articles about NVMe SSDs, one of NVMe SSDs' benefits is multiple queues. Leveraging multiple NVMe I/O queues, NVMe bandwidth can be greatly utilized.
However, what I have found from my own experiment does not agree with that.
I want to do parallel 4k-granularity sequential reads from an NVMe SSD. I'm using Samsung 970 EVO Plus 250GB. I used FIO to benchmark the SSD. The command I used is:
fio --size=1000m --directory=/home/xxx/fio_test/ --ioengine=libaio --direct=1 --name=4kseqread --bs=4k --iodepth=64 --rw=read --numjobs 1/2/4 --group_reporting
And below is what I got testing 1/2/4 parallel sequential reads:
numjobs=1: 1008.7MB/s
numjobs=2: 927 MB/s
numjobs=4: 580 MB/s
Even if it does not increase bandwidth, I would expect increasing the number of I/O queues to at least keep the same bandwidth as the single-queue case. The bandwidth decrease is a little counter-intuitive. What are the possible reasons for the decrease?
Thank you.
I would like to highlight 3 reasons why you may see the issue:
Effective Queue Depth is too high,
Capacity under the test is limited to 1GB only,
Drive Precondition
First, the --iodepth=X parameter is specified per job. This means that in your last experiment (--iodepth=64 and --numjobs=4) the effective queue depth is 4x64=256, which may be too high for your drive. According to the vendor specification for your 250GB drive, 4KB random read should reach 250 KIOPS (1GB/s) at a queue depth of 32. In other words, the vendor is stating that QD32 is close to optimal for reaching the drive's best performance. If we increase QD beyond that, commands start accumulating and waiting in the submission queue. This does not improve performance; on the contrary, it starts to eat system resources (CPU, memory) and degrades throughput.
Second, limiting the capacity under test to such a small range (1GB) can cause a lot of collisions inside the SSD. This is the situation where reads hit the same media physical read unit (aka die aka LUN), so new reads have to wait for the previous ones to complete. Increasing the tested capacity to the entire drive, or at least to 50-100GB, should minimize the collisions.
Third, in order to get performance numbers as per the specification, the drive needs to be preconditioned accordingly. For measuring sequential and random reads it is better to use a full-drive sequential precondition. The command below will perform a 128KB sequential write at an effective QD32 (--iodepth=4 across --numjobs=8) to the entire drive capacity.
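The queue-depth arithmetic in this first point can be checked with a few lines of Python. This is only a sketch of the calculation: the function names are made up for illustration, and 4KB is assumed to mean 4096 bytes.

```python
def effective_queue_depth(iodepth, numjobs):
    """fio applies --iodepth per job, so the depths add up across jobs."""
    return iodepth * numjobs


def throughput_mb_per_s(iops, block_size_bytes):
    """Convert an IOPS figure to MB/s (decimal megabytes)."""
    return iops * block_size_bytes / 1_000_000


# The three runs from the question, all with --iodepth=64.
for jobs in (1, 2, 4):
    print(f"numjobs={jobs}: effective QD = {effective_queue_depth(64, jobs)}")

# The vendor figure quoted above: 250 KIOPS of 4KB reads.
print(throughput_mb_per_s(250_000, 4096), "MB/s")
```

To stay at an effective queue depth of 32 while varying the number of jobs, --iodepth would have to be divided accordingly (for example 32, 16 and 8 for 1, 2 and 4 jobs).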
fio --size=100% --ioengine=libaio --direct=1 --name=128KB_SEQ_WRITE_QD32 --bs=128k --iodepth=4 --rw=write --numjobs=8

How can I use only part of my total RAM during MATLAB computation?

I would like to dedicate 8 GB of RAM instead of the full (12) for a very long computation, in order to use the remainder for another operation. Is it possible?
Is there maybe a MATLAB command that forces the maximum limit of memory usage?
I would like to work with 2 separate editors.
See here for a possible solution on "limit the memory of a process on windows":
Set Windows process (or user) memory limit
MATLAB has no command to limit memory usage; it will acquire as much memory as it needs to do the computation. On some operating systems you can limit memory usage externally, for example using ulimit on Linux. But be aware that when MATLAB needs more than 8 GB it will not simply slow down on reaching the limit; it will throw an exception and stop computing.

Memory Warning but Small Live Bytes

In my application, I get a memory warning of level 1 and then 2 after repeating some action (choosing a picture + processing) several times and then a crash.
The leak tool doesn't show any leak. I'm also following the Allocations tool in Instruments and my Live Bytes are roughly 4 MB, overall I allocate 113 MB. At maximum I have maybe 20 MB in memory when the picture is loaded.
Since I have to repeat an action to get to the crash, it is very likely a memory leak. However, I don't know how to locate it, since my live bytes are only 4 MB and everything allocated looks like it is supposed to be there (apart from a small leak of ~100 KB in the UIImagePickerController).
How much can I trust the memory leak/allocation tools? Do you have any advice to help me locate the cause of the problem?
I don't know how iPhone OS works, so this is basically just guessing, but in systems where no garbage collector compacts the heap memory, it will be fragmented over time. Having a lot of memory free does not mean that a lot of contiguous memory is free.
For example, if you always need 4MB of memory for some processing, and you have this allocation pattern:
Allocate 4MB
Allocate 1KB
Free 4MB
Allocate 1KB
(You don't free the 1KB blocks because it's the computation result, or whatever)
You may end up with only 4,095 KB of free contiguous memory (the 4 MB hole minus the 1 KB block placed inside it), so the next time you allocate 4MB, it will be located after the gap, even though it almost fits. This means you can run out of memory even though almost the entire memory (or rather, address space) is free.
Granted, modern systems shouldn't suffer from this problem, but they may, especially if the application is never shut down and does not have a compacting garbage collector. Note that some systems have a low-fragmentation heap especially for situations like this (re-allocating and freeing blocks of the same size), but you usually need to explicitly request it.
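The allocation pattern above can be reproduced with a toy first-fit allocator. This is a minimal Python sketch for illustration only: the FirstFitArena class, the first-fit policy and the 4098 KB arena size are all assumptions, and real allocators (including the one on iPhone OS) behave differently.

```python
class FirstFitArena:
    """Toy allocator over a fixed arena, sizes in KB, first-fit placement."""

    def __init__(self, size_kb):
        self.size_kb = size_kb
        self.allocated = {}  # offset -> length of each live block

    def free_blocks(self):
        """Return the (offset, length) gaps between live blocks."""
        blocks, cursor = [], 0
        for offset in sorted(self.allocated):
            if offset > cursor:
                blocks.append((cursor, offset - cursor))
            cursor = offset + self.allocated[offset]
        if cursor < self.size_kb:
            blocks.append((cursor, self.size_kb - cursor))
        return blocks

    def allocate(self, length_kb):
        """Place the block in the first gap that fits; None if none does."""
        for offset, length in self.free_blocks():
            if length >= length_kb:
                self.allocated[offset] = length_kb
                return offset
        return None

    def free(self, offset):
        del self.allocated[offset]


def demo():
    arena = FirstFitArena(4098)   # 4 MB + 2 KB in total
    big = arena.allocate(4096)    # Allocate 4MB
    arena.allocate(1)             # Allocate 1KB (kept as a result)
    arena.free(big)               # Free 4MB
    arena.allocate(1)             # Allocate 1KB, lands inside the 4MB hole
    gaps = [length for _, length in arena.free_blocks()]
    return sum(gaps), max(gaps), arena.allocate(4096)


total_free, largest_gap, retry = demo()
print(f"free: {total_free} KB, largest gap: {largest_gap} KB, 4MB retry: {retry}")
```

In this run 4096 KB remain free in total, but the largest contiguous gap is 4095 KB, so the second 4MB request fails even though the free total would cover it.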