How can I increase CPU/RAM available to VSCode? - visual-studio-code

Was playing around with some larger data sets and noticed that VSCode only uses around 30% CPU and RAM.
Is there some way to increase it? Probably some configurations? Thanks

You can increase/decrease the available RAM for VS Code on its Settings. Go to File -> Preferences -> Settings, there you can type files.maxMemoryForLargeFilesMB and change the value for your desired maximum RAM.

Not sure which coding language are you using, but let's break your question in two parts:
How to use more CPU ? (Can Increase Performance)
By using multiprocessing apis, which can divide a given large data sets into smaller units to be processed by various CPU cores, it is like a master slave architecture, where each sub process will execute on separate core and at max it is driven by total number of CPU cores.
If number of data units is more than CPU cores, then it will context switch
How to use more RAM ? (Can Degrade Performance)
Why do you need to increase RAM usage, that will be anyway dependant on amount of data allocated by the program
You may plan to create multiple copies to have a snapshot for each thread, needn't then use mutex or lock, but generally not a good practice
Finally :
CPU and RAM will be used process that is executing, based on programming langiage not the VSCode which is just an editor

Related

NVMe SSD's bandwidth decreases when increasing the number of I/O queues

As far as I have learned from all the relevant articles about NVMe SSDs, one of NVMe SSDs' benefits is multiple queues. Leveraging multiple NVMe I/O queues, NVMe bandwidth can be greatly utilized.
However, what I have found from my own experiment does not agree with that.
I want to do parallel 4k-granularity sequential reads from an NVMe SSD. I'm using Samsung 970 EVO Plus 250GB. I used FIO to benchmark the SSD. The command I used is:
fio --size=1000m --directory=/home/xxx/fio_test/ --ioengine=libaio --direct=1 --name=4kseqread --bs=4k --iodepth=64 --rw=read --numjobs 1/2/4 --group_reporting
And below is what I got testing 1/2/4 parallel sequential reads:
numjobs=1: 1008.7MB/s
numjobs=2: 927 MB/s
numjobs=4: 580 MB/s
Even if will not increasing bandwidth, I expect increasing I/O queues would at least keep the same bandwidth as the single-queue performance. The bandwidth decrease is a little bit counter-intuitive. What are the possible reasons for the decrease?
Thank you.
I would like to highlight 3 reasons why you may see the issue:
Effective Queue Depth is too high,
Capacity under the test is limited to 1GB only,
Drive Precondition
First, parameter --iodepth=X is specified per Job. It means in your last experiment (--iodepth=64 and --numjobs=4) effective Queue Depth is 4x64=256. This may be too high for your Drive. Based on the vendor specification of your 250GB Drive, 4KB Random Read should show 250 KIOPS (1GB/s) for the Queue Depth of 32. By this Vendor is stating that QD32 is quite optimal for your Drive operation in order to reach best performance. If we start to increase QD, then commands will start aggregating and waiting in the Submission Queue. It does not improve performance. Vice Versa it will start to eat system resources (CPU, memory) and will degrade the throughput.
Second, limiting capacity under test to such a small range (1GB) can cause lot of collisions inside SSD. It is the situation when Reads will hit the same Media Physical Read Unit (aka Die aka LUN). In such situation new Reads will have to wait for previous one to complete. Increase of the testing capacity to entire Drive or at least to 50-100GB should minimize the collisions.
Third, in order to get performance numbers as per specification, Drive needs to be preconditioned accordingly. For the case of measuring Sequential and Random Reads it is better to use Full Drive Sequential Precondition. Command bellow will perform 128KB Sequential Write at QD32 to the Entire Drive Capacity.
fio --size=100% --ioengine=libaio --direct=1 --name=128KB_SEQ_WRITE_QD32 --bs=128k --iodepth=4 --rw=write --numjobs=8

Number of workers in Matlab's parfor

I am running a for loop using MATLAB's parfor function. My CPU's specs are
I set preferred number of workers to 24. However, MATLAB sets this number to 6. Is number of workers bounded by the number of cores or by (number of cores)x(number of processors=6x12?
Matlab prefers to limit the number of workers to the number of cores (six in your case).
Your CPU (intel i7-9750H) has hyperthreading, i.e. you can run multiple (here 2) threads per core. However, this is of no use if you want to run them under full-load, which means that there is simply no resources available to switch to a different task (what the additional threads effectively are).
See the documentation.
Restricting to one worker per physical core ensures that each worker
has exclusive access to a floating point unit, which generally
optimizes performance of computational code. If your code is not
computationally intensive, for example, it is input/output (I/O)
intensive, then consider using up to two workers per physical core.
Running too many workers on too few resources may impact performance
and stability of your machine.
Note that Matlab needs to stream data to every core in order to run the distributed code. This is some kind of initialization effort and the reason why you won't be able to cut the runtime in half if you double the number of cores/workers. And that is also the explanation why there is no use for Matlab to make use of hyperthreading. It would just mean to increase the initial streaming effort without any speed-up -- in fact, the core would probably force matlab to save intermediate results and switch to the other task from time to time... which is the same task as before;)

Why is swap not good when using a SSD?

On Digitalocean I came up with this message when I want to add swap:
Although swap is generally recommended for systems utilizing traditional spinning hard drives, using swap with SSDs can cause issues with hardware degradation over time. Due to this consideration, we do not recommend enabling swap on DigitalOcean or any other provider that utilizes SSD storage. Doing so can impact the reliability of the underlying hardware for you and your neighbors. This guide is provided as reference for users who may have spinning disk systems elsewhere.
If you need to improve the performance of your server on DigitalOcean, we recommend upgrading your Droplet. This will lead to better results in general and will decrease the likelihood of contributing to hardware issues that can affect your service.
Why is that? I thought it was necessary for creating a stable server (not running into memory issues)
I believe that here's your answer.
Early SSDs had a reputation for failing after fewer writes than HDDs. If the swap was used often, then the SSD may fail sooner. This might be why you heard it could be bad to use an SSD for swap.
Modern SSDs don't have this issue, and they should not fail any faster than a comparable HDD. Placing swap on an SSD will result in better performance than placing it on an HDD due to its faster speeds.
I believe this is referring to the fact that SSDs have a relatively limited lifetime measured in number of times data is written in each memory location. Although such number has gotten big enough that using SSD as storage drives should not be a concern anymore, Swap memory, as a backup for ram memory, can potentially be written on pretty frequently, thus reducing the overall life of the SSD.
SSD Endurance is measured in so called DWPD units. DWPD stands for Drive full Writes Per Day. For Mobile, Client and Enterprise Storage Market segments DWPD requirements are very different. SSD Vendors usually state warranty as, for example, 0.8 DWPD / 3 years or 3.0 DWPD / 5 years. First example means that writing 80% of Drive Capacity every single day will result into 3 years life-time. Technically you can kill your 480GB Drive (let's say with 1 DWPD / 3 years warranty) within 12 days if to perform non-stop write access at the speed of 500 MB/s.
SSDs show much higher throughput on the one side if to compare with HDDs, but at the same time quite low endurance level. Partially it is due to the media physical structure and mapping. For example, when writing 1GB of user data to the HDD drive - internally physical media will receive around 10% more data (meta data, error protection data, etc.). Ratio between Host Data Amount and Internal Data Amount is called Write Amplification Factor (WAF). In comparison SSD may need to write 4 times more data than received from Host. Pure Random access is the worst scenario, when writing 1GB of Host Data will result into writing 4GB of data to the Internal Flash Media. If to perform only sequential write access WAF for SSDs will be close to 1.0, like for HDDs.
Enabling System swap and its intensive usage (probably due to DRAM shortage) will generate more Random access to the SSD. Endurance will degrade quicker if to compare with disable swap. Unless you are running Enterprise System with non-stop IO traffic to the SSD, I would not expect Swap enablement to affect SSD endurance much. You can always monitor SSD SMART Health parameter called - SSD Life Left. How it is changing in dynamic with/without swap enabled will help to make a decision.

How can I use only part of my total RAM during MATLAB computation?

I would like to dedicate 8 GB of RAM instead of the full (12) for a very long computation, in order to use the remainder for another operation. Is it possible?
Is there maybe a MATLAB command that forces the maximum limit of memory usage?
I would like to work with 2 separate editors.
See here for a possible solution on "limit the memory of a process on windows":
Set Windows process (or user) memory limit
Matlab has no command to limit the memory usage, it will aquire as much memory as needed to do the computation. On some operating systems you can limit the memory usage, for example using ulimit on Linux. But be aware, when Matlab needs more than 8gb it will not be slow when reaching the limit, it will throw an exception and stop computing.

Performance degrades, if number of threads is more than 2 on Xeon X5355

I have a strange problem but may not be that much strange to some of you.
I am writing an application using boost threads and using boost barriers to synchronize the threads. I have two machines to test the application.
Machine 1 is a core2 duo (T8300) cpu machine (windows XP professional - 4GB RAM) where I am getting following performance figures :
Number of threads :1 , TPS :21
Number of threads :2 , TPS :35 (66 % improvement)
further increase in number of threads decreases the TPS but that is understandable as the machine has only two cores.
Machine 2 is a 2 quad core ( Xeon X5355) cpu machine (windows 2003 server with 4GB RAM) and has 8 effective cores.
Number of threads :1 , TPS :21
Number of threads :2 , TPS :27 (28 % improvement)
Number of threads :4 , TPS :25
Number of threads :8 , TPS :24
As you can see, performance is degrading after 2 threads (though it has 8 cores). If the program has some bottle neck , then for 2 thread also it should have degraded.
Any idea? , Explanations ? , Does the OS has some role in performance ? - It seems like the Core2duo (2.4GHz) scales better than Xeon X5355 (2.66GHz) though it has better clock speed.
Thank you
-Zoolii
The clock speed and the operating system doesn't have as much to do with it as the way your code is written. Things to check might include:
Are you actually spinning up more than two threads at one time?
Do you have unnecessary synchronization artifacts in your code?
Are you synchronizing your code at the appropriate places?
What is your shareable resource and how many of then are there? If each of your transactions is relying on a single section of code, native library, file, database, whatever, then it doesn't matter how many CPUs you've got.
One tool at your disposal when analyzing software bottlenecks is the simple thread dump. Taking a few dumps throughout the life of an execution of your software should expose bottlenecks in your software. You may be able to take that output and use it to reevaluate your code.
Adding more CPU's does not always equate to better performance, locking and contention can severely degrade performance. Factors to consider are:
Is your algorithm suited to parallelisation?
Any inherently sequential portions of code?
Can you partition work into coarse grained 'chunks'? Corase is usually better than fine grained...
Can you alter your code to use less locking?
Synchronisation overheads can often be reduced by ensuring chunks of work are similiar sized.
Based on experience it could be that the Intel policy is 2 threads or dual-process only on that processor, that only pthreads can be used with that version of operating system, that the two processors were designed to conform to different laws with different provisions or allows, that the own thread process is not allowed, that more than n threads are being backed-out by the processor and the processing of error messages reporting this is slowing down throughput of the two cores and may lead to deactivate of cores 3 and 4.