Reduce CPU usage by using sleep - swift

I'm taking snapshots from an AR view 30 times per second and sending them to another view controller to process. The CPU usage eventually climbs to around 280%, the device gets too hot, performance degrades, and then I get frame drops.
I'm thinking of using Thread.sleep to make the thread sleep for 1 second every minute. My question is: is that a good approach? Does it have any effect at all on reducing CPU usage or cooling the phone down a bit? Or is there a better way?
If the thread sleeps for 1 second out of every minute, that doesn't matter much for me.
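Roughly what I have in mind, as a hypothetical sketch (`isCapturing` and `captureSnapshot()` stand in for my real capture state and snapshot call):

```swift
import Foundation

// Hypothetical sketch of the idea: pace the capture loop and pause it for
// 1 s once per minute. `isCapturing` and `captureSnapshot()` are placeholders
// for my real capture state and snapshot call.
let captureQueue = DispatchQueue(label: "snapshot.capture")

captureQueue.async {
    var lastPause = Date()
    while isCapturing {
        captureSnapshot()                          // runs ~30 times per second
        Thread.sleep(forTimeInterval: 1.0 / 30.0)  // pace the loop at ~30 Hz
        if Date().timeIntervalSince(lastPause) >= 60 {
            Thread.sleep(forTimeInterval: 1.0)     // the 1 s break every minute
            lastPause = Date()
        }
    }
}
```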
Thank you so much

Related

Understand CPU utilisation with image preprocessing applications

I'm trying to understand how to compute the CPU utilisation for audio and video use cases.
In real time audio applications, this is what I typically do:
If an application takes 4 ms to process 28 ms of audio data, I say that the CPU utilisation is 14.28% (4/28).
How should this be done for applications like resize/crop? Let's say I'm resizing a 162×122 image to a 128×128 image at 1 FPS, and it takes 11 ms. What would be the CPU utilisation?
CPU utilization is quite complicated, and strongly depends on stuff like:
The CPU itself
The algorithms utilized for the task
Other tasks running on the CPU at the same time
CPU utilization is also strongly related to process scheduling, and therefore to the operating system in use. Most operating systems expose some kind of API for CPU-utilization diagnostics, but such APIs are highly platform-dependent.
But how does CPU utilization calculations work anyway?
The simplest way to calculate CPU utilization is to take a period (say, 1 second), observe how long the CPU spent doing useful work (as opposed to idling, i.e. not executing any process), and divide that by the interval you selected. For example, if the CPU did useful calculations for 10 milliseconds and you were observing for 500 ms, the CPU utilization is 2%.
Answering your question / TL; DR
You can apply this principle in your program. For the case you describe (processing video), it works in much the same way: you measure how long it takes to process one frame and divide that by the length of a frame (1 / FPS). Of course, you can do this over a longer period to get a more accurate reading: track how much time it takes to process, say, 2 seconds of video and divide that by 2. That gives you your CPU utilization.
NOTE: if you aren't able to process a frame in time, for example your video is 10 FPS (0.1 s per frame) and processing one frame takes 0.5 s, then your CPU utilization will seemingly be 500%, but obviously you can't utilize more than 100% of your CPU, so you should just cap the CPU utilization at 100%.
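As a rough sketch of that calculation (Swift here only as an example; `processFrame()` is a stand-in for whatever per-frame work you actually do):

```swift
import Foundation

// Sketch: estimate CPU utilisation as (time spent processing a frame) / (frame period).
// `processFrame()` is a placeholder for the actual per-frame work (resize, crop, ...).
let fps = 25.0
let framePeriod = 1.0 / fps             // 40 ms per frame at 25 FPS

let start = Date()
processFrame()
let processingTime = Date().timeIntervalSince(start)

// Cap at 100%: you can't use more than the whole CPU.
let utilisation = min(processingTime / framePeriod, 1.0)
print("CPU utilisation: \(utilisation * 100)%")
```

Applying the same arithmetic to your resize example: 11 ms of work per 1000 ms frame period at 1 FPS is about 1.1% utilisation.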

STM32 ADC: leave it running at 'high' speed or switch it off as much as possible?

I am using a G0 with one ADC and 8 channels. It works fine. I use 4 channels. One is temperature, which is measured constantly, and I am interested in the value every 60 s. Another one is almost the opposite: it measures sound waves for a couple of minutes per day, and I need those samples at 10 kHz.
I solved this by letting all 4 channels sample at 10 kHz and having the four readings moved to memory by DMA (an array of length 4 with 1 measurement each). Every 60 s I take the temperature, and when I need the audio, I retrieve the audio values.
If I had two ADCs, I would have the temperature ADC do 1 conversion every 60 s, non-stop, and I would only start the audio ADC for the couple of minutes a day that it is needed. But with the single-ADC solution, it seems simplest to let all conversions run at this high speed continuously, and that raised my question: is there any true downside to doing 40,000 conversions per second, 24 hours per day? If not, the code is simple and I just have the most recent values in memory all the time. But maybe I ruin the chip? I know I use too much energy, but there is plenty of it in this case.
You aren't going to "wear it out" by running it when you don't need to.
The main problems are wasting power and RAM.
If you have enough of these, then the lesser problems are:
The wasted power will become heat, which may upset your temperature measurements (though this is a very small amount).
Having the DMA running will increase your interrupt latency and maybe also slow down the processor slightly, if it encounters bus contention (this only matters if you are close to capacity in these regards).
Having it running all the time may also have the advantage of more stable readings, since nothing is perturbed by turning things on and off.

About CPU operation and I/O processing

My question is: why do we want the CPU's operation to overlap with I/O processing? I have been thinking about optimization and such but have yet to arrive at a conclusion.
If anyone is able to answer this question, it will be great. :D
I/O is generally very slow compared to the operating frequency of the CPU.
Suppose you have a 1GHz CPU that's capable of executing one instruction every clock cycle. That means the CPU is able to execute one instruction every nanosecond.
Now let's assume you want to fetch some data from your hard drive. Disk operations often take place on the millisecond scale, and we'll assume your drive is fast enough to fetch the data in only 1 ms.
If the CPU just sits around and waits for the disk to fetch the data, it will waste 1 million nanoseconds doing nothing, when it could be executing 1 million instructions for another task. When a program does a lot of I/O, those wasted cycles stack up and become noticeable if you let the CPU wait and do nothing. This is why it's a good idea to overlap computation with I/O so CPU cycles aren't wasted.
This is also why your computer becomes super unresponsive when your main memory is full, and the CPU has to page frequently to the disk. Your CPU cannot perform any useful task unless the data it needs has been retrieved from the disk into the main memory, so it must sit around and wait for the IOs to complete.
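As a small illustration of the overlap idea (a sketch using GCD; `loadDataFromDisk()`, `handle(_:)` and `doOtherWork()` are placeholders):

```swift
import Foundation

// Sketch of overlapping I/O with computation using GCD.
// `loadDataFromDisk()`, `handle(_:)` and `doOtherWork()` are placeholders.
let ioQueue = DispatchQueue(label: "io", qos: .utility)

ioQueue.async {
    let data = loadDataFromDisk()       // the slow disk read happens here
    DispatchQueue.main.async {
        handle(data)                    // consume the result once it arrives
    }
}

doOtherWork()                           // the CPU keeps executing instructions meanwhile
```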

How to run a multi-queue code using OpenCL?

For example, I'm doing some image processing work on every frame of a video.
Each frame takes 200 ms to process, including writing, processing, and reading.
The FPS is 25, so the distance between two frames is 40 ms. The processing is therefore too slow to show a continuous result.
So here is my idea: I use multiple queues for this work.
On the CPU side:
while (video is not over)
{
    1. read frame0;
       process frame0 using queue0;
       wait 40 ms;
    2. read frame1;
       process frame1 using queue1;
       wait 40 ms;
    3., 4., 5. ... (after 5 frames, i.e. roughly the 200 ms processing time)
    6. download frame0's result;
    7. read frame5;
       process frame5 using queue0;
       wait 40 ms;
    ...
}
The code means that I use a different queue to read and process each frame of the video.
In my experiments this is indeed faster, but only about 2 times faster, not as fast as I had imagined.
Can anyone tell me how to deal with this? Thanks!
Assuming you have one Device, here are some thoughts on this point:
The main reason to have multiple Command Queues (CQs) per single OpenCL Device is the ability to execute kernels and do IO operations simultaneously.
Usually one CQ is enough to load a single Device at ~100%. Still, your multi-CQ idea is good (in my opinion), as you're constantly feeding the GPU with work.
Look at the kernel execution time. It may be large enough that your Device is constantly executing kernels and simply can't go any faster.
I don't think you need to wait 40 ms. A good solution is to process frames in the order they are put into the queue, which eliminates the difference between bitstream and display order.
If you have too many CQs, your OpenCL driver thread will be busy maintaining them, and performance may decrease.

What is the fastest I should run an NSTimer?

What is the fastest I can run an NSTimer and still get reliable results? I've read that approaching 30ms it STARTS to become useless, so where does it "start to start becoming useless"...40ms? 50ms?
The docs say:
the effective resolution of the time interval for a timer is limited to on the order of 50-100 milliseconds
Sounds like if you want to be safe, you shouldn't use timers below 0.1 sec. But why not try it in your own app and see how low you can go?
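For example, a minimal Swift sketch to measure it yourself (`Timer` is the Swift name for NSTimer; the 30 ms interval is just an example):

```swift
import Foundation

// Sketch: schedule a repeating timer near the edge of reliability and log
// how far each firing drifts from the requested interval.
let requested: TimeInterval = 0.03      // 30 ms
var last = Date()

_ = Timer.scheduledTimer(withTimeInterval: requested, repeats: true) { _ in
    let now = Date()
    print("actual interval: \(now.timeIntervalSince(last) * 1000) ms")
    last = now
}

RunLoop.current.run()                   // keep the run loop alive in a command-line context
```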
You won't find a guarantee on this. NSTimers are opportunistic by nature since they run with the event loop, and their effective finest granularity will depend on everything else going on in your app in addition to the limits of whatever the Cocoa timer dispatch mechanisms are.
What's your definition of reliable? A 16 ms error in a 1-second timer is under 2% error, but in a 30 ms timer it is over 50% error.
NSTimers will wait for whatever is happening in the current run loop to finish, and any errors in time can accumulate. e.g. if you touch the display N times, all subsequent repeating NSTimer firings may be late by the cumulative time taken by 0 to N touch handlers (plus anything else that was running at the "wrong" time). etc.
CADisplayLink timers will attempt to quantize time to the frame rate, assuming that no set of foreground tasks takes as long as a frame time.
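For comparison, a minimal CADisplayLink sketch (the `FrameTicker` wrapper is just illustrative) that logs the delta between frame callbacks:

```swift
import UIKit

// Sketch: CADisplayLink fires in step with the screen refresh, so its callbacks
// are quantised to the frame rate rather than to an arbitrary requested interval.
final class FrameTicker: NSObject {
    private var link: CADisplayLink?
    private var lastTimestamp: CFTimeInterval = 0

    func start() {
        link = CADisplayLink(target: self, selector: #selector(tick(_:)))
        link?.add(to: .main, forMode: .common)
    }

    @objc private func tick(_ link: CADisplayLink) {
        if lastTimestamp > 0 {
            print("frame delta: \((link.timestamp - lastTimestamp) * 1000) ms")
        }
        lastTimestamp = link.timestamp
    }
}
```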
It depends on what kind of results you are trying to accomplish. For the NSTimer class, an interval of 0.5-1.0 seconds is a good place to start for reliable results.