When assigning a matrix into a much larger preallocated array, MATLAB somehow duplicates it while 'copying' it, and if the matrix being copied is large enough, memory overflows. This is the sample code:
main_mat = zeros(500,500,2000);
n = 500;
slice_matrix = zeros(500,500,n);
for k = 1:4
    parfor i = 1:n
        slice_matrix(:,:,i) = gather(gpuArray(rand(500,500)));
    end
    main_mat(:,:,1+(k-1)*n:k*n) = slice_matrix; % this is where the memory will likely overflow
end
Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.
EDIT:
The overflow occurs when main_mat is allocated beforehand. If main_mat is initialized with main_mat=zeros(500,500,1); (a smaller size), the overflow does not occur, but everything slows down because the allocation is not done before the matrix is assigned into it. This significantly reduces performance as the range of k increases.
The main issue is that nonzero numbers take more space than zeros.
main_mat=zeros(500,500,2000); takes little RAM, while main_mat = rand(500,500,2000); takes a lot, no matter whether you use the GPU or parfor (in fact, parfor will make you use more RAM). So this is not an unnatural swelling of memory. Following Daniel's link below, it seems that the assignment of zeros only reserves virtual memory, and physical memory is committed only when you actually fill the matrix with numbers. This is managed by the operating system, and it is the expected behavior on Windows, Mac and Linux, whether you do it in MATLAB or in other languages such as C.
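You can see this for yourself on Windows with the memory function (a minimal sketch of the idea; the exact numbers reported depend on your OS and MATLAB version):
m0 = memory;                 % snapshot before allocation
A = zeros(500,500,2000);     % reserves ~3.7 GB of virtual address space
m1 = memory;
A(:) = 1;                    % touching every element commits the physical pages
m2 = memory;
fprintf('after zeros: +%.0f MB, after fill: +%.0f MB\n', ...
    (m1.MemUsedMATLAB-m0.MemUsedMATLAB)/2^20, ...
    (m2.MemUsedMATLAB-m1.MemUsedMATLAB)/2^20);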
Removing parfor will likely fix your problem.
parfor is not useful there. MATLAB's parfor does not use shared-memory parallelism (i.e. it doesn't start new threads) but rather distributed-memory parallelism (it starts new processes). It is designed to distribute work over a set of worker nodes. And although it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within one node.
This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.
See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.
I assume that your code is just sample code and that rand() represents a custom function in your MVE. So here are a few hints and tricks for memory usage in MATLAB.
Here is a snippet from the MathWorks training handbooks:
When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of the values. This behavior, known as copy-on-write, or lazy copying, defers the cost of copying large data sets until the code modifies a value. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be further optimized with (a little) brain power. Here are a few hints regarding memory efficiency (a short sketch illustrating them follows the list):
make use of the native vectorization of MATLAB, e.g. sum(X,2), mean(X,2), std(X,[],2)
make sure that MATLAB does not have to expand matrices (implicit expansion was changed recently); it might be more efficient to use bsxfun
use in-place operations, e.g. x = 2*x+3 rather than y = 2*x+3
...
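A minimal sketch of those hints (all standard MATLAB functions; the matrix X is just example data):
X = rand(1000,500);
rowMeans = mean(X,2);                     % vectorized reduction instead of a row loop
rowStd = std(X,[],2);
centered = bsxfun(@minus, X, rowMeans);   % explicit expansion (implicit since R2016b)
X = 2*X + 3;                              % in-place style: reuse X, no new array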
Be aware that the optimum regarding memory usage is not the same as the optimum regarding computation time. Therefore, you might want to consider reducing the number of workers or refraining from using the parfor-loop at all. (As parfor cannot use shared memory, there is no copy-on-write feature when using the Parallel Toolbox.)
If you want to have a closer look at your memory, what is available and what can be used by MATLAB, check out feature('memstats'). What is interesting for you is the Virtual Memory, which is
Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.
or use this command [user,sys] = memory.
Quick side note: MATLAB stores matrices contiguously in memory, so you need one large block of free RAM for a large matrix. That is also the reason why you want to preallocate variables: growing them dynamically forces MATLAB to copy the entire matrix to a larger spot in RAM every time it outgrows its current spot.
If you really have memory issues, you might want to dig into the art of data types, as is required in lower-level languages. E.g. you can cut your memory usage in half by using single precision directly from the start: main_mat=zeros(500,500,2000,'single'); -- btw, this also works with rand(...,'single') and other native functions -- although a few of the more sophisticated MATLAB functions require input of type double, which you can cast back up again.
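A quick way to verify the halving is whos, which reports the nominal size of each array (smaller example sizes here so it runs anywhere):
Ad = zeros(100,100,100);              % double: 8 bytes per element, ~8e6 bytes
As = zeros(100,100,100,'single');     % single: 4 bytes per element, ~4e6 bytes
whos Ad As                            % compare the reported sizes
B = double(As);                       % cast back up where a function insists on double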
If I understand correctly, your main issue is that parfor does not allow workers to share memory. Think of every parfor worker as almost a separate MATLAB instance.
There is basically just one workaround for this that I know of (though I have never tried it), which is 'shared matrix' on the File Exchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix
More solutions, as others suggested: removing parfor is certainly one; get more RAM; use tall arrays (which spill to the hard drive when RAM runs full); divide operations into smaller chunks; and, last but not least, consider an alternative to MATLAB.
You may use the following code; you actually don't need slice_matrix at all:
main_mat = zeros(500,500,2000);
n = 500;
for k = 1:4
    offset = (k-1)*n;   % precomputed so main_mat remains a sliced variable inside parfor
    parfor i = 1:n
        main_mat(:,:,offset+i) = gather(gpuArray(rand(500,500)));
    end
    % no bulk copy of slice_matrix needed anymore
end
I am new to signal processing. I have the following signals, which I got after some pre-processing of the original signals.
You can see that some of them have similarities with others and some don't, but the problem is that they have various ranges (in this example from 1000 to 3000).
Question
How can I analyze their properties in a scale-free way (by properties I mean statistical properties of the signals, or whatever else is useful)?
Note that I don't want to cross-compare the signals; I just want independent signal signatures that I can run some process on later.
Anything would help.
If you want to make a filter that separates signals that follow this pattern from signals that don't, well, there's tons of things you could do!
Just think practically. As a first shot at it, you could do something like this (in this order):
Check if the signals are all-positive
Check if the first element is close in value to the last element
Check if the maximum lies "in the middle" somewhere
Check if the first value is small, then the signal grows, then shrinks again
Check if the growth rates are gradual. You could for example analyze their derivatives (after smoothing):
a. derivative should be all-positive for a while, then all-negative.
b. derivative should be smooth (no jumps greater than some tolerance)
Without additional knowledge about the signal's nature/origin, it's going to be hard to come up with more meaningful metrics than these...
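For illustration, here is a rough MATLAB sketch of those checks; x is one of your signals, and tolEnds/tolJump are placeholder tolerances you would tune for your data:
xs = movmean(x, 11);                                 % light smoothing first
allPositive = all(xs > 0);                           % check 1
endsMatch = abs(xs(1) - xs(end)) < tolEnds;          % check 2
[~, iMax] = max(xs);
peakInMiddle = iMax > 0.25*numel(xs) && iMax < 0.75*numel(xs);  % check 3
d = diff(xs);                                        % derivative for checks 4 and 5
risesThenFalls = all(d(1:iMax-1) > 0) && all(d(iMax:end) < 0);  % 5a
noJumps = all(abs(d) < tolJump);                     % 5b
looksLikePattern = allPositive && endsMatch && peakInMiddle && risesThenFalls && noJumps;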
I need to implement an anti-windup (output limitation) for my PID controller. Simulink offers two options: back-calculation and clamping (documentation), which seem to deliver equal results. I know what back-calculation does mathematically. It requires defining the back-calculation gain Kb. This gain depends on how long my controller stays saturated, so it is actually a dynamic value (because saturation times may vary a lot). Do you see a way to control this value? (In that case it would probably be necessary to build my own PID controller, as shown in the documentation above or in the picture below.)
Which brings me to the question: what is clamping actually doing, and what are the other differences? Which one is faster, and which one is more robust against stiff slopes? Does anybody have experience using both?
Not sure if this fully answers the question, but the PID Controller documentation page explains a bit more about clamping:
clamping: Stops integration when the sum of the block components exceeds the output limits and the integrator output and block input have the same sign. Resumes integration when the sum of the block components exceeds the output limits and the integrator output and block input have opposite sign. The integrator portion of the block is: [equation omitted in the original]. The clamping circuit implements the logic necessary to determine whether integration continues.
If you select the clamping option and look under the mask, you can probably see the details of the clamping circuit.
In addition to am304's answer, there are some more things to consider.
Clamping
Clamping will always work. It detects when the integrator winds up and, with a simple switch, sets the integral path of the PID controller to zero to avoid the windup.
Clamping is a commonly used anti-windup method, especially in digital control systems. In serious applications, however, forward clamping is also involved, which evaluates the controller input as well. This mechanism must be implemented manually.
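To make the mechanism concrete, here is a minimal discrete-time PI loop with clamping on a toy first-order plant (all gains, limits and the plant model are made-up illustration values, not a tuning recipe):
Kp = 2; Ki = 5; Ts = 0.01;            % PI gains and sample time (made up)
uMin = -1; uMax = 1;                  % actuator limits
r = 1; y = 0; I = 0;                  % setpoint, plant output, integrator state
for kk = 1:500
    e = r - y;
    uUnsat = Kp*e + I;                % unsaturated controller output
    u = min(max(uUnsat, uMin), uMax); % saturation
    % clamp: stop integrating when the output saturates and the error
    % would push it further into saturation (same sign)
    if ~(u ~= uUnsat && sign(e) == sign(uUnsat))
        I = I + Ki*Ts*e;
    end
    y = y + Ts*(-y + u);              % toy first-order plant: dy/dt = -y + u
end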
Back Calculation
Back-calculation highly depends on the back-calculation coefficient Kb. If you don't know how to actually calculate the parameter Kb, don't use back-calculation. This method computes the difference between the actual controller output and the saturated output and subtracts it from the I-gain path, amplified by Kb.
In most cases the default value Kb = 1 will lead to worse results than clamping; it is even possible that it has no effect at all. Kb should be calculated based on the sampling time, or, in case a D-gain is involved, based on the D- and I-gains. Appropriate literature should be consulted to calculate the coefficient. Back-calculation with a properly set coefficient enables better dynamics than clamping!
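For comparison with the clamping sketch above, the back-calculation version simply replaces the clamped integrator update with one that feeds the saturation error back (Kb = 1 here only as the Simulink default, with the caveats described above):
Kb = 1;                                % back-calculation gain -- must be tuned!
I = I + Ts*(Ki*e + Kb*(u - uUnsat));   % saturation error fed back into the integrator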
I am trying to find more information on making a custom PID block in MATLAB. I have most of it done but there are a few parameters that I don't really understand and as such I don't know what value to give them. NOTE I am NOT asking for help tuning PID gains.
They are all inside the filter coefficient block:
When I open the block I have to set a few parameters (output min/max, data type, parameter min/max, etc.). Can someone explain to me what these mean? I can't find good resources anywhere. The only thing I've tried that works is setting each to [] (i.e. -inf) and the input/output data types to 'Inherit: Inherit via internal rule', but then my output goes to hell. If I copy-paste the blocks from the PID block, there are a bunch of variables which I haven't defined anywhere, so the program won't even compile.
Can someone point out some good resources for this or else explain it? Thanks!
You should get your blocks from the standard Simulink library, not from under the PID block mask. The ones under the mask have been set up to use variables that are passed from/through the mask, which you are not doing.
The block you have circled is just a gain block (from the Math library).
You most likely won't need to make any changes to the default settings of the block other than the constant value (which needs to be the N that you want to use in the approximation of the derivative term in your controller).
To answer your specific question about what the parameters are, some of them are used to specify data types (if you don't want to use the default double precision), some are only used in code generation, some others only for other specific tasks.
All of them are described (in more, or sometimes less, detail) in the doc for the block, obtained by pressing the help button on the block's dialog.
I'm writing CUDA code in MATLAB MEX files. When you look at CUDA examples on the internet, or even in manuals from NVIDIA, you often see preprocessor variables used to specify the problem size, e.g. the vector length for a vector addition or something like that. I coded my program the same way: preprocessor variables specifying the problem size. And I have to admit it: I like it, since you can access those everywhere in your code, e.g. as limits in a loop, without having to explicitly pass them to the function as arguments.
But I ran into the following problem: I wanted to benchmark the program for several different problem sizes, and thus I need to recompile the code every time, passing the preprocessor variable to the compiler. It's not a problem; I already coded the benchmark and it works. But I wondered afterwards why I chose this version and did not simply specify the size via user input at runtime. So I'm looking for reasons one might want to use preprocessor variables instead of simply passing the problem size to the program.
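For reference, such a benchmark can be driven from MATLAB roughly like this (mexcuda accepts -D options like mex does; vec_add.cu and the macro name PROBLEM_SIZE are placeholders for your own kernel and define):
for N = [1024 4096 16384]
    mexcuda(sprintf('-DPROBLEM_SIZE=%d', N), 'vec_add.cu');  % recompile with the size baked in
    t = timeit(@() vec_add(rand(N,1,'single'), rand(N,1,'single')));
    fprintf('N = %5d: %.3f ms\n', N, 1e3*t);
end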
Thanks!
When you compile-in problem-size constants in the kernel, then the compiler can make certain classes of optimizations that it can't if the sizes are only known at runtime. Full loop unrolling is an obvious example.
In other cases, for instance shared memory array sizes, it is a lot clearer if the sizes are compiled-in; otherwise you have to pass in the total shared memory size at kernel launch time and break that memory up into the number of shared arrays you need. That works fine, but the code is much clearer if you can just have static declarations, for which you need the compile-time sizes.
The main reason is that, in general, the problem size will be intimately linked to the GPU architecture, e.g. number of threads per block, number of blocks, amount of shared memory per block, number of registers per thread, etc. In general these numbers are all carefully hand-tuned to get the maximum usage of the available resources, and you can't easily change the problem size dynamically while still maintaining optimum performance.