I am solving a complicated equation that involves very large matrices and many operations. It is written in vectorized form and takes a very long time to finish, so I need to show on screen which step has been completed. When using a loop we can include a counter inside it to show which steps are done. For example, multiplying two arrays element by element:
clear;
clc;
a = rand(1,5);
b = rand(1,5);
c = zeros(1,5);          % preallocate the result
for i = 1:5
    c(i) = a(i)*b(i);
    fprintf('%d\n', i);  % report which step is done
end
However, if we use vectorized code to multiply the two arrays, it becomes:
c=a.*b
Is there any way to monitor the progress, so we can show which step is done?
As mentioned here:
There is no built-in functionality to perform this in MATLAB beyond adding debug statements and print-to-screen updates at specific sections of your own code.
Also, something like waitbar is not a solution here, because you want to monitor the progress of a computation carried out internally by MATLAB, not by your own loop.
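One workaround (a sketch, not a built-in feature) is to split the vectorised operation into blocks yourself and print a progress line after each block; the block count below is an arbitrary choice:

nBlocks = 10;                               % arbitrary number of progress updates
edges   = round(linspace(0, numel(a), nBlocks+1));
c       = zeros(size(a));
for k = 1:nBlocks
    idx    = edges(k)+1 : edges(k+1);       % this block's element range
    c(idx) = a(idx) .* b(idx);              % vectorised work on this block only
    fprintf('block %d of %d done\n', k, nBlocks);
end

You trade a little of the vectorisation gain for visibility into how far the computation has progressed.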
I'm using MATLAB to work with some spectrograms. I'm new to this kind of thing and come from more of a CS background than a signals background, so I'm not sure what I'm missing here although it may turn out to be fairly basic.
I'm trying to compute the spectral difference, which I'm pretty sure I understand conceptually. I've got a signal, and I can take its STFT with MATLAB's spectrogram() function. Then I try to loop over the spectrogram for the entire signal and, at each sampled point, compute the difference from the previous point by looping over the powers at each frequency and subtracting. I thought I had the concept down, but when I run it I realize that the values returned when I write "MySpectrogram(n,k)" are complex numbers, or at least look like it: a real part followed by an imaginary part, for example -0.07+0.0061i. I tried to square these results, but after squaring them they still appear as complex numbers. Now I am totally lost. Can someone explain what's happening?
I'm calling s = spectrogram(x,window,noverlap,nfft).
Here's the documentation for spectrogram
If you need to access the power spectrum, use this:
[s,f,t,ps] = spectrogram(x,window,noverlap,f,fs)
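Note that squaring a complex number still gives a complex number; it is the squared magnitude abs(s).^2 that gives the power. A small sketch (assuming x, window, noverlap, nfft and fs are already defined; the last line is only a crude notion of spectral difference):

[s,f,t,ps] = spectrogram(x,window,noverlap,nfft,fs);
P        = abs(s).^2;                 % squared magnitude of the complex STFT values
specDiff = sum(diff(P,1,2),1);        % crude difference between adjacent time frames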
I have a signal that repeats periodically, like the one attached in the figure (the same pattern repeats 4 times). I would like to create a template of this signal as the average of the 4 repetitions. What is the best approach for my problem? I know the answer might be obvious to experts in signal processing; I have tried searching for signal folding techniques but couldn't find anything useful. I am prototyping it in Matlab.
Assuming your signal length is divisible by 4 and each repetition makes up a quarter of it, simply use:
mean(reshape(signal,[],4),2)
reshape puts each repetition into one column, then the mean over all columns is calculated.
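A tiny check of the idea, assuming the signal is a vector whose length is a multiple of 4 and the repetitions line up exactly:

signal   = repmat([1 2 3 4 5], 1, 4) + 0.1*randn(1,20);  % four noisy repetitions
template = mean(reshape(signal,[],4),2);                  % 5x1 averaged template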
I have three big 3D arrays of the same size [41*141*12403], named alpha, beta and ni in the Matlab code below. From them I need to calculate another 3D array of the same size, obtained elementwise from the original matrices through a calculation that combines an infinite sum and a definite integral, using the value of each element. It therefore seems inevitable to have to use several nested loops for this calculation. The code has already been running for several hours(!) and is still in the first iteration of the outer loop (which needs to be performed 41 times! At this rate the program would have to run for more than two years!). I don't know how to optimize the code. Please help!
The code I use:
z_len = size(KELDYSH_PARAM_r_z_t,1); % 41 rows
r_len = size(KELDYSH_PARAM_r_z_t,2); % 141 columns
t_len = size(KELDYSH_PARAM_r_z_t,3); % 12403 slices
sumRes = zeros(z_len,r_len,t_len);
for z_ind = 1:z_len
    z_ind % in order to track the advancement of the calculation
    for r_ind = 1:r_len
        for t_ind = 1:t_len
            sumCurrent = 0;
            sumPrevious = inf;
            s = 0;
            while abs(sumPrevious-sumCurrent) > 1e-6
                kapa = kapa_0 + s; % some scalar
                x_of_w = (beta(z_ind,r_ind,t_ind).*(kapa-ni(z_ind,r_ind,t_ind))).^0.5;
                sumPrevious = sumCurrent;
                sumCurrent = sumCurrent + exp(-alpha(z_ind,r_ind,t_ind).* ...
                    (kapa-ni(z_ind,r_ind,t_ind))).*(x_of_w.^(2*abs(m)+1)/2).* ...
                    w_m_integral(x_of_w,m);
                s = s + 1;
            end
            sumRes(z_ind,r_ind,t_ind) = sumCurrent;
        end
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function res = w_m_integral(x_of_w,m)
    res = quad(@integrandFun,0,1,1e-6);
    function y = integrandFun(t)
        y = exp(-x_of_w^2*t).*t.^(abs(m))./((1-t).^0.5);
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Option 1 - more vectorising
It's a pretty complex model you're working with and not all the terms are explained, but some parts can still be further vectorised. Your alpha, beta and ni matrices are presumably static and precomputed? Your s value is a scalar and kapa could be either, so you can probably precompute the x_of_w matrix all in one go too. This would give you a very slight speedup all on its own, though you'd be spending memory to get it - 71 million points is doable these days but will call for an awful lot of hardware. Doing it once for each of your 41 rows would reduce the burden neatly.
That leaves the integral itself. The quad function doesn't accept vector inputs - it would be a nightmare wouldn't it? - and neither does integral, which MathWorks are recommending you use instead. But if your integration limits are the same in each case then why not do the integral the old-fashioned way? If a closed-form antiderivative exists, compute a matrix of its value at 1, another matrix of its value at 0, and take the difference; otherwise evaluate the integrand on a fixed grid of t values and apply a simple quadrature rule, which can be done for every matrix element at once.
Then you can write a single loop that computes the integral for the whole input space then tests the convergence for all the matrix elements. Make a mask that notes the ones that have not converged and recalculate those with the increased s. Repeat until all have converged (or you hit a threshold for iterations).
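A rough sketch of that masked loop for a single z slice is below. It assumes alphaS, betaS and niS are the 141-by-12403 slices, kapa_0 and m are known scalars, and it replaces w_m_integral with a fixed-grid trapezoidal stand-in Wfun (which handles the 1/sqrt(1-t) endpoint crudely, needs R2016b+ broadcasting or bsxfun, and builds a large intermediate matrix), so treat it as an illustration of the structure rather than a drop-in:

tG   = linspace(0, 1-1e-6, 2000);                % common quadrature grid for every element
Wfun = @(x) trapz(tG, exp(-x(:).^2 * tG) .* (tG.^abs(m)./sqrt(1-tG)), 2);

sumCurr = zeros(size(alphaS));
sumPrev = inf(size(alphaS));
active  = true(size(alphaS));                    % mask of elements still converging
s = 0;
while any(active(:)) && s < 500                  % iteration cap as a safety net
    kapa = kapa_0 + s;
    xw   = sqrt(betaS(active).*(kapa - niS(active)));
    term = exp(-alphaS(active).*(kapa - niS(active))) ...
           .* (xw.^(2*abs(m)+1)/2) .* Wfun(xw);
    sumPrev(active) = sumCurr(active);
    sumCurr(active) = sumCurr(active) + term;
    active = abs(sumPrev - sumCurr) > 1e-6;      % converged elements drop out of the mask
    s = s + 1;
end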
Option 2 - parallelise it
It used to be the case that MATLAB was much faster with vectorised operations than with loops. I can't find a source for it now, but I think I've read that recent releases have become a lot faster with plain for loops too, so depending on the resources you have available you might get better results by parallelising the code you currently have. That will need a bit of refactoring too - the big problems are the overhead of copying data to the workers (which you can reduce by chopping the inputs into chunks and feeding each worker only the relevant one) and parfor's restrictions on certain variables, usually ones which cover the whole space. Again, chopping them up helps.
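A rough sketch of that restructuring, where computeSlice is a hypothetical helper that runs the r/t loops and the convergence sum for one z index (indexing with the loop variable makes alpha, beta and ni sliced inputs, so each worker only receives its own 141-by-12403 slice):

sumRes = zeros(z_len, r_len, t_len);
parfor z_ind = 1:z_len
    sliceResult = computeSlice(squeeze(alpha(z_ind,:,:)), ...
                               squeeze(beta(z_ind,:,:)), ...
                               squeeze(ni(z_ind,:,:)), kapa_0, m);
    sumRes(z_ind,:,:) = reshape(sliceResult, 1, r_len, t_len);
end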
But if you have a 2 year runtime you will need a factor of at least 100 I'm guessing, so that means a cluster! If you're at a university or somewhere where you might be able to get a few days on a 500-core cluster then go for that...
If you can write the integral in a closed form then it might be amenable to GPU computation. Those things can do certain classes of computation very fast but you have to be able to parallelise the job and reduce the actual computation to something basic comprised mainly of addition and multiplication. The CUDA libraries have done a lot of the legwork and matlab has an interface to them so have a read about those.
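As a very rough illustration of what the GPU route looks like (this assumes the Parallel Computing Toolbox, a card with a couple of GB free for the full arrays, and that the integral term has been replaced by something elementwise; kapa_0, s and m as in the original code):

alphaG = gpuArray(alpha);  betaG = gpuArray(beta);  niG = gpuArray(ni);
kapa   = kapa_0 + s;
x_of_w = sqrt(betaG.*(kapa - niG));                        % elementwise, on the GPU
term   = exp(-alphaG.*(kapa - niG)).*x_of_w.^(2*abs(m)+1)/2;
term   = gather(term);                                     % copy back to host memory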
Option 3 - reduce the scope
Finally, if neither of the above results in a sufficient speedup, you may have to reduce the scope of your calculation. Trim the input space as much as you can and perhaps accept a looser convergence threshold. If you know how many iterations you typically need inside the innermost while loop (the one with the s counter in it), check whether relaxing the convergence criterion meaningfully reduces that count, which could speed things up. The profiler can help you see where the time is being spent.
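Running a trimmed version of the code under the profiler is only a couple of lines:

profile on
% ... run a trimmed version of the calculation, e.g. a single z slice ...
profile viewer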
The bottom line, though, is that 71 million points are going to take some time to compute. You can only optimise the computation so far; the odds are that for a problem of this size you will have to throw hardware at it.
I was just researching on the net and came across this; how would one go about computing it?
Question: The Gauss-Jordan method is similar to Gaussian Elimination but creates zeroes also above the pivot (thus no back substitution is needed). Write out the full algorithm in Maple code, always starting with the normalization of the current row, then creating the zeroes. Avoid unnecessary operations.
You can get some hints by looking at some of the procedures in Maple which can compute the reduced row echelon form (RREF).
One of the simplest examples, with not too much cruft at beginning and end, is the gaussjord command in the now deprecated linalg package.
interface(verboseproc=3):
print(linalg[gaussjord]);
Somewhat more obscured by its surrounding code is a version within the LUDecomposition command of the newer LinearAlgebra package. It's a little tricky to see which part of the procedure computes the RREF, and so viewing it is slightly easier if done using the showstat command. For example, using the line numbers in Maple 17,
showstat(LinearAlgebra:-LUDecomposition,228..339);
In the code for LUDecomposition, the key bits are the loops with computation of the Matrix mU (Gaussian elimination to get row echelon form), followed by the loops with further computation of the Matrix mR (further reduction of rows to the right of leading nonzero entry) to get the final RREF. If you just want the RREF then it's not really necessary to split the row reduction into two subtasks like this, and you won't be interested in the mL and mU pieces.
If you reduce whole rows at once then you might try using LinearAlgebra:-RowOperation instead of some inner loops. That command can swap rows, or add a multiple of one row to another, or scale a single row.
You could also search the web for "pseudocode" and "RREF".
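For what it's worth, the overall row-by-row structure the question describes (normalize the current pivot row, then create the zeroes above and below it) looks roughly like this; it is sketched in MATLAB rather than Maple, with a simple partial-pivot row swap added, so you would still have to translate it yourself:

function A = rref_sketch(A)
% Gauss-Jordan reduction sketch: no special handling of rank or tolerances.
[nr, nc] = size(A);
row = 1;
for col = 1:nc
    if row > nr, break; end
    [p, k] = max(abs(A(row:nr, col)));    % pick a pivot in this column
    if p == 0, continue; end              % no pivot here, move to the next column
    k = k + row - 1;
    A([row k], :) = A([k row], :);        % swap the pivot row into place
    A(row, :) = A(row, :) / A(row, col);  % normalize the current row
    for r = [1:row-1, row+1:nr]           % create the zeroes above and below
        A(r, :) = A(r, :) - A(r, col) * A(row, :);
    end
    row = row + 1;
end
end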
I have a parfor loop over, say, 100 iterations, and the workload of every iteration is different, changing linearly so that the first one takes the most time and the last one is the fastest. But when I run the parfor loop with my four instances/labs, during the last few hours only one lab is active, as it is working through the first few iterations on its own.
So I know which iterations are the slow ones. How could I make the workload between the cores more even? For example, could I somehow force all labs to start working on the first four slow iterations and then proceed in order? Or do something similar to prevent a single active core from running the few slow ones alone?
Matlab parfor does nothing more than split up the indices and distribute them to the workers. It does this by creating contiguous chunks from the indices. I don't know the exact algorithm, but this means that data with similar indices get computed in the same chunk and by the same worker.
The simplest solution would be a stochastic one. Just shuffle your indices so that the work-intensive steps are distributed evenly across the workers. While this doesn't give you any guarantees on performance, it is simple and will work most of the time.
Some example code:
% dummy data
N = 10;
data = 1:N;
% generate the permuted indices
permIndex = randperm(N);
% permute the data
dataPermuted = data(permIndex);
% run the loop
parfor i = 1:N
    % do something, e.g. pause for the time specified by data
    pause(dataPermuted(i));
end
% invert the index permutation
dataInversePermuted(permIndex) = dataPermuted;
I used pause to simulate the different computation times.
I don't think this is documented anywhere, but you can quickly deduce that PARFOR runs iterations in reverse loop order (using pause and disp if you want to see it in action). So, you should simply reverse your loop. PARFOR gives you no means to explicitly control execution order, but SPMD using for-drange does (PARFOR is significantly easier to use though).
@denahiro's suggestion is also a good one.
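For example (a sketch; heavyTask is a placeholder for one iteration's work, and iteration 1 is assumed to be the slowest):

N = 100;
results = zeros(1, N);
parfor k = 1:N
    results(k) = heavyTask(N + 1 - k);   % parfor starts from high k, so the heavy low indices run first
end
results = fliplr(results);               % restore the original ordering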