I want to replace the for loops with bsxfun to calculate a convolution in Matlab.
Here is the script:
for Rx = 1:Num_Rx
    for Tx = 1:Num_Tx
        Received(Rx,:) = Received(Rx,:) + conv(squeeze(channel(Rx,Tx,:))', Transmitted(Tx,:));
    end
end
% Received is a Num_Rx by N matrix, Transmitted is a Num_Tx by N matrix and channel is a 3D matrix with dimension Num_Rx, Num_Tx, N.
When I changed the code to:
Received = bsxfun(@plus, Received, bsxfun(@conv, permute(squeeze(channel), [3 1 2]), Transmitted));
an error came out: "Non-singleton dimensions of the two input arrays must match each other."
How could I correct this line? Thanks a lot!
Why do you want to replace the loops with bsxfun? If the sizes involved in the convolution aren't particularly small, the convolution itself is going to dominate the runtime, and the difference between the loops and some vectorized version of this call is going to be minimal.
One option you have, if you can afford the temporary storage and it doesn't mess with your numerics too much, is to use the FFT to do this convolution instead. That would look something like
Transmitted = reshape(Transmitted, [1 Num_Tx size(Transmitted, 2)]);
N = size(Transmitted, 3) + size(channel, 3) - 1;
% bsxfun keeps the element-wise multiply working on releases without
% implicit expansion (pre-R2016b); on newer releases a plain .* also works
Received = ifft(bsxfun(@times, fft(channel, N, 3), fft(Transmitted, N, 3)), N, 3);
Received = squeeze(sum(Received, 2));
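As a sanity check, here is a self-contained sketch (all sizes and the variable names Nsym/Lch are invented for the demo) comparing the original loop against the FFT route on random data:

```matlab
% Made-up sizes for the demo
Num_Rx = 2; Num_Tx = 3; Nsym = 8; Lch = 4;
channel = randn(Num_Rx, Num_Tx, Lch);
Transmitted = randn(Num_Tx, Nsym);
N = Nsym + Lch - 1;                     % full convolution length

% Loop version (preallocated to the full output length)
Received = zeros(Num_Rx, N);
for Rx = 1:Num_Rx
    for Tx = 1:Num_Tx
        Received(Rx,:) = Received(Rx,:) + ...
            conv(squeeze(channel(Rx,Tx,:))', Transmitted(Tx,:));
    end
end

% FFT version
T3 = reshape(Transmitted, [1 Num_Tx Nsym]);
R2 = ifft(bsxfun(@times, fft(channel, N, 3), fft(T3, N, 3)), N, 3);
R2 = squeeze(sum(R2, 2));

max(abs(Received(:) - R2(:)))           % should be tiny (roundoff level)
```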
I have written a function to calculate the entropy of a vector, where each element represents the number of elements of a class.
function x = Entropy(a)
t = sum(a);
t = repmat(t, [1, size(a, 2)]);
x = sum(-a./t .* log2(a./t));
end
e.g: a = [4 0], then entropy = -(0/4)*log2(0/4) - (4/4)*log2(4/4)
But for the above function, the entropy is NaN when the split is pure, because of log2(0), as in the example above. The entropy of a pure split should be zero.
How should I solve this with the least impact on performance, since the data is very large? Thanks
I would suggest you create your own log2 function
function res=mylog2(a)
res=log2(a);
res(isinf(res))=0;
end
This function, while breaking the usual log2 behaviour, works in your specific case because you multiply the result by the argument of the log, which is zero exactly when the log is infinite. It is not "mathematically correct", but I believe that's what you are looking for.
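With that replacement, the entropy function from the question becomes (a sketch, saved e.g. as Entropy.m with mylog2 as a subfunction, using the same variable names):

```matlab
function x = Entropy(a)
    % Entropy of class counts in a, with 0*log2(0) treated as 0
    t = sum(a);
    p = a ./ repmat(t, [1, size(a, 2)]);   % class probabilities
    x = sum(-p .* mylog2(p));
end

function res = mylog2(a)
    % log2 that maps log2(0) = -Inf to 0
    res = log2(a);
    res(isinf(res)) = 0;
end
```

For the pure split a = [4 0] this now returns 0 instead of NaN.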
I have two arrays in MATLAB:
A; % size(A) = [NX NY NZ 3 3]
b; % size(b) = [NX NY NZ 3 1]
In fact, at each point (i, j, k) of the three-dimensional domain I have a [3 3] matrix and a [3 1] vector taken from the above-mentioned arrays A and b, respectively. For the sake of example, call these arrays m and n.
m; % size(m) = [3 3]
n; % size(n) = [3 1]
How can I solve m\n for each point of the domain in a vectorized fashion? I used bsxfun, but I was not successful.
solution = bsxfun( @(A,b) A\b, A, b );
I think the problem is with the expansion of the singleton dimensions, and I don't know how to fix it.
I tried some solutions; it seems that a for loop is actually the fastest possibility in this case.
A naive approach looks like this:
% iterate
C = zeros(size(B));
for a = 1:size(A,1)
    for b = 1:size(A,2)
        for c = 1:size(A,3)
            C(a,b,c,:) = squeeze(A(a,b,c,:,:)) \ squeeze(B(a,b,c,:));
        end
    end
end
The squeeze is expensive in computation time because it requires extra indexing work. Swapping the dimensions instead is faster.
A = permute(A,[4,5,1,2,3]);
B = permute(B,[4,1,2,3]);
C2 = zeros(size(B));
for a = 1:size(A,3)
    for b = 1:size(A,4)
        for c = 1:size(A,5)
            C2(:,a,b,c) = A(:,:,a,b,c) \ B(:,a,b,c);
        end
    end
end
C2 = permute(C2,[2,3,4,1]);
The second solution is about 5 times faster.
Update: I found an improved version. Reshaping and using only one large loop increases the speed again. This version is also suitable for the Parallel Computing Toolbox; in case you own it, replace the for with a parfor and start the workers.
A = permute(A,[4,5,1,2,3]);
B = permute(B,[4,1,2,3]);
% linearize A and B to get better performance
linA = reshape(A,[size(A,1),size(A,2),size(A,3)*size(A,4)*size(A,5)]);
linB = reshape(B,[size(B,1),size(B,2)*size(B,3)*size(B,4)]);
C3 = zeros(size(linB));
for a = 1:size(linA,3)
    C3(:,a) = linA(:,:,a) \ linB(:,a);
end
% undo linearization
C3 = reshape(C3,size(B));
% undo dimension swap
C3 = permute(C3,[2,3,4,1]);
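On newer MATLAB releases there is a built-in page-wise solver that collapses the loop entirely; the sketch below assumes pagemldivide is available (it was introduced around R2022a), so treat the function name as an assumption if you are on an older version:

```matlab
% Demo sizes (invented)
NX = 4; NY = 3; NZ = 2;
A = rand(NX, NY, NZ, 3, 3);
B = rand(NX, NY, NZ, 3, 1);

Ap = permute(A, [4 5 1 2 3]);            % 3 x 3 x NX x NY x NZ
Bp = permute(B, [4 5 1 2 3]);            % 3 x 1 x NX x NY x NZ
C4 = pagemldivide(Ap, Bp);               % solves each 3x3 system page-wise
C4 = permute(squeeze(C4), [2 3 4 1]);    % back to NX x NY x NZ x 3
```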
I need to multiply a matrix A with n matrices, and get n matrices back. For example, multiply a 2x2 matrix with 3 2x2 matrices stacked as a 2x2x3 Matlab array. bsxfun is what I usually use for such situations, but it only applies to element-wise operations.
I could do something like:
blkdiag(a, a, a) * blkdiag(b(:,:,1), b(:,:,2), b(:,:,3))
but I need a solution for an arbitrary n.
You can reshape the stacked matrices. Suppose you have a k-by-k matrix a and a stack sb of m k-by-k matrices, and you want the product a*sb(:,:,ii) for ii = 1..m. Then all you need is
sza = size(a);
b = reshape( sb, sza(2), [] ); % concatenate all matrices along the second dim
res = a * b;
res = reshape( res, sza(1), [], size(sb,3) ); % stack back to 3d
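A quick way to convince yourself the reshape is sound (small invented sizes), comparing against the straightforward loop:

```matlab
a  = rand(2, 2);
sb = rand(2, 2, 3);                        % stack of 3 matrices

sza = size(a);
b   = reshape(sb, sza(2), []);             % 2 x 6
res = reshape(a * b, sza(1), [], size(sb, 3));

% compare against multiplying page by page
err = 0;
for ii = 1:size(sb, 3)
    err = max(err, max(max(abs(res(:,:,ii) - a * sb(:,:,ii)))));
end
err                                        % should be 0
```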
Your solution can be adapted to arbitrary size using comma-separated lists obtained from cell arrays:
[k, m, n] = size(B);
Acell = mat2cell(repmat(A,[1 1 n]), size(A,1), size(A,2), ones(1,n));
Bcell = mat2cell(B, k, m, ones(1,n));
blkdiag(Acell{:}) * blkdiag(Bcell{:});
You could then stack the blocks in a 3D array using this answer, and keep only the relevant ones.
But in this case a good old loop is probably faster:
C = NaN(size(B));
for nn = 1:n
    C(:,:,nn) = A * B(:,:,nn);
end
For large stacks of matrices and/or vectors over which to execute matrix multiplication, speed can start becoming an issue. To avoid re-inventing the wheel, you could simply compile and use the following fast MEX code:
MTIMESX (available on the MATLAB Central File Exchange).
As a rule of thumb, MATLAB is often quite inefficient at executing for loops over large numbers of operations which look like they should be vectorizable; I cannot think of a straightforward way of generalising Shai's answer to this case.
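For what it's worth, newer MATLAB releases (R2020b and later, if I remember correctly) ship pagemtimes, which handles the arbitrary-n case directly without MEX code; a minimal sketch with invented data:

```matlab
A = rand(2, 2);
B = rand(2, 2, 3);
C = pagemtimes(A, B);        % C(:,:,ii) equals A * B(:,:,ii)
```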
I want to make the code below more efficient time-wise, preferably without a loop.
arguments:
t % time values vector
t_index = c % one of the possible indices, ranging over 1:length(t)
A % an MxN array where M = length(t)
B % a 1xN array
code:
m = 1;
for k = t_index:length(t)
    A(k,1:(end-m+1)) = A(k,1:(end-m+1)) + B(m:end);
    m = m + 1;
end
Many thanks.
I'd build from B a matrix the same size as A (call it B2), with zeros in the right places and a triangular form according to the conditions; then all you need to do is A+B2.
Something like this:
N = size(A,2);
B2 = zeros(size(A));
k = c:length(t);
B2(k(1):k(N),:) = hankel(B);
result = A + B2;
Note that the fact that it is "vectorized" doesn't mean it is faster these days: Matlab's JIT makes for loops comparable to, and sometimes faster than, built-in vectorized options.
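For completeness, a self-contained check (sizes invented; it assumes length(t) - c + 1 >= N, as the answer does) that the hankel construction reproduces the loop:

```matlab
M = 6; N = 3; c = 2;
t = 1:M;                     % dummy time vector
A = rand(M, N);
B = rand(1, N);

% original loop
A1 = A;
m = 1;
for k = c:length(t)
    A1(k,1:(end-m+1)) = A1(k,1:(end-m+1)) + B(m:end);
    m = m + 1;
end

% hankel construction: row i of hankel(B) is B(i:end) padded with zeros
B2 = zeros(size(A));
B2(c:c+N-1, :) = hankel(B);
A2 = A + B2;

max(abs(A1(:) - A2(:)))      % 0: both add the same shifted copies of B
```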
I'm computing a function f(x) = exp(-x) in Matlab, where x is a vector of scalars. The function is computed on GPU, e.g.
x_cpu = [4 5 11 1];
x = gpuArray(x_cpu);
f = exp(-x);
then the result would be:
f = exp(-[4, 5, 11, 1]) = [0.0183, 0.0067, 1.6702e-005, 0.3679].
Note that f(x(3)) = f(11) = exp(-11) = 1.6702e-005 = 0.000016702, which is a pretty small value. So, I would like to avoid computing the function for all x(i) > 10 by simply setting f(x(i)) = 0.
I can probably use the sparse matrix representation for x. However, the Parallel Computing Toolbox does not support operations on sparse matrices on GPU.
How would you approach this?
While the Parallel Computing Toolbox does not support sparse matrix operations on the GPU, Jacket does. So one possible approach is to simply use the different tool.
Disclaimer: I work on Jacket, but I really do think it would be beneficial to you here, since it supports the things you want to do that PCT does not.
PLEASE NOTE: This approach is a workaround meant to address the statement in the question:
So, I would like to avoid computing the function for all x(i) > 10 by
simply setting f(x(i)) = 0.
In no way is this a truly "sparse" numerical method. This is simply a means to "avoid computing the function for all x(i) > 10" on the GPU in MATLAB
% original input vector
x_cpu = [4 5 10 1 13 8 9];
% logical indices of x where exp(-x) is significant
ix = x_cpu <= 10;
% values of x where exp(-x) is significant ("sparse" x)
x_sp = x_cpu(ix);
% Load our "sparse" vector to GPU
x_gpu = gpuArray(x_sp);
% create a vector of zeros for function output on GPU
f_gpu = parallel.gpu.GPUArray.zeros(size(x_cpu));
% do the calculations only for the "sparse" matrix on the GPU
f_gpu(ix) = exp(-x_gpu);
For when you want to get your computations back in the workspace, use gather:
f_cpu = gather(f_gpu); % GPU --> workspace
NOTE: I have not tested this code
You could combine some of these initializations (x_sp or ix, maybe) to conserve memory and speed things up. Honestly, the initializations and the transfer of data between the workspace and the GPU might make the whole process slower than before. Nothing left to do but try it!
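The masking idea itself can be checked on the CPU without any GPU hardware; a minimal sketch of the same logic:

```matlab
x = [4 5 10 1 13 8 9];
ix = x <= 10;                % where exp(-x) is still significant
f = zeros(size(x));
f(ix) = exp(-x(ix));         % compute only the significant entries
% entries with x > 10 stay exactly 0
```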