I need to operate on big 3-dimensional dense matrices in MATLAB. Pure vectorization gives a high computation time, so I tried splitting the operations into 10 blocks and then combining the results.
I was surprised to see that pure vectorization does not scale well with the data size, as presented in the following figure.
I include an example of the two approaches:
% Parameters:
M = 1e6; N = 50; L = 4; K = 10;
% Method 1: Pure vectorization
mat1 = randi(L,[M,N,L]);
mat2 = repmat(permute(1:L,[3 1 2]),M,N);
result1 = nnz(mat1>mat2)./(M+N+L);
% Method 2: Split computations
result2 = 0;
for ii = 1:K
    mat1 = randi(L, [M/K, N, L]);
    mat2 = repmat(permute(1:L, [3 1 2]), M/K, N);
    result2 = result2 + nnz(mat1 > mat2);
end
result2 = result2/(M+N+L);
Hence, I wonder if there is any other approach that makes big matrix operations in Matlab more efficient. I know it is a quite broad question, but I will take the risk :)
Edit:
Using the implementation of @Shai:
% Method 3
mat3 = randi(L,[M,N,L]);
result3 = nnz(bsxfun(@gt, mat3, permute(1:L, [3 1 2])))./(M+N+L);
The times are:
Why repmat and not bsxfun?
result = nnz(bsxfun(@gt, mat1, permute(1:L, [3 1 2])))./(M+N+L);
It seems like you are using up your RAM and the OS starts to allocate room in swap for the very large matrices. Memory swapping is always a very time-consuming operation, and it gets worse as the amount of memory you require increases.
I believe you are witnessing thrashing.
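If that is the case, one way to ease the memory pressure (a minimal sketch of my own, not from the original answer) is to avoid materialising the second M-by-N-by-L array altogether and compare one page at a time:
% Sketch: count elements greater than their page index, page by page,
% so no second M-by-N-by-L array is ever allocated (assumes mat1 from above).
count = 0;
for l = 1:L
    count = count + nnz(mat1(:,:,l) > l); % page l is compared against the scalar l
end
result = count / (M+N+L); % same normalisation as in the question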
Related
I have 2 big arrays A and b:
A: 10,000+ rows, 4 columns, non-unique integers
b: vector with 500,000+ elements, unique integers
Due to the uniqueness of the values in b, for each element A(i,j) I need to find the single index into b where A(i,j) == b(index).
What I started with is
[rows,columns] = size(A);
B = zeros(rows,columns);
for i = 1 : rows
    for j = 1 : columns
        B(i,j) = find(A(i,j)==b, 1);
    end
end
This takes approx 5.5 seconds to compute, which is way too long, since A and b can be significantly bigger. With that in mind, I tried to speed up the code by using logical indexing and reducing the for-loops:
[rows,columns] = size(A);
B = zeros(rows,columns);
for idx = 1 : numel(b)
    B(A==b(idx)) = idx;
end
Sadly, this takes even longer: 21 seconds.
I even tried to use bsxfun:
for i = 1 : columns
    [I,J] = find(bsxfun(@eq, A(:,i), b))
    ... stitch B together ...
end
but with bigger arrays the maximum array size is quickly exceeded (102.9 GB...).
Can you help me find a faster solution to this? Thanks in advance!
EDIT: I added the second argument in find(A(i,j)==b,1), which speeds up the algorithm by a factor of 2! Thank you, but overall it is still too slow... ;)
The function ismember is the right tool for this:
[~,B] = ismember(A,b);
Test code:
function so
    A = rand(1000,4);
    b = unique([A(:); rand(2000,1)]);
    B1 = op1(A,b);
    B2 = op2(A,b);
    isequal(B1,B2)
    tic; op1(A,b); op1(A,b); op1(A,b); op1(A,b); toc
    tic; op2(A,b); op2(A,b); op2(A,b); op2(A,b); toc
end
function B = op1(A,b)
    B = zeros(size(A));
    for i = 1:numel(A)
        B(i) = find(A(i)==b, 1);
    end
end
function B = op2(A,b)
    [~,B] = ismember(A,b);
end
I ran this on Octave, which is not as fast with loops as MATLAB. It also doesn't have the timeit function, hence the crappy timing using tic/toc (sorry for that). In Octave, op2 is more than 100 times faster than op1. Timings will be different in MATLAB, but ismember should still be the fastest option. (Note that I also replaced your double loop with a single loop over numel(A); it does the same thing but is simpler and probably faster.)
If you want to repeatedly do the search in b, it is worthwhile to sort b first, and implement your own binary search. This will avoid the checks and sorting that ismember does. See this other question.
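For reference, a scalar binary search over the pre-sorted vector could look like the sketch below (my own illustration; the name bsearch is hypothetical). Sort once with [bs, p] = sort(b); then p(bsearch(bs, x)) recovers the index into the original b.
function idx = bsearch(bs, x)
    % Binary search in the sorted vector bs; returns 0 if x is absent.
    lo = 1; hi = numel(bs);
    idx = 0;
    while lo <= hi
        mid = floor((lo + hi) / 2);
        if bs(mid) == x
            idx = mid; return
        elseif bs(mid) < x
            lo = mid + 1;
        else
            hi = mid - 1;
        end
    end
end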
Assuming that you have positive integers, you can use array indexing:
mm = max(max(A(:)),max(b(:)));
idxs = sparse(b,1,1:numel(b),mm,1);
result = full(idxs(A));
If the range of values is small, you can use a dense array instead of a sparse one:
mm = max(max(A(:)),max(b(:)));
idx = zeros(mm,1);
idx(b)=1:numel(b);
result = idx(A);
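As a tiny worked example of the dense variant (values are my own, for illustration):
A = [3 1; 2 5];
b = [5 3 1 2];          % unique positive integers
idx = zeros(max([A(:); b(:)]), 1);
idx(b) = 1:numel(b);    % idx(v) = position of the value v in b
result = idx(A)         % returns [2 3; 4 1]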
So, I need to vectorize some for loops into a single line. I understand how to vectorize one and two for-loops, but am really struggling to do more than that. Essentially, I am computing a "blur" matrix M2 of size (n-2)x(m-2) from an original matrix M of size nxm, where s = size(M):
for x = 0:1
    for y = 0:1
        m = zeros(1, 9);
        k = 1;
        for i = 1:(s(1) - 1)
            for j = 1:(s(2) - 1)
                m(1, k) = M(i+x, j+y);
                k = k + 1;
            end
        end
        M2(x+1, y+1) = mean(m);
    end
end
This is the closest I've gotten:
for x = 0:1
    for y = 0:1
        M2(x+1, y+1) = mean(mean(M((x+1):(3+x), (y+1):(3+y))))
    end
end
To get any closer to a one-line solution, it seems like there has to be some kind of "communication" where I assign two variables (x,y) to index over M2 and index over M; I just don't see how it can be done otherwise, but I am assured there is a solution.
Is there a reason why you are not using MATLAB's convolution function to help you do this? You are performing a blur with a 3 x 3 averaging kernel with overlapping neighbourhoods. This is exactly what convolution does. You can perform this using conv2:
M2 = conv2(M, ones(3) / 9, 'valid');
The 'valid' flag ensures that you return a size(M) - 2 matrix in both dimensions as you have requested.
In your code, you have hardcoded this for a 4 x 4 matrix. To double-check that we get the right results, let's generate a random 4 x 4 matrix:
rng(123);
M = rand(4, 4);
s = size(M);
If we run this with your code, we get:
>> M2
M2 =
0.5054 0.4707
0.5130 0.5276
Doing this with conv2:
>> M2 = conv2(M, ones(3) / 9, 'valid')
M2 =
0.5054 0.4707
0.5130 0.5276
However, if you want to do this from first principles, the overlapping pixel neighbourhoods are very difficult to escape using loops. The two-for-loop approach you have is good enough and tackles the problem appropriately. However, I would make the code work for any input size instead of hardcoding it. Therefore, write a function that does something like this:
function M2 = blur_fp(M)
    s = size(M);
    M2 = zeros(s(1) - 2, s(2) - 2);
    for ii = 2 : s(1) - 1
        for jj = 2 : s(2) - 1
            p = M(ii - 1 : ii + 1, jj - 1 : jj + 1);
            M2(ii - 1, jj - 1) = mean(p(:));
        end
    end
end
The first line of code defines the function, which we will call blur_fp. The next couple of lines determine the size of the input matrix and initialise a blank matrix to store our output. We then loop through each pixel location where the kernel fits without going outside the boundaries of the image; for each location we grab the 3 x 3 neighbourhood centred on it, unroll it into a single column vector, compute the average and store it in the corresponding output element. For small kernels and relatively large matrices, this should perform OK.
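As a quick sanity check (my own sketch, assuming blur_fp above is saved on the path), the loop version should agree with conv2 up to floating-point rounding:
rng(123);
M = rand(6, 5);
d = max(max(abs(blur_fp(M) - conv2(M, ones(3)/9, 'valid'))));
fprintf('max abs difference: %g\n', d); % expect something on the order of 1e-16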
To take this a little further, you can use user Divakar's im2col_sliding function, which takes overlapping neighbourhoods and unrolls them into columns. Each column then represents a neighbourhood, so you can blur the input with a single vector-matrix multiplication and use reshape to turn the result back into a matrix:
T = im2col_sliding(M, [3 3]);
V = ones(1, 9) / 9;
s = size(M);
M2 = reshape(V * T, s(1) - 2, s(2) - 2);
This unfortunately cannot be done in a single line unless you use built-in functions. I'm not sure what your intention is, but hopefully the range of approaches you have seen here has given you some insight into how to do this efficiently. BTW, using loops for small matrices (i.e. 4 x 4) may actually be more efficient. You will start to notice performance changes when you increase the size of the input... then again, I would argue that using loops is competitive as of R2015b, when the JIT was significantly improved.
I'm using bsxfun to vectorize an operation with singleton expansion between matrices of sizes:
MS: (nms, nls)
KS: (nks, nls)
The operation is, for each MS(m,l) with m in 1:nms and l in 1:nls, the sum over k in 1:nks of the absolute differences |MS(m,l) - KS(k,l)|.
I achieve this through the code:
[~, nls] = size(MS);
MS = reshape(MS',1,nls,[]);
R = sum(abs(bsxfun(@minus, MS, KS)));
R is of size (nls, nms).
I want to generalize this operation to a list of samples, so the new sizes will be:
MS: (nxs, nls, nms)
KS: (nxs, nls, nks)
This can be achieved easily with a for loop that executes the first piece of code for each pair of 2-dimensional matrices, but I suspect performance may be much better if the previous code is generalized by adding a new dimension.
R would be of size (nxs, nls, nms).
I have tried to reshape MS to 4 dimensions with no success. Could this be done with reshaping and bsxfun?
You might need this:
% generate small dummy data
nxs = 2;
nls = 3;
nms = 4;
nks = 5;
MS = rand(nxs, nls, nms);
KS = rand(nxs, nls, nks);
R = sum(abs(bsxfun(@minus, MS, permute(KS, [1,2,4,3]))), 4)
This will produce an array of size [2,3,4], i.e. [nxs,nls,nms]. Each element [k1,k2,k3] will correspond to
R(k1,k2,k3) == sum_k abs(MS(k1,k2,k3) - KS(k1,k2,k))
For instance, in my random run
R(2,1,3)
ans =
1.255765020150647
>> sum(abs(MS(2,1,3)-KS(2,1,:)))
ans =
1.255765020150647
The trick is to introduce singleton dimensions with permute: permute(KS,[1,2,4,3]) is of size [nxs,nls,1,nks], while MS of size [nxs,nls,nms] is implicitly also of size [nxs,nls,nms,1]: every array in MATLAB is assumed to possess a countably infinite number of trailing singleton dimensions. From here it's easy to see how you can bsxfun together arrays of size [nxs,nls,nms,1] and [nxs,nls,1,nks], respectively, to obtain one with size [nxs,nls,nms,nks]. Summing along dimension 4 seals the deal.
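You can verify the trailing-singleton rule directly at the prompt:
size(rand(2,3), 5) % ans = 1: querying a dimension beyond ndims returns 1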
I noted in a comment that it might be faster to permute the summing index into first place. It turns out that this by itself makes the code run slower. However, by reshaping the arrays to have decreasing dimension sizes, the overall performance increases (due to better memory access). Compare this:
% generate larger dummy data
nxs = 20;
nls = 30;
nms = 40;
nks = 500;
MS = rand(nxs, nls, nms);
KS = rand(nxs, nls, nks);
MS2 = permute(MS,[4 3 2 1]);
KS2 = permute(KS,[3 4 2 1]);
R3 = permute(squeeze(sum(abs(bsxfun(#minus,MS2,KS2)),1)),[3 2 1]);
What I did was put the summing nks dimension into first place, and order the rest of the dimensions in decreasing order. This could be done automatically, I just didn't want to overcomplicate the example. In your use case you'll probably know the magnitude of the dimensions anyway.
Runtimes with the above two codes: 0.07028 s for the original, 0.051162 s for the reordered one (best out of 5). Larger examples don't fit into memory for me now, unfortunately.
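If you have MATLAB's timeit available, a more robust comparison could look like this sketch (reusing the arrays defined above; the handle names f1 and f2 are mine):
f1 = @() sum(abs(bsxfun(@minus, MS, permute(KS, [1,2,4,3]))), 4);
f2 = @() permute(squeeze(sum(abs(bsxfun(@minus, MS2, KS2)), 1)), [3 2 1]);
fprintf('original: %.5f s, reordered: %.5f s\n', timeit(f1), timeit(f2));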
My goal is to combine many sparse matrices together to form one large sparse matrix. The only two ideas I've been able to think of are (1) create a large sparse matrix and overwrite certain blocks, or (2) create the blocks individually and use vertcat to form my final sparse matrix. However, I've read that overwriting sparse matrices is quite inefficient, and I've also read that vertcat isn't exactly computationally efficient. (I didn't bother to consider using a for loop because of how inefficient they are.)
What other alternatives do I have then?
Edit: By combine I mean "gluing" matrices together (vertically); the elements don't interact.
According to the MATLAB help, you can "disassemble" a sparse matrix with
[i,j,s] = find(S);
This means that if you have two matrices S and T, and you want to (effectively) vertcat them, you can do
[is, js, ss] = find(S);
[it, jt, st] = find(T);
ST = sparse([is; it + size(S,1)], [js; jt], [ss; st]);
Not sure if this is very efficient... but I'm guessing it's not too bad.
EDIT: using a 2000x1000 sparse matrix with a density of 1%, and combining it with another that has a density of 2%, the above code ran in 0.016 seconds on my machine. Just doing [S;T] was 10x faster. What makes you think vertical concatenation is slow?
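For anyone who wants to reproduce that comparison, a rough sketch (sizes as stated above; exact timings will of course vary by machine):
S = sprandn(2000, 1000, 0.01);
T = sprandn(2000, 1000, 0.02);
tic
[is, js, ss] = find(S);
[it, jt, st] = find(T);
ST1 = sparse([is; it + size(S,1)], [js; jt], [ss; st], size(S,1) + size(T,1), size(S,2));
toc
tic; ST2 = [S; T]; toc
isequal(ST1, ST2) % both constructions give the same matrix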
EDIT2: assuming you need to do this with "many" sparse matrices, the following works (this assumes you want them all "in the same place"):
m = 1000; n = 2000; density = 0.01;
N = 100;
Q = cell(1, N);
is = Q;
js = Q;
ss = Q;
numrows = 0; % keep track of dimensions so far
for ii = 1:N
    Q{ii} = sprandn(m+ii, n-ii, density); % so each matrix has a different size
    [a, b, c] = find(Q{ii});
    sz = size(Q{ii});
    is{ii} = a' + numrows; js{ii} = b'; ss{ii} = c'; % append "on the corner"
    numrows = numrows + sz(1); % keep track of the size
end
tic
ST = sparse([is{:}], [js{:}], [ss{:}]);
fprintf(1, 'using find takes %.2f sec\n', toc);
Output:
using find takes 0.63 sec
The big advantage of this method is that you don't need to have the same number of columns in your individual sparse arrays... it will all get sorted out by the sparse command, which will simply consider the missing columns to be all zeros.
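As a tiny illustration of that point (my own example, not from the original answer):
S = sparse([1 0 2; 0 3 0]); % 2-by-3
T = sparse([0 4; 5 0]);     % 2-by-2: one column fewer
[is, js, ss] = find(S);
[it, jt, st] = find(T);
ST = sparse([is; it + size(S,1)], [js; jt], [ss; st]);
full(ST) % 4-by-3: T's missing third column comes out as all zeros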
Considering the answer already given, I changed the experiment a bit so that the matrices can be joined vertically (they must all share the same width), so there is no need to shrink n by ii.
This approach
tic
ST = sparse([is{:}], [js{:}], [ss{:}]);
fprintf(1, 'using find takes %.2f sec\n', toc);
with its 0.45 sec is much slower than this one:
tic
ST = vertcat(Q{:});
fprintf(1, 'using vertcat takes %.2f sec\n', toc);
with its 0.18 sec average.
I also checked it with the profiler; the first example is expectedly slower, since its memory allocation is at least 100 times higher, most probably because of the [ss{:}] array constructions, which explicitly copy the data into a new array.
However, even with precomputed vectors the speed is 0.3 sec vs 0.18 sec for vertcat.
Thus, I suggest that vertcat is the better option for the original problem. At least in 2021 :)
This question already has answers here:
Octave / Matlab: Extend a vector making it repeat itself?
(3 answers)
Closed 9 years ago.
I have a vector, e.g.
vector = [1 2 3]
I would like to duplicate it within itself n times, i.e. if n = 3, it would end up as:
vector = [1 2 3 1 2 3 1 2 3]
How can I achieve this for any value of n? I know I could do the following:
newvector = vector;
for i = 1 : n-1
    newvector = [newvector vector];
end
This seems a little cumbersome though. Any more efficient methods?
Try
repmat([1 2 3],1,3)
I'll leave you to check the documentation for repmat.
This is a Faster Method Than repmat or reshape by an Order of Magnitude
One of the best methods for doing such things is using Tony's Trick. repmat and reshape are usually found to be slower than Tony's Trick, as it directly uses MATLAB's inherent indexing. To answer your question:
Let's say you want to tile the row vector r = [1 2 3] N times like r = [1 2 3 1 2 3 1 2 3 ...]; then:
c = r';
cc = c(:,ones(N,1));
r_tiled = cc(:)';
This method has significant time savings against reshape or repmat for large N's.
EDIT: Reply to @Li-aung Yip's doubts
I conducted a small MATLAB test to check the speed differential between repmat and Tony's Trick. Using the code below, I measured the times for constructing the same tiled vector from a base vector A = 1:N. The results show that yes, Tony's Trick is FASTER BY AN ORDER OF MAGNITUDE, especially for larger N. People are welcome to try it themselves. This much of a time differential can be critical if such an operation has to be performed in loops. Here is the small script I used:
N = 10; % also try N = 100, 1000, 10000
% time for tony_trick
tic;
A=(1:N)';
B=A(:,ones(N,1));
C=B(:)';
t_tony=toc;
clearvars -except t_tony N
% time for repmat
tic;
A=(1:N);
B=repmat(A,1,N);
t_repmat=toc;
clearvars -except t_tony t_repmat N
The times (in seconds) for both methods are given below:
N=10, time_repmat = 8e-5 , time_tony = 3e-5
N=100, time_repmat = 2.9e-4 , time_tony = 6e-5
N=1000, time_repmat = 0.0302 , time_tony = 0.0058
N=10000, time_repmat = 2.9199 , time_tony = 0.5292
My RAM didn't permit me to go beyond N=10000. I am sure the time difference between the two methods will be even more significant for N=100000. I know these times might differ between machines, but the relative order-of-magnitude difference will stand. Also, I know the average of several runs would have been a better metric, but I just wanted to show the order-of-magnitude difference in time consumption between the two approaches. My machine/OS details are given below:
Relevant machine/OS/MATLAB details: Athlon i686 arch, Ubuntu 11.04 32-bit, 3 GB RAM, MATLAB 2011b
Based on Abhinav's answer and some tests, I wrote a function which is ALWAYS faster than repmat()!
It uses the same parameters, except that the first parameter must be a vector, not a matrix.
function vec = repvec( vec, rows, cols )
%REPVEC Replicates a vector.
%   Replicates a vector rows times in dim1 and cols times in dim2.
%   Auto optimization included.
%   Faster than repmat()!!!
%
%   Copyright 2012 by Marcel Schnirring
if ~isscalar(rows) || ~isscalar(cols)
    error('Rows and cols must be scalar')
end
if rows == 1 && cols == 1
    return % no modification needed
end
% check parameters
if size(vec,1) ~= 1 && size(vec,2) ~= 1
    error('First parameter must be a vector but is a matrix or array')
end
% check type of vector (row/column vector)
if size(vec,1) == 1
    % set flag
    isrowvec = 1;
    % swap rows and cols
    tmp = rows;
    rows = cols;
    cols = tmp;
else
    % set flag
    isrowvec = 0;
end
% optimize code -> choose version
if rows == 1
    version = 2;
else
    version = 1;
end
% run replication
if version == 1
    if isrowvec
        % transform vector
        vec = vec';
    end
    % replicate rows
    if rows > 1
        cc = vec(:,ones(1,rows));
        vec = cc(:);
        %indices = 1:length(vec);
        %c = indices';
        %cc = c(:,ones(rows,1));
        %indices = cc(:);
        %vec = vec(indices);
    end
    % replicate columns
    if cols > 1
        %vec = vec(:,ones(1,cols));
        indices = (1:length(vec))';
        indices = indices(:,ones(1,cols));
        vec = vec(indices);
    end
    if isrowvec
        % transform vector back
        vec = vec';
    end
elseif version == 2
    % calculate indices
    indices = (1:length(vec))';
    % replicate rows
    if rows > 1
        c = indices(:,ones(rows,1));
        indices = c(:);
    end
    % replicate columns
    if cols > 1
        indices = indices(:,ones(1,cols));
    end
    % transform index when row vector
    if isrowvec
        indices = indices';
    end
    % get vector based on indices
    vec = vec(indices);
end
end
Feel free to test the function with all your data and give me feedback. If you find a way to improve it further, please tell me.
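A hypothetical usage example (my own, not part of the original answer):
v = [1 2 3];
repvec(v, 1, 3)  % -> [1 2 3 1 2 3 1 2 3], same as repmat(v, 1, 3)
repvec(v', 2, 1) % -> [1;2;3;1;2;3], same as repmat(v', 2, 1)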