Is there a fast way to remove rows and columns from a large matrix in MATLAB?
I have a very large (square) distance matrix from which I want to remove a number of rows/columns.
Naively:
s = 12000;
D = rand(s);
cols = sort(randsample(s,2))
rows = sort(randsample(s,2))
A = D;
tic
A(rows,:) = [];
A(:,cols) = [];
toc
% Elapsed time is 54.982124 seconds.
This is terribly slow though.
Oddly, this is the fastest solution suggested at the bottom here.
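An improvement can be made by indexing with logical masks instead of deleting (the preallocation below is not actually needed, because A = D(r,c) creates a new array anyway):
A = zeros(size(D) - [numel(rows) numel(cols)]); % preallocation (overwritten by A = D(r,c) below)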
r = true(size(D,1),1);
c = true(size(D,2),1);
r(rows) = false;
c(cols) = false;
tic
A = D(r,c);
toc
% Elapsed time is 20.083072 seconds.
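A toy-size sanity check (s = 50 instead of 12000) that the logical-mask version returns exactly the same matrix as deleting rows and columns. It uses randperm(s, 2) in place of randsample so it runs without the Statistics Toolbox; the sizes are arbitrary choices for illustration.

```matlab
% Toy-size check that deletion and logical-mask indexing agree
s = 50;
D = rand(s);
rows = sort(randperm(s, 2));   % 2 unique random indices in 1..s
cols = sort(randperm(s, 2));

% Deletion approach (as in the question)
A_del = D;
A_del(rows, :) = [];
A_del(:, cols) = [];

% Logical-mask approach: mark what to keep, then index once
r = true(s, 1);  r(rows) = false;
c = true(s, 1);  c(cols) = false;
A_mask = D(r, c);
```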
Is there still a faster way to do this?
It seems like a memory bottleneck. On my feeble laptop, breaking D into pieces and applying these operations to each piece was much faster (using s = 12,000 crashed my computer, so I used s = 8000 below). Here I break it into two pieces, but you can probably find a better partition.
s = 8000;
D = rand(s);
D1 = D(1:s/2,:);
D2 = D((s/2 + 1):end,:);
cols = sort(randsample(s,2));
rows = sort(randsample(s,2));
A1 = D1;
A2 = D2;
tic
A1(rows(rows <= s/2),:) = [];
A2(rows(rows > s/2) - s/2,:) = [];
A1(:,cols) = [];
A2(:,cols) = [];
toc
A = D;
tic
A(rows,:) = [];
A(:,cols) = [];
toc
Elapsed time is 2.317080 seconds.   (split into two halves; excludes splitting D and reassembling [A1; A2])
Elapsed time is 140.771632 seconds.   (whole matrix)
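One way to try other partitions is to loop over row blocks, keep the surviving rows and columns of each block, and concatenate at the end. This is a hypothetical generalisation (nBlocks, edges and blocks are made-up names), shown at toy size and checked against plain deletion. Note that it indexes D directly, so D itself stays in memory; any memory benefit depends on the blocks being held separately, as D1 and D2 are above.

```matlab
% Hypothetical generalisation: process D in nBlocks row blocks, then reassemble
s = 60; nBlocks = 3;                      % toy sizes; s need not divide evenly
D = rand(s);
rows = sort(randperm(s, 4));
cols = sort(randperm(s, 4));

keepCols = true(1, s); keepCols(cols) = false;
edges = round(linspace(0, s, nBlocks + 1)); % block boundaries, e.g. [0 20 40 60]
blocks = cell(nBlocks, 1);
for b = 1:nBlocks
    idx = (edges(b) + 1):edges(b + 1);    % global row indices of this block
    keepRows = ~ismember(idx, rows);      % rows of this block that survive
    blocks{b} = D(idx(keepRows), keepCols);
end
A = vertcat(blocks{:});

% Reference result by plain deletion
A_ref = D; A_ref(rows, :) = []; A_ref(:, cols) = [];
```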
I think it will depend on your usage, but I have two ideas:
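1. Make it a sparse matrix. The more you're removing, the better this option will probably be.
2. Why do you need to remove the values at all? Could you instead index only the part you need, and then free the original?
A = D(setdiff(1:s, rows), setdiff(1:s, cols)); % keep everything except rows/cols
clear D; % free the memory held by D
% Use A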
Related
I'm trying to generate a new matrix, based on index values stored in another matrix.
This is trivial to do with a for loop, but this is currently the slowest line in some code I'm trying to optimise, and so I'm looking for a way to do it without the loop, and pulling my hair out.
I'm sure this has been answered before, and that I just don't know the right search terms.
n1 = 10;
n2 = 100;
a = randi(n2,[1,n1]);
b = randi(n2,[4,n1]);
c = rand(100,100);
for i = 1:n1
d(:,i) = c(a(i),b(:,i));
end
I'm assuming the value of n1 in your code is way bigger than in the example you provide, which would explain why it is "slow".
In order to do this without a loop, you can use linear indexing:
n1 = 1e6;
n2 = 100;
a = randi(n2,[1,n1]);
b = randi(n2,[4,n1]);
c = rand(n2,n2);
% With a loop
d = zeros(4,n1);
tic
for i = 1:n1
d(:,i) = c(a(i),b(:,i));
end
toc
% A faster way for big values of `n1`
d2 = zeros(4,n1);
tic
a_rep = repmat(a,4,1); % Repeat row indexes to match the elements in b
idx_Lin = sub2ind([n2,n2],a_rep(:),b(:)); % Get linear indexes
d2(:) = c(idx_Lin); % Fill
toc
isequal(d,d2)
Elapsed time is 1.309654 seconds.
Elapsed time is 0.062549 seconds.
ans =
logical
1
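sub2ind just computes row + (col - 1)*nRows, so the same linear index can also be built directly, with bsxfun expanding the 1-by-n1 row indices a across the 4 rows of b (no repmat needed). A sketch at a smaller n1, checked against the loop; whether it beats sub2ind on your machine is something to time yourself.

```matlab
n1 = 1000; n2 = 100;
a = randi(n2, [1, n1]);
b = randi(n2, [4, n1]);
c = rand(n2, n2);

% Loop reference
d = zeros(4, n1);
for i = 1:n1
    d(:, i) = c(a(i), b(:, i));
end

% Linear index by hand: row + (col - 1)*nRows, with a expanded over 4 rows
idx = bsxfun(@plus, a, (b - 1) * n2);
d3 = c(idx);   % a 4-by-n1 index matrix gives a 4-by-n1 result
```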
Try this:
n1 = 10;
n2 = 100;
a = randi(n2,[1,n1]);
b = randi(n2,[4,n1]);
c = rand(100,100);
idx = (1:n1);
tic
d1=(c(a(idx),b(:,idx)))';
[idx,idy]=meshgrid(0:44:400,1:4);
d1=d1(idy+idx);
toc
This is the timing:
Elapsed time is 0.000517 seconds.
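The offsets 0:44:400 only work for n1 = 10: 44 is 4*(n1 + 1), the linear distance between successive 4-element blocks along the "diagonal" of the transposed submatrix. A sketch with the step derived from n1, checked against the loop for n1 = 7. Be aware that the intermediate is 4*n1-by-n1, so memory grows with n1^2 and this does not scale to large n1 the way linear indexing does.

```matlab
n1 = 7; n2 = 100;
a = randi(n2, [1, n1]);
b = randi(n2, [4, n1]);
c = rand(n2, n2);

% Loop reference
d = zeros(4, n1);
for i = 1:n1
    d(:, i) = c(a(i), b(:, i));
end

% Same trick, with offsets computed from n1 instead of hard-coded
big = c(a, b(:))';                   % (4*n1)-by-n1 intermediate, grows as n1^2
step = 4 * (n1 + 1);                 % distance between successive diagonal blocks
[off, row] = meshgrid(0:step:step*(n1 - 1), 1:4);
d4 = big(row + off);                 % 4-by-n1 result
```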
In this code, I have a cluster image with 10 classes, and I want to extract a separate image for each class (level) and save them as 10 images. Below is the code I used:
tic
numberOfClasses = 10;
segment_label_images = cell(1,numberOfClasses);
pixelCount = zeros(1,numberOfClasses);
[rs, cs] = size(classImage);
% classImage has intensity range from 1-numberOfClasses
for k = 1:numberOfClasses
for i = 1:rs
for j = 1:cs
if classImage(i,j) == k
segment_label_images{k}(i,j) = 1;
else
segment_label_images{k}(i,j) = 0;
end
end
end
pixelCount(k) = sum(segment_label_images{k}(:));
%figure, imshow(segment_label_images{k},[]);
end
toc
Here I have three nested for loops, and I think they are what slows down the computation. Elapsed time is 0.089413 seconds.
Any suggestions for avoiding the for loops to improve computation time? Thanks, Gopi
Assuming classImage is a matrix, you could speed it up with:
for k = 1:numberOfClasses
segment_label_images{k} = classImage == k;
pixelCount(k) = sum(segment_label_images{k}(:));
end
Assuming MATLAB R2016b or later (which supports implicit expansion) or Octave:
k = permute(1:numberOfClasses, [1,3,2]);
segment_label_images = (classImage == k);
pixelCount = squeeze(sum(sum(segment_label_images, 1), 2));
For pre-2016b MATLAB, just add bsxfun:
k = permute(1:numberOfClasses, [1,3,2]);
segment_label_images = bsxfun(@eq, classImage, k);
pixelCount = squeeze(sum(sum(segment_label_images, 1), 2));
Of course, both of these leave segment_label_images as a 3D array rather than a cell array. Given that all of the arrays are the same size, I prefer to work with multi-dimensional arrays rather than cell arrays, for speed and convenience. It can, of course, be converted to a cell array if necessary.
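If you do need the cell array, num2cell with dims [1 2] splits the 3D array into one 2D slice per cell. A sketch using a random label image as a stand-in for classImage (an assumption; substitute your own), with bsxfun so it runs on any MATLAB version:

```matlab
numberOfClasses = 10;
classImage = randi(numberOfClasses, 64, 48);   % hypothetical label image, values 1..10

% 3D logical array: slice k is the mask of class k
k = permute(1:numberOfClasses, [1, 3, 2]);
masks = bsxfun(@eq, classImage, k);
pixelCount = squeeze(sum(sum(masks, 1), 2));

% One 2D mask per cell, as a 1-by-numberOfClasses cell array
segment_label_images = reshape(num2cell(masks, [1 2]), 1, []);
```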
I am trying to write a function that calculates a moving/rolling correlation between two vectors, and speed is a high priority, since I need to apply this function inside an array function. What I have (which is too slow) is this:
Data1 = rand(3000,1);
Data2 = rand(3000,1);
function y = MovCorr(Data1,Data2)
[N,~] = size(Data1);
correlationTS = nan(N, 1);
for t = 20+1:N
correlationTS(t, :) = corr(Data1(t-20:t, 1),Data2(t-20:t,1),'rows','complete');
end
y = correlationTS;
end
I think the for loop could be done more efficiently if I knew how to generate the rolling-window indices and then apply accumarray. Any suggestions?
Following the advice from @knedlsepp, and using filter as in movingstd, I found the following solution, which is quite fast:
function Cor = MovCorr1(Data1,Data2,k)
y = zscore(Data2);
n = size(y,1);
if (n<k)
Cor = NaN(n,1);
else
x = zscore(Data1);
x2 = x.^2;
y2 = y.^2;
xy = x .* y;
A=1;
B = ones(1,k);
Stdx = sqrt((filter(B,A,x2) - (filter(B,A,x).^2)*(1/k))/(k-1));
Stdy = sqrt((filter(B,A,y2) - (filter(B,A,y).^2)*(1/k))/(k-1));
Cor = (filter(B,A,xy) - filter(B,A,x).*filter(B,A,y)/k)./((k-1)*Stdx.*Stdy);
Cor(1:(k-1)) = NaN;
end
end
Comparing with my original solution the execution times are:
tic
MovCorr(Data1,Data2);
toc
Elapsed time is 1.017552 seconds.
tic
MovCorr1(Data1,Data2,21);
toc
Elapsed time is 0.019400 seconds.
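The trick is that filter(ones(1,k), 1, v) returns, at each t >= k, the sum of the last k samples of v, so every window sum the correlation needs comes out of one vectorised call. A sketch of that idea using only base functions (no zscore or corr), checked against a corrcoef loop; the (k - 1) factors in MovCorr1 cancel, which gives the simpler denominator used here. It skips the zscore step, which in MovCorr1 presumably helps numerical accuracy when the data have a large mean.

```matlab
N = 300; k = 21;                      % k = 21 matches the t-20:t window above
x = rand(N, 1); y = rand(N, 1);

B = ones(1, k);                       % FIR filter: output = sum of last k samples
Sx  = filter(B, 1, x);    Sy  = filter(B, 1, y);
Sxx = filter(B, 1, x.^2); Syy = filter(B, 1, y.^2);
Sxy = filter(B, 1, x.*y);

% Pearson correlation of each window from its sums
num = Sxy - Sx .* Sy / k;
den = sqrt((Sxx - Sx.^2 / k) .* (Syy - Sy.^2 / k));
r_fast = num ./ den;
r_fast(1:k-1) = NaN;                  % first k-1 windows are incomplete

% Reference: explicit loop with corrcoef
r_loop = NaN(N, 1);
for t = k:N
    C = corrcoef(x(t-k+1:t), y(t-k+1:t));
    r_loop(t) = C(1, 2);
end
```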
I have two different ways to implement the same thing, and I expected the second to be the better one. However, tic/toc gives a better time for the first. How come?
j=6;
i=j;
Savings = zeros(i,j);
Costs = magic(400);
tic;
for x=2:i
for y=2:j
if(x ~= y)
Savings(x,y) = Costs(x,1) + Costs(1,y) - Costs(x,y);
end
end
end
first=toc;
disp(num2str(first))
Savings = zeros(i,j);
tic;
Ix=2:i;
Iy=2:j;
I = false(i,j);
I(Ix,Iy) = bsxfun(@ne, Ix', Iy);
S = bsxfun(@plus, Costs(Ix,1), Costs(1,Iy)) - Costs(Ix,Iy);
Savings(I) = S(I(Ix,Iy));
second=toc;
temp = Savings;
disp(num2str(second))
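It depends on how well MATLAB's JIT engine can accelerate for loops. For small matrices the loop is fast, and the fixed overhead of the vectorized version (the bsxfun calls and temporary arrays) dominates; for larger matrices the vectorized version wins. In the results below, the first (loop) method is faster up to about i = 60, but not for larger matrices. Try this benchmark: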
for j=[6 30 60 100 200 400 600]
disp(['j=' num2str(j)]);
i=j;
Savings = zeros(i,j);
Costs = magic(600);
tic;
for mm=1:1e2
for x=2:i
for y=2:j
if(x ~= y)
Savings(x,y) = Costs(x,1) + Costs(1,y) - Costs(x,y);
end
end
end
end
first=toc;
disp(num2str(first));
Savings = zeros(i,j);
tic;
for mm=1:1e2
Ix=2:i;
Iy=2:j;
I = false(i,j);
I(Ix,Iy) = bsxfun(@ne, Ix', Iy);
S = bsxfun(@plus, Costs(Ix,1), Costs(1,Iy)) - Costs(Ix,Iy);
Savings(I) = S(I(Ix,Iy));
end
second=toc;
temp = Savings;
disp(num2str(second))
end
On my machine, it returns:
Times in seconds, each for 100 repetitions:
j     first (loops)   second (bsxfun)
6     0.0001874       0.0052893
30    0.0034454       0.0057184
60    0.011097        0.01268
100   0.027957        0.023952
200   0.11529         0.058686
400   0.45791         0.37246
600   1.1496          0.74932
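Another option worth timing is to compute every entry with one bsxfun call and then zero the entries the loop skips (first row, first column, diagonal), which avoids building the mask I. A sketch checked against the loop at a small size; its speed relative to the two versions above would need to be measured the same way as in the benchmark.

```matlab
j = 8; i = j;
Costs = magic(400);

% Loop version from the question
Savings = zeros(i, j);
for x = 2:i
    for y = 2:j
        if x ~= y
            Savings(x, y) = Costs(x, 1) + Costs(1, y) - Costs(x, y);
        end
    end
end

% Compute every entry, then zero the ones the loop never writes
S2 = bsxfun(@plus, Costs(1:i, 1), Costs(1, 1:j)) - Costs(1:i, 1:j);
S2(1, :) = 0;                  % loop starts at x = 2
S2(:, 1) = 0;                  % loop starts at y = 2
S2(logical(eye(i, j))) = 0;    % loop skips x == y
```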
Is there any difference between these two methods for deleting elements in MATLAB?
ElementsToDelete = logical([0 0 1 0 1 0 0 1 1 0]) % must be logical: 0 is not a valid numeric index
A = 1:10
A(ElementsToDelete) = []
%Versus
A = 1:10
A = A(~ElementsToDelete)
Are there times when one method is more appropriate than the other? Is there a difference in efficiency? Or are they completely interchangeable?
Try this:
A = rand(1e3, 1);
b = A<0.5;
tic;
for ii = 1:1e5
a = A;
a(b) = [];
end
toc
tic;
for ii = 1:1e5
a = A;
a = a(~b);
end
toc
Results:
Elapsed time is 1.654146 seconds
Elapsed time is 1.126325 seconds
So re-assigning is faster by a factor of about 1.5. The following case, however, is even worse for deletion:
A = rand(1e4, 1);
stop = 0;
for jj = 1:10
a = A;
start = tic;
for ii = 1:1e5
a(a < rand) = [];
end
stop = stop + toc(start);
end
avg1 = stop/10
stop = 0;
for jj = 1:10
a = A;
start = tic;
for ii = 1:1e5
a = a(a > rand);
end
stop = stop + toc(start);
end
avg2 = stop/10
avg1/avg2
Results:
avg1 = 1.1740235 seconds
avg2 = 0.1850463 seconds
avg1/avg2 = 6.344485136963019
So the factor has increased to more than 6.
My guess is that deletion (i.e., assigning []) rewrites the entire array for each and every true element it encounters while looping through the logical indices. This is hopelessly inefficient, as this test makes apparent. Re-assigning, on the other hand, can determine the size of the new array beforehand and initialize it accordingly; no rewrites are needed.
Why the JIT does not compile the one into the other is a mystery to me, because deletion is a far more intuitive notation IMHO. But, as you see, it is inefficient compared to alternatives, and should thus be used sparingly. Never use it inside loops!
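On the original question's example: the two forms give identical results as long as the mask is logical. With a plain double 0/1 vector, as typed in the question, A(~ElementsToDelete) still works (~ returns a logical), but A(ElementsToDelete) = [] fails because 0 is not a valid index. A small check:

```matlab
ElementsToDelete = logical([0 0 1 0 1 0 0 1 1 0]);

A1 = 1:10;
A1(ElementsToDelete) = [];       % deletion in place

A2 = 1:10;
A2 = A2(~ElementsToDelete);      % re-assignment with the complement mask

% With a double 0/1 vector, the zeros are rejected as indices
threw = false;
try
    A3 = 1:10;
    A3([0 0 1 0 1 0 0 1 1 0]) = [];
catch
    threw = true;
end
```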