I'm trying to assign ~1 Million values to a 100x100 logical matrix like this:
CC(Labels,LabelsXplusOne) = true;
where CC is 100x100 logical and Labels, LabelsXplusOne are 1024x768 int32.
The problem is that the above statement takes about 5 minutes to complete on a modern CPU.
Obviously it is badly implemented in MATLAB, so how can we make the above run faster without resorting to loops?
In case you are wondering, I need this statement to compute blobs in an integer (not binary) image.
And also:
max(max(Labels)) == 100
max(max(LabelsXplusOne)) == 100
EDIT:
OK, I got it. Maybe this will help others in the future:
tic; CC(sub2ind(size(CC),Labels,LabelsXplusOne)) = true; toc;
Elapsed time is 0.026414 seconds.
Much better now.
There are a couple of issues that jump out at me...
I have the feeling you are doing the matrix indexing wrong. As it stands now, what will happen is every value in Labels will be paired with every value in LabelsXplusOne, producing (1024*768)^2 total index pairs for your rows and columns of CC. That's likely what's taking so long.
What you probably want is to only use each pair of values as indices, like Labels(1,1),LabelsXplusOne(1,1), Labels(1,2),LabelsXplusOne(1,2), etc. To do this, you should convert your indices into linear indices using the function SUB2IND.
Additionally, your matrix CC only contains 10,000 entries, yet your index matrices each contain 786,432 integer values. This means you will end up assigning the value true to the same entry in CC many times over. You should first remove redundant sets of indices using the function UNIQUE, then use them to assign values to CC.
This is what I think you want:
CC(unique(sub2ind(size(CC), Labels, LabelsXplusOne))) = true;
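For anyone unsure what SUB2IND is doing here, a minimal sketch with a hypothetical 3x3 case (the subscript values below are made up purely for illustration):
CC = false(3);
Labels         = [1 2; 3 1];
LabelsXplusOne = [2 3; 1 1];
CC(unique(sub2ind(size(CC), Labels, LabelsXplusOne))) = true;
% The subscripts are paired elementwise rather than crossed, so only
% CC(1,1), CC(1,2), CC(2,3) and CC(3,1) become true.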
I'm currently working on implementing a gradient check function, which requires getting certain indexed values from the result matrix. Could someone tell me how to get a group of values from the matrix?
To be specific, for a result matrix res with size M x N, I'll need to get the elements res(3,1), res(4,2), res(1,3), res(2,4)...
In my case, M is the dimension and N is the batch size, and there's a label array of size 1 x batch_size, [3 4 1 2...]. So the desired values are res(label(1),1), res(label(2),2), ..., res(label(batch_size),batch_size). Since I'm trying to practice vectorized programming, it's better not to use a loop. Could someone tell me how to get a group of values without iteration?
Cheers.
--------------------------UPDATE----------------------------------------------
The only idea I found is to first build a 'mask matrix', then do element-wise multiplication with the original result matrix (technically called the 'Hadamard product'; see the wiki). After that, just extract the non-zero elements and sum them. The code in MATLAB should look like:
temp=Mask.*res;
desired_res=temp(temp~=0); % Note: temp(temp~=0) extracts the non-zero elements in column order: it scans temp column by column and puts the non-zero values into 'desired_res'.
In my case, what I want to do next is simply sum(desired_res), so I don't need to consider the order of the non-zero elements in 'desired_res'.
Based on the idea above, creating the mask matrix is the key step. There are two methods for doing this.
The code is shown below. In my case, accumarray adds '1' at certain locations (which are stored in the matrix 'subs') and '0' everywhere else, giving a mask matrix of size [row column]. The usage of full(sparse()) is similar. I compared the two methods (repeated around 10 times); it turns out full(sparse()) is faster, and the time difference is on the order of 10^-4 seconds. That is a small difference, but in a large-scale experiment it matters. One benefit of using accumarray is that it lets you specify the matrix size, while full(sparse()) does not: full(sparse(subs(:,1), subs(:,2), 1)) creates a matrix of size [max(subs(:,1)), max(subs(:,2))]. In my case this is sufficient for my requirement, and I only know a little of their usage, so if you find out more, please share with us. Thanks.
Detailed descriptions of these two functions can be found in MATLAB's official documentation: accumarray, full, and sparse.
% assume we have a label vector
test_labels=ones(10000,1);
% method one, accumarray(subs,1,[row column])
tic
subs=zeros(10000,2);
subs(:,1)=test_labels;
subs(:,2)=1:10000;
k1=accumarray(subs,1,[10, 10000]);
t1=toc % to compare with method two to check which one is faster
%method two: full(sparse(),1)
tic
k2=full(sparse(test_labels,1:10000,1));
t2=toc
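For the record, here is a more direct sketch that avoids the mask matrix entirely, assuming res is M x N and label is a 1 x N row vector of row indices as described in the question (untested against your actual data):
idx = sub2ind(size(res), label, 1:size(res,2)); % linear index of res(label(i), i) for each column i
desired_res = res(idx); % 1 x N vector of the picked entries
total = sum(desired_res); % same sum as with the mask approach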
I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros only occurring in the second column. This code is taking an incredible amount of time (hours) to run something that should be achievable in at most a few seconds. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on the input, but in all cases is smaller than 1000x1000.
The code is as follows
function [data] = DataHandler(D)
n = size(D,1);
s = max(D,1);
data = zeros(s,s);
for i = 1:n
data(D(i,1),D(i,2)+1) = data(D(i,1),D(i,2)+1) + 1;
end
It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run quickly by just changing out the s's in this line for 1000, which is a sufficient upper bound to ensure it won't run into errors for any of the data I'm looking at.
Obviously there're better ways to do this, but being that I just bashed the code together to quickly format some data I wasn't too concerned. As I said, I fixed it by just replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. New code runs instantaneously.
I'd be very interested if anyone has seen this kind of behaviour before, or knows why this would be happening. It's a little disconcerting, and it would be good to be confident that I can initialize matrices freely without killing MATLAB.
Your call to zeros is incorrect. Looking at your code, D looks like an n x 2 array. However, your call s = max(D,1) actually generates another n x 2 array. Consulting the documentation for max, this is what happens when you call max the way you did:
C = max(A,B) returns an array the same size as A and B with the largest elements taken from A or B. Either the dimensions of A and B are the same, or one can be a scalar.
Therefore, because you used max(D,1), you are essentially comparing every value in D with the value 1, so what you actually get is just a copy of D. Using this as input to zeros has rather undefined behaviour: what will actually happen is that for each row of s, a temporary zeros matrix of that size is allocated and then tossed, and only the dimensions given by the last row of s are recorded. Because you have a very large matrix D, this is probably why the profiler hangs here at 100% utilization. Each size parameter to zeros must be a scalar, yet your call that produces s gives a matrix.
What I believe you intended should have been:
s = max(D(:));
This finds the overall maximum of the matrix D by unrolling D into a single vector and finding the overall maximum. If you do this, your code should run faster.
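A small sketch of the difference, with a made-up D:
D = [3 0; 5 2; 1 4];
max(D, 1) % elementwise maximum against the scalar 1, returns the 3 x 2 matrix [3 1; 5 2; 1 4]
max(D(:)) % overall maximum of all elements, returns the scalar 5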
As a side note, this post may interest you:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
It was shown in this post that doing zeros(n,n) is in fact slow and there are several neat tricks to initializing an array of zeros. One way is to accomplish this by empty matrix multiplication:
data = zeros(n,0)*zeros(0,n);
One of my personal favourites is that if you assume that data was not declared / initialized, you can do:
data(n,n) = 0;
If I can also comment, that for loop is quite inefficient. What you are doing is calculating a 2D histogram / accumulation of the data. You can replace the for loop with a more efficient accumarray call. This also avoids explicitly allocating an array of zeros; accumarray will do that under the hood for you.
As such, your code would basically become this:
function [data] = DataHandler(D)
data = accumarray([D(:,1) D(:,2)+1], 1);
accumarray in this case takes all pairs of row and column coordinates, stored in D(i,1) and D(i,2) + 1 for i = 1, 2, ..., size(D,1), and places all pairs that share the same row and column coordinates into the same 2D bin. It then adds up the occurrences, so the output at each 2D bin is the total tally of values that mapped to that row and column coordinate.
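As a small sanity check, here is a made-up D and what accumarray produces from it:
D = [1 0; 1 0; 2 3]; % hypothetical data: two rows of [1 0], one row of [2 3]
data = accumarray([D(:,1) D(:,2)+1], 1);
% data is 2 x 4, with data(1,1) == 2 and data(2,4) == 1, all other entries 0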
I need to convert this to Matlab code, and am struggling without the "table" function.
Table[{i,1000,ability,savingsrate,0,RandomInteger[{15,30}],1,0},{i,nrhhs}];
So basically, these values are all just numbers, and I think I need to use a function handle, or maybe a for loop. I'm no expert, so I really need some help?
I'm not an expert in Mathematica (I just used it a long time ago). According to this documentation for the Table function, you are using this form:
Table[expr, {i, imax}]
generates a list of the values of expr when i runs from 1 to imax.
It looks like your statement will produce a list that duplicates the list in the first argument, with i increasing from 1 to nrhhs and a different random number each time.
In MATLAB the output can be equivalent to a matrix or a cell array.
To create a matrix with rows as your lists you can do:
result = [ (1:nrhhs)', repmat([1000,ability,savingsrate,0],nrhhs,1), ...
randi([15 30],nrhhs,1), repmat([1,0],nrhhs,1) ];
You can convert the above matrix to a cell array:
resultcell = mat2cell(result, ones(nrhhs,1), size(result,2));
The "Table" example you gave creates a list of nrhhs sub-lists, each of which contains 8 numbers (i, 1000, ability, savingsrate, 0, a random integer between 15 and 30 inclusive, 1, and 0). This is essentially (though not exactly) the same as an nrhhs x 8 matrix.
Assuming you do just want a matrix out, though, an analogous for loop in Matlab would be:
result = zeros(nrhhs,8); % preallocate memory for the result
for i = 1:nrhhs
result(i,:) = [i 1000 ability savingsrate 0 randi([15 30]) 1 0];
end
This method is likely slower than yuk's answer (which makes much more efficient use of vectors to avoid the for loop), but might be a little easier to pick apart depending on how familiar you are with Matlab.
I want to sum up several vectors of different size in an array. Each time one of the vectors drops out of my program, I want to append it to my array. Like this:
array = [array, vector];
In the end I want to let this array be the output of a function. But it gives me wrong results. Is this possible with MATLAB?
Thanks and kind regards,
Damian
Okay, given that we're dealing with column vectors of different size, you can't put them all in a numerical array, since a numerical array has to be rectangular. If you really wanted to put them in the numerical array, then the column length of the array will need to be the length of the longest vector, and you'll have to pad out the shorter vectors with NaNs.
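If you did want to go that route, here is a rough sketch of the NaN-padding approach (v1, v2, v3 are just placeholders for your dropped-out column vectors):
vecs = {v1; v2; v3}; % collect the column vectors in a cell array first
maxLen = max(cellfun(@numel, vecs));
padded = nan(maxLen, numel(vecs)); % one column per vector, padded with NaN
for k = 1:numel(vecs)
    padded(1:numel(vecs{k}), k) = vecs{k};
end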
Given this, a better solution would be, as chaohuang hinted at in the comments, to use a cell array, and store one vector in each cell. The problem is that you don't know beforehand how many vectors there will be. The usual approach that I'm aware of for this problem is as follows (but if someone has a better idea, I'm keen to learn!):
UpperBound = SomeLargeNumber;
Array = cell(1, UpperBound);
Counter = 0;
while SomeCondition
Counter = Counter + 1;
if Counter > UpperBound
error('You did not choose a large enough upper bound!');
end
%#Create your vector here
Array{1, Counter} = YourVectorHere;
end
Array = Array(1, 1:Counter);
In other words, choose some upper bound beforehand that you are sure you won't go above in the loop, and then cut your cell array down to size once the loop is finished. Also, I've put in an error trap in case your choice of upper bound turns out to be too small!
Oh, by the way, I just noted in your question the words "sum up several vectors". Was this a figure of speech or did you actually want to perform a sum operation somewhere?
I need this section of my code to run faster, as it is called many many times. I am new to Matlab and I feel as though there MUST be a way to do this that is not so round-about. Any help you could give on how to improve the speed of what I have or other functions to look into that would help me perform this task would be appreciated.
(The task is to get only the rows of "alldata" whose first column is in the set "minuteintervals" into "alldataMinutes". "minuteintervals" is just the minimum value of column one of "alldata", increasing by twenty up to the maximum of "alldata".)
minuteintervals= min(alldata(:,1)):20:max(alldata(:,1)); %20 second intervals
alldataMinutes= zeros(30000,4);
counter=1;
for x=1:length(alldata)
if ismember(alldata(x,1), minuteintervals)
alldataMinutes(counter,:)= alldata(x,:);
counter= counter+1;
end
end
alldataMinutes(counter:length(alldataMinutes),:)= [];
This should give you what you want, and it should be substantially faster:
minuteintervals = min(alldata(:,1)):20:max(alldata(:,1)); %# Interval set
index = ismember(alldata(:,1),minuteintervals); %# Logical index showing first
%# column values in the set
alldataMinutes = alldata(index,:); %# Extract the corresponding rows
This works by passing a vector of values to the function ISMEMBER, instead of passing values one at a time. The output index is a logical vector the same size as alldata(:,1), with a value of 1 (i.e. true) for elements of alldata(:,1) that are in the set minuteintervals, and a value of 0 (i.e. false) otherwise. You can then use logical indexing to easily extract the rows corresponding to the ones in index, placing them in alldataMinutes.
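To make the mechanics concrete, a tiny made-up example:
M = [10 1; 30 2; 50 3; 75 4]; % hypothetical stand-in for alldata
S = 10:20:50; % i.e. [10 30 50], stand-in for minuteintervals
idx = ismember(M(:,1), S); % logical column vector [1; 1; 1; 0]
rows = M(idx,:); % the first three rows of M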