(matlab matrix operation), Is it possible to get a group of value from matrix without loop? - matlab

I'm currently working on implementing a gradient check function in which it requires to get certain index values from the result matrix. Could someone tell me how to get a group of values from the matrix?
To be specific, for a result matrx res with size M x N, I'll need to get element res(3,1), res(4,2), res(1,3), res(2,4)...
In my case, M is dimension and N is batch size and there's a label array whose size is 1xbatch_size, [3 4 1 2...]. So the desired values are res(label(:),1:batch_size). Since I'm trying to practice vectorization programming and it's better not using loop. Could someone tell me how to get a group of value without a iteration?
Cheers.
--------------------------UPDATE----------------------------------------------
The only idea I found is firstly building a 'mask matrix' then use the original result matrix to do element wise multiplication (technically called 'Hadamard product', see in wiki). After that just get non-zero element out and do the sum operation, the code in matlab should look like:
temp=Mask.*res;
desired_res=temp(temp~=0); %Note: the temp(temp~=0) extract non-zero elements in a 'column' fashion: it searches temp matrix column by column then put the non-zero number into container 'desired_res'.
In my case, what I wanna do next is simply sum(desired_res) so I don't need to consider the order of those non-zero elements in 'desired_res'.
Based on this idea above, creating mask matrix is the key aim. There are two methods to do this job.
Codes are shown below. In my case, use accumarray function to add '1' in certain location (which are stored in matrix 'subs') and add '0' to other space. This will give you a mask matrix size [rwo column]. The usage of full(sparse()) is similar. I made some comparisons on those two methods (repeat around 10 times), turns out full(sparse) is faster and their time costs magnitude is 10^-4. So small difference but in a large scale experiments, this matters. One benefit of using accumarray is that it could define the matrix size while full(sparse()) cannot. The full(sparse(subs, 1)) would create matrix with size [max(subs(:,1)), max(subs(:,2))]. Since in my case, this is sufficient for my requirement and I only know few of their usage. If you find out more, please share with us. Thanks.
The detailed description of those two functions could be found on matlab's official website. accumarray and full, sparse.
% assume we have a label vector
test_labels=ones(10000,1);
% method one, accumarray(subs,1,[row column])
tic
subs=zeros(10000,2);
subs(:,1)=test_labels;
subs(:,2)=1:10000;
k1=accumarray(subs,1,[10, 10000]);
t1=toc % to compare with method two to check which one is faster
%method two: full(sparse(),1)
tic
k2=full(sparse(test_labels,1:10000,1));
t2=toc

Related

How can I reduce the number of rows of a matrix by half in matlab?

What is the best way to reduce the number of rows of a matrix by half in matlab?
What is the following command doing?
mymatrix = mymatrix(1:2:end,:);
Is there any better way available?
Short answer this is taking every second row of the matrix mymatrix starting with the first one (all odd-rows) and yes that is probably the easiest way. Added clarification based on comment from #Sardar_Usama
Longer version
end is matlab internal command that refers to the end of the array in the given dimension. roughly equivalent to size(var,dim).
so actually what mymatrix(1:2:end,:) can be re-written to mymatrix(1:2:size(mymatrix,1),:). Now if you actually look at 1:2:size(mymatrix,1) these are the rows that you are selecting.1, 3, 5, etc. You can actually specify whichever rows you want there, here are some examples.
1:floor(end/2); % first 'half'
floor(end/2)+1:end; % second 'half'
1:3:end; % every third element
1:2:floor(end/2); % every second element in the first 'half'
Added floor() to avoid problems for odd number lengths. In that case 'half' is not exactly half but rather roughly half. Alternatively ceil() depending on how you would like to define half for odd lengths.

MATLAB spending an incredible amount of time writing a relatively small matrix

I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros only occurring in the second column. This code is taking a truly incredible amount of time (hours) to run what should be achievable in at most some seconds. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on input, but in all usage is smaller than 1000x1000.
The code is as follows
function [data] = DataHandler(D)
n = size(D,1);
s = max(D,1);
data = zeros(s,s);
for i = 1:n
data(D(i,1),D(i,2)+1) = data(D(i,1),D(i,2)+1) + 1;
end
It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run quickly by just changing out the s's in this line for 1000, which is a sufficient upper bound to ensure it won't run into errors for any of the data I'm looking at.
Obviously there're better ways to do this, but being that I just bashed the code together to quickly format some data I wasn't too concerned. As I said, I fixed it by just replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. New code runs instantaneously.
I'd be very interested if anyone has seen this kind of behaviour before, or knows why this would be happening. Its a little disconcerting, and it would be good to be able to be confident that I can initialize matrices freely without killing MATLAB.
Your call to zeros is incorrect. Looking at your code, D looks like a D x 2 array. However, your call of s = max(D,1) would actually generate another D x 2 array. By consulting the documentation for max, this is what happens when you call max in the way you used:
C = max(A,B) returns an array the same size as A and B with the largest elements taken from A or B. Either the dimensions of A and B are the same, or one can be a scalar.
Therefore, because you used max(D,1), you are essentially comparing every value in D with the value of 1, so what you're actually getting is just a copy of D in the end. Using this as input into zeros has rather undefined behaviour. What will actually happen is that for each row of s, it will allocate a temporary zeros matrix of that size and toss the temporary result. Only the dimensions of the last row of s is what is recorded. Because you have a very large matrix D, this is probably why the profiler hangs here at 100% utilization. Therefore, each parameter to zeros must be scalar, yet your call to produce s would produce a matrix.
What I believe you intended should have been:
s = max(D(:));
This finds the overall maximum of the matrix D by unrolling D into a single vector and finding the overall maximum. If you do this, your code should run faster.
As a side note, this post may interest you:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
It was shown in this post that doing zeros(n,n) is in fact slow and there are several neat tricks to initializing an array of zeros. One way is to accomplish this by empty matrix multiplication:
data = zeros(n,0)*zeros(0,n);
One of my personal favourites is that if you assume that data was not declared / initialized, you can do:
data(n,n) = 0;
If I can also comment, that for loop is quite inefficient. What you are doing is calculating a 2D histogram / accumulation of data. You can replace that for loop with a more efficient accumarray call. This also avoids allocating an array of zeros and accumarray will do that under the hood for you.
As such, your code would basically become this:
function [data] = DataHandler(D)
data = accumarray([D(:,1) D(:,2)+1], 1);
accumarray in this case will take all pairs of row and column coordinates, stored in D(i,1) and D(i,2) + 1 for i = 1, 2, ..., size(D,1) and place all that match the same row and column coordinates into a separate 2D bin, we then add up all of the occurrences and the output at this 2D bin gives you the total tally of how many values at this 2D bin which corresponds to the row and column coordinate of interest mapped to this location.

Passing values to a sparse matrix in MATLAB

Might sound too simple to you but I need some help in regrad to do all folowings in one shot instead of defining redundant variables i.e. tmp_x, tmp_y:
X= sparse(numel(find(G==0)),2);
[tmp_x, temp_y] = ind2sub(size(G), find(G == 0));
X(:)=[tmp_x, tmp_y];
(More info: G is a sparse matrix)
I tried:
X(:)=ind2sub(size(G), find(G == 0));
but that threw an error.
How can I achieve this without defining tmp_x, tmp_y?
A couple of comments with your code:
numel(find(G == 0)) is probably one of the worst ways to determine how many entries that are zero in your matrix. I would personally do numel(G) - nnz(G). numel(G) determines how many elements are in G and nnz(G) determines how many non-zero values are in G. Subtracting these both would give you the total number of elements that are zero.
What you are doing is first declaring X to be sparse... then when you're doing the final assignment in the last line to X, it reconverts the matrix to double. As such, the first statement is totally redundant.
If I understand what you are doing, you want to find the row and column locations of what is zero in G and place these into a N x 2 matrix. Currently with what MATLAB has available, this cannot be done without intermediate variables. The functions that you'd typically use (find, ind2sub, etc.) require intermediate variables if you want to capture the row and column locations. Using one output variable will give you the column locations only.
You don't have a choice but to use intermediate variables. However, if you want to make this more efficient, you don't even need to use ind2sub. Just use find directly:
[I,J] = find(~G);
X = [I,J];

Finding the best threshold level without a for loop in MATLAB

Let A and B be two matrices of the same size. For a matrix M, let ht(M,t) threshold all the entries of M by t. That is All entries whose absolute value is less than t are set to 0. Suppose I want to find the optimal threshold t such that norm(ht(A,t)-B,'fro')^2 is minimized.
The only way that I can see to do this is deficient: do a for loop over the unique values of A and threshold A and setting C=ht(A,t)-B, compute sum(sum(C.*C)).
This is just too slow when A is large. I have considered sorting the elements of A and finding some efficient way to set a few entries to zero at a time, but I'm not sure this can all be done without a for loop.
Is there a way to do it?
Here's a very simple example (so simple a for loop works easily in this case):
B =
0.101508820368332 0
0 0.301996943246957
Set
A=B+.1*ones(2)
A =
0.201508820368332 0.1
0.1 0.401996943246957
Simple inspection shows that if we zero out the off-diagonal entries of A we minimize the difference between A and B. There are 3 possible threshold values, given by unique(A)=[.1,.2015,.402]. Given a potential threshold value t, we can hard threshold A by:
function [A_thresholded] = ht(A,t)
%
A_thresholded = A .* (abs(A)>t);
The form of the data in a matrix is irrelevant. You can convert them to vectors and simply compute the square-norm. In fact, you can sort the contents of A in increasing order (and permute B to preserve pairing). When you increase the threshold to include one more value in A, the norm only changes by that one increment. Therefore, you can find your solution in O(n log n). Hope this helps.

matlab: out of memory when concatenating sparse matrix with a vector

I create a 3560 x 3560 sparse matrix, A. I then create two 1 X 3560 vectors, S and T.
When I run the following code (which concatenates S and T as rows in A and afterwards also as columns in A)
A=[A;S;T];
S=[S 0 0];
T=[T 0 0];
A=[A, S', T'];
The last line produces an out of memory error.
I guess I am running out of memory since I have other variables stored, but it seems odd to me that adding two 3560 vectors would be the point in which I am exactly hitting my limit, so I think (or more accurately, wishfully think) that somehow the concatenations aren't done in a smart way...
Am I right or is there no hope (except for optimizing other pieces in my code)?
EDIT:
At the request of yoda, I am posting the full code.
Basically what it does is get a N X N matrix of edge weights between the nodes of a graph, and adds two vectors that will act as a source and sink in a max flow computation.
nbr_sim(nbr_sim<0.8)=0;
A=sparse(size(nbr_sim,1)+2,size(nbr_sim,2)+2);
nelements=size(nbr_sim,1);
A(nbr_sim>0)=nbr_sim(nbr_sim>0);
clear nbr_sim;
S=abs([1 0 0]*n);
T=abs([0 1 0]*n);
A(1:nelements,end-1)=S';
A(1:nelements,end)=T';
A(end-1,1:nelements)=S;
A(end,1:nelements)=T;
EDIT:
As you say you have used considerable resources before this operation, it is entirely likely that you are close to the tipping point, when MATLAB gives you an out of memory error.
Remember that when you grow matrices on the fly either by concatenating or by indexing out of range, MATLAB creates a copy of the matrix in memory. So you're not just using up resources for that extra row, but for a copy of that entire matrix!
Here's an example on my machine where I try to grow a vector that's large enough to tip it over the memory limit.
clear
a=rand(2*10^9+1,1); %#create a large array
whos a
Name Size Bytes Class Attributes
a 2000000001x1 16000000008 double
%#Now repeat the same, but by growing the array by one element
clear
a=rand(2*10^9,1);
a=[a;0];
??? Error using ==> vertcat
Out of memory. Type HELP MEMORY for your options.
So you see that although MATLAB can create a matrix with 2*10^9+1 elements in one go, when you try to create an array of the same size by append a single element to a 2*10^9 element vector, it runs out of memory.
If S and T are column vectors as you say, then A=[A;S;T] should give you an error:
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
So you must be doing something else. Concatenating will not change sparseness of the matrix i.e., it won't switch from sparse to full.
A=sprand(3560,3560,0.01); %#test matrices
S=rand(3560,1);
T=rand(3560,1);
B=[A,S,T]; %#join the columns
issparse(B)
ans =
1
Moreover, a 3560x3560 matrix of doubles is only ~97 MB, which shouldn't give you an "out of memory" error...
When dealing with large matrix:
For full matrix, you'd better preallocate memory to avoid memory copy during extending.see why
The sparse case is more complicated, and can be even less efficiency than extending in full matrix, because the elements is stored in a compressed manner. Setting an "inner" entry may cause large memory overwrites(have a look here).
So you'd better edit all the entries in advance and create with sparse() function, rather than call sparse() and then pad the data.