MATLAB: how to calculate the distribution of elements in matrix - matlab

I have a matrix A with integer elements from 0 to N-1.
What I need to get is vector V of length N which for each position "i" will contain number of elements equal to "i" in matrix A.
For example:
N = 6
A:
0 0 1
1 2 3
3 5 0
V:
3 2 1 2 0 1 0
What is the efficient way to do this?
My real matrix is about 10K x 10K elements and N is about 100.

Use v = histc(A(:), 0:(N-1)). To get exactly your result, perform v = v'.

You want to use histc (after reshape to convert to a vector)
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts.
V = histc(reshape(A,1,[]), 0:(N-1) );

Related

Finding equal rows in Matlab

I have a matrix suppX in Matlab with size GxN and a matrix A with size MxN. I would like your help to construct a matrix Xresponse with size GxM with Xresponse(g,m)=1 if the row A(m,:) is equal to the row suppX(g,:) and zero otherwise.
Let me explain better with an example.
suppX=[1 2 3 4;
5 6 7 8;
9 10 11 12]; %GxN
A=[1 2 3 4;
1 2 3 4;
9 10 11 12;
1 2 3 4]; %MxN
Xresponse=[1 1 0 1;
0 0 0 0;
0 0 1 0]; %GxM
I have written a code that does what I want.
Xresponsemy=zeros(size(suppX,1), size(A,1));
for x=1:size(suppX,1)
Xresponsemy(x,:)=ismember(A, suppX(x,:), 'rows').';
end
My code uses a loop. I would like to avoid this because in my real case this piece of code is part of another big loop. Do you have suggestions without looping?
One way to do this would be to treat each matrix as vectors in N dimensional space and you can find the L2 norm (or the Euclidean distance) of each vector. After, check if the distance is 0. If it is, then you have a match. Specifically, you can create a matrix such that element (i,j) in this matrix calculates the distance between row i in one matrix to row j in the other matrix.
You can treat your problem by modifying the distance matrix that results from this problem such that 1 means the two vectors completely similar and 0 otherwise.
This post should be of interest: Efficiently compute pairwise squared Euclidean distance in Matlab.
I would specifically look at the answer by Shai Bagon that uses matrix multiplication and broadcasting. You would then modify it so that you find distances that would be equal to 0:
nA = sum(A.^2, 2); % norm of A's elements
nB = sum(suppX.^2, 2); % norm of B's elements
Xresponse = bsxfun(#plus, nB, nA.') - 2 * suppX * A.';
Xresponse = Xresponse == 0;
We get:
Xresponse =
3×4 logical array
1 1 0 1
0 0 0 0
0 0 1 0
Note on floating-point efficiency
Because you are using ismember in your implementation, it's implicit to me that you expect all values to be integer. In this case, you can very much compare directly with the zero distance without loss of accuracy. If you intend to move to floating-point, you should always compare with some small threshold instead of 0, like Xresponse = Xresponse <= 1e-10; or something to that effect. I don't believe that is needed for your scenario.
Here's an alternative to #rayryeng's answer: reduce each row of the two matrices to a unique identifier using the third output of unique with the 'rows' input flag, and then compare the identifiers with singleton expansion (broadcast) using bsxfun:
[~, ~, w] = unique([A; suppX], 'rows');
Xresponse = bsxfun(#eq, w(1:size(A,1)).', w(size(A,1)+1:end));

sum matrix using logical matrix - index exceeds matrix dimensions

I have two matrices.
mcaps which is a double 1698 x 2
index_g which is a logical 1698 x 2
When using the line of code below I get the error message that Index exceeds matrix dimensions. I don't see how this is the case though?
tsp = nansum(mcaps(index_g==1, :));
Update
Sorry I should have mentioned that I need the sum of each column in the mcaps vector
** Example of data **
mcaps index_g
5 6 0 0
4 3 0 0
6 5 1 1
4 6 0 1
8 7 0 0
There are two problems here. I missed one. Original answer is below.
What I missed is that when you use the logical index in this way, you are picking out elements of the matrix that may have different numbers of elements in each column, so MATLAB can't return a well formed matrix back to nansum, and so returns a vector. To get around this, use the fact that 0 + anything = 0
% create a mask of values you don't want to sum. Note that since
% index_g is already logical, you don't have to test equal to 1.
mask = ~index_g & isnan(mcaps)
% create a temporary variable
mcaps_to_sum = mcaps;
% change all of the values that you don't want to sum to zero
mcaps_to_sum(mask) = 0;
% do the sum
sum(mcaps_to_sum,1);
This is basically all that the nansum function does internally, is to set all of the NaN values to zero and then call the sum function.
index_g == 1 returns a 1698 x 2 logical matrix, but then you add in an extra dimension with the colon. To sum the columns, use the optional dim input. You want:
tsp = nansum(mcaps(index_g == 1),1);

Matlab:Efficient assignment of values in a sparse matrix

I'm working in Matlab and I have the next problem:
I have a B matrix of nx2 elements, which contains indexes for the assignment of a big sparse matrix A (almost 500,000x80,000). For each row of B, the first column is the column index of A that has to contain a 1, and the second column is the column index of A that has to contain -1.
For example:
B= 1 3
2 5
1 5
4 1
5 2
For this B matrix, The Corresponding A matrix has to be like this:
A= 1 0 -1 0 0
0 1 0 0 -1
1 0 0 0 -1
-1 0 0 1 0
0 -1 0 0 1
So, for the row i of B, the corresponding row i of A must be full of zeros except on A(i,B(i,1))=1 and A(i,B(i,2))=-1
This is very easy with a for loop over all the rows of B, but it's extremely slow. I also tried the next formulation:
A(:,B(:,1))=1
A(:,B(:,2))=-1
But matlab gave me an "Out of Memory Error". If anybody knows a more efficient way to achieve this, please let me know.
Thanks in advance!
You can use the sparse function:
m = size(B,1); %// number of rows of A. Or choose larger if needed
n = max(B(:)); %// number of columns of A. Or choose larger if needed
s = size(B,1);
A = sparse(1:s, B(:,1), 1, m, n) + sparse(1:s, B(:,2), -1, m, n);
I think you should be able to do this using the sub2ind function. This function converts matrix subscripts to linear indices. You should be able to do it like so:
pind = sub2ind(size(A),1:n,B(:,1)); % positive indices
nind = sub2ind(size(A),1:n,B(:,2)); % negative indices
A(pind) = 1;
A(nind) = -1;
EDIT: I (wrongly, I think) assumed the sparse matrix A already existed. If it doesn't exist, then this method wouldn't be the best option.

Creating matrix from two vectors of (duplicated) indices in MATLAB

Suppose now I have two vectors of same length:
A = [1 2 2 1];
B = [2 1 2 2];
I would like to create a matrix C whose dim=m*n, m=max(A), n=max(B).
C = zeros(m,n);
for i = 1:length(A)
u = A(i);
v = B(i);
C(u,v)=C(u,v)+1;
end
and get
C =[0 2;
1 1]
More precisely, we treat the according indices in A and B as rows and columns in C, and C(u,v) is the number of elements in {k | A(i)=u and B(i)=v, i = 1,2,...,length(A)}
Is there a faster way to do that?
Yes. Use sparse. It assembles (i.e., sums up) the matrix values for repeating row-column pairs for you. You need an additional vector with the values that will be assembled into the matrix entries. If you use ones(size(A)), you will have exactly what you need - counting of repeated row-column pairs
spA=sparse(A, B, ones(size(A)));
full(spA)
ans =
0 2
1 1
The same can be obtained by simply passing scalar 1 to sparse function instead of a vector of values.
For matrices that have a large number of zero entries this is absolutely crucial that you use sparse storage. Another function you could use is accumarray. It can essentially do the same thing, but also works on dense matrix structure:
AA=accumarray([A;B]', 1);
AA =
0 2
1 1
You can pass size argument to accumarray if you want to create a matrix of specific size
AA=accumarray([A;B]', 1, [2 3]);
AA =
0 2 0
1 1 0
Note that you can actually also make it produce sparse matrices, and use a different operator in assembly (i.e., not necessarily a sum)
AA=accumarray([A;B]', 1, [2 3], #sum, 0, true)
will produce a sparse matrix (last parameter set to true) using sum for assembly and 0 as a fill value, i.e. a value which is used in cases a given row-column pair does not exist in A/B.

Splitting a matrix based on its contents in MATLAB

A matrix has m rows and n columns (n being a number not exceeding 10), and the nth column contains either 1 or 0 (binary). I want to use this binary as a decision to take out the associated row (if 1, or otherwise if 0). I understand that this can be done through iteration with the use of the IF conditional.
However, this may become impractical with matrices whose number of rows m gets into the hundreds (up to 1000). What other procedures are available?
You can use logical datatypes for indexing. For example,
M =
1 2 0
4 5 1
7 8 0
M = [1 2 0;4 5 1;7 8 0];
v = (M(:,n) == 1);
M(v,2) = 1;
M =
1 2 0
4 1 1
7 8 0
Now you have set all the elements in column 2 to 1 if the corresponding element in column n is true.
Note that the v = (M(:,n) == 1) converts the nth column to a logical vector. You can accomplish the same with v = logical(M(:,n));
I would recommend this blog entry for a detailed look at logical indexing.
Update:
If you want to erase rows, then use:
M(v,:) = [];