Count non-zero entries in each column of a matrix - matlab

If I have a matrix:
A = [1 2 3 4 5; 1 1 6 1 2; 0 0 9 0 1]
A =
1 2 3 4 5
1 1 6 1 2
0 0 9 0 1
How can I count the number of non-zero entries for each column? For example the desired output for this matrix would be:
2, 2, 3, 2, 3
I am not sure how to do this as size, length or numel do not appear to meet the requirements. Perhaps it would be best to remove zero entries first?

It's simply
> A ~= 0
ans =
1 1 1 1 1
1 1 1 1 1
0 0 1 0 1
> sum(A ~= 0, 1)
ans =
2 2 3 2 3

Here's another solution I can suggest that isn't very speed worthy for dense matrices but quite fast for sparse matrices (thanks #user1877862!). This also would mimic how one might do this in a compiled language, like C or Java, and perhaps for research purposes too. First find the row and column locations that are non zero, then do a histogram on just the column locations to count the frequency of how often you see a non-zero in each column. In other words:
[~,col] = find(A ~= 0);
counts = histc(col, 1:size(A,2));
find outputs the row and column locations of where a matrix satisfies some Boolean condition inside the argument of the function. We ignore the first output as we aren't concerned with the row locations.
The output we get is:
counts =
2
2
3
2
3

Related

Calculate mean of all values below the diagnonal line in an adjacency matrix

I have an adjacency matrix M, something like this:
[1 2 0 2 4
2 1 2 0 -1
0 3 1 2 3
2 0 2 1 0
4 -1 3 0 1]
I want to calculate the mean of all values below (but not including) the diagonal. The final output should be 1.5.
To get those values, I thought I'd use N = tril(M,-1). The issue is that I now have zeros in upper and lower part of the matrix N and therefore mean(sum(N)./sum(N~=0)) wouldn't work. Since I also have negative values, I can't just do the mean of values >=0 either. How can I do this?
In one line using logical indexing to extract just the values below the diagonal:
M = [ 1 2 0 2 4;
2 1 2 0 -1;
0 3 1 2 3;
2 0 2 1 0;
4 -1 3 0 1];
mean(M(tril(true(size(M)),-1)))
This returns 1.5 as #excaza indicated.

MATLAB generate all ways that n items can be put into m bins?

I want to find all ways that n items can be split among m bins. For example, for n=3 and m=3 the output would be (the order doesn't matter):
[3 0 0
0 3 0
0 0 3
2 1 0
1 2 0
0 1 2
0 2 1
1 0 2
2 0 1
1 1 1]
The algorithm should be as efficient as possible, preferrably vectorized/using inbuilt functions rather than for loops. Thank you!
This should be pretty efficient.
It works by generating all posible splitings of the real interval [0, n] at m−1 integer-valued, possibly coincident split points. The lengths of the resulting subintervals give the solution.
For example, for n=4 and m=3, some of the possible ways to split the interval [0, 4] at m−1 points are:
Split at 0, 0: this gives subintervals of lenghts 0, 0, 4.
Split at 0, 1: this gives subintervals of lenghts 0, 1, 3.
...
Split at 4, 4: this gives subintervals of lenghts 4, 0, 0.
Code:
n = 4; % number of items
m = 3; % number of bins
x = bsxfun(#minus, nchoosek(0:n+m-2,m-1), 0:m-2); % split points
x = [zeros(size(x,1),1) x n*ones(size(x,1),1)]; % add start and end of interval [0, n]
result = diff(x.').'; % compute subinterval lengths
The result is in lexicographical order.
As an example, for n = 4 items in m = 3 bins the output is
result =
0 0 4
0 1 3
0 2 2
0 3 1
0 4 0
1 0 3
1 1 2
1 2 1
1 3 0
2 0 2
2 1 1
2 2 0
3 0 1
3 1 0
4 0 0
I'd like to suggest a solution based on an external function and accumarray (it should work starting R2015a because of repelem):
n = uint8(4); % number of items
m = uint8(3); % number of bins
whichBin = VChooseKR(1:m,n).'; % see FEX link below. Transpose saves us a `reshape()` later.
result = accumarray([repelem(1:size(whichBin,2),n).' whichBin(:)],1);
Where VChooseKR(V,K) creates a matrix whose rows are all combinations created by choosing K elements of the vector V with repetitions.
Explanation:
The output of VChooseKR(1:m,n) for m=3 and n=4 is:
1 1 1 1
1 1 1 2
1 1 1 3
1 1 2 2
1 1 2 3
1 1 3 3
1 2 2 2
1 2 2 3
1 2 3 3
1 3 3 3
2 2 2 2
2 2 2 3
2 2 3 3
2 3 3 3
3 3 3 3
All we need to do now is "histcount" the numbers on each row using positive integer bins to get the desired result. The first output row would be [4 0 0] because all 4 elements go in the 1st bin. The second row would be [3 1 0] because 3 elements go in the 1st bin and 1 in the 2nd, etc.

Transform a matrix to a stacked vector where all zeroes after the last non-zero value per row are removed

I have a matrix with some zero values I want to erase.
a=[ 1 2 3 0 0; 1 0 1 3 2; 0 1 2 5 0]
>>a =
1 2 3 0 0
1 0 1 3 2
0 1 2 5 0
However, I want to erase only the ones after the last non-zero value of each line.
This means that I want to retain 1 2 3 from the first line, 1 0 1 3 2 from the second and 0 1 2 5 from the third.
I want to then store the remaining values in a vector. In the case of the example this would result in the vector
b=[1 2 3 1 0 1 3 2 0 1 2 5]
The only way I figured out involves a for loop that I would like to avoid:
b=[];
for ii=1:size(a,1)
l=max(find(a(ii,:)));
b=[b a(ii,1:l)];
end
Is there a way to vectorize this code?
There are many possible ways to do this, here is my approach:
arotate = a' %//rotate the matrix a by 90 degrees
b=flipud(arotate) %//flips the matrix up and down
c= flipud(cumsum(b,1)) %//cumulative sum the matrix rows -and then flip it back.
arotate(c==0)=[]
arotate =
1 2 3 1 0 1 3 2 0 1 2 5
=========================EDIT=====================
just realized cumsum can have direction parameter so this should do:
arotate = a'
b = cumsum(arotate,1,'reverse')
arotate(b==0)=[]
This direction parameter was not available on my 2010b version, but should be there for you if you are using 2013a or above.
Here's an approach using bsxfun's masking capability -
M = size(a,2); %// Save size parameter
at = a.'; %// Transpose input array, to be used for masked extraction
%// Index IDs of last non-zero for each row when looking from right side
[~,idx] = max(fliplr(a~=0),[],2);
%// Create a mask of elements that are to be picked up in a
%// transposed version of the input array using BSXFUN's broadcasting
out = at(bsxfun(#le,(1:M)',M+1-idx'))
Sample run (to showcase mask usage) -
>> a
a =
1 2 3 0 0
1 0 1 3 2
0 1 2 5 0
>> M = size(a,2);
>> at = a.';
>> [~,idx] = max(fliplr(a~=0),[],2);
>> bsxfun(#le,(1:M)',M+1-idx') %// mask to be used on transposed version
ans =
1 1 1
1 1 1
1 1 1
0 1 1
0 1 0
>> at(bsxfun(#le,(1:M)',M+1-idx')).'
ans =
1 2 3 1 0 1 3 2 0 1 2 5

MATLAB what lines are different between matrices

I am trying to find the number of the lines where the values of two matrices are not the same
I found only a way to know the indexs on the not same items by:
find(a~=b)
where a is N*N and b is N*N
How can I know the rows numbers of the not same items
ps
looking for nicer way then
dint the find and then having some vector in a loop filling with
ind2sub(size(A), 6)
You can use max on the logical array of such matches or mis-mistaches in this case along a certain dimension, alongwith find.
If you are looking to find unique row IDs for mismatches, do this -
find(max(a~=b,[],2))
For unique column IDs, just change the dimension specifier -
find(max(a~=b,[],1))
Sample run -
>> a
a =
1 2 2 2 1
1 2 1 1 1
2 2 2 2 1
1 1 2 1 1
>> b
b =
1 2 1 1 2
1 2 1 2 1
2 2 2 2 1
1 1 2 2 2
>> a~=b
ans =
0 0 1 1 1
0 0 0 1 0
0 0 0 0 0
0 0 0 1 1
>> find(max(a~=b,[],2)) %// unique row IDs
ans =
1
2
4
>> find(max(a~=b,[],1)) %// unique col IDs
ans =
3 4 5
here I found an easy way if any one will need it
indexs=find(a~=b)
[~,rows]=ind2sub(size(a),indexs)
rows=unique( sort( rows ) )
now rows are only the different rows
NotSame = 0;
for ii = 1:size(a,1)
if a(ii,:) ~= b(ii,:)
NotSame = NotSame+1;
end
end
This checks it row by row and when a row in a is not the same as the row in b this will increase the count of NotSame. Not the fastest way, I'm sure someone can produce a solution using bsxfun, but I'm not an expert in that.
You can also use the double output of find
[row, col] = find(a~=b)
myrows = unique(row);
You can also have the columns where a & b have different values
mycols = unique(col);

How to count patterns columnwise in Matlab?

I have a matrix S in Matlab that looks like the following:
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1
I would like to count patterns of values column-wise. I am interested into the frequency of the numbers that follow right after number 3 in any of the columns. For instance, number 3 occurs three times in the first column. The first time we observe it, it is followed by 3, the second time it is followed by 3 again and the third time it is followed by 4. Thus, the frequency for the patters observed in the first column would look like:
3-3: 66.66%
3-4: 33.33%
3-1: 0%
3-2: 0%
To generate the output, you could use the convenient tabulate
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1];
idx = find(S(1:end-1,:)==3);
S2 = S(2:end,:);
tabulate(S2(idx))
Value Count Percent
1 0 0.00%
2 0 0.00%
3 4 66.67%
4 2 33.33%
Here's one approach, finding the 3's then looking at the following digits
[i,j]=find(S==3);
k=i+1<=size(S,1);
T=S(sub2ind(size(S),i(k)+1,j(k))) %// the elements of S that are just below a 3
R=arrayfun(#(x) sum(T==x)./sum(k),1:max(S(:))).' %// get the number of probability of each digit
I'm going to restate your problem statement in a way that I can understand and my solution will reflect this new problem statement.
For a particular column, locate the locations that contain the number 3.
Look at the row immediately below these locations and look at the values at these locations
Take these values and tally up the total number of occurrences found.
Repeat these for all of the columns and update the tally, then determine the percentage of occurrences for the values.
We can do this by the following:
A = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// Define your matrix
[row,col] = find(A(1:end-1,:) == 3);
vals = A(sub2ind(size(A), row+1, col));
h = 100*accumarray(vals, 1) / numel(vals)
h =
0
0
66.6667
33.3333
Let's go through the above code slowly. The first few lines define your example matrix A. Next, we take a look at all of the rows except for the last row of your matrix and see where the number 3 is located with find. We skip the last row because we want to be sure we are within the bounds of your matrix. If there is a number 3 located at the last row, we would have undefined behaviour if we tried to check the values below the last because there's nothing there!
Once we do this, we take a look at those values in the matrix that are 1 row beneath those that have the number 3. We use sub2ind to help us facilitate this. Next, we use these values and tally them up using accumarray then normalize them by the total sum of the tallying into percentages.
The result would be a 4 element array that displays the percentages encountered per number.
To double check, if we look at the matrix, we see that the value of 3 follows other values of 3 for a total of 4 times - first column, row 3, row 4, second column, row 2 and third column, row 6. The value of 4 follows the value of 3 two times: first column, row 6, second column, row 3.
In total, we have 6 numbers we counted, and so dividing by 6 gives us 4/6 or 66.67% for number 3 and 2/6 or 33.33% for number 4.
If I got the problem statement correctly, you could efficiently implement this with MATLAB's logical indexing and an approach that is essentially of two lines -
%// Input 2D matrix
S = [
2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]
Labels = [1:4]'; %//'# Label array
counts = histc(S([false(1,size(S,2)) ; S(1:end-1,:) == 3]),Labels)
Percentages = 100*counts./sum(counts)
Verify/Present results
The styles for presenting the output results listed next use MATLAB's table for a well human-readable format of data.
Style #1
>> table(Labels,Percentages)
ans =
Labels Percentages
______ ___________
1 0
2 0
3 66.667
4 33.333
Style #2
You can do some fancy string operations to present the results in a more "representative" manner -
>> Labels_3 = strcat('3-',cellstr(num2str(Labels','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-1' 0
'3-2' 0
'3-3' 66.667
'3-4' 33.333
Style #3
If you want to present them in descending sorted manner based on the percentages as listed in the expected output section of the question, you can do so with an additional step using sort -
>> [Percentages,idx] = sort(Percentages,'descend');
>> Labels_3 = strcat('3-',cellstr(num2str(Labels(idx)','%1d')'));
>> table(Labels_3,Percentages)
ans =
Labels_3 Percentages
________ ___________
'3-3' 66.667
'3-4' 33.333
'3-1' 0
'3-2' 0
Bonus Stuff: Finding frequency (counts) for all cases
Now, let's suppose you would like repeat this process for say 1, 2 and 4 as well, i.e. find occurrences after 1, 2 and 4 respectively. In that case, you can iterate the above steps for all cases and for the same you can use arrayfun -
%// Get counts
C = cell2mat(arrayfun(#(n) histc(S([false(1,size(S,2)) ; S(1:end-1,:) == n]),...
1:4),1:4,'Uni',0))
%// Get percentages
Percentages = 100*bsxfun(#rdivide, C, sum(C,1))
Giving us -
Percentages =
90.9091 20.0000 0 100.0000
9.0909 20.0000 0 0
0 60.0000 66.6667 0
0 0 33.3333 0
Thus, in Percentages, the first column are the counts of [1,2,3,4] that occur right after there is a 1 somewhere in the input matrix. As as an example, one can see column -3 of Percentages is what you had in the sample output when looking for elements right after 3 in the input matrix.
If you want to compute frequencies independently for each column:
S = [2 2 1 2
2 3 1 1
3 3 1 1
3 4 1 1
3 1 2 1
4 1 3 1
1 1 3 1]; %// data: matrix
N = 3; %// data: number
r = max(S(:));
[R, C] = size(S);
[ii, jj] = find(S(1:end-1,:)==N); %// step 1
count = full(sparse(S(ii+1+(jj-1)*R), jj, 1, r, C)); %// step 2
result = bsxfun(#rdivide, count, sum(S(1:end-1,:)==N)); %// step 3
This works as follows:
find is first applied to determine row and col indices of occurrences of N in S except its last row.
The values in the entries right below the indices of step 1 are accumulated for each column, in variable count. The very convenient sparse function is used for this purpose. Note that this uses linear indexing into S.
To obtain the frequencies for each column, count is divided (with bsxfun) by the number of occurrences of N in each column.
The result in this example is
result =
0 0 0 NaN
0 0 0 NaN
0.6667 0.5000 1.0000 NaN
0.3333 0.5000 0 NaN
Note that the last column correctly contains NaNs because the frequency of the sought patterns is undefined for that column.