Am I using histc wrong, or is this MATLAB's fault? - matlab

Ok, here's some code in MATLAB:
data = [1 1.5 2 3 4 4.5 5 6 7 7 7 0 0 0];
histc(data, [1:1:5])
histc(data, [1:1:5, inf])
histc(data, [-inf, 1:1:5])
which outputs the following:
ans = 2 1 1 2 1
ans = 2 1 1 2 5 0
ans = 3 2 1 1 2 1
My question is, why does MATLAB return a useless 0 when you use inf in the bin size (to mean >= 5 in this case)?
Won't it always be zero? The help says the output will always be the same length as the bin size, but isn't that a bad spec in this case?

That's actually the correct behavior of HISTC. When you use the syntax:
n = histc(x,edges);
then, from the documentation:
n(k) counts the value x(i) if edges(k)
<= x(i) < edges(k+1). The last bin
counts any values of x that match
edges(end).
Therefore, the last edge value you give returns the count of how many things exactly match it. When inf is the last edge value, that counts 0 (i.e. there are no infs in the data). When 5 is the last edge value, it exactly matches 1 value in the data.

Related

How to find minimum of each column in MATLAB and sum op all elements of that column?

I have a 300x178 matrix and I want to find the minimum of each column of that matrix, i.e. resulting in a 1x178 array. Then I want to store the sum of all elements but the minimum in each column in the 300x178 matrix on the location/pixel of the minimum value, leaving all other elements zero. How can I do this using MATLAB?
Example:
1 4 6 3
2 6 7 4
5 1 5 7
becomes:
1 0 0 1
0 0 0 0
0 1 1 0
and eventually:
8 0 0 14
0 0 0 0
0 11 18 0
Your example and title do not correspond to the question text. Your example sums all values in a column and stores them at the location of the minimum, which the title also asks. You can do this by making smart use of sub2ind:
A = [1 4 6 3
2 6 7 4
5 1 5 7];
C = zeros(size(A));
[tmp, idx] = min(A); % find the locations of minima
% one liner to store the sum of columns
C(sub2ind(size(A), idx, 1:size(A,2))) = sum(A,1);
C =
8 0 0 14
0 0 0 0
0 11 18 0
If, on the other hand, you're after what your question text asks about, subsequently subtract A on the minimum locations using the same sub2ind trick:
C(sub2ind(size(A), idx, 1:size(A,2))) = C(sub2ind(size(A), idx, 1:size(A,2))) - A(sub2ind(size(A), idx, 1:size(A,2)))
C =
7 0 0 11
0 0 0 0
0 10 13 0
This way you get the sum of all elements but the minimum.
For an in-depth explanation what sub2ind does, read this fantastic Q/A by Luis Mendo keeping in mind that in A(2,3) the 2 and 3 are called subscripts, which, in case of a 3-by-4 matrix, translates to linear index 8.
I cannot test this on my R2007b, but according to the documentation on min you could use [M, I] = min(A, [], 1, 'linear') to get the linear indices into I directly, without going through sub2ind:
C = zeros(size(A));
[tmp, idx] = min(A, [], 1, 'linear');
C(idx) = sum(A, 1);
% Optional, to sum all but the minimum
C(idx) = C(idx) - A(idx);
Small note from the documentation on the occurrence of multiple same-valued minima in your original matrix:
If the smallest element occurs more than once, then I contains the index to the first occurrence of the value.

Remove single elements from a vector

I have a vector M containing single elements and repeats. I want to delete all the single elements. Turning something like [1 1 2 3 4 5 4 4 5] to [1 1 4 5 4 4 5].
I thought I'd try to get the count of each element then use the index to delete what I don't need, something like this:
uniq = unique(M);
list = [uniq histc(M,uniq)];
Though I'm stuck here and not sure how to go forward. Can anyone help?
Here is a solution using unique, histcounts and ismember:
tmp=unique(M) ; %finding unique elements of M
%Now keeping only those elements in tmp which appear only once in M
tmp = tmp(histcounts(M,[tmp tmp(end)])==1); %Thanks to rahnema for his insight on this
[~,ind] = ismember(tmp,M); %finding the indexes of these elements in M
M(ind)=[];
histcounts was introduced in R2014b. For earlier versions, hist can be used by replacing that line with this:
tmp=tmp(hist(M,tmp)==1);
You can get the result with the following code:
A = [a.', ones(length(a),1)];
[C,~,ic] = unique(A(:,1));
result = [C, accumarray(ic,A(:,2))];
a = A(~ismember(A(:,1),result(result(:,2) == 1))).';
The idea is, add ones to the second column of a', then accumarray base on the first column (elements of a). After that, found the elements in first column which have accum sum in the second column. Therefore, these elements repeated once in a. Finally, removing them from the first column of A.
Here is a cheaper alternative:
[s ii] = sort(a);
x = [false s(2:end)==s(1:end-1)]
y = [x(2:end)|x(1:end-1) x(end)]
z(ii) = y;
result = a(z);
Assuming the input is
a =
1 1 8 8 3 1 4 5 4 6 4 5
we sort the list s and get index of the sorted list ii
s=
1 1 1 3 4 4 4 5 5 6 8 8
we can find index of repeated elements and for it we check if an element is equal to the previous element
x =
0 1 1 0 0 1 1 0 1 0 0 1
however in x the first elements of each block is omitted to find it we can apply [or] between each element with the previous element
y =
1 1 1 0 1 1 1 1 1 0 1 1
we now have sorted logical index of repeated elements. It should be reordered to its original order. For it we use index of sorted elements ii :
z =
1 1 1 1 0 1 1 1 1 0 1 1
finally use z to extract only the repeated elements.
result =
1 1 8 8 1 4 5 4 4 5
Here is a result of a test in Octave* for the following input:
a = randi([1 100000],1,10000000);
-------HIST--------
Elapsed time is 5.38654 seconds.
----ACCUMARRAY------
Elapsed time is 2.62602 seconds.
-------SORT--------
Elapsed time is 1.83391 seconds.
-------LOOP--------
Doesn't complete in 15 seconds.
*Since in Octave histcounts hasn't been implemented so instead of histcounts I used hist.
You can test it Online
X = [1 1 2 3 4 5 4 4 5];
Y = X;
A = unique(X);
for i = 1:length(A)
idx = find(X==A(i));
if length(idx) == 1
Y(idx) = NaN;
end
end
Y(isnan(Y)) = [];
Then, Y would be [1 1 4 5 4 4 5]. It detects all single elements, and makes them as NaN, and then remove all NaN elements from the vector.

Count the number of the first zero elements

I would line to find the number of the first consecutive zero elements. For example in [0 0 1 -5 3 0] we have two zero consecutive elements that appear first in the vector.
could you please suggest a way without using for loops?
V=[0 0 1 -5 3 0] ;
k=find(V);
Number_of_first_zeros=k(1)-1;
Or,
Number_of_first_zeros=find(V,1,'first')-1;
To solve #The minion comment (if that was the purpose):
Number_of_first_zeros=find(V(find(~V,1,'first'):end),1,'first')-find(~V,1,'first');
Use a logical array to find the zeros and then look at where the zeros and ones are alternating.
V=[1 2 0 0 0 3 5123];
diff(V==0)
ans =
0 1 0 0 -1 0
Create sample data
V=[1 2 0 0 0 3 5123];
Find the zeros. The result will be a logical array where 1 represents the location of the zeros
D=V==0
D =
0 0 1 1 1 0 0
Take the difference of that array. 1 would then represent the start and -1 would represent the end.
T= diff(D)
ans =
0 1 0 0 -1 0
find(T==1) would give you the start and find(T==-1) would give you the end. The first index+1 of T==1 would be the start of the first set of zeros and the first index of T==-1 would be the end of the first set of zeros.
You could find position the first nonzero element using find.
I=find(A, 1);
The number of leading zeros is then I-1.
My solution is quite complex yet it doesn't use the loops and it does the trick. I am pretty sure, that there is a more direct approach.
Just in case no one else posts a working solution here my idea.
x=[1 2 4 0 20 0 10 1 23 45];
x1=find(x==0);
if numel(x1)>1
x2=[x1(2:end), 0];
x3=x2-x1;
y=find(x3~=1);
y(1)
elseif numel(x1)==1
display(1)
else
display('No zero found')
end
x is the dataset. x1 contains the index of all zero elements. x2 contains all those indices except the first one (because matrix dimensions must agree, one zero is added. x3 is the difference between the index and the previous index of zeros in your dataset. Now I find all those differences which are not 1 (do not correspond to sequences of zeros) and the first index (of this data is the required result. The if case is needed in case you have only one or no zero at all.
I'm assuming your question is the following: for the following vector [0 0 1 -5 3 0], I would like to find the index of the first element of a pair of 0 values. Is this correct? Therefore, the desired output for your vector would be '1'?
To extend the other answers to find any such pairs, not just 0 0 (eg. 0 1, 0 2, 3 4 etc), then this might help.
% define the pattern
ptrn = [ 0 0 ];
difference = ptrn(2) - ptrn(1)
V = [0 0 1 -5 3 0 0 2 3 4 0 0 1 0 0 0]
x = diff(V) == difference
indices = find(x)
indices =
1 6 11 14 15

Count non-zero entries in each column of a matrix

If I have a matrix:
A = [1 2 3 4 5; 1 1 6 1 2; 0 0 9 0 1]
A =
1 2 3 4 5
1 1 6 1 2
0 0 9 0 1
How can I count the number of non-zero entries for each column? For example the desired output for this matrix would be:
2, 2, 3, 2, 3
I am not sure how to do this as size, length or numel do not appear to meet the requirements. Perhaps it would be best to remove zero entries first?
It's simply
> A ~= 0
ans =
1 1 1 1 1
1 1 1 1 1
0 0 1 0 1
> sum(A ~= 0, 1)
ans =
2 2 3 2 3
Here's another solution I can suggest that isn't very speed worthy for dense matrices but quite fast for sparse matrices (thanks #user1877862!). This also would mimic how one might do this in a compiled language, like C or Java, and perhaps for research purposes too. First find the row and column locations that are non zero, then do a histogram on just the column locations to count the frequency of how often you see a non-zero in each column. In other words:
[~,col] = find(A ~= 0);
counts = histc(col, 1:size(A,2));
find outputs the row and column locations of where a matrix satisfies some Boolean condition inside the argument of the function. We ignore the first output as we aren't concerned with the row locations.
The output we get is:
counts =
2
2
3
2
3

Creating a variable with unequal rows

I want to create a variable that finds a pattern (let's say [1 1]) in different rows of a matrix (A). Of course there aren't an equal number of occurrences of this string in each row.
A = [ 0 0 0 1 1
1 1 1 0 0
0 1 0 1 1
1 1 1 0 0
0 1 0 0 1
1 0 1 1 1
0 1 0 1 0
1 1 1 0 1];
I could do:
for i = 1:n
var(i,:) = strfind(A(i,:),[1 1]);
end
but then both sides of the equation won't be equal.
ERROR: ??? Subscripted assignment dimension mismatch.
I try to preallocate. I create a matrix with what I think would be the maximum number of occurrences of this string in each row of matrix A (let's say 50).
for i = 1:n
var(i, :) = NaN(1,50)
end
That's followed by the previous bit of code and it's no good either.
I've also tried:
for i = 1:n
var(i,1:numel(strfind(A(i,:),[1 1])) = strfind(A(i,:),[1 1])
end
Error: The expression to the left of the equals sign is not a valid
target for an assignment.
How should I go about doing this?
The output I expect is a matrix var(i,:) that gives me the position in the matrix where each of these patterns occur. It works fine for just one row.
For example:
var(1,:) = [1 2 5 8 10 22 48]
var(2,:) = [2 3 4 7 34 45 NaN]
var(3,:) = [4 5 21 32 33 NaN]
Thanks!
In your first try: you tried to build a matrix with different length of rows.
In your second try: you pre-allocated, but then run it over by re-definning var(i,:), while you tried to put there your desired result.
In your third try: unfortunately you just missed one brackets- ) at the end of left expression.
This code suppose to work (what you did at 2nd and 3rd attempts, with pre-allocate and fixed brackets):
var=NaN(1,50);
for i = 1:n
var(i,1:numel(strfind(A(i,:),[1 1]))) = strfind(A(i,:),[1 1])
end