Understanding Matlab histcounts behavior

histcounts(1:100,'BinWidth',50)
returns
49 51
Why doesn't it return
50 50
instead?

Histogram the values 1 to 100 inclusive with h = histogram(1:100, 'BinWidth', 50).
Let's see the bin edges:
h.BinEdges
ans =
0 50 100
From MATLAB's help:
Each bin includes the left edge, but does not include the right edge,
except for the last bin which includes both edges
That means that values 1 to 100 are histogrammed in this format:
Bin 1 => edges: [0 50) => Included values: [1, 2, 3, .., 49] (n = 49)
Bin 2 => edges: [50 100] => Included values: [50, 51, 52, .., 100] (n = 51)
histcounts(X) partitions X in the same manner as histogram(X). Therefore, the results are what you should expect and are in fact very reasonable.
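As a small sketch of the edge rule described above, the counts can be compared for the automatic edges and for explicitly chosen ones. The edge vector [1 51 101] is an assumption made for illustration; any edges that place 50 in the first bin would behave similarly.

```matlab
% Automatic edges for BinWidth = 50 (per the answer above: 0 50 100).
[n, edges] = histcounts(1:100, 'BinWidth', 50);
disp(edges)   % 0 50 100
disp(n)       % 49 51

% Explicit edges: bins are [1, 51) and [51, 101], so each holds 50 values.
n2 = histcounts(1:100, [1 51 101]);
disp(n2)      % 50 50
```

Passing the edges explicitly is one way to get an even split, since the value 50 then falls in the first bin instead of the second.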

Related

Get points which are within a given distance in two different matrices

I have two matrices A and B, in which the number of rows can vary. A and B do not necessarily have the same number of rows.
For example:
A = [ 110 90
130 140
230 50
370 210 ];
B = [ 321 95
102 35
303 200 ];
Now matrix A and B have 'corresponding points'. Corresponding points are rows where the values in the 2nd column of both matrices are within +/-20.
For example:
A(1,2) = 90 and B(1,2) = 95, the difference is within +/-20 so A(1,:) and B(1,:) are corresponding points.
A(2,2) = 140 and B(2,2) = 35, the difference is not within +/-20 so A(2,:) and B(2,:) are not corresponding points.
A(3,2) = 50 and B(2,2) = 35, the difference is within +/-20 so A(3,:) and B(2,:) are corresponding points.
Using this I want to store the corresponding points of A and B in C and D respectively. For the above example, the final matrices should look like this:
C = [ 110 90
230 50
370 210 ]
D = [ 321 95
102 35
303 200 ]

You can get all of the distances using pdist2
dists = pdist2( A(:,2), B(:,2) )
>> dists = [ 5 55 110
45 105 60
45 15 150
115 175 10 ]
Then get the indices of all 'corresponding points', as defined by a threshold of 20.
% Get combinations within tolerance
idx = dists < 20;
% Get indices
[iA, iB] = find(idx);
Then you can create the final matrices
C = A(iA, :);
D = B(iB, :);
Edit: One way to ensure each pairing is unique (i.e. A(1,:) cannot be paired with multiple rows from B) would be to get the minimum dists for each row/column. Note: this would still give you duplicate matches if the distances are exactly the same; you haven't defined how this should be handled.
dists = pdist2( A(:,2), B(:,2) );
% Set values which are greater than the row/column minima to be infinity.
% This means they will never be within the tolerance of 20 (or whatever else)
dists ( bsxfun(@gt, dists, min(dists,[],1)) | bsxfun(@gt, dists, min(dists,[],2)) ) = Inf;
% In MATLAB R2016b and later, you can use implicit expansion to replace bsxfun
% That would be: dists( dists > min(dists,[],1) | dists > min(dists,[],2) ) = Inf;
% Now continue as before
[iA, iB] = find( dists < 20 );
C = A(iA, :);
D = B(iB, :);
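If pdist2 is not available (it belongs to the Statistics and Machine Learning Toolbox), the same distance matrix can be sketched with implicit expansion, assuming MATLAB R2016b or later. This reproduces the first, non-unique matching on the question's data:

```matlab
A = [110 90; 130 140; 230 50; 370 210];
B = [321 95; 102 35; 303 200];

% Column vector minus row vector expands to a 4x3 matrix of differences.
dists = abs(A(:,2) - B(:,2).');

% Row/column subscripts of all pairs within the tolerance.
[iA, iB] = find(dists < 20);

C = A(iA, :);
D = B(iB, :);
disp(C)
disp(D)
```

For this data the pairs found are A rows 1, 3, 4 with B rows 1, 2, 3, which gives the C and D from the question.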

Sort 2 arrays/vectors based off 1 vector

I have 2 arrays (vectors? in m vernacular?) and I want to sort them in unison. How can I achieve this in Matlab?
For example; I have found the peaks from a histogram and they are stored in 2 arrays; peakXVals, peakYVals. They will always be arranged in ascending x axis index. So they will always look like:
peakXVals = [0, 3, 20, 77, 240];
peakYVals = [10, 999, 30, 40, 20];
I wish to sort both arrays based on the values in peakYVals in descending order, i.e. from largest peak to smallest peak. So the desired result is:
peakXVals = [3, 77, 20, 240, 0];
peakYVals = [999, 40, 30, 20, 10];
What functions can I use to achieve this in Matlab?

Use sort:
peakXVals = [0, 3, 20, 77, 240];
peakYVals = [10, 999, 30, 40, 20];
>> [B,I] = sort(peakYVals, 'descend')
B =
999 40 30 20 10
I =
2 4 3 5 1
Then:
>> peakXVals_sorted = peakXVals(I)
peakXVals_sorted =
3 77 20 240 0
>> peakYVals_sorted = B
peakYVals_sorted =
999 40 30 20 10

You can arrange the two vectors as columns of a matrix and sort the rows of that matrix as atoms, in lexicographical order. Then the results are the columns of the sorted matrix:
tmp = sortrows([peakYVals(:) peakXVals(:)], 'descend');
peakYVals = tmp(:,1).';
peakXVals = tmp(:,2).';
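As a variant sketch, sortrows also accepts a column index, where a negative index requests descending order. Only the first column (the Y values) is then used as the sort key. The names sortedY and sortedX are chosen here for illustration.

```matlab
peakXVals = [0, 3, 20, 77, 240];
peakYVals = [10, 999, 30, 40, 20];

% Sort rows by column 1 (the Y values) in descending order.
tmp = sortrows([peakYVals(:) peakXVals(:)], -1);

sortedY = tmp(:,1).';
sortedX = tmp(:,2).';
disp(sortedY)   % 999 40 30 20 10
disp(sortedX)   % 3 77 20 240 0
```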

replacing values of a matrix with an if operation using matlab

mn = 1
for kn = 1:199
    for sn = 1:19773
        if abs((x1c{kn+1,1}(sn)) - (x1c{kn,1}(sn))) >= 20
            extract{mn} = x1c{kn+1,1}(sn);
            mn = mn+1;
        end
    end
end
extend = cell2mat(extract) + 40;
How can I change the values of "x1c" with the values of "extend"?

You are performing the operation on a cell. Since you're comparing numbers, this can be done far more efficiently with matrices.
I therefore suggest you convert the cell (or a subset of it) to a matrix and then use vectorized operations, like this:
>> a={[13, 2, 3], [14, 25, 8], [100, 9, 10], [101, 8, 32], [140, 20, 3]};
>>
>> x = transpose(reshape(cell2mat(a), 3, []));
>> z = abs(x(2:end, :) - x(1:end-1,:)) > 20;
>> z2 = [zeros(1,3); z]
z2 =
0 0 0
0 1 0
1 0 0
0 0 1
1 0 1
>> x(logical(z2)) = x(logical(z2)) - 200
x =
13 2 3
14 -175 8
-100 9 10
101 8 -168
-60 20 -197
There are two alternatives if you really must use cells (I don't recommend it for speed reasons):
1. Store the indices (kn, sn) of the cell items where your condition holds true, then loop over those elements again (very inefficient).
2. Store the previous and next cell "row" in temporary variables and compare using those. When the condition holds, edit in place and carry the temporary variable into the next iteration of the loop. The code below shows how this is done:
a={[13, 2, 3], [14, 25, 8], [100, 9, 10], [101, 8, 32], [140, 20, 3]};
curr_row = a{1};
for rowind=1:4
    next_row = a{rowind+1};
    for colind=1:3
        if abs(next_row(1, colind) - curr_row(1, colind)) > 20
            a{rowind+1}(1, colind) = a{rowind+1}(1, colind) + 40;
        end
    end
    curr_row = next_row;
end
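For comparison, here is a minimal vectorized sketch that applies the question's own rule (add 40 wherever the jump from the previous row is at least 20) and then writes the result back into a cell array. The small cell a is the sample data from this answer, not the asker's x1c, and a_new is a name introduced here for illustration.

```matlab
a = {[13, 2, 3], [14, 25, 8], [100, 9, 10], [101, 8, 32], [140, 20, 3]};

% Stack the cell contents into a 5x3 matrix, one cell per row.
x = vertcat(a{:});

% Compare each row with the previous one (original values, as in the loop).
jump = abs(diff(x, 1, 1)) >= 20;
mask = [false(1, size(x, 2)); jump];   % the first row has no predecessor

% Apply the correction only where the condition holds.
x(mask) = x(mask) + 40;

% Convert back to a 1x5 cell array of row vectors.
a_new = num2cell(x, 2).';
disp(x)
```

The comparison uses the unmodified values throughout, which matches the loop version above, where curr_row is taken before the in-place edit.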

Feature mapping using multi-variable polynomial

Suppose we have a data matrix of data points and we want to map those points into a higher-dimensional feature space. We can do this using degree-d polynomials: for a sequence of data points, the new data matrix has one column per monomial of the original variables, up to degree d.
I have studied a relevant script (from Andrew Ng's online course) that performs such a transform for 2-dimensional data points. However, I could not figure out how to generalize it to samples of arbitrary dimension. Here is the code:
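d = 6;
m = size(D,1);
new = ones(m, 1); % column of ones; ones(m) would create an m-by-m matrix
for k = 1:d
    for l = 0:k
        % x1, x2: the two feature columns (presumably D(:,1) and D(:,2))
        new(:, end+1) = (x1.^(k-l)).*(x2.^l);
    end
end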
Can we vectorize this code? Also, given a data matrix, could you please suggest a way to transform data points of arbitrary dimension to a higher one using a degree-d polynomial?
PS: A generalization of d-dimensional data points would be very helpful.

This solution can handle k variables and generate all the terms of a degree d polynomial where k and d are non-negative integers. Most of the code length is due to the combinatoric complexity of generating all the terms of a degree d polynomial in k variables.
It takes an n_obs by k data matrix X where n_obs is the number of observations and k is the number of variables.
Helper function
This function generates all possible rows of n_numbers entries such that every entry is a non-negative integer and the row sums to d. Each row holds the exponents of one term; for example, the row [0, 1, 3, 0, 1] corresponds to (x1^0)*(x2^1)*(x3^3)*(x4^0)*(x5^1).
The function (which almost certainly could be written more efficiently) is:
function result = mg_sums(n_numbers, d)
if(n_numbers<=1)
    result = d;
else
    result = zeros(0, n_numbers);
    for(i = d:-1:0)
        rc = mg_sums(n_numbers - 1, d - i);
        result = [result; i * ones(size(rc,1), 1), rc];
    end
end
Initialization code
n_obs = 1000; % number of observations
n_vars = 3; % number of variables
max_degree = 4; % order of polynomial
X = rand(n_obs, n_vars); % generate random, strictly positive data
stacked = zeros(0, n_vars); % this will collect all the exponent rows
for(d = 1:max_degree) % for degree 1 up to degree max_degree
    stacked = [stacked; mg_sums(n_vars, d)];
end
Final Step: Method 1
newX = zeros(size(X,1), size(stacked,1));
for(i = 1:size(stacked,1))
    accumulator = ones(n_obs, 1);
    for(j = 1:n_vars)
        accumulator = accumulator .* X(:,j).^stacked(i,j);
    end
    newX(:,i) = accumulator;
end
Use either method 1 or method 2.
Final Step: Method 2 (requires that all data in data matrix X be strictly positive; if X contains 0 elements, log gives -Inf, which doesn't propagate properly through the matrix algebra routines)
newX = real(exp(log(X) * stacked')); % multiplying log of data matrix by the
% matrix of all possible exponent combinations
% effectively raises terms to powers and multiplies them!
Example Run
X = [2, 3, 5];
max_degree = 3;
The rows of the stacked matrix, the polynomial term each row represents, and the value of that term for X = [2, 3, 5] are:
1 0 0 x1 2
0 1 0 x2 3
0 0 1 x3 5
2 0 0 x1.^2 4
1 1 0 x1.*x2 6
1 0 1 x1.*x3 10
0 2 0 x2.^2 9
0 1 1 x2.*x3 15
0 0 2 x3.^2 25
3 0 0 x1.^3 8
2 1 0 x1.^2.*x2 12
2 0 1 x1.^2.*x3 20
1 2 0 x1.*x2.^2 18
1 1 1 x1.*x2.*x3 30
1 0 2 x1.*x3.^2 50
0 3 0 x2.^3 27
0 2 1 x2.^2.*x3 45
0 1 2 x2.*x3.^2 75
0 0 3 x3.^3 125
If data matrix X is [2, 3, 5] this correctly generates:
newX = [2, 3, 5, 4, 6, 10, 9, 15, 25, 8, 12, 20, 18, 30, 50, 27, 45, 75, 125];
Where the 1st column is x1, 2nd is x2, 3rd is x3, 4th is x1.^2, 5th is x1.*x2 etc...
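Method 1 can also be written without loops. The following is a sketch, assuming MATLAB R2016b or later for implicit expansion; the exponent matrix E is typed in by hand (the degree 1 and degree 2 rows of the table above) so that the snippet runs on its own, and the second data row is made up to show that zeros are handled.

```matlab
% Two observations of three variables; zeros are allowed here.
X = [2, 3, 5;
     0, 1, 2];

% Exponent rows for all terms of degree 1 and 2 in three variables
% (the first nine rows of the stacked matrix in the example run).
E = [1 0 0; 0 1 0; 0 0 1;
     2 0 0; 1 1 0; 1 0 1;
     0 2 0; 0 1 1; 0 0 2];

% Reshape X to n-by-1-by-k and E to 1-by-t-by-k, raise elementwise,
% then multiply across the variable dimension to get an n-by-t matrix.
newX = prod(permute(X, [1 3 2]) .^ permute(E, [3 1 2]), 3);
disp(newX)
```

This builds an n-by-t-by-k intermediate array, so it trades memory for the absence of loops; for large data the looped Method 1 may be the safer choice.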

Select numbers from array which are much greater than the rest

Say there is an array of n elements, and among them are some numbers which are much bigger than the rest.
So, I might have:
16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3
In this case, I'd be interested in 32, 16 and 54.
Or, I might have:
32, 105, 26, 5, 1, 82, 906, 58, 22, 88, 967, 1024, 1055
In this case, I'd be interested in 1024, 906, 967 and 1055.
I'm trying to write a function to extract the numbers of interest. The problem is that I can't define a threshold to determine what's "much greater", and I can't just tell it to get the x biggest numbers because both of these will vary depending on what the function is called against.
I'm a little stuck. Does anyone have any ideas how to attack this?

Just taking all the numbers larger than the mean doesn't cut it all the time. For example, suppose you have only one number which is much larger, but many more numbers which are close to each other. The one large number won't shift the mean very much, which results in taking too many numbers:
data = [ones(1,10) 2*ones(1,10) 10];
data(data>mean(data))
ans =
2 2 2 2 2 2 2 2 2 2 10
If you look at the differences between numbers, this problem is solved:
>> data = [16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3];
sorted_data = sort(data);
dd = diff(sorted_data);
mean_dd = mean(dd);
ii = find(dd> 2*mean_dd,1,'first');
large_numbers = sorted_data(ii+1:end) % dd(ii) is the gap between elements ii and ii+1
large_numbers =
16 32 54
The threshold value (2 in this case) lets you tune how much greater a number has to be.
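As a quick check, the same gap-based idea can be run on the second array from the question. In this sketch the slice starts at ii + 1, the element just above the first large gap, so that the value just below the gap is excluded.

```matlab
data = [32, 105, 26, 5, 1, 82, 906, 58, 22, 88, 967, 1024, 1055];

sorted_data = sort(data);
dd = diff(sorted_data);              % gaps between consecutive sorted values

% First gap that is more than twice the average gap.
ii = find(dd > 2*mean(dd), 1, 'first');

% Everything above that gap counts as "much greater".
large_numbers = sorted_data(ii+1:end);
disp(large_numbers)   % 906 967 1024 1055
```

If no gap exceeds the threshold, ii is empty and large_numbers comes back empty, so a caller may want to handle that case explicitly.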

If it were me, I'd use a little more statistical insight, which would give the most flexibility for the code in the future.
x = [1 2 3 2 2 1 4 6 15 83 2 4 22 81 0 8 7 7 7 3 1 2 3]
EpicNumbers = x( x>(mean(x) + std(x)) )
Then you can increase or decrease the number of standard deviations to broaden or tighten your threshold.
LessEpicNumbers = x( x>(mean(x) + 2*std(x)) )
MoreEpicNumbers = x( x>(mean(x) + 0.5*std(x)) )

A simple solution would be to use find and a threshold based on the mean value (or multiples thereof):
a = [16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3]
find(a>mean(a))
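Note that find returns positions, not values. A short sketch of both, using logical indexing to get the values themselves (idx and vals are names chosen here):

```matlab
a = [16, 1, 1, 0, 5, 0, 32, 6, 54, 1, 2, 5, 3];

idx  = find(a > mean(a));   % positions of the large elements
vals = a(a > mean(a));      % the large elements themselves

disp(idx)    % 1 7 9
disp(vals)   % 16 32 54
```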