How to flip the definition of edges(1) and edges(end) for the histc function in MATLAB? - matlab

In MATLAB:
n = histc(x,edges);
is defined to behave as follows:
n(k) counts the value x(i) if edges(k)
<= x(i) < edges(k+1). The last bin
counts any values of x that match
edges(end).
Is there any way to flip the end behavior such that n(1) counts any values of x that match edges(1), and n(end) counts the values x(i) that satisfy edges(end-1) <= x(i) < edges(end)?

Consider the following code:
n = histc(x, [edges(1) edges]);
n(1) = sum(x==edges(1));
n(end) = [];
According to the question posted, the above will return:
n(1): counts any values of x that match edges(1)
n(k) [k~=1]: counts the value x(i) if edges(k-1) <= x(i) < edges(k)
This different from gnovice solution in that his answer uses the bounds: edges(k-1) < x(i) <= edges(k) (note the position of the equality sign).
To demonstrate, consider this simple example:
x = [0 1 1.5 2 2.5 4 6.5 8 10];
edges = 0:2:10;
>> n = fliplr(histc(-x,-fliplr(edges)))
n =
1 3 2 0 2 1
corresponding to the intervals:
0 (0,2] (2,4] (4,6] (6,8] (8,10]
Against:
>> n = histc(x, [edges(1) edges]);
>> n(1) = sum(x==edges(1));
>> n(end) = []
n =
1 3 2 1 1 1
corresponding to the intervals:
0 [0,2) [2,4) [4,6) [6,8) [8,10)

Since the edges argument has to have monotonically nondecreasing values, one way to flip the edge behavior is to negate and flip the edges argument and negate the values for binning. If you then flip the bin count output from HISTC, you should see the typical edge behavior of HISTC reversed:
n = fliplr(histc(-x,-fliplr(edges)));
The above uses FLIPLR, so x and edges should be row vectors (i.e. 1-by-N). This code will bin data according to the following criteria:
The first bin n(1) counts any values of x that match edges(1).
The other bins n(k) count the values x(i) such that edges(k-1) < x(i) <= edges(k).
Note that this flips the edge behavior of all the bins, not just the first and last bins! The typical behavior of HISTC for bin n(k) uses the equation edges(k) <= x(i) < edges(k+1) (Note the difference between the indices and which side has the equals sign!).
EDIT: After some discussion...
If you instead wanted to bin data according to the following criteria:
The first bin n(1) counts any values of x that match edges(1).
The second bin n(2) counts the values x(i) such that edges(1) < x(i) < edges(2).
The other bins n(k) count the values x(i) such that edges(k-1) <= x(i) < edges(k).
Then the following should accomplish this:
n = histc(x,[edges(1) edges(1)+eps(edges(1)) edges(2:end)]);
n(end) = [];
The first bin should capture only values equal to edges(1), while the lower edge of the second bin should start at an incremental value above edges(1) (found using the EPS function). The last bin, which counts the number of values equal to edges(end), is thrown out.

Related

Implementing Simplex Method infinite loop

I am trying to implement a simplex algorithm following the rules I was given at my optimization course. The problem is
min c'*x s.t.
Ax = b
x >= 0
All vectors are assumes to be columns, ' denotes the transpose. The algorithm should also return the solution to dual LP. The rules to follow are:
Here, A_J denotes columns from A with indices in J and x_J, x_K denotes elements of vector x with indices in J or K respectively. Vector a_s is column s of matrix A.
Now I do not understand how this algorithm takes care of condition x >= 0, but I decided to give it a try and follow it step by step. I used Matlab for this and got the following code.
X = zeros(n, 1);
Y = zeros(m, 1);
% i. Choose starting basis J and K = {1,2,...,n} \ J
J = [4 5 6] % for our problem
K = setdiff(1:n, J)
% this while is for goto
while 1
% ii. Solve system A_J*\bar{x}_J = b.
xbar = A(:,J) \ b
% iii. Calculate value of criterion function with respect to current x_J.
fval = c(J)' * xbar
% iv. Calculate dual solution y from A_J^T*y = c_J.
y = A(:,J)' \ c(J)
% v. Calculate \bar{c}^T = c_K^T - u^T A_K. If \bar{c}^T >= 0, we have
% found the optimal solution. If not, select the smallest s \in K, such
% that c_s < 0. Variable x_s enters basis.
cbar = c(K)' - c(J)' * inv(A(:,J)) * A(:,K)
cbar = cbar'
tmp = findnegative(cbar)
if tmp == -1 % we have found the optimal solution since cbar >= 0
X(J) = xbar;
Y = y;
FVAL = fval;
return
end
s = findnegative(c, K) %x_s enters basis
% vi. Solve system A_J*\bar{a} = a_s. If \bar{a} <= 0, then the problem is
% unbounded.
abar = A(:,J) \ A(:,s)
if findpositive(abar) == -1 % we failed to find positive number
disp('The problem is unbounded.')
return;
end
% vii. Calculate v = \bar{x}_J / \bar{a} and find the smallest rho \in J,
% such that v_rho > 0. Variable x_rho exits basis.
v = xbar ./ abar
rho = J(findpositive(v))
% viii. Update J and K and goto ii.
J = setdiff(J, rho)
J = union(J, s)
K = setdiff(K, s)
K = union(K, rho)
end
Functions findpositive(x) and findnegative(x, S) return the first index of positive or negative value in x. S is the set of indices, over which we look at. If S is omitted, whole vector is checked. Semicolons are omitted for debugging purposes.
The problem I tested this code on is
c = [-3 -1 -3 zeros(1,3)];
A = [2 1 1; 1 2 3; 2 2 1];
A = [A eye(3)];
b = [2; 5; 6];
The reason for zeros(1,3) and eye(3) is that the problem is inequalities and we need slack variables. I have set starting basis to [4 5 6] because the notes say that starting basis should be set to slack variables.
Now, what happens during execution is that on first run of while, variable with index 1 enters basis (in Matlab, indices go from 1 on) and 4 exits it and that is reasonable. On the second run, 2 enters the basis (since it is the smallest index such that c(idx) < 0 and 1 leaves it. But now on the next iteration, 1 enters basis again and I understand why it enters, because it is the smallest index, such that c(idx) < 0. But here the looping starts. I assume that should not have happened, but following the rules I cannot see how to prevent this.
I guess that there has to be something wrong with my interpretation of the notes but I just cannot see where I am wrong. I also remember that when we solved LP on the paper, we were updating our subjective function on each go, since when a variable entered basis, we removed it from the subjective function and expressed that variable in subj. function with the expression from one of the equalities, but I assume that is different algorithm.
Any remarks or help will be highly appreciated.
The problem has been solved. Turned out that the point 7 in the notes was wrong. Instead, point 7 should be

vectorize lookup values in table of interval limits

Here is a question about whether we can use vectorization type of operation in matlab to avoid writing for loop.
I have a vector
Q = [0.1,0.3,0.6,1.0]
I generate a uniformly distributed random vector over [0,1)
X = [0.11,0.72,0.32,0.94]
I want to know whether each entry of X is between [0,0.1) or [0.1,0.3) or [0.3,0.6), or [0.6,1.0) and I want to return a vector which contains the index of the maximum element in Q that each entry of X is less than.
I could write a for loop
Y = zeros(length(X),1)
for i = 1:1:length(X)
Y(i) = find(X(i)<Q, 1);
end
Expected result for this example:
Y = [2,4,3,4]
But I wonder if there is a way to avoid writing for loop? (I see many very good answers to my question. Thank you so much! Now if we go one step further, what if my Q is a matrix, such that I want check whether )
Y = zeros(length(X),1)
for i = 1:1:length(X)
Y(i) = find(X(i)<Q(i), 1);
end
Use the second output of max, which acts as a sort of "vectorized find":
[~, Y] = max(bsxfun(#lt, X(:).', Q(:)), [], 1);
How this works:
For each element of X, test if it is less than each element of Q. This is done with bsxfun(#lt, X(:).', Q(:)). Note each column in the result corresponds to an element of X, and each row to an element of Q.
Then, for each element of X, get the index of the first element of Q for which that comparison is true. This is done with [~, Y] = max(..., [], 1). Note that the second output of max returns the index of the first maximizer (along the specified dimension), so in this case it gives the index of the first true in each column.
For your example values,
Q = [0.1, 0.3, 0.6, 1.0];
X = [0.11, 0.72, 0.32, 0.94];
[~, Y] = max(bsxfun(#lt, X(:).', Q(:)), [], 1);
gives
Y =
2 4 3 4
Using bsxfun will help accomplish this. You'll need to read about it. I also added a Q = 0 at the beginning to handle the small X case
X = [0.11,0.72,0.32,0.94 0.01];
Q = [0.1,0.3,0.6,1.0];
Q_extra = [0 Q];
Diff = bsxfun(#minus,X(:)',Q_extra (:)); %vectorized subtraction
logical_matrix = diff(Diff < 0); %find the transition from neg to positive
[X_categories,~] = find(logical_matrix == true); % get indices
% output is 2 4 3 4 1
EDIT: How long does each method take?
I got curious about the difference between each solution:
Test Code Below:
Q = [0,0.1,0.3,0.6,1.0];
X = rand(1,1e3);
tic
Y = zeros(length(X),1);
for i = 1:1:length(X)
Y(i) = find(X(i)<Q, 1);
end
toc
tic
result = arrayfun(#(x)find(x < Q, 1), X);
toc
tic
Q = [0 Q];
Diff = bsxfun(#minus,X(:)',Q(:)); %vectorized subtraction
logical_matrix = diff(Diff < 0); %find the transition from neg to positive
[X_categories,~] = find(logical_matrix == true); % get indices
toc
Run it for yourself, I found that when the size of X was 1e6, bsxfun was much faster, while for smaller arrays the differences were varying and negligible.
SAMPLE: when size X was 1e3
Elapsed time is 0.001582 seconds. % for loop
Elapsed time is 0.007324 seconds. % anonymous function
Elapsed time is 0.000785 seconds. % bsxfun
Octave has a function lookup to do exactly that. It takes a lookup table of sorted values and an array, and returns an array with indices for values in the lookup table.
octave> Q = [0.1 0.3 0.6 1.0];
octave> x = [0.11 0.72 0.32 0.94];
octave> lookup (Q, X)
ans =
1 3 2 3
The only issue is that your lookup table has an implicit zero which be fixed easily with:
octave> lookup ([0 Q], X) # alternatively, just add 1 at the results
ans =
2 4 3 4
You can create an anonymous function to perform the comparison, then apply it to each member of X using arrayfun:
compareFunc = #(x)find(x < Q, 1);
result = arrayfun(compareFunc, X, 'UniformOutput', 1);
The Q array will be stored in the anonymous function ( compareFunc ) when the anonymous function is created.
Or, as one line (Uniform Output is the default behavior of arrayfun):
result = arrayfun(#(x)find(x < Q, 1), X);
Octave does a neat auto-vectorization trick for you if the vectors you have are along different dimensions. If you make Q a column vector, you can do this:
X = [0.11, 0.72, 0.32, 0.94];
Q = [0.1; 0.3; 0.6; 1.0; 2.0; 3.0];
X <= Q
The result is a 6x4 matrix indicating which elements of Q each element of X is less than. I made Q a different length than X just to illustrate this:
0 0 0 0
1 0 0 0
1 0 1 0
1 1 1 1
1 1 1 1
1 1 1 1
Going back to the original example you have, you can do
length(Q) - sum(X <= Q) + 1
to get
2 4 3 4
Notice that I have semicolons instead of commas in the definition of Q. If you want to make it a column vector after defining it, do something like this instead:
length(Q) - sum(X <= Q') + 1
The reason that this works is that Octave implicitly applies bsxfun to an operation on a row and column vector. MATLAB will not do this until R2016b according to #excaza's comment, so in MATLAB you can do this:
length(Q) - sum(bsxfun(#le, X, Q)) + 1
You can play around with this example in IDEOne here.
Inspired by the solution posted by #Mad Physicist, here is my solution.
Q = [0.1,0.3,0.6,1.0]
X = [0.11,0.72,0.32,0.94]
Temp = repmat(X',1,4)<repmat(Q,4,1)
[~, ind]= max( Temp~=0, [], 2 );
The idea is that make the X and Q into the "same shape", then use element wise comparison, then we obtain a logical matrix whose row tells whether a given element in X is less than each of the element in Q, then return the first non-zero index of each row of this logical matrix. I haven't tested how fast this method is comparing to other methods

mean of multiple rows (points) in matlab

I have a matrix named data with x,y,z positions.
I have calculated the distance between all the points using pdist and got the matrix 'out' which has the diff of x, diff of y and diff of z and their corresponding rows.
A= data;
D1= pdist(A(:,1));
D2= pdist(A(:,2));
D3= pdist(A(:,3));
D = [D1' D2' D3'];
tmp = ones(size(A,1));
tmp = tril(tmp,-1);
[rowIdx,colIdx ] = find(tmp);
out = [D,A(rowIdx,:),A(colIdx,:)];
I want to calculate the average of all those points which satisfies certain conditions:
diff z <7 and diff x and diff y < 4.
So I wrote the following code:
a= find (out(:,3)>0);
cal=out(a,:);
b= cal(:,3)<7;
cal2 = cal(b,:);
[s,k]= size (cal2);
for i=1:s
if (cal2(i,1) < 4) && (cal2(i,2) < 4);
xmean = mean (cal2(i,[4,7]));
ymean = mean (cal2(i,[5,8]));
zmean = mean (cal2(i,[6,9]));
fd = [xmean ymean zmean];
end
end
The problem is, with this code I can get the mean of two points at a time. So, my output is giving me more number of points than what I actually want.I want to get the mean off all the points which satisfies the condition.
My final goal is to get a list of points where I can get the points whose (diff z is > 7 )+ (the mean of the points whose diff z is <7 while diff x < 4 and diff y <4) + (diff z < 7 but diff x and diff y > 4).
Can someone please help?
A = [0,0,0;0,0,5;0,0,10];
% create matrices with the distances
% [p1-p1, p1-p2, p1-p3;
% p2-p1, p2-p2, p2-p3;
% p3-p1, p3-p2, p3-p3];
dx= squareform(pdist(A(:,1)));
dy= squareform(pdist(A(:,2)));
dz= squareform(pdist(A(:,3)));
% dz = [0,5,10;5,0,5;10,5,0];
% select all points stisfying condition
idx = dx < 4 & dy < 4 & dz < 7;
% remove entries of the main diagonal
idx(sub2ind(size(idx),1:size(A,1),1:size(A,1))) = 0;
% select all indices. Since all distances are 2 times in the full matrix,
% only r or c is needed
[r,c] = find(idx==1);
% r = [2,1,3,2]; c = [1,2,2,3]
meanOfPoints = mean(A(r,:))
Say you want the mean of all points where diffZ<7, diffX<4 and diffY<4. Assuming that out stores the corresponding differences in x,y,z in the first three columns, and point coordinates in columns 4-9.
%# this is true for all points you want to average
condition = out(:,1) > 4 & out(:,2) > 4 & out(:,3) < 7;
%# get coordinates satisfying the condition; create a 2n-by-3 array
coords = [out(condition, 4:6);out(condition, 7:9)];
%# if you don't want to count coordinates multiple times when taking the average, run
%# coords = unique(coords,'rows');
%# average
meanCoordinates = mean(coords,1);

Piecewise constant surface in MATLAB

I would like to produce a piecewise constant surface which is zero outside of some rectangle. More specifically, for t = (x,y) in R^2, I want
f(t) = 1 when 5<y<10 and 0<x<1;
-1 when 0<y<5 and 0<x<1;
1 when -5<y<0 and 0<x<1;
0 elsewhere
But, the surface I get doesn't look like what I want. I'm somewhat of a Matlab novice, so I suspect the problem is in the logical operators. My code is:
x = -2:.01:2; y = -15:15
[X,Y] = meshgrid(x,y); %Make domain
for i = 1:numel(X) %Piecewise function
for j = 1:numel(Y)
if Y(j) >= 0 && Y(j)<= 5 && X(i)>=0 && X(i)<=1
h2(i,j)= -1;
elseif Y(j)>5 && Y(j) <= 10 &&X(i)>=0 &&X(i)<=1
h2(i,j) = 1;
elseif Y(j)<0 && Y(j)>=-5 &&X(i)>=0 &&X(i)<=1
h2(i,j) = 1;
elseif X(i) <0 || X(i)>1 || Y(j)<-5 || Y(j)>10
h2(i,j) = 0;
end
end
end
%Normalize
C = trapz(abs(h2));
c = trapz(C);
h2 = c^(-1)*h2;
Thank you for your help and please let me know if you'd like me to specify more clearly what function I want.
You can very easily achieve what you want vectorized using a combination of logical operators. Avoid using for loops for something like this. Define your meshgrid like you did before, but allocate a matrix of zeroes, then only set the values within the meshgrid that satisfy the requirements you want to be the output values of f(t). In other words, do this:
%// Your code
x = -2:0.1:2; y = -15:15;
[X,Y] = meshgrid(x,y); %Make domain
%// New code
Z = zeros(size(X));
Z(Y > 5 & Y < 10 & X > 0 & X < 1) = 1;
Z(Y > 0 & Y < 5 & X > 0 & X < 1) = -1;
Z(Y > -5 & Y < 0 & X > 0 & X < 1) = 1;
mesh(X,Y,Z);
view(-60,20); %// Adjust for better angle
The above code allocates a matrix of zeroes, then starts to go through each part of your piecewise definition and searches for those x and y values that satisfy the particular range of interest. It then sets the output of Z to be whatever the output of f(t) is given those constraints. Take note that the otherwise condition is already handled by setting the whole matrix to be zero first. I then use mesh to visualize the surface, then adjust the azimuthal and elevation angle of the plot for a better view. Specifically, I set these to -60 degrees and 20 degrees respectively. Also take note that I decreased the resolution of the x values to have a step size of 0.1 instead of 0.01 for a lesser amount of granularity. This is solely so that you can see the mesh better.
This is the graph I get:
You can just use logical indexing:
x = -2:.01:2; y = -15:15;
[X,Y] = meshgrid(x,y); %// Make domain
h2=zeros(size(X));
h2(5<Y & Y<10 & 0<X & X<1)=1;
h2(0<Y & Y<5 & 0<X & X<1)=-1;
h2(-5<Y & Y<0 & 0<X & X<1)=1;
This statement: 5<Y & Y<10 & 0<X & X<1 returns a matrix of 1's and 0's where a 1 means that all 4 inequalities are satisfied, and a 0 means at least one is not. Where that matrix has a one, h2 will be modified to the value you want.

Replacing zeros (or NANs) in a matrix with the previous element row-wise or column-wise in a fully vectorized way

I need to replace the zeros (or NaNs) in a matrix with the previous element row-wise, so basically I need this Matrix X
[0,1,2,2,1,0;
5,6,3,0,0,2;
0,0,1,1,0,1]
To become like this:
[0,1,2,2,1,1;
5,6,3,3,3,2;
0,0,1,1,1,1],
please note that if the first row element is zero it will stay like that.
I know that this has been solved for a single row or column vector in a vectorized way and this is one of the nicest way of doing that:
id = find(X);
X(id(2:end)) = diff(X(id));
Y = cumsum(X)
The problem is that the indexing of a matrix in Matlab/Octave is consecutive and increments columnwise so it works for a single row or column but the same exact concept cannot be applied but needs to be modified with multiple rows 'cause each of raw/column starts fresh and must be regarded as independent. I've tried my best and googled the whole google but coukldn’t find a way out. If I apply that same very idea in a loop it gets too slow cause my matrices contain 3000 rows at least. Can anyone help me out of this please?
Special case when zeros are isolated in each row
You can do it using the two-output version of find to locate the zeros and NaN's in all columns except the first, and then using linear indexing to fill those entries with their row-wise preceding values:
[ii jj] = find( (X(:,2:end)==0) | isnan(X(:,2:end)) );
X(ii+jj*size(X,1)) = X(ii+(jj-1)*size(X,1));
General case (consecutive zeros are allowed on each row)
X(isnan(X)) = 0; %// handle NaN's and zeros in a unified way
aux = repmat(2.^(1:size(X,2)), size(X,1), 1) .* ...
[ones(size(X,1),1) logical(X(:,2:end))]; %// positive powers of 2 or 0
col = floor(log2(cumsum(aux,2))); %// col index
ind = bsxfun(#plus, (col-1)*size(X,1), (1:size(X,1)).'); %'// linear index
Y = X(ind);
The trick is to make use of the matrix aux, which contains 0 if the corresponding entry of X is 0 and its column number is greater than 1; or else contains 2 raised to the column number. Thus, applying cumsum row-wise to this matrix, taking log2 and rounding down (matrix col) gives the column index of the rightmost nonzero entry up to the current entry, for each row (so this is a kind of row-wise "cummulative max" function.) It only remains to convert from column number to linear index (with bsxfun; could also be done with sub2ind) and use that to index X.
This is valid for moderate sizes of X only. For large sizes, the powers of 2 used by the code quickly approach realmax and incorrect indices result.
Example:
X =
0 1 2 2 1 0 0
5 6 3 0 0 2 3
1 1 1 1 0 1 1
gives
>> Y
Y =
0 1 2 2 1 1 1
5 6 3 3 3 2 3
1 1 1 1 1 1 1
You can generalize your own solution as follows:
Y = X.'; %'// Make a transposed copy of X
Y(isnan(Y)) = 0;
idx = find([ones(1, size(X, 1)); Y(2:end, :)]);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)), [], size(X, 1)).'; %'// Reshape back into a matrix
This works by treating the input data as a long vector, applying the original solution and then reshaping the result back into a matrix. The first column is always treated as non-zero so that the values don't propagate throughout rows. Also note that the original matrix is transposed so that it is converted to a vector in row-major order.
Modified version of Eitan's answer to avoid propagating values across rows:
Y = X'; %'
tf = Y > 0;
tf(1,:) = true;
idx = find(tf);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)),fliplr(size(X)))';
x=[0,1,2,2,1,0;
5,6,3,0,1,2;
1,1,1,1,0,1];
%Do it column by column is easier
x=x';
rm=0;
while 1
%fields to replace
l=(x==0);
%do nothing for the first row/column
l(1,:)=0;
rm2=sum(sum(l));
if rm2==rm
%nothing to do
break;
else
rm=rm2;
end
%replace zeros
x(l) = x(find(l)-1);
end
x=x';
I have a function I use for a similar problem for filling NaNs. This can probably be cutdown or sped up further - it's extracted from pre-existing code that has a bunch more functionality (forward/backward filling, maximum distance etc).
X = [
0 1 2 2 1 0
5 6 3 0 0 2
1 1 1 1 0 1
0 0 4 5 3 9
];
X(X == 0) = NaN;
Y = nanfill(X,2);
Y(isnan(Y)) = 0
function y = nanfill(x,dim)
if nargin < 2, dim = 1; end
if dim == 2, y = nanfill(x',1)'; return; end
i = find(~isnan(x(:)));
j = 1:size(x,1):numel(x);
j = j(ones(size(x,1),1),:);
ix = max(rep([1; i],diff([1; i; numel(x) + 1])),j(:));
y = reshape(x(ix),size(x));
function y = rep(x,times)
i = find(times);
if length(i) < length(times), x = x(i); times = times(i); end
i = cumsum([1; times(:)]);
j = zeros(i(end)-1,1);
j(i(1:end-1)) = 1;
y = x(cumsum(j));