I have 5 variables A, V, h, l and b, which all stem from different distributions. I'd like to draw 1,000 equally distributed samples from each distribution using Latin hypercube sampling. Is this a realistic request, i.e. is it really better than simple random sampling? Do you have any references on how I can do this in MATLAB? This page suggests I would need to transform the sample somehow...
UPDATE #2: solution using built-in function of Statistics Toolbox
The basic question is whether you want your samples on a regular grid or not. If not, you could use the built-in function lhsdesign:
p = 1000 % Number of points
N = 5 % Number of dimensions
lb = [1 1 1 1 1]; % lower bounds for A,V,h,l and b
ub = [10 10 10 10 10]; % upper bounds for A,V,h,l and b
X = lhsdesign(p,N,'criterion','correlation');
D = bsxfun(@plus,lb,bsxfun(@times,X,(ub-lb)));
'criterion','correlation' would give you the desired "equal distribution".
D then contains the irregular coordinate-distribution for your parameters.
At first I thought you were looking for samples on a regular grid, which really seems to be a tough task. I tried to modify the approach above with D = round(bsxfun...), but it won't give you satisfying results. So for this case I still provide my initial idea here:
The following solution is far from fast and elegant, but it's at least a solution.
% For at least 1000 samples M=6 divisions are necessary
M = 6;
N = 5;
% the perfect LHC distribution would have 1296 samples for M=6 divisions
% and 5 dimensions
counter_max = M^(N-1); %=1296
% pre-allocation
D = zeros(6,6,6,6,6);
counter = 0;
while counter < 1000
c = randi(6,1,5);
if ( sum( D( c(1) , c(2) , c(3) , c(4) , : )) < 1 && ...
sum( D( c(1) , c(2) , c(3) , : , c(5) )) < 1 && ...
sum( D( c(1) , c(2) , : , c(4) , c(5) )) < 1 && ...
sum( D( c(1) , : , c(3) , c(4) , c(5) )) < 1 && ...
sum( D( : , c(2) , c(3) , c(4) , c(5) )) < 1 )
D(c(1),c(2),c(3),c(4),c(5)) = 1;
counter = counter+1; % increment first: counter starts at 0 and MATLAB indices start at 1
X(counter,:) = c;
end
end
X is finally containing the coordinates of all your samples.
As you see I used a while-loop with an underlying if-condition. You want 1000 samples, which is a realistic number and can be done in a reasonable time. I would actually recommend using a number of samples as close as possible to the maximum of 1296. That could take quite a while. But as you create the resulting matrix just once and use it again and again, don't hesitate to let it run for 24 hours. You could also implement an interruption code as described here: In MatLab, is it possible to terminate a script, but save all its internal variables to workspace? and see how many samples you got until then. (I got 900 samples in 20min when I was testing)
UPDATE: Example to show limitations of method:
The following example illustrates what the asker may be trying to do and what the result should ideally look like. I am also very interested in a good solution, because mine is limited and cannot provide the "100% result".
Imagine a cube (N=3) with M=10 divisions.
M = 10;
N = 3;
counter_max = M^(N-1); %=100 maximum number of placeable samples
% pre-allocations
D = zeros(10,10,10);
counter = 0;
while counter < counter_max
c = randi(10,1,3);
% the if-condition checks whether there is already a sample in the same row,
% column or z-coordinate
if ( sum( D( c(1) , c(2) , : )) < 1 && ...
sum( D( c(1) , : , c(3) )) < 1 && ...
sum( D( : , c(2) , c(3) )) < 1 )
%if not a new sample is generated
D(c(1),c(2),c(3)) = 1;
counter = counter+1;
X(counter,:) = c;
end
end
After about 10000 iterations one gets the following distribution with 85 of 100 possible placed samples:
where the color indicates the normalized distance to the closest neighbor. For most of the points it's fine (1), but as there are 15 missing samples, some points are more distant from others.
The problem is: I doubt that it's possible to get all 100 samples in a reasonable time. When one plots the generated samples over the number of iterations you get:
...so the desired result seems hardly obtainable.
Please see this answer more as an encouragement than a solution.
By combining 1-D latin hypercube samples (LHS), you can make a full set of LHS for regular grid in higher order dimension. For example, imagine 3X3 LHS (ie 2-D and 3 divisions). First, you just make 1-D LHS for regular grid. (1,0,0), (0, 1, 0), (0, 0, 1) for 1-D. And then, combine the 1-D LHS to make 2-D LHS.
1, 0, 0
0, 1, 0
0, 0, 1
or
0, 1, 0
1, 0, 0
0, 0, 1
... etc.
LHS for 3-D can also be created using the same method(by combining 2-D LHS).
There are 12 possible LHS for 3X3. Generally, the number of possible LHS is N x ((M-1)!)^(M-1).
(N=divisions, M=dimensions)
The following code shows LHS for 3-D and 10 divisions.
This code generates only one LHS.
The result is random.
It takes 0.001288 sec for a 100% result.
clear;
clc;
M = 3; % dimension
N = 10; % division
Sel2 = ':,';
stop = 0;
P_matrix = repmat([num2str(N),','],1,M);
P_matrix = P_matrix(1:end-1);
eval(['P = zeros(', P_matrix, ');']);
P(1,1) = 1;
tic
while stop == 0
for i = 1 : M-1
for j = 2:N
if i == 1
P(end , j, 1) = P(1 , j-1, 1);
P(1:end-1, j, 1) = P(2:end, j-1, 1);
else
Sel_2 = repmat(Sel2,1,i-1);
Sel_2 = Sel_2(1:end-1);
eval(['P(', Sel_2, ',end , j, 1) = P(', Sel_2 , ', 1 , j-1, 1);']);
eval(['P(', Sel_2, ',1:end-1 , j, 1) = P(', Sel_2 , ', 2:end, j-1, 1);']);
end
end
if i == 1
P(:,:,1) = P(randperm(N),:,1);
elseif i <M-1
Sel_2 = repmat(Sel2,1,i);
Sel_2 = Sel_2(1:end-1);
eval(['P(',Sel_2,',:,1) = P(',Sel_2,',randperm(N),1);']);
else
Sel_2 = repmat(Sel2,1,i);
Sel_2 = Sel_2(1:end-1);
eval(['P(',Sel_2,',:) = P(',Sel_2,',randperm(N));']);
end
end
% you can add stop condition
stop = 1;
end
toc
[x, y, z] = ind2sub(size(P),find(P == 1));
scatter3(x,y,z);
xlabel('X');
ylabel('Y');
zlabel('Z');
Result
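For comparison, the same idea can be sketched without eval. This is not the code above, only a minimal cyclic construction under my own assumptions: the first M-1 coordinates run over the full grid, the last coordinate is fixed by a modular sum, and each axis is then relabelled with a random permutation. The names C, grids and perm are made up for this sketch.

```matlab
M = 3;    % number of dimensions
N = 10;   % divisions per dimension

% All combinations of the first M-1 coordinates (N^(M-1) rows)
grids = cell(1, M-1);
[grids{:}] = ndgrid(1:N);
C = zeros(N^(M-1), M);
for d = 1:M-1
    C(:,d) = grids{d}(:);
end

% Cyclic rule: the last coordinate is determined by the others,
% so every axis-parallel line of the grid holds exactly one sample
C(:,M) = mod(sum(C(:,1:M-1) - 1, 2), N) + 1;

% Relabel each axis with its own random permutation to randomise the design
for d = 1:M
    perm = randperm(N).';
    C(:,d) = perm(C(:,d));
end
```

Each row of C is one sample on the regular grid. For M = 3, scatter3(C(:,1), C(:,2), C(:,3)) should give a plot comparable to the one above.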
This code gives the same result as the accepted one in this discussion.
Check it out!
n=1000; p=5;
distr=lhsdesign(n,p); %creates LHS of n samples on each of your p dimensions
% Then, you can choose any inverted distribution. in this case it is the Discrete Uniform Distribution
Parameters=[unidinv(distr(:,1),UB1) unidinv(distr(:,2),UB2) ...
unidinv(distr(:,3),UB3) unidinv(distr(:,4),UB4) ...
unidinv(distr(:,5),UB5) ];
%At the end, you'll do a simple work of indexing.
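If the Statistics Toolbox is not available, both steps can be sketched by hand. The following is only an illustration under stated assumptions: the stratified randperm construction stands in for lhsdesign, and the two target distributions (uniform on [lo, hi] and exponential with mean mu) are hypothetical placeholders for whatever A, V, h, l and b actually follow. Their inverse CDFs are written out in closed form.

```matlab
n = 1000;                 % samples per variable
nVar = 2;                 % two variables are enough for the sketch

% Latin hypercube on the unit cube: one point per stratum ((k-1)/n, k/n),
% with the strata visited in an independent random order for each column
U = zeros(n, nVar);
for d = 1:nVar
    U(:,d) = (randperm(n).' - rand(n,1)) / n;
end

% Hypothetical target distributions (assumptions, not from the question)
lo = 1; hi = 10;          % uniform on [lo, hi]
mu = 2;                   % exponential with mean mu

% Inverse-CDF transform, column by column
A = lo + (hi - lo) * U(:,1);      % uniform inverse CDF
V = -mu * log(1 - U(:,2));        % exponential inverse CDF
```

Because each column of U has exactly one value per stratum, the transformed samples cover each distribution evenly in probability, which is the property Latin hypercube sampling is meant to provide.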
I am given a set of points (p1,q1) (p2,q2) ... (p20,q20) which satisfy the function q = 1/(ap + b)^2 except that one of these does not satisfy the given relation. The values of a and b are not given to me. All I have with me is two inputs p and q as arrays. I need to find the index of the point which does not satisfy the given relation.
The way I proceeded to solve is to find the values of a and b using the first two pairs (p1,q1) and (p2,q2) and check if the remaining points satisfy the function for the solved values of a and b. The results will be stored in a logical matrix. I wish to make use of the logical matrix to pick out the odd pair, but unable to proceed further.
Specifically, the challenge is to make use of vectorization in MATLAB to find the odd point, instead of resorting to for-loops. I think that I will have to first search for the only logical zero in any of the row. In that case, the column index of that zero will fetch me the odd point. But, if there are more than one zeros in all 4 rows, then the odd point is either of the first two pairs. I need help in translating this to efficient code in MATLAB.
Please note that vectors p and q have been named as x and y in the below code.
function [res, sol] = findThePair(x, y)
N = length(x);
syms a b
vars = [a,b];
eqns = [y(1) - 1/(a*x(1) + b)^2 == 0; y(2) - 1/(a*x(2) + b)^2 == 0];
[solA, solB] = solve(eqns,vars);
sol = [double(solA) double(solB)]; %solution of a & b (total 4 possibilites)
xTest = x(3:end); % performing check on remaining points
yTest = y(3:end);
res = zeros(4, N-2); % logical matrix to store the results of equality check
for i = 1:4
A = sol(i,1); B = sol(i, 2);
res(i, :) = [yTest == 1./(A*xTest + B).^2]; % perform equality check on remaining points
end
Let's do some maths up front, to avoid needing loops or vectorisation. At most this leaves us with half a dozen function evaluations, and we only need 5 points.
q = 1 / (a*p + b)^2
% ->
sqrt(q) * ( a*p + b ) = 1
% ->
a = ( 1 - b*sqrt(q) ) / ( p * sqrt(q) )
% Sub in some points (1 and 2) ->
a1 = ( 1 - b*sqrt(q1) ) / ( p1 * sqrt(q1) )
a2 = ( 1 - b*sqrt(q2) ) / ( p2 * sqrt(q2) )
% a1 and a2 should be the same ->
( 1 - b*sqrt(q1) ) * ( p2 * sqrt(q2) ) = ( 1 - b*sqrt(q2) ) * ( p1 * sqrt(q1) )
% Rearrange ->
b = ( p2*sqrt(q2) - p1*sqrt(q1) ) / ( (p2-p1)*sqrt(q1)*sqrt(q2) )
We have two unknowns, a and b. All we need are two points to create simultaneous equations. I'll use the following logic
Choose (pm, qm) and (pn, qn) with any m ~= n.
Calculate a and b using the above equation.
test whether (pr, qr) fits with the calculated a and b.
If it fits, we know all three of these must be on the curve, and we have a and b.
If it doesn't fit, we know either point m, n, or r is the outlier. Return to step (1) with two other points; the a and b calculated from them must be correct, as we've not fitted to the outlier.
Here is some code to implement this:
% Random coeffs, keep things unknown
a = rand*10;
b = rand*10;
% Set up our data
p = 1:20;
q = 1 ./ (a*p + b).^2;
% Create an outlier
q( 3 ) = q( 3 ) + 1;
% Steps as described
% 1.
p1 = p(1); p2 = p(2);
q1 = q(1); q2 = q(2);
% 2.
bGuess = ( p2*sqrt(q2) - p1*sqrt(q1) ) / ( (p2-p1)*sqrt(q1)*sqrt(q2) );
aGuess = ( 1 - bGuess*sqrt(q1) ) / ( p1 * sqrt(q1) );
% 3.
p3 = p(3);
q3Guess = 1 / ( aGuess*p3 + bGuess )^2;
tol = 1e-7; % Use tolerance rather than == comparison to avoid float issues
if abs( q3Guess - q(3) ) < tol
% success
aFit = aGuess;
bFit = bGuess;
else
% p1, p2 or p3 is an outlier! Repeat using other points
% If there's known to be only one outlier, this should give the result
p1 = p(4); p2 = p(5);
q1 = q(4); q2 = q(5);
bFit = ( p2*sqrt(q2) - p1*sqrt(q1) ) / ( (p2-p1)*sqrt(q1)*sqrt(q2) );
aFit = ( 1 - bFit*sqrt(q1) ) / ( p1 * sqrt(q1) );
end
% Validate
fprintf( 'a is valid: %d, b is valid: %d\n', abs(a-aFit)<tol, abs(b-bFit)<tol )
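The question also asks for the index of the odd point without a loop. Here is one possible vectorised sketch. It is a different technique from the steps above: it linearises the relation as 1/sqrt(q) = a*p + b (assuming a*p + b > 0 for every point) and uses medians so that a single bad point cannot bias the estimate. The coefficients are fixed here only to make the example reproducible, and the names s, aEst, bEst and oddIdx are made up.

```matlab
% Hypothetical "unknown" coefficients, fixed for reproducibility
a = 2.5;
b = 1.5;

% Data with one planted outlier, as in the code above
p = 1:20;
q = 1 ./ (a*p + b).^2;
q(3) = q(3) + 1;

% On the curve, s is a straight line in p (assumes a*p + b > 0)
s = 1 ./ sqrt(q);

% A single bad point spoils only two neighbouring slopes,
% so the median slope is still the true a
aEst = median(diff(s) ./ diff(p));
bEst = median(s - aEst*p);

% The odd point is the one furthest from the fitted line
[~, oddIdx] = max(abs(s - (aEst*p + bEst)));
```

With the outlier planted at position 3, oddIdx comes out as 3.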
I don't really understand how you were trying to solve this or what syms (i.e. symbolic variables) have to do with it, so I'll show you how I would solve this problem.
Since we're essentially looking for an outlier, we might as well convert the problem to something that's easier to work with. For this reason, instead of using q as-is, I'm going to invert it: this way, we'd be dealing with an equation of a parabola - which is easy.
Next, knowing that our points should lie on a parabola, we can fit the equation of the parabola (or equivalently - find the coefficients of the polynomial that describes the relation of the input to the output). The polynomial is a^2*x^2+2*a*b*x+b^2, and so the coefficients are {a^2, 2*a*b, b^2}.
Since the majority of the points (19 out of 20) lie on the same parabola, the outlier will always have a larger error, which would make it stand out, no matter how close it is to the parabola (within the limitations of machine precision) - you can see an extreme example of this in the code below.
Fitting of a parabola is performed using polynomial interpolation (see also: Vandermonde matrix).
function I = q55241683()
%% Generate the ground truth:
TRUE_A = 2.3;
TRUE_B = -pi;
IDX_BAD = 5;
p = 1:0.04:1.76;
q = (TRUE_A * p + TRUE_B).^-2;
q(IDX_BAD) = (1-1E-10)*q(IDX_BAD); % notice just how close this is to being valid
%% Visualize dataset:
% figure(); plot(p,q.^-1);
%% Solve
I = findThePair(p, q.^-1);
%% Test
if IDX_BAD == I
disp('Great success!');
else
disp('Complete failure!');
end
end
function I = findThePair(x,y)
% Fit a parabola to {x vs. y}; the caller has already passed y = q.^-1
P = x(:).^(2:-1:0)\y(:); %alternatively: P = polyfit(x,y,2)
% Estimate {a,b} (or {-a,-b})
est_A = sqrt(P(1));
est_B = P(2)/(2*est_A);
% Compute the distances of the points from the fit (residuals), find the biggest:
[~,I] = max( abs(y - (est_A*x + est_B).^2) );
end
I am trying to implement a simplex algorithm following the rules I was given at my optimization course. The problem is
min c'*x s.t.
Ax = b
x >= 0
All vectors are assumed to be columns, ' denotes the transpose. The algorithm should also return the solution to the dual LP. The rules to follow are:
Here, A_J denotes columns from A with indices in J and x_J, x_K denotes elements of vector x with indices in J or K respectively. Vector a_s is column s of matrix A.
Now I do not understand how this algorithm takes care of condition x >= 0, but I decided to give it a try and follow it step by step. I used Matlab for this and got the following code.
X = zeros(n, 1);
Y = zeros(m, 1);
% i. Choose starting basis J and K = {1,2,...,n} \ J
J = [4 5 6] % for our problem
K = setdiff(1:n, J)
% this while is for goto
while 1
% ii. Solve system A_J*\bar{x}_J = b.
xbar = A(:,J) \ b
% iii. Calculate value of criterion function with respect to current x_J.
fval = c(J)' * xbar
% iv. Calculate dual solution y from A_J^T*y = c_J.
y = A(:,J)' \ c(J)
% v. Calculate \bar{c}^T = c_K^T - u^T A_K. If \bar{c}^T >= 0, we have
% found the optimal solution. If not, select the smallest s \in K, such
% that c_s < 0. Variable x_s enters basis.
cbar = c(K)' - c(J)' * inv(A(:,J)) * A(:,K)
cbar = cbar'
tmp = findnegative(cbar)
if tmp == -1 % we have found the optimal solution since cbar >= 0
X(J) = xbar;
Y = y;
FVAL = fval;
return
end
s = findnegative(c, K) %x_s enters basis
% vi. Solve system A_J*\bar{a} = a_s. If \bar{a} <= 0, then the problem is
% unbounded.
abar = A(:,J) \ A(:,s)
if findpositive(abar) == -1 % we failed to find positive number
disp('The problem is unbounded.')
return;
end
% vii. Calculate v = \bar{x}_J / \bar{a} and find the smallest rho \in J,
% such that v_rho > 0. Variable x_rho exits basis.
v = xbar ./ abar
rho = J(findpositive(v))
% viii. Update J and K and goto ii.
J = setdiff(J, rho)
J = union(J, s)
K = setdiff(K, s)
K = union(K, rho)
end
Functions findpositive(x) and findnegative(x, S) return the first index of positive or negative value in x. S is the set of indices, over which we look at. If S is omitted, whole vector is checked. Semicolons are omitted for debugging purposes.
The problem I tested this code on is
c = [-3 -1 -3 zeros(1,3)];
A = [2 1 1; 1 2 3; 2 2 1];
A = [A eye(3)];
b = [2; 5; 6];
The reason for zeros(1,3) and eye(3) is that the problem is inequalities and we need slack variables. I have set starting basis to [4 5 6] because the notes say that starting basis should be set to slack variables.
Now, what happens during execution is that on the first run of the while loop, the variable with index 1 enters the basis (in Matlab, indices start from 1) and 4 exits it, which is reasonable. On the second run, 2 enters the basis (since it is the smallest index such that c(idx) < 0) and 1 leaves it. But on the next iteration, 1 enters the basis again, and I understand why it enters: it is the smallest index such that c(idx) < 0. But here the looping starts. I assume that should not have happened, but following the rules I cannot see how to prevent this.
I guess that there has to be something wrong with my interpretation of the notes, but I just cannot see where I am wrong. I also remember that when we solved an LP on paper, we updated our objective function on each step: when a variable entered the basis, we removed it from the objective function and expressed it using one of the equalities. But I assume that is a different algorithm.
Any remarks or help will be highly appreciated.
The problem has been solved. Turned out that the point 7 in the notes was wrong. Instead, point 7 should be
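The corrected rule from the notes is not reproduced here. For comparison only, here is a minimal sketch of the loop with the textbook minimum ratio test, which may or may not match the wording of the notes. It differs from the code above in two ways: the entering variable is chosen from the reduced costs cbar rather than from c, and only rows with abar > 0 take part in the ratio test. The ratio test is also what keeps x >= 0 from one basis to the next. The variable names enter, pos, r and tol are made up for this sketch.

```matlab
% Test problem from the question (slack variables in columns 4..6)
c = [-3 -1 -3 0 0 0]';
A = [2 1 1; 1 2 3; 2 2 1];
A = [A eye(3)];
b = [2; 5; 6];

[m, n] = size(A);
J = [4 5 6];                 % starting basis: the slack variables
tol = 1e-12;                 % tolerance for sign tests
X = zeros(n, 1);

while true
    K    = setdiff(1:n, J);          % non-basic indices, sorted
    xbar = A(:,J) \ b;               % current basic solution
    y    = A(:,J)' \ c(J);           % dual solution
    cbar = c(K) - A(:,K)' * y;       % reduced costs of the non-basic variables

    enter = find(cbar < -tol, 1);    % smallest index with a negative reduced cost
    if isempty(enter)                % all reduced costs >= 0: optimal
        X(J) = xbar;
        fval = c' * X;
        break
    end
    s = K(enter);                    % x_s enters the basis

    abar = A(:,J) \ A(:,s);
    pos  = find(abar > tol);         % only positive entries limit the step
    if isempty(pos)
        error('The problem is unbounded.');
    end

    % Minimum ratio test: the row with the smallest xbar/abar leaves,
    % which keeps the new basic solution non-negative
    [~, r] = min(xbar(pos) ./ abar(pos));
    J(pos(r)) = s;                   % x_s replaces the leaving variable
end
```

On this problem the loop stops after two basis changes with the basis {1, 3, 6}, x = (0.2, 0, 1.6) for the original variables and an objective value of -5.4.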
I have a linear system Ax = b , which is created by natural splines and looks like this:
where
The code in matlab which is supposed to solve the system is the following:
clear;
clc;
x = [...] ;
a = [...];
x0 = ...;
n = length(x) - 1 ;
for i = 0 : (n-1)
h(i+1) = x(i+2) - x(i+1) ;
end
b= zeros( n+1 , 1 ) ;
for i =2: n
b(i,1) = 3 *(a(i+1)-a(i))/h(i) - 3/h(i-1)*(a(i) - a(i-1) ) ;
end
%linear system solution.
l(1) =0 ; m(1) = 0 ; z(1) = 0 ;
for i =1:(n-1)
l(i+1) = 2*( x(i+2) - x(i) ) - h(i)* m(i) ;
m(i+1) = h(i+1)/l(i+1);
z(i+1) = ( b(i+1) - h(i)*z(i) ) / l ( i+1) ;
end
l(n+1) =1;
z(n+1) = 0 ;
c(n+1) = 0 ;
for j = ( n-1) : (-1) : 0
c(j+1) = z(j+1) - m(j+1)*c(j+2) ;
end
but I can't understand which method is being used for solving the linear system.
If I had to guess I would say that the LU method is used, adjusted for tridiagonal matrices, but I still can't find the connection with the code...
Any help would be appreciated!!!
The coefficients look a little odd (particularly that 2 in the l equation), but it looks like a specialized Thomas Algorithm where:
The second-to-last loop performs a forward elimination of the subdiagonal to bring the matrix into upper triangular form.
The last loop performs the back substitution for the solution.
The code doesn't seem to match one-to-one with the general algorithm since the solution is using the vectors that compose the diagonals instead of the diagonals themselves with no apparent preallocation of memory. So I can't say if this method is "better" than the general one off the bat.
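To make the comparison easier, here is a minimal sketch of the general Thomas algorithm on a small, made-up tridiagonal system. The vectors sub, dia, sup and rhs are illustrative and not taken from the question. If I read the question's code correctly, its l plays the role of denom, m the role of cp and z the role of rp, with 2*(x(i+2) - x(i)) being the diagonal entry and h(i) the off-diagonal entries of the spline system.

```matlab
% Made-up tridiagonal system T*x = rhs, stored by diagonals
n   = 5;
sub = [0; -1; -1; -1; -1];   % sub-diagonal   (sub(1) unused)
dia = [4;  4;  4;  4;  4];   % main diagonal
sup = [-1; -1; -1; -1; 0];   % super-diagonal (sup(n) unused)
rhs = [1; 2; 3; 4; 5];

% Forward elimination: remove the sub-diagonal and scale each row
cp = zeros(n, 1);            % modified super-diagonal
rp = zeros(n, 1);            % modified right-hand side
cp(1) = sup(1) / dia(1);
rp(1) = rhs(1) / dia(1);
for i = 2:n
    denom = dia(i) - sub(i) * cp(i-1);
    cp(i) = sup(i) / denom;
    rp(i) = (rhs(i) - sub(i) * rp(i-1)) / denom;
end

% Back substitution
x = zeros(n, 1);
x(n) = rp(n);
for i = n-1:-1:1
    x(i) = rp(i) - cp(i) * x(i+1);
end
```

The result can be checked against the full matrix T = diag(dia) + diag(sub(2:end), -1) + diag(sup(1:end-1), 1) by comparing T*x with rhs.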
Not sure what I am doing wrong here;
I am trying to make a for loop with conditional statements for the following functions. I want to make it though so h is not a vector. I am doing this for 1 through 5 with increment 0.1.
Y = f(h) = h^2 if h <= 2 or h >= 3
Y = f(h) = 45 otherwise
my code is
for h = 0:0.1:5
if h <= 2;
Y = h^2;
elseif h >= 3;
Y = h^2;
else;
h = 45;
end
end
This could be done more easily, but with a for loop I think you could use:
h=0:0.1:5;
y=zeros(1,length(h));
for i=1:length(h)
if or(h(i) <= 2, h(i) >= 3)
y(i) = h(i)^2;
else
y(i) = 45;
end
end
Why do you want to avoid making h an array? MATLAB specializes in operations on arrays. In fact, vectorized operations in MATLAB are generally faster than for loops, which I found counter-intuitive having started coding in C++.
An example of a vectorized version of your code could be:
h = 0:0.1:5;
inds = find(h > 2 & h < 3); % grab indices where Y = 45
Y = h.^2; % set all of Y = h^2
Y(inds) = 45; % set only those entries for h between 2 and 3 to 45
The period in the .^2 operator applies that operator to every element of the h array. This means that you end up squaring each number in h individually. In general, vectorized operations like this are more efficient in MATLAB, so it is probably best to get in the habit of vectorizing your code.
Finally, you could reduce the above code a bit by not storing your indices:
h = 0:0.1:5;
Y = h.^2; % set all of Y = h^2
Y(find(h > 2 & h < 3)) = 45; % set only those entries for h between 2 and 3 to 45
This blog series seems to be a good primer on vectorizing your MATLAB code.
I am a bit confused and would greatly appreciate some help.
I have read many posts about finding neighboring pixels, with this being extremely helpful:
http://blogs.mathworks.com/steve/2008/02/25/neighbor-indexing-2/
However I have trouble applying it on a 4D matrix (A) with size(A)=[8 340 340 15]. It represents 8 groups of 3D images (15 slices each) of which I want to get the neighbors.
I am not sure which size to use in order to calculate the offsets. This is the code I tried, but I think it is not working because the offsets should be adapted for 4 dimensions? How can I do it without a loop?
%A is a 4D matrix with 0 or 1 values
Aidx = find(A);
% loop here?
[~,M,~,~] =size(A);
neighbor_offsets = [-1, M, 1, -M]';
neighbors_idx = bsxfun(@plus, Aidx', neighbor_offsets(:));
neighbors = B(neighbors_idx);
Thanks,
ziggy
Have you considered using convn?
msk = [0 1 0; 1 0 1; 0 1 0];
msk4d = permute( msk, [3 1 2 4] ); % make it 1-3-3-1 mask
neighbors_idx = find( convn( A, msk4d, 'same' ) > 0 );
You might find conndef useful for defining the basic msk in a general way.
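A tiny example may help to see what the mask does. The array size and the position of the single set pixel are made up for illustration. The first dimension is the group, dimensions 2 and 3 are the image plane, and dimension 4 is the slice, as in the question.

```matlab
% Small made-up array: 2 groups, 5x5 images, 3 slices
A = zeros(2, 5, 5, 3);
A(1, 3, 3, 2) = 1;                        % one set pixel

msk   = [0 1 0; 1 0 1; 0 1 0];            % 4-connected in-plane neighbours
msk4d = permute(msk, [3 1 2 4]);          % 1-3-3-1 mask acting on dims 2 and 3

hits = convn(A, msk4d, 'same') > 0;       % true where a pixel has a set neighbour

% Convert the linear indices back to (group, row, column, slice)
[g, r, c, s] = ind2sub(size(A), find(hits));
disp([g r c s]);
```

Only the four in-plane neighbours of the set pixel are marked, all in group 1 and slice 2. When set pixels touch each other, hits also contains set pixels; hits & ~A would keep only the unset neighbours.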
Not sure if I've understood your question but what about this sort of approach:
If your matrix is 1D:
M = rand(10,1);
N = M(k-1:k+1); %//immediate neighbours of k
However this could error if k is at the boundary. This is easy to fix using max and min:
N = M(max(k-1,1):min(k+1,size(M,1)));
Now let's add a dimension:
M = rand(10,10);
N = M(max(k1-1,1):min(k1+1,size(M,1)), max(k2-1,1):min(k2+1,size(M,2)));
That was easy, all you had to do was repeat the same index making the minor change of using size(M,2) for the boundary (and also I changed k to k1 and k2, you might find using an array for k instead of separate k1 and k2 variables works better i.e. k(1) and k(2))
OK, so now let's skip to 4 dimensions:
M = rand(10,10,10,10);
N = M(max(k(1)-1,1):min(k(1)+1,size(M,1)), ...
max(k(2)-1,1):min(k(2)+1,size(M,2)), ...
max(k(3)-1,1):min(k(3)+1,size(M,3)), ...
max(k(4)-1,1):min(k(4)+1,size(M,4))); %// Also you can replace all the `size(M,i)` with `end` if you like
I know you said you didn't want a loop, but what about a really short loop just to refactor a bit and also make it generalized:
n=ndims(M);
ind{n} = 0;
for dim = 1:n
ind{dim} = max(k(dim)-1,1):min(k(dim)+1,size(M,dim));
end
N = M(ind{:});
Here's how to get the neighbors along the second dimension
sz = size( A );
ndims = numel(sz); % number of dimensions
[d{1:ndims}] = ind2sub( sz, find( A ) );
alongD = 2; % work along this dim
np = d{alongD} + 1;
sel = np <= sz( alongD ); % discard neighbors that fall outside image boundary
nm = d{alongD} - 1;
sel = sel & nm > 0; % discard neighbors that fall outside image boundary
d = cellfun( @(x) x(sel), d, 'uni', 0 );
neighbors = cat( 1, ...
sub2ind( sz, d{1:alongD-1}, np(sel), d{alongD+1:end} ),... % subscripts -> linear indices
sub2ind( sz, d{1:alongD-1}, nm(sel), d{alongD+1:end} ) );
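As for which size to use for the offsets in the original approach: in a 4D array the linear-index step along dimension d is the product of the sizes of all earlier dimensions. Here is a minimal sketch with a made-up array that collects the neighbours along dimensions 2 and 3 and, unlike the code above, keeps the neighbours of border pixels that do exist. The names strides, cand and valid are illustrative.

```matlab
% Small made-up array: 2 groups, 4x4 images, 3 slices
A = zeros(2, 4, 4, 3);
A(1, 2, 2, 1) = 1;                       % interior pixel
A(2, 1, 4, 3) = 1;                       % pixel on the border of dims 2 and 3

sz      = size(A);
strides = cumprod([1 sz(1:end-1)]);      % linear-index step per dimension: [1 2 8 32]

idx = find(A);                           % linear indices of the set pixels
[g, r, c, s] = ind2sub(sz, idx);         % r and c are the in-plane subscripts

% Candidate neighbours: one step back and forth along dims 2 and 3
cand  = [idx - strides(2); idx + strides(2); ...
         idx - strides(3); idx + strides(3)];

% Keep a candidate only if the step stays inside the image
valid = [r > 1; r < sz(2); c > 1; c < sz(3)];

neighbors_idx = unique(cand(valid));
```

The interior pixel contributes four neighbours and the border pixel two, so neighbors_idx has six entries.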