Matlab: Editing the lhsnorm() function for Latin Hypercube

I am attempting to edit the lhsnorm function so that I can obtain a Latin hypercube sample from a normally distributed set of data.
The lhsnorm function is as follows:
function [X,z] = lhsnorm(mu,sigma,n,dosmooth)
%LHSNORM Generate a latin hypercube sample with a normal distribution
z = mvnrnd(mu,sigma,n);
%%%%%%%%Is it possible for me to change this to a univariate distribution
% without affecting the rest of the code ?????
% Find the ranks of each column
p = length(mu);
x = zeros(size(z),class(z));
for i=1:p
x(:,i) = rank(z(:,i));
end
% Get gridded or smoothed-out values on the unit interval
if (nargin<4) || isequal(dosmooth,'on')
x = x - rand(size(x));
else
x = x - 0.5;
end
x = x / n;
% Transform each column back to the desired marginal distribution,
% maintaining the ranks (and therefore rank correlations)
for i=1:p
x(:,i) = norminv(x(:,i),mu(i), sqrt(sigma(i,i)));
end
X = x;
function r=rank(x)
% Similar to tiedrank, but no adjustment for ties here
[sx, rowidx] = sort(x);
r(rowidx) = 1:length(x);
r = r(:);
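For context, the stock function is typically called like this (hypothetical numbers; mu and sigma describe the multivariate normal and n is the number of samples):
mu = [0 0];
sigma = eye(2);
n = 20;
X = lhsnorm(mu,sigma,n);   % 20-by-2 Latin hypercube sample, one column per variable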
I am editing the code as shown below:
function [X] = lhsnorm_sid(z,dosmooth)
[muhat,sigmahat] = normfit(z);
z = z';
% Find the ranks of each column
p = length(muhat);
x = zeros(size(z),class(z));
s = size(z);
n = s(1,1);
for i=1:p
x(:,i) = rank(z(:,i));
end
if (nargin<4) || isequal(dosmooth,'on')
x = x - rand(size(x));
else
x = x - 0.5;
end
x = x / n;
for i=1:p
x(:,i) = norminv(x(:,i),muhat(i), sqrt(sigmahat(i,i)));
end
X = x;
function r=rank(x)
[sx, rowidx] = sort(x);
r(rowidx) = 1:length(x);
r = r(:);
However, I end up with Latin hypercube values that are way off from what is expected. I also tried directly substituting z with d(1,:), which contains a normally distributed set of data, but the Latin hypercube I obtained did not contain any of the values in d(1,:).
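For reference, a minimal sketch of the univariate version I am aiming for might look like this (assumptions: z is a plain data vector; normfit's second output sigmahat is already a standard deviation, so it is passed to norminv directly rather than as sqrt(sigmahat); and the nargin check is adjusted to the two-argument signature):
function X = lhsnorm_uni(z,dosmooth)
%LHSNORM_UNI Latin hypercube sample from a normal fit of the data vector z
z = z(:);                          % force a column vector
n = numel(z);
[muhat,sigmahat] = normfit(z);     % sigmahat is a standard deviation, not a variance
x = rankcol(z);                    % ranks of the sample
if (nargin<2) || isequal(dosmooth,'on')
x = x - rand(size(x));             % random offset within each stratum
else
x = x - 0.5;                       % midpoint of each stratum
end
x = x / n;                         % map ranks into (0,1)
X = norminv(x,muhat,sigmahat);     % back-transform to the fitted normal
function r = rankcol(x)
% Similar to tiedrank, but no adjustment for ties (renamed so it does not
% shadow the built-in rank)
[~,rowidx] = sort(x);
r(rowidx) = 1:length(x);
r = r(:);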
Thanks

Related

Changing amplitude of Fourier series in matlab

The code below currently plots the Fourier series of a square wave with N terms. Is there any way I could change the amplitude range from [0, 1] to [-1, 1]?
% Assignment of variables
syms t
% Function variables
N = 5;
T0 = 1;
w0 = 2*pi/T0;
Imin = 0;
Imax = 0.5;
% Function
ft = 1;
% First term calculation
a0 = (1/T0)*int(ft, t, Imin, Imax);
y = a0;
% Calculation of n terms
for n = 1:N
an = (2/T0)*int(ft*cos(n*w0*t), t, Imin, Imax);
bn = (2/T0)*int(ft*sin(n*w0*t), t, Imin, Imax);
y = y + an*cos(n*w0*t) + bn*sin(n*w0*t);
end
fplot(y, [-4,4], "Black")
grid on
If you are talking about the figure's y-axis scale, then ylim([-1 1]) will do it.
1.- The following does what you asked for:
clear all;clc;close all
syms t
assume(t>0 & t<1)
% Function variables
N = 5;
T0 = 1;
w0 = 2*pi/T0;
Imin = 0;
Imax = 1;
% Function
h1=heaviside(t-.5);
h2=heaviside(t+.5);
ht=-2*((h1-h2)+.5);
% First term calculation
a0 = (1/T0)*int(ht, t, Imin, Imax);
y = a0;
% Calculation of n terms
for n = 1:N
an = (2/T0)*int(ht*cos(n*w0*t), t, Imin, Imax);
bn = (2/T0)*int(ht*sin(n*w0*t), t, Imin, Imax);
y = y + an*cos(n*w0*t) + bn*sin(n*w0*t);
end
fplot(y, [-4,4], "Black")
grid on
2.- You set aside a specific group of code lines headed with % Function to precisely define the function, yet the actual definition also depends on Imin and Imax.
It's good practice to keep the function definition within the lines you intend for that purpose, not to scatter it all over the place.
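For example, a sketch of what a consolidated % Function block could look like (equivalent to the ht above, since heaviside(t+.5) is identically 1 on the assumed interval 0 < t < 1):
% Function: square wave on (0,1), equal to +1 for t < 0.5 and -1 for t > 0.5
Imin = 0;
Imax = 1;
ht = 1 - 2*heaviside(t-.5);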

All my weights for gradient descent become 0 on feature expansion

I have 2 features, which I expand to contain all possible combinations of the two features up to order 6. When I run MATLAB's fminunc, it returns a weight vector in which all elements are 0.
The dataset is here
clear all;
clc;
data = load("P2-data1.txt");
m = length(data);
para = 0; % regularization parameter
%% Augment Feature
y = data(:,3);
new_data = newfeature(data(:,1), data(:,2), 3);
[~, n] = size(new_data);
betas1 = zeros(n,1); % initial weights
options = optimset('GradObj', 'on', 'MaxIter', 400);
[beta_new, cost] = fminunc(@(t)(regucostfunction(t, new_data, y, para)), betas1, options);
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('theta: \n');
fprintf(' %f \n', beta_new); % get all 0 here
% Compute accuracy on our training set
p_new = predict(beta_new, new_data);
fprintf('Train Accuracy after feature augmentation: %f\n', mean(double(p_new == y)) * 100);
fprintf('\n');
%% the functions are defined below
function g = sigmoid(z) % running properly
g = zeros(size(z));
g=ones(size(z))./(ones(size(z))+exp(-z));
end
function [J,grad] = regucostfunction(theta,x,y,para) % CalculateCost(x1,betas1,y);
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));
hyp = sigmoid(x*theta);
err = (hyp - y)';
grad = (1/m)*(err)*x;
sum = 0;
for k = 2:length(theta)
sum = sum+theta(k)^2;
end
J = (1/m)*((-y' * log(hyp) - (1 - y)' * log(1 - hyp)) + para*(sum) );
end
function p = predict(theta, X)
m = size(X, 1); % Number of training examples
p = zeros(m, 1);
index = find(sigmoid(theta'*X') >= 0.5);
p(index,1) = 1;
end
function out = newfeature(X1, X2, degree)
out = ones(size(X1(:,1)));
for i = 1:degree
for j = 0:i
out(:, end+1) = (X1.^(i-j)).*(X2.^j);
end
end
end
data contains two feature columns followed by a third column of 0/1 labels.
The functions used are: newfeature returns the expanded features and regucostfunction computes the cost. When I took the same approach with the default features it worked, so I think the problem here has to do with some coding issue.
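As a quick sanity check on the expansion itself (hypothetical random inputs), newfeature should produce 1 plus the sum of (i+1) for i = 1..degree columns, i.e. 28 for degree 6:
X1 = rand(5,1);
X2 = rand(5,1);
out = newfeature(X1, X2, 6);
size(out)   % 5-by-28: a bias column plus every monomial X1^(i-j)*X2^j up to order 6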

Is there a correlation ratio in MATLAB?

Is there any function in Matlab which calculates the correlation ratio?
Here is an implementation I tried, but the results are not right.
function cr = correlation_ratio(X, Y, L)
ni = zeros(1, L);
sigmai = ni;
for i = 0:(L-1)
Yn = Y(X == i);
ni(1, i+1) = numel(Yn);
m = (1/ni(1, i+1))*sum(Yn);
sigmai(1, i+1) = (1/ni(1, i+1))*sum((Yn - m).^2);
end
n = sum(ni);
prod = ni.*sigmai;
cr = (1-(1/n)*sum(prod))^0.5;
This is the equation on the Wikipedia page:

$$\eta^2 = \frac{\sum_x n_x (\bar{y}_x - \bar{y})^2}{\sum_{x,i} (y_{x,i} - \bar{y})^2}$$

where:
$\eta$ is the correlation ratio,
$y_{x,i}$ are the sample values (x is the class label, i the sample index),
$\bar{y}_x$ is the mean of the sample values for class x,
$\bar{y}$ is the mean over all samples, across all classes, and
$n_x$ is the number of samples in class x.
This is how I interpreted it into code:
function eta = correlation_ratio(X, Y)
X = X(:); % make sure we've got column vectors, simplifies things below a bit
Y = Y(:);
L = max(X);
mYx = zeros(1, L+1); % we'll write mean per class here
nx = zeros(1, L+1); % we'll write number of samples per class here
for i = unique(X).'
Yn = Y(X == i);
if numel(Yn)>1
mYx(i+1) = mean(Yn);
nx(i+1) = numel(Yn);
end
end
mY = mean(Y); % mean across all samples
eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y-mY).^2));
The loop could be replaced with accumarray.
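A sketch of that accumarray variant (same assumption as above: the class labels in X are non-negative integers):
function eta = correlation_ratio_acc(X, Y)
X = X(:);
Y = Y(:);
nx = accumarray(X+1, 1);                % number of samples per class
mYx = accumarray(X+1, Y) ./ max(nx, 1); % mean per class (guard against empty classes)
mY = mean(Y);                           % mean across all samples
eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y-mY).^2));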

Vectorize a regression map calculation

I compute the regression map of a time series A(t) on a field B(x,y,t) in the following way:
A=1:10; %time
B=rand(100,100,10); %x,y,time
rc=nan(size(B,1),size(B,2));
for ii=1:size(B,1)
for jj=1:size(B,2)
tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
rc(ii,jj) = tmp(1,2); %covariance A and B
end
end
rc = rc/var(A); %regression coefficient
Is there a way to vectorize/speed up this code? Or maybe some built-in function that I'm not aware of that achieves the same result?
In order to vectorize this algorithm, you would have to "get your hands dirty" and compute the covariance yourself. If you take a look inside cov you'll see that it has many lines of input checking and very few lines of actual computation. To summarize the critical steps:
y = varargin{1};
x = x(:);
y = y(:);
x = [x y];
[m,~] = size(x);
denom = m - 1;
xc = x - sum(x,1)./m; % Remove mean
c = (xc' * xc) ./ denom;
To simplify the above somewhat:
x = [x(:) y(:)];
m = size(x,1);
xc = x - sum(x,1)./m;
c = (xc' * xc) ./ (m - 1);
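A quick check (with hypothetical random vectors) that this stripped-down computation matches the built-in:
x = randn(10,1); y = randn(10,1);
xm = [x y];
xc = xm - sum(xm,1)./size(xm,1);    % remove the column means
c = (xc' * xc) ./ (size(xm,1) - 1); % 2-by-2 covariance matrix
assert(norm(c - cov(x,y)) < 1e-12)  % agrees with cov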
Now this is something that is fairly straightforward to vectorize...
function q51466884
A = 1:10; %time
B = rand(200,200,10); %x,y,time
%% Test Equivalence:
assert( norm(sol1-sol2) < 1E-10);
%% Benchmark:
disp([timeit(@sol1), timeit(@sol2)]);
%%
function rc = sol1()
rc=nan(size(B,1),size(B,2));
for ii=1:size(B,1)
for jj=1:size(B,2)
tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
rc(ii,jj) = tmp(1,2); %covariance A and B
end
end
rc = rc/var(A); %regression coefficient
end
function rC = sol2()
m = numel(A);
rB = reshape(B,[],10).'; % reshape
% Center:
cA = A(:) - sum(A)./m;
cB = rB - sum(rB,1)./m;
% Multiply:
rC = reshape( (cA.' * cB) ./ (m-1), size(B(:,:,1)) ) ./ var(A);
end
end
I get these timings: [0.5381 0.0025] which means we saved two orders of magnitude in the runtime :)
Note that a big part of optimizing the algorithm is assuming you don't have any "strangeness" in your data, like NaN values etc. Take a look inside cov.m to see all the checks that we skipped.

Change the grid points of parametric splines in Matlab

My code right now:
% Create some example points x and y
t = pi*[0:.05:1,1.1,1.2:.02:2]; a = 3/2*sqrt(2);
for i=1:size(t,2)
x(i) = a*sqrt(2)*cos(t(i))/(sin(t(i)).^2+1);
y(i) = a*sqrt(2)*cos(t(i))*sin(t(i))/(sin(t(i))^2+1);
end
Please note: the points (x_i|y_i) are not necessarily equidistant, which is why t is created like this. Also, t should not be used in further code, as for my real problems it is not known; I just get a bunch of x, y and z values in the end. For this example I reduced it to 2D.
Now I'm creating ParametricSplines for the x and y values
% Spline
n=100; [x_t, y_t, tt] = ParametricSpline(x, y, n);
xref = ppval(x_t, tt); yref = ppval(y_t, tt);
with the function
function [ x_t, y_t, t_t ] = ParametricSpline(x,y,n)
m = length(x);
t = zeros(m, 1);
for i=2:m
arc_length = sqrt((x(i)-x(i-1))^2 + (y(i)-y(i-1))^2);
t(i) = t(i-1) + arc_length;
end
t=t./t(length(t));
x_t = spline(t, x);
y_t = spline(t, y);
t_t = linspace(0,1,n);
end
The plot generated by
plot(x,y,'ob',...
xref,yref,'xk',...
xref,yref,'-r'),...
axis equal;
looks as follows: [figure: Plot Spline]
The Question:
How do I change the code so that one of the resulting points (xref_i|yref_i) (shown as a black X in the plot) always lands directly on each of the originally given points (x_j|y_j) (shown as a blue O), with an additional n points between (x_j|y_j) and (x_j+1|y_j+1)?
E.g. with n=2 I would like to get the following:
(xref_1|yref_1) = (x_1|y_1)
(xref_2|yref_2)
(xref_3|yref_3)
(xref_4|yref_4) = (x_2|y_2)
(xref_5|yref_5)
[...]
I guess the only thing I need is to change the definition of tt but I just can't figure out how... Thanks for your help!
Use this as your function:
function [ x_t, y_t, tt ] = ParametricSpline(x,y,nt)
arc_length = 0;
n = length(x);
t = zeros(n, 1);
mul_p = linspace(0,1,nt+2)';
mul_p = mul_p(2:end);
tt = t(1);
for i=2:n
arc_length = sqrt((x(i)-x(i-1))^2 + (y(i)-y(i-1))^2);
t(i) = t(i-1) + arc_length;
add_points = mul_p * arc_length + t(i-1);
tt = [tt ; add_points];
end
t=t./t(end);
tt = tt./tt(end);
x_t = spline(t, x);
y_t = spline(t, y);
end
The essence:
You have to construct tt in the same way as your distance vector t, plus nt additional points in between each pair of knots.
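A quick usage sketch (with the x, y from the question and nt = 2 interior points per segment; every (nt+1)-th point of the result should then land on an original sample):
nt = 2;
[x_t, y_t, tt] = ParametricSpline(x, y, nt);
xref = ppval(x_t, tt); yref = ppval(y_t, tt);
assert(max(abs(xref(1:nt+1:end) - x(:))) < 1e-10)    % knots are reproduced exactly
plot(x,y,'ob', xref,yref,'xk', xref,yref,'-r'), axis equal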