Is there a correlation ratio in MATLAB?

Is there any function in Matlab which calculates the correlation ratio?
Here is an implementation I tried, but the results are not right.
function cr = correlation_ratio(X, Y, L)
ni = zeros(1, L);
sigmai = ni;
for i = 0:(L-1)
Yn = Y(X == i);
ni(1, i+1) = numel(Yn);
m = (1/ni(1, i+1))*sum(Yn);
sigmai(1, i+1) = (1/ni(1, i+1))*sum((Yn - m).^2);
end
n = sum(ni);
prod = ni.*sigmai;
cr = (1-(1/n)*sum(prod))^0.5;

This is the equation on the Wikipedia page:

\eta = \sqrt{ \frac{\sum_x n_x \, (\bar{y}_x - \bar{y})^2}{\sum_{x,i} (y_{x,i} - \bar{y})^2} }

where:
\eta is the correlation ratio,
y_{x,i} are the sample values (x is the class label, i the sample index),
\bar{y}_x is the mean of the sample values for class x,
\bar{y} is the mean over all samples across all classes, and
n_x is the number of samples in class x.
This is how I interpreted it into code:
function eta = correlation_ratio(X, Y)
X = X(:); % make sure we've got column vectors, simplifies things below a bit
Y = Y(:);
L = max(X);
mYx = zeros(1, L+1); % we'll write mean per class here
nx = zeros(1, L+1); % we'll write number of samples per class here
for i = unique(X).'
Yn = Y(X == i); % all samples in class i (never empty, since i comes from unique(X))
mYx(i+1) = mean(Yn); % mean per class
nx(i+1) = numel(Yn); % number of samples per class
end
mY = mean(Y); % mean across all samples
eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y-mY).^2));
The loop could be replaced with accumarray.
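For instance, a minimal accumarray version (a sketch, assuming non-negative integer class labels in X, as above):

X = X(:); Y = Y(:);
nx  = accumarray(X+1, 1);          % number of samples per class
sYx = accumarray(X+1, Y);          % sum of sample values per class
mYx = sYx ./ max(nx, 1);           % class means (empty classes get 0 and zero weight below)
mY  = mean(Y);
eta = sqrt(sum(nx .* (mYx - mY).^2) / sum((Y - mY).^2));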

Related

FastICA implementation in Matlab

I have been working on a FastICA algorithm implementation in MatLab. Currently the code does not separate the signals as well as I'd like. I was wondering if anyone here could give me some advice on what I could do to fix this problem?
disp('*****Importing Signals*****');
s = [1,30000];
[m1,Fs1] = audioread('OSR_us_000_0034_8k.wav', s);
[f1,Fs2] = audioread('OSR_us_000_0017_8k.wav', s);
ss = size(f1,1);
n = 2;
disp('*****Mixing Signals*****');
A = randn(n,n); %developing mixing matrix
x = A*[m1';f1']; %A*x
m_x = sum(x, n)/ss; %mean of x
xx = x - repmat(m_x, 1, ss); %centering the matrix
c = cov(x');
sq = inv(sqrtm(c)); %whitening the data
x = c*xx;
D = diff(tanh(x)); %setting up newtons method
SD = diff(D);
disp('*****Generating Weighted Matrix*****');
w = randn(n,1); %Random weight vector
w = w/norm(w,2); %unit vector
w0 = randn(n,1);
w0 = w0/norm(w0,2); %unit vector
disp('*****Unmixing Signals*****');
while abs(abs(w0'*w)-1) > size(w,1)
w0 = w;
w = x*D(w'*x) - sum(SD'*(w'*x))*w; %perform ICA
w = w/norm(w, 2);
end
disp('*****Output After ICA*****');
sound(w'*x); % Supposed to be one of the original signals
subplot(4,1,1);plot(m1); title('Original Male Voice');
subplot(4,1,2);plot(f1); title('Original Female Voice');
subplot(4,1,4);plot(w'*x); title('Post ICA: Estimated Signal');
%figure;
%plot(z); title('Random Mixed Signal');
%figure;
%plot(100*(w'*x)); title('Post ICA: Estimated Signal');
Your covariance matrix c is 2-by-2; you cannot work with that. You have to mix your signals multiple times with random numbers to get anywhere, because you must have some signal (m1) common to different channels. I was unable to follow your FastICA code all the way through, but here is a PCA example:
url = {'https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0034_8k.wav';...
'https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0017_8k.wav'};
%fs = 8000;
m1 = webread(url{1});
m1 = m1(1:30000);
f1 = webread(url{2});
f1 = f1(1:30000);
ss = size(f1,1);
n = 2;
disp('*****Mixing Signals*****');
A = randn(50,n); %developing mixing matrix
x = A*[m1';f1']; %A*x
[www,comp] = pca(x');
sound(comp(:,1)',8000)
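If the first principal component gives you one of the voices, the second component is worth checking as well; a small usage addition of mine, not part of the original answer:

sound(comp(:,2)', 8000) % second principal component, candidate for the other voice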

Vectorize a regression map calculation

I compute the regression map of a time series A(t) on a field B(x,y,t) in the following way:
A=1:10; %time
B=rand(100,100,10); %x,y,time
rc=nan(size(B,1),size(B,2));
for ii=1:size(B,1)
for jj=1:size(B,2)
tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
rc(ii,jj) = tmp(1,2); %covariance A and B
end
end
rc = rc/var(A); %regression coefficient
Is there a way to vectorize/speed up code? Or maybe some built-in function that I did not know of to achieve the same result?
In order to vectorize this algorithm, you would have to "get your hands dirty" and compute the covariance yourself. If you take a look inside cov you'll see that it has many lines of input checking and very few lines of actual computation. To summarize, the critical steps are:
y = varargin{1};
x = x(:);
y = y(:);
x = [x y];
[m,~] = size(x);
denom = m - 1;
xc = x - sum(x,1)./m; % Remove mean
c = (xc' * xc) ./ denom;
To simplify the above somewhat:
x = [x(:) y(:)];
m = size(x,1);
xc = x - sum(x,1)./m;
c = (xc' * xc) ./ (m - 1);
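As a quick sanity check (a minimal sketch with random data), the stripped-down computation reproduces cov:

x = randn(10,1);
y = randn(10,1);
c1 = cov(x, y);                      % built-in 2x2 covariance matrix
xy = [x y];
xc = xy - sum(xy,1)./size(xy,1);     % remove column means
c2 = (xc' * xc) ./ (size(xy,1) - 1); % manual covariance
% norm(c1 - c2) is on the order of eps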
Now this is something that is fairly straightforward to vectorize...
function q51466884
A = 1:10; %time
B = rand(200,200,10); %x,y,time
%% Test Equivalence:
assert( norm(sol1-sol2) < 1E-10);
%% Benchmark:
disp([timeit(@sol1), timeit(@sol2)]);
%%
function rc = sol1()
rc=nan(size(B,1),size(B,2));
for ii=1:size(B,1)
for jj=1:size(B,2)
tmp = cov(A,squeeze(B(ii,jj,:))); %covariance matrix
rc(ii,jj) = tmp(1,2); %covariance A and B
end
end
rc = rc/var(A); %regression coefficient
end
function rC = sol2()
m = numel(A);
rB = reshape(B,[],10).'; % reshape
% Center:
cA = A(:) - sum(A)./m;
cB = rB - sum(rB,1)./m;
% Multiply:
rC = reshape( (cA.' * cB) ./ (m-1), size(B(:,:,1)) ) ./ var(A);
end
end
I get these timings: [0.5381 0.0025] which means we saved two orders of magnitude in the runtime :)
Note that a big part of optimizing the algorithm is assuming you don't have any "strangeness" in your data, like NaN values etc. Take a look inside cov.m to see all the checks that we skipped.
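If NaN values are a possibility, one cheap guard (a sketch of mine, and not equivalent to cov's full 'omitrows' handling) is to flag the affected pixels up front:

m = numel(A);
rB = reshape(B, [], m).';            % time x pixels, as in sol2
bad = any(isnan(rB), 1);             % pixels with a NaN anywhere along time
rB(:, bad) = 0;                      % keep the arithmetic finite
cA = A(:) - sum(A)./m;
cB = rB - sum(rB,1)./m;
rC = reshape((cA.' * cB) ./ (m-1), size(B(:,:,1))) ./ var(A);
rC(bad) = NaN;                       % mark the affected pixels as undefined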

Compute weighted summation of matrix power (matrix polynomial) in Matlab

Given an n×n matrix A and an n×1 vector x, is there any smart way to compute

J = \sum_{i=1}^{n} x_i A^i

using Matlab? The x_i are the elements of the vector x, so J is a sum of matrices. So far I have used a for loop, but I was wondering if there was a smarter way.
Short answer: you can use the builtin matlab function polyvalm for matrix polynomial evaluation as follows:
x = x(end:-1:1); % flip the order of the elements
x(end+1) = 0; % append 0
J = polyvalm(x, A);
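As a quick check on a small case (a sketch with made-up numbers; flipud plus an appended zero is the same reordering as above):

A = [0 1; 1 1];
x = [2; 3];
Jref = x(1)*A + x(2)*A^2;   % direct sum, J = x_1*A + x_2*A^2 for n = 2
p = [flipud(x); 0];         % coefficients, highest power first, constant term 0
J = polyvalm(p, A);
norm(J - Jref)              % should be ~0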
Long answer: Matlab uses a loop internally, so you don't gain much, and you may even do worse than an optimised implementation of your own (see my calcJ_loopOptimised function):
% construct random input
n = 100;
A = rand(n);
x = rand(n, 1);
% calculate the result using different methods
Jbuiltin = calcJ_builtin(A, x);
Jloop = calcJ_loop(A, x);
JloopOptimised = calcJ_loopOptimised(A, x);
% check if the functions are mathematically equivalent (should be in the order of `eps`)
relativeError1 = max(max(abs(Jbuiltin - Jloop)))/max(max(Jbuiltin))
relativeError2 = max(max(abs(Jloop - JloopOptimised)))/max(max(Jloop))
% measure the execution time
t_loopOptimised = timeit(@() calcJ_loopOptimised(A, x))
t_builtin = timeit(@() calcJ_builtin(A, x))
t_loop = timeit(@() calcJ_loop(A, x))
% check if builtin function is faster
builtinFaster = t_builtin < t_loopOptimised
% calculate J using Matlab builtin function
function J = calcJ_builtin(A, x)
x = x(end:-1:1);
x(end+1) = 0;
J = polyvalm(x, A);
end
% naive loop implementation
function J = calcJ_loop(A, x)
n = size(A, 1);
J = zeros(n,n);
for i=1:n
J = J + A^i * x(i);
end
end
% optimised loop implementation (cache result of matrix power)
function J = calcJ_loopOptimised(A, x)
n = size(A, 1);
J = zeros(n,n);
A_ = eye(n);
for i=1:n
A_ = A_*A;
J = J + A_ * x(i);
end
end
For n=100, I get the following:
t_loopOptimised = 0.0077
t_builtin = 0.0084
t_loop = 0.0295
For n=5, I get the following:
t_loopOptimised = 7.4425e-06
t_builtin = 4.7399e-05
t_loop = 1.0496e-04
Note that my timings fluctuate somewhat between different runs, but the optimised loop is almost always faster (up to 6x for small n) than the builtin function.

Matlab: Editing the lhsnorm() function for Latin Hypercube

I am attempting to edit the lhsnorm function so that I can obtain a Latin Hypercube sample from a normally distributed set of data.
The lhsnorm function is as follows:
function [X,z] = lhsnorm(mu,sigma,n,dosmooth)
%LHSNORM Generate a latin hypercube sample with a normal distribution
z = mvnrnd(mu,sigma,n);
%%%%%%%%Is it possible for me to change this to a univariate distribution
% without affecting the rest of the code ?????
% Find the ranks of each column
p = length(mu);
x = zeros(size(z),class(z));
for i=1:p
x(:,i) = rank(z(:,i));
end
% Get gridded or smoothed-out values on the unit interval
if (nargin<4) || isequal(dosmooth,'on')
x = x - rand(size(x));
else
x = x - 0.5;
end
x = x / n;
% Transform each column back to the desired marginal distribution,
% maintaining the ranks (and therefore rank correlations)
for i=1:p
x(:,i) = norminv(x(:,i),mu(i), sqrt(sigma(i,i)));
end
X = x;
function r=rank(x)
% Similar to tiedrank, but no adjustment for ties here
[sx, rowidx] = sort(x);
r(rowidx) = 1:length(x);
r = r(:);
I am editing the code as shown below:
function [X] = lhsnorm_sid(z,dosmooth)
[muhat,sigmahat] = normfit(z);
z = z';
% Find the ranks of each column
p = length(muhat);
x = zeros(size(z),class(z));
s = size(z);
n = s(1,1);
for i=1:p
x(:,i) = rank(z(:,i));
end
if (nargin<4) || isequal(dosmooth,'on')
x = x - rand(size(x));
else
x = x - 0.5;
end
x = x / n;
for i=1:p
x(:,i) = norminv(x(:,i),muhat(i), sqrt(sigmahat(i,i)));
end
X = x;
function r=rank(x)
[sx, rowidx] = sort(x);
r(rowidx) = 1:length(x);
r = r(:);
However, I end up with Latin Hypercube values which are way off what is expected. I also tried directly substituting z with d(1,:), which contains a normally distributed set of data, but the Latin Hypercube I obtained did not contain any of the values in d(1,:).
Thanks
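
Two things look worth checking in the edited version (my observations, not from an answer in the thread): with only two input arguments, nargin<4 is always true, and normfit returns a standard deviation as its second output, so sqrt(sigmahat(i,i)) takes a square root twice. A minimal univariate sketch along the same lines:

function X = lhsnorm_uni(z, dosmooth)
% Univariate Latin-hypercube-style sample fitted to the data in z (a sketch)
z = z(:);
[muhat, sigmahat] = normfit(z);   % NB: sigmahat is a standard deviation, not a variance
n = numel(z);
[~, rowidx] = sort(z);
r(rowidx) = 1:n;                  % ranks, as in the nested rank() function above
r = r(:);
if (nargin < 2) || isequal(dosmooth, 'on')   % nargin < 2, since there are two inputs
x = r - rand(n,1);
else
x = r - 0.5;
end
x = x / n;
X = norminv(x, muhat, sigmahat);  % no extra sqrt: sigmahat is already a std. dev.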

Double Summation in MATLAB and vectorized loops

Here's my attempt at implementing this lovely formula (image: http://dl.dropbox.com/u/7348856/Picture1.png), reconstructed from the code below:

W(x,y) = \frac{4}{M_d N_d} \sum_{k=-\lfloor M_d/2 \rfloor}^{\lfloor M_d/2 \rfloor} \sum_{l=-\lfloor N_d/2 \rfloor}^{\lfloor N_d/2 \rfloor} e^{-\lambda \|(k,l)\|} \cos\!\big( 4\pi ( \operatorname{Re} F(x,y) \tfrac{k}{M} + \operatorname{Im} F(x,y) \tfrac{l}{N} ) \big) \, f(x+k, y+l) \, f(x-k, y-l)
%WIGNER Computes Wigner-Distribution on an image (difference of two images).
function[wd] = wigner(difference)
%Image size
[M, N, ~] = size(difference);
%Window size (5 x 5)
Md = 5;
Nd = 5;
%Fourier Transform
F = fft2(difference);
%Initializing the wigner picture
wd = zeros(M, N, 'uint8');
lambda =0.02;
value = (4/(Md*Nd));
for x = 1+floor(Md/2):M - floor(Md/2)
for y = 1+floor(Nd/2):N - floor(Nd/2)
for l = -floor(Nd/2) : floor(Nd/2)
for k = -floor(Md/2) : floor(Md/2)
kernel = exp(-lambda * norm([k, l]));
kernel = kernel * value;
theta = 4 * pi * ((real(F(x, y)) * (k/M) )+ (imag(F(x, y)) * (l/N)));
wd(x, y) = (wd(x, y)) + (cos(theta) * difference(x + k, y + l) * difference(x - k, y - l) * (kernel));
end
end
end
end
end
As you can see, the outer two loops are for the sliding window, while the remaining inner ones are for the variables of the summation.
Now, my request for you, my beloved stackoverflow users, is: can you help me improve these very nasty for loops that take more than their share of time, and turn them into vectorized code?
And would that improvement make a significant difference?
Thank you.
This might not be what you are asking, but it seems (at first glance) that the order of the summations is independent, so instead of {x,y,l,k} you could go {l,k,x,y}. Doing this will allow you to evaluate kernel fewer times by keeping it in the outermost loops.
Those four nested loops are basically processing each pixel in the image in a sliding-neighborhood style. I immediately thought of the NLFILTER and IM2COL functions.
Here is my attempt at vectorizing the code. Note that I haven't thoroughly tested it, or compared performance against the loop-based solution:
function WD = wigner(D, Md, Nd, lambda)
%# window size and lambda
if nargin<2, Md = 5; end
if nargin<3, Nd = 5; end
if nargin<4, lambda = 5; end
%# image size
[M,N,~] = size(D);
%# kernel = exp(-lambda*norm([k,l]))
[K,L] = meshgrid(-floor(Md/2):floor(Md/2), -floor(Nd/2):floor(Nd/2));
K = K(:); L = L(:);
kernel = exp(-lambda .* sqrt(K.^2+L.^2));
%# frequency-domain part
F = fft2(D);
%# f(x+k,y+l) * f(x-k,y-l) * kernel
C = im2col(D, [Md Nd], 'sliding');
X1 = bsxfun(@times, C .* flipud(C), kernel);
%# cos(theta)
C = im2col(F, [Md Nd], 'sliding');
C = C(round(Md*Nd/2),:); %# take center pixels
theta = bsxfun(@times, real(C), K/M) + bsxfun(@times, imag(C), L/N);
X2 = cos(4*pi*theta);
%# combine both parts for each sliding-neighborhood
WD = col2im(sum(X1.*X2,1), [Md Nd], size(F), 'sliding') .* (4/(M*N));
%# pad array with zeros to be of same size as input image
WD = padarray(WD, ([Md Nd]-1)./2, 0, 'both');
end
For what it's worth, here is the loop-based version with the improvement that @Laurbert515 suggested:
function WD = wigner_loop(D, Md, Nd, lambda)
%# window size and lambda
if nargin<2, Md = 5; end
if nargin<3, Nd = 5; end
if nargin<4, lambda = 5; end
%# image size
[M,N,~] = size(D);
%# frequency-domain part
F = fft2(D);
WD = zeros([M,N]);
for l = -floor(Nd/2):floor(Nd/2)
for k = -floor(Md/2):floor(Md/2)
%# kernel = exp(-lambda*norm([k,l]))
kernel = exp(-lambda * norm([k,l]));
for x = (1+floor(Md/2)):(M-floor(Md/2))
for y = (1+floor(Nd/2)):(N-floor(Nd/2))
%# cos(theta)
theta = 4 * pi * ( real(F(x,y))*k/M + imag(F(x,y))*l/N );
%# f(x+k,y+l) * f(x-k,y-l)* kernel
WD(x,y) = WD(x,y) + ( cos(theta) * D(x+k,y+l) * D(x-k,y-l) * kernel );
end
end
end
end
WD = WD * ( 4/(M*N) );
end
and here is how I test it (based on what I understood from the paper you previously linked to):
%# difference between two consecutive frames
A = imread('AT3_1m4_02.tif');
B = imread('AT3_1m4_03.tif');
D = imsubtract(A,B);
%#D = rgb2gray(D);
D = im2double(D);
%# apply Wigner-Distribution
tic, WD1 = wigner(D); toc
tic, WD2 = wigner_loop(D); toc
figure(1), imshow(WD1,[])
figure(2), imshow(WD2,[])
You might then need to scale/normalize the matrix, and apply thresholding...
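For instance, a minimal post-processing sketch (my addition; mat2gray and graythresh are Image Processing Toolbox functions):

WDn = mat2gray(WD1);             % rescale to [0,1]
BW = WDn > graythresh(WDn);      % Otsu threshold
figure(3), imshow(BW)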