MATLAB: Correlation with a seed region - matlab

By default, all built-in functions for computing correlation or covariance return a matrix. I am trying to write an efficient function that will compute the correlation between a seed region and various other regions, but I do not need the correlations between the other regions. I assume that computing the full correlation matrix would therefore be inefficient.
I could instead compute a the correlation matrix between each region and the seed region, choose one of the off diagonal points and store it, but I feel like looping in this situation is also inefficient.
To be more concrete, each point in my 3-dimensional space has a time dimension. I am attempting to compute the mean correlation between a given point and all points in space within a given radius. I want to repeat this procedure hundreds of thousands of times, for many different radius lengths, and so on, so I would like for this to be as efficient as possible.
So, what is the best way to compute the correlation between a single vector and several others, without computing correlations that I will just ignore?
Thank you,
Chris
EDIT: Here is my code now...
function [corrMap] = TIME_meanCorrMap(A,radius)
% Even though the variable is "radius", we work with cubes for simplicity...
% So, the radius is the distance (in voxels) from the center of the cube an edge.
denom = ((radius*2)^3)-1;
dim = size(A);
corrMap = zeros(dim(1:3));
for x = radius+1:dim(1)-radius
rx = [x-radius : x+radius];
for y = radius+1:dim(2)-radius
ry = [y-radius : y+radius];
for z = radius+1:dim(3)-radius
rz = [z-radius : z+radius];
corrCoefs = zeros(1,denom);
seed = A(x,y,z,:);
i=0;
for xx = rx
for yy = ry
for zz = rz
if ~all([x y z] == [xx yy zz])
i = i + 1;
temp = corrcoef(seed,A(xx,yy,zz,:));
corrCoeffs(i) = temp(1,2);
end
end
end
end
corrMap = mean(corrCoeffs);
end
end
end
EDIT: Here are some more times to supplement the accepted answer.
Using bsxfun() to do normalization, and matrix multiplication to compute correlations:
tic; for i=1:10000
x=rand(100);
xz = bsxfun(#rdivide,bsxfun(#minus,x,mean(x)),std(x));
cc = xz(:,2:end)' * xz(:,1) ./ 99;
end; toc
Elapsed time is 6.928251 seconds.
Using zscore() to normalize, matrix multiplication to compute correlations:
tic; for i=1:10000
x=rand(100);
xz = zscore(x);
cc = xz(:,2:end)' * xz(:,1) ./ 99;
end; toc
Elapsed time is 7.040677 seconds.
Using bsxfun() to normalize, and corr() to compute correlations.
tic; for i=1:10000
x=rand(100);
xz = bsxfun(#rdivide,bsxfun(#minus,x,mean(x)),std(x));
cc = corr(x(:,1),x(:,2:end));
end; toc
Elapsed time is 11.385707 seconds.

It is certainly possible to improve upon the for loop that you are currently employing. The correlation compuattions can be parallelized using matrix multiplications if you have sufficient RAM. However, it will require you to unwrap your 4-dimensional data matrix A into a different shape. most likely you are dealing with 3-dimensional voxelwise fMRI data, in which case you'll have to reshape from [x y z time] matrix to an [index time] matrix. I will assume you can deal with that reshaping. Once you have your seed timecourse [Time by 1] and your target timecourses [Time by NumTargets] ready, you can perform some much more efficient computations.
A quick way to efficiently compute the desired correlation is using the corr function in MATLAB. This function will accept 2 matrix arguments and it will quite efficiently compute all pairwise correlations between the columns of argument 1 and the columns of argument 2, e.g.
T = 200; %time samples
N = 20; %number of other voxels
seed = randn(T,1); %data from seed voxel
targets = randn(T,N); %data from target voxels
%here is the for loop method
tic
for n = 1:N
tmp = corrcoef(seed, targets(:,n));
tmpcc = tmp(1,2);
end
looptime = toc;
%here is the parallel method
tic
cc = corr(seed, targets);
matrixtime = toc;
On my machine, the parallel operation in corr is faster than the loop method by a factor proportional to T*N.
It is possible to go a little faster than the corr function if you are willing to perofrm the underlying matrix operations yourself, and in any case it is worth knowing what they are. The correlation between two vectors is basically a normalized dot product, so using the conventions above you can compute the correlations in the following way
zseed = zscore(seed); %normalize the seed timecourse by z-scoring
ztargets= zscore(targets); %normalize the target timecourses by z-scoring
ztargets = ztargets'; %flip columns and rows for convenience
cc2 = ztargets*zseed./(T-1); %compute many dot products with one matrix multiplication
The code above is basically what the corr function will do which is why it is much faster than the loop. Note that most of the operation time is in the zscore operations, and you can improve on the performance of the corr function if you efficiently compute the zscore using the bsxfun command. For now, I hope this gives you some direction on how to compute a correlation between a seed timecourse and many target timecourses without having to loop through and compute each one separately.

Related

Imaginary part of coherence matlab

I want to calculate IMC according to below instructions
I wrote below code in matlab, but the result is not what I was expecting.
Is that code a valid implementation of the above instructions?
Could someone help me with a better code?
function [ imag_coherence] = imcoh( x,y )
xy=xcorr(fft(x),fft(y));
xx=xcorr(fft(x));
yy=xcorr(fft(y));
imag_coherence=imag(xy./sqrt(xx.*yy));
end
xcorr actually computes the cross-correlation between the computed spectrums (a sums contributions over all frequencies), not the expectation of the point-wise multiplication (i.e. for a given fixed frequency) of those spectrums. The latter being what the coherency definition you provided uses.
Assuming the processes producing x and y are ergodic, the expectations can be estimated by computing the average over many blocks of data. With that in mind, an implementation of the coherency as described in your definition could look like:
function [ result ] = coherency( x,y,N )
% divide data in N equal length blocks for averaging later on
L = floor(length(x)/N);
xt = reshape(x(1:L*N), L, N);
yt = reshape(y(1:L*N), L, N);
% transform to frequency domain
Xf = fft(xt,L,1);
Yf = fft(yt,L,1);
% estimate expectations by taking the average over N blocks
xy = sum(Xf .* conj(Yf), 2)/N;
xx = sum(Xf .* conj(Xf), 2)/N;
yy = sum(Yf .* conj(Yf), 2)/N;
% combine terms to get final result
result=xy./sqrt(xx.*yy);
end
If you only want the imaginary part, then it's a simple matter of computing imag(coherency(x,y,N)).

Loopless Gaussian mixture model in Matlab

I have several Gaussian distributions and I want to draw different values from all of them at the same time. Since this is basically what a GMM does, I have looked into Matlab GMM implementation (gmrnd) and I have seen that it performs a simple loop over all the components.
I would like to implement it in a faster way, but the problem is that 3d matrices are involved. A simple code (with loop) would be
n = 10; % number of Gaussians
d = 2; % dimension of each Gaussian
mu = rand(d,n); % init some means
U = rand(d,d,n); % init some covariances with their Cholesky decomposition (Cov = U'*U)
I = repmat(triu(true(d,d)),1,1,n);
U(~I) = 0;
r = randn(d,n); % random values for drawing samples
samples = zeros(d,n);
for i = 1 : n
samples(:,i) = U(:,:,i)' * r(:,i) + mu(:,i);
end
Is it possible to speed it up? I do not know how to deal with the 3d covariances matrix (without using cellfun, which is much slower).
Few improvements (hopefully are improvements) could be suggested here.
PARTE #1 You can replace the following piece of code -
I = repmat(triu(true(d,d)),[1,1,n]);
U(~I) = 0;
with bsxfun(#times,..) one-liner -
U = bsxfun(#times,triu(true(d,d)),U)
PARTE #2 You can kill the loopy portion of the code again with bsxfun(#times,..) like so -
samples = squeeze(sum(bsxfun(#times,U,permute(r,[1 3 2])),2)) + mu
I'm not fully convinced this is faster, but it gets rid of the loop. It would be interesting to see benchmarking results if you can do that. I also think this code makes is rather ugly and it's a bit hard to deduce what's going on, but I'll let you decide between readability and performance.
Anyway, I decided to define a big n*d dimensional Gaussian where each block d of variates are independent of each other (as in the original). This allows defining the covariance as a block diagonal matrix, for which I use blkdiag. From there, it is a matter of applying bsxfun to remove the need for looping.
Using the same random seed, I can recover the same samples as your code:
%// sampling with block diagonal covariance matrix
rng(1) %// set random seed
Ub = mat2cell(U, d, d, ones(n,1)); %// 1-by-1-by-10 cell of 2-by-2 matrices
C = blkdiag(Ub{:});
Ns = 1; %// number of samples
joint_samples = bsxfun(#plus, C'*randn(d*n, Ns), mu(:));
new_samples = reshape(joint_samples, [d n]); %// or [d n Ns] if Ns > 1
%//Compare to original
rng(1) %// set same seed for repeatability
r = randn(d,n); % random values for drawing samples
samples = zeros(d,n);
for i = 1 : n
samples(:,i) = U(:,:,i)' * r(:,i) + mu(:,i);
end
isequal(samples, new_samples) %// true

Unexpected result with DFT in MATLAB

I have a problem when calculate discrete Fourier transform in MATLAB, apparently get the right result but when plot the amplitude of the frequencies obtained you can see values very close to zero which should be exactly zero. I use my own implementation:
function [y] = Discrete_Fourier_Transform(x)
N=length(x);
y=zeros(1,N);
for k = 1:N
for n = 1:N
y(k) = y(k) + x(n)*exp( -1j*2*pi*(n-1)*(k-1)/N );
end;
end;
end
I know it's better to use fft of MATLAB, but I need to use my own implementation as it is for college.
The code I used to generate the square wave:
x = [ones(1,8), -ones(1,8)];
for i=1:63
x = [x, ones(1,8), -ones(1,8)];
end
MATLAB version: R2013a(8.1.0.604) 64 bits
I have tried everything that has happened to me but I do not have much experience using MATLAB and I have not found information relevant to this issue in forums. I hope someone can help me.
Thanks in advance.
This will be a numerical problem. The values are in the range of 1e-15, while the DFT of your signal has values in the range of 1e+02. Most likely this won't lead to any errors when doing further processing. You can calculate the total squared error between your DFT and the MATLAB fft function by
y = fft(x);
yh = Discrete_Fourier_Transform(x);
sum(abs(yh - y).^2)
ans =
3.1327e-20
which is basically zero. I would therefore conclude: your DFT function works just fine.
Just one small remark: You can easily vectorize the DFT.
n = 0:1:N-1;
k = 0:1:N-1;
y = exp(-1j*2*pi/N * n'*k) * x(:);
With n'*k you create a matrix with all combinations of n and k. You then take the exp(...) of each of those matrix elements. With x(:) you make sure x is a column vector, so you can do the matrix multiplication (...)*x which automatically sums over all k's. Actually, I just notice, this is exactly the well-known matrix form of the DFT.

Multiply an arbitrary number of matrices an arbitrary number of times

I have found several questions/answers for vectorizing and speeding up routines for multiplying a matrix and a vector in a single loop, but I am trying to do something a little more general, namely multiplying an arbitrary number of matrices together, and then performing that operation an arbitrary number of times.
I am writing a general routine for calculating thin-film reflection from an arbitrary number of layers vs optical frequency. For each optical frequency W each layer has an index of refraction N and an associated 2x2 transfer matrix L and 2x2 interface matrix I which depends on the index of refraction and the thickness of the layer. If n is the number of layers, and m is the number of frequencies, then I can vectorize the index into an n x m matrix, but then in order to calculate the reflection at each frequency, I have to do nested loops. Since I am ultimately using this as part of a fitting routine, anything I can do to speed it up would be greatly appreciated.
This should provide a minimum working example:
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
for x = 1:m %loop over frequencies
C = eye(2); % first medium is air
for y = 2:n %loop over layers
na = N(y-1,x);
nb = N(y,x);
%I = InterfaceMatrix(na,nb); % calculate the 2x2 interface matrix
I = [1 na*nb;na*nb 1]; % dummy matrix
%L = TransferMatrix(nb) % calculate the 2x2 transfer matrix
L = [exp(-1i*nb*W(x)*D(y)) 0; 0 exp(+1i*nb*W(x)*D(y))]; % dummy matrix
C = C*I*L;
end
a = C(1,1);
c = C(2,1);
r(x) = c/a; % reflectivity, the answer I want.
end
Running this twice for two different polarizations for a three layer (air/stuff/substrate) problem with 2562 frequencies takes 0.952 seconds while solving the exact same problem with the explicit formula (vectorized) for a three layer system takes 0.0265 seconds. The problem is that beyond 3 layers, the explicit formula rapidly becomes intractable and I would have to have a different subroutine for each number of layers while the above is completely general.
Is there hope for vectorizing this code or otherwise speeding it up?
(edited to add that I've left several things out of the code to shorten it, so please don't try to use this to actually calculate reflectivity)
Edit: In order to clarify, I and L are different for each layer and for each frequency, so they change in each loop. Simply taking the exponent will not work. For a real world example, take the simplest case of a soap bubble in air. There are three layers (air/soap/air) and two interfaces. For a given frequency, the full transfer matrix C is:
C = L_air * I_air2soap * L_soap * I_soap2air * L_air;
and I_air2soap ~= I_soap2air. Thus, I start with L_air = eye(2) and then go down successive layers, computing I_(y-1,y) and L_y, multiplying them with the result from the previous loop, and going on until I get to the bottom of the stack. Then I grab the first and third values, take the ratio, and that is the reflectivity at that frequency. Then I move on to the next frequency and do it all again.
I suspect that the answer is going to somehow involve a block-diagonal matrix for each layer as mentioned below.
Not next to a matlab, so that's only a starter,
Instead of the double loop you can write na*nb as Nab=N(1:end-1,:).*N(2:end,:);
The term in the exponent nb*W(x)*D(y) can be written as e=N(2:end,:)*W'*D;
The result of I*L is a 2x2 block matrix that has this form:
M = [1, Nab; Nab, 1]*[e-, 0;0, e+] = [e- , Nab*e+ ; Nab*e- , e+]
with e- as exp(-1i*e), and e+ as exp(1i*e)'
see kron on how to get the block matrix form, to vectorize the propagation C=C*I*L just take M^n
#Lama put me on the right path by suggesting block matrices, but the ultimate answer ended up being more complicated, and so I put it here for posterity. Since the transfer and interface matrix is different for each layer, I leave in the loop over the layers, but construct a large sparse block matrix where each block represents a frequency.
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
C = speye(2*m); % first medium is air
even = 2:2:2*m;
odd = 1:2:2*m-1;
for y = 2:n %loop over layers
na = N(y-1,:);
nb = N(y,:);
% get the reflection and transmission coefficients from subroutines as a vector
% of length m, one value for each frequency
%t = Tab(na, nb);
%r = Rab(na, nb);
t = rand(size(W)); % dummy vector for MWE
r = rand(size(W)); % dummy vector for MWE
% create diagonal and off-diagonal elements. each block is [1 r;r 1]/t
Id(even) = 1./t;
Id(odd) = Id(even);
Io(even) = 0;
Io(odd) = r./t;
It = [Io;Id/2].';
I = spdiags(It,[-1 0],2*m,2*m);
I = I + I.';
b = 1i.*(2*pi*D(n).*nb).*W;
B(even) = -b;
B(odd) = b;
L = spdiags(exp(B).',0,2*m,2*m);
C = C*I*L;
end
a = spdiags(C,0);
a = a(odd).';
c = spdiags(C,-1);
c = c(odd).';
r = c./a; % reflectivity, the answer I want.
With the 3 layer system mentioned above, it isn't quite as fast as the explicit formula, but it's close and probably can get a little faster after some profiling. The full version of the original code clocks at 0.97 seconds, the formula at 0.012 seconds and the sparse diagonal version here at 0.065 seconds.

How can we produce kappa and delta in the following model using Matlab?

I have a following stochastic model describing evolution of a process (Y) in space and time. Ds and Dt are domain in space (2D with x and y axes) and time (1D with t axis). This model is usually known as mixed-effects model or components-of-variation models
I am currently developing Y as follow:
%# Time parameters
T=1:1:20; % input
nT=numel(T);
%# Grid and model parameters
nRow=100;
nCol=100;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:1:nCol,1:1:nRow,T);
xPower=0.1;
tPower=1;
noisePower=1;
detConstant=1;
deterministic_mu = detConstant.*(((Grid.Nt).^tPower)./((Grid.Nx).^xPower));
beta_s = randn(nRow,nCol); % mean-zero random effect representing location specific variability common to all times
gammaTemp = randn(nT,1);
for t = 1:nT
gamma_t(:,:,t) = repmat(gammaTemp(t),nRow,nCol); % mean-zero random effect representing time specific variability common to all locations
end
var=0.1;% noise has variance = 0.1
for t=1:nT
kappa_st(:,:,t) = sqrt(var)*randn(nRow,nCol);
end
for t=1:nT
Y(:,:,t) = deterministic_mu(:,:,t) + beta_s + gamma_t(:,:,t) + kappa_st(:,:,t);
end
My questions are:
How to produce delta in the expression for Y and the difference in kappa and delta?
Help explain, through some illustration using Matlab, if I am correctly producing Y?
Please let me know if you need some more information/explanation. Thanks.
First, I rewrote your code to make it a bit more efficient. I see you generate linearly-spaced grids for x,y and t and carry out the computation for all points in this grid. This approach has severe limitations on the maximum attainable grid resolution, since the 3D grid (and all variables defined with it) can consume an awfully large amount of memory if the resolution goes up. If the model you're implementing will grow in complexity and size (it often does), I'd suggest you throw this all into a function accepting matrix/vector inputs for s and t, which will be a bit more flexible in this regard -- processing "blocks" of data that will otherwise not fit in memory will be a lot easier that way.
Then, I generated the the delta_st term with rand instead of randn since the noise should be "white". Now I'm very unsure about that last one, and I didn't have time to read through the paper you linked to -- can you tell me on what pages I can find relevant the sections for the delta_st?
Now, the code:
%# Time parameters
T = 1:1:20; % input
nT = numel(T);
%# Grid and model parameters
nRow = 100;
nCol = 100;
% noise has variance = 0.1
var = 0.1;
xPower = 0.1;
tPower = 1;
noisePower = 1;
detConstant = 1;
[Grid.Nx,Grid.Ny,Grid.Nt] = meshgrid(1:nCol,1:nRow,T);
% deterministic mean
deterministic_mu = detConstant .* Grid.Nt.^tPower ./ Grid.Nx.^xPower;
% mean-zero random effect representing location specific
% variability common to all times
beta_s = repmat(randn(nRow,nCol), [1 1 nT]);
% mean-zero random effect representing time specific
% variability common to all locations
gamma_t = bsxfun(#times, ones(nRow,nCol,nT), randn(1, 1, nT));
% mean zero random effect capturing the spatio-temporal
% interaction not found in the larger-scale deterministic mu
kappa_st = sqrt(var)*randn(nRow,nCol,nT);
% mean zero random effect representing the micro-scale
% spatio-temporal variability that is modelled by white
% noise (i.i.d. at different time steps) in Ds·Dt
delta_st = noisePower * (rand(nRow,nCol,nT)-0.5);
% Final result:
Y = deterministic_mu + beta_s + gamma_t + kappa_st + delta_st;
Your implementation samples beta, gamma and kappa as if they are white (e.g. their values at each (x,y,t) are independent). The descriptions of the terms suggest that this is not meant to be the case. It looks like delta is supposed to capture the white noise, while the other terms capture the correlations over their respective domains. e.g. there is a non-zero correlation between gamma(t_1) and gamma(t_1+1).
If you wish to model gamma as a stationary Gaussian Markov process with variance var_g and correlation cor_g between gamma(t) and gamma(t+1), you can use something like
gamma_t = nan( nT, 1 );
gamma_t(1) = sqrt(var_g)*randn();
K_g = cor_g/var_g;
K_w = sqrt( (1-K_g^2)*var_g );
for t = 2:nT,
gamma_t(t) = K_g*gamma_t(t-1) + K_w*randn();
end
gamma_t = reshape( gamma_t, [ 1 1 nT ] );
The formulas I've used for gains K_g and K_w in the above code (and the initialization of gamma_t(1)) produce the desired stationary variance \sigma^2_0 and one-step covariance \sigma^2_1:
Note that the implementation above assumes that later you will sum the terms using bsxfun to do the "repmat" for you:
Y = bsxfun( #plus, deterministic_mu + kappa_st + delta_st, beta_s );
Y = bsxfun( #plus, Y, gamma_t );
Note that I haven't tested the above code, so you should confirm with sampling that it does actually produce a zero noise process of the specified variance and covariance between adjacent samples. To sample beta the same procedure can be extended into two dimensions, but the principles are essentially the same. I suspect kappa should be similarly modeled as a Markov Gaussian Process, but in all three dimensions and with a lower variance to represent higher-order effects not captured in mu, beta and gamma.
Delta is supposed to be zero mean stationary white noise. Assuming it to be Gaussian with variance noisePower one would sample it using
delta_st = sqrt(noisePower)*randn( [ nRows nCols nT ] );