correlation coefficient data driven approach possible? - matlab

I have a matrix 64x64x32x90 which stands for pixels at x,y,z, at time t.
I have a reference signal 1x90 which stands for the behavior I expect for a pixel at some point (x,y,z).
I am constructing a new image of the correlation between each pixel versus my reference.
load('DATA.mat');
ON = ones(1,10);
OFF = zeros(1,10);
taskRef = [OFF ON OFF ON OFF ON OFF ON OFF];
corrImage = zeros(64,64,36);
for i=1:64,
for j=1:63,
for k=1:36
signal = squeeze(DATA(i,j,k,:));
coef = corrcoef(signal',taskRef);
corrImage(i,j,k) = coef(2);
end
end
end
My process is too slow. Is there a way to get rid of my loops or adjust the code to have a better runtime?

Reshape your data so that its first three dimensions are collapsed into one (so now there are 64*64*32 rows and 90 columns).
Then use pdist2 (with 'correlation' option) to compute the correlation of each row with the expected pattern.
Finally, reshape result into the desired shape.
DATA2 = reshape(DATA, [],90);
corrImage = 1 - pdist2(DATA2, taskRef, 'correlation');
corrImage = reshape(corrImage, 64,64,32);

Related

Verify Law of Large Numbers in MATLAB

The problem:
If a large number of fair N-sided dice are rolled, the average of the simulated rolls is likely to be close to the mean of 1,2,...N i.e. the expected value of one die. For example, the expected value of a 6-sided die is 3.5.
Given N, simulate 1e8 N-sided dice rolls by creating a vector of 1e8 uniformly distributed random integers. Return the difference between the mean of this vector and the mean of integers from 1 to N.
My code:
function dice_diff = loln(N)
% the mean of integer from 1 to N
A = 1:N
meanN = sum(A)/N;
% I do not have any idea what I am doing here!
V = randi(1e8);
meanvector = V/1e8;
dice_diff = meanvector - meanN;
end
First of all, make sure everytime you ask a question that it is as clear as possible, to make it easier for other users to read.
If you check how randi works, you can see this:
R = randi(IMAX,N) returns an N-by-N matrix containing pseudorandom
integer values drawn from the discrete uniform distribution on 1:IMAX.
randi(IMAX,M,N) or randi(IMAX,[M,N]) returns an M-by-N matrix.
randi(IMAX,M,N,P,...) or randi(IMAX,[M,N,P,...]) returns an
M-by-N-by-P-by-... array. randi(IMAX) returns a scalar.
randi(IMAX,SIZE(A)) returns an array the same size as A.
So, if you want to use randi in your problem, you have to use it like this:
V=randi(N, 1e8,1);
and you need some more changes:
function dice_diff = loln(N)
%the mean of integer from 1 to N
A = 1:N;
meanN = mean(A);
V = randi(N, 1e8,1);
meanvector = mean(V);
dice_diff = meanvector - meanN;
end
For future problems, try using the command
help randi
And matlab will explain how the function randi (or other function) works.
Make sure to check if the code above gives the desired result
As pointed out, take a closer look at the use of randi(). From the general case
X = randi([LowerInt,UpperInt],NumRows,NumColumns); % UpperInt > LowerInt
you can adapt to dice rolling by
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
as an example. Exchanging NumRolls and NumSamplePaths will yield Rolls.', or transpose(Rolls).
According to the Law of Large Numbers, the updated sample average after each roll should converge to the true mean, ExpVal (short for expected value), as the number of rolls (trials) increases. Notice that as NumRolls gets larger, the sample mean converges to the true mean. The image below shows this for two sample paths.
To get the sample mean for each number of dice rolls, I used arrayfun() with
CumulativeAvg1 = arrayfun(#(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
which is equivalent to using the cumulative sum, cumsum(), to get the same result.
CumulativeAvg1 = (cumsum(Rolls(:,1))./(1:NumRolls).'); % equivalent
% MATLAB R2019a
% Create Dice
NumSides = 6; % positive nonzero integer
NumRolls = 200;
NumSamplePaths = 2;
% Roll Dice
Rolls = randi([1 NumSides],NumRolls,NumSamplePaths);
% Output Statistics
ExpVal = mean(1:NumSides);
CumulativeAvg1 = arrayfun(#(jj)mean(Rolls(1:jj,1)),[1:NumRolls]);
CumulativeAvgError1 = CumulativeAvg1 - ExpVal;
CumulativeAvg2 = arrayfun(#(jj)mean(Rolls(1:jj,2)),[1:NumRolls]);
CumulativeAvgError2 = CumulativeAvg2 - ExpVal;
% Plot
figure
subplot(2,1,1), hold on, box on
plot(1:NumRolls,CumulativeAvg1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvg2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(ExpVal,'k-')
title('Average')
xlabel('Number of Trials')
ylim([1 NumSides])
subplot(2,1,2), hold on, box on
plot(1:NumRolls,CumulativeAvgError1,'b--','LineWidth',1.5,'DisplayName','Sample Path 1')
plot(1:NumRolls,CumulativeAvgError2,'r--','LineWidth',1.5,'DisplayName','Sample Path 2')
yline(0,'k-')
title('Error')
xlabel('Number of Trials')

Fast way to compute row by row matrix correlation

I have two very large matrices (228453x460) and I want to compute correlation between rows.
for i=1:228453
if(vec1_preprocess(i,1))
for j=1:228453
df = effdf(vec1_preprocess(i,:)',vec2_preprocess(j,:)');
corr_temp = corr(vec1_preprocess(i,:)', vec2_preprocess(j,:)');
p = calculate_p(corr_temp, df);
temp = (meanVec(i)+p)/2;
meanVec(i) = temp;
end
disp(i);
end
end
This takes ~1day. Is there a direct way to compute this?
Edit: Code for effdf
function df = effdf(ts1,ts2);
%function df = effdf(ts1,ts2);
ts1=ts1-mean(ts1);
ts2=ts2-mean(ts2);
N=length(ts1);
ac1=xcorr(ts1);
ac1=ac1/max(ac1); % normalized autocorrelation
ac1=ac1(((length(ac1)+3)/2):((length(ac1)+3)/2+floor(N/4)));
ac2=xcorr(ts2);
ac2=ac2/max(ac2); % normalized autocorrelation
ac2=ac2(((length(ac2)+3)/2):((length(ac2)+3)/2+floor(N/4)));
df = 1/((1/N)+(2/N)*sum(((N-(1:length(ac1)))/N)'.*ac1.*ac2));
Since you didn't post the code, I assume that your custom functions calculate_p and effdf are perfectly optimized and don't represent the bottleneck of your script. Let's focus on what we have.
The first problem I see is:
if (vec1_preprocess(i,1))
A check over 228453 iterations can sensibly increase the running time. Hence, extract only the matrix rows that don't contain a 0 in the first column and perform your calculations on those:
idx = vec1_preprocess(:,1) ~= 0;
vec1_preprocess = vec1_preprocess(idx,:);
for i = 1:size(vec1_preprocess,1)
% ...
end
The second problem is corr. It seems like you are computing p-values also, using calculate_p. Why don't you use the buil-in p-values returned by the function as second output argument?
[c,p] = corr(A,B);
Alternatively, if Pearson's correlation is what you are looking for, you could replace corr with corrcoef to see if it produces a better performance.
Last but not least (in fact it's the most important thing): is there any reason why you are performing this computation row by row instead of running it on the whole matrices?
If you read the documentation, you'll see that corr computes the correlation between columns, not rows.
To convert rows into columns and columns into rows, simply transpose the matrix:
tmp1 = vec1_preprocess';
tmp2 = vec2_preprocess';
C = corr(tmp1,tmp2);

Reverse-calculating original data from a known moving average

I'm trying to estimate the (unknown) original datapoints that went into calculating a (known) moving average. However, I do know some of the original datapoints, and I'm not sure how to use that information.
I am using the method given in the answers here: https://stats.stackexchange.com/questions/67907/extract-data-points-from-moving-average, but in MATLAB (my code below). This method works quite well for large numbers of data points (>1000), but less well with fewer data points, as you'd expect.
window = 3;
datapoints = 150;
data = 3*rand(1,datapoints)+50;
moving_averages = [];
for i = window:size(data,2)
moving_averages(i) = mean(data(i+1-window:i));
end
length = size(moving_averages,2)+(window-1);
a = (tril(ones(length,length),window-1) - tril(ones(length,length),-1))/window;
a = a(1:length-(window-1),:);
ai = pinv(a);
daily = mtimes(ai,moving_averages');
x = 1:size(data,2);
figure(1)
hold on
plot(x,data,'Color','b');
plot(x(window:end),moving_averages(window:end),'Linewidth',2,'Color','r');
plot(x,daily(window:end),'Color','g');
hold off
axis([0 size(x,2) min(daily(window:end))-1 max(daily(window:end))+1])
legend('original data','moving average','back-calculated')
Now, say I know a smattering of the original data points. I'm having trouble figuring how might I use that information to more accurately calculate the rest. Thank you for any assistance.
You should be able to calculate the original data exactly if you at any time can exactly determine one window's worth of data, i.e. in this case n-1 samples in a window of length n. (In your case) if you know A,B and (A+B+C)/3, you can solve now and know C. Now when you have (B+C+D)/3 (your moving average) you can exactly solve for D. Rinse and repeat. This logic works going backwards too.
Here is an example with the same idea:
% the actual vector of values
a = cumsum(rand(150,1) - 0.5);
% compute moving average
win = 3; % sliding window length
idx = hankel(1:win, win:numel(a));
m = mean(a(idx));
% coefficient matrix: m(i) = sum(a(i:i+win-1))/win
A = repmat([ones(1,win) zeros(1,numel(a)-win)], numel(a)-win+1, 1);
for i=2:size(A,1)
A(i,:) = circshift(A(i-1,:), [0 1]);
end
A = A / win;
% solve linear system
%x = A \ m(:);
x = pinv(A) * m(:);
% plot and compare
subplot(211), plot(1:numel(a),a, 1:numel(m),m)
legend({'original','moving average'})
title(sprintf('length = %d, window = %d',numel(a),win))
subplot(212), plot(1:numel(a),a, 1:numel(a),x)
legend({'original','reconstructed'})
title(sprintf('error = %f',norm(x(:)-a(:))))
You can see the reconstruction error is very small, even using the data sizes in your example (150 samples with a 3-samples moving average).

Normalization 3D Image according to Slices in MATLAB

I have a matrix which is 256X192X80. I want to normalize all slices (80 represents the slices) without using for loop.
The way I'm doing with for is below: (im_dom_raw is our matrix)
normalized_raw = zeros(size(im_dom_raw));
for a=1:80
slice_raw = im_dom_raw(:,:,a);
slice_raw = slice_raw-min(slice_raw(:));
slice_raw = slice_raw/(max(slice_raw(:)));
normalized_raw(:,:,a) = slice_raw;
end
The code below implements your normalization approach without using loops. Its based on bsxfun.
% Shift all values to the positive side
slices_raw = bsxfun(#minus,im_dom_raw,min(min(im_dom_raw)));
% Normalize all values with respect to the slice maximum (With input from #Daniel)
normalized_raw2 = bsxfun(#mrdivide,slices_raw,max(max(slices_raw)));
% A slightly faster approach would be
%normalized_raw2 = bsxfun(#times,slices_raw,max(max(slices_raw)).^-1);
% ... but it will differ with your approach due to numerical approximation
% Comparison to your previous loop based implementation
sum(abs(normalized_raw(:)-normalized_raw2(:)))
The last line of code outputs
ans =
0
Which (thanks to #Daniel) means that both approaches yield exact same results.

matlab - optimize getting the angle between each vector with all others in a large array

I am trying to get the angle between every vector in a large array (1896378x4 -EDIT: this means I need 1.7981e+12 angles... TOO LARGE, but if there's room to optimize the code, let me know anyways). It's too slow - I haven't seen it finish yet. Here's the steps towards optimizing I've taken:
First, logically what I (think I) want (just use Bt=rand(N,4) for testing):
[ro,col]=size(Bt);
angbtwn = zeros(ro-1); %too long to compute!! total non-zero = ro*(ro-1)/2
count=1;
for ii=1:ro-1
for jj=ii+1:ro
angbtwn(count) = atan2(norm(cross(Bt(ii,1:3),Bt(jj,1:3))), dot(Bt(ii,1:3),Bt(jj,1:3))).*180/pi;
count=count+1;
end
end
So, I though I'd try and vectorize it, and get rid of the non-built-in functions:
[ro,col]=size(Bt);
% angbtwn = zeros(ro-1); %TOO LONG!
for ii=1:ro-1
allAxes=Bt(ii:ro,1:3);
repeachAxis = allAxes(ones(ro-ii+1,1),1:3);
c = [repeachAxis(:,2).*allAxes(:,3)-repeachAxis(:,3).*allAxes(:,2)
repeachAxis(:,3).*allAxes(:,1)-repeachAxis(:,1).*allAxes(:,3)
repeachAxis(:,1).*allAxes(:,2)-repeachAxis(:,2).*allAxes(:,1)];
crossedAxis = reshape(c,size(repeachAxis));
normedAxis = sqrt(sum(crossedAxis.^2,2));
dottedAxis = sum(repeachAxis.*allAxes,2);
angbtwn(1:ro-ii+1,ii) = atan2(normedAxis,dottedAxis)*180/pi;
end
angbtwn(1,:)=[]; %angle btwn vec and itself
%only upper left triangle are values...
Still too long, even to pre-allocate... So I try to do sparse, but not implemented right:
[ro,col]=size(Bt);
%spalloc:
angbtwn = sparse([],[],[],ro,ro,ro*(ro-1)/2);%zeros(ro-1); %cell(ro,1)
for ii=1:ro-1
...same
angbtwn(1:ro-ii+1,ii) = atan2(normedAxis,dottedAxis)*180/pi; %WARNED: indexing = >overhead
% WHAT? Can't index sparse?? what's the point of spalloc then?
end
So if my logic can be improved, or if sparse is really the way to go, and I just can't implement it right, let me know where to improve. THANKS for your help.
Are you trying to get the angle between every pair of vectors in Bt? If Bt has 2 million vectors that's a trillion pairs each (apparently) requiring an inner product to get the angle between. I don't know that any kind of optimization is going to help have this operation finish in a reasonable amount of time in MATLAB on a single machine.
In any case, you can turn this problem into a matrix multiplication between matrices of unit vectors:
N=1000;
Bt=rand(N,4); % for testing. A matrix of N (row) vectors of length 4.
[ro,col]=size(Bt);
magnitude = zeros(N,1); % the magnitude of each row vector.
units = zeros(size(Bt)); % the unit vectors
% Compute the unit vectors for the row vectors
for ii=1:ro
magnitude(ii) = norm(Bt(ii,:));
units(ii,:) = Bt(ii,:) / magnitude(ii);
end
angbtwn = acos(units * units') * 360 / (2*pi);
But you'll run out of memory during the matrix multiplication for largish N.
You might want to use pdist with 'cosine' distance to compute the 1-cos(angbtwn).
Another perk for this approach that it does not compute n^2 values but exaxtly .5*(n-1)*n unique values :)