Data fitting for time-dependent matrix sets in MATLAB

I have 6 data sets; each data set is a 576-by-576 matrix. Each set of data represents measurements taken at 30-second intervals, e.g. set1 at t = 0, set2 at t = 30, ..., set6 at t = 150 seconds.
You can look at these sets as frames if you will. I need to take the first data point (1,1) from each data set, i.e. its values at t = 0, 30, 60, 90, 120 and 150, and, based on those 6 points, find a fitting formula, then assign that general solution to the first spot of my solution matrix, SM(1,1). I need to do this for every data point in the 6 sets until I have a 576-by-576 solution matrix.
If everything goes right, I should be able to plot SM(0 s) = set1, SM(30 s) = set2, etc., but not only that: SM(45 s) should return a prediction of the measurements at t = 45, and so on and so forth. The purpose is to have one matrix that can predict the data fluctuation from t = 0 to 150 seconds.
Additional information:
1.- Each data point is independent from the rest of the data points in the same set.
2.- It is a non-linear fit.
3.- All values are real.
Does Matlab have an optimization tool for this kind of problem?
Should I treat the problem as a 1D data fit and create a for loop that does the job 576^2 times?
(I don't even know where to begin)
Feel free to ask or edit anything if I wasn't clear enough; I am not sure that I've chosen the most precise title for this kind of problem. Thanks.
Update:
Based on Guddu's answer I came up with this:
%% Loading data matrix A
A(:,:,1) = abs(set1);
A(:,:,2) = abs(set2);
A(:,:,3) = abs(set3);
A(:,:,4) = abs(set4);
A(:,:,5) = abs(set5);
A(:,:,6) = abs(set6);
%% Creating solution matrix SM
t=0:30:150;
SM=zeros([576 576 150]);
for i=1:576
    for j=1:576
        y=squeeze(A(i,j,1:6));
        f=fit(t',y,'smoothingspline');
        data=feval(f,1:150);
        SM(i,j,:)=data;
    end
end
%% Plotting Frame at t=45
figure(1);
imshow(SM(:,:,45),[])
I am not sure if this is the most efficient way to do it, but it works. I am open to new ideas or suggestions. Thanks

I would suggest making your main data matrix a of size (6,576,576). The first point (1,1) from each dataset would then be a(1,1,1), a(2,1,1), ..., a(6,1,1). Since, as you have said, each point (i,j) in a dataset is independent from every other point (k,l), I would suggest treating each position (i,j) across all datasets separately from the other positions, so you would be looping 576*576 times. The code could look something like this:
t=0:30:150;
for i=1:576
    for j=1:576
        datavec=squeeze(a(1:6,i,j)); % select point (i,j) from all 6 frames
        % do the curve fitting and save the result in SM(i,j)
    end
end
I am just curious what kind of non-linear function you want to fit to only 6 points. This might not be the answer you want, but it was too long to put in a comment.
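If the 576-by-576 fitting loop in the update above turns out to be slow, one possible speed-up is to parallelise the outer loop. This is only a sketch of the same fitting step, assuming the Curve Fitting Toolbox and the Parallel Computing Toolbox (for parfor) are available:
t  = (0:30:150)';                        % the six sample times, as a column
SM = zeros(576, 576, 150);
parfor i = 1:576
    rowSM = zeros(576, 150);             % per-row buffer so SM stays a valid sliced variable
    for j = 1:576
        y = squeeze(A(i,j,1:6));         % the six measurements for pixel (i,j)
        f = fit(t, y, 'smoothingspline');
        rowSM(j,:) = feval(f, 1:150);    % evaluate the fit at t = 1..150 s
    end
    SM(i,:,:) = rowSM;
end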

Related

comparing generated data to measured data

We have measured data for which we determined the distribution type it follows (Gamma) and its parameters (A, B).
We then generated n = 10000 samples from the same distribution, with the same parameters and over the same range (between 18.5 and 59), using a for loop:
for i=1:1:10000
    tot = makedist('Gamma','A',11.8919,'B',2.9927);
    tot = truncate(tot,18.5,59);
    W(i,:) = random(tot,1,1);
end
Then we tried to fit the generated data using:
h1=histfit(W);
After this we tried to plot the Gamma curve, to compare the two curves on the same figure, using:
hold on
h2=histfit(W,[],'Gamma');
h2(1).Visible='off';
The problem is that the two curves are shifted, as shown in the linked figure ("Figure 1 is the generated data from the previous code and Figure 2 is without truncating the generated data").
Does anyone know why?
Thanks in advance
By default histfit fits a normal probability density function (PDF) on the histogram. I'm not sure what you were actually trying to do, but what you did is:
% fit a normal PDF
h1=histfit(W); % this is equal to h1 = histfit(W,[],'normal');
% fit a gamma PDF
h2=histfit(W,[],'Gamma');
Obviously that will result in different fits, because a normal PDF != a gamma PDF. The only thing you see is that the gamma PDF fits the data better, because you sampled the data from that distribution.
If you want to check whether the data follows a certain distribution, you can also use a KS test. In your case:
% check if the data follows the distribution specified in tot
[h p] = kstest(W,'CDF',tot)
If the data follows the gamma distribution, then h = 0 and p > 0.05; otherwise h = 1 and p < 0.05.
Now some general comments on your code:
Please look up preallocation of memory, it will speed up loops greatly. E.g.
W = zeros(10000,1);
for i=1:1:10000
    tot = makedist('Gamma','A',11.8919,'B',2.9927);
    tot = truncate(tot,18.5,59);
    W(i,:) = random(tot,1,1);
end
Also,
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
does not depend on the loop index and can therefore be moved in front of the loop to speed things up further. It is also good practice to avoid using i as a loop variable.
But you can actually skip the whole loop, because random() can return multiple samples at once:
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W =random(tot,10000,1);

Interactive curve fitting with MATLAB using custom GUI?

I find examples the best way to demonstrate my question. I generate some data, add some random noise, and fit it to get back my chosen "generator" value...
x = linspace(0.01,1,50);
value = 3.82;
y = exp(-value.*x);
y = awgn(y,30);
options = optimset('MaxFunEvals',1000,'MaxIter',1000,'TolFun',1e-10,'Display','off');
model = @(p,x) exp(-p(1).*x);
startingVals = [5];
lb = [1];
ub = [10];
[fittedValue] = lsqcurvefit(model,startingVals,x,y,lb,ub,options)
fittedGraph = exp(-fittedValue.*x);
plot(x,y,'o');
hold on
plot(x,fittedGraph,'r-');
In this new example, I have generated the same data but this time added much more noise to the first 15 points. Because it is random, sometimes it works out okay, but after a few runs I get a good example that illustrates my problem. The code is the same, except for these lines added under value = 3.82:
y = exp(-value.*x);
y(1:15) = awgn(y(1:15),5);
y(15:end) = awgn(y(15:end),30);
As you can see, it has clearly not given a good fit over the range where the data seems reliable, because it is fitting to points 1-50. What I want to do is say: okay MATLAB, I can see we have some noisy data, but it seems decent over a range, so only fit your exponential from point 15 to the end. I could go back to my code and update it to do this, but I will be batch fitting graphs like this, where each one will have a different range of 'good' data.
So what I am after is a GUI callback mechanism that allows me to click on two circles in the data and have them change colour or something, which indicates that lsqcurvefit will only fit over that range. Internally, all that has to change is inside the lsqcurvefit call, e.g.
x(16:end),y(16:end)
But the range should update depending on the starting and ending circles I have clicked.
I hope my question is clear. Thanks.
You could use ginput to select the two points for your min and max in the plot.
[x,y] = ginput(2);
% this returns the x and y coordinates of two points clicked one after the other
% the min point is assumed to be clicked first
min = [x(1) y(1)];
max = [x(2) y(2)];
Then you could fit your curve using only the data whose x-values lie between the min and max coordinates.
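For instance, a minimal sketch of that idea, reusing model, x, y, startingVals, lb, ub and options from the question (the mask name keep is just illustrative):
[xc, ~] = ginput(2);                          % click the left edge of the 'good' data first, then the right edge
xc      = sort(xc);                           % make sure xc(1) <= xc(2)
keep    = x >= xc(1) & x <= xc(2);            % logical mask selecting the clicked range

fittedValue = lsqcurvefit(model, startingVals, x(keep), y(keep), lb, ub, options);

plot(x, y, 'o');
hold on
plot(x(keep), exp(-fittedValue.*x(keep)), 'r-');  % draw the fit only over the selected range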
You could also switch between a right-click for the min and a left-click for the max, etc.
Hope this helps you.

Matlab Kolmogorov-Smirnov Test

I'm using MATLAB to analyze some neuroscience data, and I made an interspike interval distribution and fit an exponential to it. Then, I wanted to check this fit using a Kolmogorov-Smirnov test with MATLAB.
The data for the neuron spikes is just stored in a variable called spikes. The spikes variable is a 111-by-1 cell array, where each entry is another vector, and each entry represents a trial. The number of spikes in each trial varies. For example, spikes{1} is a [1x116 double], meaning there are 116 spikes. The next has 115 spikes, then 108, etc.
Now, I understand that kstest in MATLAB takes a couple of parameters. You enter the data as the first one, so I took all the interspike intervals and stored them in a row vector alldiffs. I want to set my CDF to the one for an exponential fit:
test_cdf = [transpose(alldiffs), transpose(1-exp(-alldiffs*firingrate))];
Note that the theoretical exponential (with which I fit the data) is r*exp(-rt) where r is the firing rate. I get a firing rate of about 0.2. Now, when I put this all together, I run the kstest:
[h,p] = kstest(alldiffs, 'CDF', test_cdf)
However, the result is a p-value on the order of 1.4455e-126. I've tried redoing test_cdf with another of the methods from the MathWorks documentation:
test_cdf = [transpose(alldiffs), cdf('exp', transpose(alldiffs), 1/firingrate)];
This gives the exact same result! Is the fit just horrible? I don't know why I get such low p-values. Please help!
I would post an image of the fit, but I don't have enough reputation.
P.S. If there is a better place to post this, let me know and I'll repost.
Here is an example with fake data and yet another way to create the CDF:
>> data = exprnd(.2, 100);
>> test_cdf = makedist('exp', 'mu', .2);
>> [h, p] = kstest(data, 'CDF', test_cdf)
h =
0
p =
0.3418
However, why are you doing a KS Test?
All models are wrong, some are useful.
No neuron is perfectly a Poisson process, and with enough data you'll always have a significantly non-exponential ISI distribution, as measured by a KS test. That doesn't mean you can't make the simplifying assumption of an exponential ISI, depending on what phenomena you're trying to model.

Diffusion outer bounds

I'm attempting to run this simple diffusion case (I understand that it isn't ideal generally), and I'm doing fine with getting the inside of the solid, but need some help with the outer edges.
global M
size=100;
M=zeros(size,size);
M(25,25)=50;

for diffusive_steps=1:500
    oldM=M;
    newM=zeros(size,size);
    for i=2:size-1
        for j=2:size-1
            % we're considering the ij-th pixel
            pixel_conc=oldM(i,j);
            newM(i,j+1)=newM(i,j+1)+pixel_conc/4;
            newM(i,j-1)=newM(i,j-1)+pixel_conc/4;
            newM(i+1,j)=newM(i+1,j)+pixel_conc/4;
            newM(i-1,j)=newM(i-1,j)+pixel_conc/4;
        end
    end
    M=newM;
end
It's a pretty simple piece of code, and I know that. I'm not very good at using Octave yet (chemist by trade), so I'd appreciate any help!
If you have concerns about the border of your simulation, you could pad your matrix with NaN values and then remove the border after the simulation has completed. NaN stands for "not a number" and is often used to denote blank data; many MATLAB functions work in a useful way with these values.
e.g. finding the mean of an array which has blanks:
nanmean([0 nan 5 nan 10])
ans =
5
In your case, I would start by adding a border of NaNs to your M matrix. I'm using 'n' instead of 'size', since size is an important function in MATLAB, and using it as a variable can lead to confusing errors.
n=100;
blankM=zeros(n+2,n+2);
blankM([1,end],:) = nan;
blankM(:, [1,end]) = nan;
Now we can define M. N.B. the first column and row will be NaNs, so we need to add an offset (25+1):
M = blankM;
M(26,26)=50;
Run the simulation through,
m = size(blankM, 1);
n = size(blankM, 2);
for diffusive_steps=1:500
    oldM = M;
    newM = blankM;
    for i=2:m-1
        for j=2:n-1
            pixel_conc=oldM(i,j);
            newM(i,j+1)=newM(i,j+1)+pixel_conc/4;
            newM(i,j-1)=newM(i,j-1)+pixel_conc/4;
            newM(i+1,j)=newM(i+1,j)+pixel_conc/4;
            newM(i-1,j)=newM(i-1,j)+pixel_conc/4;
        end
    end
    M=newM;
end
and then extract the area of interest
finalResult = M(2:end-1, 2:end-1);
One simple change you might make is to add a boundary of ghost cells, or halo, around the domain of interest. Rather than misuse the name size, I've used a variable called sz. Replace:
M=zeros(sz,sz)
with
M=zeros(sz+2,sz+2)
and then compute your diffusion over the interior of this augmented matrix, i.e. over cells (2:sz+1,2:sz+1). When it comes to considering the results, discard or just ignore the halo.
Even simpler would be to take what you already have and ignore the cells in your existing matrix which are on the N, S, E, W edges.
This technique is widely used in problems such as yours and avoids the need to write code which deals with the computations on cells that don't have a full complement of neighbours. Setting the appropriate value for the contents of the halo cells is a problem-dependent matter; 0 isn't always the right value.
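A minimal sketch of that halo idea, using the same update rule as in the question and shifting the original source location by one cell to allow for the halo:
sz = 100;
M  = zeros(sz+2, sz+2);            % interior is (2:sz+1, 2:sz+1); the outer ring is the halo
M(26,26) = 50;                     % original (25,25), offset by the one-cell halo

for diffusive_steps = 1:500
    oldM = M;
    newM = zeros(sz+2, sz+2);
    for i = 2:sz+1
        for j = 2:sz+1
            pixel_conc  = oldM(i,j);
            newM(i,j+1) = newM(i,j+1) + pixel_conc/4;
            newM(i,j-1) = newM(i,j-1) + pixel_conc/4;
            newM(i+1,j) = newM(i+1,j) + pixel_conc/4;
            newM(i-1,j) = newM(i-1,j) + pixel_conc/4;
        end
    end
    M = newM;
end

result = M(2:end-1, 2:end-1);      % discard the halo when looking at the results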

MATLAB query about for loop, reading in data and plotting

I am a complete novice at using MATLAB and am trying to work out if there is a way of optimising my code. Essentially I have data from model outputs and I need to plot them using MATLAB. In addition, I have reference data (with 95% confidence intervals) which I plot on the same graph to get a visual idea of how close the model outputs and reference data are.
In terms of the model outputs I have several thousand files (numbered sequentially) which I open in a loop and plot. The problem/question I have is whether I can preprocess the data and then plot later, to save time. The issue I seem to be having when I try this is that I have a legend which either does not appear or is inaccurate.
My code (apologies if it is not elegant):
fn= xlsread(['tbobserved' '.xls']);
time= fn(:,1);
totalreference=fn(:,4);
totalreferencelowerci=fn(:,6);
totalreferenceupperci=fn(:,7);

figure
plot(time,totalreference,'-', time, totalreferencelowerci,'--', time, totalreferenceupperci,'--');
xlabel('Year');
ylabel('Reference incidence per 100,000 population');
title ('Total');
clickableLegend('Observed reference data', 'Totalreferencelowerci', 'Totalreferenceupperci','Location','BestOutside');
xlim([1910 1970]);
hold on

start_sim=10000;
end_sim=10005;
h = zeros (1,1000);
for i=start_sim:end_sim % is there any way of doing this earlier to save time?
    a=int2str(i);
    incidenceFile =strcat('result_', 'Sim', '_', a, 'I_byCal_total.xls');
    est_tot=importdata(incidenceFile, '\t', 1);
    cal_tot=est_tot.data;
    magnitude=1;
    t1=cal_tot(:,1)+1750;
    totalmodel=cal_tot(:,3)+cal_tot(:,5);
    h(a)=plot(t1,totalmodel);
    xlim([1910 1970]);
    ylim([0 500]);
    hold all
    clickableLegend(h(a),a,'Location','BestOutside')
end
Essentially I was hoping to have a way of reading in the data and then plotting later, i.e. optimising the code.
I hope you might be able to help.
Thanks.
mp
Regarding your issue concerning "I have a legend which either does not appear or is inaccurate", have a look at the following extracts from your code:
...
h = zeros (1,1000);
...
a=int2str(i);
...
h(a)=plot(t1,totalmodel);
...
You are using a character array as an index. Instead of h(a) you should use a numeric index such as h(i). MATLAB casts the character array a to double, as shown in the following example with a = 10:
>> double(int2str(10))
ans = 49 48
Instead of h(10), the plot handle will be assigned to h([49 48]), which is not your intention.
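A minimal sketch of the corrected indexing (the names nSims and idx are illustrative; gobjects is used for preallocation because on recent MATLAB releases plot handles are objects rather than doubles):
nSims = end_sim - start_sim + 1;
h = gobjects(1, nSims);              % preallocate graphics handles, one per simulation
for i = start_sim:end_sim
    idx = i - start_sim + 1;         % numeric index into h, runs 1..nSims
    a   = int2str(i);                % character label, still fine for building the file name
    % ... read incidenceFile and compute t1, totalmodel as in the question ...
    h(idx) = plot(t1, totalmodel);
    hold all
    clickableLegend(h(idx), a, 'Location', 'BestOutside');
end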