Matlab: non-linear regression with 2 criteria

I am trying to fit a non-linear model using 3 independent variables (light, temperature and vapor pressure deficit (VPD)) to predict net ecosystem CO2 exchange (NEE).
I know how to use the nlinfit function, but my problem is that I want to use 2 criteria:
1. if VPD < 1.3
NEE = (Param(1).*Param(2).*Ind_var(:,1))./(Param(1).*Ind_var(:,1)+Param(2)) + Param(3).*(1.6324.^((Ind_var(:,2)-18)./10));
2. if VPD >= 1.3
NEE = (Param(1).*(Param(2).*exp(-Param(4).*(Ind_var(:,3)-1.3))).*Ind_var(:,1))./(Param(1).*Ind_var(:,1)+(Param(2).*exp(-Param(4).*(Ind_var(:,3)-1.3))))+Param(3).*(1.6324.^((Ind_var(:,2)-18)./10));
Basically, if the independent variable VPD (Vapor pressure deficit) is below 1.3, I want to force my Param(4) = 0.
But I don't know how to do that.
Could you help me?
Thanks,
Alexis

You can replace Param(4) by Param(4).*(Ind_var(:,3)>=1.3). Logical expressions evaluate to 1 if true and 0 if false in MATLAB, so whenever VPD < 1.3 the factor is 0, Param(4) is forced to zero, exp(0) = 1, and the second formula reduces to the first. You can then fit the single combined formula with nlinfit.
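For illustration, the same masking trick sketched in Python/NumPy (the column layout of Ind_var and the function name are assumptions of this sketch; the MATLAB expression is analogous):

```python
import numpy as np

def nee_model(param, ind_var, vpd_threshold=1.3):
    """Unified NEE model; the VPD term switches off below the threshold.

    Assumed column layout: 0 = light, 1 = temperature, 2 = VPD.
    param is the length-4 parameter vector.
    """
    light = ind_var[:, 0]
    temp = ind_var[:, 1]
    vpd = ind_var[:, 2]
    # The logical mask zeroes out param[3] wherever VPD < threshold, so
    # exp(0) = 1 and the model collapses to the low-VPD formula.
    p4 = param[3] * (vpd >= vpd_threshold)
    b_eff = param[1] * np.exp(-p4 * (vpd - vpd_threshold))
    resp = param[2] * 1.6324 ** ((temp - 18.0) / 10.0)
    return (param[0] * b_eff * light) / (param[0] * light + b_eff) + resp
```

Fitting this single function handles both regimes at once, so no piecewise call to the optimiser is needed.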

Related

Fitting a custom equation in Matlab

I want to fit this equation to find the values of the variables, particularly 'c':
a*exp(-x/T) + c*(T*(exp(-x/T)-1)+x)
I have the values a = -45793671; T = 64.3096.
Due to the lack of good initial parameters, the SSE and RMSE errors in MATLAB's cftool are too high and it is not able to fit the data at all.
I also tried other methods (linear fitting), but the problem with high error persists.
Is there any way to fit the data nicely so that I can find the most accurate value for c?
for x:
0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
for y:
-45793671 -87174030 -124726368 -165435857 -211887711 -255565545 -295927582 -332434440 -365137627 -383107046 -408000987 -434975682 -465932505 -492048864 -513857005 -543087921 -573111110 -588176196 -607460012 -628445691
I don't think that the bad fit is mainly due to a lack of initial parameters.
First trial:
If we start with the parameters stated in the question, a = -45793671 and T = 64.3096, only the parameter c remains to be fitted. The result is not satisfying:
Second trial:
If we keep only the specified value of T constant and optimize the two parameters c and a, the RMSE improves but the shape of the curve is still poor:
Third trial:
If we ignore the specified values of the two parameters T and a and run a non-linear regression with respect to all three parameters T, c, a, the result is better:
But a negative value for T might not be acceptable from a physical viewpoint. This suggests that the function y(x) = a*exp(-x/T) + c*(T*(exp(-x/T)-1)+x) might not be a good model. You should check whether there is a typo in the function and/or whether some terms are missing, in order to better model the physical experiment.
For information only (probably not useful):
An even better fit is obtained with a much simpler function: y(x) = A + B*x + C*x^2
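The second trial is easy to reproduce because, with T held fixed, the model is linear in a and c, so ordinary least squares suffices. A sketch in Python/NumPy (the question's workflow is cftool/MATLAB; this is just an illustration), together with the quadratic comparison:

```python
import numpy as np

x = np.array([0, 2, 3, 4, 5, 6, 7, 8, 9, 10,
              11, 12, 13, 14, 15, 16, 17, 18, 19, 20], dtype=float)
y = np.array([-45793671, -87174030, -124726368, -165435857, -211887711,
              -255565545, -295927582, -332434440, -365137627, -383107046,
              -408000987, -434975682, -465932505, -492048864, -513857005,
              -543087921, -573111110, -588176196, -607460012, -628445691],
             dtype=float)

# Second trial: T fixed at the stated value, fit a and c linearly.
T = 64.3096
f1 = np.exp(-x / T)
f2 = T * (np.exp(-x / T) - 1) + x
basis = np.column_stack([f1, f2])
sol, *_ = np.linalg.lstsq(basis, y, rcond=None)
rmse_lin = np.sqrt(np.mean((basis @ sol - y) ** 2))

# The simpler comparison model: y = A + B*x + C*x^2.
coeffs = np.polyfit(x, y, 2)
rmse_quad = np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))
```

Comparing rmse_lin and rmse_quad makes the answer's point concrete: the quadratic, despite being simpler, tracks the data more closely than the constrained exponential model.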

System Dynamics simulation - Translating Stella into AnyLogic syntax

I modelled the following logic in stella:
IF "cause" > 0 THEN MONTECARLO("probabilityofconsequence") ELSE 0
But I'm not getting the syntax right in AnyLogic:
(cause > 0) ? (uniform() < probabilityofconsequence) ? 1 : 0 : 0
Any ideas?
Disclaimer:
What Stella's MONTECARLO function does is generate a series of zeros and ones from a Bernoulli distribution based on the probability provided. The probability is the percentage probability of an event happening per DT, divided by DT (it is similar to, but not the same as, the percent probability of an event per unit time). The probability value can be either a variable or a constant, but should evaluate to a number between 0 and 100/DT (numbers outside the range will be set to 0 or 100/DT). The expected value of the stream of numbers generated, summed over a unit of time, is equal to probability/100.
MONTECARLO is equivalent to the following logic:
IF UNIFORM(0,100,<seed>) < probability*DT THEN 1 ELSE 0
The equivalent in AnyLogic should be:
cause>0 && uniform(0,100) < probability*DT ? 1 : 0
You need to create a variable called DT that is equal either to the fixed time step you have chosen in your model configuration, or to whatever value you consider adequate.
Depending on how you run the model, AnyLogic does not necessarily treat the time step as fixed, so you need to define DT yourself.
In any case, your results will probably not be exactly equal to Stella's, since the time steps are not necessarily the same... but maybe they will be similar enough to satisfy you.
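As a sanity check on the MONTECARLO logic above, a small Python sketch (function names are illustrative, not Stella or AnyLogic API) that draws the 0/1 stream and confirms the expected value summed over a unit of time approaches probability/100:

```python
import random

def montecarlo(probability, dt, rng):
    """One Stella-style MONTECARLO draw: 1 with chance probability*dt/100."""
    return 1 if rng.uniform(0, 100) < probability * dt else 0

def mean_events_per_unit_time(probability, dt, units, seed=42):
    """Average of the 0/1 stream summed over each unit of time."""
    rng = random.Random(seed)
    steps_per_unit = int(round(1 / dt))
    totals = [sum(montecarlo(probability, dt, rng)
                  for _ in range(steps_per_unit))
              for _ in range(units)]
    return sum(totals) / len(totals)
```

With probability = 30 (percent per unit time) and dt = 0.1, the per-unit sum should average about 30/100 = 0.3, which is the property the disclaimer describes.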

Modeling an hrf time series in MATLAB

I'm attempting to model fMRI data so I can check the efficacy of an experimental design. I have been following a couple of tutorials and have a question.
I first need to model the BOLD response by convolving a stimulus input time series with a canonical haemodynamic response function (HRF). The first tutorial I checked said that one can make an HRF that is of any amplitude as long as the 'shape' of the HRF is correct so they created the following HRF in matlab:
hrf = [ 0 0 1 5 8 9.2 9 7 4 2 0 -1 -1 -0.8 -0.7 -0.5 -0.3 -0.1 0 ]
And then convolved the HRF with the stimulus by just using 'conv' so:
hrf_convolved_with_stim_time_series = conv(input,hrf);
This is very straightforward, but I want my model to eventually be as accurate as possible, so I checked a more advanced tutorial, which did the following. First it created a vector of 20 timepoints, then used the 'gampdf' function to create the HRF.
t = 1:1:20; % MEASUREMENTS
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
Is there a benefit to doing it this way over the simpler one? I suppose I have 3 specific questions.
The 'gampdf' help page is super short and only says that the '6' and '10' in each function call represent 'A', which is a 'shape' parameter. What does this mean? It gives no other information. Why is it 6 in the first call and 10 in the second?
This question is directly related to the one above. This code is written for a situation where the TR = 1 and the stimulus is very short (about 1 s). In my situation the TR = 2 and my stimulus is quite long (12 s). I tried to adapt the above code to make a working HRF for my situation by doing the following:
t = 1:2:40; % 2s timestep with the 40 to try to equate total time to above
h = gampdf(t,6) + -.5*gampdf(t,10); % HRF MODEL
h = h/max(h); % SCALE HRF TO HAVE MAX AMPLITUDE OF 1
Because I have no idea what the 'gampdf' parameters mean (or what that line does, in all actuality), I'm not sure this gives me what I'm looking for. I essentially get out 20 values, where values 1-14 contain some number but 15-20 are all 0. I'm assuming there will be a response during the entire 12 s stimulus period (the first 6 TRs, so values 1-6) with the appropriate rectification, which could account for the rest of the values, but I'm not sure.
Final question. The other code does not 'scale' the HRF to have an amplitude of 1. Will that matter, ultimately?
The canonical HRF you choose is dependent upon where in the brain the BOLD signal is coming from. It would be inappropriate to choose just any HRF. Your best source of a model is going to come from a lit review. I've linked a paper discussing the merits of multiple HRF models. The methods section brings up some salient points.
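For reference on the gampdf line itself: gampdf(t,A) evaluates a gamma probability density with shape parameter A and unit scale, and the difference of the two gammas (shapes 6 and 10, the second weighted by -0.5) produces the canonical peak-then-undershoot shape. A Python/SciPy sketch of the same construction (the boxcar stimulus here is a hypothetical example, not from the question):

```python
import numpy as np
from scipy.stats import gamma

t = np.arange(1, 21, dtype=float)   # 20 time points, TR = 1
# Difference of two gamma pdfs: shape 6 drives the positive peak,
# shape 10 (weighted by -0.5) the later undershoot.
h = gamma.pdf(t, a=6) - 0.5 * gamma.pdf(t, a=10)
h = h / np.max(h)                    # scale peak amplitude to 1

# Convolve with a boxcar stimulus time series, as in the simple tutorial.
stim = np.zeros(60)
stim[5:11] = 1.0                     # hypothetical 6-sample stimulus
bold = np.convolve(stim, h)
```

The shape parameter controls where the gamma density peaks (roughly at A - 1 for unit scale), which is why the positive lobe peaks earlier than the undershoot.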

Regarding Time scale issue in Netlogo

I am a new user of NetLogo. I have a system of reactions (converted to ordinary differential equations), which can be solved using Matlab. I want to develop the same model in NetLogo (for comparison with the Matlab results). I am confused about time/ticks, because NetLogo uses "ticks" to increment time, whereas Matlab uses time in seconds. How do I convert my Matlab seconds to a number of ticks? Can anyone help me write the code? The model is:
A + B ---> C (with rate constant k1 = 1e-6)
2A+ C ---> D (with rate constant k2 = 3e-7)
A + E ---> F (with rate constant k3 = 2e-5)
Initial values are A = B = C = 500, D = E = F = 10
Initial time t=0 sec and final time t=6 sec
A general comment first: NetLogo is intended for agent-based modelling (ABM). ABM has multiple entities with different characteristics interacting in some way; it is not really an appropriate methodology for solving ODEs. If your goal is simply to build your model in something other than Matlab for comparison, rather than specifically requiring NetLogo, I can recommend Vensim as more appropriate. Having said that, you can build the model you want in NetLogo; it is just very awkward.
NetLogo handles time discretely rather than continuously. You can have any number of ticks per second (I would suggest 10, so the final time is 60 ticks). You will need to convert your equations into a discrete form, so your rates would become something like k1-discrete = k1 / 10. You may have precision problems with very small numbers.
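The discrete-update idea can be sketched as a forward-Euler scheme (a sketch only, assuming mass-action kinetics for the three reactions and 10 ticks per second as suggested):

```python
# Forward-Euler discretisation of the reaction system, 10 ticks per second.
k1, k2, k3 = 1e-6, 3e-7, 2e-5
dt = 0.1                      # seconds per tick -> 60 ticks for t = 6 s
A, B, C, D, E, F = 500.0, 500.0, 500.0, 10.0, 10.0, 10.0

for tick in range(60):
    r1 = k1 * A * B           # A + B  -> C
    r2 = k2 * A * A * C       # 2A + C -> D
    r3 = k3 * A * E           # A + E  -> F
    A += (-r1 - 2 * r2 - r3) * dt
    B += -r1 * dt
    C += (r1 - r2) * dt
    D += r2 * dt
    E += -r3 * dt
    F += r3 * dt
```

In NetLogo each loop iteration would be one `go` step driven by `tick`, with the concentrations held in globals; the structure of the update is the same.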

Remove highly correlated components

I have a problem removing highly correlated components. Can I ask how to do this?
For example, I have 40 instances with 20 features (randomly created). Features 2 and 18 are highly correlated with feature 4, and feature 6 is highly correlated with feature 10. How do I remove the highly correlated (redundant) features such as 2, 18 and 10? Essentially, I need the indices of the remaining features: 1, 3, 4, 5, 6, ..., 9, 11, ..., 17, 19, 20.
Matlab code:
x = randn(40,20);
x(:,2) = 2.*x(:,4);
x(:,18) = 3.*x(:,4);
x(:,6) = 100.*x(:,10);
x_corr = corr(x);
size(x_corr)
figure, imagesc(x_corr),colorbar
The correlation matrix x_corr looks like this (imagesc heatmap, with high off-diagonal values at the redundant feature pairs).
edit:
I worked out a way:
x_corr = x_corr - diag(diag(x_corr));
[x_corrX, x_corrY] = find(x_corr>0.8);
for i = 1:size(x_corrX,1)
xx = find(x_corrY == x_corrX(i));
x_corrX(xx,:) = 0;
x_corrY(xx,:) = 0;
end
x_corrX = unique(x_corrX);
x_corrX = x_corrX(2:end);
im = setxor(x_corrX, (1:20)');
Am I right? Or, if you have a better idea, please post it. Thanks.
edit2: Is this method the same as using PCA?
It seems quite clear that this idea of yours, to simply remove highly correlated variables from the analysis is NOT the same as PCA. PCA is a good way to do rank reduction of what seems to be a complicated problem, into one that turns out to have only a few independent things happening. PCA uses an eigenvalue (or svd) decomposition to achieve that goal.
Anyway, you might have a problem. For example, suppose that A is highly correlated to B, and B is highly correlated to C. However, it need not be true that A and C are highly correlated. Since correlation can be viewed as a measure of the angle between those vectors in their corresponding high dimensional vector space, this can be easily made to happen.
As a trivial example, I'll create two variables, A and B, that are correlated at a "moderate" level.
n = 50;
A = rand(n,1);
B = A + randn(n,1)/2;
corr([A,B])
ans =
1 0.55443
0.55443 1
So here 0.55 is the correlation. I'll create C to be virtually the average of A and B. It will be highly correlated by your definition.
C = [A + B]/2 + randn(n,1)/100;
corr([A,B,C])
ans =
1 0.55443 0.80119
0.55443 1 0.94168
0.80119 0.94168 1
Clearly C is the bad guy here. But if one were to simply look at the pair [A,C] and remove A from the analysis, then do the same with the pair [B,C] and then remove B, we would have made the wrong choices. And this was a trivially constructed example.
In fact, the eigenvalues of the correlation matrix might be of interest.
[V,D] = eig(corr([A,B,C]))
V =
-0.53056 -0.78854 -0.311
-0.57245 0.60391 -0.55462
-0.62515 0.11622 0.7718
D =
2.5422 0 0
0 0.45729 0
0 0 0.00046204
The fact that D has two significant diagonal elements, and a tiny one tells us that really, this is a two variable problem. What PCA will not easily tell us is which vector to simply remove though, and the problem would only be less clear with more variables, with many interactions between all of them.
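The same rank check can be sketched in Python/NumPy, recreating the A, B, C construction above (the random draws mean the exact numbers differ from the MATLAB output shown):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.random(n)                                 # A = rand(n,1)
B = A + rng.standard_normal(n) / 2                # moderately correlated
C = (A + B) / 2 + rng.standard_normal(n) / 100    # nearly dependent on A, B

R = np.corrcoef(np.column_stack([A, B, C]), rowvar=False)
eigvals = np.linalg.eigvalsh(R)    # ascending order
# One tiny eigenvalue signals that this is really a two-variable problem.
```

The smallest eigenvalue is close to zero while the largest absorbs most of the variance, which is exactly the "two significant diagonal elements and a tiny one" pattern in D above.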
I think the answer from woodchips is quite good. But when you use eigenvalues, you can run into some trouble: if the dataset is large enough, there will always be some small eigenvalues, and you won't be sure what they are telling you.
Instead, consider grouping your data with a simple clustering method. It is easy to implement in Matlab:
http://www.mathworks.de/de/help/stats/cluster-analysis-1-1.html
edit:
If you disregard the points that woodchips made, your solution is okay as an algorithm.
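As an alternative to the loop in the question, one simple greedy approach, sketched here in Python/NumPy. Note that which member of a correlated group survives depends on the column ordering, which is exactly the trap woodchips describes:

```python
import numpy as np

def drop_correlated(x, threshold=0.8):
    """Greedily keep each column unless its absolute correlation with an
    already-kept column exceeds `threshold`. Returns kept column indices."""
    r = np.abs(np.corrcoef(x, rowvar=False))
    kept = []
    for j in range(x.shape[1]):
        if all(r[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Reproduce the example from the question (0-based column indices).
rng = np.random.default_rng(0)
x = rng.standard_normal((40, 20))
x[:, 1] = 2 * x[:, 3]      # feature 2 duplicates feature 4
x[:, 17] = 3 * x[:, 3]     # feature 18 duplicates feature 4
x[:, 5] = 100 * x[:, 9]    # feature 6 duplicates feature 10
kept = drop_correlated(x)
```

Exactly one representative of each correlated group survives (here the earliest column, so feature 2 rather than feature 4); which one you actually want to keep is a modelling decision the algorithm cannot make for you.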