I am using an ARMAX model to describe the relationship between two signals, and I have fitted it with MATLAB's armax function for different model orders.
To evaluate the model I read the value of Report.Fit.FitPercent, expecting it to tell me how well the model fits the experimental data. Since it is a fit percentage, I would expect it to lie between 0 and 100%. My results range from about -257 to 99.99.
I couldn't find on the MathWorks site or elsewhere how this value is calculated or how to interpret it. It would be great if you could explain how to understand the FitPercent value.
The code I used is very simple; it generates FitPercent for different model structures (orders).
opt = armaxOptions;
opt.InitialCondition = 'auto';
opt.Focus = 'simulation';
j = 1; % index of the dataset to analyse
i = 1;
nk = 0;
for na = 1:6
    for nb = 1:6
        for nc = 1:6
            m_armax = armax(data(:,:,:,j), [na nb nc nk], opt);
            fits(i) = m_armax.Report.Fit.FitPercent; % 'fit' and 'struct' shadow built-ins, so use other names
            orders(:,i) = [na; nb; nc];
            i = i + 1;
        end
    end
end
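To pick the best structure afterwards, one option (using the fits and orders arrays collected above) is:
[bestFit, idx] = max(fits);  % highest FitPercent
bestOrder = orders(:, idx);  % corresponding [na; nb; nc]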
The documentation states that the FitPercent value is calculated with the compare function:
http://www.mathworks.de/de/help/ident/ref/compare.html?searchHighlight=fit
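Per that page, FitPercent is the normalized root mean square error (NRMSE) goodness of fit expressed as a percentage, which in MATLAB terms is:
fitPercent = 100 * (1 - norm(y - yhat) / norm(y - mean(y)));
Here y is the measured output and yhat the model's simulated output. A value of 100 means a perfect fit, 0 means the model does no better than predicting the mean of y, and the value goes negative (with no lower bound) when the model does worse than the mean, which explains results like -257.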
I want to ask for your help with EEG data classification.
I am a graduate student trying to analyze EEG data.
I am currently struggling to classify ERP speller (P300) data with SWLDA using MATLAB, and I suspect there is something wrong in my code.
I have read several articles, but they did not cover much detail.
My data dimensions are as follows:
size(target) = [300 1856]
size(nontarget) = [998 1856]
Rows are trials and columns are features: I flattened each [64 29] trial into a single row of 1856 features (I did not select an ROI).
I used the stepwisefit function in MATLAB to classify target vs. non-target.
The code is attached below.
ingredients = [targets; nontargets];
heat = [class_targets; class_nontargets]; % target: 1, non-target: -1
% 'shuffle' is not a built-in function; permute the rows with randperm instead
randomized_set = [ingredients heat];
randomized_set = randomized_set(randperm(size(randomized_set,1)), :);
for k = 1:10 % 10-fold cross-validation
    partition_factor = ceil(size(randomized_set,1) / 10);
    cv_test_idx = (k-1)*partition_factor + 1 : min(k*partition_factor, size(randomized_set,1));
    total_idx = 1:size(randomized_set,1);
    cv_train_idx = total_idx(~ismember(total_idx, cv_test_idx));
    ingredients = randomized_set(cv_train_idx, 1:end-1);
    heat = randomized_set(cv_train_idx, end);
    [W,SE,PVAL,INMODEL,STATS,NEXTSTEP,HISTORY] = stepwisefit(ingredients, heat, 'penter', .1);
    valid_id = find(INMODEL==1);
    t_ingredients = randomized_set(cv_test_idx, 1:end-1);
    t_heat = randomized_set(cv_test_idx, end); % true labels for test set
    v_features = t_ingredients(:, valid_id);
    % stepwisefit fits centered data, so STATS.intercept must be added back;
    % omitting it biases every prediction and can push accuracy toward chance
    predictor = STATS.intercept + v_features * W(valid_id);
    m_result = predictor > 0; % class A: +1, B: 0
    t_heat(t_heat==-1) = 0;
    acc(k) = sum(m_result==t_heat) / length(m_result);
end
P.S. My code is currently very inefficient and may well be wrong.
My assumption is that stepwisefit selects significant coefficients at each step, so that only the relevant columns remain in the model.
It is not LDA, but for binary classification LDA and linear regression on the class labels lead to essentially the same linear discriminant.
However, my results were almost at chance level (on other binary datasets from the internet, it worked).
I think I made a mistake somewhere, and your help could correct me.
I would appreciate any suggestions and tips for implementing a classifier for an ERP speller.
Or any ideas for implementing SWLDA in MATLAB code?
The name SWLDA is only used in the context of brain-computer interfaces, but I bet it has another name in a more general context.
If you track down the recipe of SWLDA you will end up at the Krusienski 2006 papers ("A comparison..." and "Toward enhanced P300...") and, from there, at the book where stepwise regression is explained: Draper & Smith, Applied Regression Analysis, 1981. However, as far as I am aware, no paper actually gives the complete recipe for how to implement it (with all its details and secrets).
My approach was to use stepwiseglm:
H = predictors;
TH = variables;
lbs = labels; % (1,2)
if (stepwiseflag)
    mdl = stepwiseglm(H', lbs'-1, 'constant', 'upper', 'linear', 'distr', 'binomial');
    if (mdl.NumEstimatedCoefficients > 1)
        inmodel = [];
        for i = 2:mdl.NumEstimatedCoefficients
            % coefficient names look like 'x12'; strip the 'x' to recover the column index
            inmodel = [inmodel str2num(mdl.CoefficientNames{i}(2:end))];
        end
        H = H(inmodel,:);
        TH = TH(inmodel,:);
    end
end
lbls = classify(TH', H', lbs', 'linear');
You can also use a k-fold cross-validation approach with MATLAB's cvpartition:
c = cvpartition(lbs,'k',10);
opts = statset('display','iter');
fun = @(XT,yT,Xt,yt)...
    (sum(~strcmp(yt,classify(Xt,XT,yT,'linear'))));
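To actually evaluate the partition, one possibility (a sketch, assuming TH' and lbs' hold one observation per row) is to hand fun to crossval:
mcr = sum(crossval(fun, TH', lbs', 'partition', c)) / sum(c.TestSize); % misclassification rate
Note that fun as written compares labels with strcmp, which assumes a cell array of strings; with numeric labels like (1,2), compare with ~= instead.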
Using treeMine = fitctree(...) I can generate a decision tree, but the tree is very large and therefore hard to read when displayed with view(treeMine,'Mode','Graph').
My question is therefore whether it is possible to change the variable names x1-x9 to other, human-understandable names, and whether I can force the numbers to be displayed in engineering notation (e.g., 10e3).
Does anybody know how this can be done?
Minimal Example
A minimal example can be built from MATLAB's own car data:
load carsmall
idxNaN = isnan(MPG + Weight);
X = Weight(~idxNaN);
Y = MPG(~idxNaN);
n = numel(X);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false; % Validation set logical indices
Mdl = fitrtree(X(idxTrn),Y(idxTrn));
view(Mdl,'Mode','graph')
How do you then specify the display resolution of the numbers and the variable names?
About the names: it's a bit of a poor example because you use only one predictor (weight), but you can change the name with the 'PredictorNames' name-value pair, e.g.
Mdl = fitrtree(X(idxTrn),Y(idxTrn),'PredictorNames',{'weight'});
If you were to use more predictors you just have to add more elements to the cell array, e.g.
'PredictorNames',{'weight','age','women'}
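For instance, a sketch with a second predictor taken from carsmall (Acceleration; same NaN filtering as above):
idx = ~isnan(MPG + Weight + Acceleration);
Mdl2 = fitrtree([Weight(idx) Acceleration(idx)], MPG(idx), ...
    'PredictorNames', {'weight','acceleration'});
view(Mdl2,'Mode','graph') % splits are now labelled 'weight' and 'acceleration'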
I don't know about the numbers, though.
I would like to calibrate an interest rate tree using MATLAB's optimization tools, and I need some guidance on doing it.
The interest rate tree looks like this (figure missing; the numbers below reconstruct it):
How it works:
3.73% = 2.5%*exp(2*0.2)
96.40453 = (0.5*100 + 0.5*100)/(1+3.73%)
97.56098 = (0.5*100 + 0.5*100)/(1+2.50%)
94.15801 = (0.5*96.40453 + 0.5*97.56098)/(1+3.00%)
The value of 2.5% is arbitrary; the upper node is obtained by multiplying the lower node by exp(2*volatility) (here the volatility is 20%). Note that the root value is discounted at the 3% root rate set in the code below, not at 2.5%.
I need to calibrate the tree by trying different values for the lower-node rate.
How do I do this optimization in MATLAB?
What I have tried so far:
% InterestTree{t} holds, for time step t, the rates in row 1 and bond values in row 2; column 1 is the upper node
InterestTree{1}(1,1) = 0.03;                 % root rate (3%)
InterestTree{2}(1,2) = 2.5/100;              % lower-node rate: the value to optimize over
InterestTree{3}(2,:) = 100;                  % face value at maturity
InterestTree{2}(1,1) = (2.5*exp(2*0.2))/100; % upper-node rate = lower rate * exp(2*vol)
InterestTree{2}(2,2) = (0.5*InterestTree{3}(2,3) + 0.5*InterestTree{3}(2,2))/(1 + InterestTree{2}(1,2));
InterestTree{2}(2,1) = (0.5*InterestTree{3}(2,2) + 0.5*InterestTree{3}(2,1))/(1 + InterestTree{2}(1,1));
InterestTree{1}(2,1) = (0.5*InterestTree{2}(2,2) + 0.5*InterestTree{2}(2,1))/(1 + InterestTree{1}(1,1));
But I am not sure how to go about the optimization. Any suggestions to improve the code are welcome; I need some guidance on this.
Are you expecting the tree to increase in size? Or are you just optimizing over the value of the "2.5%" parameter?
If it's the latter, there are two ways. The first is to model the tree with a closed-form expression by replacing 2.5% with x, which is possible for a tree this small, and then hand that expression to one of MATLAB's nonlinear optimization tools. It's been too long since I've done this to give you a more detailed answer.
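A minimal sketch of that first approach, assuming the goal is to recover the lower-node rate that reproduces a known root price (targetPrice is hypothetical here; rootRate and vol come from the example above, and fminsearch is in base MATLAB):
vol = 0.2;
rootRate = 0.03;
targetPrice = 94.15801;                      % hypothetical calibration target
price = @(r) (0.5*(100/(1 + r*exp(2*vol))) + 0.5*(100/(1 + r))) / (1 + rootRate);
objective = @(r) (price(r) - targetPrice)^2; % squared pricing error
rStar = fminsearch(objective, 0.025);        % start near the arbitrary 2.5%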
The second is the approach I would try immediately. I'm interpreting the example you gave, so the equations I'm using may be incorrect; however, the principle of using the for loop is the same.
vol = 0.2;
maxival = 100;
val1 = zeros(1,maxival); % preallocate
finalval = zeros(1,maxival);
for ival = 1:maxival
    val1(ival) = ival/1000; % (the original had 'i' here, a bug) use any scaling you want; this sweeps 0.1% to 10%
    val2 = val1(ival)*exp(2*vol);
    x1 = (0.5*100 + 0.5*100)/(1 + val2);       % based on the equation you gave
    x2 = (0.5*100 + 0.5*100)/(1 + val1(ival)); % I'm assuming this is how you calculate the bottom node
    finalval(ival) = (x1*0.5 + x2*0.5)/(1+...); % the example you gave isn't clear, so replace this with whatever it should be
end
[maxval, indmaxval] = max(finalval);
[maxval, indmaxval] = max(finalval);
The maximum value is in maxval, and the interest that maximized this is in val1(indmaxval).
I'm using the genetic algorithm (the ga function in MATLAB) to find optimum values of the parameters of an algorithm. Computing the quality of this algorithm for one set of parameters is time-consuming, but it is precise enough that it isn't necessary to repeat it. However, ga tests the same set of parameters many times, and that is a big problem for me. Is it possible to change this somehow?
One solution (to what I believe your question is asking - which is, "How do I speed up computations when the same function that takes a long time to run is called with the same parameters many times?") would be Memoization, which would allow you to return the results of repeated computations very quickly; something like this:
function output = myAlgorithm(inputs)
% inputs is a cell array
persistent memos;
bFoundMemo = false;
for ii = 1:length(memos)
if all(cellfun(@eq, memos(ii).inputs, inputs))
% Found a memo!
output = memos(ii).output;
bFoundMemo = true;
break;
end
end
if bFoundMemo
disp('Memo found for given inputs. Returning...');
% Nothing to do!
return;
else
disp('No memo found for given inputs. Computing...');
% No memo; do computation
output = myAlgorithmInner(inputs);
% Store a memo so we don't have to do it again
sMemo = struct();
sMemo.inputs = inputs;
sMemo.output = output;
if isempty(memos)
memos = sMemo;
else
memos = [memos; sMemo];
end
end
end
function output = myAlgorithmInner(inputs)
% The real work goes here
output = inputs{1} + inputs{2}; % a simple example
end
An example call:
>> myAlgorithm({1 2})
No memo found for given inputs. Computing...
ans =
3
>> myAlgorithm({1 2})
Memo found for given inputs. Returning...
ans =
3
Obviously, you should modify the input/output signature and code that checks for an existing memo to match your algorithm.
You may also want to add a maximum length for the list of memos - if it gets too long, the time to find an existing memo might become comparable to the computation time of your algorithm! Another way to solve this issue would be to store the memos in, e.g., a containers.Map, which would speed up searching for existing memos in a long list.
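A minimal sketch of that containers.Map variant, assuming the inputs are numeric scalars that can be serialized into a char key with mat2str (the function name and key scheme are illustrative only):
function output = myAlgorithmMap(inputs)
% inputs is a cell array of numeric scalars
persistent memoMap;
if isempty(memoMap)
    memoMap = containers.Map('KeyType','char','ValueType','any');
end
key = mat2str(cell2mat(inputs)); % hypothetical key scheme; adapt it to your inputs
if isKey(memoMap, key)
    output = memoMap(key); % hash-based lookup instead of a linear scan
else
    output = myAlgorithmInner(inputs);
    memoMap(key) = output;
end
end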
My question is specific to the learn_params() function of the BayesNetToolbox (BNT) in MATLAB. In the user manual, learn_params() is stated to be suitable only if the input data is fully observed. I have tried it with a partially observed dataset where I represented unobserved values as NaNs.
It seems that learn_params() can deal with NaNs and with node-state combinations that do not occur in the dataset. When I apply Dirichlet priors to smooth the zero values, I get 'sensible' MLE distributions for all nodes. I have copied the script where I do this.
Can someone clarify whether what I am doing makes sense, or whether I am missing something, i.e. the reason why learn_params() cannot be used with partially observed data?
The MATLAB script where I test this is here:
% Incomplete dataset (where NaN's are unobserved)
Age = [1,2,2,NaN,3,3,2,1,NaN,2,1,1,3,NaN,2,2,1,NaN,3,1];
TNMStage = [2,4,2,3,NaN,1,NaN,3,1,4,3,NaN,2,4,3,4,1,NaN,2,4];
Treatment = [2,3,3,NaN,2,NaN,4,4,3,3,NaN,2,NaN,NaN,4,2,NaN,3,NaN,4];
Survival = [1,2,1,2,2,1,1,1,1,2,2,1,2,2,1,2,1,2,2,1];
matrixdata = [Age;TNMStage;Treatment;Survival];
node_sizes =[3,4,4,2];
% Enter the variablesmap
keys = {'Age', 'TNM','Treatment', 'Survival'};
v= 1:1:length(keys);
VariablesMap = containers.Map(keys,v);
% create the dag and the bnet
N = length(node_sizes); % Instead of entering it manually
dag2 = zeros(N,N);
dag2(VariablesMap('Treatment'),VariablesMap('Survival')) = 1;
bnet23 = mk_bnet(dag2, node_sizes); % note: the original mixed the names bnet21 and bnet23; use one consistently
draw_graph(bnet23.dag);
dirichletweight=1;
% define the CPD priors you want to use
bnet23.CPD{VariablesMap('Age')} = tabular_CPD(bnet23, VariablesMap('Age'), 'prior_type', 'dirichlet','dirichlet_type', 'unif', 'dirichlet_weight', dirichletweight);
bnet23.CPD{VariablesMap('TNM')} = tabular_CPD(bnet23, VariablesMap('TNM'), 'prior_type', 'dirichlet','dirichlet_type', 'unif', 'dirichlet_weight', dirichletweight);
bnet23.CPD{VariablesMap('Treatment')} = tabular_CPD(bnet23, VariablesMap('Treatment'), 'prior_type', 'dirichlet','dirichlet_type', 'unif','dirichlet_weight', dirichletweight);
bnet23.CPD{VariablesMap('Survival')} = tabular_CPD(bnet23, VariablesMap('Survival'), 'prior_type', 'dirichlet','dirichlet_type', 'unif','dirichlet_weight', dirichletweight);
% Find MLEs from incomplete data with Dirichlet prior CPDs
bnet24 = learn_params(bnet23, matrixdata);
% Look at the new CPT values after parameter estimation has been carried out
CPT24 = cell(1,N);
for i=1:N
s=struct(bnet24.CPD{i}); % violate object privacy
CPT24{i}=s.CPT;
end
According to my understanding of the BNT documentation, you need to make a couple of changes:
Missing values should be represented as empty cells instead of NaN values.
The learn_params_em function is the only one that supports missing values.
My previous response was incorrect, as I mis-recalled which of the BNT learning functions had support for missing values.
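Putting those two changes together, a sketch of what the call could look like (assuming the bnet23, matrixdata and N from the question; in BNT, learn_params_em takes an inference engine and a cell array of cases, with unobserved entries left as empty cells):
engine = jtree_inf_engine(bnet23);
ncases = size(matrixdata, 2);
cases = cell(1, ncases);
for m = 1:ncases
    cases{m} = cell(N, 1);
    for i = 1:N
        if ~isnan(matrixdata(i, m))
            cases{m}{i} = matrixdata(i, m);
        end % NaNs stay as empty cells, i.e. unobserved
    end
end
[bnet24, LLtrace] = learn_params_em(engine, cases);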