Confidence bound of linear regression using polyfit and polyconf - MATLAB

I am doing a simple linear regression and trying to plot the 95% confidence bounds using the MATLAB functions polyfit and polyconf, but I am wondering why I get more than two lines for the boundary. Thank you for your help.
x1= [ 165.371 227.7475 204.4437 93.874 259.2976 113.3138 74.67 121.1493 60.7095 46.7491 355.6146 876.4135 1.2875 169.7753 341.4739 29.8034 260.1231 664.0960 ];
y= [ 165.371 228.6 203.416 93.874 262.066 104.902 74.67 121.63 59.463 46.749061 186.82666 931.3108984074 11.287521 176.76547 338.75586 29.803396 169.38878 692.66666 ];
surface_area_model = fitlm(y,x1,'linear')
[p,s]=polyfit(y,x1,1);
[yfit,dy]=polyconf(p,y,s,'predopt','curve');
figure(3)
a= 100;
h1= scatter(y,x1,a,'ob','LineWidth',1.2);
line(y,yfit,'color','b','LineWidth',2);
line(y,yfit-dy,'color','r','linestyle',':');
line(y,yfit+dy,'color','r','linestyle',':');
Figure: (plot omitted; the fitted line and confidence bounds appear as multiple broken segments)

Your data is not sorted in ascending order. As a result, not only the confidence boundary but also the data line and the fitted line are drawn as multiple segments.
Sorting the data right after the assignments fixes this:
[y,k] = sort(y);
x1 = x1(k);
This produces the expected graph: a single fitted line with one confidence bound line on each side.
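For completeness, a minimal sketch of the corrected script (the sort inserted right after the data assignments; polyconf requires the Statistics and Machine Learning Toolbox):
[y,k] = sort(y);                 % sort the x-axis data ascending
x1 = x1(k);                      % reorder the paired variable to match
[p,s] = polyfit(y,x1,1);
[yfit,dy] = polyconf(p,y,s,'predopt','curve');
figure
scatter(y,x1,100,'ob','LineWidth',1.2);       % data points
line(y,yfit,'color','b','LineWidth',2);       % fitted line
line(y,yfit-dy,'color','r','linestyle',':');  % lower confidence bound
line(y,yfit+dy,'color','r','linestyle',':');  % upper confidence bound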

Related

How to interpret the regression plot obtained at the end of neural network regression for multiple outputs?

I have trained my neural network model using the MATLAB NN Toolbox. My network has multiple inputs and multiple outputs, 6 and 7 respectively, to be precise. I would like to clarify a few questions based on it:
The final regression plot shown at the end of training indicates very good accuracy, R~0.99. However, since I have multiple outputs, I am confused as to which scatter plot it represents. Shouldn't there be 7 target-vs-predicted plots, one for each output variable?
To my knowledge, R^2 is a better measure of model accuracy, whereas MATLAB reports R in its plot. Do I treat that R as R^2, or should I square the reported R value to obtain R^2?
I have generated the MATLAB script containing the weights, biases, and activation functions as the final result of the training. So shouldn't I be able to simply give my raw data as input and obtain the corresponding predicted output? To cross-check, I fed it the exact same training set, using the indices MATLAB chose for training, and plotted the predicted output vs. the actual output, but the result is not good at all; it is definitely not in line with R~0.99. Am I doing anything wrong?
code:
function [y1] = myNeuralNetworkFunction_2(x1)
%MYNEURALNETWORKFUNCTION neural network simulation function.
% X = [torque T_exh lambda t_Spark N EGR];
% Y = [O2R CO2R HC NOX CO lambda_out T_exh2];
% Generated by Neural Network Toolbox function genFunction, 17-Dec-2018 07:13:04.
%
% [y1] = myNeuralNetworkFunction(x1) takes these arguments:
% x = Qx6 matrix, input #1
% and returns:
% y = Qx7 matrix, output #1
% where Q is the number of samples.
%#ok<*RPMT0>
% ===== NEURAL NETWORK CONSTANTS =====
% Input 1
x1_step1_xoffset = [-24;235.248;0.75;-20.678;550;0.799];
x1_step1_gain = [0.00353982300884956;0.00284355877067267;6.26959247648903;0.0275865874012055;0.000366568914956012;0.0533831576137729];
x1_step1_ymin = -1;
% Layer 1
b1 = [1.3808996210168685;-2.0990163849711894;0.9651733083552595;0.27000953282929346;-1.6781835509820286;-1.5110463684800366;-3.6257438832309905;2.1569498669085361;1.9204156230460485;-0.17704342477904209];
IW1_1 = [-0.032892214008082517 -0.55848270745152429 -0.0063993424771670616 -0.56161004933654057 2.7161844536020197 0.46415317073346513;-0.21395624254052176 -3.1570133640176681 0.71972178875396853 -1.9132557838515238 1.3365248285282931 -3.022721627052706;-1.1026780445896862 0.2324603066452392 0.14552308208231421 0.79194435276493658 -0.66254679969168417 0.070353201192052434;-0.017994515838487352 -0.097682677816992206 0.68844109281256027 -0.001684535122025588 0.013605622123872989 0.05810686279306107;0.5853667840629273 -2.9560683084876329 0.56713425120259764 -2.1854386350040116 1.2930115031659106 -2.7133159265497957;0.64316656469750333 -0.63667017646313084 0.50060179040086761 -0.86827897068177973 2.695456517458648 0.16822164719859456;-0.44666821007466739 4.0993786464616679 -0.89370838440321498 3.0445073606237933 -3.3015566360833453 -4.492874075961689;1.8337574137485424 2.6946232855369989 1.1140472073136622 1.6167763205944321 1.8573696127039145 -0.81922672766933646;-0.12561950922781362 3.0711045035224349 -0.6535751823440773 2.0590707752473199 -1.3267693770634292 2.8782780742777794;-0.013438026967107483 -0.025741311825949621 0.45460734966889638 0.045052447491038108 -0.21794568374100454 0.10667240367191703];
% Layer 2
b2 = [-0.96846557414356171;-0.2454718918618051;-0.7331628718025488;-1.0225195290982099;0.50307202195645395;-0.49497234988401961;-0.21817117469133171];
LW2_1 = [-0.97716474643411022 -0.23883775971686808 0.99238069915206006 0.4147649511973347 0.48504023209224734 -0.071372217431684551 0.054177719330469304 -0.25963474838320832 0.27368380212104881 0.063159321947246799;-0.15570858147605909 -0.18816739764334323 -0.3793600124951475 2.3851961990944681 0.38355142531334563 -0.75308427071748985 -0.1280128732536128 -1.361052031781103 0.6021878865831336 -0.24725687748503239;0.076251356114485525 -0.10178293627600112 0.10151304376762409 -0.46453434441403058 0.12114876632815359 0.062856969143306296 -0.0019628163322658364 -0.067809039768745916 0.071731544062023825 0.65700427778446913;0.17887084584125315 0.29122649575978238 0.37255802759192702 1.3684190468992126 0.60936238465090853 0.21955911453674043 0.28477957899364675 -0.051456306721251184 0.6519451272106177 -0.64479205028051967;0.25743349663436799 2.0668075180209979 0.59610776847961111 -3.2609682919282603 1.8824214917530881 0.33542869933904396 0.03604272669356564 -0.013842766338427388 3.8534510207741826 2.2266745660915586;-0.16136175574939746 0.10407287099228898 -0.13902245286490234 0.87616472446622717 -0.027079111747601223 0.024812287505204988 -0.030101536834009103 0.043168268669541855 0.12172932035587079 -0.27074383434206573;0.18714562505165402 0.35267726325386606 -0.029241400610813449 0.53053853235049087 0.58880054832728757 0.047959541165126809 0.16152268183097709 0.23419456403348898 0.83166785128608967 -0.66765237856750781];
% Output 1
y1_step1_ymin = -1;
y1_step1_gain = [0.114200879346771;0.145581598485951;0.000139011547272197;0.000456244862967996;2.05816254143146e-05;5.27704485488127;0.00284355877067267];
y1_step1_xoffset = [-0.045;1.122;2.706;17.108;493.726;0.75;235.248];
% ===== SIMULATION ========
% Dimensions
Q = size(x1,1); % samples
% Input 1
x1 = x1';
xp1 = mapminmax_apply(x1,x1_step1_gain,x1_step1_xoffset,x1_step1_ymin);
% Layer 1
a1 = tansig_apply(repmat(b1,1,Q) + IW1_1*xp1);
% Layer 2
a2 = repmat(b2,1,Q) + LW2_1*a1;
% Output 1
y1 = mapminmax_reverse(a2,y1_step1_gain,y1_step1_xoffset,y1_step1_ymin);
y1 = y1';
end
% ===== MODULE FUNCTIONS ========
% Map Minimum and Maximum Input Processing Function
function y = mapminmax_apply(x,settings_gain,settings_xoffset,settings_ymin)
y = bsxfun(@minus,x,settings_xoffset);
y = bsxfun(@times,y,settings_gain);
y = bsxfun(@plus,y,settings_ymin);
end
% Sigmoid Symmetric Transfer Function
function a = tansig_apply(n)
a = 2 ./ (1 + exp(-2*n)) - 1;
end
% Map Minimum and Maximum Output Reverse-Processing Function
function x = mapminmax_reverse(y,settings_gain,settings_xoffset,settings_ymin)
x = bsxfun(@minus,y,settings_ymin);
x = bsxfun(@rdivide,x,settings_gain);
x = bsxfun(@plus,x,settings_xoffset);
end
The above is the automatically generated code. The plot I generated to cross-check the first variable is below:
% X and Y are input and output - same as above
X_train = X(results.info1.train.indices,:);
y_train = Y(results.info1.train.indices,:);
out_train = myNeuralNetworkFunction_2(X_train);
scatter(y_train(:,1),out_train(:,1))
To answer your question about R: yes, you should square R to get the R^2 value. In this case they will be very close, since R is very close to 1.
The plots show the correlation between the estimated and real (target) values, so R is the strength of that correlation; square it to get R-squared.
Also note that the plot you drew and the one MATLAB gave are not plots of the same variables: the ranges and scales of the axes are very different.
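As a hedged sketch of the per-output plots you were expecting (assuming y_train and out_train as in your snippet; corrcoef is base MATLAB, so no extra toolbox is needed):
figure
for j = 1:size(y_train,2)                          % 7 outputs in your case
    subplot(2,4,j)
    scatter(y_train(:,j), out_train(:,j), '.')
    cc = corrcoef(y_train(:,j), out_train(:,j));   % per-output correlation
    r = cc(1,2);
    title(sprintf('Output %d: R=%.3f, R^2=%.3f', j, r, r^2))
end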
First of all, is the problem you are trying to solve really a regression problem, or is it a classification problem with 7 classes converted to numeric values? I assume it is a classification problem, since you are trying to get the success rate for each class.
As for your first question: according to the literature, it is recommended to use the "All: R" value. If you want the success rate of each of your classes, you need the metrics that apply to classification problems: precision, recall, F-measure, FP rate, TP rate, and so on. There is plenty of MATLAB documentation for this (help roc), where you can look at the details. All the values I mentioned, and which I think you actually want, are obtained from the confusion matrix.
There is a good example of this.
[x,t] = simpleclass_dataset;
net = patternnet(10);
net = train(net,x,t);
y = net(x);
[c,cm,ind,per] = confusion(t,y)
I hope you will see what you want in the "nntraintool" window that appears when you run the code.
Your other questions have already been answered. Alternatively, you can consider using a machine-learning algorithm in open-source software such as Weka.

Matlab simulation error

I am completely new to MATLAB. I am trying to simulate a combined Wiener and Poisson process.
Why do I get "Subscripted assignment dimension mismatch"?
I am trying to simulate
Z(t) = lambda*W^2(t) - N(t)
where W is a Wiener process and N is a Poisson process.
The code I am using is below:
T=500
dt=1
K=T/dt
W(1)=0
lambda=3
t=0:dt:T
for k=1:K
r=randn
W(k+1)=W(k)+sqrt(dt)*r
N=poissrnd(lambda*dt,1,k)
Z(k)=lambda*W.^2-N
end
plot(t,Z)
It is true that some indexing is missing, but I think you would benefit from rewriting your code in a more 'MATLAB way'. The following code uses the fact that MATLAB's basic variables are matrices and computes the result in a vectorized manner. Try to understand this style of writing, as it is the way to exploit MATLAB more efficiently, and it yields shorter, more readable code:
T = 500;
dt = 1;
K = T/dt;
lambda = 3;
t = 1:dt:T;
sqdtr = sqrt(dt)*randn(K-1,1); % define sqrt(dt)*r as a vector
N = poissrnd(lambda*dt,K,1); % define N as a vector
W = cumsum([0; sqdtr],1); % cumulative sum instead of the loop
Z = lambda*W.^2-N; % combining the processes element-wise
plot(t,Z)
Example of a result: (plot omitted)
You forgot an index:
Z(k)=lambda*W.^2-N
It should be:
Z(k)=lambda*W(k).^2-N(k)
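A minimal sketch of the corrected loop, with N drawn once outside the loop so that N(k) indexes a single fixed realization (an assumption about the intended model):
T=500; dt=1; K=T/dt;
lambda=3;
t=0:dt:T;
W(1)=0;
N = poissrnd(lambda*dt,1,K);   % one Poisson realization, drawn once
for k=1:K
    W(k+1) = W(k) + sqrt(dt)*randn;
    Z(k) = lambda*W(k).^2 - N(k);
end
plot(t(1:K),Z)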

Matlab SVM linear binary classification failure

I'm trying to implement a simple SVM linear binary classification in MATLAB, but I get strange results.
I have two classes g={-1;1} defined by two predictors, varX and varY. In fact, varY alone is enough to separate the dataset into two distinct classes (at about varY=0.38), but I will keep varX as a random variable since I will need it for other work.
Using the code below (adapted from MATLAB examples) I get a wrong classifier. The linear classifier should be close to a horizontal line at about varY=0.38, as we can see by plotting the 2D points.
The line that should separate the two classes is not displayed.
What am I doing wrong?
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
d = 0.005; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(m3(:,1)):d:max(m3(:,1)),...
min(m3(:,2)):d:max(m3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)]; % The grid
[~,scores2] = predict(SVMmodel_testm,xGrid); % The scores
figure();
h(1:2)=gscatter(m3(:,1), m3(:,2), g,'br','ox');
hold on
% Support vectors
h(3) = plot(m3(SVMmodel_testm.IsSupportVector,1),m3(SVMmodel_testm.IsSupportVector,2),'ko','MarkerSize',10);
% Decision boundary
contour(x1Grid,x2Grid,reshape(scores2(:,1),size(x1Grid)),[0 0],'k');
xlabel('varX'); ylabel('varY');
set(gca,'Color',[0.5 0.5 0.5]);
hold off
A common problem with SVM, or any classification method for that matter, is unnormalized data. You have one dimension that spans from 0 to 1 and another that spans from about 0.3 to 0.4. This causes an imbalance between the features. Common practice is to normalize the features somehow, for example by the standard deviation. Try this code:
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
m3(:,2) = m3(:,2)./std(m3(:,2));
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
Notice the added line before the last one, which rescales varY by its standard deviation.
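Alternatively, fitcsvm can standardize all predictors itself via its documented 'Standardize' name-value pair; a minimal sketch:
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','linear','Standardize',true);
With 'Standardize',true the model stores the training means and standard deviations, and predict applies the same transform to new data automatically.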

Looping through column data to fit a curve in MATLAB

I have a spreadsheet ANA2009_1ag.xlsx with column 1 = wavelength lambda, column 2 = spectra1, column 3 = spectra2, column 4 = spectra3, etc.
I want to fit a curve to each of the spectra using the equation
ag(lambda) = ag(lambda_o)*exp(-S*(lambda-lambda_o))
so I can get the slope S.
I have the code, and I am able to run it if I have a text file with 2 columns:
column 1 = wavelength lambda and column 2 = spectra1.
But, as I said above, I have several spectra, so I would like to make a loop such that the curve-fitting and slope-extraction routine moves from one column to the next. As a novice in MATLAB, loops are my biggest problem.
My code is as follows.
%//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[num,txt,raw]=xlsread('ANA2009_1ag.xlsx');
%// Get the size of data to be plotted
[r,c]=size(num);
%//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%// using txt vector to make appropriate legend that matches the data plot
Sepctra=cellstr(txt); %//C = cellstr converts array to a cell array.
%//FIND OUT HOW YOU CAN SAY TO END OF COLUMN
for i=2:7
Sepctra(i)=txt(i);
end
%//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%//takeout range of interest
I=find(num(:,1)<650 & num(:,1)>400);
wl=num(I,1);
a_g=num(I,2);
%//setting options for fmisearch
opts = optimset('fminsearch');
opts = optimset(opts,'MaxIter',4000);
opts = optimset(opts,'MaxFunEvals',2000); %// usually 100*number of params
opts = optimset(opts,'TolFun',1e-9);
%//opts = optimset('LevenbergMarquardt','on');
%//guess for paramters (amplitude at 532 and slope)
x0=[0.1, 0.03];
%//minimization routine
x1 = fminsearch(@least_squares,x0,opts,a_g,wl)
%//plot data and fit
plot(wl, a_g, '.k', wl, x1(1)*exp(-x1(2)*(wl-412)),'b')
The data runs from 300 nm (lambda) to 685 nm (lambda), but I chose to fit the curve from 400 to 650 nm.
Wavelength AN10070A-4m AN10066A-10m
300 1.561434 1.434769
300.5 1.549919 1.42786
301 1.531495 1.414042
301.5 1.506162 1.400224
302 1.483132 1.386406
302.5 1.467011 1.372588
303 1.45089 1.356467
303.5 1.443981 1.342649
304 1.42786 1.333437
304.5 1.414042 1.324225
305 1.407133 1.31271
My expected output is x1, a vector containing the amplitude at 412 nm and the slope S.
I would like to post the figure that I got for the curve fit, but I don't know how to do it.
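A minimal sketch of the requested loop (hedged: it assumes your least_squares objective takes (x,a_g,wl), matching your fminsearch call, and that lambda_o = 412 nm as in your plot command):
results = zeros(c-1,2);                   % one [amplitude, slope] row per spectrum
for col = 2:c
    a_g = num(I,col);                     % spectrum in column col
    results(col-1,:) = fminsearch(@(x) least_squares(x,a_g,wl), x0, opts);
end
A possible least_squares objective matching the model in the question (hypothetical; your own function is not shown):
function sse = least_squares(x, a_g, wl)
sse = sum((a_g - x(1)*exp(-x(2)*(wl-412))).^2);   % sum of squared residuals
end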

Can you suggest something faster than imfilter under certain conditions?

I'm using a MATLAB program that has a very long loop; inside this loop is the following code:
...
H = fspecial('gaussian', 6*sig(i), sig(i));
img_out = imfilter(img{i},H,'same');
...
Where 'sig' is a list of Gaussian widths, and 'img' is a cell array of images.
I need to make this code more efficient and perhaps those two points will allow for something more clever:
The filter is always Gaussian - just different sigma.
The image inside 'img{i}' is a grayscale sparse matrix.
I found a wonderful solution to the problem:
http://blog.ivank.net/fastest-gaussian-blur.html
There is a quick implementation in the MATLAB Help files:
intImage = integralImage(I);
avgH = integralKernel([1 1 7 7], 1/49);
J = integralFilter(intImage, avgH);
So 3 passes of that should approximate a Gaussian!
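A hedged sketch of the three-pass idea (the box-width rule w ~ sqrt(12*sig^2/3 + 1) comes from the linked post; padarray keeps the image size constant between passes; integralImage/integralKernel/integralFilter are from the Computer Vision Toolbox):
sig = 3;                                     % example sigma
w = round(sqrt(12*sig^2/3 + 1));             % box width approximating the Gaussian
if mod(w,2) == 0, w = w + 1; end             % keep the box odd-sized
J = I;                                       % I is the input image
for pass = 1:3
    J = padarray(J,[(w-1)/2 (w-1)/2],'replicate');   % pad so the size is preserved
    intImage = integralImage(J);
    avgH = integralKernel([1 1 w w], 1/w^2);         % w-by-w averaging box
    J = integralFilter(intImage, avgH);
end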
I tried to batch-process images of the same size and sigma by stacking them:
%problem generator
%the real number of images is 10^6
num_of_images=10^4;
%assume square images
image_size=randi([60,100],num_of_images,1);
sig=randi([2,4],num_of_images,1);
img=cell(num_of_images,1);
ratio_nnz=.02;
for idx=1:num_of_images
ti=rand(image_size(idx))/ratio_nnz;
ti(ti>1)=0;
img{idx}=ti;
end
%existing approach
tic;
for idx=1:num_of_images
H = fspecial('gaussian', 6*sig(idx), sig(idx));
img_out = imfilter(img{idx},H,'same');
end
toc;
%idea: match images of same sigma and size
tic
%calculate all filters offline
[sig_unique,~,sig_index]=unique(sig);
H=cell(numel(sig_unique),1);
for idx=1:numel(sig_unique)
H{idx}= fspecial('gaussian', 6*sig_unique(idx), sig_unique(idx));
end
%find instances of same size and sigma
[x,y]=cellfun(@size,img);
[a,b,c]=unique([sig_index,x,y],'rows');
img_out=cell(size(img));
for didx=1:numel(b)
%img{c==didx} contains images of same sigma and size, process them at
%once
iH=H{a(didx,1)};
timg=cat(3,img{c==didx});
timg_out=imfilter(timg,iH,'same');
img_out(c==didx)=num2cell(timg_out,[1,2]);
end
toc
The result surprised me: calling imfilter with fewer but larger matrices was actually slower with the data I generated. Nevertheless, try it with your data and/or with the faster Gaussian filter you are planning to implement; it might be faster then.
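One more hedged alternative, since the filter is always Gaussian: the Gaussian kernel is separable, so two 1-D passes cost O(k) per pixel instead of O(k^2) for the full 2-D kernel. fspecial accepts a [rows cols] size vector, so this needs only the functions already used in the question:
h = fspecial('gaussian',[6*sig(i) 1],sig(i));                 % 1-D Gaussian column
img_out = imfilter(imfilter(img{i},h,'same'),h','same');      % two 1-D passes
On recent MATLAB releases, imgaussfilt(img{i},sig(i)) performs this kind of separable Gaussian filtering internally.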