Poor exponential curve fitting in MATLAB

I am getting quite poor results from exponential curve fitting in MATLAB. In Excel, the exponential trendline yields excellent results (imho). What am I doing wrong in MATLAB?
Example dataset:
1,0
1,0
0,8
0,8
0,8
0,8
1,1
1,1
0,9
0,9
0,8
0,8
0,8
0,8
0,7
0,7
0,6
0,6
0,7
0,7
1,1
1,1
1,0
1,0
0,9
0,9
0,8
0,8
0,9
0,9
2,1
2,1
1,9
1,9
2,1
2,1
6,5
6,5
6,0
6,0
5,7
5,7
6,4
6,4
11,1
11,1
10,9
10,9
10,2
10,2
8,5
8,5
12,6
12,6
11,8
11,8
9,9
9,9
11,6
11,6
10,6
10,6
9,7
9,7
9,6
9,6
8,2
8,2
10,1
10,1
9,0
9,0
9,0
9,0
8,9
8,9
8,9
8,9
8,2
8,2
11,8
11,8
15,8
15,8
13,1
13,1
14,8
14,8
13,4
13,4
13,6
13,6
15,4
15,4
16,9
16,9
16,7
16,7
25,9
25,9
23,4
23,4
24,5
24,5
26,6
26,6
24,2
24,2
22,7
22,7
21,2
21,2
21,0
21,0
17,3
17,3
42,1
42,1
40,8
40,8
41,3
41,3
39,7
39,7
42,4
42,4
42,6
42,6
89,0
89,0
196,2
196,2
228,1
228,1
385,4
385,4
746,7
746,7
701,8
701,8
633,7
633,7
1051,2
1051,2
1083,1
1083,1
1034,6
1034,6
1096,0
1096,0
1010,5
1010,5
1001,5
1001,5
835,6
835,6
886,1
886,1
1038,2
1038,2
867,4
867,4
821,8
821,8
753,8
753,8
704,5
704,5
616,4
616,4
555,5
555,5
854,1
854,1
In Excel, the exponential trendline yields
y = 0,4734*e^(0,0442x), hence a = 0,4734 and b = 0,0442,
but MATLAB, with the code:
curveFitValues = fit(xdata,ydata,'exp1');
a = curveFitValues.a;
b = curveFitValues.b;
yields
y = 8,6631*e^0,0280x, hence a = 8,661 and b = 0,0280
which is not a satisfying result, as seen in the image below:
image
What am I doing wrong?
P.S.: I need to fit exponential curves to millions of datasets and am looking for the fastest algorithm. Any ideas on which way is fastest?

I have managed to fit your data by using the polyfit function. I don't have the Curve Fitting Toolbox, but simply using polyfit serves me well usually. I stored your data as the variable x in my code.
t = 1:numel(x);
p = polyfit(t, log(x), 1);
figure; hold on
plot(x)
plot(t, exp(p(2)) * exp(p(1)*t))
set(gca, 'yscale', 'log')
hold off
This code takes ln(x) and fits a straight line to it against t by least squares. You then convert back with exp when you produce the plot.
The fit gives p(1) = 0.0442 and exp(p(2)) = 0.4375.
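A likely reason for the difference, as far as I know: Excel's exponential trendline fits a straight line to ln(y), just like the polyfit approach, while fit(...,'exp1') minimizes squared errors on the untransformed y, so the few very large values dominate. The log-linear fit also has a closed-form solution, which matters if you have millions of datasets. Below is a minimal sketch of that closed form in plain Python; the function name fit_exponential_loglinear and the synthetic data are made up for illustration and are not the dataset from the question.

```python
import math

def fit_exponential_loglinear(ys):
    """Fit y = a * exp(b * t), t = 1..n, by ordinary least squares on ln(y)."""
    n = len(ys)
    ts = range(1, n + 1)
    logs = [math.log(y) for y in ys]  # requires all y > 0
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    # Closed-form slope and intercept of the straight line ln(y) = ln(a) + b*t
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b

# Synthetic, noise-free data (hypothetical values)
ys = [0.5 * math.exp(0.04 * t) for t in range(1, 51)]
a, b = fit_exponential_loglinear(ys)
print(a, b)
```

Because there is no iteration, the cost per dataset is a single pass over the data. Keep in mind that this minimizes relative rather than absolute errors, which is usually what looks "right" on a log-scaled plot.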

Related

Translating chemical equations from article, results differ (Matlab)

I've been trying to translate a set of chemical equations to MATLAB code, to be able to solve for different chemical species. I have the approximate solution (as it's from a graph) but after entering all the data and checking multiple times I still haven't been able to find what is wrong. I'm wondering what is going wrong and if anyone could please help me out. The source for the graph/equation is the article at this link: The chemistry of co-injected BOE. The graph I want to reproduce later on is figure 2 in the paper, see the image below:
The results I get for 10cc, 40cc and 90cc, each followed by the approximate values read from the graph, are:
HF 43%, H2F2 48%, F- 3%, HF2- 6%, compared with ~28%, 63%, 2%, 7% (10cc).
HF 35%, H2F2 33%, F- 14%, HF2- 18%, compared with ~24%, 44%, 6%, 26% (40cc).
HF 21%, H2F2 12%, F- 37%, HF2- 30%, compared with ~18%, 23%, 20%, 45% (90cc).
The script is the following:
clc;
clear all;
%Units to be used
%Volume is in CC also cm^3, 1 litre is 1000 CC, 1 cc = 1 ml
%density is in g/cm^3
%weight percentages are in fractions of 0 to 1
%Molecular weight is in g/mol
% pts=10; %number of points for linear spacing
%weight percentages of NH4OH and HF
xhf=0.49;
xnh3=0.28;
%H2O
Vh2o=1800;
dh2o=1.00; %0.997 at 25C when rounded 1
mh2o=18.02;
%HF values
Vhf=100;
dhf49=1.15;
dhf=dh2o+(dhf49-dh2o)*xhf/0.49; %# 25C
Mhf=20.01;
nhf=mols(Vhf,dhf,xhf,Mhf);
%NH4OH (NH3) values
% Vnh3=linspace(0.1*Vhf,1.9*Vhf,pts);
Vnh3=10;
dnh3=0.9; %for ~20-31% #~20-25C
Mnh3=17.03; %The wt% of NH4OH actually refers to the wt% of NH3 dissolved in H2O
nnh3=mols(Vnh3,dnh3,xnh3,Mnh3);
if max(nnh3)>=nhf
error(['There are more mols NH4OH,',num2str(max(nnh3)),', than mols HF,',num2str(nhf),'.'])
end
%% Calculations for species
Vt=(Vhf+Vh2o+Vnh3)/1000; %litre
A=nhf/Vt; %mol/l
B=nnh3/Vt; %mol/l
syms HF F H2F2 HF2 NH3 NH4 H OH
eq2= H*F/HF==6.85*10^(-4);
eq3= NH3*H/NH4==6.31*10^(-10);
eq4= H*OH==10^(-14);
eq5= HF2/(HF*F)==3.963;
eq6= H2F2/(HF^2)==2.7;
eq7= H+NH4==OH+F+HF2;
eq8= HF+F+2*H2F2+2*HF2==A;
eq9= NH3+NH4==B;
eqns=[eq2,eq5,eq6,eq8,eq4,eq3,eq9,eq7];
varias=[HF, F, H2F2, HF2, NH3, NH4, H, OH];
assume(HF> 0 & F>= 0 & H2F2>= 0 & HF2>= 0& NH3>= 0 & NH4>= 0 & H>= 0 & OH>= 0)
[HF, F, H2F2, HF2, NH3, NH4, H, OH]=vpasolve(eqns,varias);% [0 max([A,B])])
totalHF=double(HF)+double(F)+double(H2F2)+double(HF2);
HFf=double(HF)/totalHF %fraction of species for HF
H2F2f=double(H2F2)/totalHF %fraction of species for H2F2
Ff=double(F)/totalHF %fraction of species for F-
HF2f=double(HF2)/totalHF %fraction of species for HF2-
An extra function is needed, saved as mols.m:
%%%% amount of mol, Vol=volume, d=density, pwt=%weight, M=molecularweight
function mol=mols(Vol, d, pwt, M)
mol=(Vol*d*pwt)/M;
end
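As a sanity check on the inputs, the mole and concentration arithmetic for the 10cc case can be reproduced by hand. This is only a sketch in plain Python using the numbers from the script above (the variable names are mine); it checks A and B, not the equilibrium solve itself.

```python
def mols(vol_cc, density, weight_fraction, molar_mass):
    """Moles of solute: volume (cc) * density (g/cc) * weight fraction / molar mass (g/mol)."""
    return vol_cc * density * weight_fraction / molar_mass

# Values copied from the script: 100 cc of 49% HF, 10 cc of 28% NH3, 1800 cc of water
n_hf = mols(100, 1.15, 0.49, 20.01)
n_nh3 = mols(10, 0.9, 0.28, 17.03)
v_total_l = (100 + 1800 + 10) / 1000  # total volume in litres

A = n_hf / v_total_l   # total fluoride concentration, mol/l
B = n_nh3 / v_total_l  # total ammonia concentration, mol/l
print(n_hf, n_nh3, A, B)
```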
The equations being used from the article are in the image below:
(HF)2 is H2F2 in my script
So it appears the issue wasn't so much with MATLAB, although I had some help in that area as well.
The final solution and updated MATLAB code can be found here:
https://chemistry.stackexchange.com/questions/98306/why-do-my-equilibrium-calculations-on-this-hf-nh4oh-buffer-system-not-match-thos

Does h2o.kmeans() make predictions based on euclidean distance?

I created a clustering model using h2o.kmeans(). The modeling dataset was standardized by scale() in R first.
The model has five clusters and the coordinates of the centroids are:
CENTROID X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
1 -0.646544 -0.6322714 -0.5101907 -0.2980412 -1.6182105 -1.7939725 -1.8194372 -1.82349 -1.8174061 -1.8069266 -2.2213561 -2.2618561 -2.2170297 -2.2004509 -2.196722 -2.2267695 -2.2536694 -2.2653944 -2.1599764 -2.2074994 -1.9114193 -2.78E-16
2 -0.2505012 -0.2582746 -0.2542313 -0.3205136 0.2912933 0.3239872 0.3236214 0.3231876 0.3234663 0.309818 0.362641 0.3800735 0.3615138 0.3542787 0.350817 0.3583391 0.375764 0.3715018 0.3533203 0.3533025 0.2651153 3.72E-15
3 0.4237044 0.4421857 0.408422 0.6620773 0.2371281 0.2592748 0.2597783 0.2782299 0.258803 0.3129833 0.4157714 0.3704712 0.3948566 0.4137049 0.4289137 0.4229101 0.3904031 0.4323851 0.3984215 0.442518 0.5278553 1.00E+00
4 2.2426614 2.2450805 2.0475964 1.5666675 0.2249847 0.2887632 0.3391117 0.3224008 0.3375972 0.3617759 0.5063836 0.4805747 0.5226613 0.5097081 0.5196333 0.5136624 0.4780912 0.4686772 0.4743151 0.5357567 0.5734882 8.24E-01
5 4.4718381 4.5243432 4.8917335 5.223828 0.2374653 0.3096633 0.3215417 0.3326531 0.3189998 0.414707 0.5065842 0.5113028 0.558864 0.5482378 0.543278 0.5436269 0.5204451 0.5341745 0.5096259 0.6486469 0.6595461 9.89E-01
When I use the model to make predictions for new data, the result mostly makes sense: it returns the cluster whose centroid has the shortest Euclidean distance to the data point. However, sometimes (about 5% of the time) the prediction is off. For example, take the data point below:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
-0.2001578 -0.2485784 -0.3008685 -0.005366991 0.2624246 0.3142725 0.3074037 0.3221539 0.3033765 0.3403944 0.3557642 0.3810387 0.4848038 0.2788213 0.544491 0.2838926 0.2899755 0.3963652 0.2594092 0.3083141 0.463528 1
The prediction is cluster 3; however, the Euclidean distances between the data point and the centroids are:
cluster 1: 10
cluster 2: 1.11
cluster 3: 1.39
cluster 4: 4.53
cluster 5: 9.97.
Based on the calculation above, the data point should be assigned to cluster 2, not 3.
Is this a bug, or does h2o.kmeans() use a method other than Euclidean distance for prediction?
Thank you.
Yes, as stated in the K-Means documentation, it uses Euclidean distance.
If you can provide a reproducible example showing that this is a bug, please file a bug report. Thanks!
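For a reproducible check, the expected assignment can be computed independently of h2o. The following is a minimal sketch in plain Python (Python 3.8+ for math.dist); the function name nearest_centroid and the two-dimensional centroids are hypothetical, not the model from the question. It is also worth checking whether the model was trained with h2o's own standardize option enabled, since in that case the distances may be computed on a different scale than the one used for the manual calculation; that is an assumption to verify, not a diagnosis.

```python
import math

def nearest_centroid(point, centroids):
    """Return (1-based cluster label, list of Euclidean distances to each centroid)."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best + 1, distances

# Hypothetical 2-D centroids and data point
centroids = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
label, distances = nearest_centroid((1.0, 1.0), centroids)
print(label, distances)
```

Running the same function on the centroid table and the disputed data point gives the label to compare against the h2o prediction in a bug report.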

Make a vertical spacing plot

I need to build a B-scan from a set of A-scan data. The A-scan data I received is arranged so that each row holds the amplitude at one sample point of the A-scans and each column is one recorded A-scan.
This is what my data looks like:
4855 4641 4891 4791 4812 4812
4827 4766 4862 4745 4767 4785
6676 5075 6903 6879 6697 6084
7340 6829 7678 7753 7263 6726
6176 6237 6708 6737 6316 5943
12014 10467 10915 10914 10124 10642
8251 7538 7641 7619 7269 7658
6522 6105 6132 6136 5921 6227
5519 5287 5330 5376 5255 5237
4904 4784 4835 4855 4794 4758
4553 4527 4472 4592 4469 4455
4298 4323 4291 4293 4221 4238
4167 3957 4089 3991 3938 3907
3789 3721 3777 3777 3643 3596
3736 4615 3639 2814 3638 2782
4413 5286 4248 3998 4370 4199
5994 6896 6134 5548 6102 6161
8506 9020 7841 8060 8663 8941
12347 12302 10639 11151 12533 12478
18859 18175 15035 15938 18358 18160
27106 26261 22613 24069 27015 27114
32767 32601 32767 32767 32767 32767
32767 32767 32767 32767 32767 32767
32767 32767 32767 32767 32767 32767
32767 32767 32767 32767 32767 32767
32767 32767 32767 32767 32767 32767
32767 32767 32767 32767 32767 32767
26416 26459 32767 32767 26308 26945
6523 6900 13327 16665 6616 6477
-14233 -14011 -8554 -5649 -13956 -13858
-28128 -26784 -26157 -24055 -27875 -28374
-28775 -27905 -30348 -26285 -28918 -29066
-20635 -19776 -21144 -21548 -22107 -22759
-16915 -15742 -15908 -17398 -19600 -20143
This is just the sample of the data. It is in .txt format.
A-scan data
B-scan data
My problem is how to plot this data as a B-scan. A MATLAB solution would be great (though other methods are welcome too). Please share your way of plotting this B-scan data.
Scilab
In Scilab you can use the read function. It reads formatted text files, and you need to know at least the number of columns. To add vertical spacing, add a constant offset (i-1)*d to every value in column i, where i is the column number and d is the spacing.
I put the example you gave in a text file so I could read it, then I plotted it.
//read(): first argument is file path
// (or file name, if you change current directory)
// second argument is number of lines (-1 if unknown)
// third argument is the number of columns
B = read("file.txt",-1,6);
d = 1000; //vertical spacing
Bs = B; //copy of the original data
for i = 1 : size(Bs,'c')
//loop adding constant to each column
Bs(:,i) = Bs(:,i) + (i-1) * d;
end
//simply plot the matrix
plot2d(Bs);
The result in Scilab is:
MATLAB
In MATLAB, you can use the importdata function, which also reads formatted text files, but the minimum necessary is the file name. You should also add the vertical spacing manually.
%call importdata() after changing current directory
B = importdata("file.txt");
d = 1000; %vertical spacing
Bs = B; %copy of the original data
for i = 1 : size(Bs,2)
%loop adding constant to each column
Bs(:,i) = Bs(:,i) + (i-1) * d;
end
%plot the modified matrix
plot(Bs);
The result in MATLAB is:
%plotting column space
figure(1)
plot3([0 columnspaceA(1,1)],[0 columnspaceA(2,1)],[0 columnspaceA(3,1)],'y-^','LineWidth',3)
hold on
plot3([0 columnspaceA(1,2)],[0 columnspaceA(2,2)],[0 columnspaceA(3,2)],'y-^','LineWidth',3)
%plotting left null space
plot3([0 leftnullspace(1,1)],[0 leftnullspace(2,1)],[0 leftnullspace(3,1)],'g','LineWidth',3)
h=fmesh(@(s,t)columnspaceA(1,1)*s+columnspaceA(1,2)*t,@(s,t)columnspaceA(2,1)*s+columnspaceA(2,2)*t,@(s,t)columnspaceA(3,1)*s+columnspaceA(3,2)*t,[-1 1]);
figure(2)
%plotting null space
hold on
plot3([0 nullspace(1,1)],[0 nullspace(2,1)],[0 nullspace(3,1)],'g-^','LineWidth',3)
%plotting row space
plot3([0 rowspaceA(1,1)],[0 rowspaceA(2,1)],[0 rowspaceA(3,1)],'r-^','LineWidth',3)
hold on
plot3([0 rowspaceA(1,2)],[0 rowspaceA(2,2)],[0 rowspaceA(3,2)],'r-^','LineWidth',3)
h1 = fmesh(@(s,t)rowspaceA(1,1)*s+rowspaceA(1,2)*t,@(s,t)rowspaceA(2,1)*s+rowspaceA(2,2)*t,@(s,t)rowspaceA(3,1)*s+rowspaceA(3,2)*t,[-1 1]);

How to find subset selection for linear regression model?

I am working with the mtcars dataset and using linear regression:
data(mtcars)
fit <- lm(mpg ~ ., data = mtcars)
summary(fit)
When I fit the model with lm, it shows the following result:
Call:
lm(formula = mpg ~ ., data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.5087 -1.3584 -0.0948  0.7745  4.6251
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.87913   20.06582   1.190   0.2525
cyl6        -2.64870    3.04089  -0.871   0.3975
cyl8        -0.33616    7.15954  -0.047   0.9632
disp         0.03555    0.03190   1.114   0.2827
hp          -0.07051    0.03943  -1.788   0.0939 .
drat         1.18283    2.48348   0.476   0.6407
wt          -4.52978    2.53875  -1.784   0.0946 .
qsec         0.36784    0.93540   0.393   0.6997
vs1          1.93085    2.87126   0.672   0.5115
amManual     1.21212    3.21355   0.377   0.7113
gear4        1.11435    3.79952   0.293   0.7733
gear5        2.52840    3.73636   0.677   0.5089
carb2       -0.97935    2.31797  -0.423   0.6787
carb3        2.99964    4.29355   0.699   0.4955
carb4        1.09142    4.44962   0.245   0.8096
carb6        4.47757    6.38406   0.701   0.4938
carb8        7.25041    8.36057   0.867   0.3995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.833 on 15 degrees of freedom
Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
I found that none of the variables is marked as significant at the 0.05 significance level.
To find the significant variables, I want to do subset selection to find the best pair of variables as predictors of the response variable mpg.
The function regsubsets in the package leaps does best subset regression (see ?leaps). Adapting your code:
library(leaps)
regfit <- regsubsets(mpg ~., data = mtcars)
summary(regfit)
# or for a more visual display
plot(regfit,scale="Cp")
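To make the idea concrete: best subset selection of a given size means fitting a model for every combination of predictors and keeping the one with the smallest residual sum of squares. The sketch below does this by brute force in plain Python; it is not how regsubsets is implemented (leaps uses a much more efficient search), and the helper names and the toy data are hypothetical.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit with intercept (normal equations)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def best_subset(data, y, size):
    """Return the tuple of `size` predictor names with the smallest RSS."""
    return min(combinations(sorted(data), size),
               key=lambda names: rss([data[n] for n in names], y))

# Toy data (hypothetical): y depends only on x1 and x3
data = {"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5], "x3": [1, 0, 1, 0, 0, 1]}
y = [1 + 2 * a - 3 * c for a, c in zip(data["x1"], data["x3"])]
best = best_subset(data, y, 2)
print(best)
```

With p candidate predictors there are p choose k models of size k, so brute force only scales to small problems; that is the reason to prefer regsubsets in practice.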

Finding more than one maximum in an array

I would like to find more than one maximum value from an array using Matlab.
Here is my code that returns only one max and its position:
[peak, pos] = max(abs(coeffs));
The problem is that I want to detect more than one maximum in the array. In fact, I need to detect the first two peaks and their positions in the following array:
>> abs(coeffs())
ans =
0.5442
0.5465
0.5545
0.5674
0.5862
0.6115
0.6438
0.6836
0.7333
0.7941
0.8689
0.9608
1.0751
1.2188
1.4027
1.6441
1.9701
2.4299
3.1178
4.2428
6.3792
11.8611
53.7537
24.9119
10.8982
7.3470
5.7768
4.9340
4.4489
4.1772
4.0564
4.0622
4.1949
4.4801
4.9825
5.8496
7.4614
11.1087
25.6071
53.2831
12.0029
6.4743
4.3096
3.1648
2.4631
1.9918
1.6558
1.4054
1.2129
1.0608
0.9379
0.8371
0.7532
0.6827
0.6224
0.5702
0.5255
0.4861
0.4517
0.4212
0.3941
0.3698
0.3481
0.3282
0.3105
0.2946
0.2796
0.2665
0.2541
0.2429
0.2326
0.2230
0.2141
0.2057
0.1986
0.1914
0.1848
0.1787
0.1729
0.1677
0.1627
0.1579
0.1537
0.1494
0.1456
0.1420
0.1385
0.1353
0.1323
0.1293
0.1267
0.1239
0.1216
0.1192
0.1172
0.1151
0.1132
0.1113
0.1096
0.1080
0.1064
0.1048
0.1038
0.1024
0.1011
0.1000
0.0987
0.0978
0.0967
0.0961
0.0951
0.0943
0.0936
0.0930
0.0924
0.0917
0.0913
0.0908
0.0902
0.0899
0.0894
0.0892
0.0889
0.0888
0.0885
0.0883
0.0882
0.0883
0.0882
0.0883
0.0882
0.0883
0.0885
0.0888
0.0889
0.0892
0.0894
0.0899
0.0902
0.0908
0.0913
0.0917
0.0924
0.0930
0.0936
0.0943
0.0951
0.0961
0.0967
0.0978
0.0987
0.1000
0.1011
0.1024
0.1038
0.1048
0.1064
0.1080
0.1096
0.1113
0.1132
0.1151
0.1172
0.1192
0.1216
0.1239
0.1267
0.1293
0.1323
0.1353
0.1385
0.1420
0.1456
0.1494
0.1537
0.1579
0.1627
0.1677
0.1729
0.1787
0.1848
0.1914
0.1986
0.2057
0.2141
0.2230
0.2326
0.2429
0.2541
0.2665
0.2796
0.2946
0.3105
0.3282
0.3481
0.3698
0.3941
0.4212
0.4517
0.4861
0.5255
0.5702
0.6224
0.6827
0.7532
0.8371
0.9379
1.0608
1.2129
1.4054
1.6558
1.9918
2.4631
3.1648
4.3096
6.4743
12.0029
53.2831
25.6071
11.1087
7.4614
5.8496
4.9825
4.4801
4.1949
4.0622
4.0564
4.1772
4.4489
4.9340
5.7768
7.3470
10.8982
24.9119
53.7537
11.8611
6.3792
4.2428
3.1178
2.4299
1.9701
1.6441
1.4027
1.2188
1.0751
0.9608
0.8689
0.7941
0.7333
0.6836
0.6438
0.6115
0.5862
0.5674
0.5545
0.5465
The reason I need only the first two maxima is that the last two are reflections of the first two, as a result of the fast Fourier transform.
You can use one of many peak-finding tools for that. Here are some of them:
Findpeaks
The function [pks,locs] = findpeaks(data) returns local maxima or peaks, pks, in the input data at locations locs (sorted from first to last found). Data requires a row or column vector with real-valued elements with a minimum length of three. findpeaks compares each element of data to its neighboring values. If an element of data is larger than both of its neighbors or equals Inf, the element is a local peak. If there are no local maxima, pks is an empty vector.
For example:
[pks,locs] = findpeaks(abs(coeffs))
plot(abs(coeffs)); hold on
plot(locs(1:2),pks(1:2),'ro');
1D Non-derivative Peak Finder - a FEX tool that finds peaks without taking first or second derivatives, rather it uses local slope features in a given data set.
PeakFinder - another peak finder from the FEX, by Nate Yoder.
and there are plenty more of these in the FEX...
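The neighbor comparison that findpeaks is described as doing above is simple enough to sketch by hand. Below is a minimal version in plain Python, keeping only strict local maxima and then taking the first two; the function name find_peaks and the short symmetric spectrum are made up for illustration, and indices are 0-based here (MATLAB's are 1-based).

```python
def find_peaks(values):
    """Return (index, value) for every element strictly larger than both neighbors."""
    return [(i, values[i]) for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

# Hypothetical mirrored magnitude spectrum with four peaks
spectrum = [0.5, 1.2, 6.4, 53.8, 24.9, 4.1, 5.8, 53.3, 12.0,
            1.1, 12.0, 53.3, 5.8, 4.1, 24.9, 53.8, 6.4, 1.2]

first_two = find_peaks(spectrum)[:2]
print(first_two)
```

On noisy data this picks up every small bump, so in practice you would add a minimum height or restrict the search to the first half of the spectrum before taking the first two.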