Poor exponential curve fitting in MATLAB
I am getting quite poor results from exponential curve fitting in MATLAB. In Excel, the exponential trendline yields excellent results (IMHO). What am I doing wrong in MATLAB?
Example dataset:
1,0
1,0
0,8
0,8
0,8
0,8
1,1
1,1
0,9
0,9
0,8
0,8
0,8
0,8
0,7
0,7
0,6
0,6
0,7
0,7
1,1
1,1
1,0
1,0
0,9
0,9
0,8
0,8
0,9
0,9
2,1
2,1
1,9
1,9
2,1
2,1
6,5
6,5
6,0
6,0
5,7
5,7
6,4
6,4
11,1
11,1
10,9
10,9
10,2
10,2
8,5
8,5
12,6
12,6
11,8
11,8
9,9
9,9
11,6
11,6
10,6
10,6
9,7
9,7
9,6
9,6
8,2
8,2
10,1
10,1
9,0
9,0
9,0
9,0
8,9
8,9
8,9
8,9
8,2
8,2
11,8
11,8
15,8
15,8
13,1
13,1
14,8
14,8
13,4
13,4
13,6
13,6
15,4
15,4
16,9
16,9
16,7
16,7
25,9
25,9
23,4
23,4
24,5
24,5
26,6
26,6
24,2
24,2
22,7
22,7
21,2
21,2
21,0
21,0
17,3
17,3
42,1
42,1
40,8
40,8
41,3
41,3
39,7
39,7
42,4
42,4
42,6
42,6
89,0
89,0
196,2
196,2
228,1
228,1
385,4
385,4
746,7
746,7
701,8
701,8
633,7
633,7
1051,2
1051,2
1083,1
1083,1
1034,6
1034,6
1096,0
1096,0
1010,5
1010,5
1001,5
1001,5
835,6
835,6
886,1
886,1
1038,2
1038,2
867,4
867,4
821,8
821,8
753,8
753,8
704,5
704,5
616,4
616,4
555,5
555,5
854,1
854,1
In Excel, the exponential trendline on this dataset yields
y = 0,4734*e^(0,0442x), hence a = 0,4734 and b = 0,0442
but in MATLAB, the code:
curveFitValues = fit(xdata,ydata,'exp1');
a = curveFitValues.a;
b = curveFitValues.b;
yields
y = 8,6631*e^(0,0280x), hence a = 8,661 and b = 0,0280
which is not a satisfying result, as seen in the image below:
image
What am I doing wrong?
P.S.: I need to fit exponential curves to millions of datasets, so I am looking for the fastest algorithm. Any ideas on which approach is fastest?
I have managed to fit your data by using the polyfit function. I don't have the Curve Fitting Toolbox, but simply using polyfit serves me well usually. I stored your data as the variable x in my code.
t = 1:numel(x);
p = polyfit(t, log(x), 1);
figure; hold on
plot(x)
plot(t, exp(p(2)) * exp(p(1)*t))
set(gca, 'yscale', 'log')
hold off
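This code takes ln(x) and fits a straight line to it against t using least squares; you then convert back to the original scale when you produce the plot. As far as I know, Excel's exponential trendline works the same way (a linear regression on the logarithm of the data), whereas fit(...,'exp1') minimizes the squared error in the original scale, where the largest values dominate. That would explain why the two results differ.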
p has values p(1) = 0.0442 and exp(p(2)) = 0.4375.
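Since the question also asks about fitting millions of datasets quickly, the same log-linear fit can be applied to many datasets in one call. The following is only a minimal sketch, not a benchmark: it assumes every dataset shares the same t and contains strictly positive values, and the names Y, aTrue and bTrue, as well as the synthetic data, are made up for illustration.

```matlab
% Synthetic example: each column of Y is one dataset sampled at t = 1..n.
% (aTrue, bTrue and Y are hypothetical; replace Y with your own data.)
n = 50;
t = (1:n)';
aTrue = [0.5, 2.0, 10.0];
bTrue = [0.04, 0.02, -0.01];
Y = bsxfun(@times, aTrue, exp(t * bTrue));   % n-by-3, noise-free

% Linear least squares on log(Y): log(y) = log(a) + b*t.
% One backslash solve handles all columns (datasets) at once.
X = [ones(n, 1), t];      % design matrix shared by all datasets
coef = X \ log(Y);        % 2-by-m: row 1 holds log(a), row 2 holds b
a = exp(coef(1, :));
b = coef(2, :);
```

Note that this minimizes the error in log space, so on noisy data it will generally not reproduce the coefficients returned by fit(...,'exp1').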
Related
Translating chemical equations from article, results differ (Matlab)
I've been trying to translate a set of chemical equations to MATLAB code, to be able to solve for different chemical species. I have the approximate solution (as it's from a graph) but after entering all the data and checking multiple times I still haven't been able to find what is wrong. I'm wondering what is going wrong and if anyone could please help me out.
The source for the graph/equation is the article at this link: The chemistry of co-injected BOE. The graph I want to reproduce later on is figure 2 in the paper, see the image below:
Now the results I get for 10cc, 40cc and 90cc are respectively:
HF 43%, H2F2 48%, F- 3%, HF2- 6% in comparison with ~28%, 63%, 2%, 7% (10cc).
HF 35%, H2F2 33%, F- 14%, HF2- 18% in comparison with ~24%, 44%, 6%, 26% (40cc).
HF 21%, H2F2 12%, F- 37%, HF2- 30% in comparison with ~18%, 23%, 20%, 45% (90cc).
The script is the following:
clc; clear all;
%Units to be used
%Volume is in CC also cm^3, 1 litre is 1000 CC, 1 cc = 1 ml
%density is in g/cm^3
%weight percentages are in fractions of 0 to 1
%Molecular weight is in g/mol
% pts=10; %number of points for linear spacing
%weight percentages of NH4OH and HF
xhf=0.49;
xnh3=0.28;
%H2O
Vh2o=1800;
dh2o=1.00; %0.997 at 25C when rounded 1
mh2o=18.02;
%HF values
Vhf=100;
dhf49=1.15;
dhf=dh2o+(dhf49-dh2o)*xhf/0.49; %# 25C
Mhf=20.01;
nhf=mols(Vhf,dhf,xhf,Mhf);
%NH4OH (NH3) values
% Vnh3=linspace(0.1*Vhf,1.9*Vhf,pts);
Vnh3=10;
dnh3=0.9; %for ~20-31% #~20-25C
Mnh3=17.03; %The wt% of NH4OH actually refers to the wt% of NH3 dissolved in H2O
nnh3=mols(Vnh3,dnh3,xnh3,Mnh3);
if max(nnh3)>=nhf
    error(['There are more mols NH4OH,',num2str(max(nnh3)),', than mols HF,',num2str(nhf),'.'])
end
%% Calculations for species
Vt=(Vhf+Vh2o+Vnh3)/1000; %litre
A=nhf/Vt; %mol/l
B=nnh3/Vt; %mol/l
syms HF F H2F2 HF2 NH3 NH4 H OH
eq2= H*F/HF==6.85*10^(-4);
eq3= NH3*H/NH4==6.31*10^(-10);
eq4= H*OH==10^(-14);
eq5= HF2/(HF*F)==3.963;
eq6= H2F2/(HF^2)==2.7;
eq7= H+NH4==OH+F+HF2;
eq8= HF+F+2*H2F2+2*HF2==A;
eq9= NH3+NH4==B;
eqns=[eq2,eq5,eq6,eq8,eq4,eq3,eq9,eq7];
varias=[HF, F, H2F2, HF2, NH3, NH4, H, OH];
assume(HF> 0 & F>= 0 & H2F2>= 0 & HF2>= 0& NH3>= 0 & NH4>= 0 & H>= 0 & OH>= 0)
[HF, F, H2F2, HF2, NH3, NH4, H, OH]=vpasolve(eqns,varias);% [0 max([A,B])])
totalHF=double(HF)+double(F)+double(H2F2)+double(HF2);
HFf=double(HF)/totalHF %fraction of species for HF
H2F2f=double(H2F2)/totalHF %fraction of species for H2F2
Ff=double(F)/totalHF %fraction of species for F-
HF2f=double(HF2)/totalHF %fraction of species for HF2-
An extra function is needed, stored in mols.m:
%%%% amount of mol, Vol=volume, d=density, pwt=%weight, M=molecularweight
function mol=mols(Vol, d, pwt, M)
mol=(Vol*d*pwt)/M;
end
The equations being used from the article are in the image below:
(HF)2 is H2F2 in my script
It appears the issue wasn't so much with MATLAB, although I had some help in that area as well. The final solution and updated MATLAB code can be found here: https://chemistry.stackexchange.com/questions/98306/why-do-my-equilibrium-calculations-on-this-hf-nh4oh-buffer-system-not-match-thos
Does h2o.kmeans() make predictions based on euclidean distance?
I created a clustering model using h2o.kmeans(). The modeling dataset was first standardized with scale() in R. The model has five clusters and the coordinates of the centroids are:
CENTROID X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
1 -0.646544 -0.6322714 -0.5101907 -0.2980412 -1.6182105 -1.7939725 -1.8194372 -1.82349 -1.8174061 -1.8069266 -2.2213561 -2.2618561 -2.2170297 -2.2004509 -2.196722 -2.2267695 -2.2536694 -2.2653944 -2.1599764 -2.2074994 -1.9114193 -2.78E-16
2 -0.2505012 -0.2582746 -0.2542313 -0.3205136 0.2912933 0.3239872 0.3236214 0.3231876 0.3234663 0.309818 0.362641 0.3800735 0.3615138 0.3542787 0.350817 0.3583391 0.375764 0.3715018 0.3533203 0.3533025 0.2651153 3.72E-15
3 0.4237044 0.4421857 0.408422 0.6620773 0.2371281 0.2592748 0.2597783 0.2782299 0.258803 0.3129833 0.4157714 0.3704712 0.3948566 0.4137049 0.4289137 0.4229101 0.3904031 0.4323851 0.3984215 0.442518 0.5278553 1.00E+00
4 2.2426614 2.2450805 2.0475964 1.5666675 0.2249847 0.2887632 0.3391117 0.3224008 0.3375972 0.3617759 0.5063836 0.4805747 0.5226613 0.5097081 0.5196333 0.5136624 0.4780912 0.4686772 0.4743151 0.5357567 0.5734882 8.24E-01
5 4.4718381 4.5243432 4.8917335 5.223828 0.2374653 0.3096633 0.3215417 0.3326531 0.3189998 0.414707 0.5065842 0.5113028 0.558864 0.5482378 0.543278 0.5436269 0.5204451 0.5341745 0.5096259 0.6486469 0.6595461 9.89E-01
When I use the model to make predictions for new data, the result mostly makes sense: it returns the cluster whose centroid has the shortest Euclidean distance to the data point. However, sometimes (in about 5% of cases) the prediction is off.
For example, for the data point below:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22
-0.2001578 -0.2485784 -0.3008685 -0.005366991 0.2624246 0.3142725 0.3074037 0.3221539 0.3033765 0.3403944 0.3557642 0.3810387 0.4848038 0.2788213 0.544491 0.2838926 0.2899755 0.3963652 0.2594092 0.3083141 0.463528 1
The prediction is cluster 3; however, the Euclidean distances between the data point and the centroids are:
cluster 1: 10
cluster 2: 1.11
cluster 3: 1.39
cluster 4: 4.53
cluster 5: 9.97
Based on the calculation above, the data point should be assigned to cluster 2, not 3. Is this a bug, or does h2o.kmeans() use something other than Euclidean distance for prediction? Thank you.
Yes, as stated in the K-Means documentation, it uses Euclidean distance. If you can provide a reproducible example showing that this is a bug, please file a bug report. Thanks!
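For reference, the nearest-centroid check described in the question can be reproduced by hand. The sketch below is written in MATLAB rather than R, uses only centroids 2 and 3 and the data point quoted in the question, and the variable names (C, p, dist, nearest) are made up for illustration. It only recomputes the plain Euclidean distances on the values as posted; it says nothing about any preprocessing H2O may apply internally.

```matlab
% Centroids 2 and 3 from the question (one row each, 22 features).
C = [-0.2505012 -0.2582746 -0.2542313 -0.3205136 0.2912933 0.3239872 ...
      0.3236214 0.3231876 0.3234663 0.309818 0.362641 0.3800735 ...
      0.3615138 0.3542787 0.350817 0.3583391 0.375764 0.3715018 ...
      0.3533203 0.3533025 0.2651153 3.72E-15;
      0.4237044 0.4421857 0.408422 0.6620773 0.2371281 0.2592748 ...
      0.2597783 0.2782299 0.258803 0.3129833 0.4157714 0.3704712 ...
      0.3948566 0.4137049 0.4289137 0.4229101 0.3904031 0.4323851 ...
      0.3984215 0.442518 0.5278553 1.00E+00];

% The data point from the question.
p = [-0.2001578 -0.2485784 -0.3008685 -0.005366991 0.2624246 0.3142725 ...
      0.3074037 0.3221539 0.3033765 0.3403944 0.3557642 0.3810387 ...
      0.4848038 0.2788213 0.544491 0.2838926 0.2899755 0.3963652 ...
      0.2594092 0.3083141 0.463528 1];

% Euclidean distance from p to every row of C.
dist = sqrt(sum(bsxfun(@minus, C, p).^2, 2));

% Index of the closest row: 1 means centroid 2, 2 means centroid 3.
[~, nearest] = min(dist);
```

The two distances agree with the 1.11 and 1.39 reported in the question, so on these numbers centroid 2 is indeed the closer one.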
Make a vertical spacing plot
I need to make a B-scan data from a chunk of A-scan data. The A-scan data that I received are arranged in such a way that the row resembles the amplitude of the each point of the A-scan data and the column represents each A-scan data gathered. This is how my data looks like: 4855 4641 4891 4791 4812 4812 4827 4766 4862 4745 4767 4785 6676 5075 6903 6879 6697 6084 7340 6829 7678 7753 7263 6726 6176 6237 6708 6737 6316 5943 12014 10467 10915 10914 10124 10642 8251 7538 7641 7619 7269 7658 6522 6105 6132 6136 5921 6227 5519 5287 5330 5376 5255 5237 4904 4784 4835 4855 4794 4758 4553 4527 4472 4592 4469 4455 4298 4323 4291 4293 4221 4238 4167 3957 4089 3991 3938 3907 3789 3721 3777 3777 3643 3596 3736 4615 3639 2814 3638 2782 4413 5286 4248 3998 4370 4199 5994 6896 6134 5548 6102 6161 8506 9020 7841 8060 8663 8941 12347 12302 10639 11151 12533 12478 18859 18175 15035 15938 18358 18160 27106 26261 22613 24069 27015 27114 32767 32601 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 26416 26459 32767 32767 26308 26945 6523 6900 13327 16665 6616 6477 -14233 -14011 -8554 -5649 -13956 -13858 -28128 -26784 -26157 -24055 -27875 -28374 -28775 -27905 -30348 -26285 -28918 -29066 -20635 -19776 -21144 -21548 -22107 -22759 -16915 -15742 -15908 -17398 -19600 -20143 This is just the sample of the data. It is in .txt format. A-scan data B-scan data The problem I am facing is to plot this data into a b-scan data. Matlab would be great (Though other methods would be great too). Please share your way of plotting this B-scan data.
Scilab
In Scilab you can use the read function. It reads formatted text files, and you need to know at least the number of columns. To add vertical spacing, add a constant offset (i-1)*d to each whole column, where i is the column number and d is the spacing. I put the example you gave in a text file so I could read it, then I plotted it.
//read(): first argument is file path
// (or file name, if you change current directory)
// second argument is number of lines (-1 if unknown)
// third argument is the number of columns
B = read("file.txt",-1,6);
d = 1000; //vertical spacing
Bs = B; //copy of the original data
for i = 1 : size(Bs,'c') //loop adding constant to each column
    Bs(:,i) = Bs(:,i) + (i-1) * d;
end
//simply plot the matrix
plot2d(Bs);
The result in Scilab is:
MATLAB
In MATLAB, you can use the importdata function, which also reads formatted text files, but the only required argument is the file name. You should also add the vertical spacing manually.
%call importdata() after changing current directory
B = importdata("file.txt");
d = 1000; %vertical spacing
Bs = B; %copy of the original data
for i = 1 : size(Bs,2) %loop adding constant to each column
    Bs(:,i) = Bs(:,i) + (i-1) * d;
end
%plot the modified matrix
plot(Bs);
The result in MATLAB is:
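The column-offset loop can also be written without a loop. This is just a sketch of an equivalent formulation: the matrix B below holds the first three rows of the sample data from the question (read as six columns), and the names offsets and BsLoop are introduced here for illustration.

```matlab
% First three rows of the sample data, six A-scans (columns).
B = [4855 4641 4891 4791 4812 4812;
     4827 4766 4862 4745 4767 4785;
     6676 5075 6903 6879 6697 6084];
d = 1000;                              % vertical spacing

% One offset per column: 0, d, 2*d, ...
offsets = (0:size(B, 2) - 1) * d;

% Add the row of offsets to every row of B.
Bs = bsxfun(@plus, B, offsets);

% Same result as the explicit loop, kept here only for comparison.
BsLoop = B;
for i = 1:size(BsLoop, 2)
    BsLoop(:, i) = BsLoop(:, i) + (i - 1) * d;
end
```

On recent MATLAB releases with implicit expansion, B + offsets should give the same result as the bsxfun call; plot(Bs) then draws the spaced traces as before.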
%plotting column space
figure(1)
plot3([0 columnspaceA(1,1)],[0 columnspaceA(2,1)],[0 columnspaceA(3,1)],'y-^', 'LineWidth',3)
hold on
plot3([0 columnspaceA(1,2)],[0 columnspaceA(2,2)],[0 columnspaceA(3,2)],'y-^', 'LineWidth',3)
%plotting left null space
plot3([0 leftnullspace(1,1)],[0 leftnullspace(2,1)],[0 leftnullspace(3,1)],'g','linew',3)
h=fmesh(@(s,t)columnspaceA(1,1)*s+columnspaceA(1,2)*t, ...
    @(s,t)columnspaceA(2,1)*s+columnspaceA(2,2)*t, ...
    @(s,t)columnspaceA(3,1)*s+columnspaceA(3,2)*t,[-1 1]);
figure(2)
%plotting null space
hold on
plot3([0 nullspace(1,1)],[0 nullspace(2,1)],[0 nullspace(3,1)],'g-^','LineWidth',3)
%plotting row space
plot3([0 rowspaceA(1,1)],[0 rowspaceA(2,1)],[0 rowspaceA(3,1)],'r-^','LineWidth',3)
hold on
plot3([0 rowspaceA(1,2)],[0 rowspaceA(2,2)],[0 rowspaceA(3,2)],'r-^','LineWidth',3)
h1 = fmesh(@(s,t)rowspaceA(1,1)*s+rowspaceA(1,2)*t, ...
    @(s,t)rowspaceA(2,1)*s+rowspaceA(2,2)*t, ...
    @(s,t)rowspaceA(3,1)*s+rowspaceA(3,2)*t,[-1 1]);
How to find subset selection for linear regression model?
I am working with the mtcars dataset and using linear regression:
data(mtcars)
fit<- lm(mpg ~.,mtcars);summary(fit)
When I fit the model with lm, it shows the following result:
Call:
lm(formula = mpg ~ ., data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.5087 -1.3584 -0.0948  0.7745  4.6251
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.87913   20.06582   1.190   0.2525
cyl6        -2.64870    3.04089  -0.871   0.3975
cyl8        -0.33616    7.15954  -0.047   0.9632
disp         0.03555    0.03190   1.114   0.2827
hp          -0.07051    0.03943  -1.788   0.0939 .
drat         1.18283    2.48348   0.476   0.6407
wt          -4.52978    2.53875  -1.784   0.0946 .
qsec         0.36784    0.93540   0.393   0.6997
vs1          1.93085    2.87126   0.672   0.5115
amManual     1.21212    3.21355   0.377   0.7113
gear4        1.11435    3.79952   0.293   0.7733
gear5        2.52840    3.73636   0.677   0.5089
carb2       -0.97935    2.31797  -0.423   0.6787
carb3        2.99964    4.29355   0.699   0.4955
carb4        1.09142    4.44962   0.245   0.8096
carb6        4.47757    6.38406   0.701   0.4938
carb8        7.25041    8.36057   0.867   0.3995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.833 on 15 degrees of freedom
Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
I found that none of the variables is marked as significant at the 0.05 significance level. To find the significant variables, I want to do subset selection to find the best pair of variables as predictors for the response variable mpg.
The function regsubsets in the package leaps does best subset regression (see ?leaps). Adapting your code:
library(leaps)
regfit <- regsubsets(mpg ~., data = mtcars)
summary(regfit)
# or for a more visual display
plot(regfit,scale="Cp")
Finding more than one maximum in an array
I would like to find more than one maximum value from an array using Matlab. Here is my code that returns only one max and its position: [peak, pos] = max(abs(coeffs)); Problem is that I want to detect more than one max in the array. In fact, I would need to detect the first two peaks and their positions in the following array: >> abs(coeffs()) ans = 0.5442 0.5465 0.5545 0.5674 0.5862 0.6115 0.6438 0.6836 0.7333 0.7941 0.8689 0.9608 1.0751 1.2188 1.4027 1.6441 1.9701 2.4299 3.1178 4.2428 6.3792 11.8611 53.7537 24.9119 10.8982 7.3470 5.7768 4.9340 4.4489 4.1772 4.0564 4.0622 4.1949 4.4801 4.9825 5.8496 7.4614 11.1087 25.6071 53.2831 12.0029 6.4743 4.3096 3.1648 2.4631 1.9918 1.6558 1.4054 1.2129 1.0608 0.9379 0.8371 0.7532 0.6827 0.6224 0.5702 0.5255 0.4861 0.4517 0.4212 0.3941 0.3698 0.3481 0.3282 0.3105 0.2946 0.2796 0.2665 0.2541 0.2429 0.2326 0.2230 0.2141 0.2057 0.1986 0.1914 0.1848 0.1787 0.1729 0.1677 0.1627 0.1579 0.1537 0.1494 0.1456 0.1420 0.1385 0.1353 0.1323 0.1293 0.1267 0.1239 0.1216 0.1192 0.1172 0.1151 0.1132 0.1113 0.1096 0.1080 0.1064 0.1048 0.1038 0.1024 0.1011 0.1000 0.0987 0.0978 0.0967 0.0961 0.0951 0.0943 0.0936 0.0930 0.0924 0.0917 0.0913 0.0908 0.0902 0.0899 0.0894 0.0892 0.0889 0.0888 0.0885 0.0883 0.0882 0.0883 0.0882 0.0883 0.0882 0.0883 0.0885 0.0888 0.0889 0.0892 0.0894 0.0899 0.0902 0.0908 0.0913 0.0917 0.0924 0.0930 0.0936 0.0943 0.0951 0.0961 0.0967 0.0978 0.0987 0.1000 0.1011 0.1024 0.1038 0.1048 0.1064 0.1080 0.1096 0.1113 0.1132 0.1151 0.1172 0.1192 0.1216 0.1239 0.1267 0.1293 0.1323 0.1353 0.1385 0.1420 0.1456 0.1494 0.1537 0.1579 0.1627 0.1677 0.1729 0.1787 0.1848 0.1914 0.1986 0.2057 0.2141 0.2230 0.2326 0.2429 0.2541 0.2665 0.2796 0.2946 0.3105 0.3282 0.3481 0.3698 0.3941 0.4212 0.4517 0.4861 0.5255 0.5702 0.6224 0.6827 0.7532 0.8371 0.9379 1.0608 1.2129 1.4054 1.6558 1.9918 2.4631 3.1648 4.3096 6.4743 12.0029 53.2831 25.6071 11.1087 7.4614 5.8496 4.9825 4.4801 4.1949 4.0622 4.0564 4.1772 4.4489 4.9340 5.7768 7.3470 10.8982 
24.9119 53.7537 11.8611 6.3792 4.2428 3.1178 2.4299 1.9701 1.6441 1.4027 1.2188 1.0751 0.9608 0.8689 0.7941 0.7333 0.6836 0.6438 0.6115 0.5862 0.5674 0.5545 0.5465 The reason I need only the two first max values is that the two last ones are reflections of the two first ones as a result of a fast fourier transform.
You can use many peak-finding tools to do that. Here are some of them:
Findpeaks
The function [pks,locs] = findpeaks(data) returns local maxima or peaks, pks, in the input data at locations locs (sorted from first to last found). data must be a row or column vector with real-valued elements and a minimum length of three. findpeaks compares each element of data to its neighboring values. If an element of data is larger than both of its neighbors or equals Inf, the element is a local peak. If there are no local maxima, pks is an empty vector. For example:
[pks,locs] = findpeaks(abs(coeffs))
plot(abs(coeffs)); hold on
plot(locs(1:2),pks(1:2),'ro');
1D Non-derivative Peak Finder - a FEX tool that finds peaks without taking first or second derivatives; instead it uses local slope features in a given data set.
PeakFinder - another peak finder from the FEX by nate yoder.
And there are plenty more of these in the FEX...
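findpeaks ships with the Signal Processing Toolbox, so if that is not available, the neighbor comparison described above can be written directly. The following is a minimal sketch under stated assumptions: y is a short, made-up row vector with mirrored peaks standing in for abs(coeffs), only strict interior maxima are detected, and the names isPeak, firstTwoLocs and firstTwoPks are illustrative.

```matlab
% Made-up spectrum with two peaks and their mirror images,
% standing in for abs(coeffs) (assumed to be a row vector here).
y = [0.5 0.9 2.4 11.9 53.8 24.9 7.3 4.1 5.8 25.6 53.3 12.0 3.2 1.0 ...
     1.0 3.2 12.0 53.3 25.6 5.8 4.1 7.3 24.9 53.8 11.9 2.4 0.9 0.5];

% An interior sample is a peak if it is strictly larger than both neighbors.
% The two end points are never reported as peaks.
isPeak = [false, y(2:end-1) > y(1:end-2) & y(2:end-1) > y(3:end), false];

locs = find(isPeak);    % positions of all local maxima
pks  = y(locs);         % their values

% Keep only the first two peaks; the later ones are the FFT mirror images.
firstTwoLocs = locs(1:2);
firstTwoPks  = pks(1:2);
```

On noisy data, small ripples can also register as local maxima, so it may be worth discarding peaks below some threshold before taking the first two.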