Finding absolute error of approximated function - matlab

During an experiment I recorded several points and then approximated them with a 9th-order polynomial. I need to find the absolute error between the measurements and the approximated function along the y axis. Any ideas?
Edit:
y = [0.006332 0.04056 0.11813 0.1776723 0.23840 0.29827 0.358396...
0.418149 0.4786 0.478154 0.538114 0.53862 0.598954 0.659804...
0.720267 0.781026 0.8412 0.901548 0.962022 1.022567 1.083291...
1.143653 1.20449 1.14398 1.02273 0.962285 0.90203 0.841474...
0.780881 0.720346 0.659896 0.579599 0.539505 0.478662 0.418963...
0.35859 0.299039 0.238886 0.179108 0.118999 0.058841 0.006249...
0.06189];
x2 = linspace (1,43,43);
x2 = x2';
y = y';
f = fit(x2,y,'poly9');
figure()
plot(f,x2,y)

This will do it:
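y_fit = f(x2); % evaluate the fitted polynomial at the measured x values
err = y - y_fit; % signed residuals (named err so the built-in error function is not shadowed)
abs_err = abs(err); % pointwise absolute error, if the sign is not needed
hold on
plot(x2, err)
% Several popular error norms:
norm(err, 1)
norm(err, 2)
norm(err, Inf)
Like y, the variable err is a vector. If you want to reduce this vector to a single number, you can use one of the norms. See this for more on error norms.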
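To make the three norms concrete, here is a minimal Python sketch (standard library only, not a translation of the MATLAB session above). The function name error_norms and the sample numbers are made up for illustration; the residual is taken as measured minus fitted, as in the answer.

```python
import math

def error_norms(measured, fitted):
    """Return the pointwise absolute errors and their 1-, 2- and infinity-norms."""
    abs_err = [abs(m - f) for m, f in zip(measured, fitted)]
    return {
        "abs_err": abs_err,
        "norm1": sum(abs_err),                            # sum of absolute errors
        "norm2": math.sqrt(sum(e * e for e in abs_err)),  # Euclidean length
        "norm_inf": max(abs_err),                         # worst-case error
    }

# Hypothetical sample values, not the data from the question.
result = error_norms([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(result)
```

The 1-norm adds up all deviations, the 2-norm weights large deviations more heavily, and the infinity-norm reports only the single largest deviation.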


How can I add the slope of a specific point in a polynomial line in plotly

Let's say I have a polynomial regression in plotly, produced by code along these lines:
import plotly.express as px

fig = px.scatter(
    x=final_df.index,
    y=final_df.nr_deaths,
    trendline="lowess",  # ols
    trendline_color_override="red",
    trendline_options=dict(frac=0.1),
    opacity=.5,
    title='Deaths per year'
)
fig.show()
How would I calculate the slope (= tangent) line on a specific point of the polynomial regression line?
Currently, this cannot be done within plotly alone. But you can achieve this by using other libraries for calculation and applying the results in the chart.
The difficulty in this question lies in two steps: calculating the slope of the polynomial at a certain point, and calculating the x and y values needed to plot the tangent as a line.
For calculating the slope at a given point you can use numpy. Afterwards you can calculate the x and y values in plain Python and plot them with plotly.
import numpy as np

poly_degree = 3
y = df.col.values
x = np.arange(0, len(y))
x = x.reshape(-1, 1)
fitted_params = np.polyfit(np.arange(0, len(y)), y, poly_degree)
polynomials = np.poly1d(fitted_params)  # fitted polynomial
derivatives = np.polyder(polynomials)  # its first derivative
y_value_at_point = polynomials(x).flatten()  # fitted y at every x
slope_at_point = np.polyval(derivatives, np.arange(0, len(y)))  # slope at every x
For calculating the corresponding slope values (the necessary x values and y values) at a point, and plotting it in plotly you can do something like this:
import plotly.graph_objects as go

def draw_slope_line_at_point(fig, ind, x, y, slope_at_point, verbose=False):
    """Plot a line from an index at a specific point for x values, y values and their slopes"""
    # Point-slope form of the tangent, evaluated at the first and last x value
    y_low = (x[0] - x[ind]) * slope_at_point[ind] + y[ind]
    y_high = (x[-1] - x[ind]) * slope_at_point[ind] + y[ind]
    x_vals = [x[0], x[-1]]
    y_vals = [y_low, y_high]
    if verbose:
        print((x[0] - x[ind]))
        print(x[ind], x_vals, y_vals, y[ind], slope_at_point[ind])
    fig.add_trace(
        go.Scatter(
            x=x_vals,
            y=y_vals,
            name="Tangent at point",
            line=dict(color='orange', width=2, dash='dash'),
        )
    )
    return x_vals, y_vals
Calling it and adding annotation would look like this:
for pt in [31]:
    draw_slope_line_at_point(
        fig,
        x=np.arange(0, len(y)),
        y=y_value_at_point,
        slope_at_point=slope_at_point,
        ind=pt)
    # Assumes df.date is a datetime column, hence the .dt accessor
    fig.add_annotation(x=pt, y=y_value_at_point[pt],
                       text=f'''Slope: {slope_at_point[pt]:.2f}\t {df.date.dt.strftime('%Y-%m-%d')[pt]}''',
                       showarrow=True,
                       arrowhead=1)
The resulting figure shows the dashed orange tangent line at the chosen point, annotated with its slope.
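The tangent calculation itself does not depend on plotly or numpy. As a rough illustration, here is a standard-library sketch of the same idea: differentiate the polynomial, evaluate the slope at the chosen point, and use the point-slope form to get the two endpoints of the line. The helper names poly_eval, poly_derivative and tangent_endpoints are hypothetical, and the example polynomial is made up.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with coefficients in descending powers (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def poly_derivative(coeffs):
    """Coefficients of the first derivative, also in descending powers."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]

def tangent_endpoints(coeffs, x0, x_min, x_max):
    """Endpoints of the tangent line at x0, drawn across [x_min, x_max]."""
    y0 = poly_eval(coeffs, x0)
    slope = poly_eval(poly_derivative(coeffs), x0)
    # Point-slope form: y = y0 + slope * (x - x0)
    return (x_min, y0 + slope * (x_min - x0)), (x_max, y0 + slope * (x_max - x0))

# Hypothetical example: y = x^2 - 3x + 2, tangent at x0 = 2.
print(tangent_endpoints([1.0, -3.0, 2.0], 2.0, 0.0, 4.0))
```

The two returned points are what would be passed as x and y to a line trace.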

defining the X values for a code

I have this task to create a script that acts similarly to normcdf on matlab.
x=linspace(-5,5,1000); %values for x
p= 1/sqrt(2*pi) * exp((-x.^2)/2); % THE PDF for the standard normal
t=cumtrapz(x,p); % the CDF for the standard normal distribution
plot(x,t); %shows the graph of the CDF
The problem is that the t values are indexed 1:1000 instead of by x values from -5 to 5. How can I associate the correct x values, that is linspace(-5,5,1000), with the t output, so that looking up t at n gives the same result as normcdf(n)?
Just to clarify: I cannot simply write t(-5) and get the same result as normcdf(-5), because the values computed by cumtrapz are indexed 1:1000 instead of -5 to 5.
Updated answer
Ok, having read your comment; here is how to do what you want:
x = linspace(-5,5,1000);
p = 1/sqrt(2*pi) * exp((-x.^2)/2);
cdf = cumtrapz(x,p);
q = 3; % Query point
disp(normcdf(q)) % For reference
[~,I] = min(abs(x-q)); % Find closest index
disp(cdf(I)) % Show the value
Sadly, there is no MATLAB syntax that will do this nicely in one line, but if you move the closest-index search into a separate function, you can do this:
cdf(findClosest(x,q))
function I = findClosest(x,q)
if q>max(x) || q<min(x)
warning('q outside the range of x');
end
[~,I] = min(abs(x-q));
end
Also, if you are certain that the exact value of the query point q exists in x, you can just do
cdf(x==q);
But beware of floating-point errors. You may think that a certain range ought to contain a certain value, when in fact it differs by a tiny round-off error. You can see that in action here, for example:
x1 = linspace(0,1,1000); % Range
x2 = asin(sin(x1)); % Ought to be the same thing
plot((x1-x2)/eps); grid on; % But they differ by roughly 1 unit of machine precision
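The same pitfall can be shown in a few lines of Python (standard library only). This is a sketch, not part of the original answer: the helper name find_index and the tolerances are assumptions chosen for illustration. It relies on the fact that 0.1 * 3 is not exactly 0.3 in binary floating point.

```python
import math

def find_index(grid, q, rel_tol=1e-9, abs_tol=1e-12):
    """Return the index of the first grid value equal to q within a tolerance, or None."""
    for i, value in enumerate(grid):
        if math.isclose(value, q, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None

grid = [0.1 * k for k in range(11)]  # nominally 0.0, 0.1, ..., 1.0
print(0.3 in grid)                   # exact comparison
print(find_index(grid, 0.3))         # tolerant comparison
```

Comparing with a tolerance (or searching for the nearest grid point, as above) avoids depending on bit-for-bit equality.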
Old answer
As far as I can tell, running your code does reproduce the result of normcdf(x) well. If you want to do exactly what normcdf does, then use erfc.
close all; clear; clc;
x = linspace(-5,5,1000);
cdf = normcdf(x); % Result of normcdf for comparison
%% 1 Trapezoidal integration of the normal pdf
p = 1/sqrt(2*pi) * exp((-x.^2)/2);
cdf1 = cumtrapz(x,p);
%% 2 But the error function IS the integral of the normal pdf (up to scaling)
cdf2 = (1+erf(x/sqrt(2)))/2;
%% 3 Or, even better, use the error function complement (works better for large negative x)
cdf3 = erfc(-x/sqrt(2))/2;
fprintf('1: Mean error = %.2d\n',mean(abs(cdf1-cdf)));
fprintf('2: Mean error = %.2d\n',mean(abs(cdf2-cdf)));
fprintf('3: Mean error = %.2d\n',mean(abs(cdf3-cdf)));
plot(x,cdf1,x,cdf2,x,cdf3,x,cdf,'k--');
This gives me
1: Mean error = 7.83e-07
2: Mean error = 1.41e-17
3: Mean error = 00 <- Because that is literally what normcdf is doing
If your goal is not to use predefined MATLAB functions, but instead to calculate the result numerically (i.e. calculate the error function yourself), then it's an interesting challenge, which you can read about for example here or in this stats stackexchange post. Just as an example, the following piece of code calculates the error function by implementing eq. 2 from the first link:
nerf = @(x,n) (-1)^n*2/sqrt(pi)*x.^(2*n+1)./factorial(n)/(2*n+1);
figure(1); hold on;
temp = zeros(size(x)); p =[];
for n = 0:20
temp = temp + nerf(x/sqrt(2),n);
if~mod(n,3)
p(end+1) = plot(x,(1+temp)/2);
end
end
ylim([-1,2]);
title('\Sigma_{n=0}^{inf} ( 2/sqrt(pi) ) \times ( (-1)^n x^{2*n+1} ) \div ( n! (2*n+1) )');
p(end+1) = plot(x,cdf,'k--');
legend(p,'n = 0','\Sigma_{n} 0->3','\Sigma_{n} 0->6','\Sigma_{n} 0->9',...
'\Sigma_{n} 0->12','\Sigma_{n} 0->15','\Sigma_{n} 0->18','normcdf(x)',...
'location','southeast');
grid on; box on;
xlabel('x'); ylabel('norm. cdf approximations');
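For comparison, the same Maclaurin series can be written as a short Python sketch using only the standard library. The function names erf_series and normcdf_series are hypothetical, and the default of 30 terms is an arbitrary choice; the partial sums converge quickly only for moderate |x|, which is also what the MATLAB plot illustrates.

```python
import math

def erf_series(z, n_terms):
    """Partial sum of the Maclaurin series of erf(z) using n_terms terms."""
    total = 0.0
    for n in range(n_terms):
        total += (-1) ** n * z ** (2 * n + 1) / (math.factorial(n) * (2 * n + 1))
    return 2.0 / math.sqrt(math.pi) * total

def normcdf_series(x, n_terms=30):
    """Standard normal CDF via the series; accurate only for moderate |x|."""
    return 0.5 * (1.0 + erf_series(x / math.sqrt(2.0), n_terms))

# Compare against the erfc-based formula from the answer above.
for x in (-1.0, 0.0, 1.0):
    reference = 0.5 * math.erfc(-x / math.sqrt(2.0))
    print(x, normcdf_series(x), reference)
```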
Marcin's answer suggests a way to find the nearest sample point. It is easier, IMO, to interpolate. Given x and t as defined in the question,
interp1(x,t,n)
returns the estimated value of the CDF at x==n, for any n within the computed range. But note that, for values outside that range, interp1 with the default linear method returns NaN; you can request extrapolation with the 'extrap' option, but extrapolated values are unreliable.
You can define an anonymous function that works like normcdf:
my_normcdf = @(n)interp1(x,t,n);
my_normcdf(-5)
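The interpolation approach can also be sketched in plain Python, which makes each step explicit: build the grid, integrate the pdf cumulatively with the trapezoidal rule, and interpolate linearly at the query point. The helpers linspace, cumtrapz and interp_linear below are hand-written stand-ins for the MATLAB functions of the same purpose, not library calls, and returning NaN outside the grid is a design choice made for this sketch.

```python
import math

def linspace(a, b, n):
    """n evenly spaced points from a to b inclusive."""
    step = (b - a) / (n - 1)
    pts = [a + i * step for i in range(n)]
    pts[-1] = b  # avoid round-off in the last point
    return pts

def cumtrapz(x, y):
    """Cumulative trapezoidal integral of y over x, starting at 0."""
    out = [0.0]
    for i in range(1, len(x)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1]))
    return out

def interp_linear(x, t, q):
    """Linearly interpolate t(x) at q; returns NaN outside the grid."""
    if q < x[0] or q > x[-1]:
        return float("nan")
    for i in range(1, len(x)):
        if q <= x[i]:
            w = (q - x[i - 1]) / (x[i] - x[i - 1])
            return t[i - 1] + w * (t[i] - t[i - 1])
    return t[-1]

x = linspace(-5.0, 5.0, 1000)
p = [math.exp(-xi * xi / 2.0) / math.sqrt(2.0 * math.pi) for xi in x]
t = cumtrapz(x, p)

def my_normcdf(q):
    """Approximate standard normal CDF on [-5, 5]."""
    return interp_linear(x, t, q)

print(my_normcdf(0.0), my_normcdf(1.0))
```

Because the integration starts at -5 rather than at minus infinity, the result is offset from the true CDF by the (very small) probability mass below -5.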
Try replacing x with the scalar spacing when you call cumtrapz (here 10/999, roughly 0.01). cumtrapz accepts either a coordinate vector or a scalar spacing (https://www.mathworks.com/help/matlab/ref/cumtrapz.html), and this might solve your problem. Also, have you checked the original x values? Is the problem with linspace (i.e. you are not getting the correct x vector), or with cumtrapz?

Error: Assignment has more non-singleton rhs dimensions than non-singleton subscripts

I have a big problem with part of my code, and I have spent many hours trying to understand how to solve it. I have the following .m files and, as the title of my question says, running main.m produces the error "Assignment has more non-singleton rhs dimensions than non-singleton subscripts".
So, the files are:
computeCost.m
function J = computeCost(X, y, theta)
m = length(y); % number of training examples
J(m,1) = 0;
for k=1:m
J(:,1) = ((X*theta)-y).^2;
end
%syms k;
%S = symsum(((X*theta)-y).^2,k,1,m);
%J = (1/(2*m))*S;
J(m,1) = (1/(2*m))*J(m,1);
gradientDescent.m
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
J_history(iter,1) = computeCost(X, y, theta); % HERE IS THE ERROR!!! (1st case)
end
end
Before writing the code above, I had the following:
computeCost.m
function J = computeCost(X, y, theta)
m = length(y); % number of training examples
J = 0;
for k=1:m
J(:) = ((X(k)*theta(k))-y(k)).^2;
end
%syms k;
%S = symsum(((X*theta)-y).^2,k,1,m);
%J = (1/(2*m))*S;
J(m) = (1/(2*m))*J(m);
gradientDescent.m
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
J_history(iter) = computeCost(X, y, theta); % HERE IS THE ERROR (2nd case)
end
end
With the latter code I was facing another error: "in an assignment a( ) = b the number of elements in a and b must be the same.". So I made the changes I thought were necessary and arrived at the code I presented first. I do not know which of the two approaches is better.
My task: to complete the code in the file computeCost.m, which is a function that computes J(theta).
Hint: The variables X and y are not scalar values, but matrices whose rows represent the examples from the training set. Also, the gradientDescent.m file is executed after computeCost.m. Assume that the parameters, functions and all other unknown data mentioned are read from a separate .txt file.
I am desperate and would appreciate it if someone could fix my code (and my problem). What do I have to do?
Thank you in advance!
J_history(:,iter) = computeCost(X, y, theta); should do the trick. You are assigning a column vector, output of computeCost, to a variable J_history. This means the storage variable has to be accessed column-wise, provided computeCost always outputs the same number of columns. If that is not the case you'll have to look into cell arrays.
This error occurs because you are trying to store something that has more dimensions than the storage container. See it like you are trying to store your 24 beer bottles in a crate for 6 bottles, that won't fit. (technically the 6 need to be on one line to fit the example, but it'll do).
So, what you do when you see this error is:
Run dbstop if error so that execution pauses where the error occurs
Check the size of the thing you are storing
Check the size of the container
Match the two if possible, if not, think of another solution.
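The commented-out lines in the question suggest that the intended cost is J = (1/(2*m)) * sum(((X*theta) - y).^2), which is a single number. If computeCost returns that scalar, each J_history(iter) slot receives exactly one element and the sizes match. Below is a minimal Python sketch of that reading; it is an assumption about the assignment, since the full task statement is not shown, and the data are made up.

```python
def compute_cost(X, y, theta):
    """Return the scalar cost J(theta) = 1/(2m) * sum((X*theta - y)^2)."""
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        # Prediction for one training example: dot product of its row with theta
        prediction = sum(x_j * t_j for x_j, t_j in zip(row, theta))
        total += (prediction - target) ** 2
    return total / (2 * m)

# Hypothetical training set: a column of ones plus one feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

# One scalar per parameter vector, so the history is a flat list of numbers.
J_history = [compute_cost(X, y, theta) for theta in ([0.0, 0.0], [0.0, 1.0])]
print(J_history)
```

Checking the size of the value against the size of the slot it is stored in, as the steps above recommend, would reveal the mismatch in the original code: a vector was being assigned to a single element.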

MATLAB Plotting Inner Matrix elements must agree

I'm trying to plot 4 different subplots with different increments: first dx=5, then dx=1, dx=0.1 and dx=0.01, for 0<=x<=20.
I tried this:
%for dx = 5
x = 0:5:20;
fx = 2*pi*x *sin(x^2)
plot(x,fx)
However, I get the error "inner matrix elements must agree". Then I tried this:
x = 0:5:20
fx = (2*pi).*x.*sin(x.^2)
plot(x,fx)
I get a figure, but I'm not entirely sure if this would be the same as what I am trying to do initially. Is this correct?
The initial error arose because ^ and * are matrix operations: a row vector cannot be squared as a matrix (x^2), and two vectors of the same shape cannot be matrix-multiplied (x * sin(x^2)). Adding the . before the * and ^ operators is correct here, since that performs the operation on the individual elements of the vectors. So yes, this is correct.
Also, bit of a more advanced feature, you can use an anonymous function to aid in the expressions:
fx = @(x) 2*pi.*x.*sin(x.^2); % function of x
x = 0:5:20;
plot(x,fx(x));
hold('on');
x = 0:1:20;
plot(x,fx(x));
hold('off');
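To see what "element-wise" means here, and how the four step sizes change the sampling, the following Python sketch evaluates the same function point by point using only the standard library. The function name sample is hypothetical; building the grid as i*dx avoids accumulating round-off from repeated addition.

```python
import math

def sample(dx, x_max=20.0):
    """Return x = 0, dx, 2*dx, ... up to x_max and f(x) = 2*pi*x*sin(x^2) at each x."""
    n = int(round(x_max / dx))
    xs = [i * dx for i in range(n + 1)]
    # Each output value depends only on the matching input value (element-wise)
    fx = [2.0 * math.pi * x * math.sin(x ** 2) for x in xs]
    return xs, fx

for dx in (5, 1, 0.1, 0.01):
    xs, fx = sample(dx)
    print(dx, len(xs), max(fx))
```

With dx=5 only five points are evaluated, so the plotted curve is a very coarse picture of a rapidly oscillating function; the smaller steps resolve far more of its shape.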

Nonlinear parameters search

I need to find a set of optimal parameters P of the system y = P(1)*exp(-P(2)*x) - P(3)*x, where x and y are experimental values. I defined my function
f = @(P) P(1)*exp(-P(2)*x) - P(3)*x
and
guess = [1, 1, 1]
and tried
P = fminsearch(f,guess)
according to Help. I get an error
Subscripted assignment dimension mismatch.
Error in fminsearch (line 191)
fv(:,1) = funfcn(x,varargin{:});
I don't quite understand where my y values would fall in, as well as where the function takes P from. I unfortunately have no access to nlinfit or optimization toolboxes.
You should try the MATLAB function lsqnonlin(@testfun,[1;1;1]) (note that lsqnonlin is part of the Optimization Toolbox).
But first write a function, saved in an m-file, that includes all the data points. Let's say your y is A and your x is x, as shown below:
function F = testfun(P)
A = [1;2;3;7;30;100];
x = [1;2;3;4;5;6];
F = A - (P(1)*exp(-P(2)*x) - P(3)*x); % residual: data minus model
This minimizes the 2-norm of the residual and gives you the best parameters.
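The fminsearch error in the question arises because the function handle returns a vector (one value per data point), whereas fminsearch needs a single number to minimize. The measured y values enter through that number: the objective is the sum of squared differences between data and model, in MATLAB something like @(P) sum((y - (P(1)*exp(-P(2)*x) - P(3)*x)).^2). The Python sketch below shows such a scalar objective; the names model and sum_squared_error and the synthetic data are assumptions made for illustration.

```python
import math

def model(P, x):
    """y = P[0]*exp(-P[1]*x) - P[2]*x (zero-based version of the model in the question)."""
    return P[0] * math.exp(-P[1] * x) - P[2] * x

def sum_squared_error(P, xs, ys):
    """Scalar objective: sum over the data of (measured - model)^2."""
    return sum((y - model(P, x)) ** 2 for x, y in zip(xs, ys))

# Synthetic data generated from known parameters, for illustration only.
true_P = [2.0, 0.5, 0.1]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [model(true_P, x) for x in xs]

print(sum_squared_error(true_P, xs, ys))           # objective at the generating parameters
print(sum_squared_error([1.0, 1.0, 1.0], xs, ys))  # objective at the initial guess
```

A derivative-free minimizer such as fminsearch then only has to vary P to drive this one number down, starting from the initial guess.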