How can I get all solutions to this equation in MATLAB?

I would like to solve the following equation: tan(x) = 1/x
What I did:
syms x
eq = tan(x) == 1/x;
sol = solve(eq,x)
But this gives me only one numerical approximation of the solution. After that I read about the following:
[sol, params, conds] = solve(eq, x, 'ReturnConditions', true)
But this tells me that it can't find an explicit solution.
How can I find numerical solutions to this equation within some given range?

I've never liked using solvers "blindly", that is, without some sort of decent initial value selection scheme. In my experience, the values you find when doing things blindly will be without context as well: you'll often miss solutions, think something is a solution while in reality the solver exploded, etc.
For this particular case, it is important to realize that fzero uses numerical derivatives to find increasingly better approximations. But the derivatives of f(x) = x · tan(x) - 1 become increasingly difficult to compute accurately as x grows: near each asymptote of tan(x), f(x) approximates a vertical line, and fzero will simply explode! Therefore it is imperative to get an estimate as close to the solution as possible before even entering fzero.
So, here's a way to get good initial values.
Consider the function
f(x) = x · tan(x) - 1
Knowing that tan(x) has Taylor expansion:
tan(x) ≈ x + (1/3)·x³ + (2/15)·x⁵ + (7/315)·x⁷ + ...
we can use that to approximate the function f(x). Truncating after the second term, we can write:
f(x) ≈ x · (x + (1/3)·x³) - 1
Now, the key thing to realize is that tan(x) repeats with period π. Therefore, it is most useful to consider the family of functions:
fₙ(x) ≈ x · ( (x - n·π) + (1/3)·(x - n·π)³) - 1
Evaluating this for a couple of multiples and collecting terms gives the following generalization:
f₀(x) = x⁴/3 - 0π·x³ + ( 0π² + 1)x² - (0π + (0π³)/3)·x - 1
f₁(x) = x⁴/3 - 1π·x³ + ( 1π² + 1)x² - (1π + (1π³)/3)·x - 1
f₂(x) = x⁴/3 - 2π·x³ + ( 4π² + 1)x² - (2π + (8π³)/3)·x - 1
f₃(x) = x⁴/3 - 3π·x³ + ( 9π² + 1)x² - (3π + (27π³)/3)·x - 1
f₄(x) = x⁴/3 - 4π·x³ + (16π² + 1)x² - (4π + (64π³)/3)·x - 1
⋮
fₙ(x) = x⁴/3 - nπ·x³ + (n²π² + 1)x² - (nπ + (n³π³)/3)·x - 1
Implementing all this in a simple MATLAB test:
% Replace this with the whole number of pi's you want to
% use as offset
n = 5;
% The coefficients of the approximating polynomial for this offset
C = @(npi) [ 1/3
            -npi
             npi^2 + 1
            -npi - npi^3/3
            -1];
% Find the real, positive polynomial roots
R = roots(C(n*pi));
R = R(imag(R)==0);
R = R(R > 0);
% And use these as initial values for fzero()
x_npi = fzero(@(x) x.*tan(x) - 1, R)
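For reference, a loop version of the same idea (a sketch; the range nMax and the root-selection rule are my assumptions, not part of the original code):
% Sketch: tabulate polynomial estimate vs. fzero refinement for n = 0..nMax
nMax = 11;
C = @(npi) [1/3; -npi; npi^2 + 1; -npi - npi^3/3; -1];
f = @(x) x.*tan(x) - 1;
tbl = zeros(nMax+1, 2);
for n = 0:nMax
    R = roots(C(n*pi));
    R = R(imag(R) == 0);
    R = R(R > 0);
    R = min(R(R > n*pi)); % assumed: the admissible root just above n*pi
    tbl(n+1, :) = [R, fzero(f, R)];
end
disp(tbl)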
Looping over n like this produces the following table:
%  n     Estimate (polynomial)     Solution (fzero)      (offset = n·π)
   0     0.889543617524132         0.860333589019380
   1     3.425836967935954         3.425618459481728
   2     6.437309348195653         6.437298179171947
   3     9.529336042900365         9.529334405361963
   4    12.645287627956868        12.645287223856643
   5    15.771285009691695        15.771284874815882
   6    18.902410011613000        18.902409956860023
   7    22.036496753426441        22.036496727938566
   8    25.172446339768143        25.172446326646664
   9    28.309642861751708        28.309642854452012
  10    31.447714641852869        31.447714637546234
  11    34.586424217960058        34.586424215288922
As you can see, the approximant is practically equal to the solution.

To find a numerical solution to a function within some range, you can use fzero like this:
fun = @(x)x.*tan(x)-1; % Multiplied by x so fzero has no issue evaluating it at x=0.
range = [0 pi/2];
sol = fzero(fun,range);
The above would return just one solution (0.8603). If you want additional solutions, you will have to call fzero more times. This can be done, for example, in a loop:
fun = @(x)tan(x)-1./x;
RANGE_START = 0;
RANGE_END = 3*pi;
RANGE_STEP = pi/2;
intervals = repelem(RANGE_START:RANGE_STEP:RANGE_END,2);
intervals = reshape(intervals(2:end-1),2,[]).';
sol = NaN(size(intervals,1),1);
for ind1 = 1:numel(sol)
sol(ind1) = fzero(fun, mean(intervals(ind1,:)));
end
sol = sol(~isnan(sol)); % In case you specified more intervals than solutions.
Which gives:
[0.86033358901938;
1.57079632679490; % Wrong
3.42561845948173;
4.71238898038469; % Wrong
6.43729817917195;
7.85398163397449] % Wrong
Note that:
The function is odd, so its roots are symmetric about zero. This means you can solve on just the positive interval (for example) and get the negative roots "for free".
Every other entry in sol is wrong, because those are the points of asymptotic discontinuity (where tan transitions from +Inf to -Inf), which fzero mistakenly reports as solutions. So you can just ignore them (e.g., sol = sol(1:2:end);), or filter by residual as sketched below.
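Alternatively, instead of relying on the spurious entries' regular positions, you can filter by residual (a sketch; the tolerance 1e-6 is an arbitrary choice):
g = @(x) x.*tan(x) - 1; % same roots as tan(x) - 1/x, but continuous at x = 0
sol = sol(abs(g(sol)) < 1e-6); % the asymptote artifacts have huge residuals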

Multiply the equation by x and cos(x) to avoid any denominators that can take the value 0:
f(x) = x*sin(x) - cos(x) = 0
Consider the normalized function
h(x)=(x*sin(x)-cos(x)) / (abs(x)+1)
For large x this will be increasingly close to sin(x) (or -sin(x) for large negative x). Indeed, plotting h(x), this is already visually true, up to an amplitude factor, for x > pi.
For the first root in [0, pi/2], use the second-degree Taylor approximation of f at x = 0, x^2 - (1 - 0.5*x^2) = 0, to get x[0] = sqrt(2/3) as the root approximation. For the higher roots, take the sine roots x[n] = n*pi, n = 1, 2, 3, ... as initial approximations in the Newton iteration xnext = x - f(x)/f'(x) to get the table below.
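A minimal sketch of that iteration (the fixed step count is my assumption; in practice you would iterate until the update falls below a tolerance):
f = @(x) x.*sin(x) - cos(x);
df = @(x) 2*sin(x) + x.*cos(x); % f'(x)
x0 = [sqrt(2/3), (1:11)*pi]; % initial approximations
x = x0;
for k = 1:10 % assumed: 10 Newton steps are plenty here
    x = x - f(x)./df(x);
end
disp([x0(:), x(:)]) % initial value vs. converged root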
 n   initial               1st Newton step       limit of Newton
 0   0.816496580927726     0.863034004302817     0.860333589019380
 1   3.141592653589793     3.336084918413964     3.425618459480901
 2   6.283185307179586     6.403911810682199     6.437298179171945
 3   9.424777960769379     9.512307014150883     9.529334405361963
 4   12.566370614359172    12.635021895208379    12.645287223856643
 5   15.707963267948966    15.764435036320542    15.771284874815882
 6   18.849555921538759    18.897518573777646    18.902409956860023
 7   21.991148575128552    22.032830614521892    22.036496727938566
 8   25.132741228718345    25.169597069842926    25.172446326646664
 9   28.274333882308138    28.307365162331923    28.309642854452012
10   31.415926535897931    31.445852385744583    31.447714637546234
11   34.557519189487721    34.584873343220551    34.586424215288922

Related

Finding the roots of a polynomial defined as a function handle in matlab

I need to be able to find the roots of a couple of polynomials that are almost characteristic functions, but not quite (rather than an eigenvalue, it's more like an eigen-block matrix). The function is defined as a function handle because I don't have analytic expressions for the coefficients of the equation (I could presumably find them, but the algebra in my model is already quite messy).
The equations are
charA = @(k1,k2) det([k1*eye(N),zeros(N);zeros(N),k2*eye(N)]*Amatrix - eye(2*N));
charB = @(k1,k2) det([k1*eye(N),zeros(N);zeros(N),k2*eye(N)]*Bmatrix - eye(2*N));
and I need to find all of the roots of each one, since the solution of the system is the pair (k1,k2) that satisfies charA(k1,k2)=0 and charB(k1,k2)=0 (at the moment I'm just trusting that the derivations of these matrices are such that such a solution exists, but for the purpose of this question - finding all of the roots of a polynomial defined in this sort of way - this is unimportant).
Is there any way I can take this function handle and turn it into a matrix of coefficients, or is there a solver for polynomials defined as function handles in Matlab? If it changes anything, the matrices aren't massive but they are 84x84, that is, N=42.
As Daniel said:
use the Symbolic Math Toolbox to build the polynomials
then use solve() to find the roots
Sample code, with N = 2, is as follows:
rng(0)
%Unknowns
syms k1 k2
N = 2;
Amatrix = rand(2*N);
Bmatrix = rand(2*N);
charA = det([k1*eye(N),zeros(N);zeros(N),k2*eye(N)]*Amatrix - eye(2*N));
charB = det([k1*eye(N),zeros(N);zeros(N),k2*eye(N)]*Bmatrix - eye(2*N));
% solver
solution = solve(charA == 0, charB == 0);
% Convert syms to numeric, displaying 3 significant digits
k1_solution = vpa(solution.k1, 3)
k2_solution = vpa(solution.k2, 3)
% Only real solution
k1_solution_real = vpa(k1_solution(k1_solution == real(k1_solution)), 3)
k2_solution_real = vpa(k2_solution(k2_solution == real(k2_solution)), 3)
Solution
All
k1_solution =
0.475
-2.52
0.0161
-1.58 + 1.79i
-1.58 - 1.79i
-2.0 - 0.863i
-2.0 + 0.863i
11.2
k2_solution =
0.345
-0.869
0.946
-1.37 + 0.0219i
-1.37 - 0.0219i
1.69 + 3.24i
1.69 - 3.24i
-5.65
Real only
k1_solution_real =
0.475
-2.52
0.0161
11.2
k2_solution_real =
0.345
-0.869
0.946
-5.65
First root solution
k1 = 0.475 and k2 = 0.345

Optimization of matrix on matlab using fmincon

I have a 30x30 matrix as a base matrix (OD_b1). I also have two base vectors (bg and Ag). My aim is to optimize a matrix (X) whose dimensions are 30x30 such that:
1) the squared difference between vector bg and the vector formed by summing all the columns of X is minimized,
2) the squared difference between vector Ag and the vector formed by summing all the rows of X is minimized,
3) the squared difference between the elements of matrix X and matrix OD_b1 is minimized.
I have tried this:
fun=@(X)transpose(bg-sum(X,2))*(bg-sum(X,2)) + (Ag-sum(X,1))*transpose(Ag-sum(X,1)) + sumsqr(X_b-X);
[val,X]=fmincon(fun,OD_b1,AA,BB,Aeq,beq,LB,UB)
I don't get errors, but it seems like it's stuck.
Is it because I have too many variables, or is there another reason?
Thanks in advance
This is a simple, unconstrained least squares problem and hence has a simple solution that can be expressed as the solution to a linear system.
I will show you (1) the precise and efficient way to solve this and (2) how to solve with fmincon.
The precise, efficient solution:
Problem setup
Just so we're on the same page, I initialize the variables as follows:
n = 30;
Ag = randn(n, 1); % observe the dimensions
X_b = randn(n, n);
bg = randn(n, 1);
The code:
A1 = kron(ones(1,n), eye(n));
A2 = kron(eye(n), ones(1,n));
A = (A1'*A1 + A2'*A2 + eye(n^2));
b = A1'*bg + A2'*Ag + X_b(:);
x = A \ b; % solves A*x = b
Xstar = reshape(x, n, n);
Why it works:
I first reformulated your problem so that the unknown is a vector x, not a matrix X. Observe that z = bg - sum(X,2) is equivalent to:
x = X(:) % vectorize X
A1 = kron(ones(1,n), eye(n)); % creates a special matrix that sums up
% stuff appropriately
z = A1*x;
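A quick numeric sanity check of this identity (a sketch with an arbitrary test matrix):
n = 4; X = magic(n); x = X(:);
A1 = kron(ones(1,n), eye(n));
A2 = kron(eye(n), ones(1,n));
norm(A1*x - sum(X,2)) % ~0: A1*x reproduces the row sums sum(X,2)
norm(A2*x - sum(X,1)') % ~0: A2*x reproduces the column sums sum(X,1)'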
Similarly, A2 is set up so that A2*x is equivalent to sum(X,1)'. Your problem is then equivalent to:
minimize (over x) (bg - A1*x)'*(bg - A1*x) + (Ag - A2*x)'*(Ag - A2*x) + (y - x)'*(y - x), where y = X_b(:), i.e. y is a vectorized version of X_b.
This problem is convex, and the first-order condition is a necessary and sufficient condition for the optimum. Take the derivative with respect to x and set it to zero; that equation defines your solution. Sample math for an almost equivalent (but slightly simpler) problem:
minimize (over x) (b - A*x)'*(b - A*x) + (y - x)'*(y - x)
Rewriting the objective:
b'*b - 2*b'*A*x + x'*A'*A*x + y'*y - 2*y'*x + x'*x
Dropping the constant terms b'*b and y'*y, this is equivalent to:
minimize (over x) -2*(b'*A + y')*x + x'*(A'*A + I)*x
The first-order condition is:
2*(A'*A + I)*x - 2*A'*b - 2*y = 0
(A'*A + I)*x = A'*b + y
Your problem is essentially the same. It has the first order condition:
(A1'*A1 + A2'*A2 + I)*x = A1'*bg + A2'*Ag + y
How to solve with fmincon
You can do the following:
f = @(X) transpose(bg-sum(X,2))*(bg-sum(X,2)) + (Ag'-sum(X,1))*transpose(Ag'-sum(X,1)) + sum(sum((X_b-X).^2));
o = optimoptions('fmincon');
o.MaxFunEvals = 30000;
Xstar2 = fmincon(f,zeros(n,n),[],[],[],[],[],[],[],o);
You can then check the answers are about the same with:
normdif = norm(Xstar - Xstar2)
And you can see that gap is small, but that the linear algebra based solution is somewhat more precise:
gap = f(Xstar2) - f(Xstar)
If the fmincon approach hangs, try it with a smaller n just to gain confidence that the linear algebra based solution is more precise and far, far faster. n = 30 means solving a 30^2 = 900 variable optimization problem: not easy. With the linear algebra approach, you can go up to n = 100 (i.e. a 10000 variable problem) or even larger.
I would probably solve this as a QP using quadprog, with the following reformulation (keeping the objective as simple as possible makes the problem "less nonlinear"):
min sum(i, v(i)^2) + sum(i, w(i)^2) + sum((i,j), z(i,j)^2)
s.t. v = bg - (column sums of x)
     w = Ag - (row sums of x)
     z = X_b - x
The QP solver is more precise (no gradients from finite differences), and this approach also allows you to add bounds and further linear equality and inequality constraints.
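For this particular problem, a condensed quadprog version (a sketch reusing A1, A2, bg, Ag, X_b from the linear-algebra answer above, rather than the slack-variable formulation) could be:
% quadprog minimizes 0.5*x'*H*x + f'*x, hence the factors of 2
H = 2*(A1'*A1 + A2'*A2 + eye(n^2));
f = -2*(A1'*bg + A2'*Ag + X_b(:));
x = quadprog(H, f); % add bounds / linear constraints here as needed
Xstar3 = reshape(x, n, n);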
The other suggestion to form the first order conditions explicitly is also a good one: it also has no issue with imprecise gradients (the first order conditions are linear). I usually prefer a quadratic model because of its flexibility.

A double symbolic integral in Matlab

I am trying to calculate some integrals, for example:
syms x y
a = 1/sqrt(2);
b = -5;
c = 62;
d = 1;
f = exp(-x^2-y^2)*(erfc((sym(a) + 1/(x^2+y^2)*(sym(b)*x+sym(d)*y))*sqrt((x^2+y^2)*sym(10.^(c/10))))...
+ erfc((sym(a) - 1/(x^2+y^2)*(sym(b)*x+sym(d)*y))*sqrt((x^2+y^2)*sym(10.^(c/10)))));
h = int(int(f,x,-Inf,Inf),y,-Inf,Inf);
This produces the following warning:
Warning: Explicit integral could not be found.
Then I tried to use vpa to calculate that integral, and got a result like this:
vpa(int(int(f,x,-Inf,Inf),y,-Inf,Inf),5)
numeric::int(numeric::int(exp(- x^2 - y^2)*(erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 + (5*x - y)/(x^2 + y^2))) + erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 - (5*x - y)/(x^2 + y^2)))), x == -Inf..Inf), y == -Inf..Inf)
I already tried changing the interval [-Inf,Inf] to [-100,100], and got the same kind of result:
numeric::int(numeric::int(exp(- x^2 - y^2)*(erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 + (5*x - y)/(x^2 + y^2))) + erfc(((6807064429273519*x^2)/4294967296 + (6807064429273519*y^2)/4294967296)^(1/2)*(2^(1/2)/2 - (5*x - y)/(x^2 + y^2)))), x == -100..100), y == -100..100)
My question is: why does vpa in this case not return a numeric value?
Is there something wrong in the above MATLAB code? (I myself could not find the bug so far.)
Thank you in advance for your help.
It is unlikely that there is an analytic solution to this integral, so using int may not be a good choice. In some cases vpa can be used for a numeric solution. When this fails (by returning a call to itself), it may be for several reasons: the integral may not exist, it may converge too slowly, singularities may cause issues, the integrand may be highly oscillatory or non-smooth, etc. Mathematica also struggles with this integral.
You can try calculating the integral numerically using integral2:
a = 1/sqrt(2);
b = -5;
c = 62;
d = 1;
f = #(x,y)exp(-x.^2-y.^2).*(erfc((a + 1./(x.^2+y.^2).*(b*x+d*y)).*sqrt((x.^2+y.^2)*10^(c/10)))...
+erfc((a - 1./(x.^2+y.^2).*(b*x+d*y)).*sqrt((x.^2+y.^2)*10^(c/10))));
h = integral2(f,-Inf,Inf,-Inf,Inf)
which returns 5.790631184403967. This compares well with Mathematica's numerical integration using NIntegrate. You can try specifying smaller absolute and relative tolerances for integral2 to get more accurate values, but this will result in much slower compute times.
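For example (the tolerance values here are just illustrative):
h = integral2(f, -Inf, Inf, -Inf, Inf, 'AbsTol', 1e-12, 'RelTol', 1e-9)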

weighted sum of matrices optimization

I'm a beginner in optimization and welcome any guidance in this field.
I have 15 matrices (Di, each of size n*m) and want to find the best weights (wi) for a weighted average of them, to build a matrix that is as similar as possible to a given matrix (Dt).
In fact my objective function is like it:
min [norm2(sum_i(wi * Di) - Dt) + norm2(w)]
s.t. sum(wi) = 1, wi >= 0, i = 1 ... 15
How can I optimize this function in Matlab?
You are describing a simple quadratic programming problem that can be easily optimized using Matlab's quadprog.
Here is how it goes:
Your objective function is [norm2(sum(wi * Di) - Dt) + norm2(w)] subject to some linear constraints on w. Let's re-write it using simplified notation. Let w be a 15-by-1 vector of unknowns. Let D be an n*m-by-15 matrix (each column is one of your Di matrices, written as a single column), and let Dt be an n*m-by-1 vector (your Dt written as a column vector). Now some linear algebra (using the facts that ||x||^2 = x'*x and that minimizing ||x|| is equivalent to minimizing ||x||^2):
[norm2(sum(wi * Di) - Dt)^2 + norm2(w)^2] =
(D*w-Dt)'*(D*w-Dt) + w'*w =
w'D'Dw - 2w'D'Dt + Dt'Dt + w'w =
w'(D'D+I)w - 2w'D'Dt + Dt'Dt
The last term Dt'Dt is constant w.r.t w and therefore can be discarded during minimization, leaving you with
H = 2*(D'*D+eye(15));
f = -2*D'*Dt; % column vector, as quadprog expects
As for the constraint sum(w)=1, this can easily be defined by
Aeq = ones(1,15);
beq = 1;
And a lower bound lb = zeros(15,1) will ensure that all w_i>=0.
And the quadratic optimization:
w = quadprog( H, f, [], [], Aeq, beq, lb );
Should do the trick for you!
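For completeness, assembling D and Dt from your matrices could look like this (a sketch; storing the 15 matrices in a 1-by-15 cell array Di, and calling the given matrix Dt_mat, are my assumptions):
D = cell2mat(cellfun(@(M) M(:), Di, 'UniformOutput', false)); % n*m-by-15
Dt = Dt_mat(:); % n*m-by-1
% ... run quadprog as above, then reconstruct the weighted average:
Dhat = reshape(D*w, n, m);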

Avoiding numerical overflow when calculating the value AND gradient of the Logistic loss function

I am currently trying to implement a machine learning algorithm that involves the logistic loss function in MATLAB. Unfortunately, I am having some trouble due to numerical overflow.
In general, for a given input s, the value of the logistic loss function is:
log(1 + exp(s))
and the slope of the logistic loss function is:
exp(s)./(1 + exp(s)) = 1./(1 + exp(-s))
In my algorithm, the value of s = X*beta. Here X is a matrix with N data points and P features per data point (i.e. size(X)=[N,P]) and beta is a vector of P coefficients for each feature such that size(beta)=[P 1].
I am specifically interested in calculating the average value and gradient of the Logistic function for given value of beta.
The average value of the Logistic function w.r.t to a value of beta is:
L = 1/N * sum(log(1+exp(X*beta)),1)
The average value of the slope of the Logistic function w.r.t. to a value of b is:
dL = 1/N * sum((exp(X*beta)./(1+exp(X*beta))' X, 1)'
Note that size(dL) = [P 1].
My issue is that these expressions keep producing numerical overflows. The problem effectively comes from the fact that exp(s) overflows to Inf for s greater than about 710, and underflows to 0 for s less than about -745.
I am looking for a solution such that s can take on any value in floating point arithmetic. Ideally, I would also really appreciate a solution that allows me to evaluate the value and gradient in a vectorized / efficient way.
How about the following approximations:
– For computing L, if s is large, then exp(s) will be much larger than 1:
1 + exp(s) ≅ exp(s)
and consequently
log(1 + exp(s)) ≅ log(exp(s)) = s.
If s is very negative, exp(s) underflows and 1 + exp(s) is evaluated as 1, so the computed log is 0. But since, by the Taylor series of log(),
log(1 + t) ≅ t for small t,
the true value is approximately
log(1 + exp(s)) ≅ exp(s).
– For computing dL, for large s
exp(s) ./ (1 + exp(s)) ≅ 1
and for small s
exp(s) ./ (1 + exp(s)) ≅ 1/2 + s / 4.
– The code to compute L could look for example like this:
s = X*beta;
l = log(1+exp(s));
ind = isinf(l); % overflow: exp(s) = Inf for large s
l(ind) = s(ind);
ind = (l == 0); % underflow: 1 + exp(s) evaluated as 1 for very negative s
l(ind) = exp(s(ind));
L = 1/N * sum(l,1)
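– For dL, note that the equivalent form 1./(1 + exp(-s)) from the question is already safe at both extremes (exp(-s) = Inf yields 0, and exp(-s) = 0 yields 1), so a sketch could be simply:
s = X*beta;
g = 1 ./ (1 + exp(-s)); % elementwise slope, no overflow possible
dL = 1/N * (X' * g); % P-by-1 average gradient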
I found a good article about this problem.
Cutting through a lot of words, we can simplify the argument to stating that the original expression
log(1 + exp(s))
can be rewritten as
log(exp(s)*(exp(-s) + 1))
= log(exp(s)) + log(exp(-s) + 1)
= s + log(exp(-s) + 1)
This stops overflow from occurring - it doesn't prevent underflow, but by the time that occurs, you have your answer (namely, s). You can't just use this form instead of the original everywhere, though, since for negative s the exp(-s) term will overflow instead. However, we now have the basis for a function that can be written to be accurate and won't produce over/underflow:
function LL = logistic(s)
if s <= 0
    LL = log(1 + exp(s)); % exp(s) <= 1 here, so no overflow
else
    LL = s + logistic(-s); % the rewritten form; recurses exactly once
end
end
I think this maintains reasonably good accuracy.
EDIT now to the meat of your question - making this vectorized, and allowing the calculation of the slope as well. Let's take these one at a time:
function LL = logisticVec(s)
LL = zeros(size(s));
LL(s<0) = log(1 + exp(s(s<0)));
LL(s>=0) = s(s>=0) + log(1 + exp(-s(s>=0)));
end
To obtain the average you wanted:
L = sum(logisticVec(X*beta)) / N;
The slope is a little bit trickier; note I believe you may have a typo in your expression (missing a multiplication sign). Written out, it is
dL = sum(X .* (exp(X*beta) ./ (1 + exp(X*beta))), 1)' / N;
If we divide top and bottom by exp(X*beta) we get
dL = sum(X ./ (exp(-X*beta) + 1), 1)' / N; % implicit expansion (R2016b+); use bsxfun on older releases
Once again, the overflow has gone away and we are left with underflow - but since the underflowed value has 1 added to it, the error this creates is insignificant.
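Putting the pieces together, a combined value-and-gradient routine could look like this (a sketch; logisticLoss is a hypothetical name, and logisticVec is the function defined above):
function [L, dL] = logisticLoss(X, beta)
% Average logistic loss and its gradient, both overflow-safe.
N = size(X, 1);
s = X*beta;
L = sum(logisticVec(s)) / N;
dL = X' * (1 ./ (1 + exp(-s))) / N; % P-by-1; exp(-s) = Inf safely yields 0
end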