Computing roots of a very high degree polynomial

Here is my problem: for my research, I believe that the complex numbers I am looking at are precisely the (very large) set of roots of some polynomial of high degree $\sim 2^n$, where $1 \le n \le 10\sim15$. Mathematica has been running for a whole day on my computer, and I wondered if any other program out there would be faster than Mathematica, so that I could compute more of those roots. If I had more examples it would REALLY help. The thing I need the program to do is simple: I give it a polynomial of very high degree, and I want it to numerically plot the complex roots. I don't care about multiplicity.
Thanks in advance,

How many iterations should you make for the simulation to be an accurate 'Monte Carlo simulation' for bit error rate (BER) calculations?

What is the minimum value? If I want to repeat the simulation with an exponentially growing number of iterations five times, should I start from 1e2, i.e., iterations = [1e2 1e3 1e4 1e5 1e6], or from 1e3, i.e., [1e3 1e4 1e5 1e6 1e7], or something else? What is the common practice?
Additional info:
I used [8e3 1e4 3e4 5e4 8e4 1e5] before, but according to the prof. that is not enough, because the result is not satisfactory.
Simulations take a very long time on my computer, so I cannot keep changing the iteration counts based on the result. If there is a common practice for this, please let me know.
Thanks @BillBokeey for helping me edit the question.
What your professor proposes strikes me as a qualitative, but not quantitative, way to estimate the convergence of your simulation.
Frankly, I don't know how BER is computed, but I deal a lot with integral calculations by MC.
In such a case you sample $x_i$ over some interval and compute
$$f_{\mathrm{MC}} = \frac{1}{N}\sum_i f_i.$$
We know that $f_{\mathrm{MC}}$ converges to the true value with variance $\sigma^2/N$ (or standard deviation $\sigma/\sqrt{N}$). What we do then is estimate $\sigma$ within the same simulation, assume that for large enough $N$ it is a good approximation of the true $\sigma$, and plot the simulation error alongside the result. In practical terms, alongside $f_{\mathrm{MC}}$ we also accumulate the second-moment sum and average, $f^{(2)}_{\mathrm{MC}} = \frac{1}{N}\sum_i f_i^2$, and at the end take $s = \sqrt{f^{(2)}_{\mathrm{MC}} - f_{\mathrm{MC}}^2}\,/\sqrt{N}$ as the estimated error of the MC simulation (it will be a bit biased, though).
Thus you could plot, on the same graph, the value of the BER and the statistical error of the simulation. You could do even better: ask the user to input the required statistical error (say, in %, meaning the user enters $s/f \times 100$), and continue the simulation in bunches until you reach the required precision.
Then you could judge whether $10^9$ points are enough or not...
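For concreteness, here is a minimal MATLAB sketch of that batch scheme, with a made-up integrand on [0, 1] standing in for whatever quantity is being averaged (the function, batch size, and tolerance are all illustrative assumptions):

f = @(x) exp(-x.^2);                      % hypothetical integrand
target = 0.01;                            % requested relative error (1%)
batch = 1e5;                              % samples per bunch
sum1 = 0; sum2 = 0; N = 0; relerr = Inf;
while relerr > target
    fx = f(rand(batch, 1));               % sample x_i uniformly on [0, 1]
    sum1 = sum1 + sum(fx);                % running first-moment sum
    sum2 = sum2 + sum(fx.^2);             % running second-moment sum
    N = N + batch;
    fMC = sum1 / N;                       % current MC estimate
    s = sqrt(sum2/N - fMC^2) / sqrt(N);   % estimated statistical error
    relerr = s / abs(fMC);
end
fprintf('%g +/- %g after %d samples\n', fMC, s, N);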
Assuming that we denote our simulated BER as $\hat{P}_b$, and that $\hat{P}_b \in [(1 - \alpha)P_b, (1 + \alpha)P_b]$, where $P_b$ is the true BER and $\alpha$ is the percent deviation tolerance (e.g., 0.1), then from [van Trees 2013, p. 83] we know that the number of Monte Carlo trials required to obtain $\hat{P}_b$ with a confidence probability $p_c$ is
$$K = \left(\frac{c}{\alpha}\right)^2 \frac{1 - P_b}{P_b},$$
with $c$ given in Table I.
Table I: confidence interval probabilities from the Gaussian distribution

    p_c     c
    0.900   1.645
    0.950   1.960
    0.954   2.000
    0.990   2.576
    0.997   3.000
Example: Suppose we want to simulate a BER of $10^{-4}$ with a percent deviation tolerance of 0.01 and a confidence probability of 0.950. From Table I we know that $c = 1.960$, and applying the formula gives $K = (1.96/0.01)^2 \times (1 - 10^{-4})/10^{-4} = 384{,}121{,}584$ Monte Carlo trials. This is a surprisingly large value, though.
As a rule of thumb, K should be on the order of 10/BER [Jeruchim 1984].
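As a quick sanity check, both counts are a couple of lines of MATLAB (numbers taken from the example above; note the rule of thumb corresponds to a much looser tolerance, roughly $\alpha \approx 0.6$ at 95% confidence):

Pb = 1e-4; alpha = 0.01; c = 1.960;   % example BER, tolerance, 95% confidence
K = (c/alpha)^2 * (1 - Pb) / Pb       % 384,121,584 trials
K_thumb = 10 / Pb                     % 1e5 trials by the 10/BER rule of thumb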
[van Trees 2013] H. L. van Trees, K. L. Bell, and Z. Tian, Detection, estimation, and filtering theory, 2nd ed., Hoboken, NJ: Wiley, 2013.
[Jeruchim 1984] M. Jeruchim, "Techniques for Estimating the Bit Error Rate in the Simulation of Digital Communication Systems," in IEEE Journal on Selected Areas in Communications, vol. 2, no. 1, pp. 153-170, January 1984, doi: 10.1109/JSAC.1984.1146031.

How to find this solution with MATLAB?

I'm a beginner at MATLAB, and I'm interested in how to find the solution of this system:
2x + y <= 6
4x + 5y <= 20
x + 2y >= 4
5x + 3y <= 15
x - 2y + 6 <= 0
How do I graph this system in MATLAB?
Thanks in advance
In response to some of the comments, I want to say that it does make sense to have these five relations in 2 unknowns. First of all, these are inequalities, not equalities. Each of them represents a half-plane: one side of the 2D plane after being cut by a line (all different lines). The solution to this system of inequalities is the region that is the intersection of all these half-planes. It could be a closed polygonal region, an unbounded region, or the empty set.
Since this looks like an assignment question, I'm not going to give you the solution here. But here's a hint: densely sample points from the XY plane, and for each point, if it satisfies all the inequalities, plot it; otherwise don't... (a minimal sketch of this idea follows below).
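For the shape of that idea only (not the graded answer), a brute-force sampling sketch in MATLAB; the grid range and density are arbitrary choices:

[X, Y] = meshgrid(linspace(-10, 10, 500));
ok = (2*X + Y <= 6) & (4*X + 5*Y <= 20) & (X + 2*Y >= 4) & ...
     (5*X + 3*Y <= 15) & (X - 2*Y + 6 <= 0);    % all five inequalities at once
plot(X(ok), Y(ok), '.');                        % the feasible region, if any
xlabel('x'); ylabel('y');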
P.S. Even if these were all equalities, a system of more linear equations than variables would still make sense. It's an overdetermined system, and there is a solution in the "least squares" sense, i.e., a line fit to lots of noisy data with the lowest sum of squared errors. But this is not your scenario.
This can be solved by the simplex method (an optimization method from linear programming), which is totally deterministic, hence a computer can achieve the job. MATLAB provides tools for this, such as linprog. The inequalities are your constraints and will define a convex polytope, which can be bounded, unbounded, or empty. Your objective function can simply be taken as the constant 1, since any feasible point will do.
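A minimal linprog sketch along those lines, rewriting the two >= constraints as <= by negating them (the objective vector is an arbitrary choice here, since we only need a feasible point):

A = [ 2  1;     % 2x +  y <= 6
      4  5;     % 4x + 5y <= 20
     -1 -2;     %  x + 2y >= 4      becomes  -x - 2y <= -4
      5  3;     % 5x + 3y <= 15
      1 -2];    %  x - 2y + 6 <= 0  becomes   x - 2y <= -6
b = [6; 20; -4; 15; -6];
f = [1; 1];                            % arbitrary objective
[xy, ~, exitflag] = linprog(f, A, b);  % exitflag == 1: feasible point found
                                       % exitflag == -2: the polytope is empty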

Why is eigs('lm') much faster than eigs('sm')?

I use eigs to calculate the eigenvectors of large, sparse, square matrices (of dimension in the tens of thousands).
What I want are the eigenvectors corresponding to the smallest eigenvalues.
But
eigs(A, 10, 'sm') % Note: A is the matrix
runs very slowly.
However, using eigs(A, 10, 'lm') gives me the answer relatively quickly.
And as I tried, replacing 10 with A_width in eigs(A, 10, 'lm'), so that it includes all the eigenvectors, doesn't solve the problem, because that makes it as slow as using 'sm'.
So, I want to know: why is calculating the smallest eigenvectors (using 'sm') so much slower than calculating the largest?
BTW, if you have any idea about how to make eigs with 'sm' as fast as with 'lm', please tell me.
The algorithm used in pretty much any standard eigs function is (some variation of) the Lanczos algorithm. It is iterative and the first iterations give you the largest eigenvalues. This explains pretty much every observation you make:
Largest eigenvalues take the fewest iterations,
Smallest eigenvalues take the most iterations,
Computing all eigenvalues also takes the most iterations.
There are tricks to "fool" eigs into calculating the smallest eigenvalues by actually making them the largest eigenvalues of another problem. This is usually accomplished by a shift parameter. Skimming the MATLAB documentation for eigs, I see that it has a sigma parameter, which might help you. Note that the same documentation recommends plain eig if the matrix fits into memory, as eigs has its numerical quirks.
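For example (a sketch with a made-up sparse test matrix; check your version's eigs documentation for the exact calling forms):

A = gallery('poisson', 70);   % hypothetical 4900x4900 sparse SPD test matrix
d_sm = eigs(A, 10, 'sm');     % ten smallest-magnitude eigenvalues
d_sh = eigs(A, 10, 0.1);      % ten eigenvalues nearest the shift sigma = 0.1

With a numeric sigma, eigs works with the shifted-and-inverted operator, so the eigenvalues nearest sigma become the dominant ones and converge first.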
Since eigs is actually an m-file function, we can profile it. I have run a couple of basic tests, and the behaviour depends very much on the nature of the data in the matrix. If we run the profiler separately on the following two lines of code:
eigs(eye(1000), 10, 'lm'), and
eigs(eye(1000), 10, 'sm'),
then in the first instance it calls arpackc (the main function that does the work - according to the comments in eigs it's probably from here) a total of 22 times. In the second instance it is called 103 times.
On the other hand, trying it with
eigs(rand(1000), 10, 'lm'), and
eigs(rand(1000), 10, 'sm'),
I get results where the 'lm' option consistently calls arpackc many more times than the 'sm' option.
I'm afraid I don't know the details of the algorithm, and so can't explain it in any deeper mathematical sense, but the page that I linked suggests ARPACK is best for matrices with some structure. Since matrices generated by rand have little structure, it is probably safe to assume the latter behaviour I described is not what you'd expect under normal operating conditions.
In short: it simply takes the algorithm more iterations to converge when you ask it for the smallest eigenvalues of a structured matrix. This being an iterative process, however, it very much depends on the actual data you give it.
Edit: There is a wealth of information and references about this method here, and the key to understanding exactly why this happens is surely contained somewhere therein.
The reason is actually much simpler, and comes down to the basics of solving large sparse eigenvalue problems. These methods are all based on solving
(1) $A x = \lambda x$.
Most solution methods use some form of power iteration (e.g., the Krylov subspace spanned in both the Lanczos and Arnoldi methods).
The thing is that power iteration converges to the largest eigenvalue of (1). Therefore the largest eigenvalues are found via the subspace spanned by $\mathcal{K}_k = \{r_0, A r_0, \ldots, A^k r_0\}$, which requires only matrix-vector multiplications (cheap).
To find the smallest, we have to reformulate (1) as follows:
(2) $A^{-1} x = \frac{1}{\lambda}\, x$.
Now solving for the largest eigenvalue of (2) is equivalent to finding the smallest eigenvalue of (1). In this case the subspace is spanned by $\mathcal{K}_k = \{r_0, A^{-1} r_0, \ldots, A^{-k} r_0\}$, which requires solving a linear system at every step (expensive!).
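The cost difference shows up in even a single step of the corresponding iterations; a sketch with an arbitrary sparse test matrix:

A = gallery('poisson', 50);   % arbitrary sparse SPD test matrix
r0 = rand(size(A, 1), 1);
v_lm = A * r0;                % one Krylov step toward the largest: a mat-vec
v_sm = A \ r0;                % one step toward the smallest: a full linear solve

In practice one factorizes A once (e.g., [L, U, P] = lu(A)) and reuses the factors for every $A^{-1} r$ product, but each step is still far more expensive than a matrix-vector multiplication.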

Minimization of L1-Regularized system, converging on non-minimum location?

This is my first post to Stack Overflow, so if this isn't the correct area, I apologize. I am working on minimizing an L1-regularized system.
This weekend is my first dive into optimization. I have a basic linear system Y = X*B, where X is an n-by-p matrix, B is a p-by-1 vector of model coefficients, and Y is an n-by-1 output vector.
I am trying to find the model coefficients, and I have implemented both gradient descent and coordinate descent algorithms to minimize the L1-regularized system. To find my step size I am using the backtracking algorithm, and I terminate by looking at the 2-norm of the gradient, stopping when it is 'close enough' to zero (for now I'm using 0.001).
The function I am trying to minimize is (0.5)*norm(Y - X*B, 2)^2 + lambda*norm(B, 1). (Note: by norm(Y, 2) I mean the 2-norm of the vector Y.) My X matrix is 150-by-5 and is not sparse.
If I set the regularization parameter lambda to zero, I should converge on the least-squares solution, and I can verify that both of my algorithms do this pretty well and fairly quickly.
If I start to increase lambda, my model coefficients all tend towards zero, which is what I expect. My algorithms never terminate, though, because the 2-norm of the gradient is always a positive number. For example, a lambda of 1000 gives me coefficients in the 10^(-19) range, but the 2-norm of my gradient is ~1.5 even after several thousand iterations. My gradient components all converge to something in the 0-to-1 range, while my step size becomes extremely small (10^(-37) range). If I let the algorithm run longer, the situation does not improve; it appears to have gotten stuck somehow.
Both my gradient descent and coordinate descent algorithms converge on the same point and give the same 2-norm of the gradient for the termination condition. They also work quite well with a lambda of 0. With a very small lambda (say 0.001) I get convergence; a lambda of 0.1 looks like it would converge if I ran it for an hour or two; with any greater lambda the convergence rate is so small it's useless.
I have a few questions that I think might relate to the problem:
In calculating the gradient I am using a finite difference method, (f(x+h) - f(x-h))/(2h), with an h of 10^(-5). Any thoughts on this value of h?
Another thought was that at these very tiny steps it is traveling back and forth in a direction nearly orthogonal to the minimum, making the convergence rate so slow it is useless.
My last thought was that perhaps I should be using a different termination method, perhaps looking at the rate of convergence, if the convergence rate is extremely slow then terminate. Is this a common termination method?
The 1-norm isn't differentiable. This will cause fundamental problems with a lot of things, notably the termination test you chose; the gradient will change drastically around your minimum and fail to exist on a set of measure zero.
The termination test you really want will be along the lines of "there is a very short vector in the subgradient."
It is fairly easy to find the shortest vector in the subgradient of ||Ax-b||_2^2 + lambda ||x||_1. Choose, wisely, a tolerance eps and do the following steps:
Compute v = grad(||Ax-b||_2^2).
If x[i] < -eps, then subtract lambda from v[i]. If x[i] > eps, then add lambda to v[i]. If -eps <= x[i] <= eps, then add the number in [-lambda, lambda] to v[i] that minimises |v[i]|.
You can do your termination test here, treating v as the gradient. I'd also recommend using v in place of the gradient when choosing where your next iterate should be; a sketch follows below.
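For the poster's objective, 0.5*norm(Y - X*B, 2)^2 + lambda*norm(B, 1), the smooth part's gradient is available in closed form (no finite differences needed), so a sketch of this test in MATLAB could be:

function v = shortest_subgrad(X, Y, B, lambda, tol)
% Shortest vector in the subgradient of 0.5*||Y - X*B||_2^2 + lambda*||B||_1
% (tol plays the role of eps in the steps above).
v = X' * (X*B - Y);               % gradient of the smooth squared-error term
for i = 1:numel(B)
    if B(i) > tol
        v(i) = v(i) + lambda;
    elseif B(i) < -tol
        v(i) = v(i) - lambda;
    else
        v(i) = sign(v(i)) * max(abs(v(i)) - lambda, 0);  % minimises |v(i)|
    end
end
end

Terminate when norm(v, 2) falls below your tolerance.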

How does the number of points change an FFT in MATLAB?

When taking fft(signal, nfft) of a signal, how does nfft change the outcome, and why? Can I use a fixed value for nfft, say 2^18, or do I need to use 2^nextpow2(2*length(signal)-1)?
I am computing the power spectral density (PSD) of two signals by taking the FFT of the autocorrelation, and I want to compare the results. Since the signals are of different lengths, I am worried that if I don't fix nfft, the comparison will be really hard!
There is no inherent reason to use a power-of-two (it just might make the processing more efficient in some circumstances).
However, to make the FFTs of two different signals "commensurate", you will indeed need to zero-pad one or other (or both) signals to the same lengths before taking their FFTs.
However, I feel obliged to say: If you need to ask this, then you're probably not at a point on the DSP learning curve where you're going to be able to do anything useful with the results. You should get yourself a decent book on DSP theory, e.g. this.
Most modern FFT implementations (including MATLAB's, which is based on FFTW) now rarely require padding a signal's time series to a length equal to a power of two. However, nearly all implementations will offer better, and sometimes much much better, performance for FFTs of data vectors with a power-of-2 length. For MATLAB specifically, padding to a power of 2, or to a length with many low prime factors, will give you the best performance (N = 1000 = 2^3 * 5^3 would be excellent, N = 997 would be a terrible choice).
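You can check the prime-factor effect yourself (timeit ships with MATLAB; exact timings vary by machine):

x_prime = rand(997, 1);                % 997 is prime: worst case
x_nice  = rand(1000, 1);               % 1000 = 2^3 * 5^3: fast case
t_prime = timeit(@() fft(x_prime));
t_nice  = timeit(@() fft(x_nice));
fprintf('N=997: %.3g s, N=1000: %.3g s\n', t_prime, t_nice);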
Zero-padding will not increase the frequency resolution of your PSD; however, it does reduce the bin size in the frequency domain. If you append NZeros zeros to a signal vector of length N, the one-sided spectrum will have (N + NZeros)/2 + 1 bins, and each frequency bin will have a width of
Bin width (Hz) = F_s / (N + NZeros),
where F_s is the signal sample frequency.
If you find that you need to separate or identify two closely spaced peaks in the frequency domain, you need to increase your sample time. You'll quickly discover that zero-padding buys you nothing to that end, and intuitively that's what we'd expect: how can we expect more information in our power spectrum without adding more information (a longer time series) to our input?
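To make the comparison from the question concrete, here is a sketch with one fixed nfft for two signals of different lengths (the lengths, sample rate, and plain periodogram estimator are all illustrative assumptions, not the autocorrelation method from the question):

Fs = 1000;                                 % hypothetical sample rate (Hz)
x1 = randn(1, 900); x2 = randn(1, 1400);   % two signals, different lengths
nfft = 2^11;                               % one fixed FFT length >= both
P1 = abs(fft(x1, nfft)).^2 / (Fs * numel(x1));  % simple periodograms
P2 = abs(fft(x2, nfft)).^2 / (Fs * numel(x2));
f = (0:nfft/2) * Fs / nfft;                % identical bin width Fs/nfft
plot(f, P1(1:nfft/2+1), f, P2(1:nfft/2+1));
xlabel('Frequency (Hz)'); ylabel('PSD');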
Best,
Paul