Understanding scipy b-spline base interval - scipy

I want to use scipy's b splines. However, in their examples they say that compared to the naive implementation, "outside of the base interval results differ". I don't understand what the base interval here is. I need the behavior of the naive implementation, e.g. going to 0 for x going to plus/minus infinity. Is this possible to get with the scipy implementation?

Yes. Use extrapolate=False and replace nans with zeros.

Related

Circular convolution of binary vectors (mod 2) using NTT

Let x, y be vectors of length n, with entries either 1 or 0. I want to efficiently compute the circular convolution
(x * y) mod 2
Where each component of the result is taken mod 2.
I know how to do it using a Fast Fourier Transform
(multiply Fourier transforms of x and y. transform back. Do the "mod 2")
However, this uses floating point calculations to solve a discrete problem and for large n (I'm interested in n ~ 10^7) it might lead to rounding errors. I expect there is a better way to do this using the number theoretic transform (NTT) but unfortunately I'm not familiar with number theory or NTT.
I looked at this website. Following the procedure there,
let's say n = 10^7. I need
a modulus M (use 10^7).
a prime N=kn+1 for some k. (use N = 3 * 10^7 + 1)
a root ω≡g^k mod N , where g is a generator (e.g. ω=2744)
Do the transform, etc.
Question
This seems promising. However, I would need 32-bit integers to store each bit during this calculation?
Also, this is not making use of the fact that I only need results modulo 2.
Is there a way to make use of this to simplify the procedure?
Since I don't know the number theory, this is not obvious to me.
I'm not asking for a full solution, only for an argument if my "mod 2" significantly simplifies the implementation (both in terms of difficulty to implement the necessary algorithms as well as computational resources).
Another question: If it's not possible to simplify using "mod 2", do you think it would still pay off to use NTT, as opposed to just throwing a well-known FFT library at the floating point problem?
For the NTT, your procedure looks correct. Yes, you would need 32-bit integers for each bit in your original vector. Unfortunately, there's not a lot you can do there to make use of the fact that the end result is mod 2, since you need a root of order 10^7. You may be able to shrink that number by a couple factors of two (and doing the standard DFT for a few base levels of recursion), but it wouldn't change much, relatively speaking.
Note, for your FFT implementation, I believe you could use integer arithmetic since its mod 2, but I'm not convinced it would be at all efficient. See this math stackexchange answer for details.

Dirac probability measure in Matlab

I want to implement densities of probability measures in Matlab. For that I define density as a function handle such that the integral of some function f (given as a function handle) on the interval [a,b] can be computed by
syms x
int(f(x)*density(x),x,a,b)
When it comes to the Dirac measure the problem is that
int(dirac(x),x,0,b)
delivers the value 1/2 instead of 1 for all b>0. However if I type
int(dirac(x),x,a,b)
where a<0 and b>0 the returned value is 1 as it should be. For this reason multiplying by 2 will not suffice as I want my density to be valid for all intervals [a,b]. I also dont want to distinct cases before integrating so that the code remains valid for a large class of densities.
Does someone know how I can implement the Dirac probability measure (as defined here) in Matlab?
There's no unique, accepted definition for int(delta, 0, b). The problem here isn't that you're getting the "wrong" answer as that you want to impose a different convention somehow on what the delta function is than what was provided by Matlab. (Their choice is defensible but not unique.) If you evaluate this in Wolfram Alpha, for example, it will give you theta(0) - which is not defined as anything in particular. Here theta is the Heaviside function. If you want to impose your own convention, implement your own delta function.
EDIT
I see you wrote a comment on the question, while I was writing this answer, so.... Keep in mind that the Dirac measure or Dirac delta function is not a function at all. The problem you're having, along with what's described below, all relate to trying to give a functional form to something that is inherently not a function. What you are doing is not well-defined in the framework that you have in Matlab.
END OF EDIT
To put the point about conventions in context, the delta function can be defined by different properties. One is that int(delta(x) f(x), a, b) = f(0) when a < 0 < b. This doesn't tell you anything about the integral that you want. Another, which probably leads to an answer as you're getting from Matlab, is to define it as a limit. One (but not the only choice) is the limit of the zero-mean Gaussian as the variance goes to 0.
If you want to use a convention int(delta(x) f(x), a, b) = f(0) when a <= 0 < b, that probably won't get you in much trouble, but just keep in mind that it's a convention you've chosen more than a "right" or
"wrong" answer relative to what you got from Matlab.
As related note, there is a similar choice to be made on the step function (Heaviside function) at x=0. There are conventions in which is is (a) undefined, (b) -1, (c) +1, and (d) 1/2. None is "wrong." This probably corresponds roughly to the choice on the Dirac function since the Heaviside is (roughly) the integral of the Dirac.

Cholesky factorization

within a matlab code of mine, I have to deal with the Cholesky factorization of a certain given matrix. I am generally calling chol(A,'lower') to generate the lower triangular factor.
Now, checking my code with the profiler, it is evident that function chol is really time consuming, especially if the size of the input matrix becomes large.
Therefore, I would like to know, if there is any valuable alternative to the built-in chol function.
I have been thinking of the LAPACK library, and namely of spptrf function. Is it available in MATLAB or not?
Any hint or support are more than welcome.
EDIT
Just as an example, the profiler retrieves this information:
where Coh_u has size (1395*1395). It has, also, to be remarked that chol is called 4000 times, since I need the cholesky factor for 4000 different configurations.
I'm not sure what version of matlab you are using, but I found this discussion, which suggests in older versions that Cholesky Factorization was very slow as you're describing.
One of the answers there says to use the CHOLMOD package or SuiteSparse, which has a chol2 function that is supposed to be faster.
Can you confirm if the correct expression for Coh_u is as under
a) Coh_u = exp(-a.*sqrt((f(ii)/Uhub).^2 + (0.12/Lc).^2)).*(df.*psd(ii,1));
or
b) Coh_u = exp(-a.*dist*sqrt((f(ii)/Uhub).^2 + (0.12/Lc).^2)).*(df.*psd(ii,1));
The difference in a) and b) is that in b) dist has been added which is the distance between two matrices Y an Z such that
dist = pdist2([Y(:) Z(:)],[Y(:) Z(:)]);
But it leads into the "Matrix not positive definite" error with the chol() function.

Goodness of fit with MATLAB and chi-square test

I would like to measure the goodness-of-fit to an exponential decay curve. I am using the lsqcurvefit MATLAB function. I have been suggested by someone to do a chi-square test.
I would like to use the MATLAB function chi2gof but I am not sure how I would tell it that the data is being fitted to an exponential curve
The chi2gof function tests the null hypothesis that a set of data, say X, is a random sample drawn from some specified distribution (such as the exponential distribution).
From your description in the question, it sounds like you want to see how well your data X fits an exponential decay function. I really must emphasize, this is completely different to testing whether X is a random sample drawn from the exponential distribution. If you use chi2gof for your stated purpose, you'll get meaningless results.
The usual approach for testing the goodness of fit for some data X to some function f is least squares, or some variant on least squares. Further, a least squares approach can be used to generate test statistics that test goodness-of-fit, many of which are distributed according to the chi-square distribution. I believe this is probably what your friend was referring to.
EDIT: I have a few spare minutes so here's something to get you started. DISCLAIMER: I've never worked specifically on this problem, so what follows may not be correct. I'm going to assume you have a set of data x_n, n = 1, ..., N, and the corresponding timestamps for the data, t_n, n = 1, ..., N. Now, the exponential decay function is y_n = y_0 * e^{-b * t_n}. Note that by taking the natural logarithm of both sides we get: ln(y_n) = ln(y_0) - b * t_n. Okay, so this suggests using OLS to estimate the linear model ln(x_n) = ln(x_0) - b * t_n + e_n. Nice! Because now we can test goodness-of-fit using the standard R^2 measure, which matlab will return in the stats structure if you use the regress function to perform OLS. Hope this helps. Again I emphasize, I came up with this off the top of my head in a couple of minutes, so there may be good reasons why what I've suggested is a bad idea. Also, if you know the initial value of the process (ie x_0), then you may want to look into constrained least squares where you bind the parameter ln(x_0) to its known value.

Polynomial data fitting in a special way in MATLAB

I have some data let's say the following vector:
[1.2 2.13 3.45 4.59 4.79]
And I want to get a polynomial function, say f to fit this data. Thus, I want to go with something like polyfit. However, what polyfit does is minimizing the sum of least square errors. But, what I want is to have
f(1)=1.2
f(2)=2.13
f(3)=3.45
f(4)=4.59
f(5)=4.79
That is to say, I want to manipulate the fitting algorithm so that it will give me the exact points that I already gave as well as some fitted values where exact values are not given.
How can I do that?
I think everyone is missing the point. You said that "That is to say, I want to manipulate the fitting algorithm so that I will give me the exact points as well as some fitted values where exact fits are not present. How can I do that?"
To me, this means you wish an exact (interpolatory) fit for a listed set, and for some other points, you want to do a least squares fit.
You COULD do that using LSQLIN, by setting a set of equality constraints on the points to be fit exactly, and then allowing the rest of the points to be fit in a least squares sense.
The problem is, this will require a high order polynomial. To be able to fit 5 points exactly, plus some others, the order of the polynomial will be quite a bit higher. And high order polynomials, especially those with constrained points, will do nasty things. But feel free to do what you will, just as long as you also expect a poor result.
Edit: I should add that a better choice is to use a least squares spline, which is something you CAN constrain to pass through a given set of points, while fitting other points in a least squares sense, and still not do something wild and crazy as a result.
Polyfit does what you want. An N-1 degree polynomial can fit N points exactly, thus, when it minimizes the sum of squared error, it gets 0 (which is what you want).
y=[1.2 2.13 3.45 4.59 4.79];
x=[1:5];
coeffs = polyfit(x,y,4);
Will get you a polynomial that goes through all of your points.
What you ask is known as Lagrange Interpolation . There is a MATLAB file exchange available. http://www.mathworks.com/matlabcentral/fileexchange/899-lagrange-polynomial-interpolation
However, you should note that least squares polynomial fitting is generally preferred to Lagrange Interpolation since the data you have in principle will have noise in it and Lagrange Interpolation will fit the noise as well as the data you have. So if you know that your data actually represents M dimensional polynomial and have N data, where N>>M, then you will have a order N polynomial with Lagrange.
You have options.
Use polyfit, just give it enough leeway to perform an exact fit. That is:
values = [1.2 2.13 3.45 4.59 4.79];
p = polyfit(1:length(values), values, length(values)-1);
Now
polyval(p,2) %returns 2.13
Use interpolation / extrapolation
values = [1.2 2.13 3.45 4.59 4.79];
xInterp = 0:0.1:6;
valueInterp = interp1(1:length(values), values, xInterp ,'linear','extrap');
Interpolation provides a lot of options for smoothing, extrapolation etc. For example, try:
valueInterp = interp1(1:length(values), values, xInterp ,'spline','extrap');