Error using scipy.stats truncnorm.rvs(a, b, loc, scale) in Python (scipy)

I'm trying to sample from the truncated normal distribution using the truncnorm() function from the scipy.stats package in Python. However, I keep getting the following error:
x = _norm_ilogcdf(np.log(q) + _norm_logcdf(b))
z = z - (_norm_logcdf(z) - y) / _norm_logcdfprime(z)
assert np.abs(z) > TRUNCNORM_TAIL_X/2
I'm not completely sure what it means, but I'm guessing it has something to do with the mean being outside the bounds. But then how is it different from this other error:
Domain error in arguments
For clarification, I am not sampling from a standard normal. I converted the bounds using the following equation:
a, b = (myclip_a - my_mean) / my_std, (myclip_b - my_mean) / my_std
and I enter these bounds into the function as truncnorm.rvs(a, b, my_mean, my_std); a minimal version of the call is shown below. Any clarification is much appreciated!
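For concreteness, here is that call with made-up numbers (my real data is different):
from scipy.stats import truncnorm
# Illustrative values only: clip a normal with mean 0.5 and std 0.1 to [0, 1].
my_mean, my_std = 0.5, 0.1
myclip_a, myclip_b = 0.0, 1.0
# Standardize the clip points, then pass the original mean/std as loc/scale.
a, b = (myclip_a - my_mean) / my_std, (myclip_b - my_mean) / my_std
samples = truncnorm.rvs(a, b, loc=my_mean, scale=my_std, size=1000)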

I've encountered the same problem as well. The thing is that the lower bound a is above (or close to) the 99% quantile of the normal distribution, and the truncation causes scipy to crash. The only solution I came up with is to check, before truncating, whether myclip_a is above the 99% quantile and, if so, skip the update (rough sketch below). I hope that someone will find a better solution than mine!
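A minimal sketch of that check, assuming the draw is wrapped in a small helper (the helper name and the fixed 99% threshold are my own choices, adjust as needed):
from scipy.stats import norm, truncnorm
def safe_truncnorm_rvs(myclip_a, myclip_b, my_mean, my_std, size=1):
    # Skip the draw when the lower clip point sits at or above the 99% quantile
    # of the untruncated normal; this is the regime where scipy blows up.
    if myclip_a >= norm.ppf(0.99, loc=my_mean, scale=my_std):
        return None  # caller keeps the old value instead of updating
    a, b = (myclip_a - my_mean) / my_std, (myclip_b - my_mean) / my_std
    return truncnorm.rvs(a, b, loc=my_mean, scale=my_std, size=size)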

Related

ARIMA antipersistence

I'm running RStudio Version 1.1.419 with R 3.4.3 on Windows 10. I am trying to fit an (f)ARIMA model and to constrain the fractional differencing parameter during the optimization to lie in (-0.5, 0.5), i.e. allowing for antipersistence (d < 0), short memory (d = 0) and long memory (d > 0). I have tried multiple functions to accomplish that. I am aware that the default of fracdiff$drange is (0, 0.5). Therefore this ...
> result <- fracdiff(MeanPrice, nar = 2, nma = 1, drange = c(-0.5,0.5))
sadly returns this:
Warning: C fracdf() optimization failure
Warning message: unable to compute correlation matrix; maybe change 'h'
Is there a way to fit fracdiff or other models (maybe arfima::arfima()?) with that drange? Your help is very much appreciated.
If you look at the package documentation, it states that the h argument for fracdiff "is used to compute a finite difference approximation to the Hessian, and hence only influences the cov, cor, and std.error computations." However, as they are referring to the Hessian, I would assume that this affects the results of the MLE. There are other functions in that package that may be helpful: fdGPH for estimating the order of fractional differencing based on the Geweke and Porter-Hudak method, and similarly fdSperio.
Take a look at the forecast package. If you estimate the order of fractional differencing using the above mentioned functions, you might be able to use the same method described in the details of the arfima function.
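For reference, the Geweke and Porter-Hudak estimate mentioned above comes from a log-periodogram regression; this is the textbook form of the method, not necessarily the package's exact implementation:

$$\log I(\lambda_j) = c - d\,\log\bigl(4\sin^2(\lambda_j/2)\bigr) + \varepsilon_j, \qquad \lambda_j = \frac{2\pi j}{n},\quad j = 1,\dots,m$$

where I(\lambda_j) is the periodogram and m is a bandwidth, commonly on the order of sqrt(n). The slope estimate of d is not restricted to (0, 0.5), so it can signal antipersistence, and the estimated order can then be fed into an ARFIMA fit as described in the details of the arfima function.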

Divide-by-zero encountered: rhok assumed large error using scipy.optimize

I used scipy.optimize.fmin_bfgs to minimize the hinge loss (SVM). However, I get this error:
Divide-by-zero encountered: rhok assumed large.
Somebody said that “it had to do with the training data set”. Does anybody know how to deal with this problem?
From the source code of scipy, rhok is,
rhok = 1.0 / (numpy.dot(yk, sk))
where both yk and sk depend on the input array x0.
A possible cause of this error is a bad choice of initial condition x0 that leads toward singularities in your function f. I would suggest plotting your function and making sure the initial condition stays away from possible divergent values. If this is part of a larger training routine, you could wrap the call in try/except and, on catching a ZeroDivisionError, retry with the initial condition shifted by some amount. You may also find that a different minimisation method available through scipy.optimize.minimize is more robust.
If you add the full_output option to scipy.optimize.fmin_bfgs, it should give you more information about your particular case.
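Building on the full_output suggestion, here is a rough sketch of the restart idea (hinge_loss and the zero starting vector are placeholders for your own objective and data; warnflag comes from fmin_bfgs's full output and is nonzero when the optimizer ran into trouble):
import numpy as np
from scipy.optimize import fmin_bfgs
def minimize_with_restarts(f, x0, n_restarts=5, shift=1e-2):
    # Run BFGS; if it reports trouble, retry from a slightly perturbed starting
    # point, which is often enough to step away from the flat or singular
    # region that triggers the rhok message.
    x_start = np.asarray(x0, dtype=float)
    for _ in range(n_restarts):
        xopt, fopt, gopt, Bopt, fcalls, gcalls, warnflag = fmin_bfgs(
            f, x_start, full_output=True, disp=False)
        if warnflag == 0:
            return xopt
        x_start = x_start + shift * np.random.randn(*x_start.shape)
    return xopt  # best effort after all restarts
# e.g. w = minimize_with_restarts(hinge_loss, np.zeros(n_features))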

Why does this trivially learnable example break AdaBoost?

I'm testing out a boosted tree model that I built using Matlab's fitensemble method.
X = rand(100, 10);
Y = X(:, end)>.5;
boosted_tree = fitensemble(X, Y, 'AdaBoostM1', 100,'Tree');
predicted_Y = predict(boosted_tree, X);
I just wanted to run it on a few simple examples, so I threw in an easy case: one feature is > .5 for positive examples and < .5 for negative examples. I get the warning
Warning: AdaBoostM1 exits because classification error = 0
Which leads me to think: great, it figured out the relevant feature and all the training examples were correctly classified.
But if I look at the accuracy
sum(predicted_Y==Y)/length(Y)
The result is 0.5 because the classifier simply assigned the positive class to all examples!
Why does Matlab think that classification error = 0 when it is clearly not 0? I believe this example should be easily learnable. Is there a way to prevent this error and get the correct result using this method?
Edit: The code above should reproduce the warning.
This is not a bug, it's just that AdaBoost is not designed to work in cases where the first weak learner gets perfect classification. More details:
1) The warning you get refers to the error of the first weak learner, which is indeed zero. You can see this by following the stack trace that comes along with the warning into the function Ensemble.m (in Matlab R2013b, at line 194). If you place a breakpoint there, run your example, and then run the command H.predict(X), you will see that this learner has perfect prediction.
2) So why doesn't your ensemble have perfect prediction? If you look more at Ensemble.m, you'll see that this perfect learner never gets added to the ensemble. This is also reflected in the fact that boosted_tree.NTrained is zero.
3) So why doesn't this perfect learner get added to the ensemble? If you look up a description of the AdaBoost.M1 algorithm, you'll see that in each round, training examples are reweighted according to the error of the previous weak learner. But if that weak learner had no error, then the weights collapse to zero and all subsequent learners have nothing to do (see the update formulas after the code below).
4) If you come across this situation in the real world, what do you do? Don't bother with AdaBoost! The problem is easy enough that a single one of your weak learners can solve it:
X = rand(100, 10);
Y = X(:, end)>.5;
tree = fit(ClassificationTree.template, X, Y);
predicted_Y = predict(tree, X);
accuracy = sum(predicted_Y == Y) / length(Y)
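For reference, here is why point 3 kills the ensemble, using the textbook AdaBoost.M1 update (the generic algorithm, not Matlab's exact source). Each round t computes

$$\epsilon_t = \sum_i w_{t,i}\,[h_t(x_i) \ne y_i], \qquad \beta_t = \frac{\epsilon_t}{1-\epsilon_t}, \qquad w_{t+1,i} \propto w_{t,i}\,\beta_t^{\,[h_t(x_i) = y_i]},$$

and the final vote weights each learner by \log(1/\beta_t). With \epsilon_1 = 0 you get \beta_1 = 0: every correctly classified example's weight is driven to zero and the voting weight is infinite, which is why the implementation stops before adding the learner.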

Issue with Matlab solve function?

The following command
syms x real;
f = @(x) log(x^2)*exp(-1/(x^2));
fp(x) = diff(f(x),x);
fpp(x) = diff(fp(x),x);
and
solve(fpp(x)>0,x,'Real',true)
return the result
solve([0.0 < (8.0*exp(-1.0/x^2))/x^4 - (2.0*exp(-1.0/x^2))/x^2 -
(6.0*log(x^2)*exp(-1.0/x^2))/x^4 + (4.0*log(x^2)*exp(-1.0/x^2))/x^6],
[x == RD_NINF..RD_INF])
which is not what I expect.
The first question: Is it possible to force Matlab's solve to return the set of all solutions?
(This is related to this question.) Moreover, when I try to solve the equation
solve(fpp(x)==0,x,'Real',true)
which returns
ans =
-1.5056100417680902125994180096313
I am not satisfied since not all solutions are returned (they are approximately -1.5056, 1.5056, -0.5663 and 0.5663, obtained from WolframAlpha).
I know that vpasolve with an initial guess can handle this. But I have no idea how, in general, I can find initial guesses that yield all of the solutions, which is my second question.
Other solutions or suggestions for solving these problems are welcome.
As I indicated in my comment above, sym/solve is primarily meant to solve for analytic solutions of equations. When this fails, it tries to find a numeric solution. Some equations can have an infinite number of numeric solutions (e.g., periodic equations), and thus, as per the documentation: "The numeric solver does not try to find all numeric solutions for [the] equation. Instead, it returns only the first solution that it finds."
However, one can access the features of MuPAD from within Matlab. MuPAD's numeric::solve function has several additional capabilities. Of particular interest is the 'AllRealRoots' option. In your case:
syms x real;
f = @(x)log(x^2)*exp(-1/(x^2));
fp(x) = diff(f(x),x);
fpp(x) = diff(fp(x),x);
s = feval(symengine,'numeric::solve',fpp(x)==0,x,'AllRealRoots')
which returns
s =
[ -1.5056102995536617698689500437312, -0.56633904710786569620564475006904, 0.56633904710786569620564475006904, 1.5056102995536617698689500437312]
as well as a warning message.
My answer to this question shows other ways that various MuPAD solvers can be used, particularly if you can isolate and bracket your roots (a quick numeric sketch of that idea appears below).
The above is not going to directly help with your inequalities other than telling you where the function changes sign. For those you could try:
s = feval(symengine,'solve',fpp(x)>0,x,'Real')
which returns
s =
(Dom::Interval(0, Inf) union Dom::Interval(-Inf, 0)) intersect solve(0 < 2*log(x^2) - 3*x^2*log(x^2) + 4*x^2 - x^4, x, Real)
Try plotting this function along with fpp.
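If you'd rather step outside Matlab entirely, the bracket-and-polish idea mentioned earlier is easy to do numerically; here is a rough sketch with SymPy/SciPy, a different toolchain from the question's, shown only to illustrate the approach:
import numpy as np
import sympy as sp
from scipy.optimize import brentq
xs = sp.symbols('x', real=True)
f = sp.log(xs**2) * sp.exp(-1/xs**2)
fpp = sp.lambdify(xs, sp.diff(f, xs, 2), 'numpy')
# Scan a grid on the positive axis (x = 0 is singular), bracket sign changes
# of fpp, then polish each bracket with a root finder.
grid = np.linspace(0.05, 5, 2000)
vals = fpp(grid)
roots = [brentq(fpp, a, b) for a, b, fa, fb in
         zip(grid[:-1], grid[1:], vals[:-1], vals[1:]) if fa * fb < 0]
# fpp is even, so the negative roots are the mirror images.
print(sorted(roots + [-r for r in roots]))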
While this is not a bug per se, The MathWorks still might be interested in this difference in behavior and the poor performance of sym/solve (and the underlying symobj::solvefull) relative to MuPAD's solve. File a bug report if you like. For the life of me, I don't understand why they can't better unify these parts of Matlab. The separation makes no sense from the perspective of a user.

Issue with assigning output from a function in MATLAB

I am having a problem when I try to store the rmabackadj function's output in a variable. The function works properly when no output variable is assigned. This function is part of the Bioinformatics Toolbox.
So the issue is this: when I run the following, it works properly:
rmabackadj(myprobeData.PMIntensities)
But when I try to run the following I get an error:
>> A = rmabackadj(myprobeData.PMIntensities)
Warning: Colon operands must be real scalars.
> In rmabackadj>findMaxDensity at 255
In rmabackadj at 164
Error using ksdensity>parse_args (line 162)
X must be a non-empty vector.
Error in ksdensity (line 114)
[axarg,yData,n,ymin,ymax,xispecified,xi,u,m,kernelname,...
Error in rmabackadj>findMaxDensity (line 255)
[f, x] = ksdensity(z, min(z):(max(z)-min(z))/npoints:max(z), 'kernel', 'epanechnikov');
Error in rmabackadj (line 164)
mu = findMaxDensity( o(o < mu));
I searched for it online as well, but I couldn't find any result. Does anybody have any idea about the cause of this error?
PS: When I assign the ans variable to a new variable, it is assigned properly:
A = ans
I'm pretty sure this is a bug.
Firstly, the reason it errors only when you supply an output argument is that there's an internal switch in the function that calculates different things based on nargout. That's an odd design, but not necessarily a bug.
Internal to rmabackadj there are two subfunctions findMaxDensity and findMaxDensity2. The main routine calls findMaxDensity, which is supposed to find an initial guess for the parameter mu. However (when I run the documentation example that you mention in your comment), it finds a terrible guess right on the edge, leading to an error.
When I edit the file to call findMaxDensity2 rather than findMaxDensity, it seems to produce a reasonable guess, and runs fine with no error. I can't vouch for whether the guess is actually "correct", but it seems reasonable to me, and it's only functioning as an initial guess to start off a better estimation process. (NB if you do this yourself, make sure to save a copy of the old version first).
I would guess that this is a bug: either findMaxDensity is generating an unusually poor guess that should be caught, or it really should be calling findMaxDensity2 and the code has not been updated to call the new subfunction.
Either way, I would report it to MathWorks.
PS I am running MATLAB R2011b. Check first if the issue has been fixed, or behaves differently, in more recent versions.
MathWorks confirmed this bug, issued a workaround for it, and mentioned that it may be fixed in future releases.
One possible workaround is to add the following conditional at line 163 of the rmabackadj function:
% estimate mu from left-of-the-mode data
if any(o < mu)
mu = findMaxDensity( o(o < mu));
end
The bug for N < 1000 samples has been confirmed as well, but no workaround has been issued yet. I will update the thread if a workaround for the N < 1000 samples bug is released.