I calculated a simple OLS regression like this: model = sm.OLS(y, X); results = model.fit()
I found that my residuals are heteroskedastic. That's why I calculated a robust covariance matrix, in order to get robust standard errors and compute new t-stats based on them. I calculated the robust covariance matrix using:
robust_cov = sm.stats.sandwich_covariance.cov_white_simple(results, use_correction=True)
From which I could extract robust standard errors:
robust_se = sm.stats.sandwich_covariance.se_cov(robust_cov)
Now I would like to use robust_se to calculate new t-stats, but I have no clue how to do that.
I have stumbled upon a question which I think should actually explain well enough how to solve my problem: Getting statsmodels to use heteroskedasticity corrected standard errors in coefficient t-tests
Unfortunately, I don't quite understand the explanation. Specifically, I tried results = model_ols.fit(cov_type='HC0') (and HC1, HC2, HC3) as mentioned in the question above, but that leaves me with exactly the same standard errors and t-stats as in the original model.
Can anyone give me a hint about what I am doing wrong?
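For reference, here is a minimal sketch of the two approaches I have tried, on synthetic data (the manual t-stat computation in the last line is my own guess at what I'm after):

import numpy as np
import statsmodels.api as sm

# Synthetic heteroskedastic data, just for illustration
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100) * (1 + np.abs(X[:, 1]))

model = sm.OLS(y, X)
results = model.fit()                     # classical (non-robust) standard errors
results_hc0 = model.fit(cov_type='HC0')   # White robust standard errors
print(results_hc0.bse)                    # robust SEs
print(results_hc0.tvalues)                # t-stats based on the robust SEs

# Manual route via the sandwich covariance:
robust_cov = sm.stats.sandwich_covariance.cov_white_simple(results, use_correction=True)
robust_se = sm.stats.sandwich_covariance.se_cov(robust_cov)
t_robust = results.params / robust_se     # robust t-stats (my own guess)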
I have two complicated datasets for which I intend to find suitable fitting functions. The first dataset is presented as follows:
As you can see, although complicated, it seems that this dataset is a combination of rectangle functions. These data describe the relation between the 'Amplitude' of complex numbers and time. The second picture looks like this:
And this relation describes the 'Phase' of the above complex numbers with time; it seems that they are also combinations of rectangle functions. At first, I wanted to use combinations of Fourier cosine and sine series to fit the amplitude and phase using
lsqcurvefit
in MATLAB, but it seems that the fitted parameters fail to converge to the correct values. (I have tried a number of options, like adjusting FiniteDifferenceStepSize, FiniteDifferenceType, StepTolerance, and so on.) Despite many failures, I saw someone say that we could use the normal cumulative distribution function (CDF) to fit a step function, and I thought it might be possible to achieve a successful fit with combinations of a parameterized CDF and
y = erfc(x)
So, could anyone provide any solutions or ways to fit the above two relations? Any valuable ideas would be very helpful to me.
PS: For now I don't care about any hidden physics inside these data; all I want is a mathematical way to fit the above two relations in MATLAB.
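For concreteness, here is a minimal sketch of the kind of model I have in mind: a sum of smooth erfc-based steps fitted with lsqcurvefit (the number of steps, the synthetic data, and the parameter layout are only an illustration):

% Each term a*erfc((t-c)/w) is a smooth step of height 2a at
% position c with transition width w; p(7) is a constant offset.
step2 = @(p,t) p(1)*erfc((t-p(2))/p(3)) + p(4)*erfc((t-p(5))/p(6)) + p(7);

t = linspace(0,10,500)';                         % synthetic time axis
y = step2([1 3 0.05 -0.5 7 0.05 0.2], t) + 0.02*randn(size(t));

p0 = [1 3 0.1 -0.5 7 0.1 0];                     % guesses: heights, centers, widths, offset
pfit = lsqcurvefit(step2, p0, t, y);

plot(t, y, '.', t, step2(pfit, t), '-')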
Thanks!
I am using MATLAB to build a prediction model in which the target is binary.
The problem is that the negative observations in my training data may in fact be positives that were simply not detected.
I started with a logistic regression model, assuming the data is accurate, and the results were less than satisfactory. After some research, I moved to one-class learning, hoping that I could focus on only the part of the data (the positives) that I am certain about.
I looked up related materials in the MATLAB documentation and found that I can use fitcsvm to proceed.
My current problem is:
Am I on the right path? Can one-class learning solve my problem?
I tried to use fitcsvm to create a ClassificationSVM using all the positive observations that I have:
model = fitcsvm(Instance,Label,'KernelScale','auto','Standardize',true)
However, when I try to use the model to predict:
[label,score] = predict(model,Test)
all the labels predicted for my test cases are 1. I think I did something wrong. So should I feed the SVM only the positive observations that I have?
If not, what should I do?
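For reference, here is a minimal sketch of how I currently understand the one-class setup (the OutlierFraction value of 0.05 is just an assumed guess):

% One-class training: all labels are the same, so predict() can only
% ever return that label; the sign of the score is what separates
% inliers (resembling the training positives) from outliers.
n = size(Instance,1);
model = fitcsvm(Instance, ones(n,1), ...
    'KernelScale','auto', 'Standardize',true, ...
    'OutlierFraction',0.05);          % assumed fraction of outliers in training data

[~, score] = predict(model, Test);
isPositive = score(:,1) > 0;          % positive score = inside the learned boundary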
I have been looking for a MATLAB function that can do a nonlinear total least squares fit, i.e. fit a custom function to data that has errors in all dimensions. The simplest case is x and y data points with different given standard deviations in x and y for every single point. This is a very common scenario in all the natural sciences, and just because most people only know how to do a least squares fit with errors in y does not mean it wouldn't be extremely useful. I know the problem is far more complicated than a simple y-error; this is probably why most people (not even physicists like myself) never learned how to do this properly with multidimensional errors.
I would expect software like MATLAB to be able to do it, but unless I'm bad at reading the otherwise mostly useful help pages, I think even a 'full' MATLAB license doesn't provide such fitting functionality. Other tools like Origin, Igor, and SciPy use the freely available Fortran package ODRPACK95, for instance. There are a few contributions about total least squares or Deming fits on the File Exchange, but they are for linear fits only, which is of little use to me.
I'd be happy for any hint that can help me out.
Kind regards
First, I should point out that I haven't practiced MATLAB much since I graduated last year (also as a physicist). That being said, I remember using
lsqcurvefit()
in MATLAB to perform nonlinear curve fits. Now, this may or may not work, depending on what you mean by a custom function. I'm assuming you want to fit some known expression similar to one of these:
y = A*sin(x)+B
y = A*e^(B*x) + C
It is extremely difficult to perform a fit without knowing the form, e.g. as above. Ultimately, all mathematical functions can be approximated by polynomials over small enough intervals. This is something you might want to consider, as MATLAB does have lots of tools for polynomial regression.
In the end, I would actually recommend you to write your own fit function. There are tons of examples for this online. The idea is to know the true solution's form as above, and guess the parameters A, B, C, .... Create an error (or cost) function, which produces a quantitative error (deviation) between your data and the guessed solution. The problem is then reduced to minimizing the error, for which MATLAB has lots of built-in functionality.
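To make that concrete, here is a minimal sketch of such a hand-rolled cost function for the total least squares (orthogonal distance) case you ask about: the trick is to treat the unknown 'true' x-positions as extra fit parameters, so that errors in both x and y enter the residuals. The model, synthetic data, and solver choice are only an illustration:

% Example model and synthetic data with errors in both x and y
f = @(x,p) p(1)*exp(p(2)*x) + p(3);
xm = linspace(0,2,30)';                         % measured x
ym = f(xm,[1 1.5 0.5]) + 0.05*randn(30,1);      % measured y
sx = 0.02*ones(30,1);  sy = 0.05*ones(30,1);    % per-point standard deviations

% Unknowns: 3 model parameters plus the 30 'true' x-positions
z0 = [1; 1; 0; xm];
res = @(z) [ (xm - z(4:end))./sx ;              % weighted x-residuals
             (ym - f(z(4:end), z(1:3)))./sy ];  % weighted y-residuals
z = lsqnonlin(res, z0);
pfit = z(1:3)                                   % fitted parameters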
I wanted to extrapolate some of the data I had, as shown in the plot below. The blue line is the original data and the red line is the extrapolation that I wanted.
To use regression analysis, I used the function polyfit:
sizespecial = size(i_C);
endgoal = sizespecial(2);          % number of data points
plothelp = 1:endgoal;              % x-values are just the sample indices
reg1 = polyfit(plothelp,i_C,2);    % 2nd-order polynomial coefficients
reg2 = polyfit(plothelp,i_D,2);
Where i_C and i_D are the vectors that represent the original data. I extended the data by using this code:
plothelp = 1:endgoal+11;
for in = endgoal+1:endgoal+11
    i_C(in) = (reg1(1)*(in^2))+(reg1(2)*in)+reg1(3);   % i.e. polyval(reg1,in)
    i_D(in) = (reg2(1)*(in^2))+(reg2(2)*in)+reg2(3);   % i.e. polyval(reg2,in)
end
However, the graph I output now is:
I do not understand why the extra notch is introduced (circled in red). Do not hesitate to ask me to clarify any of the details of this question, and thank you for all your answers.
What I imagine is happening is that you are trying to fit a second-order polynomial over all your data. My guess is that this polynomial will look a lot like the curve I have drawn in orange. If you follow Matt's advice from his comment and plot your regressed polynomial over your original data as well (not just the extrapolated part), you should be able to confirm this.
You might get better results by fitting a higher-order polynomial. Your data have two points of inflection, so a 3rd-order polynomial will probably work quite well. One danger of extrapolating higher-order polynomials, however, is that they can have fairly dramatic inflections outside the domain of your data and produce unexpected and wild results.
One way to mitigate this is to instead perform a linear regression over only the final x data points of your series. These are the points highlighted in yellow in the figure. You can tune x as a parameter such that it covers as much of the approximately linear final portion of your curve as makes sense. The red line I have drawn will then be the result of a linear regression performed on only those points (as opposed to the entire data set), as in the sketch below.
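A minimal sketch of that idea, reusing the variable names from the question (the tail length k = 20 is just an assumed tuning value):

k = 20;                                                  % number of trailing points to fit (tune this)
xTail = endgoal-k+1:endgoal;
regLin = polyfit(xTail, i_C(xTail), 1);                  % 1st-order fit on the tail only
xNew = endgoal+1:endgoal+11;
i_C(xNew) = polyval(regLin, xNew);                       % linear extrapolation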
Another option might be to fit a spline curve and extrapolate on that. You can use the interp1 function, specifying 'spline' or 'pchip', for that.
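For example, something along these lines (again reusing the question's variables):

xKnown = 1:endgoal;
xAll = 1:endgoal+11;
i_C_ext = interp1(xKnown, i_C(xKnown), xAll, 'pchip', 'extrap');   % shape-preserving extrapolation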
Which is the best choice, however, will depend largely on the nature of the problem you are trying to solve.
I want to solve this problem:
(Broken image of the problem statement: http://img265.imageshack.us/img265/6598/greenshot20100727091025.png — judging from the answers below, a quintuple integral of x+y+z+u+v over a rectangular box.)
I don't want to use int; I want to use the quad family (quad, dblquad, triplequad), but I can't.
Can you help me?
I assume that your real problem is more complex than this trivial one. The best solution is just to use a symbolic integral. Why is numerical integration difficult?
Numerical integration in ONE dimension typically requires on the order of say 100 function evaluations. (The exact number will be very dependent on the accuracy required, the limits, etc.) This makes a 2-d integral typically require on the order of 100^2 = 10000 function evals. So an adaptive, 5-d integral will require on the order of 100^5 = 1e10 function evaluations. (This is only a very rough order of magnitude estimate here.) My point is, you simply don't want to do that!
Better is to reduce the problem in complexity. If your integral is separable (as is this one) then do so! Reduce a 5-d problem into multiple 1-d problems.
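For the integral in this question, the reduction is easy because the integrand is a plain sum: each term integrates in one variable while the other variables only contribute the lengths of their intervals. A sketch (the limits are taken from the symbolic answer further down; integral is the modern replacement for quad):

lo = [1 -2 0 -1 0];  hi = [5 3 1 1 1];        % integration limits for the five variables
len = hi - lo;
I = 0;
for k = 1:5
    I1 = integral(@(t) t, lo(k), hi(k));      % 1-d integral of the k-th variable
    I = I + I1 * prod(len([1:k-1, k+1:5]));   % times the measure of the other four
end
% I is 180, matching the symbolic result below.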
Also, in many cases I see people wanting to do a numerical integration of a Gaussian PDF. Note that this is easily solved using a call to erf or erfc, coupled with a change of variables. The point is that in many cases special functions are defined to greatly reduce the complexity of a problem.
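For example, the probability mass of a normal distribution over an interval needs no quadrature at all (the numbers here are arbitrary):

mu = 0; sigma = 1; a = -1; b = 2;
P = 0.5*(erf((b-mu)/(sigma*sqrt(2))) - erf((a-mu)/(sigma*sqrt(2))))
% Cross-check against direct numerical integration of the PDF:
pdf = @(t) exp(-(t-mu).^2/(2*sigma^2)) / (sigma*sqrt(2*pi));
Pnum = integral(pdf, a, b)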
I should add that in many cases, the key to solving a difficult problem in mathematics is to use mathematics to reduce the problem to something simpler. If you can find a way to reduce the dimensionality of your problem just a bit, it will become much more tractable.
The integral you show is:
- analytically solvable: always do analytically what you can;
- equal to a number: constant expressions should be eliminated from numerical calculations;
- not easy to compute in MATLAB (or to compute very accurately).
You can use cumtrapz to integrate over each variable alone, and call trapz for the final integration, as sketched below. Remember that this will blow up the error on any problem that is more complicated than a simple sum of linear functions.
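A minimal 2-d sketch of that tensor-product idea (the same nesting extends to more dimensions, at rapidly growing cost and error):

x = linspace(0,1,101);
y = linspace(-1,1,101);
[X,Y] = ndgrid(x,y);
F = X + Y;                     % example integrand sampled on the grid
inner = trapz(y, F, 2);        % integrate over y for each fixed x
I = trapz(x, inner)            % then over x; the exact answer here is 1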
Mathematica is more suited to nD integrations, if you have access to that.
MATLAB can do symbolic integration (with the Symbolic Math Toolbox):
>> x = sym('x'); y = sym('y'); z = sym('z'); u = sym('u'); v = sym('v');
>> int(int(int(int(int(x+y+z+u+v,1,5),-2,3),0,1),-1,1),0,1)
ans =
180
Just noticed you want to do numeric, not symbolic, integration.
If you look at the source of dblquad and triplequad
>> edit dblquad
you will see that they just call the lower versions.
It should be possible for you to add a quadquad and a quintquad (or, recursively, an n-quad).
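For example, a recursive n-quad could look something like this (a sketch only; nquad is my own name, not a MATLAB function, and the cost grows exactly as fast as described in the other answer):

function I = nquad(f, lo, hi)
% NQUAD  Recursive n-dimensional quadrature (sketch). f takes a row
% vector of coordinates and returns a scalar; lo and hi are the limit vectors.
if numel(lo) == 1
    I = integral(@(t) arrayfun(f, t), lo, hi);
else
    inner = @(t) nquad(@(rest) f([t rest]), lo(2:end), hi(2:end));
    I = integral(@(t) arrayfun(inner, t), lo(1), hi(1));
end
end

% Example (2-d): integral of x+y over the unit square, which is 1:
% nquad(@(v) sum(v), [0 0], [1 1])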