Cutting down Stata results

I'm running many linear regressions and probit models with a massive number of covariates. That means every time Stata finishes computing, it prints a huge list of coefficients, and each time I have to scroll back to the beginning of that list, where the main coefficients are printed.
I would like to know if there is a way to avoid that. I was looking for an option to print only a certain number of lines. My second try was running the regression with the -quietly- option and then trying to print a given number of lines. But I'm not really familiar with Stata. I usually work in R, but I have to use Stata this time, which is why I'm struggling with this commercial software.
For linear regressions the -areg- command offers a partial solution, but it only allows me to "absorb" a single factor variable. I need to absorb more variables and also run probit models, so -areg- doesn't work for me.
Does anyone have a trick to solve this, i.e. to print only a selection of covariates in Stata?
UPDATE:
A minimal example: I have the following linear regression with many places and time units as FEs.
regress depVar Var1 Var2-Var15 i.place i.time [pw = myweigth], cluster(ID)
I'm interested in seeing only the coefficients of Var*, but every time I run the regression I get thousands of coefficients for the FEs.
I posted the same question on reddit and got the following comments:
https://www.reddit.com/r/stata/comments/fwtds4/cutting_down_stata_results/
They are pretty much what I was looking for. Basically, the problem is solved via the estout package and its -eststo- and -esttab- commands:
eststo myRegression: quietly ///
    regress depVar Var1 Var2-Var15 i.place i.time [pw = myweigth], cluster(ID)
esttab myRegression, drop(*.place *.time)   // wildcards match the factor-level names (2.place, 3.place, ...)
Maybe someone can enrich this approach. Thanks!

Related

How to run parameter optimization for my model using the GPU

Essentially, I have a model that, given a set of parameters, is able to calculate different thermodynamic properties for different compounds, let's say liquid densities and vapor pressures for example.
When I want to regress the model parameters (e.g. a,b,c,d,e) by fitting data for different compounds, I usually do a lot of sequential operations whose efficiency I am sure I can easily improve. I am thinking about multiobjective optimization, or even better, using the GPU or multiple cores of the CPU. But I am a bit lost on where to start just from reading the documentation.
So within my objective function I usually have something like:
[fval]= objective_function(a,b,c,d,e)
input_comp1=f(constants,a,b,c,d,e)
input_comp2=f(constants,a,b,c,d,e)
exp_density1=load some text file or so.
exp_density2=load some text file or so.
exp_vaporpressure1=load some text file or so.
exp_vaporpressure2=load some text file or so.
densities_1=density(a,b,c,d,e,input_comp1)
vapor_pressures1=vapor_pressures(a,b,c,d,e,input_comp1)
densities_2=density(a,b,c,d,e,input_comp2)
vapor_pressures2=vapor_pressures(a,b,c,d,e,input_comp2)
ARD_d1=expression for deviations between experimental and calculated values for density of comp.1
ARD_d2=...
ARD_p1=...
ARD_p2=...
fval= ARD_d1+ARD_d2+ARD_p1+ARD_p2
fval is then minimized by something like fminsearch; I have also used other solvers in the past, but fminsearch worked best for me. When I do this for just one compound, it works fast enough for my purpose (and I am a patient man). But now I've extended the model in a way that requires regressing parameters simultaneously from more than one compound, and it becomes impractically slow.
I am quite sure this way of doing the calculations can be improved, because I could run the calculations for the different compounds simultaneously instead of doing them sequentially, and then evaluate fval once the calculations for all compounds are done. But how?
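One route, assuming the Parallel Computing Toolbox is available, is to evaluate each compound inside a parfor loop and to load the experimental text files once, outside the objective function (re-reading them on every evaluation is itself a large sequential cost). A minimal sketch; load_all_experiments, make_input, density, vapor_pressures and ard_dev are hypothetical stand-ins for your own routines:

expData = load_all_experiments();            % load text files ONCE, up front
fun     = @(p) objective_function(p, expData);
p0      = [a0, b0, c0, d0, e0];              % hypothetical starting guess
pBest   = fminsearch(fun, p0);

function fval = objective_function(p, expData)
    nComp = numel(expData);
    ard   = zeros(nComp, 1);
    parfor k = 1:nComp                       % compounds evaluated in parallel
        inp    = make_input(k, p);           % input_comp_k = f(constants, p)
        calcD  = density(p, inp);
        calcP  = vapor_pressures(p, inp);
        ard(k) = ard_dev(calcD, expData(k).density) + ...
                 ard_dev(calcP, expData(k).vaporpressure);
    end
    fval = sum(ard);                         % single scalar for fminsearch
end

parfor only pays off if each compound's calculation is expensive relative to the worker overhead; for very cheap per-compound work, vectorizing across compounds (or gpuArray, if the inner math is elementwise) may be the better route.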

how to fit a complex model to complex data

My aim is to fit Gauss-Hermite polynomials to complex measurement data (consisting of an absolute part and a phase part). There are seven independent parameters (p) needed to generate such a Gauss-Hermite mode. I implemented everything in Matlab, so calculating such a mode is no problem. The problem is the fitting operation. At the moment I have two versions: the first uses fminsearch, the second lsqnonlin:
fun = @(p) CalcCoeff(c1,...
    p(1)*CalcGaussHermite(...
        CalcCoor([p(2),p(3),alphaZ],[p(4),p(5)],[centerX,centerY],labcoor),...
        [l,m,lambda,p(6),p(7)]));
p = [scale,alphaX,alphaY,z_fX,z_fY,w_0X,w_0Y];
% options = optimset('Display','iter','PlotFcns',@optimplotfval);
% [fpar,fval,exitflag,output] = fminsearch(fun,p,options);
options = optimoptions('lsqnonlin','Display','iter');
res = lsqnonlin(fun,p,[],[],options);
In the case of lsqnonlin, the function CalcCoeff calculates the difference; in the case of fminsearch, it calculates the overlap integral (dot product).
As a test, I calculated a simple Gaussian mode and tried to reconstruct the parameters used to generate it. My algorithm failed in both cases: it runs without any error message, but the parameters can't be reconstructed. So the question I ask myself is: are there too many parameters for the optimization, so that the algorithm isn't able to converge? Or is it a global problem, i.e. multiple local minima?
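One thing worth checking, independent of the parameter count: lsqnonlin expects a real-valued residual vector, so if CalcCoeff returns complex differences, the usual workaround is to stack the real and imaginary parts. A self-contained sketch, with a hypothetical one-dimensional Gaussian model standing in for CalcGaussHermite:

x     = linspace(-3, 3, 200).';
pTrue = [1.0, 0.3, 0.8, 0.5];                   % amplitude, center, width, phase
model = @(p,x) p(1)*exp(-((x - p(2))/p(3)).^2) .* exp(1i*p(4));
data  = model(pTrue, x);                        % synthetic complex "measurement"

resid = @(p) model(p, x) - data;                % complex residuals
rr    = @(p) [real(resid(p)); imag(resid(p))];  % real vector for lsqnonlin

p0   = [0.8, 0.0, 1.0, 0.0];                    % rough starting guess
opts = optimoptions('lsqnonlin','Display','iter');
pFit = lsqnonlin(rr, p0, [], [], opts);

If the test still fails with real residuals, restarting from several perturbed initial guesses is a cheap way to tell a local-minima (global) problem from an over-parameterized one.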
I would be very pleased if an optimization specialist could give me a hint about what might be going wrong. If I have asked my question in the wrong way, just let me know.
Regards,
Andre

matlab: running all linprog algorithms (is there a matlab list of algorithms?)

Matlab offers multiple algorithms for solving Linear Programs.
For example Matlab R2012b offers: 'active-set', 'trust-region-reflective', 'interior-point', 'interior-point-convex', 'levenberg-marquardt', 'trust-region-dogleg', 'lm-line-search', or 'sqp'.
But other versions of Matlab support different algorithms.
I would like to run a loop over all algorithms that are supported by the user's Matlab version, and I would like them to be tried in the order in which Matlab recommends them.
I would like to implement something like this:
i = 1;
x = [];
while isempty(x)
    options = optimset(options,'Algorithm',Here_I_need_a_list_of_Algorithms{i});
    x = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);
    i = i + 1;   % advance to the next algorithm if this one returned empty
end
In 99% of cases this code should be equivalent to
x = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);
but sometimes the algorithm returns an empty array because of numerical problems (exitflag -4). If there is a chance that one of the other algorithms can find a solution, I would like to try them too.
So my question is:
Is there a way to automatically get a list of all linprog algorithms supported by the installed Matlab version, ordered as Matlab recommends them?
I think looping through all algorithms can make sense in other scenarios too. For example, when you need very precise results and have a lot of time, you could run them all and then evaluate which gives the best results. Or you could loop through all algorithms to find out which one is best for LPs with a certain structure.
There's no automatic way to do this as far as I know. If you really want to do it, the easiest approach would be to go to the online documentation, check through previous versions (online documentation is available for old releases, not just the most recent one), and construct some variables like this:
r2012balgos = {'active-set', 'trust-region-reflective', 'interior-point', ...
    'interior-point-convex', 'levenberg-marquardt', 'trust-region-dogleg', ...
    'lm-line-search', 'sqp'};
...
r2017aalgos = {...};

v = ver('matlab');
switch v.Release
    case '(R2012b)'
        algos = r2012balgos;
    ...
    case '(R2017a)'
        algos = r2017aalgos;
end
% loop through each of the algorithms
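That final loop might look something like this (a hedged sketch; exitflag 1 means linprog converged to a solution):

x = [];
for k = 1:numel(algos)
    options = optimset('Algorithm', algos{k});
    [x,~,exitflag] = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);
    if exitflag == 1      % stop at the first algorithm that succeeds
        break
    end
end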
Seems boring, but it should only take you about 30 minutes.
There's a reason MathWorks aren't making this as easy as you might hope, though: what you're asking for isn't a great idea.
It is possible to construct artificial problems where one algorithm finds a solution and the others don't. But in practice, if the recommended algorithm doesn't find a solution, that typically doesn't indicate that you should switch algorithms; it indicates that your problem wasn't well formulated, and you should consider modifying it, perhaps by adjusting some constraints or reformulating the objective function.
And after all, why stop with just looping through the alternative algorithms? Why not also loop through lots of values for other options such as constraint tolerances, optimality tolerances, maximum number of function evaluations, etc.? These may have just as much likelihood of affecting things as a choice of algorithm. And soon you're running an optimisation algorithm to search through the space of meta-parameters for your original optimisation.
That's not a great plan. It's probably better to choose one of the recommended algorithms, stick with it, and if things don't work out, focus on improving your formulation of the problem rather than over-tweaking the optimisation itself.

why does putting a filter on an output modify the source signal? is this a simulink bug?

I know it sounds strange and that this is a bad way to write a question, but let me show you this odd behavior.
As you can see, this signal, r5, is nice and clean: exactly what I expected from my simulation.
Now look at this:
This is EXACTLY the same simulation; the only difference is that the filter is now not connected. I tried for hours to find a reason, but it seems like a bug.
This is my file; you can test it yourself by disconnecting the filter.
---- edited:
I tried it with Simulink 2014 and on a friend's 2013, on two different computers... if someone can test it on 2015 that would be great.
(Attaching the filter to any other r, r1-r4 included, ''fixes'' the noise on ALL of r1-r8; I tried putting it on other signals, but then the noise won't go away.)
The expected result is exactly the smooth one. This file has proven quite robust in other simulations (so I guess the math inside the blocks is good), and the problem happens only with one of the two ''link number'' inputs (one input at the top left) set to 4, though a small noise appears with one ''link number'' set to 3.
Thanks in advance for any help.
It seems to me that the only thing the filter could affect is the time step used in the integration, assuming you are using a variable (dynamic) time step, which is the default. So my guess is that (if this is not a bug) your system is numerically unstable/chaotic. It could also be related to noise amplified by differentiation: differentiating noise over a smaller time step mostly makes things even worse.
Solvers such as ode23 and ode45 use a dynamic time step. ode23 compares a second- and a third-order integration and accepts the third-order result if the difference between the two is not too big; if the difference is too big, it repeats the calculation with a smaller time step. ode45 does the same with a fourth- and a fifth-order calculation: more accurate, but more sensitive. Instabilities can occur if a smaller time step makes things worse, which can happen when you differentiate noise.
To overcome the problem, try using a fixed time step, change your tolerances or solver, or, better, avoid differentiation: use some type of state estimator to obtain derivatives, or calculate them analytically.
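For the fixed-step experiment, the solver settings can also be changed from the command line. A hedged sketch, with 'myModel' as a placeholder for your model file:

% force a fixed-step solver and step size, then re-run the simulation
set_param('myModel', 'SolverType', 'Fixed-step', ...
                     'Solver',     'ode4', ...
                     'FixedStep',  '1e-4');   % experiment with the step size
sim('myModel');

If the noise changes character as you vary the step size, that points to numerical sensitivity rather than a Simulink bug.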

diagnostic for MATLAB ODE

I am solving a stiff PDE in MATLAB using ode15s, and it often freezes depending on the initial conditions. I never actually get an error; it just won't finish, even after 10 hours, when it should take around 30 seconds to run. I am experimenting with different spatial and time node intervals, but it is hard because I don't get any feedback.
Is there some sort of equivalent to the diagnostics for fsolve? stats is not useful because it only displays output after fsolve is finished.
Check out the documentation on odeset, and specifically the Stats option. I think you basically just want to set Stats to 'on' and you will get some feedback.
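For example (with odefun, tspan and y0 standing in for your own system):

options = odeset('Stats','on');                % report step and evaluation counts
[t,y] = ode15s(@odefun, tspan, y0, options);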
Also, depending on your ODE, you may need a different solver. About halfway down this page there is a list of most of the solvers available in MATLAB. Depending on whether your function is stiff or non-stiff, and how accurate you need to be, one of them might work better for you. Sometimes I just code them all in and comment out all but one until I find the one that runs best for me, but check out the documentation on each if you want to find the "right" one for your application.
Your question is confusing because you refer to both ode15s and fsolve locking up. These are two completely different functions: one does numerical integration and the other solves for roots. Also, fsolve has no option called 'Stats' (see doc fsolve). If you want continuous output from fsolve, use:
options = optimset('Display','iter');
[x,fval,exitflag] = fsolve(myfun,x0,options)
This will display the iteration count, the number of function evaluations, the function value, and other information depending on which algorithm you use (the algorithm can be adjusted via the 'Algorithm' option). Again, see doc fsolve for full details.
As far as the 'Stats' option with ode15s goes, it's not going to give you very much information; I doubt it will help you figure out why your system is stalling (if it even is ode15s that you have a problem with). What you can try is an output function via the 'OutputFcn' option of odeset. You can try the simple odeprint first:
options = odeset('OutputFcn',@odeprint)
which will print your state after each integration step. Type edit odeprint to see the code and how you might write your own output function if you need to do more.
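For the freezing problem specifically, a custom output function that timestamps each step can show where the integration stalls. A hedged sketch (the function name and printout are my own invention):

function status = myOutputFcn(t, y, flag)
    % print the current integration time and elapsed wall-clock time
    persistent t0
    switch flag
        case 'init'                 % t is tspan here, y is y0
            t0 = tic;
            fprintf('starting at t = %g\n', t(1));
        case ''                     % called after each successful step
            fprintf('t = %-12g (elapsed %.1f s)\n', t(end), toc(t0));
        case 'done'
            fprintf('finished after %.1f s\n', toc(t0));
    end
    status = 0;                     % return 1 to halt the integration
end

Pass it in the same way: options = odeset('OutputFcn',@myOutputFcn). If the printed t values stop advancing, the solver is grinding on a stiff or discontinuous spot rather than being truly frozen.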