I have been trying to use optimizers (SGD, Adagrad) from the BigDL library on TransE with Scala. My current implementation processes mini-batches sequentially. I followed this example to optimize the embeddings (as Tensors) without creating a layered model, and my code is quite similar to it. My current problem is that, with some parameters, my loss plateaus (at the value of the margin) no matter how many epochs I run. As a result, my Hits@10 in testing is not very good. Does anyone have an idea why the loss plateaus, and whether this is what causes the bad test results?
P.S. I have checked my loss calculation and it is fine. The only part of my implementation I have control over is the optimizer.
Thanks in advance.
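One way to read a loss that sits at exactly the margin: TransE's margin ranking loss is max(0, margin + d(h + r, t) - d(h', r, t')), so if the average loss equals the margin, positive and corrupted triples are being scored the same on average, for example because the entity embeddings have collapsed toward one point. The Python sketch below is plain math, not the BigDL API, and all function names are made up for illustration. It computes that loss for a collapsed and a well-separated toy case, and includes the unit-norm rescaling of entity vectors that the original TransE paper applies at each step, which may be worth checking if your implementation skips it.

```python
import math

def l2_dissimilarity(h, r, t):
    """TransE score d(h + r, t): L2 distance between h + r and t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_ranking_loss(pos, neg, margin):
    """Hinge loss max(0, margin + d(pos) - d(neg)) for one positive triple
    and its corrupted counterpart; each triple is (h, r, t) as lists."""
    return max(0.0, margin + l2_dissimilarity(*pos) - l2_dissimilarity(*neg))

def mean_loss(pairs, margin):
    """Average hinge loss over a mini-batch of (positive, corrupted) pairs."""
    return sum(margin_ranking_loss(p, n, margin) for p, n in pairs) / len(pairs)

def normalize(v):
    """Rescale an entity vector to unit L2 norm (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

margin = 1.0
r = [0.5, -0.5]

# Collapsed case: every entity sits at the same point, so d(pos) == d(neg)
# and the hinge term reduces to the margin itself.
c = [0.1, 0.1]
collapsed = [((c, r, c), (c, r, c))]
print(mean_loss(collapsed, margin))

# Separated case: h + r == t exactly, and the corrupted tail is far away,
# so the hinge is inactive and the loss is zero.
h, t, t_bad = [0.0, 0.0], [0.5, -0.5], [-2.0, 2.0]
separated = [((h, r, t), (h, r, t_bad))]
print(mean_loss(separated, margin))
```

If your logged loss looks like the first case rather than trending toward the second, inspecting the embedding norms and the distances d(pos) and d(neg) directly is probably more informative than switching optimizers.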
The H2O documentation says:
mini_batch_size: Specify a value for the mini-batch size. (Smaller values lead to a better fit; larger values can speed up and generalize better.)
but when I run a model using the Flow UI (with mini_batch_size > 1), the log file says:
WARN: _mini_batch_size Only mini-batch size = 1 is supported right now.
So the question is: is mini_batch_size really used?
It appears to be a leftover from preparation for a DeepWater integration that never happened. E.g. https://github.com/h2oai/h2o-3/search?l=Java&p=2&q=mini_batch_size
That makes sense, because the Hogwild! algorithm, which H2O's deep learning uses, does away with the need for batching training data.
To sum up, I don't think it is used.
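For context, the Hogwild! idea is that several threads run plain per-sample SGD (effectively a mini-batch size of 1) on shared parameters without any locking. The following is a toy Python sketch of that update pattern on a linear least-squares model. It is not H2O's code, the function and parameter names are hypothetical, and because of Python's GIL it shows only the structure of the updates, not a speedup.

```python
import random
import threading

def hogwild_sgd(data, n_features, n_threads=4, epochs=20, lr=0.05, seed=0):
    """Toy Hogwild!-style training of a linear model y ~ w . x.

    Each thread takes its own slice of the data and applies single-sample
    SGD updates directly to the shared weight list, with no lock."""
    w = [0.0] * n_features  # shared parameters, read and written without locks

    def worker(tid):
        rng = random.Random(seed + tid)
        shard = data[tid::n_threads]  # slicing gives each thread its own list
        for _ in range(epochs):
            rng.shuffle(shard)
            for x, y in shard:
                # Squared-error gradient for one sample, applied immediately;
                # other threads may update w between this read and the write.
                err = sum(wi * xi for wi, xi in zip(w, x)) - y
                for j, xj in enumerate(x):
                    w[j] -= lr * err * xj

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w

# Usage: noise-free data from y = 2*x0 - 3*x1.
rng = random.Random(42)
data = []
for _ in range(400):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 2.0 * x[0] - 3.0 * x[1]))
print(hogwild_sgd(data, n_features=2))
```

The weights end up close to [2, -3] even though no thread ever waits for the others, which is why a batch-size setting has little to do in this scheme.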
I am hoping somebody here can help me with a small issue in some Simulink/MATLAB code. It is quite similar to a problem I discussed earlier, but a bit more complicated, and it is now more of a Simulink issue than a MATLAB one.
So I have a turbine whose speed is controlled by the gate’s opening, and hence by the control voltage. By controlling the gate opening I accelerate the turbine, and at some point in time I need to introduce a saturation effect (since I am only testing the code for now, this will be done with an external signal). This effect won’t change the control voltage, but it affects other components of the system, so at the same control voltage the turbine’s speed will go up. At the same time, I need to keep the speed at the value it had before the saturation effect (let’s say 320 rpm). To do so, I need to decrease the control voltage and keep decreasing it until the speed is back to its earlier value. There is no need to do it instantly (this approach will later be implemented in hardware), but it would be nice to check the algorithm in these synthetic tests.
In terms of the model, I was planning to use a while loop with the speed condition “if speed > 320” again, just to keep things simple. To decrease the control voltage, I was planning to first subtract 0.25 (u2) from the original 50 (% opening) and then keep increasing u2 by 0.25 until the speed drops below 320. I can’t know in advance the exact opening at which this condition will be satisfied, so I need some kind of algorithm to “track” this voltage.
So it should be something like this:
u2 = 0;
while speed > 320
    u2 = u2 + 0.25;   % reduce the opening (50 - u2) by another 0.25
end
u2 is initially zero since we have a predefined initial control voltage. Once the motor’s speed drops below 320, I need to hold the latest value of u2 (and of the control voltage).
Overall, it is a small piece of code and should be done in Simulink (I don’t want to introduce any more Fcn blocks into the model). I’ve never used While and If blocks in Simulink, but so far I have come up with this system. It’s a simplified version of my model, but the control principle is the same.
We take the motor speed of 350 and compare it with 320 (the speed before “saturation”); if the speed after saturation is higher, we need to reduce the control voltage. To trigger the While block, I decided to use a simple switch. The While block, meanwhile, is:
Definitely not the best implementation, but I have tried a lot of different combinations without any real success. I always get the same error:
I also tried using a step signal instead of the constant “7” to model the acceleration of the motor, and got the same error at the moment the speed rose above the 320 threshold. So it looks like the approach is almost right, but mathematically it fails to find the most suitable solution. I’ve tried to implement a transport delay in the memory part of the While subsystem, but got errors during compilation every time.
Are there any obvious (or not-so-obvious) mistakes? Or maybe I should have chosen another approach from the beginning… I really hope somebody can help. Thank you in advance and have a great day.
I do not think that you have used the While block correctly.
Here is what I did: I used a "MATLAB Function" block instead of a "While" block, as follows.
The function in the MATLAB Function block is:
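function u2 = fcn(speed, u2d)
% u2d: previous value of u2, presumably fed back through a delay/memory block
if speed > 320
    u2 = u2d + 0.25;   % still too fast: reduce the opening by another 0.25
else
    u2 = u2d;          % at or below 320 rpm: hold the last value
end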
And here are the results I got (Scope 1 and Scope).
Edit: Since you prefer a function-free model, the following may do the same.
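The MATLAB Function approach works because the "loop" is spread over time: each sample adds at most one 0.25 step to u2, and the previous value u2d is fed back for the next sample, so nothing has to converge within a single time step. Here is a Python sketch of that logic, assuming a hypothetical static plant (speed = 6.4 × opening, so a 50 % opening gives 320 rpm, plus a 30 rpm jump from the simulated saturation effect, matching the 350 rpm in the question). The function name and all numbers are illustrative, not taken from the actual model.

```python
def simulate_u2_tracking(n_steps=100, setpoint=320.0, boost_at=10, boost=30.0,
                         gain=6.4, base_opening=50.0, step=0.25):
    """Discrete-time version of fcn(speed, u2d):
    u2 = u2d + step if speed > setpoint, else u2 = u2d.

    The plant is a made-up static model: speed = gain * opening, plus
    `boost` rpm from sample `boost_at` onward (the saturation effect)."""
    u2d = 0.0  # value of u2 from the previous sample (the delay/memory)
    history = []
    for k in range(n_steps):
        opening = base_opening - u2d
        speed = gain * opening + (boost if k >= boost_at else 0.0)
        u2 = u2d + step if speed > setpoint else u2d  # same rule as fcn
        history.append((k, speed, u2))
        u2d = u2  # feed u2 back for the next sample
    return history

for k, speed, u2 in simulate_u2_tracking()[8:32:3]:
    print(f"k={k:2d}  speed={speed:6.1f} rpm  u2={u2:.2f}")
```

With these numbers, u2 climbs in 0.25 steps after the disturbance and then holds at 4.75 once the speed falls to about 319.6 rpm. A real turbine has dynamics, so with a lagging speed the controller may keep stepping u2 for a few extra samples; the step size and sample time then decide how far the speed undershoots before u2 stops changing.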
While developing models, I have run into cases where translation time (the model simulates quickly but takes far too long to translate) has become a serious issue, and I could use some insight so I can look into resolving it.
So the question is:
What are some of the primary factors that impact the translation time of a model and ideas to address the issue?
For example, things that may have an impact:
for loops vs a vectorized method - a basic model testing this didn't seem to impact anything
using input variables vs parameters
impact of annotations (e.g., Evaluate=true)
or tough luck, this is tool dependent (Dymola, OMEdit, etc.) :(
use of many connect() statements - this seems to be a factor (perhaps the primary one) as it forces the translator to do all the heavy lifting
Any insight is greatly appreciated.
Clearly the answer to this question is naturally open-ended. There are many things to consider when computation time may be a factor.
For distributed models (e.g., finite difference), building simple models and then linking them in the appropriate order with connect equations is not the best way to produce the model. Experience has shown that this method increases translation time to unbearable lengths. It is better to create distributed models using the same approach as the MSL DynamicPipe (not exactly like it, but similar).
Changing the approach as described reduces translation time significantly compared with using connect statements (by orders of magnitude for larger models, >~100,000 equations), and the gap grows as the number of distributed elements increases. This was tested using Dymola 2017 and 2017FD01.
Some related material pointed out by others, which may be useful for more information:
https://modelica.org/events/modelica2011/Proceedings/pages/papers/07_1_ID_183_a_fv.pdf
Scalable Test Suite: https://dx.doi.org/10.3384/ecp15118459
Some general Modelica advice?
We've built a model with ~2000 equations and three vectors of input from measured data. Using OpenModelica, attempts at simulation have begun to hang in the translation stage (which runs for hours where it used to take less than a minute) and now I regularly "lose connection to omc.exe." Is there perhaps something cumulative occurring that's degrading translation/compilation performance?
In general, are there any good rules of thumb for keeping simulations lighter and faster? I realize that, depending on the couplings, additional equations could be exponentially increasing the size of the resulting system of equations - could this be a problem?
Thanks for your thoughts!
It shouldn't take that long; this looks like a bug.
You can report this bug here:
https://trac.openmodelica.org/OpenModelica (New Ticket).
If your model is public, you can post it there; if not, you can contact the OpenModelica team privately.
I did some cleanup in the code and got the part that repeats 12x (the module) down to ~180 equations. In the process I reduced the size of my input vectors (and of a 2D look-up table the module refers to) by quite a bit; they're both down to a few hundred values. It's working now: simulations run in reasonable time, a few minutes each.
Since all these tables were defined within Modelica functions (as you pointed out, Mr. Tiller), perhaps shrinking them helped to improve the performance. I had assumed that all that data just got spread out in a memory array without going through any real processing, but maybe that's not the case. Time to learn more about what's going on under the hood in this environment (as always).
Thanks for the help!
I used MATLAB's fminsearch for a negative maximum-likelihood fit of a binomially distributed function. I don't get any error, but the parameter I want to estimate always stays at its start value. Apparently there is a mistake somewhere. I know this is a very general question, but has anybody run into the same problem and found out how to deal with it?
Thanks a lot,
@woodchips, thank you very much. Step by step, I've tried to do what you advised. First of all, I am actually already working with -log(likelihood), so that is not the problem. I think I have found the problem, but I still have some questions, if you don't mind. I have a function model(param) that I optimize starting from paramstart = p1. This model computes -log(likelihood(F)), where F is a vectorized function F(t,Z,X,T,param,m2,m3,k,l). My data are (tdata,kdata,ldata); X and T are grids, Z is a function on this grid, and (m1,m2,m3) are given parameters. When I evaluate F(tdata,Z,X,T,m1,m2,m3,kdata,ldata), I get a sensible output. But I think fminsearch treats F(tdata,Z,X,T,p,m2,m3,kdata,ldata) as a constant, and that's why the estimated parameter is always the start value. I would be happy to hear any advice on how to fix that.
There are some options you can try to tweak; I'd start with the algorithm.
It is also a problem when the function value barely changes around your start point. Maybe switching to the log-likelihood helps.
I always use fminunc or fmincon. They also let you provide the Hessian (typically better than an estimated one) or typical values ('TypicalX'), so the algorithm doesn't spend time in infeasible regions.
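To see why the log-likelihood matters here, the Python sketch below evaluates, on hypothetical data (1000 binomial observations with n = 10 trials each), the plain likelihood as a product of probabilities and the negative log-likelihood as a sum of logs. The function names are illustrative, not a MATLAB API. The product underflows to exactly 0.0 at both nearby values of p, so an optimizer sees a perfectly flat objective and has no reason to leave its start point, while the log version still tells the two values apart.

```python
import math

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def likelihood(p, data, n):
    """Plain product of probabilities: underflows for realistic sample sizes."""
    out = 1.0
    for k in data:
        out *= binom_pmf(k, n, p)
    return out

def neg_log_likelihood(p, data, n):
    """Sum of log-probabilities (negated): the quantity to hand to a minimizer."""
    return -sum(math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)
                for k in data)

# Hypothetical data: success counts out of n = 10 trials, MLE p = 0.3.
n = 10
data = [3, 2, 4, 3] * 250

for p in (0.30, 0.31):
    print(p, likelihood(p, data, n), neg_log_likelihood(p, data, n))
```

Both likelihood values print as 0.0, whereas the negative log-likelihood is finite and smaller at p = 0.30, which is the maximum-likelihood estimate for this data.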
It is virtually always true that you should NEVER maximize a likelihood function, but ALWAYS maximize the log of that function. Floating point issues will almost always corrupt the problem otherwise, since a product of many small probabilities underflows toward zero. That your optimization starts and stops at the same point is a good indicator this is the problem.
You may well need to dig a little deeper than the above, but even so, this next test is the test I recommend that all users of optimization tools do for every one of their problems, BEFORE they throw a function into an optimizer. Evaluate your objective for several points in the vicinity. Does it yield significantly different values? If not, then look to see why not. Are you creating a non-smooth objective to optimize, or a zero objective? I.e., zero to within the supplied tolerances?
If it does yield different values but still not converge, then make sure you know how to call the optimizer correctly. Yeah, right, like nobody has ever made this mistake before. This is actually a very common cause of failure of optimizers.
If it does yield good values that vary, and you ARE calling the optimizer correctly, then consider whether there are regions into which the optimizer is trying to diverge that yield garbage results. Is the objective generating complex or imaginary results?
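The "evaluate your objective at several nearby points" test can be automated with a small helper. This Python sketch (the name probe_objective and its step and tolerance defaults are my own choices, not part of any MATLAB toolbox) nudges each parameter up and down and reports the coordinates along which the objective does not move. The buggy example mimics the suspicion in the follow-up question: the parameter the optimizer passes in is accidentally not the one used inside the objective.

```python
import math

def probe_objective(f, x0, rel_step=1e-3, tol=1e-10):
    """Evaluate f at x0 and at x0 nudged up/down in each coordinate.

    Returns the indices of coordinates along which f changed by no more
    than tol, i.e. directions in which an optimizer sees a flat objective."""
    f0 = f(x0)
    if not math.isfinite(f0):
        raise ValueError(f"objective is not finite at the start point: {f0}")
    flat = []
    for i, xi in enumerate(x0):
        h = rel_step * max(abs(xi), 1.0)
        changes = []
        for sign in (+1, -1):
            x = list(x0)
            x[i] = xi + sign * h
            changes.append(abs(f(x) - f0))
        if max(changes) <= tol:
            flat.append(i)
    return flat

M1 = 0.3  # hypothetical fixed value that shadows the parameter by mistake

def buggy(theta):
    a, b = theta
    return (a - 1.0) ** 2 + (M1 - 0.5) ** 2  # bug: uses M1 instead of b

def fixed(theta):
    a, b = theta
    return (a - 1.0) ** 2 + (b - 0.5) ** 2

print(probe_objective(buggy, [0.0, 0.0]))  # flags coordinate 1 as flat
print(probe_objective(fixed, [0.0, 0.0]))  # nothing flagged
```

An empty list is only a necessary condition; if a coordinate is flagged, the objective (or the way the parameter reaches it) is the thing to fix before touching optimizer options.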