What is the impact of lr_mult = 0?

I'm looking at some Caffe network-building code (in the BerkeleyVision pascalcontext-fcn8s net.py file), and I find this line:
L.Deconvolution(n.score_fr,
    convolution_param=dict(num_output=60, kernel_size=4, stride=2,
                           bias_term=False),
    param=[dict(lr_mult=0)])
I'm wondering what the lr_mult=0 term does. My first guess after looking at the documentation is that it should prevent any updates to the kernel weights, but that seems weird, because I assume that the default initialization is random. What does this do? Is there some other code or parameter file somewhere that is initializing the kernel?

You are correct. Setting lr_mult=0 freezes the weights of the layer: they stay fixed and do not change from their initial values throughout training.
If you follow the code, you'll see a call to surgery.interp; this function sets the initial weights of the upscaling layer (to bilinear interpolation kernels) before training begins. The weights then remain fixed at these values because of lr_mult=0.
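For reference, here is a minimal sketch (not the repository's exact code) of the kind of bilinear-interpolation initialization that surgery.interp performs on a deconvolution kernel; the shapes and channel count below are illustrative:

import numpy as np

def bilinear_kernel(size):
    # 2-D bilinear interpolation filter of shape (size, size), the
    # classic initialization for upsampling deconvolution weights.
    factor = (size + 1) // 2
    center = factor - 0.5 if size % 2 == 0 else factor - 1
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

# Fill each matching (out_channel == in_channel) slice of a deconv
# weight blob of shape (out, in, k, k) with the bilinear filter.
k, channels = 4, 60
weights = np.zeros((channels, channels, k, k))
for i in range(channels):
    weights[i, i] = bilinear_kernel(k)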

Related

Is H2O.ai's mini_batch_size really used?

The H2O documentation says:
mini_batch_size: Specify a value for the mini-batch size. (Smaller values lead to a better fit; larger values can speed up and generalize better.)
but when I run a model using the Flow UI (with mini_batch_size > 1), the log file says:
WARN: _mini_batch_size Only mini-batch size = 1 is supported right now.
So the question: is mini_batch_size really used?
It appears to be a leftover from preparation for a DeepWater integration that never happened. E.g. https://github.com/h2oai/h2o-3/search?l=Java&p=2&q=mini_batch_size
That makes sense, because the Hogwild! algorithm, which H2O's deep learning uses, does away with the need for batching training data.
To sum up, I don't think it is used.
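If you want to check this yourself from the Python client, a quick, illustrative experiment is to request a larger mini-batch and watch the H2O log for the warning quoted above; the dataset URL and column names here are just placeholders:

import h2o
from h2o.estimators import H2ODeepLearningEstimator

h2o.init()
frame = h2o.import_file(
    "https://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")

# mini_batch_size > 1 should trigger the warning, suggesting the value
# is silently forced back to 1 internally.
model = H2ODeepLearningEstimator(mini_batch_size=32, epochs=1)
model.train(x=frame.columns[:-1], y="class", training_frame=frame)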

Is it necessary to initialize the weights every time when retraining the same model in MATLAB with nntool?

I know that for an ANN model the initial weights are random. If I train a model and repeat the training 10 times in nntool, are the weights reinitialized every time I click the training button, or does training continue from the weights that were just adjusted?
I am not sure whether the nntool you refer to uses the train method (see https://de.mathworks.com/help/nnet/ref/train.html).
I have used this method quite extensively, and it works in a similar way to TensorFlow: you store a number of checkpoints and load the latest one to continue training from that point. The code would look something like this:
% Load example data and create a pattern-recognition net with 20 hidden units
[feat,target] = iris_dataset;
my_nn = patternnet(20);
% Train, writing a checkpoint to MyCheckpoint.mat at most every 30 seconds
my_nn = train(my_nn,feat,target,'CheckpointFile','MyCheckpoint','CheckpointDelay',30);
Here we have requested that checkpoints be stored at a rate of no more than one every 30 seconds. When you want to continue training, the net must be loaded from the checkpoint file:
[feat,target] = iris_dataset;
% Restore the latest saved state and resume training from it
load MyCheckpoint
my_nn = checkpoint.my_nn;
my_nn = train(my_nn,feat,target,'CheckpointFile','MyCheckpoint');
This solution involves training the network from the command line or via a script rather than using the GUI supplied by MathWorks. I honestly think the GUI is great for beginners, but if you want to do anything interesting or clever, start using the command line, or even better, switch to libraries like Torch or TensorFlow!
Hope it helps!
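The same behaviour holds in the TensorFlow world mentioned above: calling the training routine again continues from the current weights rather than reinitializing them. A minimal sketch (the toy model and data are purely illustrative):

import numpy as np
import tensorflow as tf

x = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 3, size=(100,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

model.fit(x, y, epochs=5, verbose=0)  # first training run
model.fit(x, y, epochs=5, verbose=0)  # continues from the adjusted weights;
                                      # it does NOT reinitialize them

To restart from fresh random weights you have to rebuild the model (or clone and recompile it); analogously, in MATLAB you can call init(my_nn) to reinitialize the network's weights before retraining.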

Illegal rate transition while trying to normalize a signal in Simulink

I have a signal in Simulink which I want to normalize so that the highest value of the signal is always 1. So I use a MinMax Running Resettable block to remember the highest value that has passed so far, and then divide the signal by that value.
A little test with a signal generator block, the MinMax Running Resettable block, a Divide block and a Scope runs just fine. But when I add this normalizing function to my Simulink model I get an error:
"Model initialization failed - Illegal rate transition found involving Unit Delay"
I don't even need to connect the little test case to my other model; simply putting it into my model produces this error. Strangely, the MinMax Running Resettable block turns yellow when I copy it, indicating, I suppose, that it has a different sample rate. I don't understand why this happens. I already tried adding a Zero-Order Hold block behind the MinMax Running Resettable block, but that didn't help.
As suggested, I tried adding another Constant block at the R input of the MinMax Running Resettable block. I tried several sample times for that block (-1, 0, 1/fAb), but that didn't help either.
OK, I finally think I found my mistake. It seems that the Signal Generator outputs a continuous-time signal (shown in black). Because my model contains mostly discrete signals, this somehow causes errors. So when I simply add a Zero-Order Hold block after the Signal Generator, everything seems to work just fine. The sample time of the Zero-Order Hold has to be matched to the rest of the system.

Change Simulink parameters at runtime from the code/block flow

My initial problem is that I have a continuous transfer function whose coefficients change with time.
Currently the TF's coefficients are expressed as a function of the block mask parameters. These parameters are tunable, and if I change the value in the mask parameters dialog during a simulation, the response seems to react appropriately.
However, how can I do just that from the code/block flow? Basically, I have the block parameter 'maskParam', which is set using the mask parameters dialog, and in the mask initialization commands: 'param=maskParam'. 'param' is used in the transfer function, and I would like to change it in real time (as param=maskParam*f(t)).
I have already looked around and found relevant solutions, but either they are unbelievably complicated, or the only transfer function we are allowed to modify at runtime is discrete, and 1) I would like to avoid z-transforming my quite complex TF (I don't have the Control System Toolbox), and 2) the sample time seems to be fixed. None of them uses this "dirty" technique of updating parameters; maybe that's the way around?
I am assuming that you want to change your sim parameters while the simulation is running?
One solution is to run your simulation for an infinite period and change a workspace variable during the simulation to make the changes take effect.
For example, if you look at the w block, you can set its value at runtime by doing this:
set_param('my_model_name/w', 'Value', '100'); % will change to 100 immediately (note the value is passed as a string)
You can do similar things with arrays (i.e. a list of coefficients in your case).
HINT FOR YOU
You are using a Discrete Transfer Fcn block. Try the following:
1) Give your block a name, e.g. fcn_1.
2) In your script, type set_param('your_model_name/fcn_1', 'Numerator', '[1 2]'); this will set the numerator to [1 2]. Do the same for 'Denominator'.
3) Through this exercise you should be able to work out how to handle the property names etc., so that you can change/get them using set_param/get_param.
I leave you to investigate further.
The short answer is that Simulink blocks are not really designed to do this. By definition, a transfer function is Linear Time-Invariant (LTI), meaning its characteristics (read: coefficients) do not vary with time.
Having said that, there are some workarounds, such as the ones you mentioned in your question. These are the correct way to approach the problem, I'm afraid, other than the set_param method suggested by @ha9u63ar. See also this blog post on the subject on the MathWorks web site.
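If you are scripting the experiment from outside MATLAB, the same set_param idea can be driven through the MATLAB Engine API for Python. A rough sketch, assuming a model named my_model containing a block fcn_1 (all names are illustrative, and whether a given dialog parameter is tunable mid-run still depends on the block):

import matlab.engine

eng = matlab.engine.start_matlab()
eng.load_system('my_model', nargout=0)

# Start the simulation, then retune the block while it runs.
eng.set_param('my_model', 'SimulationCommand', 'start', nargout=0)
eng.set_param('my_model/fcn_1', 'Numerator', '[1 2]', nargout=0)
eng.set_param('my_model/fcn_1', 'Denominator', '[1 0.5]', nargout=0)

eng.set_param('my_model', 'SimulationCommand', 'stop', nargout=0)
eng.quit()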

Simulink: PID Controller - difference between back-calculation and clamping for anti-windup?

I need to implement anti-windup (output limitation) for my PID controller. Simulink offers two options: back-calculation and clamping (documentation), which seem to deliver equal results. I know what back-calculation does mathematically. It requires defining the back-calculation gain Kb. This gain depends on how long my controller stays saturated, so it is actually a dynamic value (because I may have a high variation of saturation times). Do you see a way to control this value? (In this case it would probably be necessary to build my own PID controller, as shown in the documentation above or in the picture below.)
Which brings me to the question: what does clamping actually do? And what are the other differences? Which one is faster, and which is more robust against steep slopes? Does anybody have experience using both?
Not sure if this fully answers the question, but the PID Controller documentation page explains a bit more about clamping:
Clamping: Stops integration when the sum of the block components exceeds the output limits and the integrator output and block input have the same sign. Resumes integration when the sum of the block components exceeds the output limits and the integrator output and block input have opposite sign. (The documentation also gives the formula for the integrator portion of the block.)
The clamping circuit implements the logic necessary to determine whether integration continues.
If you select the clamping option and look under the mask, you can probably see the details of the clamping circuit.
In addition to am304's answer, there are some more things to consider.
Clamping
Clamping will always work. It detects integrator overflow and, using a simple switch, sets the integral path of the PID controller to zero to avoid windup.
Clamping is a commonly used anti-windup method, especially in digital control systems. In serious applications, however, there is also forward clamping involved, which evaluates the controller input as well. This mechanism must be implemented manually.
Back Calculation
Back-calculation depends heavily on the back-calculation coefficient Kb. If you don't know how to actually calculate Kb, don't use back-calculation. This method computes the difference between the actual controller output and the saturated output, amplifies it by Kb, and subtracts it from the I-gain path.
In most cases the default value Kb = 1 will lead to worse results than clamping; it is even possible that it has no effect at all. Kb should be calculated based on the sampling time or, in case a D gain is involved, based on the D and I gains. Appropriate literature should be consulted to calculate the coefficient. Back-calculation with a properly set coefficient enables better dynamics than clamping!
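To make the two mechanisms concrete, here is a minimal discrete-time PI sketch (forward-Euler integration; the gains, limits, and toy plant are illustrative, not Simulink's exact implementation):

import numpy as np

def pi_step(e, integ, Kp, Ki, dt, u_min, u_max, mode, Kb=1.0):
    # One PI update with either 'clamping' or 'backcalc' anti-windup.
    u_unsat = Kp * e + Ki * integ          # raw controller output
    u = min(max(u_unsat, u_min), u_max)    # saturated output

    if mode == "clamping":
        # Freeze integration only while saturated AND the error pushes
        # the output further into saturation (same sign), as in the
        # documentation excerpt above.
        saturated = u != u_unsat
        if not (saturated and np.sign(e) == np.sign(u_unsat)):
            integ += e * dt
    elif mode == "backcalc":
        # Feed the saturation excess back into the integrator, scaled
        # by the back-calculation gain Kb.
        integ += (e + Kb * (u - u_unsat)) * dt

    return u, integ

# Illustrative comparison against a trivial first-order "plant".
for mode in ("clamping", "backcalc"):
    integ, y = 0.0, 0.0
    for _ in range(500):
        e = 1.0 - y
        u, integ = pi_step(e, integ, Kp=2.0, Ki=5.0, dt=0.01,
                           u_min=-0.5, u_max=0.5, mode=mode)
        y += 0.01 * (u - y)
    print(mode, "final output:", round(y, 3))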