PReLU activation function update rule

I just finished reading the "Delving Deep into Rectifiers" paper. This paper proposes a new activation function called PReLU. Maybe it is obvious, since the paper does not mention it, but I want to know when the parameter 'a' of PReLU is updated. Is it updated before the weight update, after the weight update, or simultaneously with the weights?

The weights are all updated sequentially as the error signal propagates back through each layer of the network. So the bias and the 'a' parameter are both updated before the signal is passed on to the next layer, and hence before the weight updates below them in the network.
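To make this concrete, here is a minimal PyTorch sketch (PyTorch and everything in it are my own illustration, not from the paper): PReLU's slope 'a' is just another learnable parameter, so its gradient comes out of the same backward pass as the weight gradients and it is updated in the same optimizer step as the weights of its layer.

```python
import torch
import torch.nn as nn

# Tiny illustrative model: a linear layer, a PReLU (whose slope 'a' is learnable), another linear layer.
model = nn.Sequential(nn.Linear(4, 8), nn.PReLU(), nn.Linear(8, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)
y = torch.randn(16, 1)

loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()   # gradients for the weights AND for PReLU's 'a' come from the same backward pass
opt.step()        # ... and all parameters, including 'a', are updated in the same optimizer step

print("updated PReLU slope a:", model[1].weight.item())  # nn.PReLU stores 'a' in .weight
```

In that sense 'a' is treated exactly like a weight or a bias of its layer.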

Related

System dynamics SEIR infectious curve for 3 waves of Covid

Using system dynamics in AnyLogic, how can you model a simulation that will give an infectious curve of this nature (see the picture below) using SEIR?
I have tried to simulate it, however my graph only goes up and then down; it does not oscillate as in the attached picture.
I need to simulate something similar to that graph for my assignment.
There should be three types of events in your model.
The first, let's call it "initial spread", is triggered at the start of your simulation.
The second, let's call it "winter season", is triggered annually in November/December.
The third, let's call it "mass vaccination"; you can decide when to trigger it and for which selection of your agents.
So the first two are kind of global events, and the third event is specific to some sub-population (this can make the third wave kind of "toothy" if you trigger it at slightly different moments for different populations).
That is pretty much it.
Curious to see how your model will predict the fourth wave, the second winter season of your simulation. So keep us updated :)
There are a number of ways to model this. One of the simplest is to make one of your infection-rate parameters time-dependent, so that the infection rate increases or decreases over time.
See the example below.
I took the SIR model from the Cloud https://cloud.anylogic.com/model/d465d1f5-f1fc-464f-857a-d5517edc2355?mode=SETTINGS
and simply added an event to change the infectivity rate.
Changing the chart to show only infected people, the result looked something like this
(see the 3 waves that were created).
You will obviously want to use a parameter optimization experiment to get the parameter settings as close to reality as possible.
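For comparison, here is a rough Python/SciPy sketch of the same idea outside AnyLogic: a plain SEIR ODE whose infectivity parameter beta is switched at fixed times, playing the role of the events above. All parameter values and switch times are invented purely for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

N = 1_000_000          # population size (illustrative)
sigma = 1 / 5.0        # 1 / incubation period (days)
gamma = 1 / 7.0        # 1 / infectious period (days)

def beta(t):
    # Piecewise-constant infectivity, changed by "events" at fixed times (made-up values):
    if t < 60:    return 0.30   # initial spread
    elif t < 150: return 0.08   # restrictions
    elif t < 230: return 0.28   # winter season
    elif t < 320: return 0.08   # restrictions again
    else:         return 0.30   # next wave

def seir(t, y):
    S, E, I, R = y
    new_inf = beta(t) * S * I / N
    return [-new_inf, new_inf - sigma * E, sigma * E - gamma * I, gamma * I]

sol = solve_ivp(seir, (0, 400), [N - 100, 0, 100, 0], max_step=1.0)
for a, b in [(0, 150), (150, 320), (320, 400)]:
    mask = (sol.t >= a) & (sol.t < b)
    print(f"peak infected between day {a} and day {b}: {int(sol.y[2][mask].max())}")
```

Each drop in beta ends a wave and each rise starts the next one; in AnyLogic the events do the switching instead of the beta(t) function.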

Difference in Workspace and Simulink step response. Why this difference?

My primary objective was to make a controller for the transfer function 5.551*s^2, and using root locus I made the controller shown below. Analyzing the step response in the Workspace using the step() function I got a satisfactory answer, but when I try to transfer this to Simulink the response behaves differently. At steady state, for example, I want the smallest possible error, as was obtained in the Workspace, but in Simulink there is a big error, and for some reason at 8 seconds (the Simulink simulation time) there is a "jump", as shown on the display. When I change the simulation time this "jump" changes too, and I do not know why there are these differences between one environment and the other.
Step response in Workspace
Step response in Simulink with 8s of simulation
Step response in Simulink with 12s of simulation
Simulink controller
Simulink transfer function
I wanted a controller with an error of less than 5% and an overshoot smaller than 25%. I first made a controller with two integrators to nullify the effect of the zeros at the origin, and after that I added two more integrators at the origin to try to decrease the error. The zero at -0.652 was placed using the angle condition, and the gain of 0.240251 was obtained from the magnitude condition.
I wasn't expecting the most optimized behavior possible, just something that meets the imposed conditions, so I didn't worry, for example, about the four integrators at the origin.
I tried using the sisotool() command, thinking that I had done something wrong, but the result still changed a lot when simulating in Simulink, so I discarded this option and kept the controller I made using root locus.
Your MATLAB code and your Simulink model are not the same, and hence the different results.
MATLAB allows you to define the non-causal plant model P_ball, then form the causal closed loop CL, which can have its step response generated.
Simulink does not allow you to model non-causal blocks (even if the overall model is causal) and hence will not allow you to implement s^2, which I assume is why you have used two differentiation blocks. But a numerical differentiation is not the same as a Laplace s operator.
You would have to make the plant causal by incorporating two poles that are large enough to not adversely affect the overall simulation. So your plant model needs to be something like 5.551*s^2/((s/1000 + 1)(s/1000 + 1)), which can be implemented using a Transfer Function block with a numerator of 5.551*1000*1000*[1 0 0] and a denominator of [1 2*1000 1000*1000].
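As a rough cross-check of that causal approximation (sketched here in Python/SciPy rather than MATLAB, purely for illustration), the magnitude response of 5.551e6*s^2/(s^2 + 2000s + 1e6) matches the ideal 5.551*s^2 closely at frequencies well below the added poles at 1000 rad/s:

```python
import numpy as np
from scipy import signal

# Causal approximation of the plant 5.551*s^2 with two fast poles at 1000 rad/s.
plant_approx = signal.TransferFunction([5.551e6, 0.0, 0.0], [1.0, 2000.0, 1e6])

# Compare its magnitude with the ideal (non-causal) 5.551*s^2 over 0.1 ... 100 rad/s.
w = np.logspace(-1, 2, 200)
_, mag_db, _ = signal.bode(plant_approx, w)
ideal_db = 20 * np.log10(5.551 * w**2)

print("max deviation below 100 rad/s: %.3f dB" % np.max(np.abs(mag_db - ideal_db)))
```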
Alternatively you could just implement PID * P_ball (where you manually do the 2 zero/pole cancellations) which is causal.

Understanding multi-layer LSTM

I'm trying to understand and implement a multi-layer LSTM. The problem is I don't know how the layers connect. I have two thoughts in mind:
At each timestep, the hidden state H of the first LSTM becomes the input of the second LSTM.
At each timestep, the hidden state H of the first LSTM becomes the initial value for the hidden state of the second LSTM, and the input of the first LSTM becomes the input for the second LSTM.
Please help!
TL;DR: Each LSTM cell at time t and level l has input x(t) and hidden state h(l, t).
In the first layer, the input is the actual sequence input x(t) together with the previous hidden state h(1, t-1); in the next layer the input is the hidden state of the corresponding cell in the previous layer, h(l-1, t).
From https://arxiv.org/pdf/1710.02254.pdf:
"To increase the capacity of GRU networks (Hermans and Schrauwen 2013), recurrent layers can be stacked on top of each other. Since GRU does not have two output states, the same output hidden state h'2 is passed to the next vertical layer. In other words, the h1 of the next layer will be equal to h'2. This forces GRU to learn transformations that are useful along depth as well as time."
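A minimal PyTorch sketch of that wiring (all sizes are arbitrary): two manually stacked LSTM cells, where at every timestep the hidden state of layer 1 is the input of layer 2, while each layer carries its own (h, c) forward in time.

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, T, batch = 10, 20, 5, 3

cell1 = nn.LSTMCell(input_dim, hidden_dim)   # layer 1: sees the actual inputs x(t)
cell2 = nn.LSTMCell(hidden_dim, hidden_dim)  # layer 2: sees h of layer 1 at the same timestep

x = torch.randn(T, batch, input_dim)
h1 = c1 = torch.zeros(batch, hidden_dim)
h2 = c2 = torch.zeros(batch, hidden_dim)

outputs = []
for t in range(T):
    h1, c1 = cell1(x[t], (h1, c1))   # input: x(t)  plus previous state of layer 1
    h2, c2 = cell2(h1, (h2, c2))     # input: h1(t) plus previous state of layer 2
    outputs.append(h2)

print(torch.stack(outputs).shape)    # (T, batch, hidden_dim)
```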
I'll take the help of colah's blog post, cutting it short to the specific part relevant here.
As you can see in the diagrams there, LSTMs have a chain-like structure, and each repeating module has four neural network layers.
The values that we pass to the next timestep (the cell state) and to the next layer (the hidden state) are basically the same, and they are the desired output. This output is based on the cell state, but it is a filtered version: first, a sigmoid layer decides what parts of the cell state we're going to output; then we put the cell state through tanh (to push the values to be between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to pass.
We also pass the previous cell state information (the top arrow into the next cell) to the next timestep, and then decide, using a sigmoid layer (the forget gate layer), how much of that information we are going to keep, with the help of the new input and the input from the previous state.
Hope this helps.
In PyTorch, the implementation of the multi-layer LSTM shows that the hidden state of the previous layer becomes the input to the next layer, so your first assumption is correct.
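For reference, a short sketch with PyTorch's built-in module (sizes are arbitrary), showing how the stacked layers and their states come out:

```python
import torch
import torch.nn as nn

# Two stacked layers in one module: internally, the hidden-state sequence of layer 1
# is used as the input sequence of layer 2 at every timestep.
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

x = torch.randn(5, 3, 10)            # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)                  # (5, 3, 20): hidden states of the top layer for every timestep
print(h_n.shape, c_n.shape)          # (2, 3, 20): final (h, c) for each of the 2 layers
```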
There's no definite answer. It depends on your problem and you should try different things.
The simplest thing you can do is to pipe the output from the first LSTM (not the hidden state) as the input to the second layer of LSTM (instead of applying some loss to it). That should work in most cases.
You can try to pipe the hidden state as well but I didn't see it very often.
You can also try other combinations. Say, for the second layer you feed in both the output of the first layer and the original input. Or you connect the second layer to the output of the first layer from both the current unit and the previous one.
It all depends on your problem and you need to experiment to see what works for you.
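As a sketch of that simplest option in Keras (layer sizes are arbitrary): return_sequences=True makes the first LSTM emit its output at every timestep, and that full sequence is exactly what the second LSTM consumes.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two stacked LSTMs: the full output sequence of the first layer is the input of the second.
model = keras.Sequential([
    layers.LSTM(32, return_sequences=True, input_shape=(None, 10)),  # emit output at every timestep
    layers.LSTM(32),                                                 # consumes that sequence
    layers.Dense(1),
])
model.summary()
```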

Backpropagation through time in stateful RNNs

If I use a stateful RNN in Keras for processing a sequence of length N divided into N parts (each time step is processed individually),
how is backpropagation handled? Does it only affect the last time step, or does it backpropagate through the entire sequence?
If it does not propagate through the entire sequence, is there a way to do this?
The backpropagation horizon is limited to the second dimension of the input sequence, i.e. if your data has shape (num_sequences, num_time_steps_per_seq, data_dim), then backprop is done over a time horizon of num_time_steps_per_seq. Take a look at
https://github.com/fchollet/keras/issues/3669
There are a couple of things you need to know about RNNs in Keras. By default, the parameter return_sequences=False in all recurrent layers. This means that by default only the activations of the RNN after processing the entire input sequence are returned as output. If you want to have the activations at every time step and optimize every time step separately, you need to pass return_sequences=True as a parameter (https://keras.io/layers/recurrent/#recurrent).
The next thing that is important to know is that all a stateful RNN does is remember the last activation. So if you have a large input sequence and break it up into smaller sequences (which I believe you are doing), the activation in the network is retained after processing the first sequence and therefore affects the activations when processing the second sequence. This has nothing to do with how the network is optimized; the network simply minimizes the difference between the output and the targets you give it.
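A rough Keras sketch of that setup (all sizes are invented): one long sequence is cut into chunks of length timesteps; because stateful=True the activations carry over from chunk to chunk, but gradients only flow within each chunk.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

batch, timesteps, data_dim = 1, 10, 4        # BPTT horizon = timesteps (10 here)

model = keras.Sequential([
    layers.LSTM(16, stateful=True, batch_input_shape=(batch, timesteps, data_dim)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

long_x = np.random.randn(1, 100, data_dim)   # one long sequence of length 100
long_y = np.random.randn(1, 100 // timesteps, 1)

for i in range(100 // timesteps):            # feed the chunks in order, without shuffling
    chunk = long_x[:, i * timesteps:(i + 1) * timesteps, :]
    model.train_on_batch(chunk, long_y[:, i])
    # the LSTM state is kept between chunks, but no gradient flows across chunk boundaries

model.reset_states()                         # reset once the whole sequence has been processed
```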
Regarding Q1, how is backpropagation handled? (An RNN is not only fully connected vertically as in a basic NN, but is also considered deep in time, having horizontal backprop connections in the hidden layer.)
Suppose batch_input_shape=(num_seq, 1, data_dim). Then "backprop will be truncated to 1 timestep, as the second dimension is 1. No gradient updates will be performed further back in time than the second dimension's value."
Thus, if the number of timesteps in the second dimension of the input shape is greater than 1, the gradient WILL be propagated further back in time, over that many timesteps.
Set return_sequences=True for all recurrent layers except the last one (which can be used as the output directly, or followed by a Dense layer to produce the needed output). True is needed so that the whole sequence, shifted by +1 in the sliding window, is passed from one layer to the next, which is what allows backprop to run against the already-estimated weights.
return_state=True is used to get the states returned: 2 state tensors for an LSTM (output, state_h, state_c = layers.LSTM(64, return_state=True, name="encoder")) or 1 state tensor for a GRU. These "can be used in the encoder-decoder sequence-to-sequence model, where the encoder final state is used as the initial state of the decoder."
But remember, in any case: stateful training does not allow shuffling and is more time-consuming than stateless training.
P.S.
Note that the state ordering is (c, h) in TensorFlow and (h, c) in Keras; both h and c are elements of the output, so both become important in batched or multi-threaded training.

How to use the value of a variable from the previous interval as an input to an equation?

Is it possible to use the previous value of a time-varying variable?
For example: suppose I have a pipe whose inlet temperature is 298 K with a specified uniform mass flow (m_flow), and suppose I am heating the pipe with a 100 W heater.
The outlet will reach a higher temperature, say 302 K. Now, if I have to use this outlet temperature as my inlet temperature (in the sense that I am recirculating the water), how would I do that?
Is it possible to update the value of the inlet temperature based on the outlet temperature at the previous timestep, so that for the next iteration the inlet temperature is the same as the outlet temperature from the previous iteration (in other words, the fluid is recirculating)?
Thanks
You cannot access the value in the previous time step. The closest you can get in Modelica is using delay(exp,T) to get the value T units of time ago.
The timestep does not enter into it at all. A model that uses information about the timestep is simply wrong; nature doesn't know or care about integration time steps, and the model should reflect that.
It seems to me that what you want to capture is transport delay. Transport delay is the delay introduced by the time it takes for molecules, electrons, etc. to move through the system. So presumably what you wish to model is the time it takes the inlet fluid to reach the exit. Again, this has nothing to do with the integration timestep but rather with the velocity of the fluid and the distance it must travel. Once you know how long that takes (either from a priori knowledge of the system or by looking at the simulation results themselves), you can follow Marco's suggestion of using the delay operator.
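To make the difference concrete, here is a small Python sketch (not Modelica, with invented numbers) of a recirculating loop in which the inlet temperature equals the outlet temperature delayed by a transport time tau, emulating what the delay operator does; note that nothing in it depends on the integration step size.

```python
import numpy as np

c_p = 4180.0      # J/(kg K), water
m_flow = 0.01     # kg/s
Q_heater = 100.0  # W
tau = 5.0         # s, transport time around the loop (assumed known)
T0 = 298.0        # K, initial temperature everywhere in the loop

dt, t_end = 0.01, 60.0
t_hist, T_out_hist = [0.0], [T0]

def T_out_delayed(t):
    """Outlet temperature tau seconds ago (like Modelica's delay operator)."""
    return np.interp(t - tau, t_hist, T_out_hist)  # clamps to T0 before t = tau

for k in range(1, int(t_end / dt) + 1):
    t = k * dt
    T_in = T_out_delayed(t)                       # recirculation: delayed outlet temperature
    T_out = T_in + Q_heater / (m_flow * c_p)      # quasi-steady energy balance over the heater
    t_hist.append(t)
    T_out_hist.append(T_out)

print("outlet temperature after %.0f s: %.1f K" % (t_end, T_out_hist[-1]))
```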
In order to set up a proper model for the system you described, I suggest you look at the example
Modelica.Thermal.FluidHeatFlow.Examples.IndirectCooling
of the Modelica Standard Library ver. 3.2. Instead of one pipe you can put an ambient or control-volume component to better suit your needs. Moreover, by using continuous and differentiable equations (which the delay function is not) you will benefit from some of the advantages of Modelica code, e.g. you will be able to reuse your models in a much wider range of cases, solve inverse problems, solve initial value problems, ...
I hope this helps,
Marco