I am performing a RANS simulation with the k-omega SST model. I would like to compute the following two scalar invariants of the velocity gradient for a 2D periodic channel flow:
$Q_A = -0.5 A_{ij} A_{ji}$ and $Q_S = -0.5 S_{ij} S_{ji}$,
with $A_{ij} = \frac{\partial U_i}{\partial x_j}$ and $S_{ij} = 0.5 (A_{ij} + A_{ji})$.
Could you confirm that:
$Q_A = -0.5 A_{ij} A_{ji} = -0.5 (A_{11}A_{11} + A_{12}A_{21} + A_{22}A_{22} + A_{21}A_{12})$?
$Q_S = -0.5 S_{ij} S_{ji} = -0.5 (S_{11}S_{11} + S_{12}S_{21} + S_{22}S_{22} + S_{21}S_{12})$?
Moreover, I want to compute the ratio between production and dissipation. The production can be approximated with $P = \nu_t S^2$. Could you confirm that the production is linked to the $Q_S$ invariant, i.e. $P = \nu_t (-2 Q_S)$?
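As a quick numerical check of the expansions above, here is a minimal Python sketch for a single 2x2 velocity-gradient tensor. The function name `invariants_2d` and the parameter `strain_factor` are illustrative, not part of any CFD code. Note the assumption about $S^2$: with $S^2 = S_{ij}S_{ij}$ the relation is $S^2 = -2Q_S$, while with the convention $S = \sqrt{2 S_{ij} S_{ij}}$ it becomes $S^2 = -4Q_S$, so the factor is exposed as a parameter and should be matched to your solver's definition.

```python
def invariants_2d(A, strain_factor=1.0):
    """Return (Q_A, Q_S, S2) for a 2x2 tensor A[i][j] = dU_i/dx_j.

    strain_factor is an assumption about the definition of S^2:
    1.0 for S^2 = S_ij S_ij, 2.0 for S^2 = 2 S_ij S_ij.
    """
    n = 2
    # Symmetric part of the velocity gradient (strain-rate tensor)
    S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    # Full double contraction over i and j, as in the expanded sums
    q_a = -0.5 * sum(A[i][j] * A[j][i] for i in range(n) for j in range(n))
    q_s = -0.5 * sum(S[i][j] * S[j][i] for i in range(n) for j in range(n))
    s2 = strain_factor * (-2.0 * q_s)
    return q_a, q_s, s2


# Arbitrary example values, chosen only to exercise the formulas
A = [[1.0, 2.0], [3.0, -1.0]]
q_a, q_s, s2 = invariants_2d(A)
print(q_a, q_s, s2)
```

The production would then be `nu_t * s2`, with `nu_t` taken from the turbulence model.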
For a university project, I want to train a (simulated) robot to hit a ball given the position and velocity. The first thing to try is policy gradients: I have a parametric trajectory generator. For every training position, I feed the position through my network, send the trajectory to the simulator and get a reward back. I now can use that as the loss, sample the gradient, feed it back and update the weights of my network so that it does better next time.
The goal, therefore, is to learn the mapping from position to trajectory weights. When using compute graph libraries like Theano and Tensorflow (or Keras), I have the problem that I do not know how to actually model that system. I want standard fully connected layers first, whose output is my trajectory weights. But how do I actually calculate the loss so that backprop can use it?
In a custom loss function, I would ignore (not specify) the true labels, run the simulator and return the loss it gives. But from what I have read, you need to return a Theano/Tensorflow function which is symbolic. My loss is quite complicated, so I do not want to move it from the simulator into the network. How can I implement that? The problem then is differentiating that loss, as I might need to sample to get the gradient.
I've had a similar problem some time ago.
There was a loss function which relied heavily on optimized C code and third-party libraries.
Porting this to tensorflow was not possible.
But we still wanted to train a tensorflow graph to create steering signals from the current setup.
Here is an IPython notebook which explains how to mix numerical and analytical derivatives:
https://nbviewer.jupyter.org/gist/lhk/5943fa09922693a0fbbbf8dc9d1b05c0
Here is a more detailed description of the idea behind it:
The training of the graph is an optimization problem, so you will definitely need the derivative of the loss.
The challenge is to mix the analytical derivative in tensorflow and the numerical derivative of your loss.
You need this setup:
Input I
Output P
Graph G maps I to P, P = G(I)
A constant C of the same shape as P, multiplied element-wise into the output: P~ = C * G(I)
Loss function L
Training the tensorflow graph works with backpropagation.
For every parameter X in the graph, the following derivative is computed
dL/dX = dL/dP * dP/dX
The second part of that, dP/dX comes for free by just setting up the tensorflow graph. But we still need the derivative of the loss.
Now there's a trick.
We want tensorflow to update X based on the correct gradient dL/dP * dP/dX,
but we can't get tensorflow to compute dL/dP, because the loss is not part of the tensorflow graph.
Instead we use P~ = P * C, where C is treated as a constant during the backward pass.
The derivative of that is dP~/dX = dP/dX * C.
So if we set C to dL/dP, we get the correct gradient.
We simply have to estimate C with a numerical gradient.
This is the algorithm:
Set up your graph and multiply the output with a constant C.
Feed 1 for the constant, compute a forward pass, and get the prediction P.
Compute the loss at P.
Compute the numerical derivative of the loss with respect to P.
Feed the numerical derivative as C, compute the backward pass, and update the parameters.
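The steps above can be sketched in plain Python, without tensorflow, to show where each derivative comes from. Everything here is an illustrative assumption: `model` is a single linear layer standing in for the graph G, `black_box_loss` stands in for the external simulator, and `surrogate_grad` plays the role of the backward pass through P~ = C * P. The numerical derivative uses central differences.

```python
def model(W, x):
    """Stand-in for the graph G: a single linear layer, P = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def black_box_loss(P):
    """Stand-in for the external simulator: any scalar function of P."""
    return (P[0] - 1.0) ** 2 + 3.0 * (P[1] + 2.0) ** 2


def numerical_dL_dP(loss, P, eps=1e-5):
    """Central-difference estimate of dL/dP, one component at a time."""
    grad = []
    for k in range(len(P)):
        up = list(P)
        up[k] += eps
        dn = list(P)
        dn[k] -= eps
        grad.append((loss(up) - loss(dn)) / (2 * eps))
    return grad


def surrogate_grad(C, x):
    """Analytical gradient of sum_k C_k * P_k w.r.t. W (dP_k/dW_kj = x_j)."""
    return [[c * xj for xj in x] for c in C]


W = [[0.5, -0.2], [0.1, 0.3]]
x = [1.0, 2.0]

P = model(W, x)                          # forward pass (C = 1)
C = numerical_dL_dP(black_box_loss, P)   # numerical dL/dP
grad_W = surrogate_grad(C, x)            # backward pass with C fed in

lr = 0.01
W_new = [[w - lr * g for w, g in zip(w_row, g_row)]
         for w_row, g_row in zip(W, grad_W)]
print(black_box_loss(P), black_box_loss(model(W_new, x)))
```

The cost of this approach is two loss evaluations per output component for every training step, so it is only practical when P is small.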
I want to scale the loss value of each image based on how close or far the "current prediction" is from the "correct label" during training. For example, if the correct label is "cat" and the network thinks it is "dog", the penalty (loss) should be smaller than if the network thinks it is a "car".
The way I am doing it is as follows:
1- I defined a matrix of the distance between the labels,
2- pass that matrix as a bottom to the "softmaxWithLoss" layer,
3- multiply each log(prob) by this value to scale the loss according to badness in forward_cpu
However, I do not know what I should do in the backward_cpu part. I understand the gradient (bottom_diff) has to be changed, but I am not quite sure how to incorporate the scale value here. According to the math I have to scale the gradient by the scale (because it is just a scale), but I don't know how.
Also, it seems there is a loss layer in caffe called "InfogainLoss" that does a very similar job, if I am not mistaken. However, the backward part of this layer is a little confusing:
bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
I am not sure why infogain_mat[] is divided by prob rather than multiplied by it! If I use the identity matrix for infogain_mat, isn't it supposed to act like softmax loss in both forward and backward?
I would highly appreciate it if someone could give me some pointers.
You are correct in observing that the scaling you are doing for the log(prob) is exactly what "InfogainLoss" layer is doing (You can read more about it here and here).
As for the derivative (back-prop): the loss computed by this layer is
L = - sum_j infogain_mat[label * dim + j] * log( prob(j) )
If you differentiate this expression with respect to prob(j) (which is the input variable to this layer), and recall that the derivative of log(x) is 1/x, you get
dL/dprob(j) = - infogain_mat[label * dim + j] / prob(j)
Now, why don't you see a similar expression in the back-prop of the "SoftmaxWithLoss" layer?
Well, as the name of that layer suggests, it is actually a combination of two layers: a softmax that computes class probabilities from the classifier outputs, and a log loss layer on top of it. Combining these two layers enables a more numerically robust estimation of the gradients.
Working a little with the "InfogainLoss" layer, I noticed that sometimes prob(j) can have a very small value, leading to unstable estimation of the gradients.
Here's a detailed computation of the forward and backward passes of "SoftmaxWithLoss" and "InfogainLoss" layers with respect to the raw predictions (x), rather than the "softmax" probabilities derived from these predictions using a softmax layer. You can use these equations to create a "SoftmaxWithInfogainLoss" layer that is more numerically robust than computing infogain loss on top of a softmax layer:
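As a hedged illustration of the combined layer, the following Python sketch differentiates the infogain loss directly with respect to the raw predictions x. With p = softmax(x) and L = -sum_j H[label][j] * log(p_j), the chain rule gives dL/dx_k = p_k * sum_j H[label][j] - H[label][k], which never divides by a probability. The function names are illustrative and are not Caffe's API; `h_row` stands for the row of infogain_mat selected by the label.

```python
import math


def softmax(x):
    """Numerically stable softmax (shift by the maximum before exp)."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def infogain_loss(x, h_row):
    """L = -sum_j H[label][j] * log(softmax(x)[j]), via log-sum-exp."""
    m = max(x)
    lse = m + math.log(sum(math.exp(v - m) for v in x))
    return -sum(h * (v - lse) for h, v in zip(h_row, x))


def infogain_grad(x, h_row):
    """dL/dx_k = p_k * sum_j H[label][j] - H[label][k]."""
    p = softmax(x)
    total = sum(h_row)
    return [pk * total - hk for pk, hk in zip(p, h_row)]


x = [2.0, -1.0, 0.5]        # raw predictions for three classes
h_row = [0.1, 0.7, 0.2]     # hypothetical infogain row for label 1

# Compare the analytical gradient with central differences
analytic = infogain_grad(x, h_row)
eps = 1e-6
numeric = []
for k in range(len(x)):
    up = list(x)
    up[k] += eps
    dn = list(x)
    dn[k] -= eps
    numeric.append((infogain_loss(up, h_row) - infogain_loss(dn, h_row)) / (2 * eps))
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)

# With an identity infogain matrix the row is one-hot and the
# gradient reduces to the usual softmax-loss gradient p - onehot.
onehot = [0.0, 1.0, 0.0]
p = softmax(x)
softmax_loss_grad = infogain_grad(x, onehot)
```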
PS,
Note that if you are going to use infogain loss for weighing, you should feed H (the infogain_mat) with label similarities, rather than distances.
Update:
I recently implemented this robust gradient computation and created this pull request. This PR was merged into the master branch in April 2017.
I am trying to implement a Kalman filter for vehicle tracking in MATLAB. A vehicle is moving in the X direction with constant velocity. The initial state of the vehicle is [x(t) v(t)].
I have to find the position of the vehicle after t = 2 sec. The position of the vehicle from GPS is corrupted by noise.
My question is: if I assume that there is no process noise, will the initial prediction matrix be equal to the measurement noise matrix? I don't know how to initialise the prediction matrix.
Any part of your state that is initialized from a measurement can have the corresponding variance initialized from the measurement variance. If your state has other variables (e.g. velocity) which aren't directly measured, then you'll have to put in educated guesses about how far wrong you could be. Variance has units of "state unit squared" (because variance is the square of standard deviation). So if your velocity estimate has a 68% chance (see: normal distribution) of being within +/-7mph, then the initial variance would be 49 miles^2/hour^2.
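A minimal sketch of that initialisation, written in Python rather than MATLAB for brevity. The GPS standard deviation is a made-up placeholder, the units follow the mph example above (miles and hours), and the function names are illustrative. The second function propagates the covariance through a constant-velocity model with no process noise, as assumed in the question.

```python
def initial_covariance(sigma_pos, sigma_vel):
    """Diagonal initial covariance for the state [position, velocity]."""
    return [[sigma_pos ** 2, 0.0], [0.0, sigma_vel ** 2]]


def predict_covariance(P, dt):
    """P' = F P F^T with F = [[1, dt], [0, 1]] and no process noise (Q = 0)."""
    p00, p01 = P[0]
    p10, p11 = P[1]
    return [
        [p00 + dt * (p01 + p10) + dt * dt * p11, p01 + dt * p11],
        [p10 + dt * p11, p11],
    ]


GPS_SIGMA_MILES = 0.005   # hypothetical GPS standard deviation
VEL_SIGMA_MPH = 7.0       # educated guess from the example above

P0 = initial_covariance(GPS_SIGMA_MILES, VEL_SIGMA_MPH)
P1 = predict_covariance(P0, dt=2.0 / 3600.0)   # 2 seconds, in hours
print(P0)
print(P1)
```

Even without process noise, the position variance grows during prediction because the velocity uncertainty feeds into it.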
What fitness function is used to solve an inverted pendulum?
I am evolving neural networks with a genetic algorithm, and I don't know how to evaluate each individual.
I tried to minimize the angle of the pendulum and maximize the distance traveled at the end of the evaluation time (10 s), but this does not work.
The inputs to the neural network are: cart velocity, cart position, pendulum angular velocity and pendulum angle at time (t). The output is the force applied at time (t+1).
Thanks in advance.
I found this paper which lists their objective function as being:
Defined as:
where "Xmax = 1.0, thetaMax = pi/6, _X'max = 1.0, theta'Max = 3.0, N is the number of iteration steps, T = 0.02 * TS and Wk are selected positive weights." (These are the specific values for angles, velocities, and positions used in the paper; you will want to use your own values depending on the boundary conditions of your pendulum.)
The paper also states "The first and second terms determine the accumulated sum of normalised absolute deviations of X1 and X3 from zero and the third term when minimised, maximises the survival time."
That should be more than enough to get started with, but I HIGHLY recommend you read the whole paper. It's a great read and I found it quite educational.
You can make your own fitness function, but I think using the position, velocity, angle, and rate of change of the angle of the pendulum is a good idea for the fitness function. You can, however, choose to use those variables in very different ways than the author of the paper did.
It wouldn't hurt to read up on harmonic oscillators either. They take the general form:
mx" + Bx' + kx = Acos(w*t)
(where B, or A may be 0 depending on whether or not the oscillator is damped/undamped or driven/undriven respectively).
I'm trying to implement gradient checking for a simple feedforward neural network with 2 unit input layer, 2 unit hidden layer and 1 unit output layer. What I do is the following:
Take each weight w of the network weights between all layers and perform forward propagation using w + EPSILON and then w - EPSILON.
Compute the numerical gradient using the results of the two feedforward propagations.
What I don't understand is how exactly to perform the backpropagation. Normally, I compare the output of the network to the target data (in the case of classification) and then backpropagate the error derivative across the network. However, I think in this case some other value has to be backpropagated, since the results of the numerical gradient computation do not depend on the target data (only on the input), while the error backpropagation does depend on the target data. So, what is the value that should be used in the backpropagation part of the gradient check?
Backpropagation computes the gradients analytically, and those formulas are then used during training. A neural network is essentially a multivariate function whose coefficients, or parameters, need to be found by training.
The definition of a gradient with respect to a specific variable is the rate of change of the function value. Therefore, as you mentioned, and from the definition of the first derivative we can approximate the gradient of a function, including a neural network.
To check if your analytical gradient for your neural network is correct or not, it is good to check it using the numerical method.
For each weight layer w_l from all layers W = [w_0, w_1, ..., w_l, ..., w_k]
    For i in 0 to number of rows in w_l
        For j in 0 to number of columns in w_l
            w_l_minus = copy of w_l                   # Copy all the weights
            w_l_minus[i,j] = w_l_minus[i,j] - eps     # Change only this parameter
            w_l_plus = copy of w_l                    # Copy all the weights
            w_l_plus[i,j] = w_l_plus[i,j] + eps       # Change only this parameter
            cost_minus = cost of neural net by replacing w_l by w_l_minus
            cost_plus = cost of neural net by replacing w_l by w_l_plus
            w_l_grad[i,j] = (cost_plus - cost_minus)/(2*eps)
This process changes only one parameter at a time and computes the numerical gradient. In this case I have used the central difference (f(x+h) - f(x-h))/(2h), whose error shrinks as h^2 and which is therefore more accurate than the one-sided difference (f(x+h) - f(x))/h.
Note that you mentioned that "the results of the numerical gradient computation are not dependent of the target data". This is not true: when you compute cost_minus and cost_plus above, the cost is being computed on the basis of
The weights
The target classes
Therefore, gradient checking does not change what you backpropagate: run backpropagation exactly as you would in training, with the same inputs and targets. Compute the numerical gradients before any weight update, compute the analytical gradients with backpropagation for the same weights and data, then compare each gradient component of the vectors/matrices and check that they are close enough.
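To make the comparison concrete, here is a minimal, self-contained Python sketch for the 2-2-1 network from the question. It assumes sigmoid activations, no biases and a squared-error cost on a single example; these choices, the weight values and all function names are illustrative. Both gradients use the same input x and target t.

```python
import copy
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output."""
    h = [sigmoid(sum(W1[i][j] * x[j] for j in range(2))) for i in range(2)]
    y = sigmoid(sum(W2[0][i] * h[i] for i in range(2)))
    return h, y


def cost(W1, W2, x, t):
    _, y = forward(W1, W2, x)
    return 0.5 * (y - t) ** 2


def backprop(W1, W2, x, t):
    """Analytical gradients of the cost w.r.t. W1 and W2."""
    h, y = forward(W1, W2, x)
    delta_out = (y - t) * y * (1.0 - y)
    gW2 = [[delta_out * h[i] for i in range(2)]]
    gW1 = [[delta_out * W2[0][i] * h[i] * (1.0 - h[i]) * x[j]
            for j in range(2)] for i in range(2)]
    return gW1, gW2


def numerical_grad(W1, W2, x, t, eps=1e-5):
    """Central differences, perturbing one weight at a time."""
    weights = [W1, W2]
    grads = []
    for l, W in enumerate(weights):
        g = [[0.0] * len(row) for row in W]
        for i in range(len(W)):
            for j in range(len(W[i])):
                plus = copy.deepcopy(weights)
                plus[l][i][j] += eps
                minus = copy.deepcopy(weights)
                minus[l][i][j] -= eps
                g[i][j] = (cost(plus[0], plus[1], x, t)
                           - cost(minus[0], minus[1], x, t)) / (2 * eps)
        grads.append(g)
    return grads


W1 = [[0.15, -0.4], [0.3, 0.25]]
W2 = [[0.5, -0.6]]
x, t = [1.0, 0.5], 1.0

analytic = backprop(W1, W2, x, t)
numeric = numerical_grad(W1, W2, x, t)
max_diff = max(abs(a - n)
               for A, N in zip(analytic, numeric)
               for ra, rn in zip(A, N)
               for a, n in zip(ra, rn))
print(max_diff)
```

If `max_diff` is not tiny compared with the gradient magnitudes, the backpropagation formulas are the first place to look.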
Whether you want to do some classification or have your network calculate a certain numerical function, you always have some target data. For example, let's say you wanted to train a network to calculate the function f(a, b) = a + b. In that case, this is the input and target data you want to train your network on:
a    b    Target
1    1    2
3    4    7
21   0    21
5    2    7
...
Just as with "normal" classification problems, the more input-target pairs, the better.