How to monitor tensor values in Theano/Keras? - callback

I know this question has been asked in various forms, but I can't really find any answer I can understand and use. So forgive me if this is a basic question, 'cause I'm a newbie to these tools(theano/keras)
Problem to Solve
Monitor variables in Neural Networks
(e.g. input/forget/output gate values in LSTM)
What I'm currently getting
no matter in which stage I'm getting those values, I'm getting something like :
Elemwise{mul,no_inplace}.0
Elemwise{mul,no_inplace}.0
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
Subtensor{int64}.0
Subtensor{int64}.0
Is there any way I can't monitor(e.g. print to stdout, write to a file, etc) them?
Possible Solution
Seems like callbacks in Keras can do the job, but it doesn't work either for me. I'm getting same thing as above
My Guess
Seems like I'm making very simple mistakes.
Thank you very much in advance, everyone.
ADDED
Specifically, I'm trying to monitor input/forget/output gating values in LSTM.
I found that LSTM.step() is for computing those values:
def step(self, x, states):
h_tm1 = states[0] # hidden state of the previous time step
c_tm1 = states[1] # cell state from the previous time step
B_U = states[2] # dropout matrices for recurrent units?
B_W = states[3] # dropout matrices for input units?
if self.consume_less == 'cpu': # just cut x into 4 pieces in columns
x_i = x[:, :self.output_dim]
x_f = x[:, self.output_dim: 2 * self.output_dim]
x_c = x[:, 2 * self.output_dim: 3 * self.output_dim]
x_o = x[:, 3 * self.output_dim:]
else:
x_i = K.dot(x * B_W[0], self.W_i) + self.b_i
x_f = K.dot(x * B_W[1], self.W_f) + self.b_f
x_c = K.dot(x * B_W[2], self.W_c) + self.b_c
x_o = K.dot(x * B_W[3], self.W_o) + self.b_o
i = self.inner_activation(x_i + K.dot(h_tm1 * B_U[0], self.U_i))
f = self.inner_activation(x_f + K.dot(h_tm1 * B_U[1], self.U_f))
c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1 * B_U[2], self.U_c))
o = self.inner_activation(x_o + K.dot(h_tm1 * B_U[3], self.U_o))
with open("test_visualization.txt", "a") as myfile:
myfile.write(str(i)+"\n")
h = o * self.activation(c)
return h, [h, c]
And as it's in the code above, I tried to write the value of i into a file, but it only gave me values like :
Elemwise{mul,no_inplace}.0
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
Subtensor{int64}.0
So I tried i.eval() or i.get_value(), but both failed to give me values.
.eval() gave me this:
theano.gof.fg.MissingInputError: An input of the graph, used to compute Subtensor{::, :int64:}(<TensorType(float32, matrix)>, Constant{10}), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.
and .get_value() gave me this:
AttributeError: 'TensorVariable' object has no attribute 'get_value'
So I backtracked those chains(which line calls which functions..) and tried to get values at every steps I found but in vain.
Feels like I'm in some basic pitfalls.

I use the solution described in the Keras FAQ:
http://keras.io/getting-started/faq/#how-can-i-visualize-the-output-of-an-intermediate-layer
In detail:
from keras import backend as K
intermediate_tensor_function = K.function([model.layers[0].input],[model.layers[layer_of_interest].output])
intermediate_tensor = intermediate_tensor_function([thisInput])[0]
yields:
array([[ 3., 17.]], dtype=float32)
However I'd like to use the functional API but I can't seem to get the actual tensor, only the symbolic representation. For example:
model.layers[1].output
yields:
<tf.Tensor 'add:0' shape=(?, 2) dtype=float32>
I'm missing something about the interaction of Keras and Tensorflow here but I'm not sure what. Any insight much appreciated.

One solution is to create a version of your network that is truncated at the LSTM layer of which you want to monitor the gate values, and then replace the original layer with a custom layer in which the stepfunction is modified to return not only the hidden layer values, but also the gate values.
For instance, say you want to access the access the gate values of a GRU. Create a custom layer GRU2 that inherits everything from the GRU class, but adapt the step function such that it returns a concatenation of the states you want to monitor, and then takes only the part containing the previous hidden layer activations when computing the next activations. I.e:
def step(self, x, states):
# get prev hidden layer from input that is concatenation of
# prev hidden layer + reset gate + update gate
x = x[:self.output_dim, :]
###############################################
# This is the original code from the GRU layer
#
h_tm1 = states[0] # previous memory
B_U = states[1] # dropout matrices for recurrent units
B_W = states[2]
if self.consume_less == 'gpu':
matrix_x = K.dot(x * B_W[0], self.W) + self.b
matrix_inner = K.dot(h_tm1 * B_U[0], self.U[:, :2 * self.output_dim])
x_z = matrix_x[:, :self.output_dim]
x_r = matrix_x[:, self.output_dim: 2 * self.output_dim]
inner_z = matrix_inner[:, :self.output_dim]
inner_r = matrix_inner[:, self.output_dim: 2 * self.output_dim]
z = self.inner_activation(x_z + inner_z)
r = self.inner_activation(x_r + inner_r)
x_h = matrix_x[:, 2 * self.output_dim:]
inner_h = K.dot(r * h_tm1 * B_U[0], self.U[:, 2 * self.output_dim:])
hh = self.activation(x_h + inner_h)
else:
if self.consume_less == 'cpu':
x_z = x[:, :self.output_dim]
x_r = x[:, self.output_dim: 2 * self.output_dim]
x_h = x[:, 2 * self.output_dim:]
elif self.consume_less == 'mem':
x_z = K.dot(x * B_W[0], self.W_z) + self.b_z
x_r = K.dot(x * B_W[1], self.W_r) + self.b_r
x_h = K.dot(x * B_W[2], self.W_h) + self.b_h
else:
raise Exception('Unknown `consume_less` mode.')
z = self.inner_activation(x_z + K.dot(h_tm1 * B_U[0], self.U_z))
r = self.inner_activation(x_r + K.dot(h_tm1 * B_U[1], self.U_r))
hh = self.activation(x_h + K.dot(r * h_tm1 * B_U[2], self.U_h))
h = z * h_tm1 + (1 - z) * hh
#
# End of original code
###########################################################
# concatenate states you want to monitor, in this case the
# hidden layer activations and gates z and r
all = K.concatenate([h, z, r])
# return everything
return all, [h]
(Note that the only lines I added are at the beginning and end of the function).
If you then run your network with GRU2 as last layer instead of GRU (with return_sequences = True for the GRU2 layer), you can just call predict on your network, this will give you all hidden layer and gate values.
The same thing should work for LSTM, although you might have to puzzle a bit to figure out how to store all the outputs you want in one vector and retrieve them again afterwards.
Hope that helps!

You can use theano's printing module for printing during execution (and not during definition, which is what you're doing and the reason why you're not getting values, but their abstract definition).
Print
Just use the Print function. Don't forget to use the output of Print to continue your graph, otherwise the output will be disconnected and Print will most likely be removed during optimisation. And you will see nothing.
from keras import backend as K
from theano.printing import Print
def someLossFunction(x, ref):
loss = K.square(x - ref)
loss = Print('Loss tensor (before sum)')(loss)
loss = K.sum(loss)
loss = Print('Loss scalar (after sum)')(loss)
return loss
Plot
A little bonus you might enjoy.
The Print class has a global_fn parameter, to override the default callback to print. You can provide your own function and directly access to the data, to build a plot for instance.
from keras import backend as K
from theano.printing import Print
import matplotlib.pyplot as plt
curve = []
# the callback function
def myPlottingFn(printObj, data):
global curve
# Store scalar data
curve.append(data)
# Plot it
fig, ax = plt.subplots()
ax.plot(curve, label=printObj.message)
ax.legend(loc='best')
plt.show()
def someLossFunction(x, ref):
loss = K.sum(K.square(x - ref))
# Callback is defined line below
loss = Print('Loss scalar (after sum)', global_fn=myplottingFn)(loss)
return loss
BTW the string you passed to Print('...') is stored in the print object under property name message (see function myPlottingFn). This is useful for building multi-curves plot automatically

Related

simple linear regression with Python with 10 lines of code

I am doing first steps in machine learning. Firstly, I try to create a simple algorithm, for example, linear regression of two variables. So, this manual (https://towardsdatascience.com/linear-regression-using-gradient-descent-in-10-lines-of-code-642f995339c0) is best example of coding this one. When I transfer this code, but it does not work. More correct, it prints unreal parameters of regression. Please, help me to overcome the problem. The script below.
x_1 = range(1,100)
y_1 = range(1,100)
N = float(len(y_1))
epochs=1000
m_current = b_current = 0
learning_rate=0.01
for i in range(epochs):
for X,y in zip(x_1, y_1):
y_current = (m_current * X) + b_current
cost = (y-y_current)/N
m_gradient = -(2/N)*(X * (y - y_current))
b_gradient = -(2/N)*(y - y_current)
m_current = m_current - (learning_rate * m_gradient)
b_current = b_current - (learning_rate * b_gradient)
print(m_current)
print(b_current)
print(cost)
/*print
1.9999 i excpect 0.9999999 or 1
9.2333 i excpect 0.00000001 or 0
101.11 i excpect 0.1
*/

How to convert deep learning gradient descent equation into python

I've been following an online tutorial on deep learning. It has a practical question on gradient descent and cost calculations where I been struggling to get the given answers once it was converted to python code. Hope you can kindly help me get the correct answer please
Please see the following link for the equations used
Click here to see the equations used for the calculations
Following is the function given to calculate the gradient descent,cost etc. The values need to be found without using for loops but using matrix manipulation operations
import numpy as np
def propagate(w, b, X, Y):
"""
Arguments:
w -- weights, a numpy array of size (num_px * num_px * 3, 1)
b -- bias, a scalar
X -- data of size (num_px * num_px * 3, number of examples)
Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size
(1, number of examples)
Return:
cost -- negative log-likelihood cost for logistic regression
dw -- gradient of the loss with respect to w, thus same shape as w
db -- gradient of the loss with respect to b, thus same shape as b
Tips:
- Write your code step by step for the propagation. np.log(), np.dot()
"""
m = X.shape[1]
# FORWARD PROPAGATION (FROM X TO COST)
### START CODE HERE ### (≈ 2 lines of code)
A = # compute activation
cost = # compute cost
### END CODE HERE ###
# BACKWARD PROPAGATION (TO FIND GRAD)
### START CODE HERE ### (≈ 2 lines of code)
dw =
db =
### END CODE HERE ###
assert(dw.shape == w.shape)
assert(db.dtype == float)
cost = np.squeeze(cost)
assert(cost.shape == ())
grads = {"dw": dw,
"db": db}
return grads, cost
Following are the data given to test the above function
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]),
np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
Following is the expected output of the above
Expected Output:
dw [[ 0.99993216] [ 1.99980262]]
db 0.499935230625
cost 6.000064773192205
For the above propagate function I have used the below replacements, but the output is not what is expected. Please kindly help on how to get the expected output
A = sigmoid(X)
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)
dw = (np.dot(X,((A-Y).T)))/m
db = np.sum((A-Y),axis=0)/m
Following is the sigmoid function used to calculate the Activation:
def sigmoid(z):
"""
Compute the sigmoid of z
Arguments:
z -- A scalar or numpy array of any size.
Return:
s -- sigmoid(z)
"""
### START CODE HERE ### (≈ 1 line of code)
s = 1 / (1+np.exp(-z))
### END CODE HERE ###
return s
Hope someone could help me understand how to solve this as I couldn't continue with rest of the tutorials without understanding this. Many thanks
You can calculate A,cost,dw,db as the following:
A = sigmoid(np.dot(w.T,X) + b)
cost = -1 / m * np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
dw = 1/m * np.dot(X,(A-Y).T)
db = 1/m * np.sum(A-Y)
where sigmoid is :
def sigmoid(z):
s = 1 / (1 + np.exp(-z))
return s
After going through the code and notes a few times was finally able to figure out the error.
First it needs calculating Z and then pass it to the sigmoid function, instead of X
Formula for Z = w(T)X+b. So in python this is calculated as below
Z=np.dot(w.T,X)+b
Then calculate A by passing z to sigmoid function
A = sigmoid(Z)
Then dw can be calculated as below
dw=np.dot(X,(A-Y).T)/m
Calculation of the other variables; cost and derivative of b will be as follows
cost = -1*((np.sum((Y*np.log(A))+((1-Y)*(np.log(1-A))),axis=1))/m)
db = np.sum((A-Y),axis=1)/m
def sigmoid(x):
#You have it right
return 1/(1 + np.exp(-x))
def derivSigmoid(x):
return sigmoid(x) * (1 - sigmoid(x))
error = targetSample - output
#Make sure to keep the sigmoided value around. For instance, an output that has already been sigmoided can be used to get the sigmoid derivative faster (output = sigmoid(x)):
dOutput = output * (1 - output)
Looks like you're already working on the backprop. Just thought I'd help simplify some of the forward prop for you.

CNTK Neural network with not one-hot-vector output (multi-class classifier)

Thank you for the CNTK Tool, the examples are running pretty fast. Since some days, I try to set up a simple network, but I dont get it. I need a network with 2 input and 3 output, for example:
|features 0.3 0.5 |labels 0.2 0.7 0.9
The output is not a one-hot-vector, the network has to learn the label-values 0.2 0.7 0.9. But most examples have a one-hot-vector as output, so it is not clear to me how to solve this. I have tried to change the tutorial with 3 classification, but it does not work, the network does not learn the output correctly. The network I have tried is:
BrainScriptNetworkBuilder = {
SDim = 2 # feature dimension
H1Dim = 50 # hidden dimension
H2Dim = 50 # hidden dimension
LDim = 3 # number of classes (labels)
model (features) = {
W0 = ParameterTensor {(H1Dim:SDim)} ; b0 = ParameterTensor {H1Dim}
W1 = ParameterTensor {(H2Dim:H1Dim)} ; b1 = ParameterTensor {H2Dim}
W2 = ParameterTensor {(LDim:H2Dim)} ; b2 = ParameterTensor {LDim}
r1 = ReLU(W0 * features + b0) # hidden layer 1
r2 = ReLU(W1 * r1 + b1) # hidden layer 2
z = ReLU(W2 * r2 + b2)
}.z
# define inputs
features = Input {SDim, sparse = false}
labels = Input {LDim, sparse = false}
# apply model to features
z = model (features)
# define criteria and output(s)
ce = SquareError(labels, z) # criterion (loss)
err = SquareError(labels, z) # additional metric
# connect to the system. These five variables must be named exactly like this.
featureNodes = (features)
inputNodes = (labels)
criterionNodes = (ce)
evaluationNodes = (err)
outputNodes = (z)
}
So my question is: How to set up a network in CNTK, so that the output is not a one hot vector?
Thank you for help.
When your label is not a one-hot vector, squareError is a good loss function to minimize. If some examples have a one-hot label you can still user squareError. So I think you are doing everything right, you might have to just tune the learning rate to get it to work well.

How Can I change Theano gradients during backpropagation wrt the current output of the network?

I am trying to code up an example of the Inverting Gradient method from DEEP REINFORCEMENT LEARNING IN PARAMETERIZED ACTION SPACE (equation 11) in Lasagne/Theano. Basically what I am trying to do is ensure the output of the network is within some specified bounds, in this case [1,-1].
I have been looking at the example given here that inverts the gradient which has helped but at this point I am stuck. I think the best place to perform this operation is in the gradient computation method so I copied rmsprop and am trying to edit the gradients before the updates are applied.
This is what I have so far
def rmspropWithInvert(loss_or_grads, params, p, learning_rate=1.0, rho=0.9, epsilon=1e-6):
clip = 2.0
grads = lasagne.updates.get_or_compute_grads(loss_or_grads, params)
# grads = theano.gradient.grad_clip(grads, -clip, clip)
grads_ = []
for grad in grads:
grads_.append(theano.gradient.grad_clip(grad, -clip, clip) )
grads = grads_
a, p_ = T.scalars('a', 'p_')
z_lazy = ifelse(T.gt(a,0.0), (1.0-p_)/(2.0), (p_-(-1.0))/(2.0))
f_lazyifelse = theano.function([a,p_], z_lazy,
mode=theano.Mode(linker='vm'))
# compute the parameter vector to invert the gradients by
ps = theano.shared(
np.zeros((3, 1), dtype=theano.config.floatX),
broadcastable=(False, True))
for i in range(3):
ps[i] = f_lazyifelse(grads[-1][i], p[i])
# Apply vector through computed gradients
grads2=[]
for grad in grads.reverse():
grads2.append(theano.mul(ps, grad))
ps = grad
grads = grads2.reverse()
print "Grad Update: " + str(grads[0])
updates = OrderedDict()
# Using theano constant to prevent upcasting of float32
one = T.constant(1)
for param, grad in zip(params, grads):
value = param.get_value(borrow=True)
accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
broadcastable=param.broadcastable)
accu_new = rho * accu + (one - rho) * grad ** 2
updates[accu] = accu_new
updates[param] = param - (learning_rate * grad /
T.sqrt(accu_new + epsilon))
return updates
Maybe someone more skilled with Theano/Lasagne will see a solution? Conceptually I think the computation is easy but coding everything in the update step symbolically has proven challenging for me. I am still getting used to Theano.

Back Propagation Neural Network Hidden Layer all output is 1

everyone I have created a neural network with 1600 input, one hidden layer with different number of neurons nodes and 24 output neurons.
My code shown that I can decrease the error each epoch, but the output of hidden layer always is 1. Due to this reason, the weight adjusted always produce same result for my testing data.
I try different number of neuron nodes and learning rate in the ANN and also randomly initialize my initial weight. I use sigmoid function as my activate function since my output is either 1 or 0 in different output.
May I know that what is the main reason that causes the output of hidden layer always is 1 and how should i solve it?
My purpose for this neural network is to recognize 24 hand shape for alphabet, I try intensities data in my first phase of project.
I have try 30 hidden neural nodes also 100 neural nodes even 1000 neural nodes but the output of hidden layer still is 1. Due to this reason, all of the outcome in testing data is always similar.
I added the code for my network
Thanks
g = inline('logsig(x)');
[row, col] = size(input);
numofInputNeurons = col;
weight_input_hidden = rand(numofInputNeurons, numofFirstHiddenNeurons);
weight_hidden_output = rand(numofFirstHiddenNeurons, numofOutputNeurons);
epochs = 0;
errorMatrix = [];
while(true)
if(totalEpochs > 0 && epochs >= totalEpochs)
break;
end
totalError = 0;
epochs = epochs + 1;
for i = 1:row
targetRow = zeros(1, numofOutputNeurons);
targetRow(1, target(i)) = 1;
hidden_output = g(input(1, 1:end)*weight_input_hidden);
final_output = g(hidden_output*weight_hidden_output);
error = abs(targetRow - final_output);
error = sum(error);
totalError = totalError + error;
if(error ~= 0)
delta_final_output = learningRate * (targetRow - final_output) .* final_output .* (1 - final_output);
delta_hidden_output = learningRate * (hidden_output) .* (1-hidden_output) .* (delta_final_output * weight_hidden_output');
for m = 1:numofFirstHiddenNeurons
for n = 1:numofOutputNeurons
current_changes = delta_final_output(1, n) * hidden_output(1, m);
weight_hidden_output(m, n) = weight_hidden_output(m, n) + current_changes;
end
end
for m = 1:numofInputNeurons
for n = 1:numofFirstHiddenNeurons
current_changes = delta_hidden_output(1, n) * input(1, m);
weight_input_hidden(m, n) = weight_input_hidden(m, n) + current_changes;
end
end
end
end
totalError = totalError / (row);
errorMatrix(end + 1) = totalError;
if(errorThreshold > 0 && totalEpochs == 0 && totalError < errorThreshold)
break;
end
end
I see a few obvious errors that need fixing in your code:
1) You have no negative weights when initialising. This is likely to get the network stuck. The weight initialisation should be something like:
weight_input_hidden = 0.2 * rand(numofInputNeurons, numofFirstHiddenNeurons) - 0.1;
2) You have not implemented bias. That will severely limit the ability of the network to learn. You should go back to your notes and figure that out, it is usually implemented as an extra column of 1's inserted into input and activation vectors/matrix before determining the activations of each layer, and there should be a matching additional column of weights.
3) Your delta for output layer is wrong. This line
delta_final_output = learningRate * (targetRow - final_output) .* final_output .* (1 - final_output);
. . . is not the delta for the output layer activations. It has some extra unwanted factors.
The correct delta for logloss objective function and sigmoid activation in output layer would be:
delta_final_output = (final_output - targetRow);
There are other possibilities, depending on your objective function, which is not shown. You original code is close to correct for mean squared error, which would probably still work if you changed the sign and removed the factor of learningRate
4) Your delta for hidden layer is wrong. This line:
delta_hidden_output = learningRate * (hidden_output) .* (1-hidden_output) .* (delta_final_output * weight_hidden_output');
. . . is not the delta for the hidden layer activations. You have multiplied by the learningRate for some reason (combined with the other delta that means you have a factor of learningRate squared).
The correct delta would be:
delta_hidden_output = (hidden_output) .* (1-hidden_output) .* (delta_final_output * weight_hidden_output');
5) Your weight update step needs adjusting to match fixes to (3) and (4). These lines:
current_changes = delta_final_output(1, n) * hidden_output(1, m);
would need to be adjusted to get correct sign and learning rate multiplier
current_changes = -learningRate * delta_final_output(1, n) * hidden_output(1, m);
That's 5 bugs from looking through the code, I may have missed some. But I think that's more than enough for now.