Extraction of neural network activation values from Python Theano+Keras - neural-network

Say I have a simple MLP network for a 2-class problem:
model = Sequential()
model.add(Dense(2, 10, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(10, 10, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(10, 2, init='uniform'))
model.add(Activation('softmax'))
After training this network, I was unable to see any values in the W object when I observed it in debug mode.
Are they stored somewhere in Theano's computational graph, and if so, is it possible to get them? If not, why are all the values in the model's activation layer None?
UPDATE:
Sorry for being too quick. The tensor object holding the weights of a Dense layer can indeed be found. But invoking:
model.layers[1]
gives me the Activation layer, which is where I would like to see the activation levels. Instead I see only:
beta = 0.1
nb_input = 1
nb_output = 1
params = []
targets = 0
updates = []
I assume Keras just clears all these values after model evaluation - is that true?
If so, is the only way to record activations from neurons to create a custom Activation layer that records the needed info?

I am not familiar with Keras, but if it is building a conventional Theano neural network computation graph, then it is not possible to view the activation values in the way you propose.
Conventionally, only weights are stored persistently, as shared variables. Values computed in intermediate stages of a Theano computation are transient and can only be viewed via debugging the execution of the compiled Theano function (note that this is not easy to do -- debugging the host Python application is not sufficient).
If you were building the computation directly instead of using Keras, I would advise including the intermediate activation values you are interested in among the outputs of the compiled Theano function. I cannot comment on how this can be achieved via Keras.
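That said, the general Theano technique looks like the sketch below, which assumes an older Keras release with the Theano backend where layers expose get_output() (attribute names vary between Keras versions): compile a second Theano function that maps the network's symbolic input to the symbolic output of the layer you care about.
import theano

# Symbolic input of the network and symbolic output of the first Activation
# layer; train=False gives the inference-mode expression.
inputs = model.layers[0].input
activation_expr = model.layers[1].get_output(train=False)

# Compile a function returning the activation values for a given batch.
get_activations = theano.function([inputs], activation_expr,
                                  allow_input_downcast=True)

activations = get_activations(X_batch)  # NumPy array of activation levels
Here X_batch stands for any NumPy array of inputs shaped like your training data.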

Related

Custom loss function in NN

I am trying to build a SimpleRNN network with a custom loss function. I am predicting BMI based on 25 different features. My dataset is unbalanced and has outliers, and I want to predict better on the outliers; in fact, it is more important to predict well on the outliers.
For my custom loss function I have added a condition: if the loss is greater than 2 units, I want to penalize those observations more.
import keras.backend as K

def custom_loss(y_true, y_pred):
    loss = K.abs(y_pred - y_true)
    wt = loss * 5
    loss_mae = K.switch(loss > 2, wt, loss)
    return loss_mae
model = Sequential()
model.add(SimpleRNN(units=64, input_shape=(25, 1), activation="relu"))
model.add(Dense(32, activation="linear"))
model.add(Dropout(0.2))
model.add(Dense(1, activation="linear"))
model.compile(loss=custom_loss, optimizer='adam')
model.add(Dropout(0.1))
model.summary()
model.fit(train_x, train_y)
Sample predictions after running this code:
preds=[[16.015867], [16.022823], [15.986835], [16.69895 ], [17.537468]]
actual=[[18.68], [24.35], [18.07], [15.2 ], [13.78]]
As you can see, the predictions for the 2nd and 5th observations are still way off. Am I doing anything wrong in the code?
One thing that is very wrong is that you should never have a dropout on the output neuron. Apart from this:
The activation function of the hidden layers should not be linear (model.add(Dense(32, activation="linear")) should be model.add(Dense(32, activation="relu"))).
A neural network should always be able to overfit to your training data, and this should be your debugging state before worrying about generalisation, consequently:
Do not use dropout (this only makes fitting harder, you can experiment with it once you are concerned about generalisation)
Your network is somewhat tiny; try making it much wider and see if your predictions improve.
Overall, MAE is a much worse-behaved learning signal than MSE, which also automatically penalises outliers heavily, so why not use it?
Consider normalising your data; neural networks work well with their default initialisations if both inputs and targets lie in a somewhat bounded space, preferably on a [-1, 1] or [0, 1] scale.
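Putting those suggestions together, one possible revision might look like the sketch below (the layer sizes are only illustrative, and train_x/train_y are the arrays from the question):
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

model = Sequential()
model.add(SimpleRNN(units=128, input_shape=(25, 1), activation="relu"))  # wider recurrent layer
model.add(Dense(64, activation="relu"))   # non-linear hidden layer, no dropout yet
model.add(Dense(1, activation="linear"))  # single regression output, never dropout here
model.compile(loss="mse", optimizer="adam")  # MSE already penalises large errors strongly
model.summary()
model.fit(train_x, train_y, epochs=100, batch_size=32)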

How is Cross Entropy Loss Converted to a Scalar During Optimization?

I have a basic beginner question about how neural networks are defined, and I am learning in the context of the Keras library. Following the MNIST hello world program, I have defined this network:
model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,), activation='softmax'))
My understanding is that this creates a neural network with two layers; in this case RESHAPED is 784 and NB_CLASSES is 10, so the network will have one input layer with 785 neurons (counting the bias) and one output layer with 10 neurons.
Then I added this:
model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
I have read up on the formula for categorical cross-entropy, but it appears to be calculated per output node. My question is: during training, how are the values of the cross-entropy combined to create a scalar-valued objective function? Is it just an average?
Keras computes the mean of the per-instance loss values, possibly weighted (see sample_weight_mode argument if you're interested).
Here's the reference to the source code: training.py. As you can see, the result value goes through K.mean(...), which ensures the result is a scalar.
In general, however, it is possible to reduce the losses differently, e.g. with just a sum, but that usually performs worse, so the mean is preferable (see this question).
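To make the reduction concrete, here is a tiny NumPy sketch (the same arithmetic, not the Keras internals): categorical cross-entropy first produces one value per instance, and the mean of those values is the scalar the optimizer minimizes.
import numpy as np

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])        # one-hot labels for two instances
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])  # softmax outputs

# Cross-entropy per instance: sum -y_true * log(y_pred) over the output nodes.
per_instance = -np.sum(y_true * np.log(y_pred), axis=1)

# The scalar objective is the (possibly weighted) mean over the batch.
scalar_loss = per_instance.mean()
print(per_instance, scalar_loss)      # roughly [0.357 0.223] and their mean 0.290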

How to add two output classification layers in keras?

I have a neural network whose job is to classify 10 classes. Further, I want these 10 classes to be classified into 2 classes (positive -> 3 , negative -> 7). How can I achieve this in keras?
It sounds like you are trying to solve two different, but closely related problems. I recommend that you train your first model to predict 10 classes, and then you create a copy of the first model (including weights) except with a different output layer to support binary classification. At this point you can either:
Train only your final dense layer and new output layer, or
Train the entire model with a low learning rate
For more information you can read about Transfer Learning.
Example code:
model.save('model_1') # load this to retrieve your original model
model.pop() # pop output activation layer and associated params
model.pop() # pop final dense layer
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)
If you want to retrain the whole model instead, you can omit the loop that sets all but the last two layers to untrainable, and choose an optimizer such as SGD with a low learning rate.
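For that full-retrain variant, the compile step might look like the following sketch (the 1e-4 learning rate is just an illustrative choice):
from keras.optimizers import SGD

model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=1e-4, momentum=0.9),  # small lr so the pretrained weights are not destroyed
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)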

Neural network categorization: Do they always have to have one label per training data

In all the examples of categorization with neural networks that I have seen, the training data has one category as the predominant category, or label, for each input.
Can you feed in training data that has more than one label, e.g. a picture with both a "cat" and a "mouse"?
I understand (maybe wrongly) that if you use softmax for the probability/prediction at the output layer, it tends to try to select one class (maximizing discerning power). I'm guessing this would hurt or prevent learning and predicting multiple labels for an input.
Is there any approach/architecture of NN where there are multiple labels in training data and multiple outputs predictions are made ? or is that already the case and I missed some vital understanding. Please clarify.
Most examples have one class per input, so no, you haven't missed anything. It is, however, possible to do multi-label classification, which is sometimes called joint classification in the literature.
The naive implementation you suggested with a softmax will struggle as the outputs on the final layer have to add up to 1, so the more classes you have the harder it is to figure out what the network is trying to say.
You can change the architecture to achieve what you want, however. For each class you could have a binary softmax classifier which branches off from the penultimate layer, or you can use sigmoid outputs, which don't have to add up to one (even though each neuron still outputs a value between 0 and 1). Note that using sigmoids might make training more difficult.
Alternatively, you could train multiple networks, one per class, and then combine them into one classification system at the end; it depends on how complex your envisioned task is. A sketch of the sigmoid-output variant follows below.
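As a concrete illustration of the sigmoid approach, here is a minimal sketch (the input size, layer width, and label count are made up for the example): each output neuron gives an independent probability for one label, and binary cross-entropy treats the labels independently.
from keras.models import Sequential
from keras.layers import Dense

n_features, n_labels = 100, 10     # hypothetical sizes

model = Sequential()
model.add(Dense(64, input_dim=n_features, activation='relu'))
model.add(Dense(n_labels, activation='sigmoid'))   # independent per-label probabilities
model.compile(loss='binary_crossentropy', optimizer='adam')
# y_train is multi-hot: a row like [1, 0, 1, 0, ...] marks both "cat" and "mouse".
# model.fit(x_train, y_train, batch_size=32)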
Is there any approach/architecture of NN where there are multiple labels in training data and multiple outputs predictions are made ?
The answer is YES. To briefly answer your question, I will give an example in the context of Keras, a high-level neural network library.
Let's consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
from keras.layers import Input, Embedding, LSTM, Dense, merge
from keras.models import Model
# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# a LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
auxiliary_input = Input(shape=(5,), name='aux_input')
x = merge([lstm_out, auxiliary_input], mode='concat')
# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
model = Model(input=[main_input, auxiliary_input], output=[main_output, auxiliary_output])
Now, let's compile and train the model as follows:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})
# and train it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          nb_epoch=50, batch_size=32)
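Once trained, a multi-output model returns one prediction array per output, in the order they were passed to Model. A minimal usage sketch, assuming the same headline_data and additional_data arrays as above:
main_preds, aux_preds = model.predict({'main_input': headline_data,
                                       'aux_input': additional_data})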
Reference: Multi-input and multi-output models in Keras

fine-tuning with VGG on caffe

I'm replicating the steps in
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
I want to change the network to VGG model which is obtained at
http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
Does it suffice to simply substitute the -weights parameter as follows?
./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights VGG_ILSVRC_16_layers.caffemodel -gpu 0
Or do I need to adjust learning rates, iterations, i.e. does it come with separate prototxt files?
There needs to be a 1-1 correspondence between the weights of the network you want to train and the weights you use for initializing/fine-tuning; the architectures of the old and new models have to match.
VGG-16 has a different architecture than the model described by models/finetune_flickr_style/train_val.prototxt (FlickrStyleCaffeNet). This is the network that the solver will try to optimize. Even if it doesn't crash, the weights you've loaded don't have any meaning in the new network.
The VGG-16 network is described in the deploy.prototxt file on this page in Caffe's Model Zoo.