When setting up a neural network with Keras you can use either the Sequential model or the Functional API. My understanding is that the former is easy to set up and manage, and operates as a linear stack of layers, while the functional approach is useful for more complex architectures, particularly those which involve sharing the output of an internal layer. I personally like using the functional API for its versatility, but I am having difficulties with advanced activation layers such as LeakyReLU. When using standard activations, in the Sequential model one can write:
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Similarly in the functional API one can write the above as:
inpt = Input(shape=(100,))
dense_1 = Dense(32, activation='relu')(inpt)
out = Dense(10, activation='softmax')(dense_1)
model = Model(inpt, out)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
However, when using advanced activations like LeakyReLU and PReLU, in the Sequential model we write them as separate layers. For example:
model = Sequential()
model.add(Dense(32, input_dim=100))
model.add(LeakyReLU(alpha=0.1))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Now, I'm assuming one does the equivalent in the functional API approach:
inpt = Input(shape=(100,))
dense_1 = Dense(32)(inpt)
LR = LeakyReLU(alpha=0.1)(dense_1)
out = Dense(10, activation='softmax')(LR)
model = Model(inpt, out)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
My questions are:
Is this correct syntax in the functional approach?
Why does Keras require a new layer for these advanced activation functions rather than allowing us to just replace 'relu'?
Is there something fundamentally different about creating a new layer for the activation function, rather than assigning it to an existing layer definition (as in the first examples where we wrote 'relu')? I realise you could always write your activation functions, including standard ones, as separate layers, although I have read that this should be avoided.
Yes, that syntax is correct; the important part is that the LeakyReLU layer is connected to the output of the dense layer:
LR = LeakyReLU(alpha=0.1)(dense_1)
Usually the advanced activations have tunable or learnable parameters, and these have to be stored somewhere. It makes more sense for them to be layers, as you can then access and save these parameters.
Do it only if there is an advantage, such as tunable parameters.
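For example, a minimal sketch (layer sizes and the filename are illustrative) showing that a PReLU layer carries learnable weights which you can inspect and which are saved with the model, unlike a plain 'relu' string argument:
from keras.layers import Input, Dense, PReLU
from keras.models import Model

inpt = Input(shape=(100,))
x = Dense(32)(inpt)
act = PReLU()                          # one learnable alpha per unit by default
x = act(x)
out = Dense(10, activation='softmax')(x)
model = Model(inpt, out)

# the activation's parameters live in the layer and are saved with the model
print(act.get_weights()[0].shape)      # (32,)
model.save('model_with_prelu.h5')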
Related
I have two datasets and I want to train an SVM classification model (fitcsvm) on one of them and then predict labels for the other one. I use 10-fold cross-validation (crossval) to train my model, so I have 10 different models. My question is: which of these models is the best for prediction, and how can I find it?
Here is my code:
Mdl = fitcsvm(trainingData,labels);
CVMdl = crossval(Mdl);
You may have mixed something up here. The function fitcsvm trains a single model, and crossval cross-validates that single model; it returns a cross-validated model object from which you can obtain an evaluation value (for example, the classification error via kfoldLoss).
In general, you cannot train a model by cross-validation (as it says, it is a validation technique). However, you can use cross-validation to train good models.
What you are looking for is a sort of hyperparameter optimization. Those are methods that automatically train multiple models on a given data set to find the best tuning values for the SVM. Have a look at the docs here
You can turn it on like this
Mdl = fitcsvm(trainingData,labels,'OptimizeHyperparameters','auto')
You may want to use cross-validation to train multiple models with the same tuning parameters, but I guess you'll have to write that yourself. Perhaps this already helps you.
I have a neural network whose job is to classify 10 classes. Further, I want these 10 classes to be grouped into 2 classes (positive -> 3, negative -> 7). How can I achieve this in Keras?
It sounds like you are trying to solve two different, but closely related problems. I recommend that you train your first model to predict 10 classes, and then you create a copy of the first model (including weights) except with a different output layer to support binary classification. At this point you can either:
Train only your final dense layer and new output layer, or
Train the entire model with a low learning rate
For more information you can read about Transfer Learning.
Example code:
model.save('model_1') # load this later to retrieve your original model
model.pop() # pop output activation layer and associated params
model.pop() # pop final dense layer
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)
If you want to retrain the whole model then you can omit the loop setting all but the last two layers to untrainable, and choose an optimizer such as SGD with a low learning rate.
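For example, a minimal sketch of that full fine-tuning variant (the learning rate value is only illustrative):
from keras.optimizers import SGD

# unfreeze every layer and retrain gently so the pretrained weights are only nudged
for layer in model.layers:
    layer.trainable = True
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=1e-4),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)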
In all the examples of classification with neural networks that I have seen, the training data has a single predominant category as the label for each input.
Can you feed in training data that has more than one label, e.g. a picture with a "cat" and a "mouse"?
I understand (maybe wrongly) that if you use softmax for the probability/prediction at the output layer, it tends to select a single class (maximising discerning power). I'm guessing this would hurt or prevent learning and predicting multiple labels for an input.
Is there any approach/architecture of NN where there are multiple labels in the training data and multiple output predictions are made? Or is that already the case and I have missed some vital understanding? Please clarify.
Most examples have one class per input, so no, you haven't missed anything. It is however possible to do multi-label classification, which is sometimes called joint classification in the literature.
The naive implementation you suggested with a softmax will struggle, as the outputs of the final layer have to add up to 1, so the more classes you have the harder it is to figure out what the network is trying to say.
You can change the architecture to achieve what you want, however. For each class you could have a binary softmax classifier which branches off from the penultimate layer, or you can use a sigmoid output, whose values don't have to add up to one (even though each neuron still outputs a value between 0 and 1). Note that using sigmoids might make training more difficult.
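For example, a minimal sketch of the sigmoid variant (the input dimension and layer sizes are illustrative), where each output neuron independently predicts the presence of one label:
from keras.layers import Input, Dense
from keras.models import Model

num_labels = 5                                   # e.g. cat, mouse, dog, ...
inpt = Input(shape=(100,))
x = Dense(64, activation='relu')(inpt)
# one sigmoid per label: each output is an independent probability in [0, 1]
out = Dense(num_labels, activation='sigmoid')(x)

model = Model(inpt, out)
# binary_crossentropy treats every label as its own yes/no decision,
# so the outputs do not need to sum to one
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
# the targets would be a (num_samples, num_labels) 0/1 matrix,
# e.g. [1, 1, 0, 0, 0] for an image containing both a cat and a mouse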
Alternatively you could train multiple networks for each class and then combine them into one classification system at the end. It depends on how complex your envisioned task is.
Is there any approach/architecture of NN where there are multiple labels in training data and multiple outputs predictions are made ?
The answer is yes. To briefly answer your question, I will give an example in the context of Keras, a high-level neural network library.
Let's consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model
# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# a LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, auxiliary_input])
# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
Now, let's compile and train the model as follows:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})
# and train it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)
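Once trained, prediction works as you would expect; as a hedged usage sketch, model.predict returns one array per named output, in the order given above:
main_pred, aux_pred = model.predict({'main_input': headline_data,
                                     'aux_input': additional_data})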
Reference: Multi-input and multi-output models in Keras
I need a simple example of how to use a Keras model. It is not clear to me what the difference is between model.evaluate and model.predict.
I want to create a model for binary classification. Let's say I have images of cats and dogs, train a model, and can use it to predict which animal is in a given photo. Maybe there is some good intro or tutorial. I read everything on the first five pages of Google results, but found only complex tutorials and discussions.
To make things short:
model.evaluate evaluates a pair (X, Y) and returns the loss (and all other metrics configured for the model). This is for testing your model on a validation or test set.
model.predict predicts the output given an input X. This is for predicting the class from an input image, for example.
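A minimal sketch of the difference, assuming the model was compiled with metrics=['accuracy'] and that x_test, y_test and x_new are illustrative held-out arrays:
# evaluate: you supply inputs AND the true labels, and get back the metrics
loss, accuracy = model.evaluate(x_test, y_test, batch_size=32)

# predict: you supply only inputs, and get back the model's raw outputs
probabilities = model.predict(x_new, batch_size=32)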
This, among other things, is also clearly documented in the linked documentation.
You can find a lot of example models for Keras in the git repository (keras/examples) or on the Keras website (here and here).
For binary classification you could use this model for example:
model = Sequential()
model.add(Dense(300, input_dim=X.shape[1], kernel_initializer='uniform'))
model.add(Activation('relu'))
model.add(Dense(2, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.02))
model.fit(X, Y)
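Note that with a 2-unit softmax output and categorical_crossentropy, Y has to be one-hot encoded; a hedged sketch, assuming the raw 0/1 class labels are stored in a variable called labels:
from keras.utils import to_categorical

# integer class labels (0 = cat, 1 = dog) become one-hot rows like [1, 0] / [0, 1]
Y = to_categorical(labels, num_classes=2)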
Say I have a simple MLP network for a 2-class problem:
model = Sequential()
model.add(Dense(2, 10, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(10, 10, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(10, 2, init='uniform'))
model.add(Activation('softmax'))
After training this network I was unable to see any values in the W object when observing it in debug mode.
Are they stored somewhere in Theano's computational graph, and if so, is it possible to get them? If not, why are all the values in the activation layer of the model None?
UPDATE:
Sorry for being too quick. The tensor object holding the weights of the Dense layer can indeed be found. But invoking:
model.layers[1]
gives me the Activation layer, where I would like to see the activation levels. Instead I see only:
beta = 0.1
nb_input = 1
nb_output = 1
params = []
targets = 0
updates = []
I assume Keras just clears all these values after model evaluation. Is that true?
If so, is the only way to record the activations of neurons to create a custom Activation layer which would record the needed info?
I am not familiar with Keras but if it is building a conventional Theano neural network computation graph then it is not possible to view the activation values in the way you propose.
Conventionally, only weights are stored persistently, as shared variables. Values computed in intermediate stages of a Theano computation are transient and can only be viewed via debugging the execution of the compiled Theano function (note that this is not easy to do -- debugging the host Python application is not sufficient).
If you were building the computation directly, instead of using Keras, I would advise including the intermediate activation values you are interested in among the outputs of the Theano function. I cannot comment on how this can be achieved via Keras.
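For illustration, a minimal sketch of that approach in plain Theano (the shapes and names are only illustrative): the weights persist as shared variables, while an intermediate activation becomes visible simply by listing it among the function's outputs.
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
W = theano.shared(np.random.randn(100, 32).astype(theano.config.floatX), name='W')  # weights live in a shared variable
hidden = T.tanh(T.dot(x, W))           # intermediate activation we want to inspect
output = T.nnet.softmax(hidden)

# listing `hidden` among the outputs makes the activations directly retrievable
f = theano.function([x], [output, hidden])
out_val, hidden_val = f(np.random.randn(4, 100).astype(theano.config.floatX))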