How to add two output classification layers in keras? - neural-network

I have a neural network whose job is to classify 10 classes. Further, I want these 10 classes to be classified into 2 classes (positive -> 3 , negative -> 7). How can I achieve this in keras?

It sounds like you are trying to solve two different, but closely related problems. I recommend that you train your first model to predict 10 classes, and then you create a copy of the first model (including weights) except with a different output layer to support binary classification. At this point you can either:
Train only your final dense layer and new output layer, or
Train the entire model with a low learning rate
For more information you can read about Transfer Learning.
Example code:
model.save('model_1') # load this to retrieve your original model
model.pop() # pop output activation layer and associated params
model.pop() # pop final dense layer
model.add(Dense(1), kernel_initializer='normal', activation='sigmoid')
for layer in model.layers[:-2]:
layer.trainable = False
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)
If you want to retrain the whole model then you can omit the loop setting all but the last two layers to untrainable, and choose an optimizer such as SGD with a low learning rate.

Related

Keras explanation: number of nodes in input layer

I'm trying to understand the relationship between a simple Perceptron and a neural network one gets when using the keras Sequence class.
I learned that the neural network perceptron looks as such:
Each "node" in the first layer is one of the features of a sample x_1, x_2,...,x_n
Could somebody explain the jump to the neural network I find in the Keras package below?
Since the input layer has four nodes, does that mean that network consists of four of the perceptron networks?
There is seem to be misunderstanding on what a perceptron is. A perceptron is a single unit that multiplies the inputs with weights, sums them up and applies an activation function:
Now the diagrams you have are called multi-layer perceptrons (MLP) and consist of a stack of perceptrons organised in layers, wiki. In Keras, there is no explicit notion of a perceptron but of a layer of perceptrons implemented as a Dense layer because the layers are densely connected, ie every output is connected to every input between layers. The second diagram would correspond to:
model = Sequential()
model.add(Dense(4, activation='sigmoid', input_dim=3))
model.add(Dense(4, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
assuming you have sigmoid activation. In this case, the input layer is implicit by specifying the input_dim=3 and the final layer would be the output layer.

Neural network categorization: Do they always have to have one label per training data

In all the examples of categorization with neural networks that I have seen, they all have training data that has one category as the predominant category or the label for each input data.
Can you feed training data that has more than one label. Eg: a picture with a "cat" and a "mouse".
I understand (maybe wrong) that if you use softmax for probability/prediction at the output layer, it tends to try and select one (maximize discerning power). I'm guessing this would hurt/prevent learning and predicting multiple labels with input data.
Is there any approach/architecture of NN where there are multiple labels in training data and multiple outputs predictions are made ? or is that already the case and I missed some vital understanding. Please clarify.
Most examples have one class per input, so no you haven't missed anything. It is however possible to do multi-class classification, which is sometimes called joint classification in the literature.
The naive implementation you suggested with a softmax will struggle as the outputs on the final layer have to add up to 1, so the more classes you have the harder it is to figure out what the network is trying to say.
You can change the architecture to achieve what you want however. For each class you could have a binary softmax classifier which branches off from the penultimate layer or you can use a sigmoid, which doesn't have to add up to one (even though each neuron outputs between 0 and 1). Note using a sigmoid might make training more difficult.
Alternatively you could train multiple networks for each class and then combine them into one classification system at the end. It depends on how complex your envisioned task is.
Is there any approach/architecture of NN where there are multiple labels in training data and multiple outputs predictions are made ?
Answer is YES. To briefly answer your question, I am giving an example in the context of Keras, a high-level neural network library.
Let's consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
from keras.layers import Input, Embedding, LSTM, Dense, merge
from keras.models import Model
# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# a LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
auxiliary_input = Input(shape=(5,), name='aux_input')
x = merge([lstm_out, auxiliary_input], mode='concat')
# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
model = Model(input=[main_input, auxiliary_input], output=[main_output, auxiliary_output])
Now, lets compile and train the model as follows:
model.compile(optimizer='rmsprop',
loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
loss_weights={'main_output': 1., 'aux_output': 0.2})
# and trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
{'main_output': labels, 'aux_output': labels},
nb_epoch=50, batch_size=32)
Reference: Multi-input and multi-output models in Keras

Can a convolutional neural network be built with perceptrons?

I was reading this interesting article on convolutional neural networks. It showed this image, explaining that for every receptive field of 5x5 pixels/neurons, a value for a hidden value is calculated.
We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information.
So max-pooling is applied.
With multiple convolutional layers, it looks something like this:
But my question is, this whole architecture could be build with perceptrons, right?
For every convolutional layer, one perceptron is needed, with layers:
input_size = 5x5;
hidden_size = 10; e.g.
output_size = 1;
Then for every receptive field in the original image, the 5x5 area is inputted into a perceptron to output the value of a neuron in the hidden layer. So basically doing this for every receptive field:
So the same perceptron is used 24x24 amount of times to construct the hidden layer, because:
is that we're going to use the same weights and bias for each of the 24×24 hidden neurons.
And this works for the hidden layer to the pooling layer as well, input_size = 2x2; output_size = 1;. And in the case of a max-pool layer, it's just a max() function on an array.
and then finally:
The final layer of connections in the network is a fully-connected
layer. That is, this layer connects every neuron from the max-pooled
layer to every one of the 10 output neurons.
which is a perceptron again.
So my final architecture looks like this:
-> 1 perceptron for every convolutional layer/feature map
-> run this perceptron for every receptive field to create feature map
-> 1 perceptron for every pooling layer
-> run this perceptron for every field in the feature map to create a pooling layer
-> finally input the values of the pooling layer in a regular ALL to ALL perceptron
Or am I overseeing something? Or is this already how they are programmed?
The answer very much depends on what exactly you call a Perceptron. Common options are:
Complete architecture. Then no, simply because it's by definition a different NN.
A model of a single neuron, specifically y = 1 if (w.x + b) > 0 else 0, where x is the input of the neuron, w and b are its trainable parameters and w.b denotes the dot product. Then yes, you can force a bunch of these perceptrons to share weights and call it a CNN. You'll find variants of this idea being used in binary neural networks.
A training algorithm, typically associated with the Perceptron architecture. This would make no sense to the question, because the learning algorithm is in principle orthogonal to the architecture. Though you cannot really use the Perceptron algorithm for anything with hidden layers, which would suggest no as the answer in this case.
Loss function associated with the original Perceptron. This notion of Peceptron is orthogonal to the problem at hand, you're loss function with a CNN is given by whatever you try to do with your whole model. You can eventually use it, but it is non-differentiable, so good luck :-)
A sidenote rant: You can see people refer to feed-forward, fully-connected NNs with hidden layers as "Multilayer Perceptrons" (MLPs). This is a misnomer, there are no Perceptrons in MLPs, see e.g. this discussion on Wikipedia -- unless you go explore some really weird ideas. It would make sense call these networks as Multilayer Linear Logistic Regression, because that's what they used to be composed of. Up till like 6 years ago.

How to enforce feature vector representing label probability with Caffe siamese CNN?

related to How to Create CaffeDB training data for siamese networks out of image directory
If I have N labels. How can I enforce, that the feature vector of size N right before the contrastive loss layer represents some kind of probability for each class? Or comes that automatically with the siamese net design?
If you only use contrastive loss in a Siamese network, there is no way of forcing the net to classify into the correct label - because the net is only trained using "same/not same" information and does not know the semantics of the different classes.
What you can do is train with multiple loss layers.
You should aim at training a feature representation that is reach enough for your domain, so that looking at the trained feature vector of some input (in some high dimension) you should be able to easily classify that input to the correct class. Moreover, given that feature representation of two inputs one should be able to easily say if they are "same" or "not same".
Therefore, I recommend that you train your deep network with two loss layer with "bottom" as the output of one of the "InnerProduct" layers. One loss is the contrastive loss. The other loss should have another "InnerProduct" layer with num_output: N and a "SoftmaxWithLoss" layer.
A similar concept was used in this work:
Sun, Chen, Wang and Tang Deep Learning Face Representation by Joint Identification-Verification NIPS 2014.

Matlab neural network simulate up to hidden layer

I have trained a 3-layer (input, hidden and output) feedforward neural network in Matlab. After training, I would like to simulate the trained network with an input test vector and obtain the response of the neurons of the hidden layer (not the final output layer). How can I go about doing this?
Additionally, after training a neural network, is it possible to "cut away" the final output layer and make the current hidden layer as the new output layer (for any future use)?
Extra-info: I'm building an autoencoder network.
The trained weights for a trained network are available in the net.LW property. You can use these weights to get the hidden layer outputs
From Matlab Documentation
nnproperty.net_LW
Neural network LW property.
NET.LW
This property defines the weight matrices of weights going to layers
from other layers. It is always an Nl x Nl cell array, where Nl is the
number of network layers (net.numLayers).
The weight matrix for the weight going to the ith layer from the jth
layer (or a null matrix []) is located at net.LW{i,j} if
net.layerConnect(i,j) is 1 (or 0).
The weight matrix has as many rows as the size of the layer it goes to
(net.layers{i}.size). It has as many columns as the product of the size
of the layer it comes from with the number of delays associated with the
weight:
net.layers{j}.size * length(net.layerWeights{i,j}.delays)
Addition to using input and layer weights and biases, you may add a output connect from desired layer (after training the network). I found it possible and easy but I didn't exam the correctness of it.