Is there a way to build a network with a mix of output neurons?
I'm building an RL agent and would like to get the action in the form of: ([0,1,2],(0,1))
The action tells the agent which task to perform [0,1,2] and the amount it should take (0,1).
So I need a network with 4 output neurons, but with different activation functions:
softmax for the discrete [0,1,2] to pick the best action
linear for the (0,1) to output any number between 0 and 1
Is there a way to create one network that combines these two?
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(64, input_dim=self.state_size,
                                activation='relu'))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(3, activation='softmax'))
model.compile(loss='mse',
              optimizer=tf.keras.optimizers.Adam(learning_rate=self.learning_rate))
return model
Now I need to add:
model.add(tf.keras.layers.Dense(1, activation='linear'))
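One way to combine the two heads (a sketch, not from the original post) is to switch from Sequential to the Keras functional API and give the model two named outputs: one softmax head for the task and one single-unit head for the amount. The hidden sizes and the shared 'mse' loss are assumptions carried over from the snippet above, and the code is meant to live in the same method, so self.state_size and self.learning_rate are available:
# shared trunk, then two output heads with different activations
state_in = tf.keras.Input(shape=(self.state_size,))
h = tf.keras.layers.Dense(64, activation='relu')(state_in)
h = tf.keras.layers.Dense(64, activation='relu')(h)
task_out = tf.keras.layers.Dense(3, activation='softmax', name='task')(h)     # which task [0,1,2]
amount_out = tf.keras.layers.Dense(1, activation='linear', name='amount')(h)  # how much (0,1)
model = tf.keras.Model(inputs=state_in, outputs=[task_out, amount_out])
model.compile(loss='mse',
              optimizer=tf.keras.optimizers.Adam(learning_rate=self.learning_rate))
return model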
I understand and have built a simple neural network that solves the XOR problem. I want to make a neural network for digit recognition. I know that using MNIST data I would need 784 input neurons, 15 hidden neurons and 10 output neurons (0-9).
However, I don't understand how the network would be trained and how feed-forward would work with multiple output neurons.
For example, if the input were the pixels for the digit 3, how would the network determine which output neuron is picked, and when training, how would the network know which neuron should be associated with the target value?
Any help would be appreciated.
So you have a classification problem with multiple outputs. I'm supposing that you are using a softmax activation function for the output layer.
How the network determines which output neuron is picked: simply, the output neuron with the greatest predicted probability of being the target class.
The network would be trained with standard backpropagation, same algorithm that you would have with only one output.
There is only one difference: the activation function.
For binary classification you need only one output (for example with digits 0 and 1, if probability < 0.5 then class is 0, else 1).
For multi-class classification you need an output node for each class; then the network will pick the node with the greatest probability of being the target class.
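To make this concrete, a minimal Keras version of the 784-15-10 network described above might look like the sketch below; the hidden activation, optimizer and loss are assumptions, not part of the original question:
import tensorflow as tf

# minimal sketch: 784 inputs, 15 hidden units, 10 softmax outputs (one per digit)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(15, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
During training, the target for a 3 is the one-hot vector [0,0,0,1,0,0,0,0,0,0], so backpropagation pushes output neuron 3 towards 1 and the others towards 0; that is how each neuron becomes associated with its digit. At prediction time the picked class is simply the argmax of the 10 outputs.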
I'm trying to understand the relationship between a simple perceptron and the neural network one gets when using the Keras Sequential class.
I learned that the neural network perceptron looks like this:
Each "node" in the first layer is one of the features of a sample x_1, x_2,...,x_n
Could somebody explain the jump to the neural network I find in the Keras package below?
Since the input layer has four nodes, does that mean the network consists of four of these perceptrons?
There seems to be a misunderstanding of what a perceptron is. A perceptron is a single unit that multiplies the inputs by weights, sums them up and applies an activation function:
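For concreteness, a single perceptron can be sketched in plain NumPy (the sigmoid activation and the example numbers here are arbitrary choices):
import numpy as np

# a single perceptron: weighted sum of the inputs plus a bias, then an activation
def perceptron(x, w, b):
    z = np.dot(w, x) + b              # multiply inputs by weights and sum
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([0.5, -1.0, 2.0])   # three input features
w = np.array([0.1, 0.4, -0.3])   # one weight per input
print(perceptron(x, w, b=0.2))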
Now the diagrams you have are called multi-layer perceptrons (MLPs) and consist of a stack of perceptrons organised in layers (see the Wikipedia article on multilayer perceptrons). In Keras there is no explicit notion of a perceptron, only of a layer of perceptrons, implemented as a Dense layer because the layers are densely connected, i.e. every output is connected to every input between layers. The second diagram would correspond to:
model = Sequential()
model.add(Dense(4, activation='sigmoid', input_dim=3))
model.add(Dense(4, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
assuming you have sigmoid activations. In this case the input layer is defined implicitly by specifying input_dim=3, and the final layer is the output layer.
I have a neural network whose job is to classify 10 classes. Further, I want these 10 classes to be classified into 2 classes (positive -> 3 , negative -> 7). How can I achieve this in keras?
It sounds like you are trying to solve two different, but closely related problems. I recommend that you train your first model to predict 10 classes, and then you create a copy of the first model (including weights) except with a different output layer to support binary classification. At this point you can either:
Train only your final dense layer and new output layer, or
Train the entire model with a low learning rate
For more information you can read about Transfer Learning.
Example code:
model.save('model_1') # load this to retrieve your original model
model.pop() # pop output activation layer and associated params
model.pop() # pop final dense layer
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)
If you want to retrain the whole model, you can omit the loop that sets all but the last two layers to untrainable, and choose an optimizer such as SGD with a low learning rate; a rough sketch of that variant follows.
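The learning rate and momentum below are just illustrative values; the loss and metrics are carried over from the snippet above:
from keras.optimizers import SGD

# fine-tune the whole network: keep every layer trainable and use a small
# learning rate so the pretrained weights are only nudged, not destroyed
for layer in model.layers:
    layer.trainable = True
model.compile(loss='binary_crossentropy',
              optimizer=SGD(1e-4, momentum=0.9),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)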
In all the examples of categorization with neural networks that I have seen, the training data has one category as the predominant category, or the label, for each input.
Can you feed in training data that has more than one label? E.g. a picture with a "cat" and a "mouse".
I understand (maybe wrongly) that if you use softmax for probability/prediction at the output layer, it tends to try to select one class (to maximize discerning power). I'm guessing this would hurt/prevent learning and predicting multiple labels for an input.
Is there any NN approach/architecture where there are multiple labels in the training data and multiple output predictions are made? Or is that already the case and I have missed some vital understanding? Please clarify.
Most examples have one class per input, so no, you haven't missed anything. It is, however, possible to do multi-label classification, which is sometimes called joint classification in the literature.
The naive implementation you suggested with a softmax will struggle as the outputs on the final layer have to add up to 1, so the more classes you have the harder it is to figure out what the network is trying to say.
You can change the architecture to achieve what you want, however. For each class you could have a binary softmax classifier that branches off from the penultimate layer, or you can use sigmoid outputs, which don't have to add up to one (even though each neuron still outputs a value between 0 and 1). Note that using sigmoids might make training more difficult.
Alternatively you could train multiple networks for each class and then combine them into one classification system at the end. It depends on how complex your envisioned task is.
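To make the sigmoid-output idea above concrete, a minimal sketch (the input size, hidden layer and class count are assumptions) could look like this:
from keras.models import Sequential
from keras.layers import Dense

num_classes = 2  # assumed: "cat" and "mouse"

# one sigmoid output per class: each can independently be close to 0 or 1,
# so an image containing both a cat and a mouse gets the target [1, 1]
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=128))  # 128 input features assumed
model.add(Dense(num_classes, activation='sigmoid'))

# binary cross-entropy scores every output independently instead of
# forcing the outputs to compete the way softmax does
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])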
Is there any NN approach/architecture where there are multiple labels in the training data and multiple output predictions are made?
The answer is YES. To briefly answer your question, here is an example in the context of Keras, a high-level neural network library.
Let's consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model
# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# an LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
# the auxiliary output predicts the target directly from the LSTM encoding
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
# the auxiliary input carries the extra data (e.g. the time of day the headline was posted)
auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, auxiliary_input])
# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
Now, let's compile and train the model as follows:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})
# and train it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)
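After training, predict returns one array per named output; a brief usage sketch:
# predictions come back in the same order as the outputs were declared
main_preds, aux_preds = model.predict({'main_input': headline_data,
                                       'aux_input': additional_data})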
Reference: Multi-input and multi-output models in Keras
Say I have a simple MLP network for a 2-class problem:
model = Sequential()
model.add(Dense(2, 10, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(10, 10, init='uniform'))
model.add(Activation('tanh'))
model.add(Dense(10, 2, init='uniform'))
model.add(Activation('softmax'))
After training this network I was unable to see any values in the W object when observing it in debug mode.
Are they stored somewhere in Theano's computational graph, and if so, is it possible to get them? If not, why are all the values in the activation layer of the model None?
UPDATE:
Sorry for being too quick. The Tensor object holding the weights of the Dense layer can indeed be found. But invoking:
model.layers[1]
gives me the Activation layer, which is where I would like to see the activation levels. Instead I see only:
beta = 0.1
nb_input = 1
nb_output = 1
params = []
targets = 0
updates = []
I assume Keras just clears all these values after model evaluation - is that true?
If so, is the only way to record activations from neurons to create a custom Activation layer that records the needed info?
I am not familiar with Keras but if it is building a conventional Theano neural network computation graph then it is not possible to view the activation values in the way you propose.
Conventionally, only weights are stored persistently, as shared variables. Values computed in intermediate stages of a Theano computation are transient and can only be viewed via debugging the execution of the compiled Theano function (note that this is not easy to do -- debugging the host Python application is not sufficient).
If you were building the computation directly instead of using Keras, I would advise including the intermediate activation values of interest in the outputs list of the Theano function. I cannot comment on how this can be achieved via Keras.
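To illustrate that suggestion, here is a minimal Theano sketch (the layer sizes and activations are made up) that returns an intermediate activation alongside the final output:
import numpy as np
import theano
import theano.tensor as T

x = T.matrix('x')
W1 = theano.shared(np.random.randn(3, 10), name='W1')
W2 = theano.shared(np.random.randn(10, 2), name='W2')

hidden = T.tanh(T.dot(x, W1))               # the intermediate activation we want to inspect
output = T.nnet.softmax(T.dot(hidden, W2))

# include the intermediate value in the outputs list of the compiled function
f = theano.function([x], [output, hidden])
probs, activations = f(np.random.randn(5, 3))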