Neural network categorization: does each training example always have to have exactly one label?

All the examples of categorization with neural networks that I have seen use training data with a single category, the label, for each input.
Can you feed training data that has more than one label, e.g., a picture with a "cat" and a "mouse"?
I understand (perhaps wrongly) that if you use softmax for the probabilities/predictions at the output layer, it tends to select one class (maximizing discriminative power). I'm guessing this would hurt or prevent learning and predicting multiple labels per input.
Is there any NN approach/architecture where the training data has multiple labels and multiple output predictions are made? Or is that already the case and I have missed some vital understanding? Please clarify.

Most examples have one class per input, so no, you haven't missed anything. It is, however, possible to do multi-label classification (several labels per input), which is sometimes called joint classification in the literature.
The naive implementation you suggested, a softmax output, will struggle because the outputs of the final layer have to add up to 1: classes that are all present must share the probability mass, and the more classes you have, the harder it is to tell what the network is trying to say.
You can, however, change the architecture to achieve what you want. For each class you could have a binary softmax classifier that branches off from the penultimate layer, or you can use a sigmoid output per class, whose outputs don't have to add up to 1 (each neuron still outputs a value between 0 and 1). Note that using a sigmoid might make training more difficult.
Alternatively, you could train a separate network for each class and combine them into one classification system at the end. Which option fits best depends on how complex your envisioned task is.
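To make the softmax vs. sigmoid difference concrete, here is a minimal pure-Python sketch (no framework), assuming three hypothetical classes [cat, mouse, dog] and made-up raw scores for an image that contains both a cat and a mouse. It shows that a softmax caps each present label at about 0.5, whereas independent per-class sigmoids, usually trained with one binary cross-entropy term per class, can mark both labels as present. The 0.5 decision threshold is an assumption.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the outputs always sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # Each output is squashed into (0, 1) independently of the others.
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(targets, probs, eps=1e-7):
    # Multi-label loss: one binary cross-entropy term per class, averaged.
    losses = []
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1 - eps)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)

# Hypothetical raw scores for [cat, mouse, dog]; the picture shows a cat and a mouse.
logits = [3.0, 3.0, -3.0]
targets = [1, 1, 0]  # multi-hot target: two labels are "on"

soft = softmax(logits)
sig = [sigmoid(z) for z in logits]
print([round(p, 3) for p in soft])  # [0.499, 0.499, 0.001]
print([round(p, 3) for p in sig])   # [0.953, 0.953, 0.047]

# Keep every class whose sigmoid output exceeds the (assumed) 0.5 threshold.
predicted = [i for i, p in enumerate(sig) if p > 0.5]
print(predicted)  # [0, 1]
print(binary_cross_entropy(targets, sig))
```

With the softmax, neither present class clears 0.5 even though the scores for both are high; with sigmoids, each class gets its own yes/no decision.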

Is there any approach/architecture of NN where there are multiple labels in training data and multiple output predictions are made?
The answer is yes. To answer your question briefly, here is an example in the context of Keras, a high-level neural network library.
Let's consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model
# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors
# (the sequence length, 100, is taken from main_input's shape).
x = Embedding(output_dim=512, input_dim=10000)(main_input)
# a LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
# auxiliary output directly on the LSTM features, so the LSTM and Embedding
# layers also get a training signal of their own
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
# auxiliary input: 5 extra features (e.g. time of day)
auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, auxiliary_input])
# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
Now, let's compile and train the model as follows:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})
# and train it via (headline_data, additional_data and labels are your own NumPy arrays):
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)
Reference: Multi-input and multi-output models in Keras
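According to the Keras documentation for this example, the value the model minimizes is the weighted sum of the individual output losses, weighted by the loss_weights coefficients. A minimal pure-Python sketch of that reduction, using a hand-written binary cross-entropy and a made-up batch of four labels and predictions (all values are illustrative assumptions, not output from the model above):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Mean binary cross-entropy over a batch of single-unit sigmoid outputs.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical batch: shared labels and the predictions of the two heads.
labels = [1, 0, 1, 1]
main_pred = [0.9, 0.2, 0.7, 0.6]
aux_pred = [0.6, 0.4, 0.5, 0.5]

loss_weights = {'main_output': 1.0, 'aux_output': 0.2}
losses = {
    'main_output': binary_cross_entropy(labels, main_pred),
    'aux_output': binary_cross_entropy(labels, aux_pred),
}
# The single scalar that gets minimized: weighted sum of the per-output losses.
total_loss = sum(loss_weights[name] * losses[name] for name in losses)
print({name: round(value, 4) for name, value in losses.items()}, round(total_loss, 4))
```

The 0.2 weight lets the auxiliary output guide training without dominating the main objective.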

Related

How is Cross Entropy Loss Converted to a Scalar During Optimization?

I have a basic beginner question about how neural networks are defined, and I am learning in the context of the Keras library. Following the MNIST hello world program, I have defined this network:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(NB_CLASSES, input_shape=(RESHAPED,), activation='softmax'))
My understanding is that this creates a neural network with two layers: here RESHAPED is 784 and NB_CLASSES is 10, so the network will have one input layer with 785 neurons and one output layer with 10 neurons.
Then I added this:
model.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=['accuracy'])
I have read up on the formula for categorical cross-entropy, but it appears to be calculated per output node. My question is: during training, how are the cross-entropy values combined into a scalar-valued objective function? Is it just an average?
Keras computes the mean of the per-instance loss values, possibly weighted (see sample_weight_mode argument if you're interested).
Here's the reference to the source code: training.py. As you can see, the result value goes through K.mean(...), which ensures the result is a scalar.
In general, however, it is possible to reduce the losses differently, e.g., with a plain sum, but that usually performs worse (among other things, a sum makes the size of the gradient grow with the batch size), so the mean is preferable (see this question).
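The "per output node" part of the question resolves one level earlier: categorical cross-entropy sums the per-class terms, -y_c * log(p_c), into one value per sample, and only then are the per-sample values averaged over the batch. A minimal pure-Python sketch with a hypothetical batch of two samples and three classes:

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # One value per sample: -sum_c y_c * log(p_c). The per-class terms are
    # summed, so a 10-unit softmax output still yields one number per sample.
    return -sum(t * math.log(min(max(p, eps), 1.0)) for t, p in zip(y_true, y_pred))

# Hypothetical batch: 2 samples, 3 classes, one-hot targets.
y_true = [[0, 1, 0], [1, 0, 0]]
y_pred = [[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]]

per_sample = [categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
batch_loss = sum(per_sample) / len(per_sample)  # mean reduction -> a scalar
print([round(v, 4) for v in per_sample], round(batch_loss, 4))
# [0.2231, 0.6931] 0.4581
```

Replacing the mean with sum(per_sample) would give a loss that doubles when the batch size doubles, even if the per-sample fit is unchanged.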

How to add two output classification layers in keras?

I have a neural network whose job is to classify 10 classes. Further, I want these 10 classes to be grouped into 2 classes (3 of them positive, 7 negative). How can I achieve this in Keras?
It sounds like you are trying to solve two different but closely related problems. I recommend that you train your first model to predict the 10 classes, then create a copy of the first model (including weights) with a different output layer to support binary classification. At this point you can either:
Train only your final dense layer and new output layer, or
Train the entire model with a low learning rate
For more information you can read about Transfer Learning.
Example code:
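from keras.layers import Dense
# assumes `model` is the trained 10-class Sequential model
model.save('model_1.h5')  # load this to retrieve your original model
model.pop()  # pop output activation layer and associated params
model.pop()  # pop final dense layer
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))  # new binary output
# freeze every layer except the last two
for layer in model.layers[:-2]:
    layer.trainable = False
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
# y_train must now hold binary labels (1 = one of the 3 positive classes, 0 = otherwise)
model.fit(x_train, y_train, epochs=50, batch_size=32)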
If you want to retrain the whole model, omit the loop that sets all but the last two layers to untrainable, and choose an optimizer such as SGD with a low learning rate.
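The binary model needs binary targets. A minimal sketch of deriving them from the original 10-class integer labels; POSITIVE_CLASSES and the sample labels are hypothetical, so substitute the 3 classes that are positive in your problem:

```python
# Hypothetical choice: which 3 of the 10 original classes count as "positive".
POSITIVE_CLASSES = {1, 4, 7}

def to_binary_labels(class_labels, positive=POSITIVE_CLASSES):
    # 1 if the original class belongs to the positive group, else 0.
    return [1 if c in positive else 0 for c in class_labels]

y_train_10 = [0, 1, 4, 9, 7, 3]  # original integer labels in 0..9
y_train_binary = to_binary_labels(y_train_10)
print(y_train_binary)  # [0, 1, 1, 0, 1, 0]
```

If your labels are one-hot vectors, take the index of the 1 in each row first and then apply the same mapping.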

How to map softMax output to labels in MXNet

In deep learning, targets are often encoded as one-hot vectors. I am using MXNet to create a simple neural network that classifies images of animals as cats, dogs, horses, etc. When I call the Predict method of MXNet, it returns a softmax output. How do I determine whether the index of the entry with the maximum probability corresponds to cats, dogs, or horses? The softmax output is just an array, without any mapping from the results to the corresponding labels.
This might help answer your question: http://mxnet.io/tutorials/python/predict_imagenet.html
https://github.com/dmlc/mxnet-notebooks/blob/master/python/how_to/predict.ipynb
This example uses a pretrained model to classify images, together with a synset file that lists the label for each output index.
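The softmax output is just an array of probabilities: entry i belongs to whichever class you assigned integer index i when you encoded the training labels, so you keep that ordered list of names yourself (the linked ImageNet example reads it from its synset file). A minimal pure-Python sketch, assuming a hypothetical three-class label order:

```python
# Hypothetical label list: it must be in exactly the same order as the class
# indices used when the training labels were encoded.
labels = ['cat', 'dog', 'horse']

def decode(probs, labels):
    # Index of the largest softmax probability -> label at that index.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]

softmax_output = [0.1, 0.7, 0.2]  # e.g. one row returned by the predict call
print(decode(softmax_output, labels))  # ('dog', 0.7)

# Top-k ranking, useful when you want more than the single best guess.
top2 = sorted(range(len(softmax_output)), key=lambda i: softmax_output[i], reverse=True)[:2]
print([labels[i] for i in top2])  # ['dog', 'horse']
```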

Recurrent neural layers in Keras

I'm learning neural networks through Keras and would like to explore my sequential dataset on a recurrent neural network.
I was reading the docs and trying to make sense of the LSTM example.
My questions are:
What are the timesteps that are required for both layers?
How do I prepare a sequential dataset that works with Dense as an input for those recurrent layers?
What does the Embedding layer do?
Timesteps are a rather annoying aspect of Keras. Because the data you provide as input to your LSTM must be a NumPy array, it needs (at least for Keras versions <= 0.3.3) a fixed shape, including a "time" dimension. You can only feed sequences of a specified length as input; if your inputs vary in length, you should either pad your sequences with artificial data or use "stateful" mode (read the Keras documentation carefully to understand what this approach means). Both solutions might be unpleasant, but that's the price you pay for Keras being so simple :) I hope they will do something about that in version 1.0.0.
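A minimal pure-Python sketch of the padding approach, assuming each sequence is a list of feature vectors and that 0.0 is an acceptable filler value. It pre-pads short sequences (and keeps only the last maxlen steps of long ones) so the batch has the (samples, timesteps, features) shape an LSTM expects. pad_to_length is a hypothetical helper; Keras also ships a pad_sequences utility for integer sequences.

```python
def pad_to_length(sequences, maxlen, value=0.0):
    # Pre-pad (or pre-truncate) each sequence of feature vectors to exactly
    # `maxlen` timesteps, giving a regular (samples, timesteps, features) batch.
    n_features = len(sequences[0][0])
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # keep the last `maxlen` timesteps
        filler = [[value] * n_features for _ in range(maxlen - len(seq))]
        padded.append(filler + [list(step) for step in seq])
    return padded

# Two hypothetical sequences of 2-feature vectors with different lengths.
seqs = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8]],
]
batch = pad_to_length(seqs, maxlen=3)
print(batch)
# [[[1, 2], [3, 4], [5, 6]], [[0.0, 0.0], [0.0, 0.0], [7, 8]]]
```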
There are two ways to apply non-recurrent layers after LSTM ones:
you could set the argument return_sequences to False: then only the last activation of every sequence is passed to the following "static" layer.
you could use one of the "time distributed" layers, which gives you more flexibility in what you do with your data.
As for what the Embedding layer does: https://stats.stackexchange.com/questions/182775/what-is-an-embedding-layer-in-a-neural-network :)
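In short, an Embedding layer is a trainable lookup table: each integer token id selects one row of a weight matrix of shape (input_dim, output_dim). A minimal pure-Python sketch of the lookup alone, with a hypothetical 5-word vocabulary and random, untrained weights:

```python
import random

random.seed(0)
vocab_size, embedding_dim = 5, 3

# The "layer" is just a vocab_size x embedding_dim weight table (random here;
# in a real network these values are learned by backpropagation).
table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
         for _ in range(vocab_size)]

def embed(token_ids):
    # Each integer id selects one row: (timesteps,) -> (timesteps, embedding_dim).
    return [table[i] for i in token_ids]

sequence = [4, 1, 1, 0]  # a hypothetical sequence of word ids
vectors = embed(sequence)
print(len(vectors), len(vectors[0]))  # 4 3
print(vectors[1] == vectors[2])       # True: same id, same vector
```

This is why an Embedding layer takes integer sequences of shape (samples, timesteps) and outputs (samples, timesteps, output_dim), which an LSTM can consume directly.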

How to use created "net" neural network object for prediction?

I used ntstool to create a NAR (nonlinear autoregressive) net object, trained on a 1x1247 input vector (daily stock prices over 6 years).
I have finished all the steps and saved the resulting net object to the workspace.
Now I have no idea how to use this object to predict y(t) for, say, t = 2000 (I trained the model for t = 1:1247).
In some other threads, people recommended using the sim(net, t) function; however, this gives me the same result for any value of t (the same goes for net(t)).
I am not familiar with the specific neural net commands, but I think you are approaching this problem the wrong way. Typically you want to model the evolution over time. You do this by specifying a certain window, say 3 months.
What you are training on now is a single input vector, which has no information about evolution in time. The reason you always get the same prediction is that you used only a single point for training (even though it is 1247-dimensional, it is still one point).
You probably want to make input vectors of this nature (for simplicity, assume you are working with months):
[month1 month2; month2 month3; month3 month4]
This example contains 2 training points (one per column), each covering 3 months. Note that they overlap.
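A minimal Python sketch of building such overlapping windows from a series; sliding_windows is a hypothetical helper and the month names stand in for real values. The transpose reproduces the MATLAB matrix above, where each column is one training point. The final input/target split is an assumption about how you would then train a predictor.

```python
def sliding_windows(series, window):
    # Overlapping windows of `window` consecutive values; each window is one
    # training point. There are len(series) - window + 1 windows in total.
    return [series[i:i + window] for i in range(len(series) - window + 1)]

months = ['month1', 'month2', 'month3', 'month4']
windows = sliding_windows(months, 3)
print(windows)
# [['month1', 'month2', 'month3'], ['month2', 'month3', 'month4']]

# Transposed: one column per training point, as in the MATLAB matrix.
columns = [list(row) for row in zip(*windows)]
print(columns)
# [['month1', 'month2'], ['month2', 'month3'], ['month3', 'month4']]

# Hypothetical next step: the first window-1 values are the input,
# the last value is the target to predict.
X = [w[:-1] for w in windows]
y = [w[-1] for w in windows]
```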
Use the Network
After the network is trained and validated, the network object can be used to calculate the network response to any input. For example, if you want to find the network response to the fifth input vector in the building data set, you can use the following
a = net(houseInputs(:,5))
a =
34.3922
If you try this command, your output might be different, depending on the state of your random number generator when the network was initialized. Below, the network object is called to calculate the outputs for a concurrent set of all the input vectors in the housing data set. This is the batch mode form of simulation, in which all the input vectors are placed in one matrix. This is much more efficient than presenting the vectors one at a time.
a = net(houseInputs);
Each time a neural network is trained, it can arrive at a different solution due to different initial weight and bias values and different divisions of the data into training, validation, and test sets. As a result, different neural networks trained on the same problem can give different outputs for the same input. To ensure that a neural network of good accuracy has been found, retrain several times.
There are several other techniques for improving upon initial solutions if higher accuracy is desired. For more information, see Improve Neural Network Generalization and Avoid Overfitting.