How to use keras for binary classification? - neural-network

I need a simple example of how to use a Keras model. It is not clear to me what the difference is between model.evaluate and model.predict.
I want to create a model for binary classification. Let's say I have images of cats and dogs; I want to train a model and then use it to predict which animal is in a given photo. Maybe there are some good intros or tutorials. I read everything on the first five pages of Google results, but found only advanced tutorials and discussions.

To make things short:
model.evaluate takes a pair (X, Y) and returns the loss (and any other metrics configured for the model). This is for testing your model on a validation or test set.
model.predict predicts the outcome given an input X. This is for predicting the class of an input image, for example.
This, among other things, is also clearly documented in the linked documentation.
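To make the distinction concrete, here is a toy sketch in plain Python. TinyLogisticModel is a made-up stand-in, not a Keras class, and its weights and data are invented; it only mirrors the idea that predict needs inputs alone, while evaluate also needs the true labels so it can score the predictions.

```python
import math


class TinyLogisticModel:
    """Toy stand-in for a trained binary classifier (not a Keras class)."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, X):
        """Needs inputs only; returns one probability per sample."""
        probs = []
        for x in X:
            z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
            probs.append(1.0 / (1.0 + math.exp(-z)))
        return probs

    def evaluate(self, X, Y):
        """Needs inputs AND true labels; returns (loss, accuracy)."""
        probs = self.predict(X)
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p)
            for p, y in zip(probs, Y)
        ) / len(Y)
        accuracy = sum((p >= 0.5) == bool(y) for p, y in zip(probs, Y)) / len(Y)
        return loss, accuracy


# Invented weights and data, purely for illustration.
model = TinyLogisticModel(weights=[2.0, -1.0], bias=0.0)
X_test = [[1.0, 0.0], [0.0, 1.0], [0.5, 2.0]]
Y_test = [1, 0, 0]

probabilities = model.predict(X_test)            # use this on new, unlabelled inputs
loss, accuracy = model.evaluate(X_test, Y_test)  # use this on a labelled test set
```

In real Keras the exact return type of evaluate depends on which metrics the model was compiled with, so treat the tuple above as a simplification.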
You can find a lot of example models for Keras in the git repository (keras/examples) or on the Keras website (here and here).
For binary classification you could use this model for example:
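from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import Adam

# X: 2-D float array (samples, features); Y: one-hot labels with two columns
# older Keras releases spelled the arguments below init= and lr=
model = Sequential()
model.add(Dense(300, kernel_initializer='random_uniform'))
model.add(Activation('relu'))
model.add(Dense(2, kernel_initializer='random_uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.02))
model.fit(X, Y)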
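Because this model ends in a two-unit softmax trained with categorical_crossentropy, Y has to be one-hot encoded, and the predictions come back as one probability row per sample. A minimal sketch of both conversions in plain Python, assuming the (arbitrary) convention 0 = cat and 1 = dog; the helper names are made up for illustration:

```python
def to_one_hot(labels, num_classes=2):
    """Turn integer class ids (assumed: 0 = cat, 1 = dog) into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def to_class_ids(probability_rows):
    """Pick the index of the largest probability in each softmax output row."""
    return [row.index(max(row)) for row in probability_rows]


Y = to_one_hot([0, 1, 1])
# Made-up softmax outputs for two photos.
predicted = to_class_ids([[0.9, 0.1], [0.2, 0.8]])
```

An alternative design is a single sigmoid output unit with a binary cross-entropy loss, in which case Y stays a plain column of 0s and 1s.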

Related

Predict a number with a given image (0 to 1)

I am a total beginner to ML and Neural networks. I am currently working on a project where I have a lot of pictures stored in a MongoDB database. Each one of those pictures has a number from 0 to 1. For example "picture 1" 0.71.
I want to train my model on this database. The main goal of the project is that, once the model is trained, it can take an image and return (predict) a number from 0 to 1. After doing some research and asking a few people, I figured out that some libraries that would be useful for the project are TensorFlow and Keras. Some people told me that it is impossible, but I'm not sure, so I came to ask here.
So my questions are: Is it possible? If so, how can I implement it? Are there any specific tools you recommend? If you suggest a particular approach for my project, do I need to export my MongoDB database in a certain form? Since I am a beginner, are there any tutorials that you think could help?
I'm sorry if this question is a bit too general. If there are any misunderstandings, please comment and I will try to answer.
Thanks in advance!
What you want to do is totally feasible. This kind of task is called regression. Since you are working with image data, the best-suited type of model is a convolutional neural network (CNN); you'll need some understanding of them if you want to build your own model. I've done a project where I had to predict the number of bacterial colonies from an image, much like your problem except that my predicted values were unbounded.
What is a CNN? Here is a link
Basically, a CNN learns features from the images and uses those features to predict a value.
You won't need to design your own model; most people just use a well-designed one from the scientific literature.
Go for Keras; it's the easiest framework out there and works like a charm. Here is how to implement VGG16 (an architecture that is probably the best for your problem): link
You should follow this tutorial to get going with Keras.
Last hint: don't use the same last layer as the one in the VGG16 implementation; use a Dense layer with one neuron and a sigmoid, linear, or leaky ReLU activation.
i.e.:
#model.add(Dense(1000, activation='softmax'))
model.add(Dense(1, activation='sigmoid'))
This means: predict one number (sigmoid will bound it between 0 and 1, but maybe leaky ReLU or linear is better).
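To see what each choice of final activation does to the raw output of the last neuron, here is a small plain-Python sketch. The input values and the leaky ReLU slope alpha are illustrative assumptions, not values from any particular model.

```python
import math


def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(z):
    """Leaves the value untouched, so the output is unbounded."""
    return z


def leaky_relu(z, alpha=0.01):
    """Keeps positive values, scales negative ones by a small slope (alpha is illustrative)."""
    return z if z > 0 else alpha * z


pre_activations = [-3.0, 0.0, 0.5, 4.0]
sigmoid_out = [sigmoid(z) for z in pre_activations]
linear_out = [linear(z) for z in pre_activations]
leaky_out = [leaky_relu(z) for z in pre_activations]
```

Only the sigmoid guarantees a value between 0 and 1; with linear or leaky ReLU the network has to learn to stay in that range on its own.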
Also, I guess you could use MongoDB to read the images as arrays, but I would just put the images in a folder.
Edit: When compiling the model, use mean squared error as the loss, as in
import keras
adam = keras.optimizers.Adam(learning_rate=1e-4)  # named lr= in older Keras releases
model.compile(optimizer=adam, loss='mse')
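For reference, this is roughly what the 'mse' loss computes, written out in plain Python. The target 0.71 is the example from the question; every other number here is made up.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


targets = [0.71, 0.20, 0.95]
close_predictions = [0.70, 0.25, 0.90]
poor_predictions = [0.10, 0.90, 0.30]

close_mse = mean_squared_error(targets, close_predictions)
poor_mse = mean_squared_error(targets, poor_predictions)
```

The closer the predicted numbers are to the stored ones, the smaller the loss, which is exactly what you want the optimizer to push towards for this kind of regression.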
Here you have the "hello world" program of neural networks: digit classification. You can start by studying it, because I think you will end up with a similar architecture for your network. What you should focus on is the output of your model: in this example they perform classification over 10 classes (digits 0 to 9), but you are trying to predict a real number. You could try a single neuron with sigmoid or linear activation at the end of your model.

Neural network categorization: Do they always have to have one label per training data

In all the examples of categorization with neural networks that I have seen, they all have training data that has one category as the predominant category or the label for each input data.
Can you feed training data that has more than one label? E.g., a picture with a "cat" and a "mouse".
I understand (maybe wrongly) that if you use softmax for the probability/prediction at the output layer, it tends to select a single class (maximizing discriminative power). I'm guessing this would hurt or prevent learning and predicting multiple labels for one input.
Is there any approach/architecture of NN where there are multiple labels in the training data and multiple output predictions are made? Or is that already the case and I missed some vital understanding? Please clarify.
Most examples have one class per input, so no, you haven't missed anything. It is, however, possible to assign several classes to one input. This is usually called multi-label classification (and sometimes joint classification in the literature).
The naive implementation you suggested with a softmax will struggle because the outputs of the final layer have to add up to 1, so the more classes you have, the harder it is to figure out what the network is trying to say.
You can, however, change the architecture to achieve what you want. For each class you could have a binary softmax classifier that branches off from the penultimate layer, or you can use sigmoid outputs, which don't have to add up to one (even though each neuron outputs a value between 0 and 1). Note that using a sigmoid might make training more difficult.
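A small plain-Python sketch of the difference, using made-up scores for a picture that contains both a cat and a mouse. The class names, scores, and the 0.5 threshold are all assumptions for illustration.

```python
import math


def softmax(logits):
    """Scores that compete with each other and always sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Independent score in (0, 1) for a single class."""
    return 1.0 / (1.0 + math.exp(-z))


classes = ["cat", "mouse", "dog"]
logits = [3.0, 3.0, -2.0]  # made-up: strong evidence for cat AND mouse
target = [1, 1, 0]         # multi-hot training label for this picture

softmax_scores = softmax(logits)
sigmoid_scores = [sigmoid(z) for z in logits]
predicted_labels = [c for c, s in zip(classes, sigmoid_scores) if s >= 0.5]
```

With softmax, cat and mouse have to share the probability mass, so neither reaches 0.5 even though both are clearly present. With one sigmoid per class, both score high and both labels are predicted.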
Alternatively you could train multiple networks for each class and then combine them into one classification system at the end. It depends on how complex your envisioned task is.
Is there any approach/architecture of NN where there are multiple labels in training data and multiple outputs predictions are made?
The answer is yes. To briefly answer your question, I am giving an example in the context of Keras, a high-level neural network library.
Let's consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model
# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# an LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
auxiliary_input = Input(shape=(5,), name='aux_input')
# concatenate replaces merge(..., mode='concat') from older Keras releases
x = concatenate([lstm_out, auxiliary_input])
# we stack a deep fully-connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
Now, let's compile and train the model as follows:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})
# and train it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)  # epochs was called nb_epoch in older Keras releases
Reference: Multi-input and multi-output models in Keras
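To illustrate what loss_weights does, here is a plain-Python sketch of the quantity being minimized: a weighted sum of the per-output losses. The labels and predicted probabilities are made up; only the weights 1.0 and 0.2 come from the compile call above.

```python
import math


def binary_crossentropy(targets, probs):
    """Mean binary cross-entropy over a batch."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(targets, probs)
    ) / len(targets)


# Made-up batch: the same labels are used for both outputs, as in the fit call.
labels = [1, 0, 1]
main_probs = [0.9, 0.2, 0.7]  # hypothetical predictions of main_output
aux_probs = [0.6, 0.4, 0.5]   # hypothetical predictions of aux_output
loss_weights = {"main_output": 1.0, "aux_output": 0.2}

main_loss = binary_crossentropy(labels, main_probs)
aux_loss = binary_crossentropy(labels, aux_probs)
total_loss = (
    loss_weights["main_output"] * main_loss
    + loss_weights["aux_output"] * aux_loss
)
```

The auxiliary output therefore still influences training, but mistakes on it count only a fifth as much as mistakes on the main output.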

Self-organizing maps and learning vector quantization

Self-organizing maps (SOMs) are better suited to clustering (dimensionality reduction) than to classification. But SOMs are used in learning vector quantization (LVQ) for fine-tuning. LVQ, however, is a supervised learning method, so to use SOMs with LVQ, LVQ must be given a labelled training data set. Since SOMs only do clustering, not classification, and thus cannot produce labelled data, how can a SOM be used as input for LVQ?
Does LVQ fine-tune the clusters in the SOM?
Before being used in LVQ, should the SOM output be put through another classification algorithm so that the inputs are classified and these labelled inputs can then be used in LVQ?
To be clear, supervised learning differs from unsupervised learning in that, in the former, the target values are known.
Therefore, the output of a supervised model is a prediction.
By contrast, the output of an unsupervised model is a label whose meaning we don't know yet. That is why, after clustering, it is necessary to profile each of the new labels.
Having said that, you could label the dataset using an unsupervised learning technique such as a SOM. Then you should profile each class to make sure you understand what it means.
At this point, you can pursue two different paths, depending on your final objective:
1. use this new variable as a means of dimensionality reduction
2. use this new dataset, augmented with the additional variable representing the class, as labelled data that you will try to predict using LVQ
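As a rough illustration of the second path, here is a sketch of the basic LVQ1 update rule in plain Python. It assumes the SOM step is already done: the two prototypes stand for SOM unit weight vectors that were given the class names "A" and "B" after profiling, and the samples carry the same cluster-derived labels. All numbers and names are hypothetical.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    distances = [
        sum((pi - xi) ** 2 for pi, xi in zip(vector, x))
        for vector, _ in prototypes
    ]
    return distances.index(min(distances))


def lvq1_epoch(prototypes, samples, learning_rate=0.1):
    """One LVQ1 pass: pull the winning prototype closer if labels agree, push it away otherwise."""
    for x, label in samples:
        i = nearest(prototypes, x)
        vector, proto_label = prototypes[i]
        sign = 1.0 if proto_label == label else -1.0
        moved = [pi + sign * learning_rate * (xi - pi) for pi, xi in zip(vector, x)]
        prototypes[i] = (moved, proto_label)
    return prototypes


# Hypothetical starting point taken from a trained and profiled SOM.
prototypes = [([0.2, 0.2], "A"), ([0.8, 0.8], "B")]
samples = [
    ([0.0, 0.1], "A"), ([0.1, 0.0], "A"),
    ([1.0, 0.9], "B"), ([0.9, 1.0], "B"),
]

for _ in range(20):
    prototypes = lvq1_epoch(prototypes, samples)

predicted = prototypes[nearest(prototypes, [0.05, 0.05])][1]
```

A new point is then classified by the label of its nearest prototype, which is how the fine-tuned map ends up acting as a classifier.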
Hope this can be useful!

Recurrent neural layers in Keras

I'm learning neural networks through Keras and would like to explore my sequential dataset on a recurrent neural network.
I was reading the docs and trying to make sense of the LSTM example.
My questions are:
What are the timesteps that are required for both layers?
How do I prepare a sequential dataset that works with Dense as an input for those recurrent layers?
What does the Embedding layer do?
Timesteps are a rather annoying aspect of Keras. Because the data you provide as input to your LSTM must be a NumPy array, it needs (at least for Keras version <= 0.3.3) to have a fixed shape, including the "time" dimension. You can only feed sequences of a specified length, so if your inputs vary in length you should either pad your sequences with artificial data or use "stateful" mode (please read the Keras documentation carefully to understand what this approach means). Both solutions might be unpleasant, but that's the price you pay for Keras being so simple :) I hope that in version 1.0.0 they will do something about that.
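The padding option can be sketched in a few lines of plain Python. pad_to_length is a hand-rolled helper written for this illustration (Keras also ships its own pad_sequences utility); padding on the left with zeros is just one possible convention.

```python
def pad_to_length(sequences, length, pad_value=0):
    """Left-pad (or left-truncate) every sequence to exactly `length` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > length:
            # keep only the most recent `length` steps
            seq = seq[len(seq) - length:]
        padded.append([pad_value] * (length - len(seq)) + seq)
    return padded


# Three made-up sequences of different lengths become one rectangular batch.
batch = pad_to_length([[5, 3], [7, 1, 4, 9, 2], [8]], length=4)
```

After this step every row has the same number of timesteps, so the batch can be converted to an array with a fixed shape.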
There are two ways to apply non-recurrent layers after LSTM ones:
1. You could set the return_sequences argument to False; then only the last activations of every sequence will be passed to a "static" layer.
2. You could use one of the "time distributed" layers to get more flexibility in what you do with your data.
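The effect of return_sequences can be mimicked with a toy scalar recurrent cell in plain Python. This is a simplified stand-in for illustration, not an LSTM, and the weights are arbitrary.

```python
import math


def simple_rnn(sequence, w_in=0.5, w_rec=0.8, return_sequences=False):
    """Scalar recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    states = []
    for x_t in sequence:
        h = math.tanh(w_in * x_t + w_rec * h)
        states.append(h)
    # Either every hidden state (one per timestep) or only the final one.
    return states if return_sequences else states[-1]


sequence = [1.0, 0.0, -1.0, 2.0]
last_state = simple_rnn(sequence)                         # one value per sequence
all_states = simple_rnn(sequence, return_sequences=True)  # one value per timestep
```

A plain Dense layer fits the first case, since it receives a single vector per sequence; the second case is where a time-distributed layer applies the same operation at every timestep.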
As for the Embedding layer, see https://stats.stackexchange.com/questions/182775/what-is-an-embedding-layer-in-a-neural-network :)
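In short, an embedding layer is a trainable lookup table that maps each integer id to a dense vector. A minimal plain-Python sketch, with a tiny made-up vocabulary size and vector length, and random values standing in for weights that would normally be learned:

```python
import random

random.seed(0)
vocabulary_size = 10  # illustrative
embedding_dim = 4     # illustrative

# One row per integer id; random starting values stand in for learned weights.
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(vocabulary_size)
]


def embed(token_ids):
    """Replace each integer id by its row of the table."""
    return [embedding_table[i] for i in token_ids]


embedded = embed([3, 7, 3])  # a sequence of 3 ids becomes 3 vectors of length 4
```

The same id always maps to the same vector, and during training those vectors are adjusted like any other weights.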

SVM LibSVM Ignore Feature 1,3,5 when Predicting

This question is about LibSVM, or SVMs in general.
I wonder if it is possible to categorize Feature-Vectors of different length with the same SVM Model.
Let's say we train the SVM with about 1000 Instances of the following Feature Vector:
[feature1 feature2 feature3 feature4 feature5]
Now I want to predict a test vector that has the same length of 5.
If the probability I receive is too poor, I want to check the first subset of my test vector, containing columns 2-5. So I want to dismiss the first feature.
My question now is: Is it possible to tell the SVM only to check the features 2-5 for prediction (e.g. with weights), or do I have to train different SVM Models. One for 5 features, another for 4 features and so on...?
Thanks in advance...
marcus
You can always remove features from your test points by fiddling with the file, but I strongly recommend against such an approach. An SVM model is valid only when all features are present. If you are using the linear kernel, simply setting a given feature to 0 will implicitly cause it to be ignored (though you should not do this). With other kernels, this is very much a no-no.
Using a different set of features for prediction than the set you used for training is not a good approach.
I strongly suggest training a new model for the subset of features you wish to use in prediction.
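To see why the kernel matters, here is a plain-Python sketch comparing a single kernel evaluation with feature 1 set to 0 against the same evaluation with feature 1 actually removed. The vectors and the gamma value are made up, and this is not LibSVM code.

```python
import math


def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


support_vector = [2.0, 1.0, 3.0]  # hypothetical support vector
test_point = [4.0, 1.0, 3.0]      # hypothetical test vector

zeroed = [0.0] + test_point[1:]   # feature 1 set to 0
dropped_sv = support_vector[1:]   # feature 1 removed from both vectors
dropped_x = test_point[1:]

# Linear kernel: a zeroed feature contributes nothing to the dot product.
linear_zeroed = linear_kernel(support_vector, zeroed)
linear_dropped = linear_kernel(dropped_sv, dropped_x)

# RBF kernel: zeroing changes the distance to the support vector instead.
rbf_zeroed = rbf_kernel(support_vector, zeroed)
rbf_dropped = rbf_kernel(dropped_sv, dropped_x)
```

With the linear kernel both variants give the same value, while with the RBF kernel they differ, because 0 is treated as a real feature value at some distance from the support vector. Either way, the model was fitted with all features, which is why training a separate model for the reduced feature set is the safer route.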