What is fine-tuning in reference to neural networks? - neural-network

I'm going through a few research papers on neural networks, where I came across the term fine-tuning applied to a pre-trained CNN. What does it actually do?

Pre-trained:
First we have to understand what a pre-trained model is. Pre-trained models are models whose weights have already been trained by someone on a dataset, e.g. VGG16 is trained on ImageNet. If we now want to classify ImageNet images, we can simply use the pre-trained VGG16: because it has already been trained to classify ImageNet objects, we don't need to train it again.
Fine-Tuning:
Now suppose I want to classify CIFAR-10 (10 classes) with VGG16 (trained for 1000 classes), and I want to use the pre-trained model for this work. The model I have was trained on ImageNet, which has 1000 classes, so I will replace the last layer with a dense layer of 10 neurons with softmax activation, because now I want to classify 10 classes instead of 1000, and then continue training on my data. Adapting a pre-trained model to our own task like this is known as fine-tuning.
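For illustration, a minimal sketch of that idea in Keras (assuming the built-in VGG16 ImageNet weights and images resized to 224x224; the layer sizes are illustrative, not prescriptive):
import keras
from keras.models import Sequential
from keras.layers import Flatten, Dense

# load VGG16 without its original 1000-class head
vgg = keras.applications.vgg16.VGG16(include_top=False,
                                     weights='imagenet',
                                     input_shape=(224, 224, 3))
vgg.trainable = False  # keep the pre-trained convolutional weights frozen

model = Sequential()
model.add(vgg)
model.add(Flatten())
model.add(Dense(10, activation='softmax'))  # new 10-class head for CIFAR-10
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])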
Transfer Learning:
The whole concept of taking a pre-trained model and using it to classify our own dataset by fine-tuning it is known as transfer learning.
Transfer-learning example (using a pre-trained model and fine-tuning it for my own dataset)
Here I am using DenseNet pre-trained on ImageNet and fine-tuning it to classify the images in my own dataset. My dataset has 5 classes, so I am adding a final dense layer with 5 neurons.
import keras
from keras.models import Sequential

dense_model = keras.applications.densenet.DenseNet121(include_top=False,
                                                       weights='imagenet',
                                                       input_tensor=None,
                                                       input_shape=(224, 224, 3),
                                                       pooling=None)
dense_model.trainable = False
dense_model.summary()

model = Sequential()
# Add the DenseNet convolutional base model
model.add(dense_model)
# Add new layers for the 5-class problem
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(5, activation='softmax'))
model.summary()
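To actually train the new head on your own images you would then compile and fit the model as usual; a hedged sketch, where x_train and y_train stand in for your 5-class data:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)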
Pre-trained model link:
https://www.kaggle.com/sohaibanwaar1203/pretrained-densenet
Now, what if I want to change the hyper-parameters of the model? I want to check which optimizer, loss function, number of layers and number of neurons work best on my dataset. For this I will optimize those parameters, which is known as hyper-parameter optimization.
Hyper-parameter Optimization:
If you have some knowledge of neural networks, you will know that we pass many hand-picked numbers to our network, e.g. the number of dense layers, the number of dense units, the activations, the dropout percentage. We don't know in advance whether a network with 3 layers or one with 6 layers will perform better on our data, so we experiment to find the best values for our model. This experimentation to find the best settings for your model is also often called fine-tuning (of the hyper-parameters). There are techniques to optimize the search, such as Grid Search and Random Search. I am sharing a notebook with code you can use to optimize your model's hyper-parameters.
import math
from keras.wrappers.scikit_learn import KerasRegressor
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.metrics import make_scorer
from keras.models import Sequential,Model
from keras.layers import Dense,Dropout,Activation,BatchNormalization
from keras import losses
from keras import optimizers
from keras.callbacks import EarlyStopping
from keras import regularizers
def Randomized_Model(lr=0.0001, dropout=0.5, optimizer='Adam', loss='mean_squared_error',
                     activation="relu", clipnorm=0.1,
                     decay=1e-2, momentum=0.5, l1=0.01, l2=0.001):
    # clipnorm, l1 and l2 are accepted so they can appear in the search space,
    # even though this simple model does not use them

    # map the loss name to the corresponding Keras loss (default: mean squared error)
    loss_fn = losses.mean_squared_error
    if loss == "poisson":
        loss_fn = keras.losses.poisson
    if loss == "mean_absolute_error":
        loss_fn = keras.losses.mean_absolute_error
    if loss == "mean_squared_logarithmic_error":
        loss_fn = keras.losses.mean_squared_logarithmic_error
    if loss == "binary_crossentropy":
        loss_fn = keras.losses.binary_crossentropy
    if loss == "hinge":
        loss_fn = keras.losses.hinge

    # map the optimizer name to a configured Keras optimizer (default: Adam)
    opt = keras.optimizers.Adam(lr=lr, decay=decay, beta_1=0.9, beta_2=0.999)
    if optimizer == "Adagrad":
        opt = keras.optimizers.Adagrad(lr=lr, decay=decay)
    if optimizer == "sgd":
        opt = keras.optimizers.SGD(lr=lr, momentum=momentum, decay=decay, nesterov=False)
    if optimizer == "RMSprop":
        opt = keras.optimizers.RMSprop(lr=lr, rho=0.9, decay=decay)
    if optimizer == "Adamax":
        opt = keras.optimizers.Adamax(lr=lr, beta_1=0.9, beta_2=0.999, decay=decay)

    # simple fully connected regression model (input has 30 features)
    model = Sequential()
    model.add(Dense(units=64, input_dim=30, activation=activation))
    model.add(Dropout(dropout))
    model.add(Dense(units=32, activation=activation))
    model.add(Dense(units=8, activation=activation))
    model.add(Dense(units=1))
    model.compile(loss=loss_fn, optimizer=opt)
    return model
params = {'lr': (0.0001, 0.01, 0.0009, 0.001, 0.002),
          'epochs': [50, 100, 25],
          'dropout': (0, 0.2, 0.4, 0.8),
          'optimizer': ['Adam', 'Adagrad', 'sgd', 'RMSprop', 'Adamax'],
          'loss': ["mean_squared_error", "hinge", "mean_absolute_error",
                   "mean_squared_logarithmic_error", "poisson"],
          'activation': ["relu", "selu", "linear", "sigmoid"],
          'clipnorm': (0.0, 0.5, 1),
          'decay': (1e-6, 1e-4, 1e-8),
          'momentum': (0.9, 0.5, 0.2),
          'l1': (0.01, 0.001, 0.0001),
          'l2': (0.01, 0.001, 0.0001),
          }
# wrap the Keras model so it can be used with scikit-learn's random search
model = KerasRegressor(build_fn=Randomized_Model, epochs=30, batch_size=3, verbose=1)
RandomizedSearchfit = RandomizedSearchCV(estimator=model, cv=KFold(3),
                                         param_distributions=params,
                                         verbose=1, n_iter=10, n_jobs=1)
# fit the search on your own data (X = features, Y = targets)
RandomizedSearch_result = RandomizedSearchfit.fit(X, Y)
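Once the search has finished, the scikit-learn result object exposes the winning configuration (these attributes are standard RandomizedSearchCV fields):
print("Best score: %f" % RandomizedSearch_result.best_score_)
print("Best hyper-parameters: %s" % RandomizedSearch_result.best_params_)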
Now pass your X and Y to this model and it will find the best values among the parameters you listed in the params dictionary. You can also check fine-tuning of a CNN in this notebook (Click Here), where I use the Talos library to fine-tune my model.
This is another notebook in which I use scikit-learn (randomized and grid search) to fine-tune my model (Click Here).

Fine-tuning usually refers to the last step of a more complex NN training procedure, in which you only slightly modify a pre-trained network, usually to improve performance on a specific domain or to re-use a good input representation in a different task.
It is often mentioned in the context of transfer learning. E.g., for image recognition, it may mean that you take a network that was trained to recognize 1k classes from ImageNet, and only "fine-tune" the last layer on your task-specific (smaller and presumably simpler) dataset.
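As a rough illustration of that last-layer fine-tuning in Keras (a sketch only; the choice of ResNet50 as the base and the num_classes variable are assumptions for the example):
from keras.applications.resnet50 import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

base = ResNet50(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base.output)
predictions = Dense(num_classes, activation='softmax')(x)  # num_classes = classes in your task
model = Model(inputs=base.input, outputs=predictions)

for layer in base.layers:
    layer.trainable = False  # freeze the pre-trained weights, train only the new head
model.compile(optimizer='adam', loss='categorical_crossentropy')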

Related

How to add two output classification layers in keras?

I have a neural network whose job is to classify 10 classes. Further, I want these 10 classes to be classified into 2 classes (positive -> 3 , negative -> 7). How can I achieve this in keras?
It sounds like you are trying to solve two different, but closely related problems. I recommend that you train your first model to predict 10 classes, and then create a copy of the first model (including its weights), except with a different output layer to support binary classification. At this point you can either:
Train only your final dense layer and new output layer, or
Train the entire model with a low learning rate
For more information you can read about Transfer Learning.
Example code:
model.save('model_1')  # save this so you can retrieve your original model later
model.pop()  # pop output activation layer and associated params
model.pop()  # pop final dense layer
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
for layer in model.layers[:-2]:
    layer.trainable = False  # freeze everything except the new layers
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)
If you want to retrain the whole model instead, you can omit the loop that sets all but the last two layers to untrainable, and choose an optimizer such as SGD with a low learning rate.
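For that full-model variant, a hedged sketch of the compile step (the learning rate here is just an illustratively small value):
from keras.optimizers import SGD

# every layer stays trainable; a small learning rate nudges the pre-trained
# weights instead of overwriting them
model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=32)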

Neural network categorization: Do they always have to have one label per training data

In all the examples of categorization with neural networks that I have seen, the training data has one category as the predominant category, or label, for each input.
Can you feed in training data that has more than one label, e.g. a picture with a "cat" and a "mouse"?
I understand (maybe wrongly) that if you use softmax for probability/prediction at the output layer, it tends to select one class (maximizing discerning power). I'm guessing this would hurt or prevent learning and predicting multiple labels for an input.
Is there any approach/architecture of NN where there are multiple labels in the training data and multiple output predictions are made? Or is that already the case and I have missed some vital understanding? Please clarify.
Most examples have one class per input, so no, you haven't missed anything. It is however possible to do multi-label classification, which is sometimes called joint classification in the literature.
The naive implementation you suggested with a softmax will struggle, as the outputs of the final layer have to add up to 1, so the more classes you have the harder it is to figure out what the network is trying to say.
You can change the architecture to achieve what you want, however. For each class you could have a binary softmax classifier branching off from the penultimate layer, or you can use a sigmoid output, which doesn't have to add up to one (even though each neuron still outputs a value between 0 and 1); a sketch of the sigmoid option follows this answer. Note that using sigmoids might make training more difficult.
Alternatively you could train multiple networks for each class and then combine them into one classification system at the end. It depends on how complex your envisioned task is.
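A minimal sketch of the sigmoid option in Keras (the layer sizes, the 100-dimensional input and the 3-label output are illustrative assumptions; the targets are multi-hot vectors such as [1, 0, 1]):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))
# one independent sigmoid per label instead of a single softmax,
# so several labels can be "on" for the same input
model.add(Dense(3, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')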
Is there any approach/architecture of NN where there are multiple labels in training data and multiple outputs predictions are made ?
The answer is yes. To briefly answer your question, here is an example in the context of Keras, a high-level neural network library.
Let's consider the following model. We want to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc.
from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model

# headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# this embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# an LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)

auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

auxiliary_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, auxiliary_input])

# we stack a deep fully connected network on top
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# and finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
This defines a model with two inputs and two outputs:
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
Now, let's compile and train the model as follows:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# and train it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)
Reference: Multi-input and multi-output models in Keras

Is it possible to extend trained Neural Network to recognize additional patterns

Let's say I have a neural network (NN) that is trained to recognize cats given an image. Is there a way to update my NN to recognize dogs as well?
More generally, my question is about a way to extend a NN by, in a sense, "loading a library of patterns".
This is generally known as transfer learning: you basically train a neural network on a large dataset (like ImageNet) and then use the feature vector generated by the final convolutional layer to train another classifier (a multiclass SVM, for example), and this works even if the objects are different.
Another way is to take a pre-trained network and retrain only the classifier part (the fully connected layers). This is still faster than training a network from scratch.
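A rough sketch of the first approach (pre-trained CNN features feeding a multiclass SVM) with Keras and scikit-learn; the use of VGG16 as the extractor and the x_images/y variables are assumptions for the example:
from keras.applications.vgg16 import VGG16
from sklearn.svm import SVC

# pre-trained network used purely as a fixed feature extractor
feature_extractor = VGG16(weights='imagenet', include_top=False, pooling='avg')

# x_images: preprocessed images, y: their labels (cat, dog, ...)
features = feature_extractor.predict(x_images)

clf = SVC(kernel='linear')  # multiclass SVM trained on the extracted features
clf.fit(features, y)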

How to use a theanets neural network model in deeplearning4j?

I have trained a theanets neural network model and I want to use the same model in deeplearning4j. Any suggestions?
We have Keras model import if you can map it to that. That's as close as it's going to get, though.
https://deeplearning4j.org/model-import-keras
Beyond that it depends on the model.

Load my MATLAB model as a Weka model

I have trained a Bayesian Regularized Neural Network model with MATLAB. This model type is not available in Weka. I now want to import my MATLAB model as a Weka .model file, so that I can use my model directly with Weka.
How do I go about it?
Weka can import models in the PMML format, so the easiest (and possibly the only available) way to load a neural network trained with this "special" form of regularization is to go through PMML.
You will have to save your network in the PMML format; some guidelines can be found here:
http://www.dmg.org/v3-2/NeuralNetwork.html