Change optimizer algorithm in Keras - neural-network

If you change the optimizer in Keras, you need to compile your model again. This compilation overrides the learned weights of the network. I know how to save weights, but I do not know how to restore them for a network. Can someone please help?

Here is a YouTube video that explains exactly what you want to do: Save and load a Keras model
How you load the model weights is going to depend on how you saved the model or model weights. There are three different saving methods that Keras makes available. These are described in the video link above (with examples), as well as below.
The model.save('my_model.h5') function saves:
The architecture of the model, allowing you to re-create the model.
The weights of the model.
The training configuration (loss, optimizer).
The state of the optimizer, allowing you to resume training exactly where you left off.
To load this saved model, you would use the following:
from keras.models import load_model
new_model = load_model('my_model.h5')
The model.to_json() function only saves the architecture of the model; it does not save the weights. To re-create a model from this JSON string, you would use the following:
json_string = model.to_json()
from keras.models import model_from_json
model = model_from_json(json_string)
The model.save_weights('my_model_weights.h5') function only saves the weights of the model. To load these saved weights to a model, you would use the following:
model.load_weights('my_model_weights.h5')
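To tie this back to the original question, here is a minimal sketch of switching the optimizer without losing the learned weights (the SGD settings, loss name, and X_train/Y_train are placeholders for your own setup): save the weights, re-compile with the new optimizer, then load the weights back before continuing training.
from keras.optimizers import SGD

# Save the learned weights before switching the optimizer
model.save_weights('my_model_weights.h5')

# Re-compile with a different optimizer (SGD here is just an example choice)
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.001), metrics=['accuracy'])

# Restore the previously learned weights and continue training
model.load_weights('my_model_weights.h5')
model.fit(X_train, Y_train, epochs=5)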

Related

What is Fine Tuning in reference to Neural Network?

I'm going through a few research papers on neural networks, where I came across the term Fine Tuning of a pre-trained CNN. What does it actually mean?
Pre-trained:
First, we have to understand what a pre-trained model is. Pre-trained models are models whose weights have already been trained by someone else on a data set, e.g. VGG16 is trained on ImageNet. If we now want to classify ImageNet images, we can use the pre-trained VGG16 directly: because VGG16 is already trained to classify ImageNet objects, we don't need to train it again.
Fine-Tuning:
Now suppose I want to classify CIFAR-10 (10 classes) with VGG16 (trained for 1000 classes), and I want to use the pre-trained model for this task. The model I have was trained on ImageNet, which has 1000 classes, so I will replace the last layer with a dense layer of 10 neurons with softmax activation, because now I want to classify 10 classes instead of 1000. I then fine-tune (adapt to my needs) this model. This lets me re-use VGG16 (pre-trained on ImageNet); changing a pre-trained model according to our needs is known as fine-tuning.
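For illustration, a minimal sketch of that idea with the Keras built-in VGG16 (the input size, optimizer and loss here are only placeholder choices, and CIFAR-10 images would need to be resized to the chosen input size first):
import keras
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Load VGG16 pre-trained on ImageNet, dropping its original 1000-class head
base = keras.applications.vgg16.VGG16(include_top=False, weights='imagenet',
                                      input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained convolutional weights

model = Sequential()
model.add(base)
model.add(Flatten())
model.add(Dense(10, activation='softmax'))  # new head for the 10 CIFAR-10 classes
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])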
Transfer Learning:
The overall concept of taking a pre-trained model and using it to classify our own data set, by fine-tuning the model, is known as transfer learning.
Transfer-learning example (using a pre-trained model and fine-tuning it for my own data set)
Here I am using DenseNet121 pre-trained on ImageNet as the base and fine-tuning it to classify images in my own data set. My data set has 5 classes, so I add a final dense layer with 5 neurons:
import keras
from keras.models import Sequential

model = Sequential()
dense_model = keras.applications.densenet.DenseNet121(include_top=False, weights='imagenet', input_tensor=None, input_shape=(224,224,3), pooling=None, classes=1000)
dense_model.trainable = False  # freeze the pre-trained weights
dense_model.summary()
# Add the DenseNet convolutional base model
model.add(dense_model)
# Add new layers for the 5-class problem
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(128, activation='relu'))
model.add(keras.layers.Dense(5, activation='softmax'))
model.summary()
Pre-trained model link:
https://www.kaggle.com/sohaibanwaar1203/pretrained-densenet
Now, what if I want to change the hyper-parameters of this model? I want to check which optimizer, loss function, number of layers and number of neurons work well on my data set. To do this, I optimize these parameters, which is known as hyper-parameter optimization.
Hyper-parameter Optimization:
If you know a bit about neural networks, you know that we have to choose many numbers for our network, e.g. the number of dense layers, the number of dense units, the activation functions, the dropout percentage. We don't know in advance whether a network with 3 layers or one with 6 layers will perform better on our data, so we experiment to find the best values for our model. This experimentation to find the best values is known as hyper-parameter optimization (or tuning). Common techniques for it are grid search and random search. I am sharing a notebook with which you will be able to optimize your model parameters with the help of code:
import math
from keras.wrappers.scikit_learn import KerasRegressor
import keras
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV, KFold
from sklearn.metrics import make_scorer
from keras.models import Sequential,Model
from keras.layers import Dense,Dropout,Activation,BatchNormalization
from keras import losses
from keras import optimizers
from keras.callbacks import EarlyStopping
from keras import regularizers
def Randomized_Model(lr=0.0001, dropout=0.5, optimizer='Adam', loss='mean_squared_error',
                     activation="relu", clipnorm=0.1,
                     decay=1e-2, momentum=0.5, l1=0.01, l2=0.001):
    # Map the loss name to the corresponding Keras loss function
    if loss == 'mean_squared_error':
        loss = losses.mean_squared_error
    elif loss == "poisson":
        loss = keras.losses.poisson
    elif loss == "mean_absolute_error":
        loss = keras.losses.mean_absolute_error
    elif loss == "mean_squared_logarithmic_error":
        loss = keras.losses.mean_squared_logarithmic_error
    elif loss == "binary_crossentropy":
        loss = keras.losses.binary_crossentropy
    elif loss == "hinge":
        loss = keras.losses.hinge

    # Map the optimizer name to a configured Keras optimizer (Adam is the default)
    opt = keras.optimizers.Adam(lr=lr, decay=decay, beta_1=0.9, beta_2=0.999)
    if optimizer == "Adagrad":
        opt = keras.optimizers.Adagrad(lr=lr, epsilon=None, decay=decay)
    elif optimizer == "sgd":
        opt = keras.optimizers.SGD(lr=lr, momentum=momentum, decay=decay, nesterov=False)
    elif optimizer == "RMSprop":
        opt = keras.optimizers.RMSprop(lr=lr, rho=0.9, epsilon=None, decay=0.0)
    elif optimizer == "Adamax":
        opt = keras.optimizers.Adamax(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0)

    # Build a small fully connected regression model
    model = Sequential()
    model.add(Dense(units=64, input_dim=30, activation=activation))
    model.add(Dropout(dropout))  # use the dropout rate that is being searched over
    model.add(Dense(units=32, activation=activation))
    model.add(Dense(units=8, activation=activation))
    model.add(Dense(units=1))
    model.compile(loss=loss, optimizer=opt)
    return model
params = {'lr': (0.0001, 0.01, 0.0009, 0.001, 0.002),
          'epochs': [50, 100, 25],
          'dropout': (0, 0.2, 0.4, 0.8),
          'optimizer': ['Adam', 'Adagrad', 'sgd', 'RMSprop', 'Adamax'],
          'loss': ["mean_squared_error", "hinge", "mean_absolute_error", "mean_squared_logarithmic_error", "poisson"],
          'activation': ["relu", "selu", "linear", "sigmoid"],
          'clipnorm': (0.0, 0.5, 1),
          'decay': (1e-6, 1e-4, 1e-8),
          'momentum': (0.9, 0.5, 0.2),
          'l1': (0.01, 0.001, 0.0001),
          'l2': (0.01, 0.001, 0.0001),
          }
# model class to use in the scikit random search CV
model = KerasRegressor(build_fn=Randomized_Model, epochs=30, batch_size=3, verbose=1)
RandomizedSearchfit = RandomizedSearchCV(estimator=model, cv=KFold(3), param_distributions=params, verbose=1, n_iter=10, n_jobs=1)
# Fit the randomized search on your own data (X and Y)
RandomizedSearch_result = RandomizedSearchfit.fit(X, Y)
Now pass your X and Y to this search and it will find the best values for the parameters you listed in the params dictionary. You can also check fine-tuning of a CNN in this notebook (Click Here), in which I am using the Talos library to fine-tune my model.
This is another notebook in which I am using scikit-learn (randomized and grid search) to fine-tune my model (Click Here).
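Once the search has finished, the best combination can be read directly off the fitted search object. A small sketch (assuming the RandomizedSearch_result from the code above):
# Inspect the best hyper-parameter combination found by the random search
print("Best parameters:", RandomizedSearch_result.best_params_)
print("Best CV score:", RandomizedSearch_result.best_score_)

# With the default refit=True, scikit-learn also refits a model on the full data
# using the best parameters; it is available as best_estimator_
best_model = RandomizedSearch_result.best_estimator_
predictions = best_model.predict(X)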
Fine-tuning usually refers to the last step of a more complex NN training pipeline, when you only slightly modify a pre-trained network, usually to improve performance on a specific domain or to re-use a good input representation in a different task.
It is often mentioned in the context of transfer learning. E.g., for image recognition, it may mean that you take a network that was trained to recognize 1k classes from ImageNet, and only "fine-tune" the last layer on your task-specific (smaller and presumably simpler) dataset.

Is it normal that the size of a trained model is so big (40GB) in Matlab? [duplicate]

The MATLAB Classification Learner App creates an SVM, which takes 4 MB of memory.
Why so much? As far as I know, the SVM only has to learn a few coefficients of the hyperplane.
Classification Learner App has two export options - Export Model, and Export Compact Model.
If you choose Export Model you'll get an object of class ClassificationSVM - this contains not only the model itself, but also the data used for training, which are needed if you later want to create various diagnostic plots or calculate performance measures. This may explain the size you're seeing.
If you choose Export Compact Model, you'll get an object of class CompactClassificationSVM, which contains just the model itself. This can only be used for prediction, and not the diagnostic plots and measures described earlier. As it's an object, it will still take up more memory than just a simple array of coefficients - but it should be quite a bit smaller than the ClassificationSVM, as it doesn't store the training data.

How to use keras for binary classification?

I need a simple example of how to use a Keras model. It is not clear to me what the difference between model.evaluate and model.predict is.
I want to create a model for binary classification. Let's say I have images of cats and dogs, train a model, and can use it to predict which animal is in a given photo. Maybe there is a good intro or tutorial for this. I read everything on the first five pages of Google results, but found only complex tutorials and discussions.
To make things short:
model.evaluate evaluates a pair (X,Y) and returns the loss (and all other metrics configured for the model). This is for testing your model on a validation or test set.
model.predict predicts the outcome given an input X. This is for predicting the class from an input image, for example.
This, among other things, is also clearly documented in the linked documentation.
You can find a lot of example models for Keras in the git repository (keras/examples) or on the Keras website (here and here).
For binary classification you could use this model for example:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import Adam

model = Sequential()
# the first layer also needs an input shape matching your data, e.g. input_dim=...
model.add(Dense(300, init='uniform'))
model.add(Activation('relu'))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))
# Y must be one-hot encoded (two columns) when using categorical_crossentropy
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.02))
model.fit(X, Y)
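Once the model is fitted, the difference between the two methods looks roughly like this (X_test and Y_test stand in for your own held-out data):
# evaluate: returns the loss (and any configured metrics) on labelled data
test_loss = model.evaluate(X_test, Y_test, batch_size=32)

# predict: returns the raw model outputs (here, class probabilities) for data without labels
probabilities = model.predict(X_test)
predicted_classes = probabilities.argmax(axis=-1)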

fine-tuning with VGG on caffe

I'm replicating the steps in
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
I want to change the network to VGG model which is obtained at
http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
Does it suffice to simply substitute the model parameter as follows?
./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights VGG_ILSVRC_16_layers.caffemodel -gpu 0
Or do I need to adjust learning rates, iterations, i.e. does it come with separate prototxt files?
There needs to be a 1-1 correspondence between the weights of the network you want to train and the weights you use for initializing/fine-tuning. The architectures of the old and new models have to match.
VGG-16 has a different architecture than the model described by models/finetune_flickr_style/train_val.prototxt (FlickrStyleCaffeNet). This is the network that the solver will try to optimize. Even if it doesn't crash, the weights you've loaded don't have any meaning in the new network.
The VGG-16 network is described in the deploy.prototxt file on this page in Caffe's Model Zoo.

How to predict labels for new data (test set) by the PartitionedEnsemble model in Matlab?

I trained an ensemble model (RUSBoost) for a binary classification problem with the function fitensemble() in Matlab 2014a. The training is performed with 10-fold cross-validation through the input parameter "kfold" of fitensemble().
However, the output model trained by this function cannot be used to predict the labels of new data when I use predict(model, Xtest). I checked the Matlab documentation, which says we can use the kfoldPredict() function to evaluate the trained model, but I did not find any way to pass new data to that function. Also, I found that the structure of the model trained with cross-validation is different from that of a model trained without cross-validation. So, could anyone please advise me how to use a model that was trained with cross-validation to predict labels for new data? Thanks!
kfoldPredict() needs a RegressionPartitionedModel or ClassificationPartitionedEnsemble object as input. This already contains the models and data for kfold cross validation.
The RegressionPartitionedModel object has a field Trained, in which the trained learners that are used for cross validation are stored.
You can take any of these learners and use it like predict(learner, Xdata).
Edit:
If k is too large, it is possible that there is too little meaningful data in one or more iterations, so the model for that iteration is less accurate.
There are no general rules for k, but k=10 like in the MATLAB default is a good starting point to play around with it.
Maybe this is also interesting for you: https://stats.stackexchange.com/questions/27730/choice-of-k-in-k-fold-cross-validation