I have a LeNet-300-100 dense neural network for MNIST dataset where I want to freeze the first two layers having 300 and 100 hidden neurons in the first two hidden layers. I just want to train the output layer. The code I have to do this is as follows:
from tensorflow import keras
inner_model = keras.Sequential(
[
keras.Input(shape=(1024,)),
keras.layers.Dense(300, activation="relu", kernel_initializer = tf.initializers.GlorotNormal()),
keras.layers.Dense(100, activation="relu", kernel_initializer = tf.initializers.GlorotNormal()),
]
)
model_mnist = keras.Sequential(
[keras.Input(shape=(1024,)), inner_model, keras.layers.Dense(10, activation="softmax"),]
)
# model_mnist.trainable = True # Freeze the outer model
# Freeze the inner model-
inner_model.trainable = False
# Sanity check-
inner_model.trainable, model_mnist.trainable
# (False, True)
# Compile NN-
model_mnist.compile(
loss=tf.keras.losses.categorical_crossentropy,
# optimizer='adam',
optimizer=tf.keras.optimizers.Adam(lr = 0.0012),
metrics=['accuracy'])
However, this code doesn't seem to be freezing the first two hidden layers and they are also learning. What am I doing wrong?
Thanks!
Solution: Use 'trainable' parameter while defining neural network model to freeze the desired layers of the model as follows-
model = Sequential()
model.add(Dense(units = 300, activation="relu", kernel_initializer = tf.initializers.GlorotNormal(), trainable = False))
model.add(Dense(units = 100, activation = "relu", kernel_initializer = tf.initializer.GlorotNormal(), trainable = False))
model.add(Dense(units = 10, activation = "softmax"))
# Compile model as usual
Related
I'm building a convolutional autoencoder, but want the encoding to be in a linear form so I can more easily feed it as input into an MLP. I have two convolutional layers on the encoder along with a linear inner layer to reduce dimension. This encoding is then fed into the corresponding decoder.
When I flatten the output of the second convolutional layer, based on my calculation (using the standard formula: Calculate the Output size in Convolution layer) should come out to a 1x100352 rank 1 tensor. However, when I set the input dimension of the linear layer to be 100352, the flattened rank 1 tensor has dimension 1x50176. Then comes the weird part.
I tried changing the input dimension of the linear layer to be 50176, assuming I had miscalculated. When I do this, the reshaped rank 1 tensor confusingly becomes 1x100352, and then the aforementioned weight matrix becomes 50176x256 as expected.
This response to modifying the linear layer's input dimension doesn't make sense to me. That hyperparameter controls the weight matrix correctly, but I guess I'm uncertain why it has any bearing on the linear layer's input since that's just a reshaped tensor output from a convolutional layer whose hyperparameters are unrelated to the hyperparameter in question.
I apologize if I'm just missing something obvious. I'm very new to pytorch, and I couldn't find any other posts which discussed this sort of issue.
Here's what I believe to be the minimal reproducible example:
import os
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.utils import save_image
class convAutoEncoder(nn.Module):
def __init__(self,**kwargs):
super().__init__()
#Creating network structure
#Encoder portion of autoencoder
self.enc1 = nn.Conv2d(in_channels = kwargs["inputChannels"], out_channels = kwargs["channelsEncoderMid"], kernel_size = kwargs["kernelSize"])
self.enc2 = nn.Conv2d(in_channels = kwargs["channelsEncoderMid"], out_channels = kwargs["channelsEncoderInner"], kernel_size = kwargs["kernelSize"])
self.enc3 = nn.Linear(in_features = kwargs["intoLinear"], out_features = kwargs["linearEncoded"])
#Decoder portion of autoencoder
self.dec1 = nn.Linear(in_features = kwargs["linearEncoded"], out_features = kwargs["intoLinear"])
self.dec2 = nn.ConvTranspose2d(in_channels = kwargs["channelsEncoderInner"], out_channels = kwargs["channelsDecoderMid"], kernel_size = kwargs["kernelSize"])
self.dec3 = nn.ConvTranspose2d(in_channels = kwargs["channelsDecoderMid"], out_channels = kwargs["inputChannels"], kernel_size = kwargs["kernelSize"])
def forward(self,x):
#Encoding
x = F.relu(self.enc1(x))
x = F.relu(self.enc2(x))
x = x.reshape(1,-1)
x = x.squeeze()
x = F.relu(self.enc3(x))
#Decoding
x = F.relu(self.dec1(x))
x = x.reshape([32,4,28,28])
x = F.relu(self.dec2(x))
x = F.relu(self.dec3(x))
return x
def encodeDecodeConv(numEpochs = 20, input_Channels = 3, batchSize = 32,
channels_Encoder_Inner = 4, channels_Encoder_Mid = 8, into_Linear = 100352,
linear_Encoded = 256, channels_Decoder_Mid = 8, kernel_Size = 3,
learningRate = 1e-3):
#Pick a device. If GPU available, use that. Otherwise, use CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#Define data transforms
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
#Define training dataset
trainSet = datasets.CIFAR10(root = './data', train = True, download = True, transform = transform)
#Define testing dataset
testSet = datasets.CIFAR10(root = './data', train = False, download = True, transform = transform)
#Define data loaders
trainLoader = DataLoader(trainSet, batch_size = batchSize, shuffle = True)
testLoader = DataLoader(testSet, batch_size = batchSize, shuffle = True)
#Initialize neural network
model = convAutoEncoder(inputChannels = input_Channels, channelsEncoderMid = channels_Encoder_Mid, channelsEncoderInner = channels_Encoder_Inner, intoLinear = into_Linear, linearEncoded = linear_Encoded, channelsDecoderMid = channels_Decoder_Mid, kernelSize = kernel_Size)
#Optimization setup
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(),lr = learningRate)
lossTracker = []
for epoch in range(numEpochs):
loss = 0
for data,_ in trainLoader:
data = data.to(device)
optimizer.zero_grad()
outputs = model(data)
train_loss = criterion(outputs,data)
train_loss.backward()
optimizer.step()
loss += train_loss.item()
loss = loss/len(trainLoader)
print('Epoch {} of {}, Train loss: {:.3f}'.format(epoch+1,numEpochs,loss))
encodeDecodeConv()
Edit2: Somewhere in the CIFAR10 dataset, the data appears to change dimension. After playing around with print statements more, I discovered that setting the relevant hyperparameter to 100352 works great for many entries, but then seemingly one image pops up that has a different size. Not sure why that would occur, though.
I'm trying to implement a very simple one layered MLP for a toy regression problem with one variable (dimension = 1) and one target (dimension = 1). It's a simple curve fitting problem with zero noise.
Matlab\Deep Learning Toolbox
Using levenberg-marquardt backpropagation on a MLP with a single hidden layer with 100 neurons and hyperbolic tangent activation I got pretty decent performance with almost zero effort:
MSE = 7.18e-08
Plotting the predictions and the targets I get a very precise fitting.
Python\Tensorflow\Keras
With the same network settings I used in matlab there's almost no training. No matter how hard I try to tune the training parameters or switch the optimizer.
MSE = 0.12900154
In this case the plot of the predictions is a curve that is not even able to follow the oscillations of the target curve.
I can obtain something better using RELU activations for the hidden layer but we're still far:
MSE = 0.0582045
This is the code I used in Python:
# IMPORT LIBRARIES
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
# IMPORT DATASET FROM CSV FILE, SHUFFLE TRAINING SET
# AND MAKE NUMPY ARRAY FOR TRAINING (DATA ARE ALREADY NORMALIZED)
dataset_path = "C:/Users/Rob/Desktop/Learning1.csv"
Learning_Dataset = pd.read_csv(dataset_path
, comment='\t',sep=","
,skipinitialspace=False)
Learning_Dataset = Learning_Dataset.sample(frac = 1) # SHUFFLING
test_dataset_path = "C:/Users/Rob/Desktop/Test1.csv"
Test_Dataset = pd.read_csv(test_dataset_path
, comment='\t',sep=","
,skipinitialspace=False)
Learning_Target = Learning_Dataset.pop('Target')
Test_Target = Test_Dataset.pop('Target')
Learning_Dataset = np.array(Learning_Dataset,dtype = "float32")
Test_Dataset = np.array(Test_Dataset,dtype = "float32")
Learning_Target = np.array(Learning_Target,dtype = "float32")
Test_Target = np.array(Test_Target,dtype = "float32")
# DEFINE SIMPLE MLP MODEL
inputs = tf.keras.layers.Input(shape=(1,))
x = tf.keras.layers.Dense(100, activation='relu')(inputs)
y = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=inputs, outputs=y)
# TRAIN MODEL
opt = tf.keras.optimizers.RMSprop(learning_rate = 0.001,
rho = 0.9,
momentum = 0.0,
epsilon = 1e-07,
centered = False)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)
model.compile(optimizer = opt,
loss = 'mse',
metrics = ['mse'])
model.fit(Learning_Dataset,
Learning_Target,
epochs=500,
validation_split = 0.2,
verbose=0,
callbacks=[early_stop],
shuffle = False,
batch_size = 100)
# INFERENCE AND CHECK ACCURACY
Predictions = model.predict(Test_Dataset)
Predictions = Predictions.reshape(10000)
print(np.square(np.subtract(Test_Target,Predictions)).mean()) # MSE
plt.plot(Test_Dataset,Test_Target,'o',Test_Dataset,Predictions,'o')
plt.legend(('Target','Model Prediction'))
plt.show()
What am i doing wrong?
Thanks
My sequential dense DNN seems to run through each parameter in my parameter grid three times while doing Grid Search. I expect it to run once per specified epcohs in the grid: 10, 50 and 100. Why does this happen?
model architecture:
def build_model():
print('building DNN architecture')
model = Sequential()
model.add(Dropout(0.02, input_shape = (150,)))
model.add(Dense(8, init = 'normal', activation = 'relu'))
model.add(Dropout(0.02))
model.add(Dense(16, init = 'normal', activation = 'relu'))
model.add(Dense(1, init = 'normal'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
print('model succesfully compiled')
return model
Grid search on epochs:
from sklearn.model_selection import GridSearchCV
epochs = [10,50,100]
param_grid = dict(epochs = epochs)
grid = GridSearchCV(estimator = KerasRegressor(build_fn = build_model), param_grid = param_grid)
grid_result = grid.fit(x_train, y_train)
grid_result.best_params_
Because GridSearchCV does both grid search and cross-validation. For each parameter combination, three (by default) splits are used for cross-validation, and this is why you see the model being trained three times for each parameter set.
You can change the number of folds (splits) with the "cv" parameter. Check it out in the documentation.
I have the following model:
sharedLSTM1 = LSTM((data.shape[1]), return_sequences=True)
sharedLSTM2 = LSTM(data.shape[1])
def createModel(dropoutRate=0.0, numNeurons=40, optimizer='adam'):
inputLayer = Input(shape=(timesteps, data.shape[1]))
sharedLSTM1Instance = sharedLSTM1(inputLayer)
sharedLSTM2Instance = sharedLSTM2(sharedLSTM1Instance)
dropoutLayer = Dropout(dropoutRate)(sharedLSTM2Instance)
denseLayer1 = Dense(numNeurons)(dropoutLayer)
denseLayer2 = Dense(numNeurons)(denseLayer1)
outputLayer = Dense(1, activation='sigmoid')(denseLayer2)
return (inputLayer, outputLayer)
inputLayer1, outputLayer1 = createModel()
inputLayer2, outputLayer2 = createModel()
model = Model(inputs=[inputLayer1, inputLayer2], outputs=[outputLayer1, outputLayer2])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
What will be the behaviour of model.fit([data1, data2], [labels1, labels2]) in this model. Will it alternatively train the two NNs for each epoch? Or will it completely train one network, and then train the other? Or maybe some other way?
It will train the only existing network at once.
You don't have two models, you have one model only. This model will be trained.
Data1 and Data2 will be fed simultaneously.
The loss function will be applied to both outputs, and both will backpropagate.
I want to convert a pre-trained CNN (like VGG-16) to a fully convolutional network in Pytorch. How can I do so?
You can do that as follows (see comments for description):
import torch
import torch.nn as nn
from torchvision import models
# 1. LOAD PRE-TRAINED VGG16
model = models.vgg16(pretrained=True)
# 2. GET CONV LAYERS
features = model.features
# 3. GET FULLY CONNECTED LAYERS
fcLayers = nn.Sequential(
# stop at last layer
*list(model.classifier.children())[:-1]
)
# 4. CONVERT FULLY CONNECTED LAYERS TO CONVOLUTIONAL LAYERS
### convert first fc layer to conv layer with 512x7x7 kernel
fc = fcLayers[0].state_dict()
in_ch = 512
out_ch = fc["weight"].size(0)
firstConv = nn.Conv2d(in_ch, out_ch, 7, 7)
### get the weights from the fc layer
firstConv.load_state_dict({"weight":fc["weight"].view(out_ch, in_ch, 7, 7),
"bias":fc["bias"]})
# CREATE A LIST OF CONVS
convList = [firstConv]
# Similarly convert the remaining linear layers to conv layers
for layer in enumerate(fcLayers[1:]):
if isinstance(module, nn.Linear):
# Convert the nn.Linear to nn.Conv
fc = module.state_dict()
in_ch = fc["weight"].size(1)
out_ch = fc["weight"].size(0)
conv = nn.Conv2d(in_ch, out_ch, 1, 1)
conv.load_state_dict({"weight":fc["weight"].view(out_ch, in_ch, 1, 1),
"bias":fc["bias"]})
convList += [conv]
else:
# Append other layers such as ReLU and Dropout
convList += [layer]
# Set the conv layers as a nn.Sequential module
convLayers = nn.Sequential(*convList)