Float Multi-label Regression in Caffe - loss results - neural-network

I have trained a NN for a regression problem. My data type is HDF5_DATA, made of .jpg images (3x256x256) and a float-label array (3 labels). Dataset creation script:
import h5py, os
import caffe
import numpy as np

SIZE = 256  # image size
with open('/home/path/trainingTintText.txt', 'r') as T:
    lines = T.readlines()

X = np.zeros((len(lines), 3, SIZE, SIZE), dtype='f4')
labels = np.zeros((len(lines), 3), dtype='f4')
for i, l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image(sp[0])
    img = caffe.io.resize(img, (SIZE, SIZE, 3))
    transposed_img = img.transpose((2, 0, 1))[::-1, :, :]  # HWC -> CHW, RGB -> BGR
    X[i] = transposed_img * 255
    print(X[i])
    labels[i, 0] = float(sp[1])
    labels[i, 1] = float(sp[2])
    labels[i, 2] = float(sp[3])

with h5py.File('/home/path/train.h5', 'w') as H:
    H.create_dataset('data', data=X)
    H.create_dataset('label', data=labels)
with open('/home/path/train_h5_list.txt', 'w') as L:
    L.write('/home/path/train.h5')
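As a quick sanity check (this snippet is not part of the original script), the HDF5 file can be read back to confirm the shapes and value ranges:

import h5py

# read the dataset back and verify shapes/ranges (assumes the path above)
with h5py.File('/home/path/train.h5', 'r') as H:
    data, label = H['data'][:], H['label'][:]
print(data.shape, data.min(), data.max())  # expect (N, 3, 256, 256), values ~0..255
print(label.shape)                         # expect (N, 3)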
This is the (partial) architecture:
name: "NN"
layers {
name: "NNd"
top: "data"
top: "label"
type: HDF5_DATA
hdf5_data_param {
source: "/home/path/train_h5_list.txt"
batch_size: 64
}
include: { phase: TRAIN }
}
layers {
name: "data"
type: HDF5_DATA
top: "data"
top: "label"
hdf5_data_param {
source: "/home/path/train_h5_list.txt"
batch_size: 100
}
include: { phase: TEST }
}
layers {
name: "conv1"
type: CONVOLUTION
bottom: "data"
top: "conv1"
convolution_param {
num_output: 32
kernel_size: 11
stride: 2
bias_filler {
type: "constant"
value: 0.1
}
}
}
layers {
name: "ip2"
type: INNER_PRODUCT
bottom: "ip1"
top: "ip2"
inner_product_param {
num_output: 3
bias_filler {
type: "constant"
value: 0.1
}
}
}
layers {
name: "relu22"
type: RELU
bottom: "ip2"
top: "ip2"
}
layers {
name: "loss"
type: EUCLIDEAN_LOSS
bottom: "ip2"
bottom: "label"
top: "loss"
}
When I train the NN I get very high loss values:
I1117 08:15:57.707001 2767 solver.cpp:337] Iteration 0, Testing net (#0)
I1117 08:15:57.707033 2767 net.cpp:684] Ignoring source layer fkp
I1117 08:15:59.111842 2767 solver.cpp:404] Test net output #0: loss = 256.672 (* 1 = 256.672 loss)
I1117 08:15:59.275205 2767 solver.cpp:228] Iteration 0, loss = 278.909
I1117 08:15:59.275255 2767 solver.cpp:244] Train net output #0: loss = 278.909 (* 1 = 278.909 loss)
I1117 08:15:59.275276 2767 sgd_solver.cpp:106] Iteration 0, lr = 0.01
I1117 08:16:57.115145 2767 solver.cpp:337] Iteration 100, Testing net (#0)
I1117 08:16:57.115486 2767 net.cpp:684] Ignoring source layer fkp
I1117 08:16:58.884704 2767 solver.cpp:404] Test net output #0: loss = 238.257 (* 1 = 238.257 loss)
I1117 08:16:59.026926 2767 solver.cpp:228] Iteration 100, loss = 191.836
I1117 08:16:59.026971 2767 solver.cpp:244] Train net output #0: loss = 191.836 (* 1 = 191.836 loss)
I1117 08:16:59.026993 2767 sgd_solver.cpp:106] Iteration 100, lr = 0.01
I1117 08:17:56.890614 2767 solver.cpp:337] Iteration 200, Testing net (#0)
I1117 08:17:56.890880 2767 net.cpp:684] Ignoring source layer fkp
I1117 08:17:58.665057 2767 solver.cpp:404] Test net output #0: loss = 208.236 (* 1 = 208.236 loss)
I1117 08:17:58.809150 2767 solver.cpp:228] Iteration 200, loss = 136.422
I1117 08:17:58.809248 2767 solver.cpp:244] Train net output #0: loss = 136.422 (* 1 = 136.422 loss)
When I divide the images and the label arrays by 255, I get very low loss values (near 0). What is the reason for those loss results? Am I doing something wrong? Thanks.

With the Euclidean loss, this is only to be expected. The Euclidean loss should be smaller by a factor of 256 if you divide all of the labels by 256 and re-train. That doesn't mean dividing the labels by 256 makes the network any better at predicting the labels; you've just changed the "scale" (the "units").
In particular, the Euclidean loss is (roughly) L = sqrt((x1 - y1)^2 + (x2 - y2)^2), where x is the correct answer and y is the output from the neural network. Suppose you divide every x by 256, then re-train. The neural network will learn to divide its output y by 256 as well. How will this affect the Euclidean loss L? Working through the math: sqrt((x1/256 - y1/256)^2 + (x2/256 - y2/256)^2) = sqrt(((x1 - y1)^2 + (x2 - y2)^2) / 256^2) = L / 256, so L shrinks by a factor of 256.
It'd be like the difference between trying to predict a distance in feet vs. a distance in yards. The latter would involve dividing by 3. Conceptually, the overall accuracy of the network would remain the same, but the Euclidean loss would be divided by a factor of three, because you've changed the units from feet to yards. An average error of 0.1 feet would correspond to an average error of 0.0333 yards; that is conceptually the "same" accuracy, even though 0.0333 looks like a smaller number than 0.1.
Dividing the images by 256 should be irrelevant. It's dividing the labels by 256 that caused the reduction in the loss function.
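A quick numeric check of the scaling argument, using the sqrt form of the loss given above (the numbers are illustrative, not from the original post):

import numpy as np

def euclidean_loss(x, y):
    # sqrt form of the Euclidean loss used in the explanation above
    return np.sqrt(np.sum((x - y) ** 2))

x = np.array([200.0, 50.0, 120.0])   # "correct" labels
y = np.array([190.0, 60.0, 100.0])   # network outputs

L = euclidean_loss(x, y)
L_scaled = euclidean_loss(x / 256, y / 256)
print(L, L_scaled, L / L_scaled)     # the ratio is exactly 256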


Very long training time on a simple CNN

import torch
import torch.nn as nn
import torch.nn.functional as f
import torch.optim as optim

# building a convolutional neural network for images

# hyperparams
learning_rate = 5e-3
in_channel = 3
num_classes = 57
batch_size = 24
class CNNet(nn.Module):
    def __init__(self, num_classes):
        super(CNNet, self).__init__()
        # using 3 in_channels for RGB values
        self.conv1 = nn.Conv2d(in_channels=3,
                               out_channels=15,
                               kernel_size=(3, 3),
                               stride=(1, 1),
                               padding=(1, 1),
                               bias=True)
        # in_channels on the second layer must match out_channels on the first
        self.conv2 = nn.Conv2d(in_channels=15,
                               out_channels=30,
                               kernel_size=(3, 3),
                               stride=(1, 1),
                               padding=(1, 1),
                               bias=True)
        # max pooling reduces the image size to 13 x 13
        self.pool = nn.MaxPool2d(2, 2)
        # last layer is fully connected: out_channels * (original image dim / 2)
        self.fc = nn.Linear(30 * 13 * 13 * 3, num_classes)

    def forward(self, x):
        x = f.relu(self.conv1(x))
        x = f.relu(self.conv2(x))
        x = self.pool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)
        return x

# instantiate the class object of CNNet
cnn = CNNet(57)
cnn_optimizer = optim.Adam(cnn.parameters(), lr=learning_rate)

# training the CNN
# (criterion and train_loader are defined elsewhere)
for epoch in range(1):
    for i, (data, target) in enumerate(train_loader):
        cnn_optimizer.zero_grad()
        # forward
        outputs = cnn(data)
        loss = criterion(outputs, target)
        # backward propagation
        cnn_optimizer.zero_grad()
        loss.backward()
        cnn_optimizer.step()
I'm having an issue with training a convolutional neural network using custom data.
I have about 10,000 color images, all of size 26x26, that I'm trying to use to train the model to predict 57 different classes.
When I run training on the custom images it takes hours to complete; it's currently at 12 hours and hasn't finished.
I'm assuming I have something wrong in the training process that's causing such a long run time, but I'm not sure where.

ValueError: Expected input batch_size (24) to match target batch_size (8)

I found many links about this and read different Stack Overflow answers related to it, but I was not able to figure it out.
My input size is torch.Size([8, 3, 16, 16]).
My architecture is as below:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # linear layers (16*16 -> 768 -> 64 -> 10)
        self.fc1 = nn.Linear(16 * 16, 768)
        self.fc2 = nn.Linear(768, 64)
        self.fc3 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(p=.5)

    def forward(self, x):
        # flatten image input
        x = x.view(-1, 16 * 16)
        # hidden layers with relu activation and dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = F.log_softmax(self.fc3(x), dim=1)
        return x

# instantiate the model
model = Net()

# specify loss function
criterion = nn.NLLLoss()

# specify optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=.003)

# number of epochs to train the model
n_epochs = 30  # suggest training between 20-50 epochs

model.train()  # prep model for training
for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0

    ###################
    # train the model #
    ###################
    for data, target in trainloader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item() * data.size(0)

    # print training statistics
    # calculate average loss over an epoch
    train_loss = train_loss / len(trainloader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch + 1, train_loss))
I am getting the value error:
ValueError: Expected input batch_size (24) to match target batch_size (8).
How do I fix it? My batch size is 8, the input image size is 16x16, and this is a 10-class classification problem.
Your input images have 3 channels, therefore your input feature size is 16*16*3, not 16*16. Currently, you treat each channel as a separate instance: after the x.view(-1, 16*16) flattening, the classifier input is (24, 16*16). Clearly, the batch size doesn't match: it is supposed to be 8, not 8*3 = 24.
You could either:
- switch to a CNN to handle multi-channel inputs (here 3 channels);
- use a self.fc1 with 16*16*3 input features (as sketched below); or
- if the input is RGB, convert it to a 1-channel grayscale map.
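A minimal sketch of the second option (keeping the rest of the model unchanged; only fc1 and the flattening change):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # account for all 3 channels: 16*16*3 = 768 input features
        self.fc1 = nn.Linear(16 * 16 * 3, 768)
        self.fc2 = nn.Linear(768, 64)
        self.fc3 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(p=.5)

    def forward(self, x):
        # flatten per sample, keeping the batch dimension intact
        x = x.view(x.size(0), -1)   # (8, 3, 16, 16) -> (8, 768)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        return F.log_softmax(self.fc3(x), dim=1)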

Sigmoid function output

I have the following neural network model.
nn_classifier = Sequential()
nn_classifier.add(Dense(output_dim=16, activation='relu', input_dim=13))
nn_classifier.add(Dense(output_dim=16, activation='relu'))
nn_classifier.add(Dense(output_dim=1, activation='sigmoid'))
nn_classifier.compile(optimizer='sgd', loss='binary_crossentropy',
                      metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.5)])
model = nn_classifier.fit(X_train, Y_train, validation_split=0.33, batch_size=10, nb_epoch=100)
Y_pred = nn_classifier.predict(X_test)
As I have used the sigmoid function in my output layer, I was expecting the predicted values (Y_pred) to be either 0 or 1, but I get decimal values. Is my understanding wrong?
Sigmoid always gives a value in [0, 1]; you need to round the value. That is, fix a threshold: if the output is higher than the threshold, predict 1, else 0.
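For example (a minimal sketch, assuming Y_pred is the array returned by predict above and using the same 0.5 threshold as the compiled metric):

import numpy as np

# turn sigmoid probabilities into hard 0/1 class labels with a 0.5 threshold
Y_pred_classes = (np.asarray(Y_pred) > 0.5).astype(int)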

3D Input to Feed-Forward Neural Network

I have a 3D input dataset. The dimensions are (24, 80, 42): 80 timesteps or samples, where each timestep has 24 entities and each entity is attributed 42 features. How do I give this as input to an ordinary feed-forward neural network? I have already got results with an LSTM.
I don't know how to reshape the data to give as input, and this is the error I'm getting:
ValueError: Error when checking input: expected dense_3_input to have 3 dimensions, but got array with shape (1920, 42)
input_shape = (80, 24, 42)

network = models.Sequential()
# add fully connected layer with a ReLU activation function
network.add(layers.Dense(units=42, activation='relu', input_shape=input_shape))
# add fully connected layer with a ReLU activation function
network.add(layers.Dense(units=42, activation='relu'))
# add fully connected layer with no activation function
network.add(layers.Dense(units=24))
network.summary()
Is this correct?
Layer (type)          Output Shape             Param #
dense_25 (Dense)      (None, 80, 24, 42)       1806
dense_26 (Dense)      (None, 80, 24, 42)       1806
dense_27 (Dense)      (None, 80, 24, 24)       1032
Total params: 4,644
Trainable params: 4,644
Non-trainable params: 0

How does Caffe determine the number of neurons in each layer?

Recently, I've been trying to use Caffe for some of my deep learning work. Although writing a model in Caffe is very easy, I haven't been able to find the answer to this question: how does Caffe determine the number of neurons in a hidden layer? I do know that determining the number of neurons in a layer, and the number of hidden layers itself, are problems that cannot be solved analytically, so 'rules of thumb' are imperative in this regard. But is there a way to define or know the number of neurons in each layer in Caffe? And by default, how does Caffe inherently determine this?
Any help is much appreciated!
Caffe doesn't determine the number of neurons; the user does.
This is pulled straight from Caffe's website: http://caffe.berkeleyvision.org/tutorial/layers.html
For example, this is a convolution layer of 96 nodes (or neurons):
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate and decay multipliers for the filters
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96    # learn 96 filters
    kernel_size: 11   # each filter is 11x11
    stride: 4         # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian"  # initialize the filters from a Gaussian
      std: 0.01         # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant"  # initialize the biases to zero (0)
      value: 0
    }
  }
}
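The num_output field plays the same role for other layer types too. As a minimal sketch using pycaffe's NetSpec (the layer names and input shape here are illustrative, not from the tutorial page), a fully connected layer with 500 neurons would be declared like this:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
# a dummy input blob (batch of 64 RGB 256x256 images)
n.data = L.Input(input_param=dict(shape=dict(dim=[64, 3, 256, 256])))
# the user chooses 500 neurons explicitly via num_output
n.fc1 = L.InnerProduct(n.data, num_output=500)
print(n.to_proto())   # emits the equivalent prototxt definition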