NaN Loss and Quantity of Data - neural-network

everyone.
While I was training a neural network with a training dataset of 40000 objects, I was having problems related to the loss function being equal to Nan at each epoch. After sampling the dataset, using 50% of it, this problem wasn't occurring any more. I was wondering how the size of the training data would have an impact in this setting. I used the following function to do the training:
def train_test_net_notLinearCDF(X,y,coefficient,test_input):
# Neural network
model = Sequential()
model.add(Dense(80, activation="relu", input_dim=X.shape[1]))
model.add(Dense(20, activation="tanh"))
model.add(Dense(1, activation="linear"))
opt_adam = Adam(clipvalue=0.5)
model.compile(loss='mean_squared_error', optimizer=opt_adam)
history = model.fit(X, y, epochs=100, validation_split = 0.2,batch_size=32)
fig1 = plt.gcf()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.suptitle('MSE de treino e Validação ' + coefficient)
plt.ylabel('MSE')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()
fig1.savefig('losses_varying_alpha_'+coefficient+'.png', dpi=300)
y_pred = model.predict(test_input)
return y_pred
Thanks in advance.

Related

Training a NN on top of cached embeddings from a pre-trained model, loss not going down?

I have some embeddings, which are the output of a pre-trained model, saved to disk. I am trying to perform a binary classification task of accept/reject. I have trained a simple neural network to perform classification, however, I am not seeing any decrease in the loss after some time.
Here is my NN, the cached embeddings are of shape 512:
from transformers.modeling_outputs import SequenceClassifierOutput
class ClassNet(nn.Module):
def __init__(self, num_labels=2):
super(ClassNet, self).__init__()
self.num_labels = num_labels
self.classifier = nn.Sequential(
nn.Linear(512, 256, bias=True),
nn.ReLU(inplace=True),
nn.Dropout(p=.5, inplace=False),
nn.Linear(256, 128, bias=True),
nn.ReLU(inplace=True),
nn.Dropout(p=.5, inplace=False),
nn.Linear(128, num_labels, bias=True)
)
def forward(self, inputs):
return self.classifier(inputs)
This is some random architechture that I am trying to over-fit to, but it seems that the network plateau's quickly on the training data. Could it be that my data is too complicated?
here's my training loop:
optimizer = optim.Adam(model.parameters(), lr=1e-4,weight_decay=5e-3) # L2 regularization
loss_fct=nn.CrossEntropyLoss()
model.train()
for epoch in range(10): # loop over the dataset multiple times
running_loss = 0.0
for i, data in enumerate(train_loader):
# get the inputs; data is a list of [inputs, labels]
inputs, labels = data['embeddings'], data['labels']
# zero the parameter gradients
optimizer.zero_grad()
outputs = model(inputs)
logits = outputs.squeeze(1)
loss = loss_fct(logits, labels.squeeze())
loss.backward()
optimizer.step()
running_loss += loss.item()
if i % 2000 == 1999: # print every 2000 mini-batches
print('[%d, %5d] loss: %.3f' %
(epoch + 1, i + 1, running_loss / 2000))
running_loss = 0.0
print('Finished Training')
The loss is stuck at around .4 and doesn't really decrease at all after an epoch.
To give a little context, the pre-trained embeddings are the output of specially trained ViT model from HuggingFace, I am trying to perform a classification task directly on the outputs of that model by building a simple neural network on top of it.
Can anyone advise on what is going wrong? Also if anyone has any suggestions to get a better accuracy, I would love to hear it.

Using SGD on MNIST dataset with Pytorch, loss not decreasing

I tried to use SGD on MNIST dataset with batch size of 32, but the loss does not decrease at all.
I checked my model, loss function and read documentation but couldn't figure out what I've done wrong.
I defined my neural network as below
class classification(nn.Module):
def __init__(self):
super(classification, self).__init__()
# construct layers for a neural network
self.classifier1 = nn.Sequential(
nn.Linear(in_features=28*28, out_features=20*20),
nn.Sigmoid(),
)
self.classifier2 = nn.Sequential(
nn.Linear(in_features=20*20, out_features=10*10),
nn.Sigmoid(),
)
self.classifier3 = nn.Sequential(
nn.Linear(in_features=10*10, out_features=10),
nn.LogSoftmax(dim=1),
)
def forward(self, inputs): # [batchSize, 1, 28, 28]
x = inputs.view(inputs.size(0), -1) # [batchSize, 28*28]
x = self.classifier1(x) # [batchSize, 20*20]
x = self.classifier2(x) # [batchSize, 10*10]
out = self.classifier3(x) # [batchSize, 10]
return out
And I defined my training process as below
classifier = classification().to("cuda")
#optimizer
optimizer = torch.optim.SGD(classifier.parameters(), lr=learning_rate_value)
#loss function
criterion = nn.NLLLoss()
batch_size=32
epoch = 30
#array to save loss history
loss_train_arr=np.zeros(epoch)
#used DataLoader to make split batch
batched_train = torch.utils.data.DataLoader(training_set, batch_size, shuffle=True)
for i in range(epoch):
loss_train=0
#train and compute loss, accuracy
for img, label in batched_train:
img=img.to(device)
label=label.to(device)
optimizer.zero_grad()
predicted = classifier(img)
label_predicted = torch.argmax(predicted,dim=1)
loss = criterion(predicted, label)
loss.backward
optimizer.step()
loss_train += loss.item()
loss_train_arr[i]=loss_train/(len(batched_train.dataset)/batch_size)
I am using a model with LogSoftmax layer, so my loss function seems right. But the loss does not decrease at all.
If the code you posted is the exact code you use, the problem is that you don't actually call backward on the loss (missing parentheses ()).

Correct practice and approach for reporting the training and generalization performance

I am trying to learn the correct procedure for training a neural network for classification. Many tutorials are there but they never explain how to report for the generalization performance. Can somebody please tell me if the following is the correct method or not. I am using first 100 examples from the fisheriris data set that has labels 1,2 and call them as X and Y respectively. Then I split X into trainData and Xtest with a 90/10 split ratio. Using trainData I trained the NN model. Now the NN internally further splits trainData into tr,val,test subsets. My confusion is which one is usually used for generalization purpose when reporting the performance of the model to unseen data in conferences/Journals?
The dataset can be found in the link: https://www.mathworks.com/matlabcentral/fileexchange/71468-simple-neural-networks-with-k-fold-cross-validation-manner
rng('default')
load iris.mat;
X = [f(1:100,:) l(1:100)];
numExamples = size(X,1);
indx = randperm(numExamples);
X = X(indx,:);
Y = X(:,end);
split1 = cvpartition(Y,'Holdout',0.1,'Stratify',true); %90% trainval 10% test
istrainval = training(split1); % index for fitting
istest = test(split1); % indices for quality assessment
trainData = X(istrainval,:);
Xtest = X(istest,:);
Ytest = Y(istest);
numExamplesXtrainval = size(trainData,1);
indxXtrainval = randperm(numExamplesXtrainval);
trainData = trainData(indxXtrainval,:);
Ytrain = trainData(:,end);
hiddenLayerSize = 10;
% data format = rows = number of dim, column = number of examples
net = patternnet(hiddenLayerSize);
net = init(net);
net.performFcn = 'crossentropy';
net.trainFcn = 'trainscg';
net.trainParam.epochs=50;
[net tr]= train(net,trainData', Ytrain');
Trained = sim(net, trainData'); %outputs predicted labels
train_predict = net(trainData');
performanceTrain = perform(net,Ytrain',train_predict)
lbl_train=grp2idx(Ytrain);
Yhat_train = (train_predict >= 0.5);
Lbl_Yhat_Train = grp2idx(Yhat_train);
[cmMatrixTrain]= confusionmat(lbl_train,Lbl_Yhat_Train )
accTrain=sum(lbl_train ==Lbl_Yhat_Train)/size(lbl_train,1);
disp(['Training Set: Total Train Acccuracy by MLP = ',num2str(100*accTrain ), '%'])
[confTest] = confusionmat(lbl_train(tr.testInd),Lbl_Yhat_Train(tr.testInd) )
%unknown test
test_predict = net(Xtest');
performanceTest = perform(net,Ytest',test_predict);
Yhat_test = (test_predict >= 0.5);
test_lbl=grp2idx(Ytest);
Lbl_Yhat_Test = grp2idx(Yhat_test);
[cmMatrix_Test]= confusionmat(test_lbl,Lbl_Yhat_Test )
This is the output.
Problem1: There seems to be no prediction for the other class. Why?
Problem2: Do I need a separate dataset like the one I created as Xtest for reporting generalization error or is it the practice to use the data trainData(tr.testInd,:) as the generalization test set? Did I create an unnecessary subset?
performanceTrain =
2.2204e-16
cmMatrixTrain =
45 0
45 0
Training Set: Total Train Acccuracy by MLP = 50%
confTest =
9 0
5 0
cmMatrix_Test =
5 0
5 0
There are a few issues with the code. Let's deal with them before answering your question. First, you set a threshold of 0.5 for making decisions (Yhat_train = (train_predict >= 0.5);) while all points of your net prediction are above 0.5. This means you only get zeros in your confusion matrices. You can plot the scores to choose a better threshold:
figure;
plot(train_predict(Ytrain == 1),'.b')
hold on
plot(train_predict(Ytrain == 2),'.r')
legend('label 1','label 2')
cvpartition gave me an error. It ran successfully as split1 = cvpartition(Y,'Holdout',0.1); In any case, artificial neural networks usuallly manage partitioning within the training process, so you feed in X and Y and some parameters for how to do it. See here for example: link where you set
net.divideParam.trainRatio = .4;
net.divideParam.valRatio = .3;
net.divideParam.testRatio = .3;
So how to report the results? Only for the test data. The train data will suffer from overfit, and will show false, too good results. If you use validation data (you havn't), then you cannot show results for it because it will also suffer from overfit. If you let the training do validation for you your test results will be safe from overfit.

pytorch linear regression given wrong results

I implemented a simple linear regression and I’m getting some poor results. Just wondering if these results are normal or I’m making some mistake.
I tried different optimizers and learning rates, I always get bad/poor results
Here is my code:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torch.autograd import Variable
class LinearRegressionPytorch(nn.Module):
def __init__(self, input_dim=1, output_dim=1):
super(LinearRegressionPytorch, self).__init__()
self.linear = nn.Linear(input_dim, output_dim)
def forward(self,x):
x = x.view(x.size(0),-1)
y = self.linear(x)
return y
input_dim=1
output_dim = 1
if torch.cuda.is_available():
model = LinearRegressionPytorch(input_dim, output_dim).cuda()
else:
model = LinearRegressionPytorch(input_dim, output_dim)
criterium = nn.MSELoss()
l_rate =0.00001
optimizer = torch.optim.SGD(model.parameters(), lr=l_rate)
#optimizer = torch.optim.Adam(model.parameters(),lr=l_rate)
epochs = 100
#create data
x = np.random.uniform(0,10,size = 100) #np.linspace(0,10,100);
y = 6*x+5
mu = 0
sigma = 5
noise = np.random.normal(mu, sigma, len(y))
y_noise = y+noise
#pass it to pytorch
x_data = torch.from_numpy(x).float()
y_data = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
inputs = Variable(x_data).cuda()
target = Variable(y_data).cuda()
else:
inputs = Variable(x_data)
target = Variable(y_data)
for epoch in range(epochs):
#predict data
pred_y= model(inputs)
#compute loss
loss = criterium(pred_y, target)
#zero grad and optimization
optimizer.zero_grad()
loss.backward()
optimizer.step()
#if epoch % 50 == 0:
# print(f'epoch = {epoch}, loss = {loss.item()}')
#print params
for name, param in model.named_parameters():
if param.requires_grad:
print(name, param.data)
There are the poor results :
linear.weight tensor([[1.7374]], device='cuda:0')
linear.bias tensor([0.1815], device='cuda:0')
The results should be weight = 6 , bias = 5
Problem Solution
Actually your batch_size is problematic. If you have it set as one, your targetneeds the same shape as outputs (which you are, correctly, reshaping with view(-1, 1)).
Your loss should be defined like this:
loss = criterium(pred_y, target.view(-1, 1))
This network is correct
Results
Your results will not be bias=5 (yes, weight will go towards 6 indeed) as you are adding random noise to target (and as it's a single value for all your data points, only bias will be affected).
If you want bias equal to 5 remove addition of noise.
You should increase number of your epochs as well, as your data is quite small and network (linear regression in fact) is not really powerful. 10000 say should be fine and your loss should oscillate around 0 (if you change your noise to something sensible).
Noise
You are creating multiple gaussian distributions with different variations, hence your loss would be higher. Linear regression is unable to fit your data and find sensible bias (as the optimal slope is still approximately 6 for your noise, you may try to increase multiplication of 5 to 1000 and see what weight and bias will be learned).
Style (a little offtopic)
Please read documentation about PyTorch and keep your code up to date (e.g. Variable is deprecated in favor of Tensor and rightfully so).
This part of code:
x_data = torch.from_numpy(x).float()
y_data = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
inputs = Tensor(x_data).cuda()
target = Tensor(y_data).cuda()
else:
inputs = Tensor(x_data)
target = Tensor(y_data)
Could be written succinctly like this (without much thought):
inputs = torch.from_numpy(x).float()
target = torch.from_numpy(y_noise).float()
if torch.cuda.is_available():
inputs = inputs.cuda()
target = target.cuda()
I know deep learning has it's reputation for bad code and fatal practice, but please do not help spreading this approach.

How to perform two group classification with deep neural network? (Matlab)

I'm new in machine learning (and to stackoverflow as well) and i want to make some classification tasks. I performed two group classifications on my data set (field of speech acoustics) with LIBSVM and Matlab's Pattern Recignition Tool from the Neural network toolbox to create a simple network with one hidden layer. In the hope of higher classification results i want to try Deep Neural Networks, and i found this code: http://www.mathworks.com/matlabcentral/fileexchange/42853-deep-neural-network
I have some difficulty understanding it.
My data is constructed of 127 samples of 19 parameters, so my input number is 19. I want to classify them in two groups: 0 and 1, so my output number is 1. The values in my data set are normalized between 0 and 1.
My code is the following:
clear all
clc
addpath('..');
load('data.mat')
inputdata = inputs;
outputdata = outputs;
datanum = 127;
outputnum = 1;
hiddennum = 3;
inputnum = 19;
% rbm = randRBM(inputnum, outputnum);
% rbm = pretrainRBM( rbm, inputdata );
dbn = randDBN([inputnum, hiddennum, outputnum]);
dbn = pretrainDBN( dbn, inputdata );
dbn = SetLinearMapping( dbn, inputdata, outputdata );
dbn = trainDBN( dbn, inputdata, outputdata );
estimate = v2h( dbn, inputdata )
[rmse AveErrNum] = CalcRmse(dbn, inputdata, outputdata)
The code runs. The rmse is 0.4183, the AveErrNum is 0.1969. What i need is the classification accuracy between my targets (stored in outputdata) and the networks predictions (Accuracy = data classified correctly / all data).
Where do i find the networks predictions after binarization?
Do I use the right type of network for my classification?
Don't I need to divide my data into Training, Validation and Testing samples (like in the case of a simple neural network with one hidden layer)?
Thanks in advance for any help!