High loss in neural network sequence classification - neural-network

I am using a neural network to classify sequences of length 340 into 8 classes, with cross-entropy as the loss. I am getting a very high number for the loss, and I am wondering whether I made a mistake in calculating the loss for each epoch, or whether I should use a different loss function.
criterion = nn.CrossEntropyLoss()
if CUDA:
    criterion = criterion.cuda()
optimizer = optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=0.9)
loss_list = []
for epoch in range(N_EPOCHES):
    tot_loss = 0
    running_loss = 0
    model.train()
    loss_values = []
    acc_list = []
    acc_list = torch.FloatTensor(acc_list)
    sum_acc = 0
    # Training
    for i, (seq_batch, stat_batch) in enumerate(training_generator):
        # Transfer to GPU
        seq_batch, stat_batch = seq_batch.to(device), stat_batch.to(device)
        optimizer.zero_grad()
        # Model computation
        seq_batch = seq_batch.unsqueeze(-1)
        outputs = model(seq_batch)
        loss = criterion(outputs.argmax(1), stat_batch.argmax(1))
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item() * seq_batch.size(0)
        loss_values.append(running_loss / len(training_set))
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 50000),
                  "acc", (outputs.argmax(1) == stat_batch.argmax(1)).float().mean())
            running_loss = 0.0
        sum_acc += (outputs.argmax(1) == stat_batch.argmax(1)).float().sum()
    print("epoch", epoch, "acc", sum_acc / len(training_generator))
print('Finished Training')
[1, 2000] loss: 14.205 acc tensor(0.5312, device='cuda:0')
[1, 4000] loss: 13.377 acc tensor(0.4922, device='cuda:0')
[1, 6000] loss: 13.159 acc tensor(0.5508, device='cuda:0')
[1, 8000] loss: 13.050 acc tensor(0.5547, device='cuda:0')
[1, 10000] loss: 12.974 acc tensor(0.4883, device='cuda:0')
epoch 1 acc tensor(133.6352, device='cuda:0')
[2, 2000] loss: 12.833 acc tensor(0.5781, device='cuda:0')
[2, 4000] loss: 12.834 acc tensor(0.5391, device='cuda:0')
[2, 6000] loss: 12.782 acc tensor(0.5195, device='cuda:0')
[2, 8000] loss: 12.774 acc tensor(0.5508, device='cuda:0')
[2, 10000] loss: 12.762 acc tensor(0.5156, device='cuda:0')
epoch 2 acc tensor(139.2496, device='cuda:0')
[3, 2000] loss: 12.636 acc tensor(0.5469, device='cuda:0')
[3, 4000] loss: 12.640 acc tensor(0.5469, device='cuda:0')
[3, 6000] loss: 12.648 acc tensor(0.5508, device='cuda:0')
[3, 8000] loss: 12.637 acc tensor(0.5586, device='cuda:0')
[3, 10000] loss: 12.620 acc tensor(0.6016, device='cuda:0')
epoch 3 acc tensor(140.6962, device='cuda:0')
[4, 2000] loss: 12.520 acc tensor(0.5547, device='cuda:0')
[4, 4000] loss: 12.541 acc tensor(0.5664, device='cuda:0')
[4, 6000] loss: 12.538 acc tensor(0.5430, device='cuda:0')
[4, 8000] loss: 12.535 acc tensor(0.5547, device='cuda:0')
[4, 10000] loss: 12.548 acc tensor(0.5820, device='cuda:0')
epoch 4 acc tensor(141.6522, device='cuda:0')

I am getting a very high number for the loss
What makes you think this is high? What do you compare it to?
Yes, you should use nn.CrossEntropyLoss for multi-class classification tasks, and your training loss looks perfectly fine to me. At initialization you should expect a loss of about -log(1/8) ≈ 2.08.
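As a quick sanity check, here is a minimal sketch (with a hypothetical batch size of 4, just for illustration) showing that nn.CrossEntropyLoss expects raw float logits of shape (batch, num_classes) plus integer class targets, and that a model predicting all 8 classes with equal probability lands at about log(8) ≈ 2.08:

import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

batch_size, num_classes = 4, 8                          # hypothetical sizes for illustration
logits = torch.zeros(batch_size, num_classes)           # equal logits -> uniform predicted probabilities
targets = torch.randint(0, num_classes, (batch_size,))  # integer class indices, not one-hot vectors

loss = criterion(logits, targets)                       # raw float logits + long targets
print(loss.item())                                      # ~2.079 regardless of targets, since each class gets probability 1/8
print(math.log(num_classes))                            # 2.079..., i.e. -log(1/8)

Two things are worth double-checking in the posted loop. First, nn.CrossEntropyLoss should receive the raw outputs tensor, not outputs.argmax(1): the argmax discards the per-class scores the loss needs and is not differentiable. Second, the printed number is running_loss (a sum of loss.item() * seq_batch.size(0) over 2000 mini-batches) divided by a hard-coded 50000, so it only equals a per-sample average if 2000 × batch size happens to be 50000; with a larger batch size the printed value is inflated accordingly, which alone can make a per-sample loss near 2 appear an order of magnitude larger.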

Related

Why does loss decrease but accuracy decrease too (Pytorch, LSTM)?

I have built a model with LSTM and Linear modules in Pytorch for a classification problem (10 classes). I am training the model, and for each epoch I output the loss and accuracy on the training set. The output is as follows:
epoch: 0 start!
Loss: 2.301875352859497
Acc: 0.11388888888888889
epoch: 1 start!
Loss: 2.2759320735931396
Acc: 0.29
epoch: 2 start!
Loss: 2.2510263919830322
Acc: 0.4872222222222222
epoch: 3 start!
Loss: 2.225804567337036
Acc: 0.6066666666666667
epoch: 4 start!
Loss: 2.199286699295044
Acc: 0.6511111111111111
epoch: 5 start!
Loss: 2.1704766750335693
Acc: 0.6855555555555556
epoch: 6 start!
Loss: 2.1381614208221436
Acc: 0.7038888888888889
epoch: 7 start!
Loss: 2.1007182598114014
Acc: 0.7194444444444444
epoch: 8 start!
Loss: 2.0557992458343506
Acc: 0.7283333333333334
epoch: 9 start!
Loss: 1.9998993873596191
Acc: 0.7427777777777778
epoch: 10 start!
Loss: 1.9277743101119995
Acc: 0.7527777777777778
epoch: 11 start!
Loss: 1.8325848579406738
Acc: 0.7483333333333333
epoch: 12 start!
Loss: 1.712520718574524
Acc: 0.7077777777777777
epoch: 13 start!
Loss: 1.6056485176086426
Acc: 0.6305555555555555
epoch: 14 start!
Loss: 1.5910680294036865
Acc: 0.4938888888888889
epoch: 15 start!
Loss: 1.6259561777114868
Acc: 0.41555555555555557
epoch: 16 start!
Loss: 1.892195224761963
Acc: 0.3655555555555556
epoch: 17 start!
Loss: 1.4949012994766235
Acc: 0.47944444444444445
epoch: 18 start!
Loss: 1.4332982301712036
Acc: 0.48833333333333334
For the loss function I used nn.CrossEntropyLoss with the Adam optimizer.
Although the loss is constantly decreasing, the accuracy increases until epoch 10 and then, for some reason, begins to decrease.
Why is this happening?
Even if my model is overfitting, shouldn't that mean the training accuracy stays high? (I am always speaking about accuracy and loss measured on the training set, not the validation set.)
A decreasing loss does not always mean improving accuracy.
I will try to address this for the cross-entropy loss.
For a batch of N samples, CE loss = -(1/N) * sum_i log p(y_i), i.e. the average negative log-probability assigned to the correct class.
The loss decreases when the probabilities assigned to the correct classes increase, and increases when they decrease; accuracy, by contrast, only counts whether the correct class is the argmax. When you average the loss over all samples, some correct-class probabilities may rise a lot (on samples that were already classified correctly) while others dip just below the decision boundary, so the overall loss can shrink even though accuracy drops.
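A tiny worked example (two samples, two classes, made-up probabilities) shows the effect concretely: the average cross-entropy drops while accuracy falls from 100% to 50%.

import torch
import torch.nn.functional as F

targets = torch.tensor([0, 0])                # both samples belong to class 0

# "Epoch A": both samples barely correct -> accuracy 100%
probs_a = torch.tensor([[0.55, 0.45],
                        [0.55, 0.45]])
# "Epoch B": one sample becomes very confident, the other tips over to the wrong class -> accuracy 50%
probs_b = torch.tensor([[0.99, 0.01],
                        [0.45, 0.55]])

loss_a = F.nll_loss(probs_a.log(), targets)   # mean of -log p(correct class) = -log 0.55 ≈ 0.598
loss_b = F.nll_loss(probs_b.log(), targets)   # mean(-log 0.99, -log 0.45) ≈ 0.404

acc_a = (probs_a.argmax(1) == targets).float().mean()   # 1.0
acc_b = (probs_b.argmax(1) == targets).float().mean()   # 0.5
print(loss_a.item(), acc_a.item())
print(loss_b.item(), acc_b.item())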

Is it normal in PyTorch for accuracy to increase and decrease repeatedly?

I am new to PyTorch and currently working on a simple transfer-learning example. When I train my model, I see large swings in both accuracy and loss from epoch to epoch. I trained the network for 50 epochs, and below is the result:
Epoch [1/50], Loss: 0.5477, Train Accuracy: 63%
Epoch [2/50], Loss: 2.1935, Train Accuracy: 75%
Epoch [3/50], Loss: 1.8811, Train Accuracy: 79%
Epoch [4/50], Loss: 0.0671, Train Accuracy: 77%
Epoch [5/50], Loss: 0.2522, Train Accuracy: 80%
Epoch [6/50], Loss: 0.0962, Train Accuracy: 88%
Epoch [7/50], Loss: 1.8883, Train Accuracy: 74%
Epoch [8/50], Loss: 0.3565, Train Accuracy: 83%
Epoch [9/50], Loss: 0.0228, Train Accuracy: 81%
Epoch [10/50], Loss: 0.0124, Train Accuracy: 81%
Epoch [11/50], Loss: 0.0252, Train Accuracy: 84%
Epoch [12/50], Loss: 0.5184, Train Accuracy: 81%
Epoch [13/50], Loss: 0.1233, Train Accuracy: 86%
Epoch [14/50], Loss: 0.1704, Train Accuracy: 82%
Epoch [15/50], Loss: 2.3164, Train Accuracy: 79%
Epoch [16/50], Loss: 0.0294, Train Accuracy: 85%
Epoch [17/50], Loss: 0.2860, Train Accuracy: 85%
Epoch [18/50], Loss: 1.5114, Train Accuracy: 81%
Epoch [19/50], Loss: 0.1136, Train Accuracy: 86%
Epoch [20/50], Loss: 0.0062, Train Accuracy: 80%
Epoch [21/50], Loss: 0.0748, Train Accuracy: 84%
Epoch [22/50], Loss: 0.1848, Train Accuracy: 84%
Epoch [23/50], Loss: 0.1693, Train Accuracy: 81%
Epoch [24/50], Loss: 0.1297, Train Accuracy: 77%
Epoch [25/50], Loss: 0.1358, Train Accuracy: 78%
Epoch [26/50], Loss: 2.3172, Train Accuracy: 75%
Epoch [27/50], Loss: 0.1772, Train Accuracy: 79%
Epoch [28/50], Loss: 0.0201, Train Accuracy: 80%
Epoch [29/50], Loss: 0.3810, Train Accuracy: 84%
Epoch [30/50], Loss: 0.7281, Train Accuracy: 79%
Epoch [31/50], Loss: 0.1918, Train Accuracy: 81%
Epoch [32/50], Loss: 0.3289, Train Accuracy: 88%
Epoch [33/50], Loss: 1.2363, Train Accuracy: 81%
Epoch [34/50], Loss: 0.0362, Train Accuracy: 89%
Epoch [35/50], Loss: 0.0303, Train Accuracy: 90%
Epoch [36/50], Loss: 1.1700, Train Accuracy: 81%
Epoch [37/50], Loss: 0.0031, Train Accuracy: 81%
Epoch [38/50], Loss: 0.1496, Train Accuracy: 81%
Epoch [39/50], Loss: 0.5070, Train Accuracy: 76%
Epoch [40/50], Loss: 0.1984, Train Accuracy: 77%
Epoch [41/50], Loss: 0.1152, Train Accuracy: 79%
Epoch [42/50], Loss: 0.0603, Train Accuracy: 82%
Epoch [43/50], Loss: 0.2293, Train Accuracy: 84%
Epoch [44/50], Loss: 0.1304, Train Accuracy: 80%
Epoch [45/50], Loss: 0.0381, Train Accuracy: 82%
Epoch [46/50], Loss: 0.1833, Train Accuracy: 84%
Epoch [47/50], Loss: 0.0222, Train Accuracy: 84%
Epoch [48/50], Loss: 0.0010, Train Accuracy: 81%
Epoch [49/50], Loss: 1.0852, Train Accuracy: 79%
Epoch [50/50], Loss: 0.0167, Train Accuracy: 83%
There are some epochs with much better accuracy and loss than others, but the model loses these gains in later epochs. As far as I know, the accuracy should improve every epoch. Did I write the training code wrongly? If not, is this normal, and is there any way to solve it? Should I save the previous accuracy and only continue training when the next epoch's accuracy is greater than the previous one? I worked with Keras before and never ran into this problem. I am fine-tuning a ResNet by freezing the pretrained weights and adding a final layer with only 2 classes. Below is my code:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
num_epochs = 50
for epoch in range(num_epochs):
    # Reset the correct count to 0 after passing through the whole dataset
    correct = 0
    for images, labels in dataloaders['train']:
        images = Variable(images)
        labels = Variable(labels)
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        optimizer.zero_grad()
        outputs = model_conv(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum()
    train_acc = 100 * correct / dataset_sizes['train']
    print('Epoch [{}/{}], Loss: {:.4f}, Train Accuracy: {}%'
          .format(epoch + 1, num_epochs, loss.item(), train_acc))
I would say it depends on the dataset and architecture, so fluctuations are normal, but in general the loss should improve. It could also be a result of noise in the test dataset, i.e. wrongly labeled examples.
If the test accuracy starts to decrease, your network might be overfitting.
You might want to stop training just before you reach that point, or take other steps to counter the overfitting problem.
Is it normal in PyTorch for accuracy to increase and decrease repeatedly?
At the level of whole epochs, the loss should generally go down.
At the level of individual batches it may fluctuate, but over time it should get smaller, since that is the whole point: by minimizing the loss we are (indirectly) improving accuracy.
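One detail of the posted loop that amplifies the apparent fluctuation: the print statement reports loss.item() of only the last mini-batch of each epoch, which is much noisier than an epoch average. Below is a minimal sketch of reporting the averaged per-sample loss instead, assuming the same setup as the question's code (model_conv, criterion, optimizer, dataloaders, dataset_sizes and num_epochs already defined):

# Assumes model_conv, criterion, optimizer, dataloaders, dataset_sizes and
# num_epochs are already defined as in the question's code.
for epoch in range(num_epochs):
    correct = 0
    running_loss = 0.0
    for images, labels in dataloaders['train']:
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        outputs = model_conv(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)          # weight each batch by its size
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
    epoch_loss = running_loss / dataset_sizes['train']        # average per-sample loss over the epoch
    train_acc = 100.0 * correct / dataset_sizes['train']
    print('Epoch [{}/{}], Loss: {:.4f}, Train Accuracy: {:.1f}%'
          .format(epoch + 1, num_epochs, epoch_loss, train_acc))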

TFLEARN multivariable regression does not converge (attempting to duplicate Matlab fitnet)

I am trying to write a model in TFLEARN that fits 16 output parameters.
I have previously run this same experiment in Matlab using the "fitnet" function with 2 hidden layers of 2000 and 1500 nodes.
I am attempting to replicate these results in TensorFlow before exploring other architectures, descent algorithms, and hyperparameter tuning. From some research, the Matlab fitnet function uses tanh nodes for the hidden layers and a linear output layer. Its descent algorithm defaults to Levenberg-Marquardt, but it also worked for me with other (SGD) algorithms.
It appears that the accuracy maxes out around 0.2 and then oscillates below this over successive epochs. I did not see this behavior in Matlab.
My TFLEARN code looks like:
tnorm = tflearn.initializations.uniform_scaling()
adam = tflearn.optimizers.Adam(learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
# Network building
input_data = tflearn.input_data(shape=[None, np.shape(prepared_x)[1]])
fc1 = tflearn.fully_connected(input_data, 2000, activation='tanh', weights_init=tnorm)
fc2 = tflearn.fully_connected(fc1, 1500, activation='tanh', weights_init=tnorm)
output = tflearn.fully_connected(fc2, 16, activation='linear', weights_init=tnorm)
network = tflearn.regression(output, optimizer=adam, loss='mean_square')
# Define model with checkpoints
model = tflearn.DNN(network, tensorboard_dir='output/', tensorboard_verbose=3, checkpoint_path='output')
# Train model
model.fit(prepared_x, prepared_t, n_epoch=5, batch_size=100, shuffle=True, show_metric=True, snapshot_epoch=False, validation_set=0.1)
# Save
model.save('TFLEARN_FC_final.tfl')
The output of the training session looks like:
Run id: UTSD6N
Log directory: output/
---------------------------------
Training samples: 43200
Validation samples: 4800
--
Training Step: 1
| Adam | epoch: 000 | loss: 0.00000 - acc: 0.0000 -- iter: 00100/43200
Training Step: 2 | total loss: 0.67871
| Adam | epoch: 000 | loss: 0.67871 - acc: 0.0455 -- iter: 00200/43200
Training Step: 3 | total loss: 33.14599
| Adam | epoch: 000 | loss: 33.14599 - acc: 0.0082 -- iter: 00300/43200
Training Step: 4 | total loss: 28.01067
| Adam | epoch: 000 | loss: 28.01067 - acc: 0.0021 -- iter: 00400/43200
Training Step: 5 | total loss: 17.35706
| Adam | epoch: 000 | loss: 17.35706 - acc: 0.0006 -- iter: 00500/43200
Training Step: 6 | total loss: 9.73368
| Adam | epoch: 000 | loss: 9.73368 - acc: 0.0002 -- iter: 00600/43200
Training Step: 7 | total loss: 5.19867
| Adam | epoch: 000 | loss: 5.19867 - acc: 0.0001 -- iter: 00700/43200
Training Step: 8 | total loss: 3.54779
| Adam | epoch: 000 | loss: 3.54779 - acc: 0.0113 -- iter: 00800/43200
Training Step: 9 | total loss: 3.80998
| Adam | epoch: 000 | loss: 3.80998 - acc: 0.0106 -- iter: 00900/43200
Training Step: 10 | total loss: 4.33370
| Adam | epoch: 000 | loss: 4.33370 - acc: 0.0053 -- iter: 01000/43200
Training Step: 11 | total loss: 4.24100
...
| Adam | epoch: 004 | loss: 0.02448 - acc: 0.1817 -- iter: 42800/43200
Training Step: 2157 | total loss: 0.02633
| Adam | epoch: 004 | loss: 0.02633 - acc: 0.1875 -- iter: 42900/43200
Training Step: 2158 | total loss: 0.02509
| Adam | epoch: 004 | loss: 0.02509 - acc: 0.1688 -- iter: 43000/43200
Training Step: 2159 | total loss: 0.02525
| Adam | epoch: 004 | loss: 0.02525 - acc: 0.1529 -- iter: 43100/43200
Training Step: 2160 | total loss: 0.02695
| Adam | epoch: 005 | loss: 0.02695 - acc: 0.1456 -- iter: 43200/43200
image of accuracy/loss from tensorboard
Any suggestions would be much appreciated.
For any future lurkers -- I solved my own problem by fixing the descent algorithm settings.
The default learning rate for the Adam optimizer is 0.001, but the rate I was using was too high; I had to switch to 0.005 for convergence.
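For reference, the only change relative to the question's setup is the learning_rate argument of the Adam optimizer (0.005 here, per the answer above; treat the exact value as a hyperparameter to tune). The 'output' layer is the one already defined in the question's network:

import tflearn

# Same optimizer construction as in the question, with a smaller learning rate.
adam = tflearn.optimizers.Adam(learning_rate=0.005, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam')
# 'output' is the linear 16-unit layer defined in the question's network code.
network = tflearn.regression(output, optimizer=adam, loss='mean_square')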

matlab: zero loop in neural network

Hello everyone! I want a neural network with a loop (a recurrent layer connection). I create the network:
net = newff(rand(5, 100), rand(1, 100), [3, 2], { 'tansig' 'tansig'}, 'traingdx3', 'learngdm', 'mse');
net.layerConnect(1, 1) = 1;
net.layerWeights{1, 1}.delays = [1];
net.trainParam.epochs = 100;
net = train(net, rand(5, 100), rand(1, 100));
and
net.LW
ans =
[3x3 double] [] []
[2x3 double] [] []
[] [1x2 double] []
but when I view net.LW{1, 1}, I get
net.LW{1, 1}
ans =
0 0 0
0 0 0
0 0 0
Why are the loop weights always zero?

Is it possible to change the inequality behaviour of interp1 when using 'previous' or 'next'

Consider as examples:
interp1([0, 1], [2, 3], 0 , 'previous')
interp1([0, 1], [2, 3], 0 , 'next')
which produces
ans =
2
ans =
2
That is, in each respective case it finds the value in [0, 1] that is closest to and not exceeding 0 ('previous'), or closest to and not below 0 ('next'), and then returns the corresponding value from [2, 3]. I would like to change the second condition to "closest to and strictly above 0"; that is, it should return the same results as:
interp1([0, 1], [2, 3], 0.1 , 'previous')
interp1([0, 1], [2, 3], 0.1 , 'next')
which gives
ans =
2
ans =
3
In this case it works because 0.1 lies strictly between 0 and 1.