I'm using Deep SVDD on CIFAR-10 for one-class classification. When I change the L2 norm to an Lp norm with p < 1, I get NaNs after some epochs.
It works for loss = torch.mean((outputs - inputs)**2)
but I get NaN for loss = torch.mean(torch.abs(outputs - inputs)**0.9)
The loss for each epoch is shown here:
INFO:root: Epoch 1/50 Time: 1.514 Loss: 84.51767029
INFO:root: Epoch 2/50 Time: 1.617 Loss: 82.70055634
INFO:root: Epoch 3/50 Time: 1.528 Loss: 80.92372467
INFO:root: Epoch 4/50 Time: 1.612 Loss: 79.23560699
INFO:root: Epoch 5/50 Time: 1.495 Loss: 77.56893951
INFO:root: Epoch 6/50 Time: 1.596 Loss: 75.95311737
INFO:root: Epoch 7/50 Time: 1.504 Loss: 74.40722260
INFO:root: Epoch 8/50 Time: 1.593 Loss: 72.84329010
INFO:root: Epoch 9/50 Time: 1.639 Loss: 71.34644287
INFO:root: Epoch 10/50 Time: 1.578 Loss: 69.86484253
INFO:root: Epoch 11/50 Time: 1.553 Loss: 68.41005692
INFO:root: Epoch 12/50 Time: 1.670 Loss: 66.96582977
INFO:root: Epoch 13/50 Time: 1.607 Loss: 65.56927887
INFO:root: Epoch 14/50 Time: 1.573 Loss: 64.20584961
INFO:root: Epoch 15/50 Time: 1.605 Loss: 62.85230591
INFO:root: Epoch 16/50 Time: 1.483 Loss: 61.53305466
INFO:root: Epoch 17/50 Time: 1.616 Loss: 60.22836166
INFO:root: Epoch 18/50 Time: 1.499 Loss: 58.94760498
INFO:root: Epoch 19/50 Time: 1.611 Loss: 57.73990845
INFO:root: Epoch 20/50 Time: 1.507 Loss: 56.51732086
INFO:root: Epoch 21/50 Time: 1.624 Loss: 55.30994400
INFO:root: Epoch 22/50 Time: 1.482 Loss: 54.13251587
INFO:root: Epoch 23/50 Time: 1.606 Loss: 52.98952118
INFO:root: Epoch 24/50 Time: 1.508 Loss: 51.86713654
INFO:root: Epoch 25/50 Time: 1.587 Loss: 50.76639069
INFO:root: Epoch 26/50 Time: 1.523 Loss: 49.68750381
INFO:root: Epoch 27/50 Time: 1.574 Loss: 48.62197098
INFO:root: Epoch 28/50 Time: 1.537 Loss: 47.59307220
INFO:root: Epoch 29/50 Time: 1.560 Loss: 46.58890167
INFO:root: Epoch 30/50 Time: 1.607 Loss: 45.59774643
INFO:root: Epoch 31/50 Time: 1.504 Loss: 44.61755203
INFO:root: Epoch 32/50 Time: 1.592 Loss: 43.67579239
INFO:root: Epoch 33/50 Time: 1.480 Loss: 42.76135941
INFO:root: Epoch 34/50 Time: 1.577 Loss: 41.84933487
INFO:root: Epoch 35/50 Time: 1.488 Loss: 40.96647171
INFO:root: Epoch 36/50 Time: 1.596 Loss: 40.10220779
INFO:root: Epoch 37/50 Time: 1.534 Loss: 39.26658310
INFO:root: Epoch 38/50 Time: 1.615 Loss: 38.44916168
INFO:root: Epoch 39/50 Time: 1.518 Loss: nan
INFO:root: Epoch 40/50 Time: 1.574 Loss: nan
INFO:root: Epoch 41/50 Time: 1.511 Loss: nan
INFO:root: Epoch 42/50 Time: 1.556 Loss: nan
INFO:root: Epoch 43/50 Time: 1.565 Loss: nan
INFO:root: Epoch 44/50 Time: 1.561 Loss: nan
INFO:root: Epoch 45/50 Time: 1.600 Loss: nan
INFO:root: Epoch 46/50 Time: 1.518 Loss: nan
INFO:root: Epoch 47/50 Time: 1.618 Loss: nan
INFO:root: Epoch 48/50 Time: 1.540 Loss: nan
INFO:root: Epoch 49/50 Time: 1.591 Loss: nan
INFO:root: Epoch 50/50 Time: 1.504 Loss: nan
For different learning rates and output dimensions, the network still returns NaN after some epochs.
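A likely culprit: for p < 1 the map x -> |x|**p is not differentiable at 0, and its gradient p * sign(x) * |x|**(p - 1) blows up as the residuals shrink, which would explain why the NaNs appear only after the loss has decreased for many epochs. Below is a minimal sketch of a common stabilization, adding a small epsilon before the fractional power (the epsilon value is an assumption, tune as needed):

import torch

def lp_loss(outputs, inputs, p=0.9, eps=1e-6):
    # For p < 1, |x|**p has an unbounded gradient at x = 0;
    # adding eps keeps both the forward and backward pass finite.
    return torch.mean((torch.abs(outputs - inputs) + eps) ** p)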
I am new to CNNs, and I wanted to know how I might improve my model. Augmentation is already done. Thanks in advance.
from keras.models import Sequential
from keras.layers import Conv2D, AveragePooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(16, (3, 3), activation='relu', strides=(1, 1),
                 padding='same', input_shape=input_shape))
model.add(Conv2D(32, (3, 3), activation='relu', strides=(1, 1),
                 padding='same'))
model.add(Conv2D(64, (3, 3), activation='relu', strides=(1, 1),
                 padding='same'))
# model.add(Conv2D(128, (3, 3), activation='relu', strides=(1, 1),
#                  padding='same'))
# model.add(MaxPool2D(2, 2))  # alternative to AveragePooling2D
model.add(AveragePooling2D(2, 2))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()

# opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['acc'])

history = model.fit(X, y, epochs=150, batch_size=32,
                    shuffle=True, validation_split=0.1,
                    callbacks=[checkpoint])
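The checkpoint callback passed to model.fit is not defined in the snippet; judging from the "val_acc did not improve" messages in the log below, it is presumably a ModelCheckpoint monitoring val_acc, along these lines (the filepath is a placeholder, not from the original post):

from keras.callbacks import ModelCheckpoint

# Hypothetical reconstruction: save the weights with the best validation accuracy
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_acc',
                             verbose=1, save_best_only=True, mode='max')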
Epoch 00140: val_acc did not improve from 0.93082
Epoch 141/150
28620/28620 [==============================] - 37s 1ms/step - loss: 0.1654 - acc: 0.9401 - val_loss: 0.2388 - val_acc: 0.9267
Epoch 00141: val_acc did not improve from 0.93082
Epoch 142/150
28620/28620 [==============================] - 38s 1ms/step - loss: 0.1314 - acc: 0.9516 - val_loss: 0.2728 - val_acc: 0.9091
Epoch 00142: val_acc did not improve from 0.93082
Epoch 143/150
28620/28620 [==============================] - 37s 1ms/step - loss: 0.1425 - acc: 0.9476 - val_loss: 0.2439 - val_acc: 0.9242
Epoch 00143: val_acc did not improve from 0.93082
Epoch 144/150
28620/28620 [==============================] - 37s 1ms/step - loss: 0.1434 - acc: 0.9473 - val_loss: 0.3709 - val_acc: 0.8824
Epoch 00144: val_acc did not improve from 0.93082
Epoch 145/150
28620/28620 [==============================] - 37s 1ms/step - loss: 0.1483 - acc: 0.9468 - val_loss: 0.2544 - val_acc: 0.9208
Epoch 00145: val_acc did not improve from 0.93082
Epoch 146/150
28620/28620 [==============================] - 35s 1ms/step - loss: 0.1366 - acc: 0.9501 - val_loss: 0.2872 - val_acc: 0.9110
Epoch 00146: val_acc did not improve from 0.93082
Epoch 147/150
28620/28620 [==============================] - 36s 1ms/step - loss: 0.1476 - acc: 0.9465 - val_loss: 0.3147 - val_acc: 0.9013
Epoch 00147: val_acc did not improve from 0.93082
Epoch 148/150
28620/28620 [==============================] - 36s 1ms/step - loss: 0.1391 - acc: 0.9486 - val_loss: 0.2838 - val_acc: 0.9069
Epoch 00148: val_acc did not improve from 0.93082
Epoch 149/150
28620/28620 [==============================] - 35s 1ms/step - loss: 0.1392 - acc: 0.9486 - val_loss: 0.2541 - val_acc: 0.9211
Epoch 00149: val_acc did not improve from 0.93082
Epoch 150/150
28620/28620 [==============================] - 37s 1ms/step - loss: 0.1401 - acc: 0.9489 - val_loss: 0.2213 - val_acc: 0.9308
Epoch 00150: val_acc did not improve from 0.93082
I am new to PyTorch, currently working on a simple transfer-learning example. When I train my model, the accuracy and loss fluctuate heavily from epoch to epoch. I trained the network for 50 epochs, and below is the result:
Epoch [1/50], Loss: 0.5477, Train Accuracy: 63%
Epoch [2/50], Loss: 2.1935, Train Accuracy: 75%
Epoch [3/50], Loss: 1.8811, Train Accuracy: 79%
Epoch [4/50], Loss: 0.0671, Train Accuracy: 77%
Epoch [5/50], Loss: 0.2522, Train Accuracy: 80%
Epoch [6/50], Loss: 0.0962, Train Accuracy: 88%
Epoch [7/50], Loss: 1.8883, Train Accuracy: 74%
Epoch [8/50], Loss: 0.3565, Train Accuracy: 83%
Epoch [9/50], Loss: 0.0228, Train Accuracy: 81%
Epoch [10/50], Loss: 0.0124, Train Accuracy: 81%
Epoch [11/50], Loss: 0.0252, Train Accuracy: 84%
Epoch [12/50], Loss: 0.5184, Train Accuracy: 81%
Epoch [13/50], Loss: 0.1233, Train Accuracy: 86%
Epoch [14/50], Loss: 0.1704, Train Accuracy: 82%
Epoch [15/50], Loss: 2.3164, Train Accuracy: 79%
Epoch [16/50], Loss: 0.0294, Train Accuracy: 85%
Epoch [17/50], Loss: 0.2860, Train Accuracy: 85%
Epoch [18/50], Loss: 1.5114, Train Accuracy: 81%
Epoch [19/50], Loss: 0.1136, Train Accuracy: 86%
Epoch [20/50], Loss: 0.0062, Train Accuracy: 80%
Epoch [21/50], Loss: 0.0748, Train Accuracy: 84%
Epoch [22/50], Loss: 0.1848, Train Accuracy: 84%
Epoch [23/50], Loss: 0.1693, Train Accuracy: 81%
Epoch [24/50], Loss: 0.1297, Train Accuracy: 77%
Epoch [25/50], Loss: 0.1358, Train Accuracy: 78%
Epoch [26/50], Loss: 2.3172, Train Accuracy: 75%
Epoch [27/50], Loss: 0.1772, Train Accuracy: 79%
Epoch [28/50], Loss: 0.0201, Train Accuracy: 80%
Epoch [29/50], Loss: 0.3810, Train Accuracy: 84%
Epoch [30/50], Loss: 0.7281, Train Accuracy: 79%
Epoch [31/50], Loss: 0.1918, Train Accuracy: 81%
Epoch [32/50], Loss: 0.3289, Train Accuracy: 88%
Epoch [33/50], Loss: 1.2363, Train Accuracy: 81%
Epoch [34/50], Loss: 0.0362, Train Accuracy: 89%
Epoch [35/50], Loss: 0.0303, Train Accuracy: 90%
Epoch [36/50], Loss: 1.1700, Train Accuracy: 81%
Epoch [37/50], Loss: 0.0031, Train Accuracy: 81%
Epoch [38/50], Loss: 0.1496, Train Accuracy: 81%
Epoch [39/50], Loss: 0.5070, Train Accuracy: 76%
Epoch [40/50], Loss: 0.1984, Train Accuracy: 77%
Epoch [41/50], Loss: 0.1152, Train Accuracy: 79%
Epoch [42/50], Loss: 0.0603, Train Accuracy: 82%
Epoch [43/50], Loss: 0.2293, Train Accuracy: 84%
Epoch [44/50], Loss: 0.1304, Train Accuracy: 80%
Epoch [45/50], Loss: 0.0381, Train Accuracy: 82%
Epoch [46/50], Loss: 0.1833, Train Accuracy: 84%
Epoch [47/50], Loss: 0.0222, Train Accuracy: 84%
Epoch [48/50], Loss: 0.0010, Train Accuracy: 81%
Epoch [49/50], Loss: 1.0852, Train Accuracy: 79%
Epoch [50/50], Loss: 0.0167, Train Accuracy: 83%
There are some epochs with much better accuracy and loss than others, but the model loses those gains in later epochs. As far as I know, the accuracy should improve with every epoch. Did I write the training code wrongly? If not, is this normal, and is there any way to solve it? Should the previous accuracy be saved, and training continue for another epoch only if the next epoch's accuracy is greater than the previous one? I worked with Keras previously and did not experience this problem. I am fine-tuning a ResNet by freezing the pretrained weights and adding a final layer with only 2 classes. Below is my code:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

criterion = nn.CrossEntropyLoss()
# Only the final fully connected layer is being optimized (transfer learning)
optimizer = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
num_epochs = 50

for epoch in range(num_epochs):
    # Reset the correct count after each full pass over the dataset
    correct = 0
    for images, labels in dataloaders['train']:
        images = Variable(images)
        labels = Variable(labels)
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        optimizer.zero_grad()
        outputs = model_conv(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
    train_acc = 100 * correct / dataset_sizes['train']
    # Note: loss.item() here is only the loss of the last batch of the epoch,
    # which is one reason the printed values jump around
    print('Epoch [{}/{}], Loss: {:.4f}, Train Accuracy: {}%'
          .format(epoch + 1, num_epochs, loss.item(), train_acc))
I would say it depends on the dataset and architecture. Fluctuations are normal, but in general the loss should improve. They could also be a result of noise in the test dataset, i.e. wrongly labeled examples.
If the test accuracy starts to decrease, it might be that your network is overfitting.
You might want to stop the learning just before you reach that point, or take other steps to counter the overfitting problem, as sketched below.
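As a sketch of those "other steps" (reusing model_conv, dataloaders, dataset_sizes, criterion, optimizer and num_epochs from the question; names like best_acc are illustrative): average the loss over the whole epoch instead of printing the last batch's loss, and keep a copy of the best weights seen so far:

import copy
import torch

best_acc = 0.0
best_state = None

for epoch in range(num_epochs):
    running_loss, correct = 0.0, 0
    for images, labels in dataloaders['train']:
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        outputs = model_conv(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # Accumulate an epoch-level average instead of keeping only the last batch
        running_loss += loss.item() * images.size(0)
        correct += (torch.max(outputs, 1)[1] == labels).sum().item()
    epoch_loss = running_loss / dataset_sizes['train']
    epoch_acc = 100 * correct / dataset_sizes['train']
    print('Epoch [{}/{}], Avg Loss: {:.4f}, Train Accuracy: {:.1f}%'
          .format(epoch + 1, num_epochs, epoch_loss, epoch_acc))
    if epoch_acc > best_acc:  # remember the best-performing weights
        best_acc = epoch_acc
        best_state = copy.deepcopy(model_conv.state_dict())

if best_state is not None:
    model_conv.load_state_dict(best_state)  # restore the best epoch's weights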
Is it normal in PyTorch for accuracy to increase and decrease repeatedly?
Measured epoch to epoch, the loss should always go down.
Measured batch to batch it may fluctuate, but generally it should get smaller over time, since that is the whole point: by minimizing the loss we improve accuracy.
I'm using a neural network implemented with the Keras library, and below are the results during training. At the end it prints a test score and a test accuracy. I can't figure out exactly what the score represents, but I assume the accuracy is the fraction of predictions that were correct when running the test.
Epoch 1/15
1200/1200 [==============================] - 4s - loss: 0.6815 - acc: 0.5550 - val_loss: 0.6120 - val_acc: 0.7525
Epoch 2/15
1200/1200 [==============================] - 3s - loss: 0.5481 - acc: 0.7250 - val_loss: 0.4645 - val_acc: 0.8025
Epoch 3/15
1200/1200 [==============================] - 3s - loss: 0.5078 - acc: 0.7558 - val_loss: 0.4354 - val_acc: 0.7975
Epoch 4/15
1200/1200 [==============================] - 3s - loss: 0.4603 - acc: 0.7875 - val_loss: 0.3978 - val_acc: 0.8350
Epoch 5/15
1200/1200 [==============================] - 3s - loss: 0.4367 - acc: 0.7992 - val_loss: 0.3809 - val_acc: 0.8300
Epoch 6/15
1200/1200 [==============================] - 3s - loss: 0.4276 - acc: 0.8017 - val_loss: 0.3884 - val_acc: 0.8350
Epoch 7/15
1200/1200 [==============================] - 3s - loss: 0.3975 - acc: 0.8167 - val_loss: 0.3666 - val_acc: 0.8400
Epoch 8/15
1200/1200 [==============================] - 3s - loss: 0.3916 - acc: 0.8183 - val_loss: 0.3753 - val_acc: 0.8450
Epoch 9/15
1200/1200 [==============================] - 3s - loss: 0.3814 - acc: 0.8233 - val_loss: 0.3505 - val_acc: 0.8475
Epoch 10/15
1200/1200 [==============================] - 3s - loss: 0.3842 - acc: 0.8342 - val_loss: 0.3672 - val_acc: 0.8450
Epoch 11/15
1200/1200 [==============================] - 3s - loss: 0.3674 - acc: 0.8375 - val_loss: 0.3383 - val_acc: 0.8525
Epoch 12/15
1200/1200 [==============================] - 3s - loss: 0.3624 - acc: 0.8367 - val_loss: 0.3423 - val_acc: 0.8650
Epoch 13/15
1200/1200 [==============================] - 3s - loss: 0.3497 - acc: 0.8475 - val_loss: 0.3069 - val_acc: 0.8825
Epoch 14/15
1200/1200 [==============================] - 3s - loss: 0.3406 - acc: 0.8500 - val_loss: 0.2993 - val_acc: 0.8775
Epoch 15/15
1200/1200 [==============================] - 3s - loss: 0.3252 - acc: 0.8600 - val_loss: 0.2960 - val_acc: 0.8775
400/400 [==============================] - 0s
Test score: 0.299598811865
Test accuracy: 0.88
Looking at the Keras documentation, I still don't understand what the score is. For the evaluate function, it says:
Returns the loss value & metrics values for the model in test mode.
One thing I noticed is that when the test accuracy is lower, the score is higher, and when accuracy is higher, the score is lower.
For reference, the two relevant parts of the code:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
Score is the evaluation of the loss function for a given input.
Training a network is finding parameters that minimize a loss function (or cost function).
The cost function here is the binary_crossentropy.
For a target T and a network output O, the binary crossentropy can be defined as
f(T, O) = -(T*log(O) + (1 - T)*log(1 - O))
So the score you see is the evaluation of that.
If you feed it a batch of inputs it will most likely return the mean loss.
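As a small illustration (plain NumPy, not from the original answer; the targets and outputs are made up), evaluating that formula over a batch and averaging gives exactly the kind of number reported as the test score:

import numpy as np

def binary_crossentropy(T, O, eps=1e-7):
    # f(T, O) = -(T*log(O) + (1 - T)*log(1 - O)), averaged over the batch
    O = np.clip(O, eps, 1 - eps)  # avoid log(0)
    return np.mean(-(T * np.log(O) + (1 - T) * np.log(1 - O)))

T = np.array([1.0, 0.0, 1.0, 0.0])  # made-up targets
O = np.array([0.8, 0.3, 0.9, 0.2])  # made-up network outputs
print(binary_crossentropy(T, O))    # ~0.227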
So yeah: if your model has a lower loss (at test time), it will often have a lower prediction error as well.
Loss is used during the training process to find the "best" parameter values for your model (e.g. the weights in a neural network). It is what you try to minimize during training by updating the weights.
Accuracy is more of an applied perspective: once you have found the optimized parameters above, you use this metric to evaluate how accurate your model's predictions are compared to the true data.
This answer provides more detailed info:
How to interpret "loss" and "accuracy" for a machine learning model
I am trying to write a model in TFLEARN that fits 16 output parameters.
I have previously run this same experiment in Matlab using the "fitnet" function with 2 hidden layers of 2000 and 1500 nodes.
I am attempting to replicate these results in TensorFlow before exploring other architectures, descent algorithms, and hyperparameter tuning. I have done some research and determined that the Matlab fitnet function uses tanh nodes for the hidden layers and linear nodes for the output. Also, the descent algorithm defaults to Levenberg-Marquardt, but it worked for me with other algorithms (SGD) as well.
It appears that the accuracy maxes out around 0.2 and then oscillates below that over successive epochs. I did not see this behavior in Matlab.
My TFLEARN code looks like:
import numpy as np
import tflearn

tnorm = tflearn.initializations.uniform_scaling()
adam = tflearn.optimizers.Adam(learning_rate=0.1, beta1=0.9, beta2=0.999,
                               epsilon=1e-08, use_locking=False, name='Adam')

# Network building
input_data = tflearn.input_data(shape=[None, np.shape(prepared_x)[1]])
fc1 = tflearn.fully_connected(input_data, 2000, activation='tanh', weights_init=tnorm)
fc2 = tflearn.fully_connected(fc1, 1500, activation='tanh', weights_init=tnorm)
output = tflearn.fully_connected(fc2, 16, activation='linear', weights_init=tnorm)
network = tflearn.regression(output, optimizer=adam, loss='mean_square')

# Define model with checkpoints
model = tflearn.DNN(network, tensorboard_dir='output/', tensorboard_verbose=3,
                    checkpoint_path='output')

# Train model
model.fit(prepared_x, prepared_t, n_epoch=5, batch_size=100, shuffle=True,
          show_metric=True, snapshot_epoch=False, validation_set=0.1)

# Save the trained model
model.save('TFLEARN_FC_final.tfl')
The output of the training session looks like:
Run id: UTSD6N
Log directory: output/
---------------------------------
Training samples: 43200
Validation samples: 4800
--
Training Step: 1
| Adam | epoch: 000 | loss: 0.00000 - acc: 0.0000 -- iter: 00100/43200
Training Step: 2 | total loss: 0.67871
| Adam | epoch: 000 | loss: 0.67871 - acc: 0.0455 -- iter: 00200/43200
Training Step: 3 | total loss: 33.14599
| Adam | epoch: 000 | loss: 33.14599 - acc: 0.0082 -- iter: 00300/43200
Training Step: 4 | total loss: 28.01067
| Adam | epoch: 000 | loss: 28.01067 - acc: 0.0021 -- iter: 00400/43200
Training Step: 5 | total loss: 17.35706
| Adam | epoch: 000 | loss: 17.35706 - acc: 0.0006 -- iter: 00500/43200
Training Step: 6 | total loss: 9.73368
| Adam | epoch: 000 | loss: 9.73368 - acc: 0.0002 -- iter: 00600/43200
Training Step: 7 | total loss: 5.19867
| Adam | epoch: 000 | loss: 5.19867 - acc: 0.0001 -- iter: 00700/43200
Training Step: 8 | total loss: 3.54779
| Adam | epoch: 000 | loss: 3.54779 - acc: 0.0113 -- iter: 00800/43200
Training Step: 9 | total loss: 3.80998
| Adam | epoch: 000 | loss: 3.80998 - acc: 0.0106 -- iter: 00900/43200
Training Step: 10 | total loss: 4.33370
| Adam | epoch: 000 | loss: 4.33370 - acc: 0.0053 -- iter: 01000/43200
Training Step: 11 | total loss: 4.24100
...
| Adam | epoch: 004 | loss: 0.02448 - acc: 0.1817 -- iter: 42800/43200
Training Step: 2157 | total loss: 0.02633
| Adam | epoch: 004 | loss: 0.02633 - acc: 0.1875 -- iter: 42900/43200
Training Step: 2158 | total loss: 0.02509
| Adam | epoch: 004 | loss: 0.02509 - acc: 0.1688 -- iter: 43000/43200
Training Step: 2159 | total loss: 0.02525
| Adam | epoch: 004 | loss: 0.02525 - acc: 0.1529 -- iter: 43100/43200
Training Step: 2160 | total loss: 0.02695
| Adam | epoch: 005 | loss: 0.02695 - acc: 0.1456 -- iter: 43200/43200
[Image: accuracy/loss curves from TensorBoard]
Any suggestions would be much appreciated.
For any future lurkers -- I solved my own problem by fixing the descent algorithm.
The learning rate of 0.1 I was passing to the Adam optimizer (whose default is 0.001) was far too high; I had to drop it to 0.005 for convergence.
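In code, the fix amounts to lowering the learning rate in the optimizer definition from the question (the other arguments unchanged):

adam = tflearn.optimizers.Adam(learning_rate=0.005, beta1=0.9, beta2=0.999,
                               epsilon=1e-08, use_locking=False, name='Adam')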