How do I implement a cost function for an autoencoder in Keras that depends on the dataset labels? The examples in this dataset have labels 0 and 1. I wrote the version below, but I do not know if it is correct.
def loss_function(x):
    def function(y_true, y_pred):
        for i in range(batch_size):
            if x[i] == 0:
                print('valorA', x[i])
                L1 = K.mean(K.square(y_pred[i] - y_true[i]))
            else:
                print('valorB', x[i])
                L2 = K.mean(K.square(y_pred[i] - y_true[i]))
        return L1 + L2
    return function
Compiling the autoencoder:
autoencoder.compile(loss=loss_function(labeltr),optimizer='adam')
I am building a Bayesian neural network, and I need to manually calculate the gradient of each neural network output and update the network parameters.
For example, in the following network, how can I get the gradients of the network outputs ag and bg with respect to the network parameters phi, i.e. ∂ag/∂phi and ∂bg/∂phi, and update the parameters accordingly?
class encoder(torch.nn.Module):
    def __init__(self, _l_dim, _hidden_dim, _fg_dim):
        super(encoder, self).__init__()
        self.hidden_nn = nn.Linear(_l_dim, _hidden_dim)
        self.ag_nn = nn.Linear(_hidden_dim, _fg_dim)
        self.bg_nn = nn.Linear(_hidden_dim, _fg_dim)

    def forward(self, _lg):
        ag = self.ag_nn(self.hidden_nn(_lg))
        bg = self.bg_nn(self.hidden_nn(_lg))
        return ag, bg
If you want to compute dx/dW, you can use autograd for that: torch.autograd.grad(x, W, grad_outputs=torch.ones_like(x), retain_graph=True). Does that actually accomplish what you're trying to do?
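As a rough sketch of that suggestion applied to the network from the question: the class below is a trimmed copy of the asker's encoder, and the layer sizes, the random input, and the helper name grad_by_name are assumptions made for illustration. torch.autograd.grad returns the gradients directly instead of accumulating them in .grad, and allow_unused=True is needed because ag does not depend on the bg_nn parameters.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    # Trimmed copy of the network from the question (sizes are arbitrary here).
    def __init__(self, l_dim, hidden_dim, fg_dim):
        super().__init__()
        self.hidden_nn = nn.Linear(l_dim, hidden_dim)
        self.ag_nn = nn.Linear(hidden_dim, fg_dim)
        self.bg_nn = nn.Linear(hidden_dim, fg_dim)

    def forward(self, lg):
        h = self.hidden_nn(lg)
        return self.ag_nn(h), self.bg_nn(h)

torch.manual_seed(0)
model = Encoder(4, 8, 3)
ag, bg = model(torch.randn(5, 4))

names, params = zip(*model.named_parameters())
# Gradient of sum(ag) with respect to every parameter.
# Parameters that ag does not depend on (bg_nn.*) come back as None.
grads = torch.autograd.grad(
    ag, params,
    grad_outputs=torch.ones_like(ag),
    retain_graph=True,
    allow_unused=True,
)
grad_by_name = dict(zip(names, grads))
```

Note that grad_outputs=torch.ones_like(ag) sums over all entries of ag, so this gives the gradient of that sum rather than a full Jacobian.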
Problem statement
You are looking to compute the gradients of the parameters corresponding to each loss term. You are given a model f, parametrized by θ_ag and θ_bg. These two parameter sets might overlap: that's the case here, since you have a shared hidden layer. Then f(x; θ_ag, θ_bg) outputs a pair of elements ag and bg. Your loss function is defined as L = L_ag + L_bg.
The terms you want to compute are dL_ag/dθ_ag and dL_bg/dθ_bg, which differ from what you would typically get with a single backward call on L, namely dL/dθ_ag and dL/dθ_bg.
Implementation
In order to compute those terms, you will need two backward passes; after each of them we collect the respective term. Before starting, here are a couple of things you need to do:
It will be useful to make θ_ag and θ_bg available to us. You can, for example, add those two functions in your model definition:
def ag_params(self):
    return [*self.hidden_nn.parameters(), *self.ag_nn.parameters()]

def bg_params(self):
    return [*self.hidden_nn.parameters(), *self.bg_nn.parameters()]
Assume you have a loss function loss_fn which outputs two scalar values L_ag and L_bg. Here is a mockup for loss_fn:
def loss_fn(ag, bg):
    return ag.mean(), bg.mean()
We will need an optimizer to zero the gradient out, here SGD:
optim = torch.optim.SGD(model.parameters(), lr=1e-3)
Then we can start applying the following method:
Run a forward pass to compute ag and bg, as well as L_ag and L_bg:
>>> ag, bg = model(x)
>>> L_ag, L_bg = loss_fn(ag, bg)
Backpropagate once on L_ag, while retaining the graph:
>>> L_ag.backward(retain_graph=True)
At this point, we can collect dL_ag/dθ_ag on the parameters contained in θ_ag. For example, you could pick the norm of the different parameter gradients using the ag_params function:
>>> pgrad_ag = torch.stack([p.grad.norm()
...     for p in model.ag_params() if p.grad is not None])
Next we can proceed with a second backpropagation, this time on L_bg. But before that, we need to clear the gradients so dL_ag/dθ_ag doesn't pollute the next computation:
>>> optim.zero_grad()
Backpropagation on L_bg:
>>> L_bg.backward(retain_graph=True)
Here again, we collect the gradient norms, i.e. the norms of dL_bg/dθ_bg, this time using the bg_params function:
>>> pgrad_bg = torch.stack([p.grad.norm()
...     for p in model.bg_params() if p.grad is not None])
Now you have pgrad_ag and pgrad_bg, which correspond to the gradient norms of dL_ag/dθ_ag and dL_bg/dθ_bg respectively.
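Putting the steps above together, here is a minimal end-to-end sketch. The layer sizes and the random input are assumptions chosen only so the snippet runs on its own; the class is the question's encoder with the two helper methods added.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self, l_dim, hidden_dim, fg_dim):
        super().__init__()
        self.hidden_nn = nn.Linear(l_dim, hidden_dim)
        self.ag_nn = nn.Linear(hidden_dim, fg_dim)
        self.bg_nn = nn.Linear(hidden_dim, fg_dim)

    def forward(self, lg):
        h = self.hidden_nn(lg)
        return self.ag_nn(h), self.bg_nn(h)

    def ag_params(self):
        return [*self.hidden_nn.parameters(), *self.ag_nn.parameters()]

    def bg_params(self):
        return [*self.hidden_nn.parameters(), *self.bg_nn.parameters()]

def loss_fn(ag, bg):
    # Mock loss: one scalar per output head.
    return ag.mean(), bg.mean()

torch.manual_seed(0)
model = Encoder(4, 8, 3)  # arbitrary sizes for illustration
optim = torch.optim.SGD(model.parameters(), lr=1e-3)

ag, bg = model(torch.randn(5, 4))
L_ag, L_bg = loss_fn(ag, bg)

# First pass: gradients of L_ag only.
L_ag.backward(retain_graph=True)
pgrad_ag = torch.stack([p.grad.norm() for p in model.ag_params()
                        if p.grad is not None])

# Clear gradients so L_ag does not leak into the second pass.
optim.zero_grad()

# Second pass: gradients of L_bg only.
L_bg.backward()
pgrad_bg = torch.stack([p.grad.norm() for p in model.bg_params()
                        if p.grad is not None])
```

Each of pgrad_ag and pgrad_bg holds four norms here (hidden weight, hidden bias, head weight, head bias). The second backward call does not need retain_graph=True unless you plan to backpropagate through the same graph again.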
I am implementing semantic segmentation in Keras, and I wrote my loss function as in the paper "Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations" (link: https://arxiv.org/abs/1707.03237) to balance each class. My data are organized as (batch_size, ImDim1, ImDim2, Nclasses).
My loss function is:
eps = 1e-3

def dice(y_true, y_pred):
    weights = 1./K.sum(y_true, axis=[0,1,2])
    weights = weights/K.sum(weights)
    num = K.sum(weights*K.sum(y_true*y_pred, axis=[0,1,2]))
    den = K.sum(weights*K.sum(y_true+y_pred, axis=[0,1,2]))
    return 2.*(num+eps)/(den+eps)

def dice_loss(y_true, y_pred):
    return 1-dice(y_true, y_pred)
This looks correct to me, but the loss function returns nan and I do not understand why.
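One possible cause, assuming some class has no pixels in a given batch: the per-class sum of y_true is then 0, the weight becomes inf, and normalising by the sum of the weights gives inf/inf = nan. The NumPy sketch below mirrors the arithmetic of the function above on a made-up batch; the guard argument is a hypothetical addition for illustration, not part of the original code.

```python
import numpy as np

eps = 1e-3

def generalised_dice(y_true, y_pred, guard):
    # Same arithmetic as the Keras function above, with an optional
    # guard term added to the per-class sums before inverting them.
    class_sums = y_true.sum(axis=(0, 1, 2))
    weights = 1.0 / (class_sums + guard)
    weights = weights / weights.sum()
    num = np.sum(weights * np.sum(y_true * y_pred, axis=(0, 1, 2)))
    den = np.sum(weights * np.sum(y_true + y_pred, axis=(0, 1, 2)))
    return 2.0 * (num + eps) / (den + eps)

# Made-up batch: 1 image of 2x2 pixels, 2 classes, class 1 never occurs.
y_true = np.zeros((1, 2, 2, 2))
y_true[..., 0] = 1.0
y_pred = np.full((1, 2, 2, 2), 0.5)

with np.errstate(divide="ignore", invalid="ignore"):
    unguarded = generalised_dice(y_true, y_pred, guard=0.0)  # nan
guarded = generalised_dice(y_true, y_pred, guard=eps)        # finite
```

If this is what happens in your data, adding a small constant to K.sum(y_true, axis=[0,1,2]) before taking the reciprocal should avoid the nan, though it does change the weighting slightly.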
A triplet network (inspired by the "Siamese network") comprises 3 instances of the same feed-forward network (with shared parameters). When fed 3 samples, the network outputs 2 intermediate values: the L2 (Euclidean) distances between the embedded representations of two of its inputs and the representation of the third.
I'm feeding the network triplets of images: x = the anchor image (a standard image), x+ = the positive image (an image of the same class as x), and x- = the negative image (an image of a different class than x).
I'm using the triplet loss cost function described here.
How do I determine the network's accuracy?
I am assuming that you are working on image retrieval or a similar task.
You should first generate some triplets, either randomly or using a hard (or semi-hard) negative mining method. Then split your triplets into a training set and a validation set.
If you do it this way, you can define your validation accuracy as the proportion of validation triplets in which the feature distance between anchor and positive is less than that between anchor and negative. You can see an example here which is written in PyTorch.
Alternatively, you can measure performance directly in terms of your final testing metric. For example, for image retrieval, we typically measure the performance of a model on the test set using mean average precision. If you use this metric, you should first define some queries on your validation set and their corresponding ground-truth images.
Either of the two metrics above is fine. Choose whichever fits your case.
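The first metric (the fraction of validation triplets where the anchor is closer to the positive than to the negative) could be sketched as follows. The function name triplet_accuracy and the toy 2-D embeddings are made up for illustration; in practice the three tensors would be your network's embeddings of the validation triplets.

```python
import torch
import torch.nn.functional as F

def triplet_accuracy(anchor, positive, negative):
    # Euclidean distance per triplet, then the share of triplets
    # that are ranked correctly (d(a, p) < d(a, n)).
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return (d_pos < d_neg).float().mean().item()

# Toy embeddings: 4 triplets, the third one is ranked incorrectly.
anchor = torch.zeros(4, 2)
positive = torch.tensor([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.5, 0.0]])
negative = torch.tensor([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0], [0.0, 2.0]])

acc = triplet_accuracy(anchor, positive, negative)  # 0.75
```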
I am working on a similar task, using triplet loss for classification. Here is how I combined the triplet loss with a classifier.
First, train your model using the standard triplet loss function for N epochs. Once you are sure that the model (we shall refer to it as the embedding generator) is trained, save the weights, as we will use them below.
Let's say that your embedding generator is defined as:
import torch
import torch.nn as nn

class EmbeddingNetwork(nn.Module):
    def __init__(self):
        super(EmbeddingNetwork, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 64, (7,7), stride=(2,2), padding=(3,3)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.001),
            nn.MaxPool2d((3, 3), 2, padding=(1,1))
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(64,64,(1,1), stride=(1,1)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.001),
            nn.Conv2d(64,192, (3,3), stride=(1,1), padding=(1,1)),
            nn.BatchNorm2d(192),
            nn.LeakyReLU(0.001),
            nn.MaxPool2d((3,3),2, padding=(1,1))
        )
        self.fullyConnected = nn.Sequential(
            # in_features must equal the flattened conv output (channels * height * width);
            # note that conv2 above ends with 192 channels, not 256
            nn.Linear(7*7*256,32*128),
            nn.BatchNorm1d(32*128),
            nn.LeakyReLU(0.001),
            nn.Linear(32*128,128)
        )

    def forward(self,x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = torch.flatten(x, 1)  # (N, C, H, W) -> (N, C*H*W) for the linear layers
        x = self.fullyConnected(x)
        return torch.nn.functional.normalize(x, p=2, dim=-1)
Now we use this embedding generator to build a classifier: load the weights we saved before into this part of the network, then freeze it so that training the classifier does not interfere with the triplet model. This can be done as:
import torch.nn.functional as F

class classifierNet(nn.Module):
    def __init__(self, EmbeddingNet):
        super(classifierNet, self).__init__()
        self.embeddingLayer = EmbeddingNet
        self.classifierLayer = nn.Linear(128,62)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.dropout(self.embeddingLayer(x))
        x = self.classifierLayer(x)
        return F.log_softmax(x, dim=1)
Now we shall load the weights we saved before and freeze them using:
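embeddingNetwork = EmbeddingNetwork().to(device)
embeddingNetwork.load_state_dict(torch.load('embeddingNetwork.pt'))
# freeze the embedding weights so that only the classifier layer is trained
for param in embeddingNetwork.parameters():
    param.requires_grad = False
classifierNetwork = classifierNet(embeddingNetwork)
Now train this classifier network with a standard classification loss. Since forward already returns log-probabilities (log_softmax), nn.NLLLoss is the matching criterion; nn.CrossEntropyLoss applies log_softmax itself and expects raw logits.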
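Here is a small sketch of one training step with the embedding part frozen. TinyEmbedding is a hypothetical stand-in for the trained embedding generator (a single linear layer, so the snippet runs without saved weights), and dropout is left out for brevity; the input size, batch, and labels are random placeholders.

```python
import torch
from torch import nn
import torch.nn.functional as F

class TinyEmbedding(nn.Module):
    # Hypothetical stand-in for the trained embedding generator.
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 128)

    def forward(self, x):
        return F.normalize(self.fc(x), p=2, dim=-1)

class ClassifierNet(nn.Module):
    def __init__(self, embedding_net, n_classes=62):
        super().__init__()
        self.embeddingLayer = embedding_net
        self.classifierLayer = nn.Linear(128, n_classes)

    def forward(self, x):
        x = self.classifierLayer(self.embeddingLayer(x))
        return F.log_softmax(x, dim=1)

torch.manual_seed(0)
embedding = TinyEmbedding()
for p in embedding.parameters():
    p.requires_grad = False  # freeze the embedding weights

model = ClassifierNet(embedding)
# Only hand the trainable (classifier) parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
criterion = nn.NLLLoss()  # pairs with the log_softmax output

x = torch.randn(8, 10)
y = torch.randint(0, 62, (8,))
frozen_before = embedding.fc.weight.clone()
head_before = model.classifierLayer.weight.clone()

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

After the step, the embedding weights are unchanged while the classifier layer has moved. If the real embedding network contains BatchNorm layers, you may also want to keep it in eval() mode during classifier training so its running statistics stay fixed.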
In Keras, when training and evaluating a neural network model that classifies two classes (0 and 1), the model returns loss and accuracy for both training and testing:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
What does this accuracy represent? Is it the mean accuracy for the two classes or the accuracy for one of the two classes?
Accuracy is the number of correctly classified samples divided by the number of all samples. It does not involve any per class accuracies.
Here is for example the code Keras uses to compute the binary accuracy:
K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)
Keras will choose from a list of possible metrics in its code. From the metrics source code, there are five possibilities:
def binary_accuracy(y_true, y_pred):
    return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)

def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())

def sparse_categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.max(y_true, axis=-1),
                          K.cast(K.argmax(y_pred, axis=-1), K.floatx())),
                  K.floatx())

def top_k_categorical_accuracy(y_true, y_pred, k=5):
    return K.mean(K.in_top_k(y_pred, K.argmax(y_true, axis=-1), k), axis=-1)

def sparse_top_k_categorical_accuracy(y_true, y_pred, k=5):
    return K.mean(K.in_top_k(y_pred, K.cast(K.max(y_true, axis=-1), 'int32'), k), axis=-1)
The choice depends on what type of model and loss function you have. In the training module, you see it choosing the accuracy function:
if (output_shape[-1] == 1 or self.loss_functions[i] == losses.binary_crossentropy):
    # case: binary accuracy
    acc_fn = metrics_module.binary_accuracy
elif self.loss_functions[i] == losses.sparse_categorical_crossentropy:
    # case: categorical accuracy with sparse targets
    acc_fn = metrics_module.sparse_categorical_accuracy
else:
    acc_fn = metrics_module.categorical_accuracy
In your model, you have 2 outputs and a categorical_crossentropy loss, so you fall into the third case, and your accuracy will be:
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
In other words, your model expects exactly one class to be true. If the index of the predicted class with the maximum value equals the index of the true class, the sample counts as right.
Example:
predicted: [0.7 ; 0.3] /// true: [1 ; 0] --- counts as right
predicted: [0.8 ; 0.2] /// true: [0 ; 1] --- counts as wrong
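To make this concrete, here is a NumPy re-implementation of the same argmax comparison (a sketch for illustration, not the Keras code itself). It uses the two examples above plus a third made-up sample.

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    # 1.0 where the predicted class index matches the true class index.
    return (np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)).astype(float)

y_true = np.array([[1, 0], [0, 1], [0, 1]])
y_pred = np.array([[0.7, 0.3], [0.8, 0.2], [0.4, 0.6]])

per_sample = categorical_accuracy(y_true, y_pred)  # [1., 0., 1.]
accuracy = per_sample.mean()                       # 2 of 3 samples correct
```

The reported accuracy is the mean of these per-sample values over all samples, regardless of which class each sample belongs to.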
This is a snippet of my model:
W1 = create_base_network(latent_dim)
input_a = Input(shape=(1,latent_dim))
input_b = Input(shape=(1,latent_dim))
x_a = encoder(input_a)
x_b = encoder(input_b)
processed_a = W1(x_a)
processed_b = W1(x_b)
del1 = Lambda(Delta1, output_shape=Delta1_output_shape)([processed_a, processed_b])
model = Model(input=[input_a, input_b], output=del1)
# train
rms = RMSprop()
model.compile(loss='kappa_delta_loss', optimizer=rms)
Basically, the neural net takes a (pre-trained) encoder representation of the two inputs and computes the difference in prediction values for the two inputs by passing them through an MLP. This difference is Delta1, which is the y_pred of the network. I want the loss function to be y_pred*y_true. However, when I do that, I get the error 'Invalid objective: kappa_delta_loss'.
What am I doing wrong?
You almost answered the question yourself. Create your objective function like the ones in https://github.com/fchollet/keras/blob/master/keras/objectives.py, like this:
import theano
import theano.tensor as T

epsilon = 1.0e-9

def custom_objective(y_true, y_pred):
    '''Just another crossentropy'''
    y_pred = T.clip(y_pred, epsilon, 1.0 - epsilon)
    y_pred /= y_pred.sum(axis=-1, keepdims=True)
    cce = T.nnet.categorical_crossentropy(y_pred, y_true)
    return cce

and pass it to the compile argument:
model.compile(loss=custom_objective, optimizer='adadelta')
(from https://github.com/fchollet/keras/issues/369)
So you should create your custom loss function with two arguments, the first being the target and the second your prediction.
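Assuming your output (y_pred) is a scalar per sample, your custom objective could be
from keras import backend as K

def custom_objective(y_true, y_pred):
    # element-wise product y_true*y_pred, averaged over the last axis
    return K.mean(y_true * y_pred, axis=-1)
Here K is the Keras backend (more generic than the Theano example).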
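For a more recent setup, here is a minimal sketch assuming TensorFlow's bundled Keras (tf.keras). The tiny Sequential model and its input size are placeholders, not the asker's network. The key point is the same as above: pass the function object itself to compile, not a string with its name.

```python
import numpy as np
import tensorflow as tf

def kappa_delta_loss(y_true, y_pred):
    # Element-wise product y_true * y_pred, averaged over the last axis.
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.reduce_mean(y_true * y_pred, axis=-1)

# Placeholder model with a single scalar output per sample.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
# Pass the callable, not the string 'kappa_delta_loss'.
model.compile(loss=kappa_delta_loss, optimizer="rmsprop")

# Checking the loss on hand-picked values: one loss value per sample.
y_true = tf.constant([[1.0], [-1.0]])
y_pred = tf.constant([[0.5], [2.0]])
loss_values = kappa_delta_loss(y_true, y_pred).numpy()  # [0.5, -2.0]
```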