Programming a PyTorch neural network with a branch in the flow of information - neural-network

I am trying to program a custom layer in PyTorch. I would like this layer to be fully connected to the previous layer but at the same time I want to feed some information from the input layer, let's say I want it to be fully connected to the first layer as well. For example the 4th layer would be fed the 3rd and 1st layer.
This would make the information flow split at the first layer and one branch would be inserted later into the network.
I have to define the forward in this layer having two inputs
class MyLayer(nn.Module):
    def __init__(self, size_in, size_out):
        super().__init__()
        self.size_in, self.size_out = size_in, size_out
        weights = torch.Tensor(size_out, size_in)
        (... ...)
    def forward(self, first_layer, previous_layer):
        (... ...)
        return output
How can I make this work if I put this layer after, let's say, a normal feed-forward layer which takes only the previous layer's output as input?
Can I use nn.Sequential with this layer?
Thanks!

Just concatenate the input with the output of the previous layers and feed the result to the next layer, for example:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 120)  # suppose your input has 100 features
        self.fc2 = nn.Linear(120, 80)
        self.fc3 = nn.Linear(180, 10)   # 100 (input) + 80 (fc2 output)
    def forward(self, input_layer):
        x = F.relu(self.fc1(input_layer))
        x = F.relu(self.fc2(x))
        # concatenate along the feature axis, not the batch axis
        x = torch.cat((input_layer, x), dim=-1)
        x = self.fc3(x)  # this layer is fed by the input and the previous layer
        return x
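On the nn.Sequential part of the question: nn.Sequential passes a single value from each module to the next, so a layer whose forward takes two inputs is usually called explicitly from the parent module's forward instead. Below is a minimal sketch of that idea. The internals of MyLayer (concatenate, then one linear map), the name BranchedNet and all layer sizes are assumptions made for illustration; they are not taken from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyLayer(nn.Module):
    """Hypothetical two-input layer: fully connected to both of its inputs."""

    def __init__(self, first_size, prev_size, size_out):
        super().__init__()
        # One weight matrix over the concatenated features is equivalent to
        # two separate fully connected maps whose outputs are summed.
        self.linear = nn.Linear(first_size + prev_size, size_out)

    def forward(self, first_layer, previous_layer):
        merged = torch.cat((first_layer, previous_layer), dim=-1)
        return self.linear(merged)


class BranchedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 120)
        self.fc2 = nn.Linear(120, 80)
        self.branch = MyLayer(120, 80, 10)

    def forward(self, x):
        first = F.relu(self.fc1(x))     # kept around for the later branch
        prev = F.relu(self.fc2(first))
        return self.branch(first, prev)


net = BranchedNet()
out = net(torch.randn(4, 100))          # batch of 4 samples
print(out.shape)
```

Keeping a reference to the first layer's activation in forward is what creates the branch; autograd handles the backward pass through both paths.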

Related

How to get the gradient of the specific output of the neural network to the network parameters

I am building a Bayesian neural network, and I need to manually calculate the gradient of each neural network output and update the network parameters.
For example, in the following network, how can I get the gradients of the outputs ag and bg with respect to the network parameters phi, i.e. ∂ag/∂phi and ∂bg/∂phi, and update the parameters with each of them separately?
class encoder(torch.nn.Module):
    def __init__(self, _l_dim, _hidden_dim, _fg_dim):
        super(encoder, self).__init__()
        self.hidden_nn = nn.Linear(_l_dim, _hidden_dim)
        self.ag_nn = nn.Linear(_hidden_dim, _fg_dim)
        self.bg_nn = nn.Linear(_hidden_dim, _fg_dim)
    def forward(self, _lg):
        ag = self.ag_nn(self.hidden_nn(_lg))
        bg = self.bg_nn(self.hidden_nn(_lg))
        return ag, bg
If you want to compute dx/dW, you can use autograd for that: torch.autograd.grad(x, W, grad_outputs=torch.ones_like(x), retain_graph=True). Does that actually accomplish what you're trying to do?
Problem statement
You are looking to compute the gradient of each loss term with respect to its own parameters. Take a model f parametrized by θ_ag and θ_bg. These two parameter sets may overlap: that is the case here, since you have a shared hidden layer. Then f(x; θ_ag, θ_bg) outputs a pair of elements ag and bg, and your loss function is defined as L = L_ag + L_bg.
The terms you want to compute are dL_ag/dθ_ag and dL_bg/dθ_bg, which differ from what a single backward call on L gives you, namely dL/dθ_ag and dL/dθ_bg. The difference shows up on the shared hidden layer, where a single backward call accumulates the contributions of both loss terms.
Implementation
To compute those terms you need two backward passes, collecting the respective gradients after each one. Before starting, here are a couple of things you need to do:
It is useful to make θ_ag and θ_bg easy to access. You can, for example, add these two methods to your model definition:
def ag_params(self):
    return [*self.hidden_nn.parameters(), *self.ag_nn.parameters()]
def bg_params(self):
    return [*self.hidden_nn.parameters(), *self.bg_nn.parameters()]
Assuming you have a loss function loss_fn which outputs two scalar values L_ag and L_bg, here is a mockup for loss_fn:
def loss_fn(ag, bg):
    return ag.mean(), bg.mean()
We will need an optimizer to zero the gradient out, here SGD:
optim = torch.optim.SGD(model.parameters(), lr=1e-3)
Then we can start applying the following method:
Run a forward pass to compute ag and bg, as well as L_ag and L_bg:
>>> ag, bg = model(x)
>>> L_ag, L_bg = loss_fn(ag, bg)
Backpropagate once on L_ag, while retaining the graph:
>>> L_ag.backward(retain_graph=True)
At this point, we can collect dL_ag/dθ_ag on the parameters contained in θ_ag. For example, you could pick the norm of the different parameter gradients using the ag_params function:
>>> pgrad_ag = torch.stack([p.grad.norm()
        for p in model.ag_params() if p.grad is not None])
Next we can proceed with a second backpropagation, this time on L_bg. But before that, we need to clear the gradients so dL_ag/dθ_ag doesn't pollute the next computation:
>>> optim.zero_grad()
Backpropagation on L_bg:
>>> L_bg.backward(retain_graph=True)
Here again, we collect the gradient norms, i.e. the norms of dL_bg/dθ_bg, this time using the bg_params function:
>>> pgrad_bg = torch.stack([p.grad.norm()
        for p in model.bg_params() if p.grad is not None])
Now you have pgrad_ag and pgrad_bg, which hold the gradient norms of dL_ag/dθ_ag and dL_bg/dθ_bg respectively.
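As a possible alternative to the backward / zero_grad sequence above, torch.autograd.grad returns the gradients directly, without writing into the .grad attributes. The sketch below assumes a simplified copy of the encoder (renamed Encoder, with the hidden layer evaluated once), arbitrary layer sizes, and the mean-based mock losses; all of these are illustrative choices rather than part of the original answer.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Simplified stand-in for the encoder in the question."""

    def __init__(self, l_dim, hidden_dim, fg_dim):
        super().__init__()
        self.hidden_nn = nn.Linear(l_dim, hidden_dim)
        self.ag_nn = nn.Linear(hidden_dim, fg_dim)
        self.bg_nn = nn.Linear(hidden_dim, fg_dim)

    def forward(self, lg):
        h = self.hidden_nn(lg)          # shared hidden layer
        return self.ag_nn(h), self.bg_nn(h)


torch.manual_seed(0)
model = Encoder(4, 8, 3)                # sizes chosen arbitrarily
ag, bg = model(torch.randn(5, 4))
L_ag, L_bg = ag.mean(), bg.mean()       # mock scalar losses

ag_params = [*model.hidden_nn.parameters(), *model.ag_nn.parameters()]
bg_params = [*model.hidden_nn.parameters(), *model.bg_nn.parameters()]

# Each call returns a tuple of gradients, one tensor per parameter.
# retain_graph=True keeps the shared part of the graph for the second call.
grads_ag = torch.autograd.grad(L_ag, ag_params, retain_graph=True)
grads_bg = torch.autograd.grad(L_bg, bg_params)

for p, g in zip(ag_params, grads_ag):
    print(tuple(p.shape), g.norm().item())
```

Because nothing is accumulated in .grad, there is no need to zero gradients between the two calls; the returned tensors can then be applied to the parameters by hand.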

How to reduce the dimensions of a tensor with neural networks

I have a 3D tensor of size [100,70,42] (batch, seq_len, features) and I would like to get a tensor of size [100,1,1] by using a neural network based on linear transformations (nn.Linear in PyTorch).
I have implemented the following code
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(42, 120)
        self.fc2 = nn.Linear(120,1)
    def forward(self, input):
        model = nn.Sequential(self.fc1,
                              nn.ReLU(),
                              self.fc2)
        output = model(input)
        return output
However, upon training this only gives me an output of the shape [100,70,1], which is not the desired one.
Thanks!
nn.Linear acts only on the last axis. If you want the linear layer to act over the last two dimensions, you must reshape your input tensor so that they form a single axis:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(70 * 42, 120)  # notice input shape
        self.fc2 = nn.Linear(120,1)
    def forward(self, input):
        input = input.reshape((-1, 70 * 42))  # added reshape
        model = nn.Sequential(self.fc1,
                              nn.ReLU(),
                              self.fc2)
        output = model(input)
        output = output.reshape((-1, 1, 1))  # OP asked for 3-dim output
        return output
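The shape behaviour described above can be checked quickly. This is only a sketch with freshly initialised, untrained layers; the variable names are made up for the illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(100, 70, 42)                 # (batch, seq_len, features)

# nn.Linear maps only the last axis, so seq_len is carried through untouched.
per_step = nn.Linear(42, 1)(x)

# Merging seq_len and features first gives one output per sample.
flat = nn.Flatten(start_dim=1)(x)            # shape (100, 70 * 42)
per_sample = nn.Linear(70 * 42, 1)(flat).reshape(-1, 1, 1)

print(per_step.shape, flat.shape, per_sample.shape)
```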

Multiple matrix multiplication loses weight updates

When in forward method I only do one set of torch.add(torch.bmm(x, exp_w), self.b) then my model is back propagating correctly. When I add another layer - torch.add(torch.bmm(out, exp_w2), self.b2) - then the gradients are not updated and the model isn't learning. If I change the activation function from nn.Sigmoid to nn.ReLU then it works with two layers.
Been thinking about this a day now, and not figuring out why it's not working with nn.Sigmoid.
I've tried different learning rates, Loss functions and optimization functions, but no combination seems to work. When I add the weights together before and after training they are the same.
Code:
class MyModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        torch.manual_seed(1)
        super(MyModel, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        hidden_1_dimentsions = 20
        self.w = torch.nn.Parameter(torch.empty(input_dim, hidden_1_dimentsions).uniform_(0, 1))
        self.b = torch.nn.Parameter(torch.empty(hidden_1_dimentsions).uniform_(0, 1))
        self.w2 = torch.nn.Parameter(torch.empty(hidden_1_dimentsions, output_dim).uniform_(0, 1))
        self.b2 = torch.nn.Parameter(torch.empty(output_dim).uniform_(0, 1))
    def activation(self):
        return torch.nn.Sigmoid()
    def forward(self, x):
        x = x.view((x.shape[0], 1, self.input_dim))
        exp_w = self.w.expand(x.shape[0], self.w.size(0), self.w.size(1))
        out = torch.add(torch.bmm(x, exp_w), self.b)
        exp_w2 = self.w2.expand(out.shape[0], self.w2.size(0), self.w2.size(1))
        out = torch.add(torch.bmm(out, exp_w2), self.b2)
        out = self.activation()(out)
        return out.view(x.shape[0])
Besides loss functions, activation functions and learning rates, your parameter initialisation also matters. I suggest you take a look at Xavier initialisation: https://pytorch.org/docs/stable/nn.html#torch.nn.init.xavier_uniform_
Furthermore, for a wide range of problems and network architectures, Batch Normalization, which normalizes your activations to zero mean and unit standard deviation, helps: https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm1d
If you want to know the reason for this, it is mostly the vanishing gradient problem: your gradients become so small that your weights effectively stop being updated. It is common enough to have its own page on Wikipedia: https://en.wikipedia.org/wiki/Vanishing_gradient_problem
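To make the saturation argument concrete, here is a small sketch in plain Python of how quickly the sigmoid's derivative shrinks as its input grows. With weights and biases drawn from uniform(0, 1) and 20 hidden units, the value fed to the sigmoid is a sum of many positive terms (assuming non-negative inputs of order one), so it can easily land in this flat region. The sample points are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); it peaks at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)


for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z = {z:5.1f}   sigmoid'(z) = {sigmoid_grad(z):.2e}")
```

Every gradient that flows back through the sigmoid is multiplied by this factor, which is why a smaller initialisation scale or normalised activations can help.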

Shared weights for subclass in a siamese model in Tensorflow

I have a problem with the organization of my code in TensorFlow.
I want to implement a siamese model that compares the outputs of two convolutional networks that share the same weights.
I want to create one class to define my convolutional network and another class to define my global model. There seem to be several ways to share weights (lazy loading, using multiple scopes, ...), but how can I do this between several objects?
Are FLAGS useful in my case?
Any help would be appreciated.
I've found it easiest to use tf.variable_scope with reuse=tf.AUTO_REUSE. tf.name_scope is optional, but keeps your graphs clean for tensorboard visualizations.
import tensorflow as tf
def get_logits(image):
    with tf.variable_scope('my_network', reuse=tf.AUTO_REUSE):
        # more complex network probably
        x = image
        x = tf.layers.conv2d(x, 3, 1, activation=tf.nn.relu)
        x = tf.layers.conv2d(x, 3, 1, activation=tf.nn.relu)
        x = tf.layers.flatten(x)
        x = tf.layers.dense(x, 10)
    return x
batch_size = 2
height = 6
width = 6
# dummy images
image1 = tf.zeros((batch_size, height, width, 3), dtype=tf.float32)
image2 = tf.zeros((batch_size, height, width, 3), dtype=tf.float32)
with tf.name_scope('instance1'):
    out1 = get_logits(image1)
print(len(tf.global_variables()))  # 6
with tf.name_scope('instance2'):
    out2 = get_logits(image2)
print(len(tf.global_variables()))  # still 6
I'm unsure of your exact issue with different objects. If you have multiple different objects, just make sure they call the same function.
class MyNetwork(object):
    def __init__(self, name):
        self.name = name
    def get_network_logits(self, image):
        with tf.name_scope(self.name):
            return get_logits(image)
n1 = MyNetwork('instance1')
n2 = MyNetwork('instance2')
# the class defines no __call__, so call the method explicitly
l1 = n1.get_network_logits(image1)
l2 = n2.get_network_logits(image2)

How to determine accuracy with triplet loss in a convolutional neural network

A triplet network (inspired by the "Siamese network") comprises 3 instances of the same feed-forward network (with shared parameters). When fed with 3 samples, the network outputs 2 intermediate values: the L2 (Euclidean) distances between the embedded representations of two of its inputs and the representation of the third.
I'm feeding the network triplets of images: x = anchor image (a standard image), x+ = positive image (an image containing the same object as x, i.e. of the same class as x), and x- = negative image (an image of a different class than x).
I'm using the triplet loss cost function described here.
How do I determine the network's accuracy?
I am assuming that you are working on image retrieval or a similar task.
You should first generate some triplets, either randomly or using a hard (or semi-hard) negative mining method. Then split your triplets into a training set and a validation set.
If you do it this way, you can define your validation accuracy as the proportion of validation triplets in which the feature distance between anchor and positive is smaller than the distance between anchor and negative. You can see an example here which is written in PyTorch.
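The triplet-based accuracy just described can be sketched in a few lines of plain Python. The function name triplet_accuracy and the toy 2-D embeddings are made up for the illustration; in practice the vectors would be the network's embeddings of the validation images.

```python
import math


def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) embedding triplets in which
    the anchor is closer to the positive than to the negative."""
    triplets = list(triplets)
    correct = sum(
        1 for anchor, positive, negative in triplets
        if math.dist(anchor, positive) < math.dist(anchor, negative)
    )
    return correct / len(triplets)


val_triplets = [
    ((0.0, 0.0), (0.1, 0.0), (1.0, 1.0)),   # positive is closer: counted as correct
    ((0.0, 0.0), (2.0, 0.0), (0.5, 0.5)),   # negative is closer: counted as wrong
]
accuracy = triplet_accuracy(val_triplets)
print(accuracy)
```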
Alternatively, you can measure performance directly in terms of your final test metric. For image retrieval, for example, model performance on the test set is typically measured with mean average precision. If you use this metric, you should first define some queries on your validation set together with their corresponding ground-truth images.
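For reference, mean average precision can be sketched as follows in plain Python. This version assumes that each query comes with a ranked list of 0/1 relevance flags (best match first) and that every relevant image appears somewhere in that list; the function names and the toy data are illustrative only.

```python
def average_precision(relevance):
    """relevance: 0/1 flags for one query's ranked results, best match first."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)   # precision at this cut-off
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_query_relevance):
    scores = [average_precision(r) for r in per_query_relevance]
    return sum(scores) / len(scores)


# Two toy queries: relevant results at ranks 1 and 3, then at rank 2.
queries = [[1, 0, 1], [0, 1]]
score = mean_average_precision(queries)
print(score)
```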
Either of these two metrics is fine. Choose whichever fits your case.
I am working on a similar task, using triplet loss for classification. Here is how I combined the triplet loss with a classifier.
First, train your model using the standard triplet loss function for N epochs. Once you are sure that the model (we shall refer to it as the embedding generator) is trained, save its weights, since we will use them below.
Let's say that your embedding generator is defined as:
class EmbeddingNetwork(nn.Module):
    def __init__(self):
        super(EmbeddingNetwork, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 64, (7,7), stride=(2,2), padding=(3,3)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.001),
            nn.MaxPool2d((3, 3), 2, padding=(1,1))
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(64,64,(1,1), stride=(1,1)),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.001),
            nn.Conv2d(64,192, (3,3), stride=(1,1), padding=(1,1)),
            nn.BatchNorm2d(192),
            nn.LeakyReLU(0.001),
            nn.MaxPool2d((3,3),2, padding=(1,1))
        )
        self.fullyConnected = nn.Sequential(
            # conv2 outputs 192 channels (the original had 7*7*256);
            # assumes a 7x7 feature map after conv2
            nn.Linear(7*7*192,32*128),
            nn.BatchNorm1d(32*128),
            nn.LeakyReLU(0.001),
            nn.Linear(32*128,128)
        )
    def forward(self,x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.flatten(start_dim=1)  # (N, C, H, W) -> (N, C*H*W) for nn.Linear
        x = self.fullyConnected(x)
        return torch.nn.functional.normalize(x, p=2, dim=-1)
Now we shall use this embedding generator to build a classifier: load the weights we saved before into this part of the network, then freeze it so that training the classifier does not interfere with the triplet model. This can be done as:
class classifierNet(nn.Module):
    def __init__(self, EmbeddingNet):
        super(classifierNet, self).__init__()
        self.embeddingLayer = EmbeddingNet
        self.classifierLayer = nn.Linear(128,62)
        self.dropout = nn.Dropout(0.5)
    def forward(self, x):
        x = self.dropout(self.embeddingLayer(x))
        x = self.classifierLayer(x)
        return F.log_softmax(x, dim=1)
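Now we load the weights we saved before and freeze them using:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
embeddingNetwork = EmbeddingNetwork().to(device)
embeddingNetwork.load_state_dict(torch.load('embeddingNetwork.pt', map_location=device))
for param in embeddingNetwork.parameters():
    param.requires_grad = False  # freeze the embedding weights
classifierNetwork = classifierNet(embeddingNetwork).to(device)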
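Now train this classifier network with a standard classification loss. Note that forward already applies log_softmax, so the matching PyTorch loss is nn.NLLLoss; nn.CrossEntropyLoss expects raw logits and would apply log_softmax a second time.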
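A single training step for the classifier head might look like the sketch below. To keep it self-contained it uses a tiny stand-in for the embedding network (one linear layer on 28x28 inputs), random data, and the Adam optimizer; all of these are assumptions for illustration and not the author's setup. nn.NLLLoss is used because the head's output goes through log_softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the pretrained embedding generator (hypothetical, untrained).
embedding = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128))
for p in embedding.parameters():
    p.requires_grad = False             # frozen: receives no gradient

classifier = nn.Linear(128, 62)         # 62 classes, as in the answer
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
criterion = nn.NLLLoss()                # expects log-probabilities

images = torch.randn(8, 1, 28, 28)      # random stand-in batch
labels = torch.randint(0, 62, (8,))

before = [p.clone() for p in embedding.parameters()]

log_probs = F.log_softmax(classifier(embedding(images)), dim=1)
loss = criterion(log_probs, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(loss.item())
```

Passing only the classifier's parameters to the optimizer is a second safeguard: even if a frozen parameter were unfrozen by mistake, the optimizer would not update it.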