Sharing weights between class instances in a siamese model in TensorFlow

I have a problem with the organization of my code in TensorFlow.
I want to implement a siamese model that compares the outputs of two convolutional networks that share the same weights.
I want to create one class to define my convolutional network and another class to define my global model. It seems that there are several ways to share weights (lazy loading, using scopes, ...), but how can I do this between multiple objects?
Are FLAGS useful in my case?
Any help would be appreciated.

I've found it easiest to use tf.variable_scope with reuse=tf.AUTO_REUSE. tf.name_scope is optional, but keeps your graphs clean for TensorBoard visualizations.
import tensorflow as tf

def get_logits(image):
    with tf.variable_scope('my_network', reuse=tf.AUTO_REUSE):
        # more complex network probably
        x = image
        x = tf.layers.conv2d(x, 3, 1, activation=tf.nn.relu)
        x = tf.layers.conv2d(x, 3, 1, activation=tf.nn.relu)
        x = tf.layers.flatten(x)
        x = tf.layers.dense(x, 10)
    return x

batch_size = 2
height = 6
width = 6

# dummy images
image1 = tf.zeros((batch_size, height, width, 3), dtype=tf.float32)
image2 = tf.zeros((batch_size, height, width, 3), dtype=tf.float32)

with tf.name_scope('instance1'):
    out1 = get_logits(image1)
print(len(tf.global_variables()))  # 6

with tf.name_scope('instance2'):
    out2 = get_logits(image2)
print(len(tf.global_variables()))  # still 6
I'm unsure of your exact issue with different objects. If you have multiple different objects, just make sure they call the same function.
class MyNetwork(object):
    def __init__(self, name):
        self.name = name

    def get_network_logits(self, image):
        with tf.name_scope(self.name):
            return get_logits(image)

n1 = MyNetwork('instance1')
n2 = MyNetwork('instance2')

l1 = n1.get_network_logits(image1)
l2 = n2.get_network_logits(image2)

Related

Weights of network stay the same after optimizer step

My network just refuses to train. To make the code easier to read, I've abbreviated some complicated logic; I can post more if needed.
model = DistMultNN()
optimizer = optim.SGD(model.parameters(), lr=0.0001)
for t in range(500):
    e1_neg = sampling_logic()
    e2_neg = sampling_logic()
    e1_pos = sampling_logic()
    r = sampling_logic()
    e2_pos = sampling_logic()
    optimizer.zero_grad()
    y_pred = model(tuple(zip(e1_pos, r, e2_pos)), e1_neg, e2_neg)
    loss = model.loss(y_pred)
    loss.backward()
    optimizer.step()
I define my network as follows:
class DistMultNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.seed = 42
        self.entities_embedding = nn.init.xavier_uniform_(
            torch.zeros((self.NO_ENTITIES, self.ENCODING_DIM), requires_grad=True))
        self.relation_embedding = nn.init.xavier_uniform_(
            torch.zeros((self.NO_RELATIONSHIPS, self.ENCODING_DIM), requires_grad=True))
        self.W = torch.rand(self.ENCODING_DIM, self.ENCODING_DIM, requires_grad=True)  # W is symmetric, todo: requireGrad?
        self.W = (self.W + self.W.t()) / 2
        self.b = torch.rand(self.ENCODING_DIM, 1, requires_grad=True)
        self.lambda_ = 1.
        self.rnn = torch.nn.RNN(input_size=encoding_dim, hidden_size=1, num_layers=1, nonlinearity='relu')
        self.loss_func = torch.nn.LogSoftmax(dim=1)

    def loss(self, y_pred):
        softmax = -1 * self.loss_func(y_pred)
        result = torch.mean(softmax[:, 0])
        result.requires_grad = True
        return result

    def forward(self, samples, e1neg, e2neg):
        batch_size = len(samples)
        batch_result = np.zeros((batch_size, len(e1neg[0]) + 1))
        for datapoint_id in range(batch_size):
            entity_1 = entities_embed_lookup(datapoint_id[0])
            entity_2 = entities_embed_lookup(datapoint_id[2])
            r = relation_embed_lookup(datapoint_id[1])
            x = self.some_fourier_transform(entity_1, r, entity_2)
            batch_result[datapoint_id][0] = self.some_matmul(x)
            for negative_example_id in range(len(e1neg[0])):
                same_thing_with_negative_examples()
                batch_result[datapoint_id][negative_example_id + 1] = self.some_matmul(x)
        batch_result_tensor = torch.tensor(data=batch_result)
        return batch_result_tensor
I tried checking weights using e.g. print(model.rnn.all_weights) in the training loop but they do not change. What did I do wrong?
First of all, result.requires_grad = True should not be needed, and in a correctly built graph it would in fact throw an error, because result would not be a leaf variable.
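As a quick illustration of that leaf-variable rule:

import torch

x = torch.ones(3, requires_grad=True)  # x is a leaf tensor
y = x * 2                              # y is produced by an operation, so it is not a leaf
y.requires_grad = True                 # RuntimeError: you can only change requires_grad flags of leaf variables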
So in your forward at the end you create a new tensor out of a numpy array:
batch_result_tensor = torch.tensor(data=batch_result)
and out of this result you calculate the loss and want to backward it. This doesn't work because batch_result_tensor is not part of any computation graph needed to calculate a gradient. You can't just mix numpy and torch this way.
The forward function has to consist of operations on torch tensors that require grad if you want to update and optimize them. In the default case you have layers whose weight tensors require grad; you pass an input through the layers, the computational graph is built, and all the operations are recorded in it.
So I would start by making batch_result a torch tensor, and remove batch_result_tensor = torch.tensor(data=batch_result) and result.requires_grad = True. You might have to change more.
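As a rough sketch of that direction: keep everything in torch and stack the per-example scores instead of writing them into a numpy array. Here score and score_negative are hypothetical stand-ins for the some_fourier_transform/some_matmul logic from the question, and must themselves be built only from torch operations:

import torch

def forward(self, samples, e1neg, e2neg):
    rows = []
    for i, datapoint in enumerate(samples):
        scores = [self.score(datapoint)]                # positive score, a 0-d torch tensor
        for j in range(len(e1neg[0])):                  # one score per negative example
            scores.append(self.score_negative(datapoint, e1neg, e2neg, i, j))
        rows.append(torch.stack(scores))                # shape: (num_negatives + 1,)
    return torch.stack(rows)                            # shape: (batch_size, num_negatives + 1)

This way the returned tensor carries a grad_fn, so loss.backward() can reach the embeddings and the RNN weights.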

Seemingly inconsistent tensor sizes in pytorch

I'm building a convolutional autoencoder, but want the encoding to be in a linear form so I can more easily feed it as input into an MLP. I have two convolutional layers on the encoder along with a linear inner layer to reduce dimension. This encoding is then fed into the corresponding decoder.
When I flatten the output of the second convolutional layer, my calculation (using the standard formula from: Calculate the Output size in Convolution layer) says it should come out to a 1x100352 rank 1 tensor. However, when I set the input dimension of the linear layer to be 100352, the flattened rank 1 tensor has dimension 1x50176. Then comes the weird part.
I tried changing the input dimension of the linear layer to be 50176, assuming I had miscalculated. When I do this, the reshaped rank 1 tensor confusingly becomes 1x100352, and then the aforementioned weight matrix becomes 50176x256 as expected.
This response to modifying the linear layer's input dimension doesn't make sense to me. That hyperparameter controls the weight matrix correctly, but I guess I'm uncertain why it has any bearing on the linear layer's input since that's just a reshaped tensor output from a convolutional layer whose hyperparameters are unrelated to the hyperparameter in question.
I apologize if I'm just missing something obvious. I'm very new to pytorch, and I couldn't find any other posts which discussed this sort of issue.
Here's what I believe to be the minimal reproducible example:
import os
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.utils import save_image
class convAutoEncoder(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        # Creating network structure
        # Encoder portion of autoencoder
        self.enc1 = nn.Conv2d(in_channels=kwargs["inputChannels"], out_channels=kwargs["channelsEncoderMid"], kernel_size=kwargs["kernelSize"])
        self.enc2 = nn.Conv2d(in_channels=kwargs["channelsEncoderMid"], out_channels=kwargs["channelsEncoderInner"], kernel_size=kwargs["kernelSize"])
        self.enc3 = nn.Linear(in_features=kwargs["intoLinear"], out_features=kwargs["linearEncoded"])
        # Decoder portion of autoencoder
        self.dec1 = nn.Linear(in_features=kwargs["linearEncoded"], out_features=kwargs["intoLinear"])
        self.dec2 = nn.ConvTranspose2d(in_channels=kwargs["channelsEncoderInner"], out_channels=kwargs["channelsDecoderMid"], kernel_size=kwargs["kernelSize"])
        self.dec3 = nn.ConvTranspose2d(in_channels=kwargs["channelsDecoderMid"], out_channels=kwargs["inputChannels"], kernel_size=kwargs["kernelSize"])

    def forward(self, x):
        # Encoding
        x = F.relu(self.enc1(x))
        x = F.relu(self.enc2(x))
        x = x.reshape(1, -1)
        x = x.squeeze()
        x = F.relu(self.enc3(x))
        # Decoding
        x = F.relu(self.dec1(x))
        x = x.reshape([32, 4, 28, 28])
        x = F.relu(self.dec2(x))
        x = F.relu(self.dec3(x))
        return x

def encodeDecodeConv(numEpochs=20, input_Channels=3, batchSize=32,
                     channels_Encoder_Inner=4, channels_Encoder_Mid=8, into_Linear=100352,
                     linear_Encoded=256, channels_Decoder_Mid=8, kernel_Size=3,
                     learningRate=1e-3):
    # Pick a device. If GPU available, use that. Otherwise, use CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Define data transforms
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    # Define training dataset
    trainSet = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    # Define testing dataset
    testSet = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    # Define data loaders
    trainLoader = DataLoader(trainSet, batch_size=batchSize, shuffle=True)
    testLoader = DataLoader(testSet, batch_size=batchSize, shuffle=True)
    # Initialize neural network
    model = convAutoEncoder(inputChannels=input_Channels, channelsEncoderMid=channels_Encoder_Mid, channelsEncoderInner=channels_Encoder_Inner, intoLinear=into_Linear, linearEncoded=linear_Encoded, channelsDecoderMid=channels_Decoder_Mid, kernelSize=kernel_Size)
    # Optimization setup
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=learningRate)
    lossTracker = []
    for epoch in range(numEpochs):
        loss = 0
        for data, _ in trainLoader:
            data = data.to(device)
            optimizer.zero_grad()
            outputs = model(data)
            train_loss = criterion(outputs, data)
            train_loss.backward()
            optimizer.step()
            loss += train_loss.item()
        loss = loss / len(trainLoader)
        print('Epoch {} of {}, Train loss: {:.3f}'.format(epoch + 1, numEpochs, loss))

encodeDecodeConv()
Edit 2: Somewhere in the CIFAR10 dataset, the data appears to change dimension. After adding more print statements, I discovered that setting the relevant hyperparameter to 100352 works for most entries, but then seemingly one image pops up that has a different size. Not sure why that would occur, though.
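A plausible explanation, judging only from the numbers in the code above: with kernel_Size=3 and no padding, each 32x32 CIFAR10 image leaves enc2 as 4x28x28 = 3136 values, so x.reshape(1, -1) flattens a full batch of 32 into 32 * 3136 = 100352 values; but CIFAR10's 50000 training images do not divide evenly by 32, so the last batch holds only 16 images, which flattens to 50176, exactly the other size reported. Under that assumption, a sketch of a fix is to flatten per sample rather than per batch (and size the linear layers per sample, i.e. intoLinear = 3136):

# inside forward, replacing the reshape/squeeze lines
x = x.reshape(x.size(0), -1)      # shape: (batch, 3136), works for any batch size
x = F.relu(self.enc3(x))
x = F.relu(self.dec1(x))
x = x.reshape(-1, 4, 28, 28)      # restore the conv shape, batch dimension preserved

Alternatively, DataLoader(..., drop_last=True) would discard the short final batch, though per-sample flattening is the more robust fix.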

Multiple matrix multiplication loses weight updates

When in the forward method I only apply one layer, torch.add(torch.bmm(x, exp_w), self.b), my model backpropagates correctly. When I add another layer, torch.add(torch.bmm(out, exp_w2), self.b2), the gradients are not updated and the model isn't learning. If I change the activation function from nn.Sigmoid to nn.ReLU it works with two layers.
I've been thinking about this for a day now and can't figure out why it isn't working with nn.Sigmoid.
I've tried different learning rates, loss functions and optimizers, but no combination seems to work. The weights are the same before and after training.
Code:
class MyModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        torch.manual_seed(1)
        super(MyModel, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        hidden_1_dimentsions = 20
        self.w = torch.nn.Parameter(torch.empty(input_dim, hidden_1_dimentsions).uniform_(0, 1))
        self.b = torch.nn.Parameter(torch.empty(hidden_1_dimentsions).uniform_(0, 1))
        self.w2 = torch.nn.Parameter(torch.empty(hidden_1_dimentsions, output_dim).uniform_(0, 1))
        self.b2 = torch.nn.Parameter(torch.empty(output_dim).uniform_(0, 1))

    def activation(self):
        return torch.nn.Sigmoid()

    def forward(self, x):
        x = x.view((x.shape[0], 1, self.input_dim))
        exp_w = self.w.expand(x.shape[0], self.w.size(0), self.w.size(1))
        out = torch.add(torch.bmm(x, exp_w), self.b)
        exp_w2 = self.w2.expand(out.shape[0], self.w2.size(0), self.w2.size(1))
        out = torch.add(torch.bmm(out, exp_w2), self.b2)
        out = self.activation()(out)
        return out.view(x.shape[0])
Besides loss functions, activation functions and learning rates, your parameter initialisation is also important. I suggest you take a look at Xavier initialisation: https://pytorch.org/docs/stable/nn.html#torch.nn.init.xavier_uniform_
Furthermore, for a wide range of problems and network architectures, Batch Normalization, which ensures that your activations have zero mean and unit standard deviation, helps: https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm1d
If you are interested in the reason for this, it's mostly due to the vanishing gradient problem, which means that your gradients get so small that your weights don't get updated. It's so common that it has its own page on Wikipedia: https://en.wikipedia.org/wiki/Vanishing_gradient_problem
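As a rough sketch of both suggestions applied to a model shaped like the one in the question (the bmm/expand is replaced by a plain matmul for brevity):

import torch
import torch.nn as nn

class MyModelXavier(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim=20):
        super().__init__()
        # Xavier-initialised weights instead of uniform_(0, 1)
        self.w = nn.Parameter(nn.init.xavier_uniform_(torch.empty(input_dim, hidden_dim)))
        self.b = nn.Parameter(torch.zeros(hidden_dim))
        self.w2 = nn.Parameter(nn.init.xavier_uniform_(torch.empty(hidden_dim, output_dim)))
        self.b2 = nn.Parameter(torch.zeros(output_dim))
        # normalises hidden activations to roughly zero mean, unit variance
        self.bn = nn.BatchNorm1d(hidden_dim)

    def forward(self, x):                       # x: (batch, input_dim)
        hidden = self.bn(x @ self.w + self.b)
        return torch.sigmoid(hidden @ self.w2 + self.b2)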

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf

def convolve1d(x):
    y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
    return y

x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x, ker])
model = Model([x, ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected because Keras and TensorFlow don't expect any batch dimension in the convolution kernel, so I had to write the loop over the batch dimension myself, which requires specifying batch_shape instead of just shape in the Input layer. Here it is:
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda

def convolve1d(x):
    input, kernel = x
    output_list = []
    if K.image_data_format() == 'channels_last':
        kernel = K.expand_dims(kernel, axis=-2)
    else:
        kernel = K.expand_dims(kernel, axis=0)
    for i in range(batch_size):  # Loop over batch dimension
        output_temp = tf.nn.conv1d(value=input[i:i+1, :, :],
                                   filters=kernel[i, :, :],
                                   padding='VALID',
                                   stride=1)
        output_list.append(output_temp)
        print(K.int_shape(output_temp))
    return K.concatenate(output_list, axis=0)

batch_size = 1  # must be fixed ahead of time, since we loop over it
batch_input_shape = (batch_size, 100, 1)
batch_kernel_shape = (batch_size, 5, 1)

x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x, ker])
model = Model([x, ker], [y])

a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In its current state:
It doesn't work for inputs (x) with multiple channels.
If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From the given code it is difficult to tell what you mean when you say "is it possible". But if what you mean is to merge two layers and feed the merged layer to a convolution, then yes, it is possible.
x = Input(shape=(100,))
ker = Input(shape=(5,))
merged = keras.layers.concatenate([x, ker], axis=-1)
y = K.conv1d(merged, 'same')
model = Model([x, ker], y)
EDIT:
#user2179331 thanks for clarifying your intention. You are now using the Lambda class incorrectly, which is why the error message shows up.
But what you are trying to do can be achieved using the keras.backend functions.
Note, though, that when using these lower-level functions you lose some higher-level abstraction. E.g. when using keras.backend.conv1d you need an input of shape (BATCH_SIZE, width, channels) and a kernel of shape (kernel_size, input_channels, output_channels). So in your case let us assume x has 1 channel (input_channels == 1) and y also has the same number of channels (output_channels == 1).
So your code can now be refactored as follows:
from keras import backend as K

def convolve1d(x, kernel):
    y = K.conv1d(x, kernel, padding='valid', strides=1, data_format="channels_last")
    return y

input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100

ker = K.variable(K.random_uniform([kernel_width, input_channels, output_channels]), K.floatx())
x = Input(shape=(input_width, input_channels))
y = convolve1d(x, ker)
I think I understand what you mean. Consider the (incorrect) example code below:
input_signal = Input(shape=(L,), name='input_signal')
input_h = Input(shape=(N,), name='input_h')
faded = Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
You want to convolve each signal vector with a different fading-coefficient vector.
The 'conv' operation in TensorFlow, e.g. tf.nn.conv1d, only supports a fixed kernel. Therefore, the code above cannot do what you want.
I don't have a better idea either. The code you gave runs, but it is complex and inefficient. One feasible, though also inefficient, alternative is to multiply by the Toeplitz matrix whose rows are shifted copies of the fading-coefficient vector. When the signal vector is very long, that matrix becomes extremely large.
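For what it's worth, a small numpy sketch of that Toeplitz idea for a single-channel 'valid' convolution: each row of T is the kernel shifted by one position, so the convolution becomes a matrix-vector product (with per-sample kernels you would build one such matrix per batch element and use a batched matmul).

import numpy as np

def conv_toeplitz(kernel, input_len):
    k = len(kernel)
    out_len = input_len - k + 1                  # 'valid' output length
    T = np.zeros((out_len, input_len))
    for i in range(out_len):
        T[i, i:i + k] = kernel                   # shifted copy of the kernel
    return T

x = np.random.rand(100)
ker = np.array([1., 2., 3., 0., -1.])
# matches the cross-correlation that tf.nn.conv1d computes
assert np.allclose(conv_toeplitz(ker, 100) @ x,
                   np.convolve(x, ker[::-1], mode='valid'))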

How to build a recurrent neural net in Keras where each input goes through a layer first?

I'm trying to build a neural net in Keras that would look like this:
Where x_1, x_2, ... are input vectors that undergo the same transformation f. f is itself a layer whose parameters must be learned. The sequence length n is variable across instances.
I'm having trouble understanding two things here:
What should the input look like?
I'm thinking of a 2D tensor with shape (number_of_x_inputs, x_dimension), where x_dimension is the length of a single vector x. Can such a 2D tensor have a variable shape? I know tensors can have variable shapes for batch processing, but I don't know if that helps me here.
How do I pass each input vector through the same transformation before feeding it to the RNN layer?
Is there a way to extend, for example, a GRU so that an f layer is applied before going through the actual GRU cell?
I'm not an expert, but I hope this helps.
Question 1:
Vectors x1, x2... xn can have different shapes, but I'm not sure if the instances of x1 can have different shapes. When I have different shapes I usually pad the short sequences with 0s.
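For example, with keras.preprocessing.sequence.pad_sequences (which pre-pads with zeros by default):

from keras.preprocessing.sequence import pad_sequences

seqs = [[1, 2, 3], [4, 5], [6]]
print(pad_sequences(seqs, maxlen=4))
# [[0 1 2 3]
#  [0 0 4 5]
#  [0 0 0 6]]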
Question 2:
I'm not sure about extending a GRU, but I would do something like this:
from keras.layers import Input, Conv1D, LSTM, Dense, concatenate
from keras.models import Model

x_dims = [50, 40, 30, 20, 10]
n = 5

def network():
    shared_f = Conv1D(5, 3, activation='relu')
    shared_LSTM = LSTM(10)
    inputs = []
    to_concat = []
    for i in range(n):
        x_i = Input(shape=(x_dims[i], 1), name='x_' + str(i))
        inputs.append(x_i)
        step1 = shared_f(x_i)
        to_concat.append(shared_LSTM(step1))
    merged = concatenate(to_concat)
    final = Dense(2, activation='softmax')(merged)
    model = Model(inputs=inputs, outputs=[final])
    model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
    return model

m = network()
In this example, I used a Conv1D as the shared f transformation, but you could use something else (Embedding, etc.).
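And a quick smoke test of the sketch above with random dummy data (the shapes are just assumptions implied by x_dims; the one-hot labels match the Dense(2, activation='softmax') output):

import numpy as np

xs = [np.random.rand(8, d, 1) for d in x_dims]        # batch of 8 per input branch
ys = np.eye(2)[np.random.randint(0, 2, size=8)]       # dummy one-hot targets
m.fit(xs, ys, epochs=1, batch_size=4)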