ValueError: Target size (torch.Size([128])) must be the same as input size (torch.Size([112])) - neural-network

I have a training function, in which inside there are two vectors:
d_labels_a = torch.zeros(128)
d_labels_b = torch.ones(128)
Then I have these features:
# Compute output
features_a = nets[0](input_a)
features_b = nets[1](input_b)
features_c = nets[2](inputs)
And then a domain classifier (nets[4]) makes predictions:
d_pred_a = torch.squeeze(nets[4](features_a))
d_pred_b = torch.squeeze(nets[4](features_b))
d_pred_a = d_pred_a.float()
d_pred_b = d_pred_b.float()
print(d_pred_a.shape)
The error raises in the loss function: ` pred_a = torch.squeeze(nets3)
pred_b = torch.squeeze(nets3)
pred_c = torch.squeeze(nets3)
loss = criterion(pred_a, labels_a) + criterion(pred_b, labels_b) + criterion(pred_c, labels) + d_criterion(d_pred_a, d_labels_a) + d_criterion(d_pred_b, d_labels_b)
The problem is that d_pred_a/b is different from d_labels_a/b, but only after a certain point. Indeed, when I print the shape of d_pred_a/b it istorch.Size([128])but then it changes totorch.Size([112])` independently.
It comes from here:
# Compute output
features_a = nets[0](input_a)
features_b = nets[1](input_b)
features_c = nets[2](inputs)
because if I print the shape of features_a is torch.Size([128, 2048]) but it changes into torch.Size([112, 2048])
nets[0] is a VGG, like this:
class VGG16(nn.Module):
def __init__(self, input_size, batch_norm=False):
super(VGG16, self).__init__()
self.in_channels,self.in_width,self.in_height = input_size
self.block_1 = VGGBlock(self.in_channels,64,batch_norm=batch_norm)
self.block_2 = VGGBlock(64, 128,batch_norm=batch_norm)
self.block_3 = VGGBlock(128, 256,batch_norm=batch_norm)
self.block_4 = VGGBlock(256,512,batch_norm=batch_norm)
#property
def input_size(self):
return self.in_channels,self.in_width,self.in_height
def forward(self, x):
x = self.block_1(x)
x = self.block_2(x)
x = self.block_3(x)
x = self.block_4(x)
# x = self.avgpool(x)
x = torch.flatten(x,1)
return x

I solved. The problem was the last batch. I used drop_last=True in the dataloader and It worked.

Related

How to use nn.MultiheadAttention together with nn.LSTM?

I'm trying to build a Pytorch network for image captioning.
Currently I have a working network of Encoder and Decoder, and I want to add nn.MultiheadAttnetion layer to it (to be used as self attention).
Currently my decode looks like this:
class Decoder(nn.Module):
def __init__(self, hidden_size, embed_dim, vocab_size, layers = 1):
super(Decoder, self).__init__()
self.embed_dim = embed_dim
self.vocab_size = vocab_size
self.layers = layers
self.hidden_size = hidden_size
self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
self.lstm = nn.LSTM(input_size = embed_dim, hidden_size = hidden_size, batch_first = True, num_layers = layers)
#self.attention = nn.MultiheadAttention(hidden_size, num_heads=1, batch_first= True)
self.fc = nn.Linear(hidden_size, self.vocab_size)
def init_hidden(self, batch_size):
h = torch.zeros(self.layers, batch_size, self.hidden_size).to(device)
c = torch.zeros(self.layers, batch_size, self.hidden_size).to(device)
return h,c
def forward(self, features, caption):
batch_size = caption.size(0)
caption_size = caption.size(1)
h,c = self.init_hidden(batch_size)
embeddings = self.embedding(caption)
lstm_input = torch.cat((features.unsqueeze(1), embeddings[:,:-1,:]), dim=1)
output, (h,c) = self.lstm(lstm_input, (h,c))
#output, _ = self.attention(output, output, output)
output = self.fc(output)
return output
def generate_caption(self, features, max_caption_size = MAX_LEN):
h,c = self.init_hidden(1)
caption = ""
embeddings = features.unsqueeze(1)
for i in range(max_caption_size):
output, (h, c) = self.lstm(embeddings, (h,c))
#output, _ = self.attention(output, output, output)
output = self.fc(output)
_, word_index = torch.max(output, dim=2) # take the word with highest probability
if word_index == vocab.get_index(END_WORD):
break
caption += vocab.get_word(word_index) + " "
embeddings = self.embedding(torch.LongTensor([word_index]).view(1,-1).to(device))
return caption
and it gives relatively good results for image captioning.
I want to add the commented out lines so the model will use Attention. But- when I do that- the model breaks, although the loss becomes extremely low (decreasing from 2.7 to 0.2 during training instead of 2.7 to 1 without the attention) - the caption generation is not really working (predicts the same word over and over again).
My questions are:
Am I using the nn.MultiheadAttention correctly? it is very weird to me that it should be used after the LSTM, but I saw this online, and it works from dimension sizes perspective
Any idea why my model breaks when I use Attention?
EDIT: I also tried to put the Attention before the LSTM, and it didn't work as well (network predicted the same caption for every picture)

Name of Modules to compute sparsity

I'm writing a function that computes the sparsity of the weight matrices of the following fully connected network:
class FCN(nn.Module):
def __init__(self):
super(FCN, self).__init__()
self.fc1 = nn.Linear(input_dim, hidden_dim)
self.relu1 = nn.ReLU()
self.fc2 = nn.Linear(hidden_dim, hidden_dim)
self.relu2 = nn.ReLU()
self.fc3 = nn.Linear(hidden_dim, hidden_dim)
self.relu3 = nn.ReLU()
self.fc4 = nn.Linear(hidden_dim, output_dim)
def forward(self, x):
out = self.fc1(x)
out = self.relu1(out)
out = self.fc2(out)
out = self.relu2(out)
out = self.fc3(out)
out = self.relu3(out)
out = self.fc4(out)
return out
The function I have written is the following:
def print_layer_sparsity(model):
for name,module in model.named_modules():
if 'fc' in name:
zeros = 100. * float(torch.sum(model.name.weight == 0))
tot = float(model.name.weight.nelement())
print("Sparsity in {}.weight: {:.2f}%".format(name, zeros/tot))
But it gives me the following error:
torch.nn.modules.module.ModuleAttributeError: 'FCN' object has no attribute 'name'
It works fine when I manually enter the name of the layers (e.g.,
(model.fc1.weight == 0)
(model.fc2.weight == 0)
(model.fc3.weight == 0) ....
but I'd like to make it independent from the network. In other words, I'd like to adapt my function in a way that, given any sparse network, it prints the sparsity of every layer. Any suggestions?
Thanks!!
Try:
getattr(model, name).weight
In place of
model.name.weight
Your print_layer_sparsity function becomes:
def print_layer_sparsity(model):
for name,module in model.named_modules():
if 'fc' in name:
zeros = 100. * float(torch.sum(getattr(model, name).weight == 0))
tot = float(getattr(model, name).weight.nelement())
print("Sparsity in {}.weight: {:.2f}%".format(name, zeros/tot))
You can't do model.name because name is a str. The in-built getattr function allows you to get the member variables / attributes of an object using its name as a string.
For more information, checkout this answer.

Using zero_grad() after loss.backward(), but still receives RuntimeError: "Trying to backward through the graph a second time..."

Below is my implementation of a2c using PyTorch. Upon learning about backpropagation in PyTorch, I have known to zero_grad() the optimizer after each update iteration. However, there is still a RunTime error on second-time backpropagation.
def torchworker(number, model):
worker_env = gym.make("Taxi-v3").env
max_steps_per_episode = 2000
worker_opt = optim.Adam(lr=5e-4, params=model.parameters())
p_history = []
val_history = []
r_history = []
running_reward = 0
episode_count = 0
under = 0
start = time.time()
for i in range(2):
state = worker_env.reset()
episode_reward = 0
penalties = 0
drop = 0
print("Episode {} begins ({})".format(episode_count, number))
worker_env.render()
criterion = nn.SmoothL1Loss()
time_solve = 0
for _ in range(1, max_steps_per_episode):
#worker_env.render()
state = torch.tensor(state, dtype=torch.long)
action_probs = model.forward(state)[0]
critic_value = model.forward(state)[1]
val_history.append((state, critic_value[0]))
# Choose action
action = np.random.choice(6, p=action_probs.detach().numpy())
p_history.append(torch.log(action_probs[action]))
# Apply chosen action
state, reward, done, _ = worker_env.step(action)
r_history.append(reward)
episode_reward += reward
time_solve += 1
if reward == -10:
penalties += 1
elif reward == 20:
drop += 1
if done:
break
# Update running reward to check condition for solving
running_reward = (running_reward * (episode_count) + episode_reward) / (episode_count + 1)
# Calculate discounted returns
returns = deque(maxlen=3500)
discounted_sum = 0
for r in r_history[::-1]:
discounted_sum = r + gamma * discounted_sum
returns.appendleft(discounted_sum)
# Calculate actor losses and critic losses
loss_actor_value = 0
loss_critic_value = 0
history = zip(p_history, val_history, returns)
for log_prob, value, ret in history:
diff = ret - value[1]
loss_actor_value += -log_prob * diff
ret_tensor = torch.tensor(ret, dtype=torch.float32)
loss_critic_value += criterion(value[1], ret_tensor)
loss = loss_actor_value + 0.1 * loss_critic_value
print(loss)
# Update params
loss.backward()
worker_opt.step()
worker_opt.zero_grad()
# Log details
end = time.time()
episode_count += 1
if episode_count % 1 == 0:
worker_env.render()
if running_reward > -50: # Condition to consider the task solved
under += 1
if under > 5:
print("Solved at episode {} !".format(episode_count))
break
I believe there may be something to do with the architecture of my AC model, so I also include it here for reference.
class ActorCriticNetwork(nn.Module):
def __init__(self, num_inputs, num_hidden, num_actions):
super(ActorCriticNetwork, self).__init__()
self.embed = nn.Embedding(500, 10)
self.fc1 = nn.Linear(10, num_hidden * 2)
self.fc2 = nn.Linear(num_hidden * 2, num_hidden)
self.c = nn.Linear(num_hidden, 1)
self.fc3 = nn.Linear(num_hidden, num_hidden)
self.a = nn.Linear(num_hidden, num_actions)
def forward(self, x):
out = F.relu(self.embed(x))
out = F.relu(self.fc1(out))
out = F.relu(self.fc2(out))
critic = self.c(out)
out = F.relu(self.fc3(out.detach()))
actor = F.softmax(self.a(out), dim=-1)
return actor, critic
Would you please tell me what the mistake here is? Thank you in advance.
SOLVED: I forgot to clear the history of probabilities, action-values and rewards after iterations. It is clear why that would cause the issue, as the older elements would cause propagating through old dcgs.

Keras shape error when checking input

I am trying to train a simple MLP model that maps input questions (using a 300D word embedding) and image features extracted using a pretrained VGG16 model to a feature vector of fixed length. However, I can't figure out how to fix the error mentioned below. Here is the code I'm trying to run at the moment:
parser = argparse.ArgumentParser()
parser.add_argument('-num_hidden_units', type=int, default=1024)
parser.add_argument('-num_hidden_layers', type=int, default=3)
parser.add_argument('-dropout', type=float, default=0.5)
parser.add_argument('-activation', type=str, default='tanh')
parser.add_argument('-language_only', type=bool, default= False)
parser.add_argument('-num_epochs', type=int, default=10) #default=100
parser.add_argument('-model_save_interval', type=int, default=10)
parser.add_argument('-batch_size', type=int, default=128)
args = parser.parse_args()
questions_train = open('data/qa/preprocess/questions_train2014.txt', 'r').read().splitlines()
answers_train = open('data/qa/preprocess/answers_train2014_modal.txt', 'r').read().splitlines()
images_train = open('data/qa/preprocess/images_train2014.txt', 'r').read().splitlines()
vgg_model_path = 'data/coco/vgg_feats.mat'
maxAnswers = 1000
questions_train, answers_train, images_train = selectFrequentAnswers(questions_train,answers_train,images_train, maxAnswers)
#encode the remaining answers
labelencoder = preprocessing.LabelEncoder()
labelencoder.fit(answers_train)
nb_classes = len(list(labelencoder.classes_))
joblib.dump(labelencoder,'models/labelencoder.pkl')
features_struct = scipy.io.loadmat(vgg_model_path)
VGGfeatures = features_struct['feats']
print ('loaded vgg features')
image_ids = open('data/coco/coco_vgg_IDMap.txt').read().splitlines()
id_map = {}
for ids in image_ids:
id_split = ids.split()
id_map[id_split[0]] = int(id_split[1])
nlp = English()
print ('loaded word2vec features...')
img_dim = 4096
word_vec_dim = 300
model = Sequential()
if args.language_only:
model.add(Dense(args.num_hidden_units, input_dim=word_vec_dim, init='uniform'))
else:
model.add(Dense(args.num_hidden_units, input_dim=img_dim+word_vec_dim, init='uniform'))
model.add(Activation(args.activation))
if args.dropout>0:
model.add(Dropout(args.dropout))
for i in range(args.num_hidden_layers-1):
model.add(Dense(args.num_hidden_units, init='uniform'))
model.add(Activation(args.activation))
if args.dropout>0:
model.add(Dropout(args.dropout))
model.add(Dense(nb_classes, init='uniform'))
model.add(Activation('softmax'))
json_string = model.to_json()
if args.language_only:
model_file_name = 'models/mlp_language_only_num_hidden_units_' + str(args.num_hidden_units) + '_num_hidden_layers_' + str(args.num_hidden_layers)
else:
model_file_name = 'models/mlp_num_hidden_units_' + str(args.num_hidden_units) + '_num_hidden_layers_' + str(args.num_hidden_layers)
open(model_file_name + '.json', 'w').write(json_string)
print ('Compiling model...')
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
print ('Compilation done...')
print ('Training started...')
for k in range(args.num_epochs):
#shuffle the data points before going through them
index_shuf = list(range(len(questions_train)))
shuffle(index_shuf)
questions_train = [questions_train[i] for i in index_shuf]
answers_train = [answers_train[i] for i in index_shuf]
images_train = [images_train[i] for i in index_shuf]
progbar = generic_utils.Progbar(len(questions_train))
for qu_batch,an_batch,im_batch in zip(grouper(questions_train, args.batch_size, fillvalue=questions_train[-1]),
grouper(answers_train, args.batch_size, fillvalue=answers_train[-1]),
grouper(images_train, args.batch_size, fillvalue=images_train[-1])):
X_q_batch = get_questions_matrix_sum(qu_batch, nlp)
if args.language_only:
X_batch = X_q_batch
else:
X_i_batch = get_images_matrix(im_batch, id_map, VGGfeatures)
X_batch = np.hstack((X_q_batch, X_i_batch))
Y_batch = get_answers_matrix(an_batch, labelencoder)
loss = model.train_on_batch(X_batch, Y_batch)
progbar.add(args.batch_size, values=[("train loss", loss)])
#print type(loss)
if k%args.model_save_interval == 0:
model.save_weights(model_file_name + '_epoch_{:02d}.hdf5'.format(k))
model.save_weights(model_file_name + '_epoch_{:02d}.hdf5'.format(k))
And here is the error I get:
Keras: Error when checking input: expected dense_9_input to have shape
(4396,) but got array with shape (4096,)
I think that the error lies in what you pass in the else statement in the first layer of your model versus what you pass in training. In your first layer you specify:
model = Sequential()
if args.language_only:
model.add(Dense(args.num_hidden_units, input_dim=word_vec_dim, init='uniform'))
else:
model.add(Dense(args.num_hidden_units, input_dim=img_dim+word_vec_dim, init='uniform'))
You clearly pass input_dim = img_dim + word_vec_dim = 4096 + 300 = 4396. During training you pass:
X_q_batch = get_questions_matrix_sum(qu_batch, nlp)
if args.language_only:
X_batch = X_q_batch
else:
X_i_batch = get_images_matrix(im_batch, id_map, VGGfeatures)
X_batch = np.hstack((X_q_batch, X_i_batch))
So, in the else branch, X_batch will have X_q_batch or X_i_batch rows, which apparently = 4096.
By the way, for debugging purposes, it would be easier to give your layers a name, e.g.
x = Dense(64, activation='relu', name="dense_one")
I hope this helps.

matlab - combine all the output using celldisp

Below is my code.. I'm trying to display each of the seq using celldisp..but it comes out one by one..
%mycode
seq1 = {'acc';'gct';'tta';'ccc';'ttc';'aaa';'ttg';'gta';'gtg';'act'};
seq2 = {'act';'gcc';'tta';'cca';'aac';'aaa';'ttg';'gta';'gtg';'acc'};
seq3 = {'ttc';'tgc';'atg';'ggc';'ccg';'aat';'aag';'ggt';'tga';'agt'};
seq4 = {'cag';'ccg';'ttg';'caa';'aat';'agc';'ttg';'ggg';'gct';'att'};
seq5 = {'ccc';'ggg';'tta';'cag';'aac';'aaa';'ttg';'gta';'gac';'acc'};
seq6 = {'acc';'gct';'ata';'ccc';'ttc';'taa';'ttg';'gtc';'gtg';'acc'};
for mutseq = {seq2,seq3,seq4,seq5,seq6}
a = strcmp(seq1,mutseq);
celldisp(mutseq);
end
%endcode
the output is :
mutseq{1}{1} =
act
mutseq{1}{2} =
gcc
mutseq{1}{3} =
tta
mutseq{1}{4} =
cca
mutseq{1}{5} =
aac
mutseq{1}{6} =
aaa
mutseq{1}{7} =
ttg
mutseq{1}{8} =
gta
mutseq{1}{9} =
gtg
mutseq{1}{10} =
%and it repeat for next seq
did anyone know how to combine it into one?