How to calculate the receptive field for an Inception module in GoogLeNet? - neural-network

One inception module of GoogLeNet is shown in the attached image.
How can we calculate the receptive field for this inception module?
Can we calculate it from only one convolution branch?
EDIT:
I have this program for receptive field size calculation.
import math
convnet = [[7,2,3],[1,1,0],[3,2,0],[1,1,0],[1,1,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[3,2,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[3,2,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[5,3,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[5,3,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[3,2,1],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[1,1,0],[1,1,0],[3,1,1],[1,1,0],[7,1,1]]
layer_names = ["conv1/7x7_s2","conv1/relu_7x7","pool1/3x3_s2","pool1/norm1","conv2/3x3_reduce","conv2/relu_3x3_reduce","conv2/3x3","conv2/relu_3x3","pool2/3x3_s2","inception_3a/3x3_reduce","inception_3a/relu_3x3_reduce","inception_3a/3x3","inception_3a/relu_3x3","inception_3b/3x3_reduce","inception_3b/relu_3x3_reduce","inception_3b/3x3","inception_3b/relu_3x3","pool3/3x3_s2","inception_4a/3x3_reduce","inception_4a/relu_3x3_reduce","inception_4a/3x3","inception_4a/relu_3x3","loss1/ave_pool","inception_4b/3x3_reduce","inception_4b/relu_3x3_reduce","inception_4b/3x3","inception_4b/relu_3x3","inception_4c/3x3_reduce","inception_4c/relu_3x3_reduce","inception_4c/3x3","inception_4c/relu_3x3","inception_4d/3x3_reduce","inception_4d/relu_3x3_reduce","inception_4d/3x3","inception_4d/relu_3x3","loss2/ave_pool","inception_4e/3x3_reduce","inception_4e/relu_3x3_reduce","inception_4e/3x3","inception_4e/relu_3x3","pool4/3x3_s2","inception_5a/3x3_reduce","inception_5a/relu_3x3_reduce","inception_5a/3x3","inception_5a/relu_3x3","inception_5b/3x3_reduce","inception_5b/relu_3x3_reduce","inception_5b/3x3","inception_5b/relu_3x3","pool5/7x7_s1"]
imsize = 720
def outFromIn(isz, layernum, net=convnet):
    # Output size and cumulative stride after the first `layernum` layers
    if layernum > len(net): layernum = len(net)
    totstride = 1
    insize = isz
    for layer in range(layernum):
        fsize, stride, pad = net[layer]
        # integer division, matching the Python 2 behaviour of the original script
        outsize = (insize - fsize + 2*pad) // stride + 1
        insize = outsize
        totstride = totstride * stride
    return outsize, totstride

def inFromOut(layernum, net=convnet):
    # Receptive field size (in input pixels) of one output unit of layer `layernum`
    if layernum > len(net): layernum = len(net)
    outsize = 1
    for layer in reversed(range(layernum)):
        fsize, stride, pad = net[layer]
        outsize = ((outsize - 1) * stride) + fsize
    RFsize = outsize
    return RFsize

if __name__ == '__main__':
    print("layer output sizes given image = %dx%d" % (imsize, imsize))
    for i in range(len(convnet)):
        p = outFromIn(imsize, i+1)
        rf = inFromOut(i+1)
        print("Layer Name = %s, Output size = %3d, Stride = % 3d, RF size = %3d" % (layer_names[i], p[0], p[1], rf))
I set the image size to 720. The receptive field size of the pool5/7x7_s1 layer comes out much bigger than the original image size. What is wrong with this calculation?
layer output sizes given image = 224x224
Layer Name = conv1/7x7_s2, Output size = 112, Stride = 2, RF size = 7
Layer Name = conv1/relu_7x7, Output size = 112, Stride = 2, RF size = 7
Layer Name = pool1/3x3_s2, Output size = 55, Stride = 4, RF size = 11
Layer Name = pool1/norm1, Output size = 55, Stride = 4, RF size = 11
Layer Name = conv2/3x3_reduce, Output size = 55, Stride = 4, RF size = 11
Layer Name = conv2/relu_3x3_reduce, Output size = 55, Stride = 4, RF size = 11
Layer Name = conv2/3x3, Output size = 55, Stride = 4, RF size = 19
Layer Name = conv2/relu_3x3, Output size = 55, Stride = 4, RF size = 19
Layer Name = pool2/3x3_s2, Output size = 27, Stride = 8, RF size = 27
Layer Name = inception_3a/3x3_reduce, Output size = 27, Stride = 8, RF size = 27
Layer Name = inception_3a/relu_3x3_reduce, Output size = 27, Stride = 8, RF size = 27
Layer Name = inception_3a/3x3, Output size = 27, Stride = 8, RF size = 43
Layer Name = inception_3a/relu_3x3, Output size = 27, Stride = 8, RF size = 43
Layer Name = inception_3b/3x3_reduce, Output size = 27, Stride = 8, RF size = 43
Layer Name = inception_3b/relu_3x3_reduce, Output size = 27, Stride = 8, RF size = 43
Layer Name = inception_3b/3x3, Output size = 27, Stride = 8, RF size = 59
Layer Name = inception_3b/relu_3x3, Output size = 27, Stride = 8, RF size = 59
Layer Name = pool3/3x3_s2, Output size = 13, Stride = 16, RF size = 75
Layer Name = inception_4a/3x3_reduce, Output size = 13, Stride = 16, RF size = 75
Layer Name = inception_4a/relu_3x3_reduce, Output size = 13, Stride = 16, RF size = 75
Layer Name = inception_4a/3x3, Output size = 13, Stride = 16, RF size = 107
Layer Name = inception_4a/relu_3x3, Output size = 13, Stride = 16, RF size = 107
Layer Name = inception_4b/3x3_reduce, Output size = 13, Stride = 16, RF size = 107
Layer Name = inception_4b/relu_3x3_reduce, Output size = 13, Stride = 16, RF size = 107
Layer Name = inception_4b/3x3, Output size = 13, Stride = 16, RF size = 139
Layer Name = inception_4b/relu_3x3, Output size = 13, Stride = 16, RF size = 139
Layer Name = inception_4c/3x3_reduce, Output size = 13, Stride = 16, RF size = 139
Layer Name = inception_4c/relu_3x3_reduce, Output size = 13, Stride = 16, RF size = 139
Layer Name = inception_4c/3x3, Output size = 13, Stride = 16, RF size = 171
Layer Name = inception_4c/relu_3x3, Output size = 13, Stride = 16, RF size = 171
Layer Name = inception_4d/3x3_reduce, Output size = 13, Stride = 16, RF size = 171
Layer Name = inception_4d/relu_3x3_reduce, Output size = 13, Stride = 16, RF size = 171
Layer Name = inception_4d/3x3, Output size = 13, Stride = 16, RF size = 203
Layer Name = inception_4d/relu_3x3, Output size = 13, Stride = 16, RF size = 203
Layer Name = inception_4e/3x3_reduce, Output size = 13, Stride = 16, RF size = 203
Layer Name = inception_4e/relu_3x3_reduce, Output size = 13, Stride = 16, RF size = 203
Layer Name = inception_4e/3x3, Output size = 13, Stride = 16, RF size = 235
Layer Name = inception_4e/relu_3x3, Output size = 13, Stride = 16, RF size = 235
Layer Name = pool4/3x3_s2, Output size = 7, Stride = 32, RF size = 267
Layer Name = inception_5a/3x3_reduce, Output size = 7, Stride = 32, RF size = 267
Layer Name = inception_5a/relu_3x3_reduce, Output size = 7, Stride = 32, RF size = 267
Layer Name = inception_5a/3x3, Output size = 7, Stride = 32, RF size = 331
Layer Name = inception_5a/relu_3x3, Output size = 7, Stride = 32, RF size = 331
Layer Name = inception_5b/3x3_reduce, Output size = 7, Stride = 32, RF size = 331
Layer Name = inception_5b/relu_3x3_reduce, Output size = 7, Stride = 32, RF size = 331
Layer Name = inception_5b/3x3, Output size = 7, Stride = 32, RF size = 395
Layer Name = inception_5b/relu_3x3, Output size = 7, Stride = 32, RF size = 395
Layer Name = pool5/7x7_s1, Output size = 3, Stride = 32, RF size = 587

You should calculate the receptive field separately for each path through the module (you have four) and then take the maximum.
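As a minimal sketch of that per-path calculation, the snippet below computes the receptive field of each branch relative to the module's input and then takes the maximum. The branch layout (1x1; 1x1 then 3x3; 1x1 then 5x5; 3x3 max pool then 1x1, all with stride 1) is assumed from the standard GoogLeNet inception module, since the image is not reproduced here, and the names `receptive_field` and `branches` are made up for illustration.

```python
def receptive_field(layers):
    """Receptive field of a chain of (kernel, stride) layers, measured on the chain's input."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (kernel - 1) input steps
        jump *= stride              # distance between neighbouring outputs, in input pixels
    return rf

# Assumed branch layout of one inception module: (kernel, stride) per layer
branches = {
    "1x1": [(1, 1)],
    "3x3": [(1, 1), (3, 1)],
    "5x5": [(1, 1), (5, 1)],
    "pool": [(3, 1), (1, 1)],
}

per_branch = {name: receptive_field(layers) for name, layers in branches.items()}
module_rf = max(per_branch.values())
print(per_branch, module_rf)
```

Under these assumptions the 5x5 branch dominates, so for receptive-field purposes each module behaves like a single 5x5 stride-1 layer, whereas the layer list in the question appears to follow only the 3x3 branch.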

Related

I can't figure out why the size of the tensors doesn't match in Pytorch

Some context:
I have been studying AI and ML for the last couple of months and am now finally studying neural nets. Great! The problem is that when I follow a tutorial everything seems to be OK, but when I try to implement a NN by myself I always face issues related to the size of the tensors.
I have seen the answers to other questions (like this one), but they only address the exact problem in each post. I am not looking for code to just copy and paste. I want to understand why I am facing this problem, how to handle it, and how to avoid it.
The error message:
/home/devops/aic/venv/lib/python3.8/site-packages/torch/nn/modules/loss.py:528: UserWarning: Using a target size (torch.Size([16, 2])) that is different to the input size (torch.Size([9, 2])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.mse_loss(input, target, reduction=self.reduction)
Traceback (most recent call last):
File "nn_conv.py", line 195, in <module>
loss = loss_function(outputs, targets)
File "/home/devops/aic/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/devops/aic/venv/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 528, in forward
return F.mse_loss(input, target, reduction=self.reduction)
File "/home/devops/aic/venv/lib/python3.8/site-packages/torch/nn/functional.py", line 2928, in mse_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/home/devops/aic/venv/lib/python3.8/site-packages/torch/functional.py", line 74, in broadcast_tensors
return _VF.broadcast_tensors(tensors) # type: ignore
RuntimeError: The size of tensor a (9) must match the size of tensor b (16) at non-singleton dimension 0
This is my code:
import os
import cv2
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
class DogsVSCats():
    IMG_SIZE = 50
    CATS = 'PetImages/Cat'
    DOGS = 'PetImages/Dog'
    LABELS = {CATS: 0, DOGS: 1}
    training_data = []
    cats_count = 0
    dogs_count = 0

    def make_training_data(self):
        for label in self.LABELS.keys():
            for f in tqdm(os.listdir(label)):
                try:
                    path = os.path.join(label, f)
                    # convert image to grayscale
                    img = cv2.imread(path)
                    if img is not None:
                        height, width = img.shape[:2]
                        if width > height:
                            height = round((height * self.IMG_SIZE) / width)
                            width = self.IMG_SIZE
                            right = 0
                            bottom = self.IMG_SIZE - height
                        else:
                            width = round((width * self.IMG_SIZE) / height)
                            height = self.IMG_SIZE
                            right = self.IMG_SIZE - width
                            bottom = 0
                        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                        img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
                        img = cv2.resize(img, (width, height))
                        img = cv2.copyMakeBorder(img,
                                                 top=0,
                                                 bottom=bottom,
                                                 left=0,
                                                 right=right,
                                                 borderType=cv2.BORDER_CONSTANT)
                        # Add a one-hot vector of the image's label to self.training_data
                        self.training_data.append([np.array(img), np.eye(len(self.LABELS))[self.LABELS[label]]])
                        if label == self.CATS:
                            self.cats_count += 1
                        elif label == self.DOGS:
                            self.dogs_count += 1
                except cv2.error as e:
                    pass
        np.random.shuffle(self.training_data)
        np.save("PetImages/training_data.npy", self.training_data)
        print("Cats:", self.cats_count)
        print("Dogs:", self.dogs_count)
        training_data = np.load('PetImages/training_data.npy', allow_pickle=True)
        plt.imsave('PetImages/trained_example.png', training_data[1][0])
class RunningMetrics():
    def __init__(self):
        self._sum = 0
        self._count = 0

    def __call__(self):
        return self._sum/float(self._count)

    def update(self, val, size):
        self._sum += val
        self._count += size
class Net(nn.Module):
    def __init__(self, num_channels, conv_kernel_size=3, stride=1, padding=1, max_pool_kernel_size=2):
        super(Net, self).__init__()
        self._num_channels = num_channels
        self._max_pool_kernel_size = max_pool_kernel_size
        self.conv1 = nn.Conv2d(1, self._num_channels, conv_kernel_size, stride, padding)
        self.conv2 = nn.Conv2d(self._num_channels, self._num_channels*2, conv_kernel_size, stride, padding)
        self.conv3 = nn.Conv2d(self._num_channels*2, self._num_channels*4, conv_kernel_size, stride, padding)
        # Input size of the first fully connected layer
        self.fc1 = nn.Linear(self._num_channels*4*8*8, self._num_channels*8)
        self.fc2 = nn.Linear(self._num_channels*8, 2)

    def forward(self, x):
        # Conv
        x = self.conv1(x)
        x = F.relu(F.max_pool2d(x, self._max_pool_kernel_size))
        x = self.conv2(x)
        x = F.relu(F.max_pool2d(x, self._max_pool_kernel_size))
        x = self.conv3(x)
        x = F.relu(F.max_pool2d(x, self._max_pool_kernel_size))
        # Flatten
        x = x.view(-1, self._num_channels*4*8*8)
        # Fully Connected
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        # return F.log_softmax(x, dim=1)
        return F.softmax(x, dim=1)

    def save_model(self, path):
        # Save the whole module to `path`
        torch.save(self, path)

    @staticmethod
    def load_model(path):
        # Load a saved module and switch it to evaluation mode
        model = torch.load(path)
        model.eval()
        return model
if __name__ == '__main__':
    print('Loading dataset')
    if not os.path.exists("PetImages/training_data.npy"):
        dogsvcats = DogsVSCats()
        dogsvcats.make_training_data()
    training_data = np.load('PetImages/training_data.npy', allow_pickle=True)
    print('Loading Net')
    net = Net(num_channels=32)
    # net = net.to(device)
    # optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9 )
    optimizer = optim.Adam(net.parameters(), lr=0.001)
    # loss_function = nn.NLLLoss()
    loss_function = nn.MSELoss()
    print('Converting X tensor')
    X = torch.Tensor([i[0] for i in training_data]).view(-1, 50, 50)
    X = X/255.0
    print('Converting Y tensor')
    y = torch.Tensor([i[1] for i in training_data])
    # Validation data
    VAL_PERCENT = 0.1
    val_size = int(len(X)*VAL_PERCENT)
    X_train = X[:-val_size]
    y_train = y[:-val_size]
    X_test = X[-val_size:]
    y_test = y[-val_size:]
    print('Training Set:', len(X_train))
    print('Testing Set:', len(X_test))
    BATCH_SIZE = 16
    EPOCHS = 2
    IMG_SIZE = 50
    for epoch in range(EPOCHS):
        print(f'Epoch {epoch+1}/{EPOCHS}')
        running_loss = RunningMetrics()
        running_acc = RunningMetrics()
        for i in tqdm(range(0, len(X_train), BATCH_SIZE)):
            inputs = X_train[i:i+BATCH_SIZE].view(-1, 1, IMG_SIZE, IMG_SIZE)
            targets = y_train[i:i+BATCH_SIZE]
            # inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            _, preds = torch.max(outputs, 1)
            loss = loss_function(outputs, targets)
            loss.backward()
            optimizer.step()
            running_loss.update(loss.item()*BATCH_SIZE,
                                BATCH_SIZE)
            running_acc.update(torch.sum(preds == targets).float(),
                               BATCH_SIZE)
        # RunningMetrics objects are called to get the current average
        print(f'Loss: {running_loss():.4f}, Acc: {running_acc():.4f}')
        print('-'*10)
Dataset:
I am using Microsoft's dataset of cat and dog images.
EDIT:
The previous error has been solved by following Anonymous' advice, but now I am getting another error:
Traceback (most recent call last):
File "nn_conv.py", line 203, in <module>
running_acc.update(torch.sum(preds == targets).float(),
RuntimeError: The size of tensor a (16) must match the size of tensor b (2) at non-singleton dimension 1
Input: 16 x 1 x 50 x 50
After conv1/maxpool1: 16 x 32 x 25 x 25
After conv2/maxpool2: 16 x 64 x 12 x 12 (max pooling has no padding, so 25 / 2 is floored to 12)
After conv3/maxpool3: 16 x 128 x 6 x 6 (= 73,728 values in the batch; this is where your error comes from)
Flattening: you specified a view of -1 x 32 * 4 * 8 * 8 = -1 x 8192, and 73,728 / 8192 = 9, so the batch comes out as 9 x 8192
The correct flattening is -1 x 32 * 4 * 6 * 6, and fc1 needs the same input size
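The shape trace above can be checked with a few lines of plain Python. This is only a sketch of the size arithmetic, not PyTorch itself: the helper names `conv_out` and `pool_out` are made up, and it assumes the defaults used in the question (3x3 convolutions with stride 1 and padding 1, 2x2 max pooling with stride 2 and no padding).

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    # Spatial output size of a convolution (floor division)
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    # Max pooling with stride equal to the kernel size and no padding
    return (size - kernel) // kernel + 1

num_channels, batch_size, size = 32, 16, 50
sizes = []
for _ in range(3):  # three conv + max-pool stages
    size = pool_out(conv_out(size))
    sizes.append(size)

flat_features = num_channels * 4 * size * size   # features per sample after conv3
wrong_features = num_channels * 4 * 8 * 8        # what the view() in the question assumes
rows_after_bad_view = batch_size * flat_features // wrong_features

print(sizes, flat_features, rows_after_bad_view)  # [25, 12, 6] 4608 9
```

The last number is the 9 from the warning: the batch of 16 samples is silently regrouped into 9 rows, which no longer lines up with the 16 targets.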
A few tips:
as you are starting out with PyTorch, you should look at how to use a Dataset/DataLoader
binary cross-entropy is more commonly used for classification (though MSE is still possible)

Beginner PyTorch - RuntimeError: shape '[16, 400]' is invalid for input of size 9600

I'm trying to build a CNN but I get this error:
---> 52 x = x.view(x.size(0), 5 * 5 * 16)
RuntimeError: shape '[16, 400]' is invalid for input of size 9600
It's not clear to me what the arguments of the 'x.view' line should be. Also, I don't really understand how many times I should call 'x.view' in my code. Is it only once, after the 3 convolutional layers and 2 linear layers? Or is it 5 times, once after every layer?
Here's my code:
CNN
import torch.nn as nn
import torch.nn.functional as F

# Convolutional neural network
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(
            in_channels=3,
            out_channels=16,
            kernel_size=3)
        self.conv2 = nn.Conv2d(
            in_channels=16,
            out_channels=24,
            kernel_size=4)
        self.conv3 = nn.Conv2d(
            in_channels=24,
            out_channels=32,
            kernel_size=4)
        self.dropout = nn.Dropout2d(p=0.3)
        self.pool = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(512, 10)
        self.final = nn.Softmax(dim=1)

    def forward(self, x):
        print('shape 0 ' + str(x.shape))
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = self.dropout(x)
        print('shape 1 ' + str(x.shape))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = self.dropout(x)
        print('shape 2 ' + str(x.shape))
        # x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        # x = self.dropout(x)
        x = F.interpolate(x, size=(5, 5))
        x = x.view(x.size(0), 5 * 5 * 16)
        x = self.fc1(x)
        return x

net = ConvNet()
Can someone help me understand the problem?
The output of 'x.shape' is:
shape 0 torch.Size([16, 3, 256, 256])
shape 1 torch.Size([16, 16, 127, 127])
shape 2 torch.Size([16, 24, 62, 62])
Thanks
This means that the product of the channel and spatial dimensions is not 5*5*16: after the interpolation each sample has 24 * 5 * 5 = 600 values, so the batch holds 16 * 600 = 9600. To flatten the tensor, replace x = x.view(x.size(0), 5 * 5 * 16) with:
x = x.view(x.size(0), -1)
And self.fc1 = nn.Linear(16 * 5 * 5, 120) with:
self.fc1 = nn.Linear(600, 120)
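The numbers in the error message can be reproduced with plain arithmetic. The sketch below is not PyTorch code; `conv_out` is a hypothetical helper that assumes unpadded, stride-1 convolutions (the `nn.Conv2d` defaults used in the question), each followed by 2x2 max pooling.

```python
def conv_out(size, kernel):
    # Unpadded, stride-1 convolution
    return size - kernel + 1

batch_size = 16
after_stage1 = conv_out(256, 3) // 2           # conv1 (3x3) + 2x2 max pool
after_stage2 = conv_out(after_stage1, 4) // 2  # conv2 (4x4) + 2x2 max pool
channels = 24                                  # out_channels of conv2
interp_size = 5                                # F.interpolate(x, size=(5, 5))

features_per_sample = channels * interp_size * interp_size
total_elements = batch_size * features_per_sample
print(after_stage1, after_stage2, features_per_sample, total_elements)  # 127 62 600 9600
```

The two intermediate sizes match the printed shapes in the question, and 9600 is the "input of size 9600" from the error, which cannot be reshaped to [16, 400].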

Difference between Batch Normalization and Self Normalized Neural Network with SELU

I would like to know the difference between batch normalization and a self-normalizing neural network. In other words, can SELU (Scaled Exponential Linear Unit) replace batch normalization, and if so, how?
Moreover, after looking into the values of the SELU activations, I found they were in the range [-1, 1]. This is not the case with batch normalization: the values after the BN layer (before the ReLU activation) were approximately in [-a, a], not [-1, 1].
Here is how I printed the values after the SELU activation and after the batch norm layer:
batch_norm_layer = tf.Print(batch_norm_layer,
data=[tf.reduce_max(batch_norm_layer), tf.reduce_min(batch_norm_layer)],
message = name_scope + ' min and max')
And similar code for the SELU activations...
Batch norm layer is defined as follows:
def batch_norm(x, n_out, phase_train, in_conv_layer = True):
    with tf.variable_scope('bn'):
        beta = tf.Variable(tf.constant(0.0, shape=n_out),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=n_out),
                            name='gamma', trainable=True)
        if in_conv_layer:
            batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
        else:
            batch_mean, batch_var = tf.nn.moments(x, [0, 1], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.9999)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        mean, var = tf.cond(phase_train,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
Since batch norm outputs larger values, the loss increases dramatically, and I end up with NaNs.
I also tried reducing the learning rate with batch norm, but that didn't help either. How can I fix this problem?
Here is the full code:
import tensorflow as tf
import numpy as np
import os
import cv2
batch_size = 32
num_epoch = 102
latent_dim = 100
def weight_variable(kernal_shape):
    weights = tf.get_variable(name='weights', shape=kernal_shape, dtype=tf.float32, trainable=True,
                              initializer=tf.truncated_normal_initializer(stddev=0.02))
    return weights

def bias_variable(shape):
    initial = tf.constant(0.0, shape=shape)
    return tf.Variable(initial)

def batch_norm(x, n_out, phase_train, convolutional = True):
    with tf.variable_scope('bn'):
        exp_moving_avg = tf.train.ExponentialMovingAverage(decay=0.9999)
        beta = tf.Variable(tf.constant(0.0, shape=n_out),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=n_out),
                            name='gamma', trainable=True)
        if convolutional:
            batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
        else:
            batch_mean, batch_var = tf.nn.moments(x, [0], name='moments')
        update_moving_averages = exp_moving_avg.apply([batch_mean, batch_var])
        m = tf.cond(phase_train, lambda: exp_moving_avg.average(batch_mean), lambda: batch_mean)
        v = tf.cond(phase_train, lambda: exp_moving_avg.average(batch_var), lambda: batch_var)
        normed = tf.nn.batch_normalization(x, m, v, beta, gamma, 1e-3)
        normed = tf.Print(normed, data=[tf.shape(normed)], message='size of normed?')
    return normed, update_moving_averages  # Note that we should run the update_moving_averages with sess.run...

def conv_layer(x, w_shape, b_shape, padding='SAME'):
    W = weight_variable(w_shape)
    tf.summary.histogram("weights", W)
    b = bias_variable(b_shape)
    tf.summary.histogram("biases", b)
    # Note that I used a stride of 2 on purpose in order not to use max pool layer.
    conv = tf.nn.conv2d(x, W, strides=[1, 2, 2, 1], padding=padding) + b
    conv_batch_norm, update_moving_averages = batch_norm(conv, b_shape, phase_train=tf.cast(True, tf.bool))
    name_scope = tf.get_variable_scope().name
    conv_batch_norm = tf.Print(conv_batch_norm,
                               data=[tf.reduce_max(conv_batch_norm), tf.reduce_min(conv_batch_norm)],
                               message = name_scope + ' min and max')
    activations = tf.nn.relu(conv_batch_norm)
    tf.summary.histogram("activations", activations)
    return activations, update_moving_averages

def deconv_layer(x, w_shape, b_shape, padding="SAME", activation='selu'):
    W = weight_variable(w_shape)
    tf.summary.histogram("weights", W)
    b = bias_variable(b_shape)
    tf.summary.histogram('biases', b)
    x_shape = tf.shape(x)
    out_shape = tf.stack([x_shape[0], x_shape[1] * 2, x_shape[2] * 2, w_shape[2]])
    if activation == 'selu':
        conv_trans = tf.nn.conv2d_transpose(x, W, out_shape, [1, 2, 2, 1], padding=padding) + b
        conv_trans_batch_norm, update_moving_averages = \
            batch_norm(conv_trans, b_shape, phase_train=tf.cast(True, tf.bool))
        transposed_activations = tf.nn.relu(conv_trans_batch_norm)
    else:
        conv_trans = tf.nn.conv2d_transpose(x, W, out_shape, [1, 2, 2, 1], padding=padding) + b
        conv_trans_batch_norm, update_moving_averages = \
            batch_norm(conv_trans, b_shape, phase_train=tf.cast(True, tf.bool))
        transposed_activations = tf.nn.sigmoid(conv_trans_batch_norm)
    tf.summary.histogram("transpose_activation", transposed_activations)
    return transposed_activations, update_moving_averages
tfrecords_filename_seq = ["C:/Users/user/PycharmProjects/AffectiveComputing/P16_db.tfrecords"]
filename_queue = tf.train.string_input_producer(tfrecords_filename_seq, num_epochs=num_epoch, shuffle=False, name='queue')
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
features = tf.parse_single_example(
serialized_example,
# Defaults are not specified since both keys are required.
features={
'height': tf.FixedLenFeature([], tf.int64),
'width': tf.FixedLenFeature([], tf.int64),
'image_raw': tf.FixedLenFeature([], tf.string),
'annotation_raw': tf.FixedLenFeature([], tf.string)
})
# This is how we create one example, that is, extract one example from the database.
image = tf.decode_raw(features['image_raw'], tf.uint8)
# The height and the weights are used to
height = tf.cast(features['height'], tf.int32)
width = tf.cast(features['width'], tf.int32)
# The image is reshaped since when stored as a binary format, it is flattened. Therefore, we need the
# height and the weight to restore the original image back.
image = tf.reshape(image, [height, width, 3])
annotation = tf.cast(features['annotation_raw'], tf.string)
min_after_dequeue = 100
num_threads = 1
capacity = min_after_dequeue + num_threads * batch_size
label_batch, images_batch = tf.train.batch([annotation, image],
shapes=[[], [112, 112, 3]],
batch_size=batch_size,
capacity=capacity,
num_threads=num_threads)
label_batch_splitted = tf.string_split(label_batch, delimiter=',')
label_batch_values = tf.reshape(label_batch_splitted.values, [batch_size, -1])
label_batch_numbers = tf.string_to_number(label_batch_values, out_type=tf.float32)
confidences = tf.slice(label_batch_numbers, begin=[0, 2], size=[-1, 1])
images_batch = tf.cast([images_batch], tf.float32)[0] # Note that casting the image will increases its rank.
with tf.name_scope('image_normal'):
    images_batch = tf.map_fn(lambda img: tf.image.per_image_standardization(img), images_batch)
    #images_batch = tf.Print(images_batch, data=[tf.reduce_max(images_batch), tf.reduce_min(images_batch)],
    #                        message='min and max in images_batch')
with tf.variable_scope('conv1'):
    conv1, uma_conv1 = conv_layer(images_batch, [4, 4, 3, 32], [32])  # image size: [56, 56]
with tf.variable_scope('conv2'):
    conv2, uma_conv2 = conv_layer(conv1, [4, 4, 32, 64], [64])  # image size: [28, 28]
with tf.variable_scope('conv3'):
    conv3, uma_conv3 = conv_layer(conv2, [4, 4, 64, 128], [128])  # image size: [14, 14]
with tf.variable_scope('conv4'):
    conv4, uma_conv4 = conv_layer(conv3, [4, 4, 128, 256], [256])  # image size: [7, 7]
conv4_reshaped = tf.reshape(conv4, [-1, 7 * 7 * 256], name='conv4_reshaped')
w_c_mu = tf.Variable(tf.truncated_normal([7 * 7 * 256, latent_dim], stddev=0.1), name='weight_fc_mu')
b_c_mu = tf.Variable(tf.constant(0.1, shape=[latent_dim]), name='biases_fc_mu')
w_c_sig = tf.Variable(tf.truncated_normal([7 * 7 * 256, latent_dim], stddev=0.1), name='weight_fc_sig')
b_c_sig = tf.Variable(tf.constant(0.1, shape=[latent_dim]), name='biases_fc_sig')
epsilon = tf.random_normal([1, latent_dim])
tf.summary.histogram('weights_c_mu', w_c_mu)
tf.summary.histogram('biases_c_mu', b_c_mu)
tf.summary.histogram('weights_c_sig', w_c_sig)
tf.summary.histogram('biases_c_sig', b_c_sig)
with tf.variable_scope('mu'):
    mu = tf.nn.bias_add(tf.matmul(conv4_reshaped, w_c_mu), b_c_mu)
    tf.summary.histogram('mu', mu)
with tf.variable_scope('stddev'):
    stddev = tf.nn.bias_add(tf.matmul(conv4_reshaped, w_c_sig), b_c_sig)
    tf.summary.histogram('stddev', stddev)
with tf.variable_scope('z'):
    latent_var = mu + tf.multiply(tf.sqrt(tf.exp(stddev)), epsilon)
    tf.summary.histogram('features_sig', stddev)
w_dc = tf.Variable(tf.truncated_normal([latent_dim, 7 * 7 * 256], stddev=0.1), name='weights_dc')
b_dc = tf.Variable(tf.constant(0.0, shape=[7 * 7 * 256]), name='biases_dc')
tf.summary.histogram('weights_dc', w_dc)
tf.summary.histogram('biases_dc', b_dc)
with tf.variable_scope('deconv4'):
    deconv4 = tf.nn.bias_add(tf.matmul(latent_var, w_dc), b_dc)
    deconv4_batch_norm, uma_deconv4 = \
        batch_norm(deconv4, [7 * 7 * 256], phase_train=tf.cast(True, tf.bool), convolutional=False)
    deconv4 = tf.nn.relu(deconv4_batch_norm)
    deconv4_reshaped = tf.reshape(deconv4, [-1, 7, 7, 256], name='deconv4_reshaped')
with tf.variable_scope('deconv3'):
    deconv3, uma_deconv3 = deconv_layer(deconv4_reshaped, [3, 3, 128, 256], [128], activation='selu')
with tf.variable_scope('deconv2'):
    deconv2, uma_deconv2 = deconv_layer(deconv3, [3, 3, 64, 128], [64], activation='selu')
with tf.variable_scope('deconv1'):
    deconv1, uma_deconv1 = deconv_layer(deconv2, [3, 3, 32, 64], [32], activation='selu')
with tf.variable_scope('deconv_image'):
    deconv_image_batch, uma_deconv = deconv_layer(deconv1, [3, 3, 3, 32], [3], activation='sigmoid')
# loss function.
with tf.name_scope('loss_likelihood'):
    # temp1 shape: [32, 112, 112, 3]
    temp1 = images_batch * tf.log(deconv_image_batch + 1e-9) + (1 - images_batch) * tf.log(1 - deconv_image_batch + 1e-9)
    #temp1 = temp1 * confidences. This will give an error. Therefore, we should expand the dimension of confidence tensor
    confidences_ = tf.expand_dims(tf.expand_dims(confidences, axis=1), axis=1)  # shape: [32, 1, 1, 1].
    temp1 = temp1 * confidences_
    log_likelihood = -tf.reduce_sum(temp1, reduction_indices=[1, 2, 3])
    log_likelihood_total = tf.reduce_sum(log_likelihood)
    #l2_loss = tf.reduce_mean(tf.abs(tf.subtract(images_batch, deconv_image_batch)))
with tf.name_scope('loss_KL'):
    # temp2 shape: [32, 200]
    temp2 = 1 + tf.log(tf.square(stddev + 1e-9)) - tf.square(mu) - tf.square(stddev)
    temp3 = temp2 * confidences  # confidences shape is [32, 1]
    KL_term = - 0.5 * tf.reduce_sum(temp3, reduction_indices=1)
    KL_term_total = tf.reduce_sum(KL_term)
with tf.name_scope('total_loss'):
    variational_lower_bound = tf.reduce_mean(log_likelihood + KL_term)
    tf.summary.scalar('loss', variational_lower_bound)
with tf.name_scope('optimizer'):
    optimizer = tf.train.AdamOptimizer(0.00001).minimize(variational_lower_bound)
init_op = tf.group(tf.local_variables_initializer(),
tf.global_variables_initializer())
saver = tf.train.Saver()
model_path = 'C:/Users/user/PycharmProjects/VariationalAutoEncoder/' \
'VariationalAutoEncoderFaces/tensorboard_logs/Graph_model/ckpt'
# Here is the session...
with tf.Session() as sess:
    train_writer = tf.summary.FileWriter('C:/Users/user/PycharmProjects/VariationalAutoEncoder/'
                                         'VariationalAutoEncoderFaces/tensorboard_logs/Event_files', sess.graph)
    merged = tf.summary.merge_all()
    # Note that init_op should start before the Coordinator and the thread otherwise, this will throw an error.
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    step = 0
    to_run_list = [uma_conv1, uma_conv2, uma_conv3, uma_conv4, uma_deconv1, uma_deconv2, uma_deconv3,
                   uma_deconv4, uma_deconv, optimizer, variational_lower_bound, merged,
                   deconv_image_batch, image]
    # Note that the last name "Graph_model" is the name of the saved checkpoints file => the ckpt is saved
    # under tensorboard_logs.
    ckpt = tf.train.get_checkpoint_state(
        os.path.dirname(model_path))
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print('checkpoints are saved!!!')
    else:
        print('No stored checkpoints')
    epoch = 0
    while not coord.should_stop():
        _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, loss, summary, reconstructed_image, original_image = \
            sess.run(to_run_list)
        print('total loss:', loss)
        original_image = cv2.cvtColor(np.array(original_image), cv2.COLOR_RGB2BGR)
        reconstructed_image = cv2.cvtColor(np.array(reconstructed_image[0]), cv2.COLOR_RGB2BGR)
        cv2.imshow('original_image', original_image)
        cv2.imshow('reconstructed_image', reconstructed_image)
        cv2.waitKey(1)
        if step % 234 == 0:
            epoch += 1
            print('epoch:', epoch)
            if epoch == num_epoch - 2:
                coord.request_stop()
        if step % 100 == 0:
            train_writer.add_summary(summary, step)
        #print('total loss:', loss)
        #print('log_likelihood_', log_likelihood_)
        #print('KL_term', KL_term_)
        step += 1
    save_path = saver.save(sess, model_path)
    coord.request_stop()
    coord.join(threads)
    train_writer.close()
Any help is much appreciated!!
Here is some sample code showing the trend of the means and variances over 3 SELU layers. The numbers of nodes in the layers (including the input layer) are [15, 30, 30, 8].
import numbers
import os
import numpy as np
import tensorflow as tf
# TF 1.x internal modules used by the two helpers below (as in the linked selu.py)
from tensorflow.python.framework import ops, tensor_shape, tensor_util
from tensorflow.python.ops import array_ops, math_ops, random_ops
from tensorflow.python.layers import utils
#-----------------------------------------------#
# https://github.com/bioinf-jku/SNNs/blob/master/selu.py
# The SELU activation function
def selu(x):
    with ops.name_scope('elu') as scope:
        alpha = 1.6732632423543772848170429916717
        scale = 1.0507009873554804934193349852946
        return scale*tf.where(x>=0.0, x, alpha*tf.nn.elu(x))
#-----------------------------------------------#
# https://github.com/bioinf-jku/SNNs/blob/master/selu.py
# alpha-dropout
def dropout_selu(x, rate, alpha= -1.7580993408473766, fixedPointMean=0.0, fixedPointVar=1.0,
                 noise_shape=None, seed=None, name=None, training=False):
    """Dropout to a value with rescaling."""
    def dropout_selu_impl(x, rate, alpha, noise_shape, seed, name):
        keep_prob = 1.0 - rate
        x = ops.convert_to_tensor(x, name="x")
        if isinstance(keep_prob, numbers.Real) and not 0 < keep_prob <= 1:
            raise ValueError("keep_prob must be a scalar tensor or a float in the "
                             "range (0, 1], got %g" % keep_prob)
        keep_prob = ops.convert_to_tensor(keep_prob, dtype=x.dtype, name="keep_prob")
        keep_prob.get_shape().assert_is_compatible_with(tensor_shape.scalar())
        alpha = ops.convert_to_tensor(alpha, dtype=x.dtype, name="alpha")
        alpha.get_shape().assert_is_compatible_with(tensor_shape.scalar())
        if tensor_util.constant_value(keep_prob) == 1:
            return x
        noise_shape = noise_shape if noise_shape is not None else array_ops.shape(x)
        random_tensor = keep_prob
        random_tensor += random_ops.random_uniform(noise_shape, seed=seed, dtype=x.dtype)
        binary_tensor = math_ops.floor(random_tensor)
        ret = x * binary_tensor + alpha * (1-binary_tensor)
        a = math_ops.sqrt(fixedPointVar / (keep_prob *((1-keep_prob) * math_ops.pow(alpha-fixedPointMean,2) + fixedPointVar)))
        b = fixedPointMean - a * (keep_prob * fixedPointMean + (1 - keep_prob) * alpha)
        ret = a * ret + b
        ret.set_shape(x.get_shape())
        return ret
    with ops.name_scope(name, "dropout", [x]) as name:
        return utils.smart_cond(training,
                                lambda: dropout_selu_impl(x, rate, alpha, noise_shape, seed, name),
                                lambda: array_ops.identity(x))
#-----------------------------------------------#
# build a 3-layer dense network with SELU activation and alpha-dropout
sess = tf.InteractiveSession()
w1 = tf.constant(np.random.normal(loc=0.0, scale=np.sqrt(1.0/15.0), size = [15, 30]))
b1 = tf.constant(np.random.normal(loc=0.0, scale=0.5, size = [30]))
x1 = tf.constant(np.random.normal(loc=0.0, scale=1.0, size = [200, 15]))
y1 = tf.add(tf.matmul(x1, w1), b1)
y1_selu = selu(y1)
y1_selu_dropout = dropout_selu(y1_selu, 0.05, training=True)
w2 = tf.constant(np.random.normal(loc=0.0, scale=np.sqrt(1.0/30.0), size = [30, 30]))
b2 = tf.constant(np.random.normal(loc=0.0, scale=0.5, size = [30]))
x2 = y1_selu_dropout
y2 = tf.add(tf.matmul(x2, w2), b2)
y2_selu = selu(y2)
y2_selu_dropout = dropout_selu(y2_selu, 0.05, training=True)
w3 = tf.constant(np.random.normal(loc=0.0, scale=np.sqrt(1.0/30.0), size = [30, 8]))
b3 = tf.constant(np.random.normal(loc=0.0, scale=0.5, size = [8]))
x3 = y2_selu_dropout
y3 = tf.add(tf.matmul(x3, w3), b3)
y3_selu = selu(y3)
y3_selu_dropout = dropout_selu(y3_selu, 0.05, training=True)
#-------------------------#
# evaluate the network
x1_v, y1_selu_dropout_v, \
x2_v, y2_selu_dropout_v, \
x3_v, y3_selu_dropout_v, \
= sess.run([x1, y1_selu_dropout, x2, y2_selu_dropout, x3, y3_selu_dropout])
#-------------------------#
# print each layer's mean and standard deviation (1st line: input; 2nd line: output)
print("Layer 1")
print(np.mean(x1_v), np.std(x1_v))
print(np.mean(y1_selu_dropout_v), np.std(y1_selu_dropout_v))
print("Layer 2")
print(np.mean(x2_v), np.std(x2_v))
print(np.mean(y2_selu_dropout_v), np.std(y2_selu_dropout_v))
print("Layer 3")
print(np.mean(x3_v), np.std(x3_v))
print(np.mean(y3_selu_dropout_v), np.std(y3_selu_dropout_v))
Here is one possible output. Over 3 layers, the mean and standard deviation are still close to 0 and 1, respectively.
Layer 1
-0.0101213033749 1.01375071842
0.0106228883975 1.09375593322
Layer 2
0.0106228883975 1.09375593322
-0.027910206754 1.12216643393
Layer 3
-0.027910206754 1.12216643393
-0.131790078631 1.09698413493
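To see why the statistics survive the dropout step itself, the affine correction a and b from dropout_selu_impl can be checked in isolation. The following is only a minimal NumPy sketch, not part of the original TensorFlow code: the helper names alpha_dropout_affine and alpha_dropout_np are made up for illustration, and the input is assumed to be standard normal rather than a real SELU activation.

```python
import numpy as np

# Default alpha from the dropout_selu signature above
# (the negative saturation value of SELU, -lambda * alpha).
ALPHA_PRIME = -1.7580993408473766

def alpha_dropout_affine(rate, alpha=ALPHA_PRIME, mean=0.0, var=1.0):
    """Return (a, b) using the same formulas as dropout_selu_impl."""
    q = 1.0 - rate  # keep probability
    a = np.sqrt(var / (q * ((1.0 - q) * (alpha - mean) ** 2 + var)))
    b = mean - a * (q * mean + (1.0 - q) * alpha)
    return a, b

def alpha_dropout_np(x, rate, rng, alpha=ALPHA_PRIME):
    """Hypothetical NumPy version of the training branch."""
    q = 1.0 - rate
    # floor(q + u) with u in [0, 1) is 1 with probability q, else 0
    keep = np.floor(q + rng.uniform(size=x.shape))
    a, b = alpha_dropout_affine(rate, alpha)
    # dropped units are set to alpha, then everything is rescaled and shifted
    return a * (x * keep + alpha * (1.0 - keep)) + b

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # assumed input: zero mean, unit variance
y = alpha_dropout_np(x, rate=0.05, rng=rng)
print(np.mean(y), np.std(y))
```

With a zero-mean, unit-variance input the printed mean and standard deviation should again land near 0 and 1, which is the property the three-layer experiment above relies on.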

Constructing structure for wind data

I have been trying to construct a structure for wind data in MATLAB using the struct function:
struct(fieldname1,value1,fieldname2,value2,......).
I have wind speed and direction at different heights, such as 40, 50, 80 and 90 meters. The problem is that I don't understand how to supply the 'value' arguments for my large data set.
wind_data=struct(ws40,[],ws50,[],ws80,[],ws90,[],wd40,[],wd50,[],wd80,[],wd90,[])
Here ws = wind speed and wd = wind direction, and each of them is a vector.
You can allocate structs by hand:
wind_data.ws40 = [1, 2, 3];
wind_data.wd40 = [4, 5, 6];
wind_data.ws50 = [11, 22, 33];
wind_data.wd50 = [44, 55, 66];
or dynamically:
heights = [40, 50, 80, 90];
ws = round(10*rand(4,3));
wd = round(10*rand(4,3));
for hh = 1:numel(heights)
    wind_data.( [ 'ws' num2str(heights(hh)) ] ) = ws(hh,:);
    wind_data.( [ 'wd' num2str(heights(hh)) ] ) = wd(hh,:);
end
or allocate them directly with struct. As Ed Smith already said, the field names have to be given in single quotes:
heights = [40, 50, 80, 90];
ws = round(10*rand(4,3));
wd = round(10*rand(4,3));
wind_data = struct('ws40', ws(1,:), ...
'wd40', wd(1,:), ...
'ws50', ws(2,:), ...
'wd50', wd(2,:) );

matlab - unfixed dimension matrix, set value to multiple fields of struct at once, avoid loop

I have 2 (not very small) structs whose fields are 3-dimensional matrices:
sz1 = 200;
sz2 = 9;
sz3 = [20, 40, 80, 160, 320, 640, 1280, 2560, 5120]
% actually the structs have 12 fields, each has size 200x9x5120
mat.p1(sz1, sz2, sz3(sz2)) = uint8(0);
mat.p2(sz1, sz2, sz3(sz2)) = uint8(0);
mat.p3(sz1, sz2, sz3(sz2)) = 0;
mat.p4(sz1, sz2, sz3(sz2)) = 0;
old_mat.p1(sz1, sz2, sz3(sz2)) = uint8(0);
old_mat.p2(sz1, sz2, sz3(sz2)) = uint8(0);
old_mat.p3(sz1, sz2, sz3(sz2)) = 0;
old_mat.p4(sz1, sz2, sz3(sz2)) = 0;
I need to write a reset function that re-assigns the values of 3 of the 4 (actually 10 of the 12) fields in both structs as follows:
for i = 1:sz1
    for j = 1:sz2
        for k = 1:sz3(j)
            mat.p1(i,j,k) = uint8(255);
            mat.p3(i,j,k) = -1;
            mat.p4(i,j,k) = 0.01;
            old_mat.p1(i,j,k) = uint8(255);
            old_mat.p3(i,j,k) = -1;
            old_mat.p4(i,j,k) = 0.01;
        end
    end
end
Note that the part of each matrix I actually need is the same part the reset function touches: I only need up to the 5120th index of the 3rd dimension when the 2nd-dimension index is 9; when the 2nd-dimension index is 4, I only need up to 160 indices in the 3rd dimension, and so on.
The questions are:
Is there any way to assign values to several fields at once (not one line per field), since I actually have 10 fields to deal with?
Is there any way to avoid the for-loops? I tried this:
mat.p1(:) = uint8(255);
mat.p3(:) = -1;
mat.p4(:) = 0.01;
old_mat.p1(:) = uint8(255);
old_mat.p3(:) = -1;
old_mat.p4(:) = 0.01;
but this fills every matrix up to the maximum 3rd-dimension size of 5120. I am hoping someone can show me how to use a vectorizing function such as arrayfun, bsxfun or cellfun that applies only to the "half-cubic" region covered by the for-loops above.
UPDATE:
Thanks, horchler, for the video. The problem with the large size (in bytes) of the matrices seems to be solved now that I have changed from a matrix of structs to a struct with matrices as fields. This also clears up the timing problem, even with nested for-loops. I have updated the questions and the input accordingly; please see above.
Yes, I think that using a structure of arrays will work better for you here. You should be able to allocate using zeros just as if mat.p1, mat.p2, etc. were regular arrays. I'd do something like this (note that you didn't indicate any values for mat.p2) using a single for loop:
sz1 = 200;
sz2 = 9;
sz3 = [20, 40, 80, 160, 320, 640, 1280, 2560, 5120];
% Be careful with this form of pre-allocation if your sz arrays change
% Clear your struct or make sure to use the code in a function
mat.p1(sz1,sz2,sz3(sz2)) = uint8(0);
mat.p3(sz1,sz2,sz3(sz2)) = 0;
mat.p4(sz1,sz2,sz3(sz2)) = 0;
for i = 1:sz2
    mat.p1(:,i,1:sz3(i)) = uint8(255);
    mat.p3(:,i,1:sz3(i)) = -1;
    mat.p4(:,i,1:sz3(i)) = 0.01;
end
old_mat.p1 = mat.p1;
old_mat.p3 = mat.p3;
old_mat.p4 = mat.p4;
Alternatively, you could do something like this:
sz1 = 200;
sz2 = 9;
sz3 = [20, 40, 80, 160, 320, 640, 1280, 2560, 5120];
mat = struct('p1',zeros(sz1,sz2,sz3(sz2),'uint8'),...
'p3',zeros(sz1,sz2,sz3(sz2)),...
'p4',zeros(sz1,sz2,sz3(sz2)));
for i = 1:sz2
    mat.p1(:,i,1:sz3(i)) = uint8(255);
    mat.p3(:,i,1:sz3(i)) = -1;
    mat.p4(:,i,1:sz3(i)) = 0.01;
end
old_mat = struct('p1',mat.p1,'p3',mat.p3,'p4',mat.p4);
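For comparison only, and in NumPy rather than MATLAB: the remaining loop over the second dimension could in principle be replaced by one logical mask that marks the "half-cubic" region. This is a sketch under assumptions, not a tested MATLAB solution. The sizes are shrunk stand-ins for 200, 9 and 5120, and the variable mask is introduced here purely for illustration.

```python
import numpy as np

# Small stand-in sizes so the example runs instantly
# (the question uses sz1 = 200, sz2 = 9 and sz3 up to 5120).
sz1, sz2 = 4, 3
sz3 = np.array([2, 4, 8])

# mask[j, k] is True where k < sz3[j], i.e. inside the "half-cubic" region
mask = np.arange(sz3.max())[None, :] < sz3[:, None]

p1 = np.zeros((sz1, sz2, sz3.max()), dtype=np.uint8)
p3 = np.zeros((sz1, sz2, sz3.max()))

# A 2-D boolean mask indexes the last two axes at once;
# the first axis is taken whole, so no explicit loop is needed.
p1[:, mask] = 255
p3[:, mask] = -1.0
```

The mask is built once and can be reused for every field, so each field still needs one assignment line but no loop. Whether an equivalent logical-indexing version is faster than the single for loop in MATLAB would have to be measured.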