Deconv implementation in keras output_shape issue - convolution

I am implementing the following Colorization Model, originally written in Caffe, and I am confused about the output_shape parameter to supply in Keras:
model.add(Deconvolution2D(256, 4, 4, border_mode='same',
                          output_shape=(None, 3, 14, 14), subsample=(2, 2),
                          dim_ordering='th', name='deconv_8.1'))
I have added a dummy output_shape parameter, but how can I determine the correct value? In the Caffe model the layer is defined as:
layer {
  name: "conv8_1"
  type: "Deconvolution"
  bottom: "conv7_3norm"
  top: "conv8_1"
  convolution_param {
    num_output: 256
    kernel_size: 4
    pad: 1
    dilation: 1
    stride: 2
  }
}
If I do not supply this parameter the code raises an error, but I cannot work out what I should supply as output_shape.
P.S. I already asked this on the Data Science forum with no response, maybe due to the smaller user base.

What output shape does the Caffe deconvolution layer produce?
For this colorization model in particular, you can simply refer to page 24 of their paper (linked on their GitHub page):
So the output shape of this deconvolution layer in the original model is [None, 56, 56, 128], and this is what you want to pass to Keras as output_shape. The only problem, as I mention in the section below, is that Keras doesn't really use this parameter to determine the output shape; you need to run a dummy prediction to find what your other parameters need to be in order to get what you want.
More generally the Caffe source code for computing its Deconvolution layer output shape is:
const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
const int output_dim = stride_data[i] * (input_dim - 1)
+ kernel_extent - 2 * pad_data[i];
Which with a dilation argument equal to 1 reduces to just:
const int output_dim = stride_data[i] * (input_dim - 1)
+ kernel_shape_data[i] - 2 * pad_data[i];
Note that this matches the formula in the Keras documentation when the parameter a is zero:

o = s(i - 1) + a + k - 2p
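As a quick sanity check, here is a small helper evaluating that formula; the 28x28 input size for conv7_3norm is an assumption, chosen to be consistent with the 56x56 output reported in the paper:

```python
def deconv_output_dim(input_dim, kernel, stride, pad, dilation=1):
    """Caffe Deconvolution output size: o = s*(i - 1) + k_ext - 2p,
    where k_ext = d*(k - 1) + 1 is the dilated kernel extent."""
    kernel_extent = dilation * (kernel - 1) + 1
    return stride * (input_dim - 1) + kernel_extent - 2 * pad

# conv8_1 from the prototxt above: kernel 4, stride 2, pad 1, dilation 1.
# Assuming conv7_3norm produces 28x28 feature maps:
print(deconv_output_dim(28, kernel=4, stride=2, pad=1))  # -> 56
```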
How to verify actual output shape with your Keras backend
This is tricky, because the actual output shape depends on the backend implementation and configuration. Keras is currently unable to determine it on its own, so you have to execute a prediction on some dummy input to find it. Here's an example of how to do this, from the Keras docs for Deconvolution2D:
To pass the correct `output_shape` to this layer,
one could use a test model to predict and observe the actual output shape.
Examples:
```python
import numpy as np
from keras.models import Sequential
from keras.layers import Deconvolution2D

# Apply a 3x3 transposed convolution with stride 1x1 and 3 output filters
# on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 14, 14),
                          border_mode='valid', input_shape=(3, 12, 12)))
# Note that you will have to change output_shape depending on the backend used.

# We can predict with the model and print the shape of the resulting array:
dummy_input = np.ones((32, 3, 12, 12))
# For TensorFlow: dummy_input = np.ones((32, 12, 12, 3))
preds = model.predict(dummy_input)
print(preds.shape)
# Theano GPU: (32, 3, 13, 13)
# Theano CPU: (32, 3, 14, 14)
# TensorFlow: (32, 14, 14, 3)
```
Reference: https://github.com/fchollet/keras/blob/master/keras/layers/convolutional.py#L507
You might also be curious why the output_shape parameter apparently doesn't really define the output shape. According to the post Deconvolution2D layer in keras, this is why:
Back to Keras and how the above is implemented. Confusingly, the output_shape parameter is actually not used for determining the output shape of the layer, and instead they try to deduce it from the input, the kernel size and the stride, while assuming only valid output_shapes are supplied (though it's not checked in the code to be the case). The output_shape itself is only used as input to the backprop step. Thus, you must also specify the stride parameter (subsample in Keras) in order to get the desired result (which could've been determined by Keras from the given input shape, output shape and kernel size).
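Putting this together for the layer in the original question, here is a sketch of how conv8_1 could be tested in isolation. The 28x28 spatial size and the 256 input channels are assumptions standing in for conv7_3norm's actual output, so verify the resulting shape with a dummy prediction as described above:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Deconvolution2D

model = Sequential()
model.add(Deconvolution2D(256, 4, 4,
                          output_shape=(None, 256, 56, 56),
                          subsample=(2, 2), border_mode='same',
                          dim_ordering='th', name='deconv_8.1',
                          input_shape=(256, 28, 28)))

preds = model.predict(np.ones((1, 256, 28, 28)))
print(preds.shape)  # expect (1, 256, 56, 56) on the Theano backend
```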

Related

Inputs to Encoder-Decoder LSTMCell/RNN Network

I'm creating an LSTM encoder-decoder network using Keras, following the code provided here: https://github.com/LukeTonin/keras-seq-2-seq-signal-prediction. The only change I made is to replace the GRUCell with an LSTMCell. Both the encoder and the decoder consist of 2 layers of 35 LSTMCells; the layers are stacked over (and combined with) each other using an RNN layer.
The LSTMCell returns 2 states whereas the GRUCell returns 1 state. This is where I am encountering an error, as I do not know how to code for the 2 returned states of the LSTMCell.
I have created two models: first an encoder-decoder model, and second a prediction model. I am not encountering any problems in the encoder-decoder model, but I am encountering problems in the decoder of the prediction model.
The error I am getting is:
ValueError: Layer rnn_4 expects 9 inputs, but it received 3 input tensors. Input received: [<tf.Tensor 'input_4:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'input_11:0' shape=(?, 35) dtype=float32>, <tf.Tensor 'input_12:0' shape=(?, 35) dtype=float32>]
This error happens when this line below, in the prediction model, is run:
decoder_outputs_and_states = decoder(
    decoder_inputs, initial_state=decoder_states_inputs)
The section of code this fits into is:
encoder_predict_model = keras.models.Model(encoder_inputs,
                                           encoder_states)

decoder_states_inputs = []

# Read layers backwards to fit the format of initial_state.
# For some reason, the states of the model are ordered backwards
# (state of the first layer at the end of the list).
# If instead of a GRU you were using an LSTM cell, you would have to
# append two Input tensors, since the LSTM has 2 states.
for hidden_neurons in layers[::-1]:
    # One state for GRU, but two states for LSTMCell
    decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))

decoder_outputs_and_states = decoder(
    decoder_inputs, initial_state=decoder_states_inputs)

decoder_outputs = decoder_outputs_and_states[0]
decoder_states = decoder_outputs_and_states[1:]

decoder_outputs = decoder_dense(decoder_outputs)

decoder_predict_model = keras.models.Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
Could somebody help me with the for loop above, and with the initial states I should be passing to the decoder after that?
I had a similar error and solved it by doing just what that comment says: appending a second Input tensor for each layer:
# If instead of a GRU you were using an LSTM cell, you would have to
# append two Input tensors, since the LSTM has 2 states.
for hidden_neurons in layers[::-1]:
    # Two states per layer for an LSTMCell: hidden state h and cell state c
    decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
    decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
That solved the problem for me.
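The underlying reason is that an LSTM carries two state tensors per layer (the hidden state h and the cell state c), while a GRU carries only one, so the decoder expects twice as many Input tensors. A minimal standalone sketch illustrating this (the layer size 35 matches the question; everything else is made up for illustration):

```python
import numpy as np
import keras

# An LSTM layer returns its output plus TWO states (h, c); a GRU returns one.
inp = keras.layers.Input(shape=(None, 1))
out, state_h, state_c = keras.layers.LSTM(35, return_state=True)(inp)
model = keras.models.Model(inp, [out, state_h, state_c])

_, h, c = model.predict(np.zeros((4, 10, 1)))
print(h.shape, c.shape)  # (4, 35) (4, 35): two state tensors per LSTM layer
```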

caffe: convolution with a fix predifined kernel (filter)

Instead of having a learnable filter, I am interested in a convolution with a fixed, predefined matrix; for example, a Sobel filter.
So I set the learning rate multiplier to 0 (so the filter is fixed) and my kernel size to 3:
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }
  convolution_param {
    num_output: 10
    kernel_size: 3  # filter is 3x3
    stride: 2
    weight_filler {
      type: ??
    }
  }
}
Now, I do not know how to give the matrix values to the conv layer. Any ideas? I think it should go into weight_filler, but how?
One more question: does num_output have to be the same as the bottom's channel size (data has 10 channels here)? Can I set num_output to another number? If so, what will happen, and what would that mean?
How to init weights to specific values?
You can use net_surgery to load your untrained/uninitialized net in Python, assign the specific weights you want to the filters, save the net, and then use it with the weights you want for this specific layer.
How to set num_output and the other conv_params?
This is a good question: you have an input blob of shape bx10xhxw and you want to apply a 3x3 filter to each channel to get back a new filtered bx10xhxw blob. If you just set num_output: 10, the shape of the filters would be 10x10x3x3, that is, 10 filters of shape 10x3x3, which is not what you expect; you want a per-channel 3x3 filter.
To that end you need to look at the group parameter of convolution_param. Setting group: 10 together with num_output: 10 (assuming the input has c=10 channels) will give you what you want; the weight shape will be 10x1x3x3.
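A minimal net-surgery sketch combining both points; the file names are placeholders, and the prototxt is assumed to define conv1 with num_output: 10, group: 10, and lr_mult: 0 as discussed:

```python
import numpy as np
import caffe

# Sobel x-derivative kernel, to be used as a fixed (non-learned) filter
sobel = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]], dtype=np.float32)

net = caffe.Net('deploy.prototxt', caffe.TEST)
# With group: 10 and num_output: 10, the weight blob is (10, 1, 3, 3);
# broadcasting copies the Sobel kernel into each of the 10 filters.
net.params['conv1'][0].data[...] = sobel
net.save('fixed_sobel.caffemodel')
```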
In the Python Caffe interface, a caffe.Net object is instantiated by loading the .prototxt file that defines the network architecture. You can use the following properties of the caffe.Net object to access various information about the network:
blob_loss_weights: an OrderedDict (bottom to top, i.e., input to output) of network blob loss weights, indexed by blob name
blobs: an OrderedDict (bottom to top, i.e., input to output) of network blobs, indexed by blob name
bottom_names: all bottom names in the network
inputs: inputs to this network
layer_dict: an OrderedDict (bottom to top, i.e., input to output) of network layers, indexed by layer name
layers: a caffe._caffe.LayerVec, a list whose elements are the caffe.Layer objects in the network; the caffe.Layer class has a blobs field holding the layer's parameter memory and a type field for the layer type (e.g., Convolution, Data, etc.)
outputs: outputs from this network
params: an OrderedDict (bottom to top, i.e., input to output) of network parameters, indexed by layer name; each value is a list of blobs (e.g., weights and biases)
top_names: all top names in the network
You can use caffe.Net.params to access a layer's learnable parameters, together with caffe.Net.layer_dict to access layer info.
caffe.Net.params is an ordered dictionary whose key is the layer name and whose value is the list of parameter blobs (e.g., weight and bias); in the case of a Convolution layer, the first element is the weight and the second is the bias:
caffe.Net.params['layer_name'][0]: weight
caffe.Net.params['layer_name'][1]: bias
Please note that reading a blob's memory should be done through caffe.Net.params['layer_name'][0].data, and updating it should be done in place, e.g. caffe.Net.params['layer_name'][0].data[...] = new_array.
The following code illustrates loading learnable parameters from NumPy save files (.npy):
from pathlib import Path
import numpy as np

def load_weights_and_biases(network):
    k_list = list(network.params.keys())
    suffix = ["weight", "bias"]
    num_layers = len(network.layer_dict)
    for idx, layer_name in enumerate(network.layer_dict):
        print("\n-----------------------------")
        print(f"layer index: {idx}/{num_layers}")
        print(f"layer name: '{layer_name}'")
        print(f"layer type: '{network.layers[idx].type}'")
        if layer_name in k_list:
            params = network.params[layer_name]
            print(f"{len(params)} learnable parameters in '{network.layers[idx].type}' type")
            for i, p in enumerate(params):
                print(f"\tp[{i}]: {p.data.shape} of {p.data.dtype}")
                param_file_path = f"./npy_save/{layer_name}_{suffix[i]}.npy"
                param_file = Path(param_file_path)
                if param_file.exists():
                    print(f"\tload {param_file_path}")
                    arr = np.load(param_file_path, allow_pickle=True)
                    if p.data.shape == arr.shape:
                        print(f"\tset {layer_name}_{suffix[i]} with arr: shape {arr.shape}, type {arr.dtype}")
                        p.data[...] = arr  # in-place update of the blob's memory
                    else:
                        print(f"p.data.shape: {p.data.shape} is not equal to arr.shape: {arr.shape}")
                        break
                else:
                    print(f"{param_file_path} does not exist!")
                    break
        else:
            print(f"no learnable parameters in '{layer_name}' of '{network.layers[idx].type}' type")
The Blob type is defined as caffe._caffe.Blob in the Python Caffe (aka pycaffe) interface. Run help(caffe._caffe.Blob) after import caffe, and use the names described in the "data descriptors defined here" section of the help output as attributes.
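For example, a quick sketch printing a parameter blob's descriptors (the net and layer name are assumptions):

```python
b = net.params['conv1'][0]                   # weight blob of a Convolution layer
print(b.data.shape, b.diff.shape)            # parameter values and their gradients
print(b.num, b.channels, b.height, b.width)  # legacy 4D dimension accessors
```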
For more detailed info on Blob in Caffe, see:
Blobs, Layers, and Nets: anatomy of a Caffe model - the Caffe documentation
caffe::Blob Class Template Reference - the C++ source for the Blob class

Spatial reflection padding in Caffe

Any ideas on how to implement spatial reflection padding in Caffe, like in Torch?
(x): nn.SpatialReflectionPadding(l=1, r=1, t=1, b=1)
(x): nn.SpatialConvolution(64 -> 64, 3x3)
(x): nn.ReLU
One way to do this would be using Caffe's Python layer. You can then define the functions yourself and customize them to your needs. However, this layer can only run on the CPU, so it might slow down your model, especially if you use it in the middle of the network.
In the following, I have defined a layer that zero-pads the input using the Python layer, which you can modify to suit your needs:
import caffe
import numpy as np

class SpatialReflectionPadding(caffe.Layer):

    def setup(self, bottom, top):
        if len(bottom) != 1:  # check that a single bottom blob is given
            raise Exception("Expected a single blob")
        if len(bottom[0].shape) != 4:  # check that it is 4D
            raise Exception("Expected 4D blob")
        params = eval(self.param_str)  # get the params given in the prototxt
        self.l = params["l"]
        self.r = params["r"]
        self.t = params["t"]
        self.b = params["b"]

    def reshape(self, bottom, top):
        # set the shape of the top blob based on the shape of the bottom blob
        top[0].reshape(bottom[0].shape[0], bottom[0].shape[1],
                       bottom[0].shape[2] + self.t + self.b,
                       bottom[0].shape[3] + self.r + self.l)

    def forward(self, bottom, top):
        for i in range(0, top[0].shape[2]):
            for j in range(0, top[0].shape[3]):
                if (i < self.t or i >= self.t + bottom[0].shape[2]) or \
                   (j < self.l or j >= self.l + bottom[0].shape[3]):
                    top[0].data[:, :, i, j] = 0  # padded part: set the value to 0
                else:
                    # for the rest, copy the value from the bottom blob
                    top[0].data[:, :, i, j] = bottom[0].data[:, :, i - self.t, j - self.l]

    def backward(self, top, propagate_down, bottom):
        # gradient w.r.t. the input: crop the padded borders off the top gradient
        bottom[0].diff[...] = top[0].diff[:, :, self.t:self.t + bottom[0].shape[2],
                                          self.l:self.l + bottom[0].shape[3]]
Then, in your prototxt file, you can use it as:
layer {
  name: "srp"  # some name
  type: "Python"
  bottom: "some_layer"  # the layer which provides the input blob
  top: "srp"
  python_param {
    module: "caffe_srp"  # whatever your module name is
    layer: "SpatialReflectionPadding"
    param_str: '{ "l": 1, "b": 1, "t": 1, "r": 1 }'
  }
}
I am not 100% sure that it works correctly, though when I used it, it appeared to. In any case, it should give you an idea and a starting point for how to proceed. Also, you could refer to this question and its answers.
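Note that the layer above zero-pads rather than reflection-pads. For true reflection padding, the forward pass could instead be built on np.pad with mode='reflect', as in this sketch (a correct backward pass would additionally need to fold the gradients of the reflected border pixels back onto their source pixels, which is omitted here):

```python
def forward(self, bottom, top):
    # Mirror the border rows/columns, like Torch's nn.SpatialReflectionPadding
    top[0].data[...] = np.pad(bottom[0].data,
                              ((0, 0), (0, 0),
                               (self.t, self.b), (self.l, self.r)),
                              mode='reflect')
```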

How to set proper arguments to build keras Convolution2D NN model [Text Classification]?

I am trying to use a 2D CNN to do text classification on Chinese articles and am having trouble setting the arguments of Keras's Convolution2D. I know the basic flow of Convolution2D for images, but I am stuck using it on my dataset.
Input data
My data is 9800 Chinese articles; the maximum article length is 6810 words, and the word2vec size is 200.
So the input shape is `(9800, 1, 6810, 200)`.
Code for building the model:
MAX_FEATURES = 6810
# I just randomly pick one filter, seems this is the problem?
nb_filter = 128
input_shape = (1, 6810, 200)
# each word is 200 (word2vec size)
embedding_size = 200
# 3 word length
n_gram = 3
# so stride here is embedding_size*n_gram
model = Sequential()
model.add(Convolution2D(nb_filter, n_gram, embedding_size, border_mode='valid', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(100, 1), border_mode='valid'))
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(hidden_dims))
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# X is (9800, 1, 6810, 200)
model.fit(X, y, batch_size=32,
          nb_epoch=5,
          validation_split=0.1)
Question 1. I have a problem setting the Convolution2D arguments. My research is below.
The official docs do not contain an example of 2D CNN text classification (though they do have one for 1D CNN).
The Convolution2D definition is here, https://keras.io/layers/convolutional/:
keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation=None, weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='default', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None, bias=True)
nb_filter: Number of convolution filters to use.
nb_row: Number of rows in the convolution kernel.
nb_col: Number of columns in the convolution kernel.
border_mode: 'valid', 'same' or 'full'. ('full' requires the Theano backend.)
Some research about the arguments:
This issue https://github.com/fchollet/keras/issues/233 is about 2D CNNs for text classification. I read all the comments and picked:
(1) https://github.com/fchollet/keras/issues/233#issuecomment-117427013
model.add(Convolution2D(nb_filter=N_FILTERS, stack_size=1, nb_row=FIELD_SIZE,
                        nb_col=1, subsample=(STRIDE, 1)))
(2) https://github.com/fchollet/keras/issues/233#issuecomment-117700913
sequential.add(Convolution2D(nb_feature_maps, 1, n_gram, embedding_size))
But it seems to differ from the current Keras version, and the argument naming used by different people is a mess (I wish Keras had easily understandable argument explanations).
Another comment I saw about the current API:
https://github.com/fchollet/keras/issues/1665#issuecomment-181181000
The current API is as below:
keras.layers.convolutional.Convolution2D(nb_filter, nb_row, nb_col, init='glorot_uniform', activation='linear', weights=None, border_mode='valid', subsample=(1, 1), dim_ordering='th', W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None)
So (36, 1, 7, 7) seems to be the reason; the correct arguments would be (36, 7, 7, ...).
From the research above, my understanding of convolution is that Convolution2D creates nb_filter filters of shape (nb_row, nb_col), slides each filter by the stride to get one filter result, repeats the sliding, and finally combines the results into an array of shape (1, one_sample_article_length[6810] / nb_filter) before going to the next layer. Is that right? Does my code above set nb_row and nb_col correctly?
Question 2. What are the proper MaxPooling2D arguments (either for my dataset or in general)?
I referred to this issue https://github.com/fchollet/keras/issues/233#issuecomment-117427013 to set the arguments; there are two variants:
MaxPooling2D(poolsize=(((nb_features - FIELD_SIZE) / STRIDE) + 1, 1))
MaxPooling2D(poolsize=(maxlen - n_gram + 1, 1))
I have no idea why they calculate the MaxPooling2D arguments like that.
Question 3. Any recommendations for batch_size and nb_epoch for this kind of text classification? I have no idea at all.

How to merge two tensors at the beginning of a network in Torch?

Given the following beginning of a network
local net = nn.Sequential()
net:add(SpatialConvolution(3, 64, 4, 4, 2, 2, 1, 1))
with an input tensor input
local input = torch.Tensor(batchSize, 3, 64, 64)
-- during training
local output = net:forward(input)
I want to modify the network to accept a second tensor cond as input
local cond = torch.Tensor(batchSize, 1000, 1, 1)
-- during training
local output = net:forward({input, cond})
I modified the network by adding a JoinTable before the SpatialConvolution is added, like so:
local net = nn.Sequential()
net:add(nn.JoinTable(2, 4))
net:add(SpatialConvolution(3, 64, 4, 4, 2, 2, 1, 1))
This does not work because the two tensors have different sizes in dimensions 2, 3, and 4. Expanding the cond tensor to size (batchSize, 1000, 64, 64) is not an option, since it is a waste of memory.
Is there a best practice for merging two different tensors at the beginning of a network so they can be fed into the first layer?
There is no such thing as "merging" tensors that do not have compatible shapes. You should simply pass a table of tensors, start your network with a SelectTable operation, and work with nngraph rather than a plain Sequential. In particular, how would you expect SpatialConvolution to work on such an odd "tensor", which "narrows down" to your cond? There is no well-defined operation in mathematics for such a use case, so you have to be more specific (which you can achieve with nngraph and SelectTable).