How to understand Caffe reshape parameters and reimplement in Keras? - neural-network

I am not sure how to interpret the reshape parameters. The documentation at http://caffe.berkeleyvision.org/tutorial/layers/reshape.html says that 0 means copy and -1 means infer. Is it the same when -1 is not the last parameter? Can anyone help me understand it?
layer {
  name: "Layer1"
  type: "Reshape"
  bottom: "Layer1"
  top: "Layer2"
  reshape_param {
    shape {
      dim: 0
      dim: 2
      dim: -1
      dim: 0
    }
  }
}
Also, if I want to implement the same layer in Keras, do I also use the Keras reshape layer, like this:
Layer2 = K.reshape(Layer1,(-1,input_dim))

This means that, given an input of shape (a, b, c, d, e), the output will have shape:
(a, 2, b * c * e / 2, d)
The values a and d are copied from the previous layer. The value 2 is forced, and the value -1 calculates whatever it needs to keep the same number of elements as the input.
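To make the 0 and -1 rules concrete, here is a minimal Python sketch. The helper name resolve_caffe_reshape is hypothetical (it is not part of Caffe), and it assumes the default reshape settings, where the dims are matched against the bottom shape starting at axis 0.

```python
from math import prod


def resolve_caffe_reshape(bottom_shape, dims):
    """Hypothetical helper: mimic how 0 (copy) and -1 (infer) are resolved."""
    # 0 copies the dimension at the same position in the bottom blob.
    out = [bottom_shape[i] if d == 0 else d for i, d in enumerate(dims)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = prod(bottom_shape)
    if -1 in out:
        # -1 is whatever keeps the total element count unchanged.
        known = prod(d for d in out if d != -1)
        if total % known != 0:
            raise ValueError("element count is not divisible by the fixed dims")
        out[out.index(-1)] = total // known
    elif prod(out) != total:
        raise ValueError("output must have the same element count as the input")
    return tuple(out)


# (a, b, c, d, e) = (8, 3, 4, 5, 6) gives (a, 2, b*c*e/2, d)
print(resolve_caffe_reshape((8, 3, 4, 5, 6), (0, 2, -1, 0)))
```

Note that the position of -1 does not matter here: it is resolved from the total count after all the other dimensions are known.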
In Keras, since you're not changing the first dimension (the batch size), you only need a regular Reshape layer that will ignore the batch size:
Reshape((2,-1,youMustKnowThis))
In a Sequential model, just add this layer:
sequentialModel.add(Reshape((2,-1,youMustKnowThis)))
In a functional API Model, pass the output of the previous layer:
newShaped = Reshape((2,-1,youMustKnowThis))(outputOfPreviousLayer)

Related

Optimizing tensor multiplications

I've got a real-time image processing program I'm trying to optimize, and it all boils down to matrix multiplications. Consider 3 tensors I'm calculating in the initialization stage:
A = np.arange(35 * 51 * 59).reshape([35, 51, 59])
B = np.arange(37 * 51 * 51 * 59).reshape([37, 51, 51, 59])
C = np.arange(59 * 27).reshape([59, 27])
Each frame, I'm getting new data in the form of a fourth tensor:
M = np.arange(35 * 37 * 59).reshape([35, 37, 59])
Currently, I'm calculating D = np.einsum('xyf,xtf,ytpf,fr->tpr', M, A, B, C), where D is my desired result, and it's the major bottleneck of the program. There are two directions I'm trying to follow in order to optimize it.
First I tried coming up with a tensor T, a function of A, B and C, that I can pre-calculate, so that it all boils down to D = np.tensordot(M, T, axes=..). I wasn't successful. I spent a lot of time on it; is it even possible at all?
Moreover, the program itself is written in MATLAB. As it doesn't have a built-in tensor multiplication function (an einsum or tensordot equivalent), I'm currently using the tprod toolbox, and doing:
temp1 = etprod('dcb', A, 'abc', M, 'adc');
temp2 = etprod('dbc', B, 'abcd', temp1, 'adb');
D = etprod('cdb', C, 'ab', temp2, 'acd');
As the default dot product function in MATLAB (for 2D matrices) is much faster than etprod, I thought about reshaping A, B, C, D to 2D arrays in a way that would let me multiply 2D matrices using the default function, without hand-written for loops. I wasn't successful with that either.
Any thoughts? thanks!
If this operation is done many times with different values of M, we could define:
D0 = np.einsum('xtf,ytpf,fr->xyftpr', A, B, C)
The whole operation could be broken into binary steps:
D0=np.einsum('xtf,ytpf->xyptf',A,B)
D0=np.einsum('xyptf,fr->xyftpr',D0,C)
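D=np.einsum('xyftpr,xyf->tpr',D0,M)
The final operation uses D0 and M and can be coded as a matrix-vector operation. In MATLAB, provided D0 is stored with the t, p, r dimensions first and the x, y, f dimensions last (in the same order as M), it would be
D = reshape(D0, [], numel(M)) * M(:);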
which could then be reordered as desired.
We could write this order as (((A,B),C),M)
It might be better, however, to use (((M,C),A),B):
D=np.einsum('xyf,fr->xyfr',M,C)
D0=np.einsum('xyfr,xtf->ytfr',D,A)
D=np.einsum('ytfr,ytpf->tpr',D0,B)
This ordering of operations has intermediate arrays with only 4 indices rather than one with 6. If each operation is much faster than the single one this may be an advantage.
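A quick way to sanity-check the orderings is to run them on small random tensors and compare against the single einsum call. This is only a sketch: the dimension sizes below are made up to keep the six-index array small, and the variable names (MC, MCA, T6) are mine. It also shows that the pre-computed tensor asked about in the question does exist, although at the original sizes it would have 35*37*59*51*51*27 elements, which is likely too large to be practical.

```python
import numpy as np

rng = np.random.default_rng(0)
X, Y, T, P, F, R = 3, 4, 5, 6, 7, 2      # small stand-ins for 35, 37, 51, 51, 59, 27
A = rng.standard_normal((X, T, F))
B = rng.standard_normal((Y, T, P, F))
C = rng.standard_normal((F, R))
M = rng.standard_normal((X, Y, F))

# Reference: the single four-operand einsum from the question.
D_ref = np.einsum('xyf,xtf,ytpf,fr->tpr', M, A, B, C)

# Ordering (((M,C),A),B): no intermediate has more than four indices.
MC = np.einsum('xyf,fr->xyfr', M, C)
MCA = np.einsum('xyfr,xtf->ytfr', MC, A)
D_steps = np.einsum('ytfr,ytpf->tpr', MCA, B)

# Pre-computed six-index tensor, contracted with M over x, y, f.
T6 = np.einsum('xtf,ytpf,fr->xyftpr', A, B, C)
D_pre = np.tensordot(M, T6, axes=([0, 1, 2], [0, 1, 2]))

print(np.allclose(D_ref, D_steps), np.allclose(D_ref, D_pre))
```

In NumPy it may also be worth trying np.einsum(..., optimize=True), which lets einsum pick a pairwise contraction order itself.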

caffe circular shift of zero-padded upscaled images for interpolation

I want to implement the super-resolution algorithm defined in https://arxiv.org/abs/1609.05158 using caffe. There are TF implementations but no caffe implementation yet: https://github.com/tetrachrome/subpixel
To summarize the algorithm: I want to superresolve an image by 3. I want to do the upsampling at the end of the network rather than at the beginning. To do that I will have 9 images (Batch x 9 x height x width) at the end of the network.
Then what I wish to do is to pick one pixel from each image at the same coordinates and place them within 3x3 square to complete an image of size 3*height * 3*width. Similar to:
1) Can I use deconvolution layer to upscale an image by 3, filling zeros in between and if so, how?
2) I am thinking of using slice layer to extract 9 images.
3) Is there a way to circularly shift some images to align them as seen in the image and if so, how?
4) Do I really need slice layer before circular shifting and eltwise summing OR can I do it in another way without needing slice layer: Can I circular shift channels separately and can I merge channels of images by summation?
5) Can this be done in a much easier way which I am unable to imagine.
I have asked quite a lot of questions; I hope that is not too many for one post.
Thank you in advance.
EDIT:
I want to implement this Tensorflow code in caffe:
def _phase_shift(I, r):
    bsize, a, b, c = I.get_shape().as_list()
    bsize = tf.shape(I)[0] # Handling Dimension(None) type for undefined batch dim
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3)) # bsize, a, b, 1, 1
    X = tf.split(1, a, X) # a, [bsize, b, r, r]
    X = tf.concat(2, [tf.squeeze(x, axis=1) for x in X]) # bsize, b, a*r, r
    X = tf.split(1, b, X) # b, [bsize, a*r, r]
    X = tf.concat(2, [tf.squeeze(x, axis=1) for x in X]) # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a*r, b*r, 1))
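The rearrangement described in the question (r*r maps of size height x width turned into one image of size r*height x r*width) can be written as a reshape, a transpose and a second reshape, with no slicing or circular shifts. Below is a minimal NumPy sketch for the Caffe-style Batch x Channels x Height x Width layout. The function name pixel_shuffle_nchw is hypothetical, and the channel ordering (channel p*r + q goes to row offset p, column offset q inside each r x r square) is an assumption; it is not claimed to reproduce the TensorFlow snippet above element for element.

```python
import numpy as np


def pixel_shuffle_nchw(x, r):
    """Hypothetical helper: (N, r*r, H, W) -> (N, H*r, W*r)."""
    n, c, h, w = x.shape
    assert c == r * r, "expected exactly r*r channels"
    x = x.reshape(n, r, r, h, w)        # split the channel axis into (p, q)
    x = x.transpose(0, 3, 1, 4, 2)      # reorder to (n, h, p, w, q)
    return x.reshape(n, h * r, w * r)   # merge (h, p) and (w, q)


x = np.arange(1 * 9 * 2 * 2).reshape(1, 9, 2, 2)
y = pixel_shuffle_nchw(x, 3)
print(y.shape)
```

With this ordering, output pixel (h*r + p, w*r + q) is taken from channel p*r + q at position (h, w). In Caffe the two reshapes map onto Reshape layers, but the transpose in between needs a permute-style layer, which as far as I know is not in the main branch.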

Using Merge Layer in Keras using Dot Product

I am trying to merge two layers together. My input, or my processed data, appears as such:
[[2069 2297 3087 ..., 0 0 0]
[2069 2297 3087 ..., 0 0 0]
[2069 2297 3087 ..., 0 0 0]
...,
[2711 4215 875 ..., 0 0 0]
[5324 1412 1301 ..., 0 0 0]
[5065 3561 5002 ..., 0 0 0]]
Each row represents a sequence of words, and each number is the index of a specific word. I have two such datasets and I am trying to merge them by first embedding them into 16-dimensional word vectors and then taking a dot product. To do this, I created two branches to embed the data first. I then try to merge them.
When I try to merge the two using this function in Keras:
model = Sequential()
model.add(Merge( [x1_branch, x2_branch], mode = 'dot'))
I get the following error:
ValueError: Error when checking target: expected merge_1 to have 3 dimensions, but got array with shape (162, 1)
I believe that the matrix multiplication was executed, as written and described in the documentation:
"E.g. if applied to two tensors a and b of shape (batch_size, n), the output will be a tensor of shape (batch_size, 1) where each entry i will be the dot product between a[i] and b[i]."
Obviously, my batch size for this sample is 162. However, the error still makes no sense. How can the merge layer expect an input if it has already, seemingly, done the calculation?
I would greatly appreciate any help. Thanks!
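For reference, the documented behaviour quoted in the question, for two 2D tensors of shape (batch_size, n), can be reproduced in plain NumPy. This sketch only illustrates that quoted sentence; the arrays are made up, and it says nothing about what Merge does with the 3D outputs of embedding layers.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (batch_size, n) = (2, 3)
b = np.array([[1.0, 0.0, 1.0],
              [2.0, 2.0, 2.0]])

# Entry i is the dot product between a[i] and b[i]; keepdims gives (batch_size, 1).
out = np.sum(a * b, axis=1, keepdims=True)
print(out.shape)
```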

Theano dataset dimensions error when passing into function

I am using the example network of an MLP with 2 hidden layers and two dropout layers.
My load_data() function returns 400 rows of 20 features, and my label dataset is 400 rows of a single variable. These are split into X_train, X_test, y_train and y_test, with some rows held out for validation.
My lasagne input layer is:
l_in = lasagne.layers.InputLayer(shape=(None, 20), input_var=input_var)
and my train function is:
train_fn = theano.function([input_var, target_var], loss, updates=updates, allow_input_downcast=True)
My program fails at around this line: train_err += train_fn(inputs, targets)
'Wrong number of dimensions: expected 1, got 2 with shape (20, 1).')
I understand the (20, 1), as I passed in twenty values on one side and 1 value on the labels side, but I thought Theano automatically flattened each array?
What can I do to fix this?
Any help would be appreciated!
The inputs that you pass to train_fn() should be an ndarray with shape (n, 20), where n is the number of examples in your minibatch. The targets should be an ndarray with shape (n,) (note that shapes (1, n) and (n, 1) won't work). Try double-checking that the arrays you actually pass to the function match these shapes.
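As a minimal sketch of that fix, a (n, 1) label array can be flattened to (n,) with NumPy before it is passed in. The array contents and dtypes below are placeholders, and the minibatch size of 20 is taken from the shape in the error message.

```python
import numpy as np

inputs = np.zeros((20, 20), dtype=np.float32)     # (n, 20): n examples, 20 features
targets_2d = np.zeros((20, 1), dtype=np.int32)    # (n, 1): the shape reported in the error

# ravel() collapses (n, 1) to (n,), a 1-D vector of labels.
targets = targets_2d.ravel()
print(inputs.shape, targets.shape)
```

The call would then be train_fn(inputs, targets) with the flattened targets.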

How to reshape a blob in Caffe?

How to reshape a blob of the shape N x C x H x W to N x 1 x (C*H) x W in Caffe?
I want to make a convolution layer the weights of which are identical between channels.
One way I came up with is to reshape the bottom blob from N x C x H x W to N x 1 x (C*H) x W and place a convolution layer on top of it. But I just don't know how to reshape a blob.
Please help me out, thank you.
As pointed out by whjxnyzh, you can use a "Reshape" layer. Caffe is quite flexible in the way it allows you to define the output shape.
See the declaration of reshape_param in caffe.proto:
// Specify the output dimensions. If some of the dimensions are set to 0,
// the corresponding dimension from the bottom layer is used (unchanged).
// Exactly one dimension may be set to -1, in which case its value is
// inferred from the count of the bottom blob and the remaining dimensions.
In your case I guess you'll have a layer like this:
layer {
  name: "my_reshape"
  type: "Reshape"
  bottom: "in"
  top: "reshaped_in"
  reshape_param { shape: {dim: 0 dim: 1 dim: -1 dim: 0 } }
}
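To see what this reshape does to the data, the same operation can be checked in NumPy, which uses the same row-major memory order as a Caffe blob. The sizes below are arbitrary example values. The sketch shows that row c*H + h of the reshaped blob is row h of channel c, so the channels end up stacked vertically.

```python
import numpy as np

N, C, H, W = 2, 3, 4, 5                       # arbitrary example sizes
x = np.arange(N * C * H * W).reshape(N, C, H, W)

# dim: 0, dim: 1, dim: -1, dim: 0  ->  (N, 1, C*H, W)
y = x.reshape(N, 1, -1, W)
print(y.shape)

# Row c*H + h of the reshaped blob holds row h of channel c.
n, c, h = 1, 2, 3
print(np.array_equal(y[n, 0, c * H + h, :], x[n, c, h, :]))
```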
See also on caffe.help.
Caffe now has a reshapeLayer for you.
http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1ReshapeLayer.html
If I understand your final objective right, Caffe's convolution layer already can do multiple input-output convolution with common/shared filters like:
layer {
  name: "conv"
  type: "Convolution"
  bottom: "in1"
  bottom: "in2"
  bottom: "in3"
  top: "out1"
  top: "out2"
  top: "out3"
  convolution_param {
    num_output: 10  # the same 10 filters for all 3 inputs
    kernel_size: 3
  }
}
This assumes you have all the streams split (a slice layer can do that); finally, you may merge them if desired with a concat or eltwise layer.
That avoids the need to reshape the blob, convolve it, and then reshape it back, which might introduce cross-channel interference near the margins.
Not sure if this fits your specs exactly, but Caffe does have flattening layers. The blob goes from n * c * h * w to n * (chw) * 1 * 1.
See http://caffe.berkeleyvision.org/tutorial/layers.html