How to reshape a blob of the shape N x C x H x W to N x 1 x (C*H) x W in Caffe?
I want to make a convolution layer whose weights are identical between channels.
One way I came up with is to reshape the bottom blob from shape N x C x H x W to N x 1 x (C*H) x W and place a convolution layer on top of it. But I just don't know how to reshape a blob.
Please help me out, thank you.
As pointed out by whjxnyzh, you can use a "Reshape" layer. Caffe is quite flexible in the way it allows you to define the output shape.
See the declaration of reshape_param in caffe.proto:
// Specify the output dimensions. If some of the dimensions are set to 0,
// the corresponding dimension from the bottom layer is used (unchanged).
// Exactly one dimension may be set to -1, in which case its value is
// inferred from the count of the bottom blob and the remaining dimensions.
In your case I guess you'll have a layer like this:
layer {
name: "my_reshape"
type: "Reshape"
bottom: "in"
top: "reshaped_in"
reshape_param { shape: {dim: 0 dim: 1 dim: -1 dim: 0 } }
}
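As a sanity check on the shape arithmetic, here is a numpy sketch (my own illustration, not Caffe code; like Caffe's Reshape, numpy's reshape only reinterprets the flat data):
import numpy as np

N, C, H, W = 2, 3, 4, 5                # hypothetical blob dimensions
blob = np.arange(N * C * H * W).reshape(N, C, H, W)

# dim: 0 copies N, dim: 1 forces the channel axis to 1,
# dim: -1 is inferred as C*H, dim: 0 copies W
reshaped = blob.reshape(N, 1, -1, W)
print(reshaped.shape)                  # (2, 1, 12, 5) == (N, 1, C*H, W)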
See also on caffe.help.
Caffe now has a ReshapeLayer for you:
http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1ReshapeLayer.html
If I understand your final objective correctly, Caffe's convolution layer can already do multiple-input, multiple-output convolution with common/shared filters, like this:
layer {
name: "conv"
type: "Convolution"
bottom: "in1"
bottom: "in2"
bottom: "in3"
top: "out1"
top: "out2"
top: "out3"
convolution_param {
num_output: 10  # the same 10 filters for all 3 inputs
kernel_size: 3
}
}
This assumes you have already split the streams (a Slice layer can do that); finally, you may merge them if desired with a Concat or Eltwise layer.
This avoids the need to reshape the blob, convolve, and then reshape it back, which might introduce cross-channel interference near the margins.
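To see what "shared filters" means in plain code, here is a small numpy/scipy sketch (my own illustration with hypothetical single-channel streams, not Caffe code):
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
filters = rng.standard_normal((10, 3, 3))                  # 10 shared 3x3 kernels
streams = [rng.standard_normal((8, 8)) for _ in range(3)]  # in1, in2, in3

# the very same weights produce out1, out2, out3
outs = [np.stack([correlate2d(s, k, mode='valid') for k in filters])
        for s in streams]
print(outs[0].shape)                                       # (10, 6, 6) per stream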
Not sure if this fits your specs exactly, but Caffe does have a flattening layer. The blob goes from n * c * h * w to n * (c*h*w) * 1 * 1.
See http://caffe.berkeleyvision.org/tutorial/layers.html
Related
I am not sure how to interpret the reshape parameters. Here (http://caffe.berkeleyvision.org/tutorial/layers/reshape.html) it says that 0 means "copy" and -1 means "infer". Is the meaning the same when -1 is not the last parameter? Can anyone help me understand it?
layer {
name: "Layer1"
type: "Reshape"
bottom: "Layer1"
top: "Layer2"
reshape_param {
shape {
dim: 0
dim: 2
dim: -1
dim: 0
}
}
}
Also, if I want to implement the same layer in Keras, do I also use the Keras Reshape layer, like:
Layer2 = K.reshape(Layer1,(-1,input_dim))
This means that, considering you have an input of shape (a, b, c, d, e), your output will have shape:
(a, 2, b * c * e / 2, d)
The values a and d are copied from the input (the 0 entries). The value 2 is forced, and the -1 entry is inferred as whatever is needed to keep the total number of elements the same as in the input.
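If it helps, you can verify this arithmetic with numpy, whose reshape reinterprets the flat data the same way (the dimension values below are hypothetical):
import numpy as np

a, b, c, d, e = 2, 4, 3, 6, 5
x = np.zeros((a, b, c, d, e))

# dims 0 and 3 are copied (a and d), dim 1 is forced to 2,
# and -1 is inferred so that the element count is preserved
y = x.reshape(a, 2, -1, d)
print(y.shape)   # (2, 2, 30, 6) == (a, 2, b*c*e/2, d)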
In Keras, since you're not changing the first dimension (the batch size), you only need a regular Reshape layer that will ignore the batch size:
Reshape((2,-1,youMustKnowThis))
In a Sequential model, just add this layer:
sequentialModel.add(Reshape((2, -1, youMustKnowThis)))
In a functional API Model, pass the output of the previous layer:
newShaped = Reshape((2,-1,youMustKnowThis))(outputOfPreviousLayer)
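For completeness, a minimal runnable sketch (assuming the tf.keras API; known_last_dim stands in for youMustKnowThis, and the input shape is hypothetical):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape

known_last_dim = 6
model = Sequential()
model.add(Reshape((2, -1, known_last_dim), input_shape=(4, 3, 6, 5)))
model.summary()   # output shape: (None, 2, 30, 6)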
Given my code:
G=zeros(height,width); %zeros or whatever
for y = 1 : height
for x = 1 : width
magnitude = sqrt(Gx(y,x)^2 + Gy(y,x)^2);
gradient_direction = atan(Gy(y,x)/Gx(y,x));
G(y,x) = [magnitude gradient_direction];
end
end
I keep getting this (if I don't use zeros()):
Subscripted assignment dimension mismatch.
or this:
Assignment has more non-singleton rhs dimensions than non-singleton
subscripts
While @atru's answer works, I want to suggest a vectorized way that is faster and neater, if that helps. The operations here can easily be converted to vectorized form:
G=cat(3,hypot(Gy,Gx),atan(Gy./Gx));
By using G(y,x) = [magnitude gradient_direction]; you are attempting to assign two values to a spot reserved for a single value with indices (y,x). One way to fix this is to use a 3-dimensional array G instead:
G=zeros(height,width,2);
for y = 1 : height
for x = 1 : width
magnitude = sqrt(Gx(y,x)^2 + Gy(y,x)^2);
gradient_direction = atan(Gy(y,x)/Gx(y,x));
G(y,x,:) = [magnitude, gradient_direction];
end
end
Now at each point (y,x) you can store both values and access them as, for instance, G(1,2,1) for the magnitude at position (1,2) and G(1,2,2) for the gradient direction. This assumes Gx and Gy are both arrays of size height x width.
An important thing to note is that slices of G along the third dimension will also be 3D arrays, i.e. mag_dir = G(3,2,:) will have size [1 1 2] and not [1 2]. This may cause errors in some applications, for example when trying to concatenate mag_dir with another vector (that does not have the extra dimension), or in linear algebra operations.
To resolve this, use reshape to explicitly change the dimensions to the target ones. For the vector here it would be reshape(mag_dir, 1, 2). The same holds for 2D slices like more_md = G(1,:,:), which has size [1 width 2] and would need, for instance, more_md = reshape(more_md, width, 2).
I want to implement the super-resolution algorithm defined in https://arxiv.org/abs/1609.05158 using Caffe. There are TF implementations, but no Caffe implementation yet: https://github.com/tetrachrome/subpixel
To summarize the algorithm: I want to super-resolve an image by a factor of 3, and I want to do the upsampling at the end of the network rather than at the beginning. To do that I will have 9 images (Batch x 9 x height x width) at the end of the network.
Then what I wish to do is to pick one pixel from each image at the same coordinates and place them within a 3x3 square, to complete an image of size (3*height) x (3*width). (The original post illustrated this periodic shuffling with a figure.)
1) Can I use deconvolution layer to upscale an image by 3, filling zeros in between and if so, how?
2) I am thinking of using slice layer to extract 9 images.
3) Is there a way to circularly shift some images to align them as seen in the image and if so, how?
4) Do I really need a Slice layer before the circular shifting and eltwise summing, or can I do it another way without a Slice layer: can I circularly shift channels separately, and can I merge the channels of images by summation?
5) Can this be done in a much easier way that I am unable to imagine?
I have asked quite a lot of questions; I hope I am not overdoing it.
Thank you in advance.
EDIT:
I want to implement this TensorFlow code in Caffe (note that it uses the old TF 0.x argument order for tf.split and tf.concat):
def _phase_shift(I, r):
bsize, a, b, c = I.get_shape().as_list()
bsize = tf.shape(I)[0] # Handling Dimension(None) type for undefined batch dim
X = tf.reshape(I, (bsize, a, b, r, r))
X = tf.transpose(X, (0, 1, 2, 4, 3)) # bsize, a, b, r, r
X = tf.split(1, a, X) # a, [bsize, b, r, r]
X = tf.concat(2, [tf.squeeze(x, axis=1) for x in X]) # bsize, b, a*r, r
X = tf.split(1, b, X) # b, [bsize, a*r, r]
X = tf.concat(2, [tf.squeeze(x, axis=1) for x in X]) # bsize, a*r, b*r
return tf.reshape(X, (bsize, a*r, b*r, 1))
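The split/concat dance above amounts to a fixed permutation of the data. Here is a numpy sketch of the same periodic shuffling (my own helper, channel-last layout as in the TF code; the exact (r, r) unpacking convention must match how your network stacks its r*r maps):
import numpy as np

def phase_shift_np(I, r):
    # I: (bsize, a, b, r*r) -> (bsize, a*r, b*r, 1)
    bsize, a, b, c = I.shape
    X = I.reshape(bsize, a, b, r, r)   # split the channel axis into (r, r)
    X = X.transpose(0, 1, 4, 2, 3)     # bsize, a, r, b, r
    return X.reshape(bsize, a * r, b * r, 1)

out = phase_shift_np(np.zeros((4, 10, 10, 9)), 3)
print(out.shape)                       # (4, 30, 30, 1)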
Sorry if this question is too specific to a particular library; however, it seems popular enough that somebody might know the answer. The API documentation for addImage does not say what each of the arguments is:
public PdfXObject addImage(ImageData image,
float a,
float b,
float c,
float d,
float e,
float f)
Creates Image XObject from image and adds it to canvas (as Image XObject).
Parameters:
image - the PdfImageXObject object
a - an element of the transformation matrix
b - an element of the transformation matrix
c - an element of the transformation matrix
d - an element of the transformation matrix
e - an element of the transformation matrix
f - an element of the transformation matrix
Obviously two are x/y coordinates, and presumably two are width and height, but from the "legacy" code I'm working with it's not apparent which is which, and I can't think what the other two floats could be.
Those six values are elements of a matrix that has three rows and three columns:

    | a  b  0 |
M = | c  d  0 |
    | e  f  1 |

You can use this matrix to express a transformation in a two-dimensional system:

[x' y' 1] = [x y 1] × M

Carrying out this multiplication results in:
x' = a * x + c * y + e
y' = b * x + d * y + f
The third column in the matrix is fixed: you’re working in two dimensions, so you don’t need to calculate a new z coordinate.
When studying analytical geometry in high school, you’ve probably learned how to apply transformations to objects. In PDF, we use a slightly different approach: instead of transforming objects, we transform the coordinate system.
Nevertheless, you can use your high school knowledge of analytical geometry to understand what the different values are about. For instance:
e and f are the values you will need for the translation of the object, so if you want to add the image at the position x = 36; y = 36, then you will need e = 36; f = 36.
a and d are the values you will need for the scaling in case you don't have any rotation. For instance: if you want the image to have a width of 100 user units and a height of 50 user units, you will need a = 100; b = 0; c = 0; d = 50.
So to add an image of 100 by 50 user units of which the lower-left corner coincides with the coordinate (36, 36), you'd need:
cb.addImage(img, 100, 0, 0, 50, 36, 36);
You can use the following formulas to compute the values for a, b, c, d, e, and f. For example, if you want to combine a translation (dX, dY), a scaling (sX, sY), and a rotation ϕ:
a = sX * cos(ϕ);
b = sY * sin(ϕ);
c = sX * -sin(ϕ);
d = sY * cos(ϕ);
e = dX;
f = dY;
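In code, those formulas are just a few lines; a plain sketch (my own helper, angles in radians):
import math

def image_matrix(dX, dY, sX, sY, phi):
    a = sX * math.cos(phi)
    b = sY * math.sin(phi)
    c = sX * -math.sin(phi)
    d = sY * math.cos(phi)
    e, f = dX, dY
    return a, b, c, d, e, f

# no rotation: a 100 x 50 image with its lower-left corner at (36, 36)
print(image_matrix(36, 36, 100, 50, 0.0))   # (100.0, 0.0, -0.0, 50.0, 36, 36)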
These are all things you can rediscover if you dig into your high school books. It's simple Math; the stuff I learned at school at the age of 17 ;-)
Pithy: help with a Matlab script that takes an ImageData array and convolution weights from Caffe and returns the convolution. Please.
I am trying to recreate a convolution generated by Caffe in Matlab.
Let's make the following definitions:
W**2 = Size of input
F**2 = Size of filter
P = Size of padding
S = Stride
K = Number of filters
The following text describes how to generalize the convolution as a matrix multiplication:
The local regions in the input image are stretched out into columns in an operation commonly called im2col. For example, if the input is [227x227x3] and it is to be convolved with 11x11x3 filters at stride 4, then we would take [11x11x3] blocks of pixels in the input and stretch each block into a column vector of size 11*11*3 = 363. Iterating this process in the input at stride of 4 gives (227-11)/4+1 = 55 locations along both width and height, leading to an output matrix X_col of im2col of size [363 x 3025], where every column is a stretched out receptive field and there are 55*55 = 3025 of them in total. Note that since the receptive fields overlap, every number in the input volume may be duplicated in multiple distinct columns.
From this, one could draw the conclusion that the im2col function call would look something like this:
input = im2col( input, [3*F*F, ((W-F)/S+1)**2] )
However, if I use the following parameter-values
W = 5
F = 3
P = 1
S = 2
K = 2
I get the following dimensions
>> size(input)
ans =
1 3 5 5
>> size(output)
ans =
1 2 3 3
>> size(filter)
ans =
2 3 3 3
And if I use the im2col function call from above, I end up with an empty matrix.
If I change the stride to 1 in the above example, the sizes of the input, the output, and the filter remain the same. If I use Matlab's convn command, the size is not the same as the actual output from Caffe:
>> size(convn(input,filter))
ans =
2 5 7 7
What would be the general way to resize your array for matrix multiplication?
You are using the second argument to im2col incorrectly; see the documentation.
You should give it the size of the filter window that you are trying to slide over the image, i.e.:
cols = im2col( input, [F, F])
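Note that MATLAB's im2col slides over a single 2D matrix and has no padding or stride options, so for the multi-channel, strided, padded case you have to build the columns yourself. Here is a numpy sketch of that general im2col (my own illustration), after which the convolution becomes one matrix multiplication:
import numpy as np

def im2col(x, F, S, P):
    # x: (C, H, W) image -> (C*F*F, L) matrix of stretched receptive
    # fields, with filter size F, stride S, and zero-padding P
    C, H, W = x.shape
    x = np.pad(x, ((0, 0), (P, P), (P, P)))
    out_h = (H + 2 * P - F) // S + 1
    out_w = (W + 2 * P - F) // S + 1
    cols = np.empty((C * F * F, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i*S:i*S+F, j*S:j*S+F].ravel()
    return cols

# W = 5, F = 3, P = 1, S = 2, K = 2 (the sizes from the question)
cols = im2col(np.zeros((3, 5, 5)), F=3, S=2, P=1)   # (27, 9)
filters = np.zeros((2, 3, 3, 3)).reshape(2, -1)     # (K, C*F*F)
out = (filters @ cols).reshape(2, 3, 3)             # matches Caffe's 1 x 2 x 3 x 3
print(out.shape)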