Caffe Element-Wise multiplication with fixed blobs - neural-network

I think I will be asking multiple questions here; I'd love any comments, because I'm new to Caffe.
In my network, input images have size 1x41x41. Since I am using a batch size of 64, I think the data size will be 64x1x41x41. (Please correct me if this is wrong.)
After some convolutional layers (which don't change the data size), I would like to multiply the resulting data with predefined blobs of size 1x41x41. It seems convenient to use an EltwiseLayer to do the multiplication, so in order to define the second bottom of the Eltwise layer I need another input for the blobs. (Please advise if this can be done another way.)
First question: batch training confuses me. If I want to multiply a batch of images with a single blob in an EltwiseLayer, should the bottom sizes be the same? In other words, should I repmat (Matlab-style) the blob to clone it 64 times, giving a size of 64x1x41x41, or can I just plug in a single blob of size 1x1x41x41?
Second question: I want to multiply the data with 100 different blobs and then take the mean of the 100 results. Do I need to define 100 EltwiseLayers to do the job? Or can I collect the blobs in a single blob of size 1x100x41x41 (or 64x100x41x41) and clone the data to be multiplied 100 times? If so, how can I do it? An example would be very useful. (I've seen a TileLayer mentioned somewhere, but the info is spread across the galaxy.)
Thanks in advance.

In order to do element-wise multiplication in Caffe, both blobs must have exactly the same shape: Caffe does not "broadcast" along singleton dimensions.
So, if you want to multiply a batch of 64 blobs of shape 1x41x41 each, you'll have to provide two 64x1x41x41 bottom blobs.
As you already noted, you can use a "Tile" layer to do the repmat-ing:
layer {
  name: "repmat"
  type: "Tile"
  bottom: "const_1x1x41x41_blob"
  top: "const_64x1x41x41_blob"
  tile_param {
    axis: 0   # you want to "repmat" along the first axis
    tiles: 64 # you want 64 repetitions
  }
}
Now you can do the "Eltwise" multiplication:
layer {
  name: "mul"
  type: "Eltwise"
  bottom: "const_64x1x41x41_blob"
  bottom: "other_blob"
  top: "mul"
  eltwise_param {
    operation: MUL
  }
}
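For intuition, here is a minimal numpy sketch of what the two layers compute together (the array names and random data are made up for illustration):

import numpy as np

const_blob = np.random.rand(1, 1, 41, 41)   # the fixed 1x1x41x41 blob
data = np.random.rand(64, 1, 41, 41)        # a batch of 64 images

tiled = np.tile(const_blob, (64, 1, 1, 1))  # "Tile" along axis 0: -> 64x1x41x41
product = tiled * data                      # "Eltwise" MUL on equal shapes
print(product.shape)                        # (64, 1, 41, 41)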

Related

How to get a 5-Dimensional output after torch.nn.Conv2d layer in PyTorch?

I am working on a project based on the OpenPose research paper that I read two weeks ago. In it, the model is supposed to give a 5-dimensional output. For example, torch.nn.Conv2d() gives a 4-D output of the following shape: (Batch_size, n_channels, input_width, input_height). What I need is an output of the following shape: (Batch_size, n_channels, input_width, input_height, 2). Here 2 is a fixed number, not subject to any changes.
The 2 is there because each entry is a 2-dimensional vector: for each channel at every pixel position there are 2 values, hence the added dimension.
What will be the best way to do this?
I thought about having 2 separate branches, one for each of the vector values, but the network is very deep and I would like to be as computationally efficient as possible.
So you are effectively looking to compute feature maps which are interpreted as 2-dimensional vectors. Unless there is something fancy math-wise happening there, you are probably fine with just having twice as many output channels: (batch_size, n_channels * 2, width, height), and then reshaping it as
output5d = output4d.reshape(
    output4d.shape[0],
    output4d.shape[1] // 2,  # integer division, since channels come in pairs
    2,
    output4d.shape[2],
    output4d.shape[3]
)
which gives you a shape of (batch_size, n_channels, 2, width, height). If you really want to have 2 as the last dimension, you can use transpose:
output5d = output5d.transpose(2, 4)
but if there is no strong argument in favor of this layout, I would suggest you do not transpose as it always costs a bit of performance.
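Putting it together, a minimal runnable sketch (the layer sizes here are made up for illustration, not taken from OpenPose):

import torch
import torch.nn as nn

n_channels = 19                     # desired number of vector-valued maps (made up)
conv = nn.Conv2d(64, n_channels * 2, kernel_size=3, padding=1)

x = torch.randn(8, 64, 46, 46)      # (batch, in_channels, H, W)
out4d = conv(x)                     # (8, 38, 46, 46)
out5d = out4d.reshape(out4d.shape[0], n_channels, 2,
                      out4d.shape[2], out4d.shape[3])
out5d = out5d.transpose(2, 4)       # -> (8, 19, 46, 46, 2)
print(out5d.shape)                  # torch.Size([8, 19, 46, 46, 2])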

How to calculate third element of caffe convnet?

Following this question and this tutorial, I've created a simple net just like the tutorial's, but with 100x100 images and a first convolution kernel of 11x11 with pad=0.
I understand that the formula is (W−F+2P)/S+1, and in my case the dimension became [51x51x3] (3 being the RGB channels). But the number 96 pops up in my net diagram, and as the tutorial says, it is the third dimension of the output; in other words, my net after the first conv became [51x51x96]. I couldn't figure out how the number 96 is calculated, and why.
Isn't the network's convolution layer supposed to pass through the three color channels, so the output should be three feature maps? How come the dimension grows like this? Isn't it true that we have one kernel for each channel? How does this one kernel create 96 (or, in the first tutorial, 256 or 384) feature maps?
You are mixing input channels and output channels.
Your input image has three channels: R, G and B. Each filter in your conv layer acts on all three channels at once over its spatial kernel extent (e.g., 3-by-3, making the filter 3x3x3 in total) and outputs a single number per spatial location. So, if you have one filter in your layer, your output will have only one output channel(!)
Normally, you would like to compute more than a single filter at each layer; this is what the num_output parameter in convolution_param is for: it defines how many filters will be trained in a specific convolutional layer.
Thus a Conv layer
layer {
  type: "Convolution"
  name: "my_conv"
  bottom: "x"  # shape 3-by-100-by-100
  top: "y"
  convolution_param {
    num_output: 32  # number of filters = number of output channels
    kernel_size: 3
  }
}
will output "y" with shape 32-by-98-by-98.
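As a sanity check with the formula from the question: (W − F + 2P)/S + 1 = (100 − 3 + 0)/1 + 1 = 98 for each spatial dimension, while the channel dimension is simply num_output = 32. The 96 in your diagram is therefore not computed from the input at all; it is just the num_output the author chose for that layer, i.e., 96 trainable filters.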

Caffe classification labels in HDF5

I am finetuning a network. In a specific case I want to use it for regression, which works. In another case, I want to use it for classification.
For both cases I have an HDF5 file with a label. For regression, this is just a 1-by-1 numpy array that contains a float. I thought I could use the same label for classification, after changing my EuclideanLoss layer to SoftmaxLoss. However, then I get a negative loss, like so:
Iteration 19200, loss = -118232
Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)
Can you explain if, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.
UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I made the num_output of my last fully connected layer 6, as I have 6 labels (it used to be 1).
- I now create a one-hot vector and pass that as a label into my HDF5 dataset as follows:
f['label'] = numpy.array([1, 0, 0, 0, 0, 0])
Trying to run my network now returns
Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)
After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240)
Number of labels must match number of predictions; e.g., if softmax axis == 1
and prediction shape is (N, C, H, W), label count (number of labels)
must be N*H*W, with integer values in {0, 1, ..., C-1}.
My idea is to add 1 label per data sample (image), and in my train.prototxt I create batches. Shouldn't this create the correct batch size?
Since you moved from regression to classification, you need to output not a scalar to compare with "label" but rather a probability vector of length num-labels to compare with the discrete class "label". You need to change the num_output parameter of the layer before "SoftmaxWithLoss" from 1 to num-labels.
I believe you are currently accessing uninitialized memory, and I would expect Caffe to crash sooner or later in this case.
Update:
You made two changes: num_output 1-->6, and you also changed your input label from a scalar to a vector.
The first change was the only one you needed for using "SoftmaxWithLossLayer".
Do not change the label from a scalar to a "hot-vector".
Why?
Because "SoftmaxWithLoss" basically looks at the 6-vector prediction you output, interprets the ground-truth label as an index, and looks at -log(p[label]): the closer p[label] is to 1 (i.e., you predicted a high probability for the expected class), the lower the loss. If the prediction p[label] is close to zero (i.e., you incorrectly predicted a low probability for the expected class), the loss grows fast.
Using a "hot-vector" as the ground-truth input label may give rise to multi-category classification (which does not seem like the task you are trying to solve here). You may find this SO thread relevant to that particular case.

Single Label Regression(Finetuning) With Input Artificial Features In Caffe

I have, say, n images, and for each of them I have 2 additional artificial (made-up) features; the image labels are single-dimensional integers.
I want to fine-tune ImageNet on my dataset, but I do not know how to handle these 2 additional features as input. How should I feed the data to Caffe? Please help!
EDIT: The 2 features can be any 2 numbers (1-dimensional), say two numbers representing what class an image falls into and how many images fall into that class.
Say I have 'cat.jpg'; then the features might be 5 and 2000, where feature 1 (the 5) represents the class and 2000 is the total number of images in that class.
In short, the 2 features can be any two integers.
I think the most straightforward way for you is to use an "HDF5Data" input layer, where you can store the input images, the two additional "features", and the expected output value (for regression).
You can see an example here for creating HDF5 data in python. A Matlab example can be found here.
Your HDF5 should have 4 "datasets": the first is the input images (or the image descriptors of dim 4096), an array with n entries of images/descriptors.
Another dataset is "feat_1", an n-by-1 array, and likewise "feat_2", an n-by-1 array.
Finally, you should have another input, "target", an n-by-1 array of the expected output you wish to learn.
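As a rough sketch, such a file could be created with h5py in python like this (file names, shapes and random data are placeholders; Caffe expects float32 data):

import h5py
import numpy as np

n = 100                                   # number of samples (placeholder)
with h5py.File('train_data.h5', 'w') as f:
    f['data'] = np.random.rand(n, 4096).astype(np.float32)   # images or descriptors
    f['feat_1'] = np.random.rand(n, 1).astype(np.float32)
    f['feat_2'] = np.random.rand(n, 1).astype(np.float32)
    f['target'] = np.random.rand(n, 1).astype(np.float32)    # regression targets

# the "source" list file simply lists the .h5 file path(s), one per line
with open('file.txt', 'w') as f:
    f.write('train_data.h5\n')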
Once you have an HDF5 file ready with these datasets in it, you should have
layer {
  type: "HDF5Data"
  top: "data"  # name of dataset with images/imagenet features
  top: "feat_1"
  top: "feat_2"
  top: "target"
  hdf5_data_param {
    source: "/path/to/list/file.txt"
    batch_size: 32  # required by HDF5Data; 32 is just an example value
  }
}
As you can see, a single "HDF5Data" layer can produce several "top"s.

Matlab - Dilation function alternative

I'm looking through various online sources trying to learn some new stuff with Matlab.
I came across a dilation function, shown below:
function rtn = dilation(in)
h = size(in,1);
l = size(in,2);
rtn = zeros(h,l,3);
% vertical pass: stack copies shifted up and down (edge rows duplicated)
rtn(:,:,1) = [in(2:h,:); in(h,:)];    % shifted up
rtn(:,:,2) = in;
rtn(:,:,3) = [in(1,:); in(1:h-1,:)];  % shifted down
rtn_two = max(rtn,[],3);              % max across the three layers
% horizontal pass: same trick column-wise on the vertically dilated image
rtn(:,:,1) = [rtn_two(:,2:l), rtn_two(:,l)];    % shifted left
rtn(:,:,2) = rtn_two;
rtn(:,:,3) = [rtn_two(:,1), rtn_two(:,1:l-1)];  % shifted right
rtn = max(rtn,[],3);
The parameter it takes is: max(img,[],3) %where img is an image
I was wondering if anyone could shed some light on what this function appears to do, and whether there's a better (or less confusing) way to do it? Apart from a small wiki entry, I can't seem to find any documentation, hence asking for your help.
Could this be achieved with the imdilate function maybe?
What this is doing is creating two copies of the image shifted by one pixel up/down (with the last/first row duplicated to preserve size), then taking the max value of the 3 images at each point to create a vertically dilated image. Since the shifted copies and the original are layered in a 3-d matrix, max(img,[],3) 'flattens' the 3 layers along the 3rd dimension. It then repeats this column-wise for the horizontal part of the dilation.
For a trivial image:
00100
20000
00030
Step 1:
(:,:,1) (:,:,2) (:,:,3) max
20000 00100 00100 20100
00030 20000 00100 20130
00030 00030 20000 20030
Step 2:
(:,:,1) (:,:,2) (:,:,3) max
01000 20100 22010 22110
01300 20130 22013 22333
00300 20030 22003 22333
You're absolutely correct that this would be simpler with the Image Processing Toolbox:
rtn = imdilate(in, ones(3));
With the original code, dilating by more than one pixel would require multiple iterations, and because it operates one dimension at a time it's limited to square (or possibly rectangular, with a bit of modification) structuring elements.
Your function replaces each element with the maximum value within the corresponding 3-by-3 neighborhood. By creating a 3D matrix, the function aligns each element with two of its shifted copies, thus equivalently achieving the 3-by-3 kernel. This alignment is done twice, to find the maximum value along each column and each row respectively.
You can generate a simple matrix to compare the result with imdilate:
a=magic(8)
rtn = dilation(a)
b=imdilate(a,ones(3))
Besides imdilate, you can also use
c = ordfilt2(a,9,ones(3))
to get the same result (with these arguments, ordfilt2 implements a 3-by-3 maximum filter: it picks the 9th-smallest, i.e., largest, value in each 3-by-3 neighborhood).
EDIT
You may try imdilate on a 3D image as well:
a(:,:,1)=magic(8);
a(:,:,2)=magic(8);
a(:,:,3)=magic(8);
mask = true(3,3,3);
mask(2,2,2) = false;
d = imdilate(a,mask);