How to calculate the third dimension of a Caffe convnet output? - neural-network

Following this question and this tutorial I've created a simple net just like the tutorial, but with 100x100 images and a first convolution kernel of 11x11 with pad=0.
I understand that the formula is (W−F+2P)/S+1, and in my case the dimensions became [51x51x3] (3 is the number of RGB channels), but the number 96 pops up in my net diagram, and as this tutorial says, it is the third dimension of the output; in other words, my net after the first conv becomes [51x51x96]. I couldn't figure out how the number 96 is calculated, and why.
Isn't the convolution layer supposed to pass through the three color channels, so that the output should be three feature maps? How come its dimension grows like this? Isn't it true that we have one kernel for each channel? How does one kernel create 96 (or, in the first tutorial, 256 or 384) feature maps?

You are mixing up input channels and output channels.
Your input image has three channels: R, G and B. Each filter in your conv layer acts on all three channels across its spatial kernel size (e.g., 3-by-3), and outputs a single number per spatial location. So, if you have one filter in your layer, your output would have only one output channel(!)
Normally you want to compute more than a single filter at each layer; this is what the num_output parameter in convolution_param is for: it defines how many filters will be trained in a specific convolutional layer.
Thus a Conv layer
layer {
  type: "Convolution"
  name: "my_conv"
  bottom: "x"  # shape 3-by-100-by-100
  top: "y"
  convolution_param {
    num_output: 32  # number of filters = number of output channels
    kernel_size: 3
  }
}
will output "y" with shape 32-by-98-by-98: the spatial size is (100−3+0)/1+1 = 98, and the channel dimension equals num_output.
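To make the arithmetic concrete, here is a minimal Python sketch (not part of Caffe; the helper name is made up) of the (W−F+2P)/S+1 formula discussed above:
def conv_output_shape(in_side, kernel, num_output, pad=0, stride=1):
    # spatial side from (W - F + 2P) / S + 1; the channel count comes from
    # num_output, not from the number of input channels
    side = (in_side - kernel + 2 * pad) // stride + 1
    return (num_output, side, side)

print(conv_output_shape(100, 3, 32))   # (32, 98, 98) -- "my_conv" above
print(conv_output_shape(100, 11, 96))  # (96, 90, 90) for an 11x11 kernel, pad=0, stride=1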

Related

Number of parameters calculation in Convolutional NN

I'm new to studying CNNs and I started by watching Andrew Ng's lessons.
There is an example that I did not understand :
How did he compute the #parameters value ?
As you can see in Answer 1 of this StackOverflow question, the formula for the number of parameters of a convolutional layer is: channels_in * kernel_width * kernel_height * channels_out + channels_out.
But this formula doesn't agree with your data. And in fact the drawing you are showing does not agree with the table you are giving.
If I base myself on the drawing, then the first conv layer has 3 input channels, a 5*5 sliding window and 6 output channels, so the number of parameters should be 3*5*5*6 + 6 = 456.
You give the number 208, which is the number obtained for 1 input channel and 8 output channels (the table says 8, while the drawing says 6). So it seems that 208 is correctly obtained from the table data, if we consider that there is one input channel and not three.
As for the second conv layer, with 6 input channels, a 5*5 sliding window and 16 output channels, you need 6*5*5*16 + 16 = 2,416 parameters, which looks suspiciously close to 416, the number given in the table.
As for the remaining (fully connected) layers, it is always the number of input dimensions times the number of output dimensions, plus one: 5*5*16*120+1 = 48,001, 120*84+1 = 10,081, 84*10+1 = 841.
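As a sanity check, a small Python sketch (the helper name is made up) of this formula reproduces the numbers above:
def conv_params(channels_in, kernel_w, kernel_h, channels_out):
    # one kernel_w x kernel_h filter slice per input channel,
    # plus one bias per output channel
    return channels_in * kernel_w * kernel_h * channels_out + channels_out

print(conv_params(3, 5, 5, 6))   # 456  -- the drawing: 3 in, 5x5, 6 out
print(conv_params(1, 5, 5, 8))   # 208  -- the table: 1 in, 5x5, 8 out
print(conv_params(6, 5, 5, 16))  # 2416 -- the second conv layer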

Caffe Element-Wise multiplication with fixed blobs

I think I will be asking multiple questions here; I'd love any comment, because I'm new to Caffe.
In my network, input images have size 1x41x41. Since I am using a batch size of 64, I think the data size will be 64x1x41x41 (please correct me if this is wrong).
After some convolutional layers (that don't change the data size), I would like to multiply the resulting data with predefined blobs of size 1x41x41. It seems convenient to use an EltwiseLayer to do the multiplication, so in order to define the second bottom of the Eltwise I need another input for the blobs. (Please advise if this can be done in another way.)
The first question: batch training confuses me. If I want to multiply a batch of images with a single blob in an EltwiseLayer, should the bottom sizes be the same? In other words, should I use repmat (Matlab) to clone the blob 64 times so it has size 64x1x41x41, or can I just plug in a single blob of size 1x1x41x41?
Second question: I want to multiply the data with 100 different blobs and then take the mean of the 100 results. Do I need to define 100 EltwiseLayers to do the job? Or can I collect the blobs in a single blob of size 1x100x41x41 (or 64x100x41x41) and clone the data to be multiplied 100 times? And if so, how can I do it? An example would be very useful. (I've seen a TileLayer somewhere, but the info is spread across the galaxy.)
Thanks in advance.
In order to do element-wise multiplication in caffe both blobs must have exactly the same shape. Caffe does not "broadcast" along singleton dimensions.
So, if you want to multiply a batch of 64 blobs of shape 1x41x41 each, you'll have to provide two 64x1x41x41 bottom blobs.
As you already noted, you can use a "Tile" layer to do the repmat-ing:
layer {
  name: "repmat"
  type: "Tile"
  bottom: "const_1x1x41x41_blob"
  top: "const_64x1x41x41_blob"
  tile_param {
    axis: 0   # you want to "repmat" along the first (batch) axis
    tiles: 64 # you want 64 repetitions
  }
}
Now you can do the "Eltwise" multiplication:
layer {
  name: "mul"
  type: "Eltwise"
  bottom: "const_64x1x41x41_blob"
  bottom: "other_blob"
  top: "mul"
  eltwise_param {
    operation: MUL
  }
}
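As for the second question, here is a numpy sketch (plain numpy, not Caffe; all names and the random data are illustrative) of what such a tile-then-multiply pipeline computes when extended to 100 blobs, with the mean taken at the end:
import numpy as np

batch = np.random.rand(64, 1, 41, 41)  # network activations
blobs = np.random.rand(100, 41, 41)    # 100 predefined blobs

# tile the batch 100 times along the channel axis and the blobs 64 times
# along the batch axis, so both operands are 64x100x41x41, then multiply
# element-wise and average over the 100 results
prod = np.tile(batch, (1, 100, 1, 1)) * np.tile(blobs[None], (64, 1, 1, 1))
mean = prod.mean(axis=1, keepdims=True)
print(mean.shape)  # (64, 1, 41, 41)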

Caffe classification labels in HDF5

I am finetuning a network. In a specific case I want to use it for regression, which works. In another case, I want to use it for classification.
For both cases I have an HDF5 file with a label. With regression, this is just a 1-by-1 numpy array containing a float. I thought I could use the same label for classification after changing my EuclideanLoss layer to SoftmaxLoss. However, I then get a negative loss, like so:
Iteration 19200, loss = -118232
Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)
Can you explain if, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.
UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I made the num_output of my last fully connected layer 6, as I have 6 labels (it used to be 1).
- I now create a one-hot vector and pass that as a label into my HDF5 dataset as follows:
f['label'] = numpy.array([1, 0, 0, 0, 0, 0])
Trying to run my network now returns
Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)
After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240)
Number of labels must match number of predictions; e.g., if softmax axis == 1
and prediction shape is (N, C, H, W), label count (number of labels)
must be N*H*W, with integer values in {0, 1, ..., C-1}.
My idea is to add 1 label per data point (image), and in my train.prototxt I create batches. Shouldn't this create the correct batch size?
Since you moved from regression to classification, you need to output not a scalar to compare with "label", but rather a probability vector of length num-labels to compare with the discrete class "label". You need to change the num_output parameter of the layer before "SoftmaxWithLoss" from 1 to num-labels.
I believe you are currently accessing uninitialized memory, and I would expect caffe to crash sooner or later in this case.
Update:
You made two changes: num_output 1-->6, and you also changed your input label from a scalar to a vector.
The first change was the only one you needed for "SoftmaxWithLossLayer".
Do not change the label from a scalar to a "hot vector".
Why?
Because "SoftmaxWithLoss" basically looks at the 6-vector prediction you output, interprets the ground-truth label as an index, and looks at -log(p[label]): the closer p[label] is to 1 (i.e., you predicted a high probability for the expected class), the lower the loss. If the prediction p[label] is close to zero (i.e., you incorrectly predicted a low probability for the expected class), the loss grows fast.
Using a "hot vector" as the ground-truth input label may make sense for multi-category classification (which does not seem to be the task you are trying to solve here). You may find this SO thread relevant to that particular case.
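For reference, a minimal sketch (using h5py; the file name, shapes and sample count are illustrative) of writing scalar class labels for "SoftmaxWithLoss" -- one integer in {0, ..., 5} per image, not a hot vector:
import h5py
import numpy as np

n = 40
X = np.random.rand(n, 3, 100, 100).astype(np.float32)      # images (shape is made up)
y = np.random.randint(0, 6, size=(n,)).astype(np.float32)  # one scalar label per image

with h5py.File('train.h5', 'w') as f:
    f['data'] = X
    f['label'] = y  # shape (n,): class indices, matching num_output of 6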

Single Label Regression (Finetuning) With Input Artificial Features In Caffe

I have, say, n images, and for each of them I have 2 additional artificial (made-up) features; the image labels are single-dimensional integers.
I want to fine-tune an ImageNet model on my dataset, but I do not know how to handle these 2 additional features as input. How should I feed the data to caffe? Please help!
EDIT: The 2 features can be any 2 numbers (1-dimensional), say two numbers representing which class an image falls into and how many images fall into that class.
Say I have 'cat.jpg'; then the features are, say, 5 and 2000, where 5 is feature 1 representing the class and 2000 is the total number of images in that class.
In short, the 2 features can be any two integers.
I think the most straightforward way for you is to use an "HDF5Data" input layer, where you can store the input images, the two additional "features", and the expected output value (for regression).
You can see an example here for creating HDF5 data in python. A Matlab example can be found here.
Your HDF5 should have 4 "datasets": one is the input images (or image descriptors of dim 4096), an n-dimensional array of images/descriptors.
Another dataset is "feat_1", an n-by-1 array, and likewise "feat_2", an n-by-1 array.
Finally, you should have another input, "target", an n-by-1 array of the expected outputs you wish to learn.
Once you have an HDF5 file ready with these datasets in it, you should have
layer {
  type: "HDF5Data"
  top: "data"  # name of dataset with images/imagenet features
  top: "feat_1"
  top: "feat_2"
  top: "target"
  hdf5_data_param {
    source: "/path/to/list/file.txt"
    batch_size: 32  # "HDF5Data" requires a batch_size; 32 is only an example
  }
}
As you can see a single "HDF5Data" layer can produce several "top"s.
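A short Python sketch (using h5py; the paths, n and shapes are made up) of building an HDF5 file with these four datasets, plus the list file that hdf5_data_param points to:
import h5py
import numpy as np

n = 100
with h5py.File('/path/to/data.h5', 'w') as f:
    f['data'] = np.random.rand(n, 4096).astype(np.float32)  # image descriptors
    f['feat_1'] = np.random.rand(n, 1).astype(np.float32)   # first extra feature
    f['feat_2'] = np.random.rand(n, 1).astype(np.float32)   # second extra feature
    f['target'] = np.random.rand(n, 1).astype(np.float32)   # regression target

# the list file simply names one HDF5 file per line
with open('/path/to/list/file.txt', 'w') as f:
    f.write('/path/to/data.h5\n')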

How to calculate the number of parameters for GoogLeNet?

I have a pretty good understanding of AlexNet and VGG: I could verify the number of parameters used in each layer against what is reported in their respective papers.
However, when I try to do the same for the GoogLeNet paper "Going Deeper with Convolutions", even after many iterations I am NOT able to verify the numbers they have in Table 1 of the paper.
For example, the first layer is the good old plain convolution layer with kernel size 7x7, 3 input maps and 64 output maps. So the number of parameters needed would be (3 * 49 * 64) + 64 (bias), which is around 9.5k, but they say they use 2.7k. I did the math for the other layers as well, and I am always off by a few percent from what they report. Any idea?
Thanks
I think the first line (2.7k) is wrong, but the rest of the lines of the table are correct.
Here is my computation:
http://i.stack.imgur.com/4bDo9.jpg
Be careful to check which input is connected to which layer,
e.g. for the layer "inception_3a/5x5_reduce":
input = "pool2/3x3_s2" with 192 channels
dims_kernel = C*S*S = 192x1x1
num_kernel = 16
Hence the parameter size for that layer = 16*192*1*1 = 3072
It looks like they divided the numbers by 1024^n to convert to the K/M labels on the number of parameters in Table 1 of the paper. That feels wrong: we're not talking about actual storage here (as in "bytes"), but the straight-up number of parameters. They should have divided by 1000^n instead.
Maybe the 7*7 conv layer is actually the combination of a 7*1 conv layer and a 1*7 conv layer; then the number of params would be ((7+7)*64*3 + 64*2) / 1024 = 2.75k, which approaches 2.7k (or you can omit the 128 biases).
As we know, Google introduced asymmetric convolutions while doing spatial factorization in "Spatial Factorization into Asymmetric Convolutions".
(1x7 + 7x1) x 3 x 64 = 2688 ≈ 2.7k. This is my opinion; I am a new student.
The number of parameters in a conv layer is ((m * n * d) + 1) * k, where the 1 accounts for the bias term of each filter. The same expression can be written as: ((filter width * filter height * number of filters in the previous layer) + 1) * number of filters.
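A quick Python check (the helper is hypothetical) of the competing explanations above:
def conv_params(kernel_w, kernel_h, channels_in, channels_out):
    # ((m * n * d) + 1) * k: one bias per output filter
    return (kernel_w * kernel_h * channels_in + 1) * channels_out

plain_7x7 = conv_params(7, 7, 3, 64)                              # 9472, i.e. ~9.5k
factorized = conv_params(7, 1, 3, 64) + conv_params(1, 7, 3, 64)  # 2816
print(plain_7x7, factorized, factorized / 1024)                   # 9472 2816 2.75

# weights-only count for "inception_3a/5x5_reduce" from the answer above
print(1 * 1 * 192 * 16)  # 3072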