Single Label Regression (Finetuning) With Additional Artificial Input Features In Caffe - matlab

I have, say, n images, and for each of them I have 2 additional artificial (made-up) features; the image labels are single-dimensional integers.
I want to fine-tune ImageNet on my dataset, but I do not know how to handle these 2 additional features as input. How should I feed the data to Caffe? Please help!
EDIT: The 2 features can be any 2 one-dimensional numbers, say two numbers representing what class an image falls into and how many images fall into that class.
Say I have 'cat.jpg'; then the features are, say, 5 and 2000, where 5 is feature 1 representing the class and 2000 is the total number of images in that class.
In short, the 2 features can be any two integers.

I think the most straightforward way for you is to use an "HDF5Data" input layer, where you can store the input images, the two additional "features", and the expected output value (for regression).
You can see an example here for creating HDF5 data in Python. A Matlab example can be found here.
Your HDF5 file should have 4 "datasets": the first is the input images (or image descriptors of dim 4096), an n-element array of images/descriptors.
Another dataset is "feat_1", an n-by-1 array, and likewise "feat_2", an n-by-1 array.
Finally, you should have another dataset, "target", an n-by-1 array of the expected outputs you wish to learn.
Once you have an HDF5 file ready with these datasets in it, you should have

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"   # name of dataset with images/imagenet features
  top: "feat_1"
  top: "feat_2"
  top: "target"
  hdf5_data_param {
    source: "/path/to/list/file.txt"
    batch_size: 32  # batch_size is required by "HDF5Data"; 32 here is an arbitrary choice
  }
}
As you can see, a single "HDF5Data" layer can produce several "top"s.
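For completeness, here is a minimal Python sketch (using h5py; the file names, array sizes and the placeholder random data are illustrative assumptions) of how such an HDF5 file and its list file could be written:

import h5py
import numpy as np

n = 1000  # number of examples (assumption, for illustration)

# Illustrative placeholder arrays; replace with your real data.
# Caffe expects float32, with images laid out as N x C x H x W.
data = np.random.rand(n, 3, 224, 224).astype(np.float32)  # images (or an n-by-4096 descriptor array)
feat_1 = np.random.randint(0, 10, size=(n, 1)).astype(np.float32)
feat_2 = np.random.randint(0, 5000, size=(n, 1)).astype(np.float32)
target = np.random.rand(n, 1).astype(np.float32)  # regression targets

with h5py.File('train_data.h5', 'w') as f:
    f.create_dataset('data', data=data)
    f.create_dataset('feat_1', data=feat_1)
    f.create_dataset('feat_2', data=feat_2)
    f.create_dataset('target', data=target)

# "HDF5Data" reads a text file listing one .h5 file per line.
with open('file.txt', 'w') as f:
    f.write('train_data.h5\n')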

Related

Add missing/extra values in data array in Matlab

I have recorded WiFi CSI sensor data: 5000 packets in 5 seconds (5000 packets x 57 subcarriers). But due to dynamic hardware configuration, sometimes I only receive 4998 x 57. I want to add and estimate 2 rows so that my original design has a consistent 5000 rows x 57 columns.
As you can see, some data are 5000x57 and some are 4998x57.
You can achieve your desired output using the mean() function combined with the concatenation operator [] and repmat(), like this:

A = randi(100, 4998, 57);        % example data: 4998-by-57 matrix of random integers
A = [A; repmat(mean(A), 2, 1)];  % append the column means twice to reach 5000 rows
Most Matlab functions that take an array as input operate column-wise, unless the input array has just one row. The mean function behaves this way too, so you can simply append mean's output to your arrays.
If you show me the code that you used to import the data, I might be able to help you create a cleaner data structure and thus be able to automatically process all of your arrays. The way the data is currently structured, this is only possible with dynamic variable names, which is considered bad programming practice.

Caffe Element-Wise multiplication with fixed blobs

I think I will be asking multiple questions here; I'd love any comments because I'm new to Caffe.
In my network, input images have size 1x41x41. Since I am using a batch size of 64, I think the data size will be 64x1x41x41 (please correct me if this is wrong).
After some convolutional layers (that don't change the data size), I would like to multiply the resulting data with predefined blobs of size 1x41x41. It seems convenient to use an EltwiseLayer to do the multiplication, so in order to define the second bottom of the Eltwise layer I need another input for the blobs. (Please advise if this can be done another way.)
The first question: batch training confuses me. If I want to multiply a batch of images with a single blob in an EltwiseLayer, should the bottom sizes be the same? In other words, should I use repmat (Matlab) to clone the blob 64 times to get a size of 64x1x41x41, or can I just plug in a single blob of size 1x1x41x41?
Second question: I want to multiply the data with 100 different blobs and then take the mean of the 100 results. Do I need to define 100 EltwiseLayers to do the job? Or can I collect the blobs in a single data blob of size 1x100x41x41 (or 64x100x41x41) and clone the data to be multiplied 100 times? And if so, how can I do it? An example would be very useful. (I've seen a TileLayer somewhere, but the info is spread across the galaxy.)
Thanks in advance.
In order to do element-wise multiplication in Caffe, both blobs must have exactly the same shape: Caffe does not "broadcast" along singleton dimensions.
So, if you want to multiply a batch of 64 blobs of shape 1x41x41 each, you'll have to provide two 64x1x41x41 bottom blobs.
As you already noted, you can use a "Tile" layer to do the repmat-ing:
layer {
  name: "repmat"
  type: "Tile"
  bottom: "const_1x1x41x41_blob"
  top: "const_64x1x41x41_blob"
  tile_param {
    axis: 0   # you want to "repmat" along the first (batch) axis
    tiles: 64 # you want 64 repetitions
  }
}
Now you can do the "Eltwise" multiplication:

layer {
  name: "mul"
  type: "Eltwise"
  bottom: "const_64x1x41x41_blob"
  bottom: "other_blob"
  top: "mul"
  eltwise_param {
    operation: MUL
  }
}
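If it helps to see the semantics outside Caffe, here is a small numpy sketch (shapes taken from the question; numpy is used purely for illustration) of what the two layers compute:

import numpy as np

const_blob = np.random.rand(1, 1, 41, 41)   # the predefined 1x1x41x41 blob
batch = np.random.rand(64, 1, 41, 41)       # a batch of 64 feature maps

# "Tile" with axis: 0 and tiles: 64 replicates the blob along the batch axis.
tiled = np.tile(const_blob, (64, 1, 1, 1))  # shape: (64, 1, 41, 41)

# "Eltwise" with operation: MUL multiplies the two equal-shaped blobs entry by entry.
product = tiled * batch                     # shape: (64, 1, 41, 41)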

How to calculate third element of caffe convnet?

Following this question and this tutorial, I've created a simple net just like the tutorial, but with 100x100 images and a first convolution kernel of 11x11 and pad=0.
I understand that the formula is (W−F+2P)/S+1, and in my case the dimension became [51x51x3] (3 is the number of RGB channels), but the number 96 pops up in my net diagram, and as this tutorial said, it is the third dimension of the output; in other words, my net after the first conv became [51x51x96]. I couldn't figure out how the number 96 is calculated, and why.
Isn't the convolution layer supposed to pass through the three color channels, so that the output should be three feature maps? How come its dimension grows like this? Isn't it true that we have one kernel for each channel? How does this one kernel create 96 (or, in the first tutorial, 256 or 384) feature maps?
You are mixing up input channels and output channels.
Your input image has three channels: R, G and B. Each filter in your conv layer acts on all three channels over its spatial kernel extent (e.g., 3-by-3) and outputs a single number per spatial location. So, if you have one filter in your layer, then your output would have only one output channel(!)
Normally, you would like to compute more than a single filter at each layer; this is what the num_output parameter in convolution_param is used for: it lets you define how many filters will be trained in a specific convolutional layer.
Thus a Conv layer

layer {
  type: "Convolution"
  name: "my_conv"
  bottom: "x" # shape 3-by-100-by-100
  top: "y"
  convolution_param {
    num_output: 32 # number of filters = number of output channels
    kernel_size: 3
  }
}

will output "y" with shape 32-by-98-by-98.

Multi-Output Multi-Class Keras Model

For each input I have, there is an associated 49x2 matrix. Here's what one input-output couple looks like:
input :
[Car1, Car2, Car3 ..., Car118]
output :
[[Label1 Label2]
[Label1 Label2]
...
[Label1 Label2]]
where both Label1 and Label2 are label-encoded and have, respectively, 1200 and 1300 different classes.
Just to make sure: is this what we call a multi-output multi-class problem?
I tried to flatten the output, but I feared the model wouldn't understand that all similar labels share the same classes.
Is there a Keras layer that handles output of this peculiar array shape?
Generally, multi-class problems correspond to models outputting a probability distribution over the set of classes (typically scored against the one-hot encoding of the actual class through cross-entropy). Now, independently of whether you structure it as one single output, two outputs, 49 outputs or 49 x 2 = 98 outputs, that would mean having 1,200 x 49 + 1,300 x 49 = 122,500 output units, which is not beyond what a computer can handle, but maybe not the most convenient thing to have. You could try having each class output be a single (e.g. linear) unit and rounding its value to choose the label, but unless the labels have some numerical meaning (e.g. order, sizes, etc.), that is not likely to work.
If the order of the elements in the input has some meaning (that is, shuffling it would affect the output), I think I'd approach the problem through an RNN, like an LSTM or a bidirectional LSTM model, with two outputs. Use return_sequences=True and TimeDistributed Dense softmax layers for the outputs; then for each 118-long input you'd have 118 pairs of outputs, and you can use temporal sample weighting to drop, for example, the first 69 (or maybe drop the first 35 and the last 34 if you're using a bidirectional model) and compute the loss with the remaining 49 pairs of labellings. Or, if that makes sense for your data (maybe it doesn't), you could go with something more advanced like CTC (although Keras does not have it natively, and I'm trying to integrate the TensorFlow implementation into it without much success), which, it turns out, is also available for Keras (thanks @indraforyou)!
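A minimal sketch of that first approach (the layer sizes, the Bidirectional wrapper and the functional-API wiring are assumptions on my part; only the 118-step input, the two softmax heads and the temporal weighting come from the description above):

from keras.models import Model
from keras.layers import Input, LSTM, Bidirectional, TimeDistributed, Dense

T = 118               # input sequence length (from the question)
n1, n2 = 1200, 1300   # number of classes for Label1 / Label2 (from the question)

inp = Input(shape=(T, 1))  # each Car_i treated as a 1-dim feature per step (assumption)
h = Bidirectional(LSTM(128, return_sequences=True))(inp)
out1 = TimeDistributed(Dense(n1, activation='softmax'), name='label1')(h)
out2 = TimeDistributed(Dense(n2, activation='softmax'), name='label2')(h)

model = Model(inputs=inp, outputs=[out1, out2])
# sample_weight_mode='temporal' lets you zero-out the 69 time steps you drop,
# so only 49 positions per sequence contribute to the loss.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              sample_weight_mode='temporal')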
If the order in the input has no meaning but the order of the outputs does, then you could have an RNN where your input is the original 118-long vector plus a pair of labels (each one-hot encoded), and the output is again a pair of labels (again two softmax layers). The idea is that you get one "row" of the 49x2 output on each frame and then feed it back to the network along with the initial input to get the next one; at training time, you would repeat the input 49 times along with the "previous" label (an empty label for the first step).
If there are no sequential relationships to exploit (i.e. the orders of the input and the output have no special meaning), then the problem would only be truly represented by the initial 122,500 output units (plus all the hidden units you may need to get those right). You could also try some kind of middle ground between a regular network and an RNN, where you have the two softmax outputs and, along with the 118-long vector, you include the "id" of the output you want (e.g. as a 49-long one-hot encoded vector); if the "meaning" of each label at each of the 49 outputs is similar, or comparable, it may work.

Genetic-algorithm encoding

I am trying to create an algorithm for a problem which I believe is similar to the knapsack problem. The problem is to find recipes/bills-of-materials for certain intermediate products. There are different alternative recipes for each intermediate product. For example, product X can consist of either 25% raw material A + 75% raw material B, or 50% raw material A + 50% raw material B, etc. There are between 1 and 100 different alternatives for each recipe.
My question is how best to encode the different recipe alternatives (and/or where to find similar problems on the internet). I think I have to use value encoding, i.e. assign a value to each alternative of a recipe. Do I have reasonable, different options?
Thanks & kind regards
You can encode the problem with a number chromosome. If your product has N ingredients, then your number chromosome has length N: X = {x1, x2, ..., xN}. Every number xi of the chromosome represents the parts of ingredient i. It is not required that the numbers sum to one.
E.g. X = {23, 5, 0} means you need 23 parts of ingredient 1, 5 parts of ingredient 2 and zero parts of ingredient 3.
With this encoding, crossover will not invalidate the chromosome.
You can use a 100-dimensional vector to represent an individual, like this:
X = {x1, x2, x3, ..., x100}, with xi ∈ [0,1] and ∑ xi = 1.0
It's hard to use a crossover operation under this constraint, so I suggest producing offspring by mutation only.
Mutation operation on a parent individual X:
(1) randomly choose two dimensions xi and xj from X;
(2) p = rand(0,1);
(3) xj = xj + (1-p)*xi;
(4) xi = xi*p;
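A small Python sketch of this mutation operator (the chromosome length and the random initialisation are illustrative; the operator itself is exactly steps (1)-(4) above, which move weight from one gene to another while preserving the sum of the chromosome):

import random

def random_individual(n=100):
    """Random chromosome whose entries lie in [0, 1] and sum to 1.0."""
    x = [random.random() for _ in range(n)]
    total = sum(x)
    return [v / total for v in x]

def mutate(x):
    """Shift a random fraction of gene i's weight onto gene j; the sum stays 1.0."""
    child = list(x)
    i, j = random.sample(range(len(child)), 2)  # (1) pick two distinct dimensions
    p = random.random()                         # (2) p = rand(0, 1)
    child[j] = child[j] + (1 - p) * child[i]    # (3) j receives the moved share
    child[i] = child[i] * p                     # (4) i keeps the remaining share
    return child

parent = random_individual()
offspring = mutate(parent)
print(round(sum(offspring), 10))  # still 1.0 (up to floating-point error)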