I am doing regression using caffe, and my test.txt and train.txt files are like this:
/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0
/home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6
/home/foo/caffe/data/finetune/flickr/3208038920.jpg 3.2
/home/foo/caffe/data/finetune/flickr/6170430622.jpg 4.0
/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272
My problem is that Caffe does not seem to allow float labels like 2.0. When I use float labels and Caffe reads, for example, the 'test.txt' file, it only recognizes
a total of 1 images
which is wrong.
But when I, for example, change the 2.0 to 2 in the file (leaving the following lines the same), Caffe now gives
a total of 2 images
implying that the float labels are responsible for the problem.
Can anyone help me solve this problem? I definitely need float labels for regression, so does anyone know of a workaround or solution? Thanks in advance.
EDIT
For anyone facing a similar issue, "use caffe to train Lenet with CSV data" might be of help. Thanks to @Shai.
When using the image dataset input layer (with either lmdb or leveldb backend) caffe only supports one integer label per input image.
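The "a total of 1 images" count in the question fits this restriction, assuming the Caffe version in use reads the list file with a loop like `while (infile >> filename >> label)` and an `int` label, as older releases of the ImageData layer did. The sketch below only mimics that C++ stream extraction in Python (`simulate_int_label_parse` is a hypothetical illustration, not Caffe code): reading an integer from `2.0` consumes just `2`, the leftover `.0` is taken as the next file name, and the following integer read fails, which ends the loop.

```python
import re


def simulate_int_label_parse(text):
    """Mimic `while (infile >> filename >> label)` where label is an int.

    Hypothetical helper for illustration only; it approximates C++
    operator>> for std::string and int on whitespace-separated input.
    """
    pairs = []
    pos, n = 0, len(text)
    while True:
        # operator>>(std::string): skip whitespace, read until next whitespace
        while pos < n and text[pos].isspace():
            pos += 1
        if pos == n:
            break  # end of input: extraction fails, loop ends
        start = pos
        while pos < n and not text[pos].isspace():
            pos += 1
        filename = text[start:pos]
        # operator>>(int): skip whitespace, then read an optional sign and digits
        while pos < n and text[pos].isspace():
            pos += 1
        match = re.match(r"[+-]?\d+", text[pos:])
        if match is None:
            break  # no digits: the stream enters a fail state, loop ends
        pairs.append((filename, int(match.group())))
        pos += match.end()  # anything after the digits (e.g. ".0") stays unread


    return pairs


float_list = "a.jpg 2.0\nb.jpg 3.6\nc.jpg 3.2\n"
int_first = "a.jpg 2\nb.jpg 3.6\nc.jpg 3.2\n"
print(len(simulate_int_label_parse(float_list)))  # 1
print(len(simulate_int_label_parse(int_first)))   # 2
```

Changing only the first label to `2` lets one more line through before the same failure, which matches the "a total of 2 images" observation.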
If you want to do regression with floating point labels, you should try the HDF5 data layer. See for example this question.
In Python you can use the h5py package to create hdf5 files.
import h5py
import caffe
import numpy as np

SIZE = 224  # fixed size for all images
with open('train.txt', 'r') as T:
    lines = T.readlines()
# If you do not have enough memory, split the data into
# multiple batches and generate multiple separate h5 files
X = np.zeros((len(lines), 3, SIZE, SIZE), dtype='f4')
y = np.zeros((len(lines), 1), dtype='f4')
for i, l in enumerate(lines):
    sp = l.split()  # [path, label]; also strips the trailing newline
    img = caffe.io.load_image(sp[0])  # H x W x 3, RGB
    img = caffe.io.resize(img, (SIZE, SIZE, 3))  # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from SIZE-by-SIZE-by-3
    # and transpose it to 3-by-SIZE-by-SIZE, for example:
    transposed_img = img.transpose((2, 0, 1))[::-1, :, :]  # HxWxC -> CxHxW, RGB -> BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5', 'w') as H:
    H.create_dataset('X', data=X)  # note the name X given to the dataset!
    H.create_dataset('y', data=y)  # note the name y given to the dataset!
with open('train_h5_list.txt', 'w') as L:
    L.write('train.h5\n')  # list all h5 files you are going to use, one per line
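To follow the "split data into multiple separate h5 files" advice from the comments, you need a sample range per file and a list file naming every `.h5` file. A minimal stdlib sketch, assuming a hypothetical `plan_h5_chunks` helper and a `train_{k}.h5` naming scheme (neither is part of Caffe or h5py):

```python
def plan_h5_chunks(num_samples, max_per_file, prefix="train"):
    """Return [(start, end, filename), ...] covering range(num_samples).

    Hypothetical helper: each chunk holds at most max_per_file samples and
    would be written to its own h5 file with the same dataset names (X, y).
    """
    if max_per_file <= 0:
        raise ValueError("max_per_file must be positive")
    chunks = []
    for k, start in enumerate(range(0, num_samples, max_per_file)):
        end = min(start + max_per_file, num_samples)
        chunks.append((start, end, f"{prefix}_{k}.h5"))
    return chunks


def list_file_text(chunks):
    # One h5 path per line: this is the file hdf5_data_param.source points to.
    return "".join(name + "\n" for _, _, name in chunks)


chunks = plan_h5_chunks(2500, 1000)
print(chunks)
# [(0, 1000, 'train_0.h5'), (1000, 2000, 'train_1.h5'), (2000, 2500, 'train_2.h5')]
print(list_file_text(chunks), end="")
```

For each `(start, end, name)` you would build `X` and `y` from `lines[start:end]` and write them to `name` exactly as in the script above, so only one chunk is in memory at a time.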
Once you have all the h5 files and the corresponding text files listing them, you can add an HDF5 input layer to your train_val.prototxt:
layer {
type: "HDF5Data"
top: "X" # same name as given in create_dataset!
top: "y"
hdf5_data_param {
source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
batch_size: 32
}
include { phase:TRAIN }
}
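Two common sources of HDF5Data errors are `top` names that don't match the dataset names, and datasets with different numbers of rows. The sketch below checks both on the NumPy arrays before they are written. `check_hdf5_tops` is a hypothetical helper, and the "same first dimension" rule reflects my understanding of Caffe's HDF5 loader rather than a documented contract.

```python
import numpy as np


def check_hdf5_tops(datasets, tops):
    """Validate arrays destined for one h5 file read by an HDF5Data layer.

    datasets: dict mapping dataset name -> numpy array (what create_dataset gets)
    tops: list of `top:` names from the prototxt, in order
    Returns the common number of samples N.
    """
    if not tops:
        raise ValueError("the layer needs at least one top")
    missing = [t for t in tops if t not in datasets]
    if missing:
        raise KeyError(f"no dataset named {missing} (top names must match)")
    counts = {t: datasets[t].shape[0] for t in tops}
    if len(set(counts.values())) != 1:
        raise ValueError(f"datasets disagree on the first dimension: {counts}")
    return next(iter(counts.values()))


X = np.zeros((10, 3, 8, 8), dtype="f4")  # N x C x H x W
y = np.zeros((10, 1), dtype="f4")        # N x 1 float labels
print(check_hdf5_tops({"X": X, "y": y}, ["X", "y"]))  # 10
```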
Clarification:
When I say "caffe only supports one integer label per input image" I do not mean that the leveldb/lmdb containers are limited; I mean Caffe's tools, specifically the convert_imageset tool.
On closer inspection, it seems that Caffe stores data of type Datum in leveldb/lmdb, and the "label" property of this type is defined as an integer (see caffe.proto). Thus, when using Caffe's interface to leveldb/lmdb, you are restricted to a single int32 label per image.
Shai's answer already covers saving float labels to HDF5 format. In case LMDB is required/preferred, here's a snippet on how to create an LMDB from float data (adapted from this github comment):
import lmdb
import numpy as np
import caffe

def scalars_to_lmdb(scalars, path_dst):
    db = lmdb.open(path_dst, map_size=int(1e12))
    with db.begin(write=True) as in_txn:
        for idx, x in enumerate(scalars):
            content_field = np.array([x])
            # get shape (1,1,1)
            content_field = np.expand_dims(content_field, axis=0)
            content_field = np.expand_dims(content_field, axis=0)
            # non-uint8 arrays are stored in the Datum's float_data field
            content_field = content_field.astype(float)
            dat = caffe.io.array_to_datum(content_field)
            # zero-padded key; lmdb expects bytes keys under Python 3
            key = '{:0>10d}'.format(idx).encode('ascii')
            in_txn.put(key, dat.SerializeToString())
    db.close()
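The `'{:0>10d}'` key format in the snippet above matters because LMDB iterates keys in byte-wise lexicographic order, so unpadded numeric keys come back out of order. Assuming, as is common, that this label LMDB is read alongside a separate image LMDB, both need the same keys in the same order for the rows to line up. A stdlib-only illustration (`lmdb_key` is a hypothetical helper):

```python
def lmdb_key(idx, padded=True):
    # Same format as the snippet above; the unpadded variant is shown for contrast.
    text = "{:0>10d}".format(idx) if padded else str(idx)
    return text.encode("ascii")


# LMDB's default comparator orders keys like Python's bytes comparison.
unpadded = sorted(lmdb_key(i, padded=False) for i in [1, 2, 10])
padded = sorted(lmdb_key(i) for i in [1, 2, 10])
print(unpadded)  # [b'1', b'10', b'2']
print(padded)    # [b'0000000001', b'0000000002', b'0000000010']
```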
I ended up transposing, switching the channel order, and using unsigned ints rather than floats to get results. I suggest reading an image back from your HDF5 file to make sure it displays correctly.
First read the image as unsigned ints:
img = np.array(Image.open('images/' + image_name))
Then change the channel order from RGB to BGR:
img = img[:, :, ::-1]
Finally, switch from Height x Width x Channels to Channels x Height x Width:
img = img.transpose((2, 0, 1))
Merely changing the shape will scramble your image and ruin your data!
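A quick NumPy check of the difference between transposing and reshaping, on a tiny hypothetical 2x2 RGB image with a distinct value in every pixel and channel. Reading the data back for display needs the inverse transpose and channel flip, not a reshape.

```python
import numpy as np

# Hypothetical 2x2 RGB image (H x W x C) with distinct values per pixel/channel.
img = np.arange(2 * 2 * 3).reshape(2, 2, 3)

bgr = img[:, :, ::-1]                  # RGB -> BGR
chw = bgr.transpose((2, 0, 1))         # H x W x C -> C x H x W (correct)
scrambled = bgr.reshape(3, 2, 2)       # same shape, wrong element order

# Channel 0 of the CHW array should be the blue plane of the original image.
print(np.array_equal(chw[0], img[:, :, 2]))        # True
print(np.array_equal(scrambled[0], img[:, :, 2]))  # False

# Undo both steps to get back the displayable H x W x C RGB image.
restored = chw.transpose((1, 2, 0))[:, :, ::-1]
print(np.array_equal(restored, img))               # True
```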
To read back the image:
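import h5py
from skimage.viewer import ImageViewer

with h5py.File(h5_filename, 'r') as hf:
    images_test = hf.get('images')
    targets_test = hf.get('targets')
    for i, img in enumerate(images_test):
        print(targets_test[i])
        # img is C x H x W in BGR order: transpose back and flip to RGB
        # (a plain reshape would scramble the pixels, as explained above)
        viewer = ImageViewer(img.transpose((1, 2, 0))[:, :, ::-1])
        viewer.show()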
Here's a script I wrote which deals with two labels (steer and speed) for a self-driving car task: https://gist.github.com/crizCraig/aa46105d34349543582b177ae79f32f0
Besides @Shai's answer above, I wrote a MultiTaskData layer supporting float-typed labels.
Its main idea is to store the labels in the float_data field of Datum; the MultiTaskDataLayer then parses them as labels for any number of tasks according to the values of task_num and label_dimension set in net.prototxt. The related files include: caffe.proto, multitask_data_layer.hpp/cpp, io.hpp/cpp.
You can easily add this layer to your own Caffe and use it like this (this is an example for a facial-expression label-distribution learning task, in which "exp_label" can be a float vector such as [0.1, 0.1, 0.5, 0.2, 0.1] representing the probability distribution over 5 expression classes):
name: "xxxNet"
layer {
name: "xxx"
type: "MultiTaskData"
top: "data"
top: "exp_label"
data_param {
source: "expression_ld_train_leveldb"
batch_size: 60
task_num: 1
label_dimension: 8
}
transform_param {
scale: 0.00390625
crop_size: 60
mirror: true
}
include:{ phase: TRAIN }
}
layer {
name: "exp_prob"
type: "InnerProduct"
bottom: "data"
top: "exp_prob"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 8
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "exp_loss"
type: "EuclideanLoss"
bottom: "exp_prob"
bottom: "exp_label"
top: "exp_loss"
include:{ phase: TRAIN }
}
Instead of having a learnable filter, I am interested in a convolution with a fixed, predefined matrix, for example a Sobel filter:
So I set the learning rate multiplier to 0 (so it's fixed) and the kernel size to 3:
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param { lr_mult: 0 decay_mult: 0 }
convolution_param {
num_output: 10
kernel_size: 3 # filter is 3x3
stride: 2
weight_filler {
type: ??}
}
}
Now, I do not know how to give the matrix information to the conv layer. Any ideas? I think it should go into weight_filler, but how?
One more question: does num_output have to be the same as the bottom's channel size (data channels = 10 here)? Can I set num_output to another number? If yes, what will happen and what does that mean?
How to init weights to specific values?
You can use net_surgery to load your untrained/un-initialized net in python and then assign the specific weights you want to the filters, save the net, and use it with the weights you want for this specific layer.
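In pycaffe, the net-surgery step amounts to writing into the layer's weight blob in place. A minimal sketch, assuming a `conv1` layer whose weights have shape `num_output x (C / group) x 3 x 3`. `set_fixed_kernel` is a hypothetical helper; it works with any object exposing `net.params[name][0].data` as a NumPy array, such as a `caffe.Net`.

```python
import numpy as np

# 3x3 Sobel kernel for horizontal gradients (responds to vertical edges).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)


def set_fixed_kernel(net, layer_name, kernel, zero_bias=True):
    """Copy `kernel` into every (output, input) slice of a conv layer's weights."""
    params = net.params[layer_name]
    weights = params[0].data                                # out x in_per_group x kh x kw
    weights[...] = np.broadcast_to(kernel, weights.shape)   # in-place write
    if zero_bias and len(params) > 1:
        params[1].data[...] = 0                             # bias blob, if present
    return weights.shape


# Typical use with pycaffe (file names are placeholders):
#   net = caffe.Net('deploy.prototxt', caffe.TEST)
#   set_fixed_kernel(net, 'conv1', SOBEL_X)
#   net.save('sobel_init.caffemodel')
```

Without `group`, the weight shape is 10x10x3x3 and every output would sum the Sobel response over all 10 input channels, which is why the `group` setting discussed next matters.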
How to set num_output and other conv_params?
This is a good question: you have an input blob of shape bx10xhxw and you want to apply a 3x3 filter to each channel and get back a new filtered bx10xhxw. If you just set num_output: 10, the shape of the filters would be 10x10x3x3, that is, 10 filters of shape 10x3x3, which is not what you expect. You want a single 3x3 filter per channel.
To that end you need to look at group conv_param. Setting group: 10 together with num_output: 10 (assuming input c=10) will give you what you want, the weight shape will be 10x1x3x3.
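To see what `group: 10` does, here is a naive NumPy cross-correlation (Caffe's convolution does not flip the kernel) that uses the same weight layout, `num_output x (C / group) x kh x kw`. It is a reference sketch for checking shapes and semantics, not how Caffe implements convolution; `grouped_conv2d` is a hypothetical name and padding is omitted.

```python
import numpy as np


def grouped_conv2d(x, w, group=1, stride=1):
    """Naive grouped cross-correlation without padding.

    x: N x C x H x W input, w: O x (C // group) x kh x kw weights.
    Output channel o only sees the input channels of its own group.
    """
    n, c, h, wd = x.shape
    o, c_per_group, kh, kw = w.shape
    assert c % group == 0 and o % group == 0 and c_per_group == c // group
    out_per_group = o // group
    oh = (h - kh) // stride + 1
    ow = (wd - kw) // stride + 1
    y = np.zeros((n, o, oh, ow), dtype=np.result_type(x, w))
    for oc in range(o):
        g = oc // out_per_group                        # group of this output channel
        xs = x[:, g * c_per_group:(g + 1) * c_per_group]
        for i in range(oh):
            for j in range(ow):
                patch = xs[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
                y[:, oc, i, j] = np.sum(patch * w[oc], axis=(1, 2, 3))
    return y


sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
x = np.random.rand(2, 10, 9, 9).astype(np.float32)
w = np.tile(sobel_x, (10, 1, 1, 1))                    # 10 x 1 x 3 x 3, as with group: 10
y = grouped_conv2d(x, w, group=10, stride=2)
print(y.shape)  # (2, 10, 4, 4)
```

With `group: 10`, output channel k depends only on input channel k, so perturbing one input channel leaves every other output channel unchanged.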
In the Python Caffe interface, a caffe.Net object is instantiated by loading a .prototxt file, which defines the network architecture. A caffe.Net object has the following properties for accessing various information about the network:
blob_loss_weights: An OrderedDict (bottom to top, i.e., input to output) of network blob loss weights indexed by blob name
blobs: An OrderedDict (bottom to top, i.e., input to output) of network blobs indexed by blob name
bottom_names: An OrderedDict mapping each layer name to the names of its bottom blobs
inputs: names of the input blobs of this network
layer_dict: An OrderedDict (bottom to top, i.e., input to output) of network layers indexed by layer name
layers: a caffe._caffe.LayerVec, i.e. a list of the caffe.Layer objects in the network; each caffe.Layer has a blobs field holding the layer's parameters and a type field giving the layer type (e.g., Convolution, Data, etc.)
outputs: names of the output blobs of this network
params: An OrderedDict (bottom to top, i.e., input to output) of network parameters indexed by name; each is a list of multiple blobs (e.g., weights and biases)
top_names: An OrderedDict mapping each layer name to the names of its top blobs
You can use caffe.Net.params to access a layer's learnable parameters, together with caffe.Net.layer_dict to access layer info.
caffe.Net.params is an ordered dictionary whose keys are layer names and whose values are the parameter blobs (e.g., weight and bias). For a Convolution layer, the first blob is the weight and the second blob is the bias:
caffe.Net.params['layer_name'][0] : weight
caffe.Net.params['layer_name'][1] : bias
Please note that the blob's memory is accessed through caffe.Net.params['layer_name'][0].data, and it should be updated in place with ellipsis indexing, e.g. caffe.Net.params['layer_name'][0].data[...] = new_values
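Why `[...]` matters: `.data` hands you a NumPy array backed by the blob's memory. Assigning a new array to a Python name only rebinds that name, while `[...] =` copies into the existing buffer (and fails if the shapes don't broadcast). A NumPy-only illustration, with a plain array standing in for the blob:

```python
import numpy as np

# Stand-in for net.params['layer_name'][0].data (hypothetical buffer).
weights = np.zeros((2, 3), dtype=np.float32)

view = weights                       # like `w = net.params['layer_name'][0].data`
view = np.ones((2, 3))               # rebinds the name; the original buffer is untouched
print(weights.sum())                 # 0.0

view = weights
view[...] = np.ones((2, 3))          # writes into the existing buffer
print(weights.sum())                 # 6.0
```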
The following code illustrates loading learnable parameters from NumPy-saved files (.npy):
from pathlib import Path

import numpy as np


def load_weights_and_biases(network, npy_dir="./npy_save"):
    """Load '<layer>_weight.npy' / '<layer>_bias.npy' files into a caffe.Net."""
    k_list = list(network.params.keys())
    suffix = ["weight", "bias"]
    num_layers = len(network.layer_dict)
    for idx, layer_name in enumerate(network.layer_dict):
        layer_type = network.layers[idx].type
        print("\n-----------------------------")
        print(f"layer index: {idx}/{num_layers}")
        print(f"layer name: '{layer_name}'")
        print(f"layer type: '{layer_type}'")
        if layer_name not in k_list:
            print(f"no learnable parameters in '{layer_name}' of '{layer_type}' type")
            continue
        params = network.params[layer_name]
        print(f"{len(params)} learnable parameters in '{layer_type}' type")
        for i, p in enumerate(params):
            print(f"\tp[{i}]: {p.data.shape} of {p.data.dtype}")
            param_file = Path(npy_dir) / f"{layer_name}_{suffix[i]}.npy"
            if not param_file.exists():
                print(f"{param_file} does not exist!")
                break
            print(f"\tload {param_file}")
            arr = np.load(param_file, allow_pickle=True)
            if p.data.shape != arr.shape:
                print(f"p.data.shape: {p.data.shape} is not equal to arr.shape: {arr.shape}")
                break
            print(f"\tset {layer_name}_{suffix[i]} with arr:shape {arr.shape}, type {arr.dtype}")
            p.data[...] = arr  # in-place write into the blob's memory
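The loader expects one file per parameter blob, named `<layer>_weight.npy` and `<layer>_bias.npy`. A matching save function could look like the sketch below. `save_weights_and_biases` is a hypothetical helper that assumes the same naming scheme and, like the loader, any object with a pycaffe-style `params` mapping.

```python
from pathlib import Path

import numpy as np


def save_weights_and_biases(network, npy_dir="./npy_save"):
    """Write each layer's weight/bias blob to '<npy_dir>/<layer>_<weight|bias>.npy'."""
    suffix = ["weight", "bias"]
    out_dir = Path(npy_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for layer_name, params in network.params.items():
        for i, p in enumerate(params):
            if i >= len(suffix):
                break  # only weight and bias, matching the loader's naming
            path = out_dir / f"{layer_name}_{suffix[i]}.npy"
            np.save(path, p.data)  # p.data is a NumPy array view of the blob
            written.append(path)
    return written
```

Usage would be `save_weights_and_biases(net)` on a trained net, followed later by `load_weights_and_biases(other_net)` on a net with matching layer names and shapes.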
In the Python Caffe interface (pycaffe), the Blob type is defined as caffe._caffe.Blob. Run help(caffe._caffe.Blob) after import caffe; the names listed in the "Data descriptors defined here" section of the help output are available as attributes.
For more detailed information on Blob in Caffe, see:
Blobs, Layers, and Nets: anatomy of a Caffe model - caffe documentations
caffe::Blob Class Template Reference - C++ source for Blob class
Any ideas how to implement Spatial Reflection Padding in Caffe like in Torch?
(x): nn.SpatialReflectionPadding(l=1, r=1, t=1, b=1)
(x): nn.SpatialConvolution(64 -> 64, 3x3)
(x): nn.ReLU
One way to do this would be to use the Python layer of Caffe. You can then write the functions yourself and customize them to your needs. However, this layer can only run on the CPU, so it might slow down your model, especially if you use it in the middle of the network.
In the following, I have defined a Python layer that zero-pads its input (despite the class name, it does not reflect), which you can modify to suit your needs:
import ast

import caffe
import numpy as np


class SpatialReflectionPadding(caffe.Layer):
    def setup(self, bottom, top):
        if len(bottom) != 1:  # check that a single bottom blob is given
            raise Exception("Expected a single blob")
        if len(bottom[0].shape) != 4:  # check that it is 4D
            raise Exception("Expected 4D blob")
        params = ast.literal_eval(self.param_str)  # get the params given in the prototxt
        self.l = params["l"]
        self.r = params["r"]
        self.t = params["t"]
        self.b = params["b"]

    def reshape(self, bottom, top):
        # set the shape of the top blob based on the shape of the bottom blob
        n, c, h, w = bottom[0].data.shape
        top[0].reshape(n, c, h + self.t + self.b, w + self.l + self.r)

    def forward(self, bottom, top):
        h, w = bottom[0].data.shape[2:]
        top[0].data[...] = 0  # the padded border is set to 0
        # the interior is a copy of the bottom blob
        top[0].data[:, :, self.t:self.t + h, self.l:self.l + w] = bottom[0].data

    def backward(self, top, propagate_down, bottom):
        if not propagate_down[0]:
            return
        h, w = bottom[0].data.shape[2:]
        # only the copied interior receives gradient; the zero padding has none
        bottom[0].diff[...] = top[0].diff[:, :, self.t:self.t + h, self.l:self.l + w]
Then, in your prototxt file, you can use it as:
layer {
name: "srp" # some name
type: "Python"
bottom: "some_layer" # the layer which provides the input blob
top: "srp"
python_param {
module: "caffe_srp" # whatever is your module name
layer: "SpatialReflectionPadding"
param_str: '{ "l": 1, "b": 1, "t": 1, "r": 1}'
}
}
I am not 100% sure that it works correctly, though when I used it, it appeared to do so. In any case, it should give an idea and a starting point on how one could proceed. Also, you could refer to this question and its answers.
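To get Torch-style reflection padding instead of zeros, forward and backward need an index mapping. Below is a NumPy-only sketch of the two computations on N x C x H x W arrays. The function names are hypothetical, and it assumes each pad is smaller than the corresponding input size, as Torch requires. Inside the Python layer you would call them as `top[0].data[...] = reflect_pad_forward(bottom[0].data, ...)` and `bottom[0].diff[...] = reflect_pad_backward(top[0].diff, ...)`.

```python
import numpy as np


def _reflect_indices(size, before, after):
    # e.g. size=3, before=after=1 -> [1, 0, 1, 2, 1] (edge not repeated, as in Torch)
    return np.pad(np.arange(size), (before, after), mode="reflect")


def reflect_pad_forward(x, l, r, t, b):
    """Reflection-pad the last two axes of an N x C x H x W array."""
    rows = _reflect_indices(x.shape[2], t, b)
    cols = _reflect_indices(x.shape[3], l, r)
    return x[:, :, rows[:, None], cols[None, :]]


def reflect_pad_backward(top_diff, in_h, in_w, l, r, t, b):
    """Accumulate gradients of padded positions back onto their source pixels."""
    rows = _reflect_indices(in_h, t, b)
    cols = _reflect_indices(in_w, l, r)
    n, c = top_diff.shape[:2]
    nc = n * c
    grad = np.zeros((nc, in_h, in_w), dtype=top_diff.dtype)
    idx = (np.arange(nc)[:, None, None], rows[None, :, None], cols[None, None, :])
    # np.add.at sums repeated indices (one input pixel can feed several outputs)
    np.add.at(grad, idx, top_diff.reshape(nc, rows.size, cols.size))
    return grad.reshape(n, c, in_h, in_w)


x = np.arange(9, dtype=np.float32).reshape(1, 1, 3, 3)
y = reflect_pad_forward(x, l=1, r=1, t=1, b=1)
print(np.array_equal(y, np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)), mode="reflect")))  # True
g = reflect_pad_backward(np.ones_like(y), 3, 3, l=1, r=1, t=1, b=1)
print(g[0, 0])
```

With an all-ones upstream gradient, the center pixel of the 3x3 input receives gradient from 9 padded positions and each corner from only 1, which is the extra accumulation that the zero-padding backward above does not need.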
I am implementing the following colorization model, written in Caffe, in Keras, and I am confused about which output_shape parameter to supply to Keras:
model.add(Deconvolution2D(256,4,4,border_mode='same',
output_shape=(None,3,14,14),subsample=(2,2),dim_ordering='th',name='deconv_8.1'))
I have added a dummy output_shape parameter. But how can I determine the output parameter? In caffe model the layer is defined as:
layer {
name: "conv8_1"
type: "Deconvolution"
bottom: "conv7_3norm"
top: "conv8_1"
convolution_param {
num_output: 256
kernel_size: 4
pad: 1
dilation: 1
stride: 2
}
}
If I do not supply this parameter, the code gives a parameter error, but I cannot work out what I should supply as output_shape.
P.S. I already asked on the Data Science forum with no response, maybe due to its small user base.
What output shape does the Caffe deconvolution layer produce?
For this colorization model in particular you can simply refer to page 24 of their paper (which is linked in their GitHub page):
So basically the output shape of this deconvolution layer in the original model is [None, 56, 56, 128]. This is what you want to pass to Keras as output_shape. The only problem is as I mention in the section below, Keras doesn't really use this parameter to determine the output shape, so you need to run a dummy prediction to find what your other parameters need to be in order for you to get what you want.
More generally the Caffe source code for computing its Deconvolution layer output shape is:
const int kernel_extent = dilation_data[i] * (kernel_shape_data[i] - 1) + 1;
const int output_dim = stride_data[i] * (input_dim - 1)
+ kernel_extent - 2 * pad_data[i];
Which with a dilation argument equal to 1 reduces to just:
const int output_dim = stride_data[i] * (input_dim - 1)
+ kernel_shape_data[i] - 2 * pad_data[i];
Note that this matches the Keras documentation when the parameter a is zero:
Formula for calculation of the output shape: $o = s(i - 1) + a + k - 2p$
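The Caffe formula is easy to evaluate directly. The sketch below transcribes the C++ above (`caffe_deconv_output_dim` is a hypothetical name). For conv8_1 it assumes a 28 x 28 input, which the prototxt snippet does not state; with that input it yields 56, consistent with the 56 x 56 size quoted from the paper.

```python
def caffe_deconv_output_dim(input_dim, kernel, stride=1, pad=0, dilation=1):
    """Spatial output size of a Caffe Deconvolution layer along one axis."""
    kernel_extent = dilation * (kernel - 1) + 1
    return stride * (input_dim - 1) + kernel_extent - 2 * pad


# conv8_1 (kernel 4, stride 2, pad 1), assuming a 28 x 28 input
print(caffe_deconv_output_dim(28, kernel=4, stride=2, pad=1))  # 56
# Keras docs example below: 12 x 12 input, 3 x 3 kernel, stride 1, no padding
print(caffe_deconv_output_dim(12, kernel=3))                   # 14
```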
How to verify actual output shape with your Keras backend
This is tricky, because the actual output shape depends on the backend implementation and configuration. Keras is currently unable to find it on its own. So you actually have to execute a prediction on some dummy input to find the actual output shape. Here's an example of how to do this from the Keras docs for Deconvolution2D:
To pass the correct `output_shape` to this layer,
one could use a test model to predict and observe the actual output shape.
# Examples
```python
# apply a 3x3 transposed convolution with stride 1x1 and 3 output filters on a 12x12 image:
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 14, 14), border_mode='valid', input_shape=(3, 12, 12)))
# Note that you will have to change the output_shape depending on the backend used.
# we can predict with the model and print the shape of the array.
dummy_input = np.ones((32, 3, 12, 12))
# For TensorFlow dummy_input = np.ones((32, 12, 12, 3))
preds = model.predict(dummy_input)
print(preds.shape)
# Theano GPU: (None, 3, 13, 13)
# Theano CPU: (None, 3, 14, 14)
# TensorFlow: (None, 14, 14, 3)
```
Reference: https://github.com/fchollet/keras/blob/master/keras/layers/convolutional.py#L507
Also, you might be curious why the output_shape parameter apparently doesn't really define the output shape. According to the post "Deconvolution2D layer in keras", this is why:
Back to Keras and how the above is implemented. Confusingly, the output_shape parameter is actually not used for determining the output shape of the layer, and instead they try to deduce it from the input, the kernel size and the stride, while assuming only valid output_shapes are supplied (though it's not checked in the code to be the case). The output_shape itself is only used as input to the backprop step. Thus, you must also specify the stride parameter (subsample in Keras) in order to get the desired result (which could've been determined by Keras from the given input shape, output shape and kernel size).
I know if I have the input layer as follows, my network will take in blobs of dimension (1,1,100,100).
layer {
name: "data"
type: "Input"
top: "data"
input_param {
shape {
dim: 1
dim: 1
dim: 100
dim: 100
}
}
}
What should I do to make the first dimension (input batch size) variable? so that I can feed in the network batches of different sizes?
You can reshape the network before calling the forward() method. So if you want a variable batch_size, you should reshape the input blob every time. This can be done in any interface you are using (C++, Python, MATLAB).
In python, it goes like this:
net.blobs['data'].reshape(BATCH_SIZE, CHANNELS, HEIGHT, WIDTH)
net.reshape()
net.forward()
Hint: I believe net.reshape() is optional, since the network calls it before executing the forward pass.
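Putting the reshape step in a loop lets you push a whole array through the net in batches whose last one may be smaller. A minimal sketch, assuming a pycaffe-style `net` (`net.blobs[name].reshape(*shape)`, `net.reshape()`, and `net.forward()` returning a dict of output arrays); `forward_in_batches` and the blob names `data`/`prob` are hypothetical placeholders for your own.

```python
import numpy as np


def forward_in_batches(net, data, batch_size, input_name="data", output_name="prob"):
    """Run `data` (N x C x H x W) through the net in chunks of up to batch_size.

    The input blob is reshaped for every chunk, so the last chunk may be smaller.
    input_name/output_name are assumptions: use the blob names of your own net.
    """
    outputs = []
    for start in range(0, len(data), batch_size):
        chunk = data[start:start + batch_size]
        net.blobs[input_name].reshape(*chunk.shape)  # variable first dimension
        net.reshape()                                # propagate new shapes to all layers
        net.blobs[input_name].data[...] = chunk
        out = net.forward()[output_name]
        outputs.append(out.copy())                   # blob memory is reused on the next call
    return np.concatenate(outputs, axis=0)
```

For example, `forward_in_batches(net, images, batch_size=32)` would run 100 images as batches of 32, 32, 32 and 4. The `.copy()` matters because pycaffe returns views into blob memory that the next forward pass overwrites.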
In addition to AHA's answer, in C++ it looks like this:
Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(batch_size, input_layer->shape(1), input_layer->shape(2), input_layer->shape(3));
net_->Reshape();
This question refers to a question answered here.
The accepted answer suggests to create labels on the fly. I have a very similar problem but need to use HDF5.
Here is my prototxt:
name: "StereoNet"
layer {
name: "layer_data_left"
type: "HDF5Data"
top: "data_left"
top: "labels_left"
include {
phase: TRAIN
}
hdf5_data_param {
source: "/home/ubuntu/trainLeftPatches.txt"
batch_size: 128
}
}
layer {
name: "layer_data_right"
type: "HDF5Data"
top: "data_right"
top: "labels_right"
include {
phase: TRAIN
}
hdf5_data_param {
source: "/home/ubuntu/trainRightPatches.txt"
batch_size: 128
}
}
... etc.
As you can probably tell, I create two separate HDF5 data files. Together they hold positive and negative samples: the left and right images at the same index form either a positive or a negative pair. labels_left and labels_right are identical MATLAB arrays of 1s and 0s. I tried to use a single labels array before, but Caffe gave an error that seemed to indicate two processes were clashing. After switching to a copy of the labels array, the training could start.
Here is part of the Matlab data creation file I am now using, the data are the KITTI data:
h5create('trainLeftPatches.h5','/data_left',[9 9 1 numberOfTrainingPatches],'Datatype','double');
h5create('trainLeftPatches.h5','/labels_left',[1 numberOfTrainingPatches],'Datatype','double');
h5create('trainRightPatches.h5','/data_right',[9 9 1 numberOfTrainingPatches],'Datatype','double');
h5create('trainRightPatches.h5','/labels_right',[1 numberOfTrainingPatches],'Datatype','double');
h5create('valLeftPatches.h5','/data_left',[9 9 1 numberOfValidatePatches],'Datatype','double');
h5create('valLeftPatches.h5','/labels_left',[1 numberOfValidatePatches],'Datatype','double');
h5create('valRightPatches.h5','/data_right',[9 9 1 numberOfValidatePatches],'Datatype','double');
h5create('valRightPatches.h5','/labels_right',[1 numberOfValidatePatches],'Datatype','double');
h5write('trainLeftPatches.h5','/data_left', dataLeft_permutated(:, :, :, 1:numberOfTrainingPatches));
h5write('trainLeftPatches.h5','/labels_left', labels_permutated(:, 1:numberOfTrainingPatches));
h5write('trainRightPatches.h5','/data_right', dataRight_permutated(:, :, :, 1:numberOfTrainingPatches));
h5write('trainRightPatches.h5','/labels_right', labels_permutated(:, 1:numberOfTrainingPatches));
h5write('valLeftPatches.h5','/data_left', dataLeft_permutated(:, :, :, numberOfTrainingPatches+1:end));
h5write('valLeftPatches.h5','/labels_left', labels_permutated(:, numberOfTrainingPatches+1:end));
h5write('valRightPatches.h5','/data_right', dataRight_permutated(:, :, :, numberOfTrainingPatches+1:end));
h5write('valRightPatches.h5','/labels_right', labels_permutated(:, numberOfTrainingPatches+1:end));
toc;
The loss is acceptable on mini-batches at the end, but stays too high on the test set.
Please advise. (It may simply not work.) If there is an error, it is probably very subtle.
A few points you should consider:
Your network is not a Siamese network: it contains two paths, left and right, but these paths do not share the same filters. See this tutorial on how to build a Siamese network that shares filters across layers.
"HDF5Data" layer is not restricted to two outputs ("top"s), it can have as many as you'd like. Thus, you can have a single layer for train and a single layer for test:
layer {
name: "data"
type: "HDF5Data"
top: "data_left"
top: "data_right"
top: "labels"
hdf5_data_param { ... }
include { phase: TRAIN }
}
The corresponding hdf5 files should then each contain all three datasets (data_left, data_right and labels) written with h5write, instead of only two per file as in your code.
Have you considered using minibatch loss, instead of pairs loss?