I am trying to import the SUN RGB-D dataset into LMDB format so that Caffe can train for bounding box regression. I see that for the ImageNet conversion there is a text file with the filename and the class label on each row. How can I prepare the data so that I can label an object by four corner coordinates? There are about 10 objects recognized in the ground truth image, so one image should carry around 10 * 8 values as the regression target.
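One common way around the single-integer label of the LMDB/ImageNet tooling is to write the images and the regression targets to an HDF5 file and read them with an HDF5Data layer. Below is a minimal sketch, assuming a fixed-length 80-value target per image (10 boxes x 8 coordinates, zero-padded when fewer objects are present); the file names, image size, and the fixed box count are placeholders, not something from the dataset itself.

import h5py
import numpy as np
import caffe  # pycaffe, used here only for its image I/O helpers

# Hypothetical input: a list of (image_path, boxes), where boxes is an (n, 8)
# array of corner coordinates for up to 10 objects in that image.
samples = [("img/0001.jpg", np.zeros((7, 8))),
           ("img/0002.jpg", np.zeros((10, 8)))]

MAX_BOXES = 10
H, W = 256, 256

data = np.zeros((len(samples), 3, H, W), dtype=np.float32)
label = np.zeros((len(samples), MAX_BOXES * 8), dtype=np.float32)

for i, (path, boxes) in enumerate(samples):
    img = caffe.io.load_image(path)              # HxWx3, float in [0, 1]
    img = caffe.io.resize_image(img, (H, W))
    data[i] = img.transpose(2, 0, 1)             # to Caffe's CxHxW layout
    label[i, :boxes.size] = boxes.ravel()        # zero-pad missing boxes

with h5py.File("train.h5", "w") as f:
    f["data"] = data
    f["label"] = label

with open("train_h5_list.txt", "w") as f:
    f.write("train.h5\n")                        # file list the HDF5Data layer reads

The HDF5Data layer then exposes label as an 80-dimensional blob per image, which a EuclideanLoss layer can regress against.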
We have been using the ifanbeam function to produce an image of a simulated phantom for medical X-ray imaging. The phantom consists of a water disc with three smaller discs inserted, made of bone, fat, and air. The phantom is positioned midway between a detector and an X-ray source (the distance between detector and source is 5 cm). The X-ray beam is defined as a fan beam with a 56 deg opening angle. The phantom rotates around its axis.
Our issue is that the reconstructed image looks blurry inside, and it is difficult to see the smaller discs. (Attached: image reconstructed using ifanbeam().)
I've attached the ground-truth image, which I obtained from a different simulation using a parallel beam rather than a fan beam. (Attached: ground-truth image reconstructed using iradon().)
The MATLAB code is below. After preprocessing the raw data, we create a 3D array of size 180x240x20, which corresponds to a stack of the individual projection images of size 180x240. FYI, the raw data only consists of 10 projections, but we ran into issues with the FanCoverage parameter, so we padded the sinogram with zeros to artificially add another 10 projections and then set FanCoverage to "cycle".
Has anyone had a similar problem before, or does anyone know how to help?
% indices_time corresponds to the number of events in the simulation
n = max(size(indices_time));

% Stack of single projection images (180x240), one slice per projection
images = zeros(180, 240, nrOfProjections);
for m = 1:n
    images(indices_y(m), indices_x(m), indices_time(m)) = images(indices_y(m), indices_x(m), indices_time(m)) + 1;
end

% Collapse the two central detector rows (89:90) of each projection into one
% detector line; the extra nrOfProjections columns stay zero (the artificial
% padding described above, so that FanCoverage "cycle" can be used)
sinogram = zeros(240, nrOfProjections*2);
for m = 1:nrOfProjections
    sinogram(:, m) = sum(images(89:90, :, m));
end

theta = 0:18:342;   % projection angles in degrees (not used by ifanbeam; the increment is passed explicitly below)

figure(1)
colormap(gray)
imagesc(sinogram)
movegui('northwest')

% D = 113.5: distance from the fan-beam vertex to the centre of rotation
rec_fanbeam = ifanbeam(sinogram, 113.5, ...
    "FanCoverage", "cycle", ...
    "FanRotationIncrement", 18, ...
    "FanSensorGeometry", "line", ...
    "FanSensorSpacing", 0.25, ...
    "OutputSize", 100);

figure(2)
colormap(gray)
imagesc(rec_fanbeam)
xlabel('xPos')
ylabel('yPos')
title('Reconstructed image')
movegui('northeast')
I'm using U-Net for image segmentation.
The model was trained with images that could contain up to 4 different classes. The training classes never overlap.
The output of the UNet is a heatmap (with float values between 0 and 1) for each of these 4 classes.
Now, I have 2 problems:
1. For a certain class, how do I segment (draw contours) in the original image only at the points where the heatmap has significant values? (In the image below is an example: the values in the centre are significant, while the values on the left aren't. If I draw the segmentation of the entire image without any additional operation, both are considered.)
2. Following on from the first point, how do I avoid drawing the contours of two overlapping classes in the original image? (Maybe by drawing only the one with the higher values in the corresponding heatmap; a sketch of this idea follows below.)
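One possible approach to both points, sketched below with OpenCV and numpy (the threshold value and the per-class colours are assumptions to be tuned, not anything prescribed by U-Net): threshold the per-class heatmaps to keep only the significant blobs, resolve overlaps by assigning each pixel to its best-scoring class, and only then extract and draw contours per class.

import cv2
import numpy as np

def draw_class_contours(image, heatmaps, threshold=0.5):
    """image: HxWx3 uint8; heatmaps: 4xHxW float array in [0, 1]."""
    # Keep only pixels whose best class scores above the (tunable) threshold.
    significant = heatmaps.max(axis=0) > threshold          # HxW bool

    # Resolve overlaps: every pixel belongs only to its highest-scoring class.
    best_class = heatmaps.argmax(axis=0)                    # HxW int

    colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255)]
    out = image.copy()
    for c in range(heatmaps.shape[0]):
        mask = ((best_class == c) & significant).astype(np.uint8) * 255
        # [-2] keeps this working with both the OpenCV 3 and OpenCV 4 signatures
        contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[-2]
        cv2.drawContours(out, contours, -1, colors[c], 2)
    return out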
I'd like to discuss feature extraction using the Caffe model called GoogleNet.
I am referring to the paper "End-to-end people detection in crowded scenes". Those who are familiar with Caffe should be able to follow my queries.
The paper has its own Python library; I have also run through the library, but I can't follow some points mentioned in the paper.
The input image is passed through GoogleNet up to the inception_5b/output layer.
The output is then formed as a multidimensional array of size 15x20x1024, so each 1024-vector represents a bounding box centred on a 64x64 region. Since the regions overlap by 50%, there is a 15x20 grid for a 640x480 image, and each cell has a third dimension of length 1024.
My queries are:
(1) How can this 15x20x1024 output array be obtained?
(2) How can this 1x1x1024 data represent a 64x64 region in the image?
There is a description in the source code:
"""Takes the output from the decapitated googlenet and transforms the output
from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers.
N = batch size, C = channels, W = grid width, H = grid height."""
That conversion is implemented by the following Python function:
def generate_intermediate_layers(net):
    """Takes the output from the decapitated googlenet and transforms the output
    from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers.
    N = batch size, C = channels, W = grid width, H = grid height."""
    net.f(Convolution("post_fc7_conv", bottoms=["inception_5b/output"],
                      param_lr_mults=[1., 2.], param_decay_mults=[0., 0.],
                      num_output=1024, kernel_dim=(1, 1),
                      weight_filler=Filler("gaussian", 0.005),
                      bias_filler=Filler("constant", 0.)))
    net.f(Power("lstm_fc7_conv", scale=0.01, bottoms=["post_fc7_conv"]))
    net.f(Transpose("lstm_input", bottoms=["lstm_fc7_conv"]))
I can't follow that portion, namely how each 1x1x1024 vector represents a bounding box region of that size.
Since you are looking at a 1x1 cell very deep in the net, its effective receptive field is quite large and can be (and probably is) 64x64 pixels in the original image.
That is, each feature in "inception_5b/output" is affected by 64x64 pixels in the input image.
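To put numbers on both points (a sketch in numpy, not the paper's code): with 64x64 regions overlapping by 50%, the region centres are 32 pixels apart, so a 640x480 image yields a 20x15 grid; the Transpose step in the snippet above then just turns each grid cell into its own item of shape 1024x1x1 for the LSTM.

import numpy as np

# 64x64 regions with 50% overlap -> centres every 32 pixels.
stride = 32
grid_w, grid_h = 640 // stride, 480 // stride      # 20 x 15 grid of cells

# Feature map after the 1x1 convolution, in Caffe's N x C x H x W layout
# (the docstring writes the spatial dims as W x H, but the idea is the same).
features = np.zeros((1, 1024, grid_h, grid_w), dtype=np.float32)

# Effect of the Transpose layer: NxCxHxW -> (N*H*W) x C x 1 x 1,
# i.e. each of the 300 grid cells becomes its own 1024-dimensional input.
n, c, h, w = features.shape
lstm_input = features.transpose(0, 2, 3, 1).reshape(n * h * w, c, 1, 1)
print(lstm_input.shape)                            # (300, 1024, 1, 1)

The 1x1x1024 column itself contains no pixels; it stands for a 64x64 patch only through the receptive field of the layers below it, as noted above.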
I am currently trying to build a CNN that takes in images. The label I want is a pair of (i,j) coordinates. I know that the image data layer uses a file with the filename and the label in the following format:
folder/file1.jpg label
Is it possible to have a label that isn't a single number but two numbers?
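The stock ImageData/LMDB label is a single integer per image, so a two-number label doesn't fit it directly. One common workaround (a sketch only; file and dataset names are placeholders, and it assumes you preprocess the images into arrays yourself) is an HDF5Data layer, whose label dataset can be a vector per image:

import h5py
import numpy as np

n_images = 100                                               # placeholder
data = np.zeros((n_images, 3, 224, 224), dtype=np.float32)   # preprocessed images
label = np.zeros((n_images, 2), dtype=np.float32)            # (i, j) per image
# ... fill data and label here ...

with h5py.File('coords_train.h5', 'w') as f:
    f['data'] = data
    f['label'] = label
# the HDF5Data layer's "source" parameter points to a text file listing coords_train.h5

A 2-output InnerProduct layer with a EuclideanLoss against that label blob then regresses the two coordinates.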
I am trying to make a mask for a georaster (netCDF file) using a shapefile. This is similar to what is described in this thread (How to limit the raster processing extent using a spatial mask?), but since my georaster is not an image I do not have an R variable, so the solution presented there does not work for me.
I first load my shapefile into MATLAB:
S1 = shaperead('Export_Output_3.shp');
The data has a Lambert Conformal Conic projection and the units are meters.
Then I load the netCDF file and extract the first slice of data. I get a matrix with one value for each cell. It also has a Lambert Conformal Conic projection with the same parameter values, but the data is in “degrees north” and “degrees east”.
So how can I put these data together? I tried the utm2deg function to convert the units of the shapefile from meters to degrees, but I do not know the UTM zone, so I got stuck with that.
I think that I first need to have both datasets in the same units and then convert the shapefile into a matrix with the same resolution as the netCDF, but I am unable to perform either of these steps.
Can anyone help?
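For reference, here is a rough sketch of those two steps written outside MATLAB, in Python with pyshp, pyproj, netCDF4, and matplotlib. Every file name, variable name, and the Lambert Conformal Conic parameters are placeholders that would have to be taken from the shapefile's .prj and the netCDF metadata, and it assumes the netCDF stores 1-D lat/lon coordinate variables.

import numpy as np
import shapefile                                 # pyshp
from netCDF4 import Dataset
from pyproj import CRS, Transformer
from matplotlib.path import Path

nc = Dataset('mydata.nc')                        # placeholder netCDF file
lat = nc.variables['lat'][:]                     # assumed 1-D "degrees north"
lon = nc.variables['lon'][:]                     # assumed 1-D "degrees east"

shp = shapefile.Reader('Export_Output_3.shp')
x, y = zip(*shp.shape(0).points)                 # polygon vertices in meters (LCC)

# Step 1: Lambert Conformal Conic -> geographic coordinates; these projection
# parameters are placeholders and must match the shapefile's .prj.
lcc = CRS.from_proj4('+proj=lcc +lat_1=33 +lat_2=45 +lat_0=39 +lon_0=-96 '
                     '+x_0=0 +y_0=0 +ellps=WGS84 +units=m')
to_geo = Transformer.from_crs(lcc, CRS.from_epsg(4326), always_xy=True)
poly_lon, poly_lat = to_geo.transform(x, y)

# Step 2: rasterise the polygon with a point-in-polygon test at every grid cell.
LON, LAT = np.meshgrid(lon, lat)
pts = np.column_stack([LON.ravel(), LAT.ravel()])
inside = Path(np.column_stack([poly_lon, poly_lat])).contains_points(pts)
mask = inside.reshape(LON.shape)                 # logical mask at the netCDF resolution

The same two steps in MATLAB would use the Mapping Toolbox for the projection and a point-in-polygon test against the netCDF grid, but the workflow is the one the question already describes: get both datasets into geographic coordinates, then rasterise the polygon onto the netCDF grid.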