I'd like to discuss feature extraction using the Caffe model GoogLeNet.
I am referring to the paper "End-to-end people detection in crowded scenes". Those who are familiar with Caffe should be able to follow my queries.
The paper has its own Python library; I have run through it as well, but I can't follow some points mentioned in the paper.
The input image is passed through GoogLeNet up to the inception_5b/output layer.
The output is then formed as a multidimensional array of size 15x20x1024, so each 1024-vector represents a bounding box centered on a 64x64 region. Since the regions overlap by 50%, there is a 15x20 grid for a 640x480 image, and each cell's third dimension is a vector of length 1024.
My queries are:
(1) How can this 15x20x1024 output array be obtained?
(2) How can a 1x1x1024 vector represent a 64x64 region in the image?
There is a description in the source code:
"""Takes the output from the decapitated googlenet and transforms the output
from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers.
N = batch size, C = channels, W = grid width, H = grid height."""
That conversion is implemented by the following Python function:
def generate_intermediate_layers(net):
    """Takes the output from the decapitated googlenet and transforms the output
    from a NxCxWxH to (NxWxH)xCx1x1 that is used as input for the lstm layers.
    N = batch size, C = channels, W = grid width, H = grid height."""
    # 1x1 convolution on top of inception_5b/output: mixes the 1024 channels
    # at each grid cell while keeping the spatial grid layout.
    net.f(Convolution("post_fc7_conv", bottoms=["inception_5b/output"],
                      param_lr_mults=[1., 2.], param_decay_mults=[0., 0.],
                      num_output=1024, kernel_dim=(1, 1),
                      weight_filler=Filler("gaussian", 0.005),
                      bias_filler=Filler("constant", 0.)))
    # Scale the activations down before feeding them to the LSTM.
    net.f(Power("lstm_fc7_conv", scale=0.01, bottoms=["post_fc7_conv"]))
    # Fold the spatial grid into the batch dimension:
    # NxCxWxH -> (NxWxH)xCx1x1, i.e. one 1024-vector per grid cell.
    net.f(Transpose("lstm_input", bottoms=["lstm_fc7_conv"]))
I can't follow that portion, specifically how each 1x1x1024 vector can represent a bounding box region of that size.
Since you are looking at a 1x1 cell very deep in the net, its effective receptive field is quite large and can be (and probably is) 64x64 pixels in the original image.
That is, each feature in "inception_5b/output" is affected by 64x64 pixels in the input image.
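To make the numbers concrete (this is my reading of the architecture, not something stated explicitly in the code): GoogLeNet up to inception_5b/output downsamples its input by a total stride of 32, so a 640x480 image yields a grid of 640/32 x 480/32 = 20x15 cells, each with 1024 channels, which is exactly the 15x20x1024 array asked about in (1). For (2), the receptive field of each cell is much larger than the 32-pixel stride, so each 1x1x1024 vector is computed from roughly a 64x64 neighbourhood of the input, and neighbouring cells, being 32 pixels apart, see regions that overlap by 50%.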
I have around 200 CT slices from patients and need to reconstruct them and segment out the eye orbit. In MATLAB, I read the original DICOM images, use imbinarize(I,T) to convert them to binary images, and then stack them up to form the 3D orbital image (see the attached images below).
Original DICOM image
Binarized image
Reconstructed orbit in 3D
However, as you can see, there are many holes inside the orbit. To cope with this, I use bwmorph() with 'thicken' and 'bridge', and then imclose() with a 'disk' structuring element:
for i = 30 : 160
    info = dicominfo(dirOutput(i).name, 'UseDictionaryVR', true);
    imgTemp = dicomread(info);                        % read the DICOM slice
    imgCropped = imcrop(imgTemp, bboxCrop);           % crop to the orbital region
    imgCroppedBin = imbinarize(imgCropped, 0.51728);  % fixed global threshold
    img_thick = bwmorph(imgCroppedBin, 'thicken');    % thicken thin bone structures
    img_bridge = bwmorph(img_thick, 'bridge', Inf);   % bridge small gaps
    se = strel('disk', 1);
    I(:,:,i) = imclose(img_bridge, se);               % close remaining holes, stack the slice
end
I is the resulting 3D array, built up slice by slice as I(:,:,i). Then I use fv = isosurface(I, ISOVALUE) and patch() to render the 3D array as a 3D surface (as attached above).
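Roughly, that rendering step looks like the following (a minimal sketch; the ISOVALUE of 0.5 is an assumption, not necessarily the value I actually used):
ISOVALUE = 0.5;
fv = isosurface(I, ISOVALUE);                                    % extract the surface mesh
p = patch(fv, 'FaceColor', [0.8 0.7 0.6], 'EdgeColor', 'none');  % render it
isonormals(I, p);                                                % smoother shading from the volume gradient
view(3); axis vis3d tight; camlight; lighting gouraud;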
The resulting 3D image still has many unfilled holes. Increasing the STREL element size or decreasing the ISOVALUE makes the final model thick and coarse (e.g. with staircase artifacts inside).
How can I fill the remaining holes in the 3D model while preserving the original smoothness of the image? Should I do this on the 3D image, or on the 2D binary images before the reconstruction?
I also tried alphaShape to fill the holes; the result is more satisfactory and easier to control than isosurface() and patch(). But with a large alpha value, e.g. 8, the output model starts to lose detail, with features such as the inferior orbital fissure being smoothed away.
Reconstructed image using alphaShape with alpha value = 8
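For reference, the alphaShape call was along these lines (a hedged sketch; how the foreground voxel coordinates are collected is reconstructed here, not copied from my script):
[y, x, z] = ind2sub(size(I), find(I));   % coordinates of the non-zero voxels
shp = alphaShape(x, y, z, 8);            % alpha value of 8, as mentioned above
figure; plot(shp);                       % render the alpha shape
axis vis3d tight;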
I'd be grateful if anyone can suggest a solution. Thanks!
Introduction
Background: I am segmenting images using the watershed algorithm in MATLAB. Because of memory and time constraints, I prefer to perform this segmentation on subsampled images, let's say with a resize factor of 0.45.
The problem: I can't properly re-scale the output of the segmentation to the original image scale, both for visualization purposes and for other post-processing steps.
Minimal Working Example
For example, I have this image:
I run this minimal script and get a watershed segmentation output L, a label image in which each connected component is labeled with a natural number and the borders between connected components are zero-valued:
im_orig = imread('kitty.jpg'); % Load image [530x530]
im_res = imresize(im_orig, 0.45); % Resize image [239x239]
im_res = rgb2gray(im_res); % Convert to grayscale
im_blur = imgaussfilt(im_res, 5); % Gaussian filtering
L = watershed(im_blur); % Watershed algorithm
Now I have L, which has the same dimensions as im_res. How can I use the result stored in L to actually segment the original image im_orig?
Wrong solution
The first approach I tried was to resize L to the original scale by using imresize.
L_big = imresize(L, [size(im_orig,1), size(im_orig,2)]); % Upsample L
Unfortunately the upsampling of L produces a series of unwanted artifacts. It especially loses some of the fundamental zeros that represent the boundaries between the image segments. Here is what I mean:
figure; imagesc(imfuse(im_res, L == 0)); axis auto equal;
figure; imagesc(imfuse(im_orig, L_big == 0)); axis auto equal;
I know that this is due to the blurring introduced by the upscaling process, but so far I haven't been able to think of anything else that would work.
The only other approach I have thought of involves using mathematical morphology to "enlarge" the boundaries of the resized image before upsampling, but this would still lead to some unwanted artifacts.
TL;DR (or recap)
Is there a way to perform watershed on a downscaled image in MATLAB and then upscale the result to the original image, keeping the crisp region boundaries outputted by the algorithm? Is what I am looking for a completely absurd thing to ask?
If you only need the watershed segment borders after upsizing the image, then just make these little changes:
L_big = ~imresize(L==0, [size(im_orig,1), size(im_orig,2)]); % Upsample L
and here are the results:
You can use nearest-neighbor interpolation when resizing:
L_big = imresize(L, [size(im_orig,1), size(im_orig,2)],'nearest'); % Upsample L
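With nearest-neighbor interpolation the labels stay integer-valued, so L_big can be used directly to pull regions out of the original image. A small hedged example (the region index 3 is arbitrary):
mask = (L_big == 3);                                            % one region of the upsampled labels
seg  = im_orig .* uint8(repmat(mask, [1 1 size(im_orig, 3)]));  % masked copy of the original image
figure; imagesc(imfuse(im_orig, L_big == 0)); axis auto equal;  % boundaries overlaid on the original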
Normally when we resize images we start from the destination, iterate over x, y, and find the best matching pixel in the source. Here you want to do the reverse: iterate over the source in x, y and write to the destination buffer, with 0 taking priority (so initialise to 0xFF, then don't overwrite any zeroes with other values).
There's unlikely to be a function that does this in the toolbox; you'll have to roll your own.
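A rough sketch of that idea in MATLAB (not toolbox code; it just paints each low-resolution pixel into its block of the full-size buffer and lets the zero-valued boundaries win):
[hS, wS] = size(L);                              % source (downscaled) size
hD = size(im_orig, 1); wD = size(im_orig, 2);    % destination (original) size
L_big = zeros(hD, wD, 'like', L);
isBoundary = false(hD, wD);                      % where the zero-valued borders land
for r = 1:hS
    for c = 1:wS
        rr = floor((r-1)*hD/hS)+1 : floor(r*hD/hS);   % destination rows for this source pixel
        cc = floor((c-1)*wD/wS)+1 : floor(c*wD/wS);   % destination columns for this source pixel
        if L(r,c) == 0
            isBoundary(rr,cc) = true;            % boundary pixels take priority
        else
            L_big(rr,cc) = L(r,c);               % ordinary label
        end
    end
end
L_big(isBoundary) = 0;                           % zeroes win over any label written earlier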
I am trying to write a program that uses computer vision techniques to detect (and track) tiny blobs in a stream of very noisy images. The image stream comes from a dual X-ray imaging setup, which outputs left and right views (of different sizes because of different collimation). My data is of two types: one set of images is not so noisy, and I am just using it to try out different techniques; the other set is noisier, and this is where the detection needs to work in the end. The image stream is at 60 Hz. This is an example of a raw image from the X-ray imager:
Here are some cropped out samples of the regions of interest. The blobs that need to be detected are the small black spots near the center of the image.
Initially I started off with simple contour/blob detection techniques in OpenCV, which were not very helpful. Eventually I moved on to techniques such as "opening" the image using morphological operators and then performing Laplacian of Gaussian (LoG) blob detection to find areas of interest. This gave me better results for the low-noise versions of the images, but fails on the high-noise ones: it gives me too many false positives. Here is a result from a low-noise image (note that the input image was inverted).
The code for my current LoG-based approach in MATLAB is as follows:
while ~isDone(videoReader)
    frame = step(videoReader);
    roi_frame = imcrop(frame, [660 410 120 110]);   % crop the region of interest
    I_roi = rgb2gray(roi_frame);
    I_roi = imcomplement(I_roi);                    % invert so the dark blobs become bright
    I_roi = wiener2(I_roi, [5 5]);                  % adaptive noise reduction
    background = imopen(I_roi, strel('disk', 3));   % estimate the background by opening
    I2 = imadjust(I_roi - background);              % background subtraction + contrast stretch
    K = imgaussfilt(I2, 5);
    level = graythresh(K);                          % Otsu threshold
    bw = im2bw(I2, level);
    sigma = 3;
    % Filter image with LoG
    I = double(bw);
    h = fspecial('log', sigma*30, sigma);
    Ifilt = -imfilter(I, h);
    % Threshold for points of interest
    Ifilt(Ifilt < 0.001) = 0;
    % Dilate to obtain local maxima
    Idil = imdilate(Ifilt, strel('disk', 50));
    % The final image keeps only the local maxima of the LoG response
    P = (Ifilt == Idil) .* Ifilt;
end
Is there any way I can improve my current detection technique to make it work for images with a lot of background noise? Or are there techniques better suited for images like this?
The approach I would take (a rough sketch in code follows the list):
-Average background subtraction.
-Aggressive Gaussian smoothing. This filter should be shaped based on your target object; off the top of my head I think you want the sigma at about half the smallest cross section of your object, but you may want to fiddle with this. Basically the goal is to blur the noise as much as possible without completely losing your target objects (based on shape and size).
-Edge detection. Try to be specific to the object if possible (basically, look at what the object's edge looks like after Gaussian smoothing and set your edge detection to look for that width and contrast shift).
-You may want to run a closing operation here.
-Search the whole image for islands (fully enclosed regions), then filter based on size and then on shape.
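A minimal sketch of this pipeline (every parameter value here is a placeholder that would need tuning on the actual image stream, and frame stands for one grayscale input frame):
bg       = imgaussfilt(double(frame), 25);            % crude background estimate
sub      = mat2gray(double(frame) - bg);              % background subtraction
smoothed = imgaussfilt(sub, 3);                       % aggressive Gaussian smoothing
edges    = edge(smoothed, 'canny');                   % edge detection
closed   = imclose(edges, strel('disk', 2));          % optional closing
islands  = imfill(closed, 'holes');                   % fully enclosed regions become solid
stats    = regionprops(islands, 'Area', 'Eccentricity', 'Centroid');
keep     = [stats.Area] > 20 & [stats.Area] < 400 ... % filter on size...
           & [stats.Eccentricity] < 0.9;              % ...then on shape
blobCenters = cat(1, stats(keep).Centroid);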
My hunch is that, despite the incredibly low signal-to-noise ratio, the granularity of your noise is hopefully significantly smaller than your object size. (If your noise has both comparable contrast and roughly the same size as your objects... you are sunk and need to re-evaluate your acquisition, in my opinion.)
Another note regarding your speed needs: huge processing savings can be made by remembering the last known positions and searching locally, and by knowing where new targets can enter the image from.
I have attached two consecutive frames captured by a CMOS camera with an IR filter. The checkerboard object was stationary while the images were captured, but the two images still differ in nearly 31000 pixels, which could affect my result. Can you tell me what kind of noise this is and how I can remove it? Please suggest any algorithm or function that could remove this noise.
Thank you, and sorry for my poor English.
Image 1: http://i45.tinypic.com/2wptqxl.jpg
Image 2: http://i45.tinypic.com/v8knjn.jpg
That noise appears to come from the camera sensor (the Bayer to RGB conversion); a checkerboard-like pattern is still visible in it.
Lossy JPEG compression also contributes a lot to the problem. You should first get access to the raw images.
For these particular images I'd first try edge detection filters (horizontal and vertical Sobel) to build a mask that selects between some median filtering / local histogram equalization for the flat areas and some checkerboard-reducing filter for the edges. The point is that probably no single filter can do well on both the JPEG ringing artifacts and the jagged edges. Then the real question is: what other kinds of images need to be processed?
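A rough illustration of that masking idea (the kernel sizes are guesses, and img stands for one of the two frames):
g    = im2double(rgb2gray(img));
e    = edge(g, 'sobel');                       % horizontal + vertical Sobel edges
mask = imdilate(e, strel('square', 3));        % protect a small band around the edges
flat = medfilt2(g, [3 3]);                     % de-speckle candidate for the flat areas
out  = g;
out(~mask) = flat(~mask);                      % median-filter only the flat areas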
From the comments: if the corner points need to be exact, then the better solution is probably to search for features (corner points with subpixel resolution), build a mapping from one image's set of points to the other image's set of corners, and search for the best affine transformation matrix that converts one set into the other. With this matrix one can then resample the other image.
Fortunately, one can estimate motion vectors with subpixel resolution without brute-force searching all possible subpixel locations: when computing a matched filter, one gets local maxima at potential candidates for exact matches. But this is not all there is: one can compute a more precise approximation of the peak location by studying the matched-filter outputs at the nearby pixels. For an exact match the output should be symmetric; otherwise the 'energies' of the matched filter are biased towards the second-best location. (A 2nd-degree polynomial fit plus finding its maximum can work.)
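In one dimension that polynomial fit boils down to the standard three-point parabola refinement; a small sketch (resp is assumed to be the matched-filter response along one axis and pk the index of its integer-valued peak):
y1 = resp(pk-1); y2 = resp(pk); y3 = resp(pk+1);
delta = 0.5 * (y1 - y3) / (y1 - 2*y2 + y3);   % vertex of the parabola through the three samples
subPixelPeak = pk + delta;                    % refined, sub-pixel peak location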
Looking closely at these images, I must agree with Aki Suihkonen.
In my view, the main noise comes from the JPEG compression, which causes sharp edges to "ring". I'd try a "de-speckle" type of filter on the images and see if this makes a difference. Some information that can help you implement this can be found in this link.
In a quicker and dirtier fashion, you can apply one of the many standard tools; for example, given that the images are a and b:
(i) Just smooth the images with a Gaussian filter; this can reduce the noise differences between the images by an order of magnitude. For example:
h=fspecial('gaussian',15,2);
a=conv2(a,h,'same');
b=conv2(b,h,'same');
(ii) Reduce Noise By Adaptive Filtering
a = wiener2(a,[5 5]);
b = wiener2(b,[5 5]);
(iii) Adjust Intensity Values Using Histogram Equalization
a = histeq(a);
b = histeq(b);
(iv) Adjust Intensity Values to a Specified Range
a = imadjust(a,[0 0.2],[0.5 1]);
b = imadjust(b,[0 0.2],[0.5 1]);
If your images are supposed to be black and white but you have captured them in grayscale, there can be differences due to noise.
You can convert the images to black and white by defining a threshold: any pixel with a value less than that threshold is assigned 0, and anything larger is assigned 1, or whatever your grayscale maximum is (maybe 255).
Assume your image is I, your grayscale range is 0 to 255, and you choose a threshold of 100:
ind = find(I < 100);    % pixels below the threshold...
I(ind) = 0;             % ...become black
ind = find(I >= 100);   % pixels at or above the threshold...
I(ind) = 255;           % ...become white
Now you have a black and white image. Do the same thing for the other image, and you should get a very small difference as long as the camera and the subject have not moved.
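The same thresholding can also be written more compactly with logical indexing (equivalent to the find-based version above):
I(I < 100)  = 0;     % black
I(I >= 100) = 255;   % white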
I have a volume (a 3D matrix) that has undergone a segmentation process. Most of the volume consists of NaNs (or zeros), except for regions that have passed some criteria (see picture). I need to know how large each remaining segment is in number of voxels and what its distribution is on the 2D planes (xy, xz, yz). Is there anything in MATLAB that can help me do this efficiently, rather than with a direct search? The volume can be rather large. For example, in the attached picture there is one segment, in a yellowish/brownish colour, of 7 voxels, which extends more in the vertical direction than in the xy plane.
Thanks in advance.
The most convenient solution is to use REGIONPROPS. In your example:
stats = regionprops(image, 'area', 'centroid')
For every feature there is an entry in the structure stats with the area (i.e. the number of voxels) and the centroid.
I think that what you are looking for is called bwlabeln. It allows you to find blobs in 3D space, just like bwlabel does in 2D. Afterwards you can use regionprops to find out the properties of the data; a small sketch combining the two follows the help excerpt below.
Taken directly from help:
bwlabeln Label connected components in binary image.
L = bwlabeln(BW) returns a label matrix, L, containing labels for the
connected components in BW. BW can have any dimension; L is the same
size as BW. The elements of L are integer values greater than or equal
to 0. The pixels labeled 0 are the background. The pixels labeled 1
make up one object, the pixels labeled 2 make up a second object, and
so on. The default connectivity is 8 for two dimensions, 26 for three
dimensions, and CONNDEF(NDIMS(BW),'maximal') for higher dimensions.
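Putting the two suggestions together, a hedged sketch for a 3-D volume V (with NaNs/zeros as background) that counts voxels per segment and projects one segment onto the xy, xz and yz planes:
BW = ~isnan(V) & V ~= 0;                       % binary mask of the segmented voxels
L  = bwlabeln(BW, 26);                         % 3-D connected components, 26-connectivity
stats = regionprops(L, 'Area', 'Centroid');    % 'Area' is the number of voxels per segment
k = 1;                                         % pick one segment to inspect
blob = (L == k);
projXY = any(blob, 3);                         % collapse along z -> xy plane
projXZ = squeeze(any(blob, 1));                % collapse along y -> xz plane
projYZ = squeeze(any(blob, 2));                % collapse along x -> yz plane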