How to understand the RPN phase in Faster R-CNN

Sorry, this is all one question, but it touches on many smaller ones that I can't split into separate questions.
1. For example, take an input image of size 960x640.
2. Passing it through VGG16 up to layer 13 (conv5_3) gives a 60x40x512 feature map.
3. Do a 3x3 convolution.
    3.1 How does the 3x3 convolution compress the output above to 1x512?
    3.2 Some articles say the RPN randomly selects 512 samples from 2000 anchors. If the 1x512 matrix means this, what is the 3x3 convolution doing?
4. Loop over the feature map with stride 16 and scale 16 to find the center in the original image corresponding to the current feature-map point, cut out 9 anchors, and label those with IoU < 0.3 as negative samples and those with IoU > 0.7 as positive samples.
    4.1 If there are several points on the feature map, how do they cover the GT? I mean, since a positive label needs IoU > 0.7, does the IoU here refer to [the intersection of the area (mapped from this point back to the original image) and the GT], or [the whole area of the GT]? I think it should be the former.
5. After all the loops are over, filter the positive and negative samples with NMS. Is it possible to have multiple anchors at a single point, or is NMS sure to filter this out?
6. Pass to softmax.
    6.1 My problem is that, in many cases, the positions of the points labeled positive and negative on the feature map are different each time. Since each parameter is also fixed to a specific position on the feature map, how are proposals found from an image in the detection phase?
    6.2 Does the random selection of anchors happen here?
7. RoI pooling merges the feature map and the proposals. (1. Each RoI (is an RoI a group of anchors?) is located on the feature map to get a feature-map patch. 2. Something like an SPP layer (7x7 down-sampling) is applied to the patch to transform it into fixed-size features that fit the fully connected layer.)
8. Another softmax (in the training phase, backpropagation tunes the parameters). My problem is the same as 6.1: in many cases the positions of the points labeled positive and negative on the feature map are different each time, yet the parameters are fixed to specific positions, so how are proposals found from the image in the detection phase?
9. Compare the RoIs to the GT and do regression.
After finishing the questions above, I re-thought the whole thing and found my understanding of anchors and proposals a bit confused. Do many anchors compose a proposal?
If so, then step 6 above becomes:
Select 512 anchors and pass their parameters into softmax; the output shows whether each anchor is part of the target object. So this layer is the detection phase: when detecting, just loop over all the anchors to get the possible ones.
    6.1 But in this case, how does the RPN output a bbox (x, y, w, h)? I think it needs to merge the selected anchors and then scale to the size of the original image to get the bbox.
    6.2 If the operation is a merge, then randomly selecting 512 out of 2000 is likely to miss some areas, isn't it?
The main points are 3 and 6, and I think all of them are highly related and cannot be separated. Some just need a yes or no to confirm. Thanks. (A sketch of my reading of step 4 follows below.)
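To make step 4 concrete, here is a minimal MATLAB sketch of my reading of the anchor/IoU labeling at a single feature-map point. The scales, ratios, location and GT box are made-up values, not the exact ones Faster R-CNN uses; IoU is taken between each anchor box and the GT box (the "former" interpretation in 4.1):
stride = 16;                          % feature-map stride w.r.t. the 960x640 input
fx = 30; fy = 20;                     % one feature-map location (hypothetical)
cx = (fx - 0.5) * stride;             % its centre mapped back to the original image
cy = (fy - 0.5) * stride;
scales = [128 256 512];               % assumed anchor sizes in pixels
ratios = [0.5 1 2];                   % assumed aspect ratios (h/w) -> 9 anchors total
gt = [400 250 200 150];               % hypothetical GT box, [x y w h]
for s = scales
    for r = ratios
        w = s / sqrt(r);  h = s * sqrt(r);
        a = [cx - w/2, cy - h/2, w, h];            % anchor box, [x y w h]
        inter = rectint(a, gt);                    % intersection area
        iou = inter / (a(3)*a(4) + gt(3)*gt(4) - inter);
        if iou > 0.7
            fprintf('%.0fx%.0f anchor: positive (IoU %.2f)\n', w, h, iou);
        elseif iou < 0.3
            fprintf('%.0fx%.0f anchor: negative (IoU %.2f)\n', w, h, iou);
        end
    end
end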

Related

matlab: limiting erosion on binary images

I am trying to erode objects in a binary image such that they do not become smaller than some fixed size. Consider, for instance, a binary map composed of connected components (blobs), wherein one defines blob size by either the minimal or maximal antipolar (anti-perimetric) distance (i.e., the distance between two points that are as far from one another as they can be on the perimeter or contour of the blob; if the contour consists of N consecutively numbered points, then the distances evaluated would be those between points 1 and N/2+1, points 2 and N/2+2, etc.). Given such an arrangement, I seek to erode these blobs until the distance metric reaches a specified limit. If the blobs were simple circles, then the effect could be realized by ultimate erosion followed by dilation to a fixed size; however, the contour of an irregular object would be lost by such a procedure. Is there a way to achieve such an effect for connected, irregular components using built-in functions in MATLAB?
Without an image or the code you have already tried, I may understand you wrong, but maybe iterating bwmorph with 'thin', 'skel' or 'shrink' will help you:
cond = calc_cond(bw);            % your size metric for the current blobs
while cond > cond_threshold      % keep eroding until the metric reaches the limit
    bw = bwmorph(bw, 'thin', 1); % or 'skel' / 'shrink'
    cond = calc_cond(bw);
end
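calc_cond here is your own metric. As a hedged sketch, assuming the maximal antipolar distance from the question can be approximated by the largest pairwise distance between contour points, a brute-force version could look like this (fine for small blobs):
function cond = calc_cond(bw)
    B = bwboundaries(bw);                 % contour points of every blob
    cond = 0;
    for k = 1:numel(B)
        pts = B{k};                       % [row col] contour coordinates
        dr = pts(:,1) - pts(:,1)';        % pairwise row differences
        dc = pts(:,2) - pts(:,2)';        % pairwise column differences
        cond = max(cond, max(hypot(dr(:), dc(:))));  % largest contour-to-contour distance
    end
end
Note this returns the largest metric over all blobs, so the loop stops when the biggest blob reaches the limit; limiting each blob individually would need the loop applied per connected component.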

Matlab - Concatenation of overlapping blocks with weighted average

I'm looking for a quick way to combine overlapping blocks into one image. Assume the size of the full image and the coordinates of each block within the full image are known. Also assume the blocks are regularly spaced both horizontally and vertically.
The catch: in the overlapping region, a pixel in the output image should get a value according to a weighted average of the corresponding pixels in the overlapping blocks, with weights that decrease with distance from the block center.
So, for example, take a pixel location p (relative to the full image coordinates) in the overlapping region between blocks B1 and B2. Assume the overlap region is due to a horizontal shift only, of size h. If B1(p) and B2(p) are the values at that location as they appear in blocks B1 and B2, and d1, d2 are the respective distances of p from the centers of blocks B1 and B2, then in the output image O the location p will get O(p) = (h-d1)/h*B1(p) + (h-d2)/h*B2(p).
Note that generally, there can be up to 4 overlapping blocks in any region.
I'm looking for the best way to do this in Matlab. Hopefully, for any choice of distance function.
blockproc and the like can help split an image into blocks, but they only allow very basic combination of the results. imfuse comes close to what I need, but offers only simple non-weighted alpha blending. bwdist seems useful, but I haven't figured out the most efficient way to put it to use.
You should use the command im2col.
Once you have all your patches in vectors aligned in one matrix you'll be able to work on the columns (Filtering per patch) and rows (Filtering between patches).
It will be trickier than the classic usage of im2col but it should work.
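Not im2col-based, but as a minimal accumulate-and-normalize sketch of the weighted blend itself, assuming the blocks sit in a cell array blocks with top-left corners in pos (hypothetical names) and a linear distance weight; any other distance function can be swapped in:
function O = blend_blocks(blocks, pos, imSize)
    num = zeros(imSize);                      % running weighted sum
    den = zeros(imSize);                      % running sum of weights
    for k = 1:numel(blocks)
        blk = double(blocks{k});
        [bh, bw] = size(blk);
        [X, Y] = meshgrid(1:bw, 1:bh);
        d = hypot(X - (bw+1)/2, Y - (bh+1)/2);      % distance from the block centre
        w = max(1 - d / hypot(bw/2, bh/2), eps);    % linear falloff; swap in any distance function
        r = pos(k,1) : pos(k,1)+bh-1;
        c = pos(k,2) : pos(k,2)+bw-1;
        num(r,c) = num(r,c) + w .* blk;
        den(r,c) = den(r,c) + w;
    end
    O = num ./ den;    % normalizing handles 2, 3 or 4 overlapping blocks uniformly
end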

Creating a feature vector in MATLAB

I have calculated texture, color and shape features on an image, but those only add up to 12 features. I read that people extract 1000 features and more. Could someone please explain to me how to increase the number of features? And then how do I save them to form a feature vector?
Features are the most significant or interesting points of an image. For example, say I am interested in the edge information of an image. Edges can be found with a Laplacian filter in the spatial domain, so the only points that remain in the image will be edge points. Each edge point has its x, y coordinates followed by an intensity value. These three pieces of information for all the interest points lead to a multi-dimensional feature vector, whose size depends on the type of image you are taking.
Similarly, the histogram of an image can also be a feature; for a grayscale image it spans the values 0 to 255, so you can store these 256 values as features for every image.
Hope you get the idea. Images are subjective, so depending on the application and the given data set, we extract features and form feature vectors.
Apart from color, texture and shape, you can also work with the signatures, edges, histograms and other properties of an image.
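For illustration, a minimal sketch of building one such vector in MATLAB (the file name and descriptor choices are just examples):
I = imread('example.png');                 % hypothetical image file
if ndims(I) == 3, I = rgb2gray(I); end     % work in grayscale
histFeat = double(imhist(I))';             % 1x256 intensity histogram
histFeat = histFeat / sum(histFeat);       % normalize so image size doesn't matter
E = edge(I, 'log');                        % Laplacian-of-Gaussian edges, as mentioned above
edgeFeat = sum(E(:)) / numel(E);           % fraction of edge pixels (one value)
featureVector = [histFeat, edgeFeat];      % 1x257 vector; append more descriptors the same way
Saving a matrix with one such row per image gives you the feature matrix most classifiers expect.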

Garment Cropping from mannequin

I have two images – a mannequin with and without a garment.
Please refer to the sample images below. Ignore the jewels and footwear on the mannequin, and imagine the second mannequin is wearing only the dress.
I want to extract only the garment from the two images for further processing.
The complexity is that there is a slight displacement in the camera position between the two pictures. Because of this, a simple subtraction to generate the garment mask will not work.
Can anyone tell me how to handle it?
I think I need to do registration between the two images so that I can extract only the garment from the image?
Any references to blogs, articles and code are highly appreciated.
--
Thanks
Idea
This is an idea of how you could do it. I haven't tested it, but my gut tells me it might work. I'm assuming that there will be slight differences in the pose of the mannequin as well as in the camera attitude.
Let the original image be A, and the clothed image be B.
Take the difference D = |A - B| and apply a median filter whose kernel is proportional to the largest deviation you expect from pose and camera-attitude error: Dmedian = Median(D, kernelsize).
Quantize Dmedian into a binary mask Dmask = Q(Dmedian, threshold) using appropriate threshold values to obtain an approximate mask for the garment (this will be smaller than the garment itself due to the median filter). Reject any shapes in Dmask whose area is too small by setting their pixels to 0.
Expand the shape(s) in Dmask proportionally to the size of the median kernel into Emask = expand(Dmask, k*kernelsize). Then construct the difference of the masks, Fmask = |Dmask - Emask|, which now contains the band of pixels where the garment edge is expected to be. For every pixel in Fmask in this band, compute the correlation Cxy between A and B using a small neighbourhood, and store the correlations into an image C = 1.0 - Corr(A, B, Fmask, n).
Your final garment mask will be M=C+Dmask.
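A rough MATLAB sketch of steps 1-3 above (all kernel sizes and thresholds are made-up values that would need tuning):
D = abs(double(A) - double(B));              % difference image
Dmedian = medfilt2(D, [15 15]);              % kernel sized to the expected pose error
Dmask = Dmedian > 20;                        % threshold into a rough, undersized mask
Dmask = bwareaopen(Dmask, 500);              % reject too-small shapes
Emask = imdilate(Dmask, strel('disk', 30));  % expand proportionally to the kernel
Fmask = Emask & ~Dmask;                      % band where the true garment edge should lie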
Explanation
Since your image has nice, continuous swatches of colour, the difference between the two similar images will consist of thin lines and small gradients where the pose and camera attitude differ. When taking a median filter of the difference image over a sufficiently large kernel, these lines are removed because they are a minority of the pixels.
The garment, on the other hand, will (hopefully) differ significantly from the colors of the unclothed version and will generate a bigger difference. Thresholding the difference after the median filter should give you a rough mask of the garment that is undersized due to some of the pixels on the edge being rejected because their median values are too low. You could stop here if the approximation is good enough for you.
By expanding the mask obtained above, we get a probable region for the "true" edge. The process so far has served to narrow our search region for the true edge considerably, so we can apply a more costly correlation search between the images along this edge to find where the garment is. High correlation means no garment; low correlation means garment.
We use the inverted correlation as an alpha value, together with the initially smaller mask, to obtain an alpha-valued mask of the garment that can be used for extracting it.
Clarification
Expand: what I mean by "expanding the mask" is finding the contour of the mask region and outsetting/growing/enlarging it to make it larger.
Corr(A,B,Fmask,n): just an arbitrarily chosen correlation function that gives the correlation between pixels in A and B that are selected by the mask Fmask, using a neighbourhood of size n. The function returns 1.0 for a perfect match and 0.0 for an anti-match at each pixel tested. In MATLAB it could look like this:
[rows, cols] = find(Fmask == 1);
C = zeros(size(Fmask));
for k = 1:numel(rows)
    Ap = subregion(A, [rows(k) cols(k)], n) - mean(A(:));
    Bp = subregion(B, [rows(k) cols(k)], n) - mean(B(:));
    Cxy = sum(Ap(:) .* Bp(:))^2 / (sum(Ap(:).^2) * sum(Bp(:).^2));
    C(rows(k), cols(k)) = 1.0 - Cxy;
end
where subregion is a helper (not shown) that extracts an n-by-n region around the pixel at the given position.
You can see that if Ap == Bp then Cxy=1

How to remove camera noises in CMOS camera

I have attached two consecutive frames captured by a CMOS camera with an IR filter. The checkerboard object was stationary when the images were captured, yet the difference between the two images is nearly 31000 pixels. This could affect my result. Can you tell me what kind of noise this is and how I can remove it? Please suggest any algorithms or functions that could remove this noise.
Thank you. Sorry for my poor English.
Image1 : [1]: http://i45.tinypic.com/2wptqxl.jpg
Image2: [2]: http://i45.tinypic.com/v8knjn.jpg
That noise appears to come from the camera sensor (Bayer-to-RGB conversion); the checkerboard pattern is still visible in it.
Lossy JPEG compression also contributes a lot to the process. You should first get access to the raw images.
From those particular images I'd first try edge-detection filters (Sobel horizontal and vertical) to make a mask that selects between some median / local-histogram-equalization filtering for the flat areas and some checkerboard-reducing filter for the edges. The point is that probably no single filter can do well on both the JPEG ringing artifacts and the jagged edges. Then the real question is: what other kinds of images need to be processed?
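As a hedged sketch of that mask-and-select idea in MATLAB (the threshold and kernel sizes are assumed values):
G = double(I);                                   % I: one grayscale frame
Gh = imfilter(G, fspecial('sobel'));             % horizontal edge response
Gv = imfilter(G, fspecial('sobel')');            % vertical edge response
edgeMask = hypot(Gh, Gv) > 50;                   % hypothetical edge threshold
flat = medfilt2(I, [5 5]);                       % filtering for the flat areas
out = I;
out(~edgeMask) = flat(~edgeMask);                % leave edge pixels for a dedicated filter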
From the comments: if the corner points are to be made exact, then the solution is more likely to search for features (corner points with subpixel resolution), build a mapping from one image's set of corners to the other's, and search for the best affine transformation matrix that converts one set to the other. With this matrix one can then resample the other image.
One can fortunately estimate motion vectors with subpixel resolution without brute-force searching all possible subpixel locations: when calculating a matched filter, one gets local maxima as potential candidates for exact matches. But this is not all. One can compute a more precise approximation of the peak location by studying the matched-filter outputs at the nearby pixels: for an exact match the output should be symmetric; otherwise the 'energies' of the matched filter are biased towards the second-best location. (A 2nd-degree polynomial fit plus finding its maximum can work.)
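A minimal sketch of that parabolic refinement, assuming R holds the matched-filter (correlation) surface and its maximum is not on the border:
[~, idx] = max(R(:));
[r, c] = ind2sub(size(R), idx);                  % integer peak location
% Fit a parabola through the three samples around the peak, per axis;
% the vertex offset is (f(-1)-f(+1)) / (2*(f(-1)-2*f(0)+f(+1))).
dx = (R(r,c-1) - R(r,c+1)) / (2*(R(r,c-1) - 2*R(r,c) + R(r,c+1)));
dy = (R(r-1,c) - R(r+1,c)) / (2*(R(r-1,c) - 2*R(r,c) + R(r+1,c)));
subpixelPeak = [r + dy, c + dx];                 % sub-pixel estimate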
Looking closely at these images, I must agree with Aki Suihkonen.
In my view, the main noise comes from the JPEG compression, which causes sharp edges to "ring". I'd try a "de-speckle" type of filter on the images and see if it makes a difference. Some info that can help you implement this can be found in this link.
In a more quick-and-dirty fashion, you can apply one of the many standard tools. For example, given images a and b:
(i) Just smooth the image with a Gaussian filter; this can reduce the noise difference between the images by an order of magnitude. For example:
h=fspecial('gaussian',15,2);
a=conv2(a,h,'same');
b=conv2(b,h,'same');
(ii) Reduce noise by adaptive filtering:
a = wiener2(a,[5 5]);
b = wiener2(b,[5 5]);
(iii) Adjust intensity values using histogram equalization:
a = histeq(a);
b = histeq(b);
(iv) Adjust intensity values to a specified range:
a = imadjust(a,[0 0.2],[0.5 1]);
b = imadjust(b,[0 0.2],[0.5 1]);
If your images are supposed to be black and white but you captured them in grayscale, there could be a difference due to noise.
You can convert the images to black and white by defining a threshold: any pixel with a value less than the threshold is assigned 0, and anything larger is assigned 1, or whatever your grayscale maximum is (maybe 255).
Assume your image is I, its grayscale range is 0 to 255, and you choose a threshold of 100; to make it black and white:
I(I < 100) = 0;      % below the threshold -> black
I(I >= 100) = 255;   % at or above the threshold -> white
Now you have a black-and-white image. Do the same for the other image, and you should get a very small difference if the camera and the subject have not moved.