I am trying to understand how anchor box coordinates are generated from the feature map, and I have some questions about this process.
1-) In the image above, the feature map size is N x M x C and the sliding window is chosen as 3x3. What is the purpose of this 3x3 window? I think it is used to reduce the depth from NxMxC to NxMx1. Am I right? If not, what is the purpose of this window?
2-) To obtain the anchor box coordinates on the RGB input image from the feature map, how does the 3x3 window affect these coordinates?
Thanks in advance.
If my phone camera is at a known height and takes a picture from a known distance, how can I find the actual height of an object at the bottom of the picture, considering that the camera is taking the photo from a downward angle?
This question talks of a similar problem, but the answers haven't taken camera angle and height into account.
Here's a diagram of the setup -
h is the actual height of the yellow box in front of the blue screen.
This is the image captured by the camera -
How can I find h, given h' on the image? Assume the focal length of the camera is known.
Assuming you know the calibration matrix K, here is a solution that I find simpler than calculating angles. Choose the points p1=(x,y) and p2=(r,s) as indicated in the figure above. Since you say that you know the distance from the camera to the object, you also know the depth d of these points in camera coordinates. Writing the points in homogeneous coordinates, p1=(x,y,1) and p2=(r,s,1),
Q1 = inverse(K)*p1*d
Q2 = inverse(K)*p2*d
give you the corresponding points on the cube in camera coordinates. Now the height you seek is simply the distance
norm(Q1-Q2)
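For illustration, a minimal MATLAB sketch of that computation (the values of K, d, and the pixel coordinates below are placeholders):
% Back-project two pixels at a known depth and measure their separation.
f = 1500;                  % focal length in pixels (placeholder)
a = 640; b = 360;          % principal point in pixels (placeholder)
K = [f 0 a; 0 f b; 0 0 1];
d  = 2.0;                  % known depth of the object, e.g. in metres (placeholder)
p1 = [820; 210; 1];        % top of the box, homogeneous pixel coordinates
p2 = [820; 540; 1];        % bottom of the box, homogeneous pixel coordinates
Q1 = d * (K \ p1);         % point on the box in camera coordinates
Q2 = d * (K \ p2);
h  = norm(Q1 - Q2);        % metric height of the box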
Hope that helps.
Edit: Here's a quick explanation of the calibration matrix. In the pinhole camera model, a 3D point P can be projected onto the image plane via the multiplication KP, where K is (assuming square pixels) the matrix
f 0 a
0 f b
0 0 1
where f is the focal length expressed in pixels and (a, b) is the principal point, i.e. the center of the image expressed in pixel coordinates. For more info, you can just google "intrinsic camera parameters", or look here or here for a quick and dirty explanation. And maybe my other answer can help?
Note: In your case since you only care about depth, you do not need a and b, you can set them to 0 and just set f.
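To see why: inverse(K)*[x;y;1] = [(x-a)/f; (y-b)/f; 1], so the difference of the two back-projected points is
Q1 - Q2 = d*inverse(K)*(p1 - p2) = d*[(x-r)/f; (y-s)/f; 0]
and a, b cancel out; only f enters the height.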
PS: If you don't know f, you should look into camera calibration algorithms (there are auto-calibrating methods but as far as I know they require many frames and fall into the domain of SLAM/SFM). However, I think that you can find pre-computed intrinsic parameters in Blender for a few known smartphone models, but they are not expressed in the exact manner presented above, and you'll need to convert them. I'd calibrate.
I must be missing something, but I think this is quite easy (based on your assumptions, which include doing some type of image processing to detect the front and bottom edges of your object in order to get h'). Keep in mind that you are also assuming that the distance from the top of the object to your camera is the same as from the bottom of the object to your camera (at greater distances this becomes moot, but at close range the skew can actually be quite significant).
The standard equation for distance:
dist = (focalDist(mm) * objectRealHeight(mm) * imageHeight(pix) ) / ( objectHeight(pix) * sensorHeight(mm) )
You can re-arrange this equation to solve for objectRealHeight since you know everything else...
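For example, in MATLAB (all numbers are placeholders; only the rearranged formula matters):
% Solve the standard distance equation for the real object height.
focalDist_mm    = 4.2;    % focal length of the lens (placeholder)
sensorHeight_mm = 4.8;    % physical sensor height (placeholder)
imageHeight_px  = 3024;   % image height in pixels (placeholder)
objectHeight_px = 850;    % measured object height in the image, h' (placeholder)
dist_mm         = 500;    % known camera-to-object distance (placeholder)
objectRealHeight_mm = (dist_mm * objectHeight_px * sensorHeight_mm) ...
                      / (focalDist_mm * imageHeight_px);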
How do you determine that the intrinsic and extrinsic parameters you have calculated for a camera at time X are still valid at time Y?
My idea would be:
1. Use a known calibration object (a chessboard) and place it in the camera's field of view at time Y.
2. Calculate the chessboard corner points in the camera's image (at time Y).
3. Define one of the chessboard corner points as the world origin and calculate the world coordinates of all remaining chessboard corners based on that origin.
4. Relate the coordinates from step 3 to the camera coordinate system.
5. Use the parameters calculated at time X to calculate the image points of the points from step 4.
6. Calculate the distances between the points from step 2 and the points from step 5.
Is that a clever way to go about it? I'd eventually like to implement it in MATLAB and later possibly OpenCV. I think I know how to do steps 1-2 and step 6. Maybe someone can give a rough implementation for steps 2-5. In particular, I'm unsure how to relate the chessboard world coordinate system to the camera coordinate system, which I believe I would have to do.
Thanks!
If you have a single camera you can easily follow the steps from this article:
Evaluating the Accuracy of Single Camera Calibration
For achieving step 2, you can use the detectCheckerboardPoints function from MATLAB:
[imagePoints, boardSize, imagesUsed] = detectCheckerboardPoints(imageFileNames);
Assuming that you are talking about stereo cameras: for stereo pairs, imagePoints(:,:,:,1) are the points from the first set of images and imagePoints(:,:,:,2) are the points from the second set of images. The output contains M [x y] coordinates; each coordinate represents a point where square corners are detected on the checkerboard. The number of points the function returns depends on the value of boardSize, which indicates the number of squares detected. The function detects the points with sub-pixel accuracy.
As you can see in the following image, the points are estimated relative to the first point, which covers your third step.
[The image is from this page at MathWorks.]
You can consider point 1 as the origin of your coordinate system (0,0). The directions of the axes are shown in the image, and you know the distance between each point in world coordinates, so it is just a matter of depth estimation.
To find a transformation matrix between the points in the world CS and the points in the camera CS, you should collect a set of points and perform an SVD to estimate the transformation matrix.
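A rough sketch of that estimation, assuming you already have N matched 3-D points in world coordinates (P_world) and camera coordinates (P_cam), stored as N-by-3 matrices (both names are placeholders):
% Estimate the rigid transform (R, t) mapping world points to camera points
% via the SVD of the cross-covariance matrix (Kabsch / Procrustes).
cw = mean(P_world, 1);                 % centroid of the world points
cc = mean(P_cam, 1);                   % centroid of the camera points
H  = (P_world - cw)' * (P_cam - cc);   % 3-by-3 cross-covariance
[U, ~, V] = svd(H);
R = V * U';                            % rotation, world -> camera
if det(R) < 0                          % guard against a reflection
    V(:, 3) = -V(:, 3);
    R = V * U';
end
t = cc' - R * cw';                     % translation
% A world point Pw (column vector) then maps to the camera frame as R*Pw + t.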
But,
I would re-estimate the parameters of the camera and compare them with the initial parameters from time X. This is easier if you have saved the images that were used when calibrating the camera at time X. By repeating the calibration process with those images you should get very similar results if the camera calibration is still valid.
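A rough MATLAB sketch of that re-estimation, assuming a set of saved checkerboard images and a known square size (imageFileNames, squareSize_mm, and the saved paramsX object are placeholders):
% Re-estimate camera parameters and compare the intrinsics with those from time X.
[imagePoints, boardSize] = detectCheckerboardPoints(imageFileNames);
worldPoints = generateCheckerboardPoints(boardSize, squareSize_mm);
paramsY = estimateCameraParameters(imagePoints, worldPoints);
diffK = paramsY.IntrinsicMatrix - paramsX.IntrinsicMatrix;   % paramsX saved at time X
fprintf('Max intrinsic difference: %g\n', max(abs(diffK(:))));
fprintf('Mean reprojection error:  %g px\n', paramsY.MeanReprojectionError);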
Edit: Why do you need the set of images used in the calibration process at time X?
You have a set of images that were used to do the calibration the first time, right? To recalibrate the camera you need to use a new set of images, but for checking the previous calibration you can use the previous images. If the parameters of the camera have changed, there will be an error between the re-estimation and the first estimation. This can be used for evaluating the validity of the calibration, not for recalibrating the camera.
I'm looking for a quick way to combine overlapping blocks into one image. Assume the size of the full image and the coordinates of each block within the full image are known. Also assume the blocks are regularly spaced both horizontally and vertically.
The catch - in the overlapping region, a pixel in the output image should get a value according to a weighted average of the corresponding pixels in the overlapping blocks, with the weight falling off with the distance from the block center.
So, for example, take a pixel location p (relative to the full image coordinates) in the overlapping region between blocks B1 and B2, and assume the overlap is due to a horizontal shift of size h only. If B1(p) and B2(p) are the values at that location as they appear in blocks B1 and B2, and d1, d2 are the respective distances of p from the centers of B1 and B2, then in the output image O the location p gets O(p) = (h-d1)/h*B1(p) + (h-d2)/h*B2(p).
Note that generally, there can be up to 4 overlapping blocks in any region.
I'm looking for the best way to do this in Matlab. Hopefully, for any choice of distance function.
blockproc and the like can help with splitting an image into blocks but allow only very basic combination of the results. imfuse comes close to what I need, but offers only simple non-weighted alpha blending. bwdist seems useful, but I haven't figured out the most efficient way to put it to use.
You should use the command im2col.
Once you have all your patches as column vectors aligned in one matrix, you'll be able to work on the columns (filtering per patch) and the rows (filtering between patches).
It will be trickier than the classic usage of im2col, but it should work.
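The im2col route takes some bookkeeping; purely to illustrate the weighted blending itself, here is a rough accumulate-and-normalize sketch (the block layout, variable names, and the linear weight falloff are assumptions):
function out = blendBlocks(blocks, topLeft, fullSize)
% Blend equally sized, overlapping blocks into one image, weighting each
% pixel by its distance from the block center (linear falloff).
% blocks   - cell array of 2-D blocks
% topLeft  - topLeft(k,:) = [row col] of block k in the full image
% fullSize - [rows cols] of the output image
    acc  = zeros(fullSize);              % accumulated weighted values
    wacc = zeros(fullSize);              % accumulated weights
    [bh, bw] = size(blocks{1});
    [cc, rr] = meshgrid(1:bw, 1:bh);
    w = min(rr, bh + 1 - rr) .* min(cc, bw + 1 - cc);   % peaks at the block center
    for k = 1:numel(blocks)
        r = topLeft(k, 1);  c = topLeft(k, 2);
        rows = r:r + bh - 1;  cols = c:c + bw - 1;
        acc(rows, cols)  = acc(rows, cols)  + w .* double(blocks{k});
        wacc(rows, cols) = wacc(rows, cols) + w;
    end
    out = acc ./ max(wacc, eps);         % normalize by the total weight
end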
I am currently doing a project called eye controlled cursor using MATLAB.
I have a few stages before I extract the center of the iris (which can be considered the pupil location): face detection -> eye detection -> iris detection. Finally, I have obtained the center of the iris, as shown in the figure.
Now, I am trying to map this position (X,Y) to my computer screen pixels (1366 x 768). Most of the journals I have found require a reference point such as the lips, nose, or an eye corner, but I am only able to extract the center of the iris by doing certain thresholding. How can I map this position (X,Y) to my computer screen (1366 x 768)?
Well you either have to fix the head to a certain position (which isn't very practical) or you will have to adapt to the face position. Depending on your image, you will have to choose points that are always on that image and are easy to detect. If you just have one point (like the nose), you can only adjust for the x/y shift of your head. If you have more points (like the 4 corners of the eye, the nose, maybe the corners of the mouth), you can also extract the 3 rotational values of the head and therefore calculate the direction of sight much better. For a first approach, I guess only the two inner corners of the eye (they are "easy" to detect) will do.
I would also recommend using a calibration sequence: present the user with a sequence of 4 red points in the corners of the screen and have them look at each one. You can then record the positions of the pupils and interpolate between them.
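A minimal sketch of that calibration step, assuming you can record the pupil center while the user looks at each corner point (the variable names and the simple linear mapping are assumptions):
function screenXY = pupilToScreen(pupilXY, pupilCorners, screenSize)
% Map a measured pupil position to screen coordinates using the pupil
% positions recorded while the user looked at the four screen corners.
% pupilCorners - 4-by-2 matrix of recorded pupil (x, y) positions
% screenSize   - [width height], e.g. [1366 768]
    xMin = min(pupilCorners(:, 1));  xMax = max(pupilCorners(:, 1));
    yMin = min(pupilCorners(:, 2));  yMax = max(pupilCorners(:, 2));
    u = (pupilXY(1) - xMin) / (xMax - xMin);   % normalize to [0, 1]
    v = (pupilXY(2) - yMin) / (yMax - yMin);
    u = min(max(u, 0), 1);                     % clamp to the screen
    v = min(max(v, 0), 1);
    screenXY = [u * screenSize(1), v * screenSize(2)];
end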
I need to find the distance between two points. I can find the distance between them manually with the pixel-to-cm converter in the Image Processing Toolbox, but I want code that detects the point positions in the image and calculates the distance.
More precisely, the image contains only three points: one in the middle and the other two approximately equidistant from it...
There might be a better way than this, but I hacked something similar together last night.
Use bwboundaries to find the objects in the image (the contiguous regions in a black/white image).
The second returned matrix, L, is the same image but with the regions numbered. So for the first point, you want to isolate all the pixels related to it:
L2 = (L==1)
Now find the center of that region (for object 1).
x1 = (1:size(L2,2))*sum(L2,1)'/sum(L2(:));   % mean column index = centroid x
y1 = (1:size(L2,1))*sum(L2,2)/sum(L2(:));    % mean row index = centroid y
Repeat that for all the regions in your image. You should have the center of mass of each point. I think that should do it for you, but I haven't tested it.
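Putting it together, a rough end-to-end sketch (the binary image bw and the pixel-to-cm scale pxPerCm are placeholders):
% Find the centroids of the dots in a binary image and report pairwise distances.
[~, L] = bwboundaries(bw);
nRegions = max(L(:));
centers = zeros(nRegions, 2);
for k = 1:nRegions
    Lk = (L == k);
    centers(k, 1) = (1:size(Lk, 2)) * sum(Lk, 1)' / sum(Lk(:));  % x (column)
    centers(k, 2) = (1:size(Lk, 1)) * sum(Lk, 2)  / sum(Lk(:));  % y (row)
end
for i = 1:nRegions
    for j = i + 1:nRegions
        dPix = norm(centers(i, :) - centers(j, :));
        fprintf('point %d to point %d: %.1f px (%.2f cm)\n', ...
                i, j, dPix, dPix / pxPerCm);
    end
end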