EDIT:
What I have: camera intrinsics, extrinsic from calibration, 2D image and depth map
What I need: 2D virtual view image
I am trying to generate a novel view (the right view) for Depth Image Based Rendering. The reason for this is that only the left image and depth map are available at the receiver, which has to reconstruct the right view (see image).
I want to know whether these steps will give me the desired result, or what I should be doing instead:
First, the intrinsic and extrinsic matrices can be obtained using the Caltech Camera Calibration Toolbox for MATLAB.
Then, the pixels (with their depth values) can be mapped to 3D world points using the calibration parameters, following this method: "http://nicolas.burrus.name/index.php/Research/KinectCalibration#tocLink2"
Now, I want to back-project these points onto a new image plane (the right view). The right view is simply a translation of the left view, with no rotation, because of the setup.
How do I do this reconstruction?
Also, can I estimate R and T from the MATLAB stereo calibration tool and transform every point in the original left view to the right view using P2 = R*P1 + T,
where P1 and P2 are the image points of a 3D world point P in the respective image planes?
Any ideas and help are highly appreciated; I will rephrase/add details if the question is not clear.
(Theoretical answer)
You have to define what R and T mean. If I understand correctly, they are the roto-translation of your (main) left camera. If you can map a point P (like your P1 or P2) in 3D space, the correspondence with a point m (I don't call it p to avoid confusion) in your left camera is (unless you use a different convention; pseudocode):
m = K[R|t]*P
where
P1 = (X,Y,Z,1)
m = (u',v',w)
but you want 2D coordinates, so the coordinates in your left camera are:
u = u'/w
v = v'/w
If you have already roto-translated P1 into P2 (not very useful), this is equal to (pseudocode):
m = K[I|0]*P2 = K * [1 0 0 0; 0 1 0 0; 0 0 1 0] * P2
Assuming this is the theoretical relationship between a 3D point P and its 2D point m in an image, you can now think of your right camera as being in a different position. If there is only a translation with respect to the left camera, the right camera is translated by T2 with respect to the left camera and roto-translated by R and T+T2 with respect to the centre of the world.
So the projected point m' in your right camera should be (assuming the cameras are equal, i.e. they have the same intrinsics K):
m' = K[R|T+T2]*P = K[I|T2]*P2
I is the identity matrix.
If you want to transform m directly into m' without using 3D points, you have to implement epipolar geometry.
If the cameras are different, with different K, or if the calibration of R and T does not follow the same convention as the calibration of K, this equation may not work.
If the calibration is not well done, it could work, but with errors.
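Below is a minimal MATLAB sketch of this idea (a forward warp from the left view to the right view), not a definitive implementation. It assumes K is the shared 3x3 intrinsic matrix, depthL(v,u) holds the depth Z of left pixel (u,v) in camera coordinates, leftView is the left image, and T2 is the 3x1 translation used in the extrinsics [I|T2] of the right camera; all of these variable names are placeholders.
[h, w] = size(depthL);
rightView = zeros(h, w, size(leftView, 3), 'like', leftView);
for v = 1:h
    for u = 1:w
        Z = depthL(v, u);
        if Z <= 0, continue; end
        P2cam = Z * (K \ [u; v; 1]);        % back-project the left pixel to a 3D point (left-camera frame)
        m = K * (P2cam + T2);               % re-project into the right camera: m' = K[I|T2]*P2
        u2 = round(m(1) / m(3));            % dehomogenize and round to the nearest pixel
        v2 = round(m(2) / m(3));
        if u2 >= 1 && u2 <= w && v2 >= 1 && v2 <= h
            rightView(v2, u2, :) = leftView(v, u, :);   % naive forward warp (leaves disocclusion holes unfilled)
        end
    end
end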
I captured a depth image of a human body in the room, and I collected and saved the skeletal data related to that depth image (the joints of the wrists, elbows, ...).
Considering that the joints' coordinates are in camera space and the depth image is in depth space, I was able to show the location of the right hand wrist joint on the depth image using this code:
depthJointIndices = metadata.DepthJointIndices(:, :, trackedBodies);
plot(depthJointIndices(11,1), depthJointIndices(11,2), '*');
Now I want to know which pixel EXACTLY contains the right hand wrist joint. How can I do this properly?
I thought I could get the x,y coordinates of that joint using the code I used to show the right hand wrist joint.
As follows:
depthJointIndices = metadata.DepthJointIndices(:, :, trackedBodies);
x=depthJointIndices(11, 1)
y=depthJointIndices(11, 2)
But x,y are calculated as follows:
x = 303.5220
y = 185.6131
As you can see, x and y are floating-point numbers, but pixel coordinates can't be floating-point numbers.
So can anyone help me with this problem? How can I get the coordinates of the pixel that contains the right hand wrist joint in the depth image, using the Kinect?
You can use the following two equations (the standard pinhole projection) to derive the screen coordinates of a joint at camera-space position (x, y, Z):
U = fx * x / Z + Cx
V = fy * y / Z + Cy
Here, (U, V) and Z denote the screen coordinates and depth value, respectively, Cx and Cy denote the center of the depth map, and fx and fy are the focal lengths of the camera. For Kinect v1 cameras, fx = fy = 580.
You can refer to the paper Action Recognition From Depth Maps Using Deep Convolutional Neural Networks by P. Wang et al. for more information.
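A minimal MATLAB sketch of the two routes, assuming jointCam = [x y z] holds the wrist joint in camera space and that fx, fy, Cx, Cy are the depth camera's intrinsics (these variable names, and depthImage, are placeholders, not actual SDK fields):
U = fx * jointCam(1) / jointCam(3) + Cx;   % floating-point column
V = fy * jointCam(2) / jointCam(3) + Cy;   % floating-point row (the y term may need a sign flip, depending on the SDK's axis convention)
col = round(U);                            % integer pixel column
row = round(V);                            % integer pixel row
wristDepth = depthImage(row, col);         % depth value at that pixel
% Equivalently, if you already have the depth-space indices from the SDK:
% col = round(depthJointIndices(11, 1));
% row = round(depthJointIndices(11, 2));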
I have a calibrated camera whose intrinsics were calculated prior to doing an initial two view reconstruction. Suppose I have 20 images around a static, rigid body all taken with the same camera. Using the first two views and a ground-truth measurement of the scene, I have the
1) initial reconstruction using Stewenius 5 point algorithm to find E (essential matrix).
2) camera matrices P1 and P2 where the origin is set to that of camera P1.
My question is, how would I add more views? For the first two views, I picked the feature points by hand, since I found that MATLAB's feature detectors and matchers were outputting false correspondences.
Do I continuously do two-view reconstructions to get the other camera extrinsics i.e. P1 and P3, P1 and P4...P1 and P20; all using the same feature points as that of P1-P2? Wouldn't there be some sort of error propagation with this approach? The reason for using P1 as a reference is because it is chosen to be at the world origin.
I do have a procedure to bundle adjust after I acquire all initial estimates for the camera extrinsics, but my problem is getting the initial camera matrices P3...P20.
Thanks in advance!
You start by obtaining pairwise calibrations P1-P2, P2-P3, P3-P4, ... using the feature points of the corresponding pair. You need some sort of RANSAC to get rid of false correspondences here, or you do the matching manually between all pairs. Then you need to put all cameras into a common coordinate frame. Say we select P1 as the key camera. To add the third camera P3 to the pair P1-P2, you calculate the rotation/translation delta between P2 and P3 from the pairwise calibration, Delta2-3, and then apply it to the known camera matrix of P2. And so on, until all camera matrices are in the common coordinate frame. Then you do bundle adjustment.
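A minimal MATLAB sketch of the chaining step, assuming each pairwise calibration gives R_rel{i}, t_rel{i} such that a point X_i in camera i's frame maps to camera i+1 as X_{i+1} = R_rel{i}*X_i + t_rel{i}, and that the pairwise translations are on a consistent scale (both are assumptions; the variable names are placeholders):
nCams = 20;
R = cell(nCams, 1);  t = cell(nCams, 1);  P = cell(nCams, 1);
R{1} = eye(3);  t{1} = zeros(3, 1);            % camera 1 is the world origin
P{1} = [R{1}, t{1}];
for i = 2:nCams
    R{i} = R_rel{i-1} * R{i-1};                % accumulate rotation w.r.t. camera 1
    t{i} = R_rel{i-1} * t{i-1} + t_rel{i-1};   % accumulate translation w.r.t. camera 1
    P{i} = [R{i}, t{i}];                       % extrinsics of camera i in camera-1 coordinates
end
These P{i} then serve as the initial camera matrices for bundle adjustment.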
I am given 3D points of a scene, or a subset of these points comprising one object of the scene. I would like to create a depth image from these points, that is, the pixel value in the image encodes the distance of the corresponding 3D point to the camera.
I have found the following similar question
http://www.mathworks.in/matlabcentral/newsreader/view_thread/319097
however the answers there do not help me, since I want to use MATLAB. To get the image values is not difficult (e.g. simply compute the distance of each 3d point to the camera's origin), however I do not know how to figure out the corresponding locations in the 2d image.
I could only imagine that you project all 3D points onto a plane and bin their positions into discrete rectangles on that plane. Then you could average the depth value for each bin.
I could imagine, however, that the result of such a procedure would be a very pixelated image, not very smooth.
How would you go about this problem?
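A minimal MATLAB sketch of the binning idea described in the question, assuming the points pts (N x 3) are already in camera coordinates and that fx, fy, cx, cy are known intrinsics and imgH, imgW the desired image size (all of these names are placeholders):
X = pts(:,1);  Y = pts(:,2);  Z = pts(:,3);
u = round(fx * X ./ Z + cx);                 % column of each projected point
v = round(fy * Y ./ Z + cy);                 % row of each projected point
valid = Z > 0 & u >= 1 & u <= imgW & v >= 1 & v <= imgH;
dist = sqrt(X.^2 + Y.^2 + Z.^2);             % distance of each point to the camera origin
% Average the distances of all points that fall into the same pixel (bin)
depthImg = accumarray([v(valid), u(valid)], dist(valid), [imgH, imgW], @mean, NaN);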
Assuming you've corrected for camera tilt (a simple matrix multiplication if you know the angle), you can probably just follow this example
X = data(:,1);  %// first column: X coordinates of the 3D points
Y = data(:,2);  %// second column: Y coordinates (used as the depth value here)
Z = data(:,3);  %// third column: Z coordinates
%// This bit requires you to make some choices like the start X and Z, end X and Z and resolution (X and Z) of your desired depth map
[Xi, Zi] = meshgrid(X_start:X_res:X_end, Z_start:Z_res:Z_end);
depth_map = griddata(X, Z, Y, Xi, Zi);
I am trying to artificially manipulate a 2D image using a rigid 3D transformation (T). Specifically, I have an image and I want to transform it using T to determine the image as if it had been captured from a different location.
Here's what I have so far:
The problem reduces to determining the plane-induced homography (Hartley and Zisserman, Chapter 13) - without the camera calibration matrices this is H = R - t*n'/d.
I am unsure, however, how to define n and d. I know that they help to define the world plane, but I'm not sure how to define them in relation to the first image plane (e.g. the camera plane of the original image).
Please advise! Thanks! K
Not sure what you mean by "first image plane": the camera's?
The vector n and the scalar d define the equation of the plane defining the homography:
n · X + d = 0, or, in coordinates, n_x*x + n_y*y + n_z*z + d = 0, for every point X = (x, y, z) belonging to the plane.
There are various ways to estimate the homography. For example, you can map a quadrangle on the plane to a rectangle of known aspect ratio.
Or you can estimate the locations of vanishing points (this comes in handy when you have, say, an image of a skyscraper, with nice rows of windows). In this case, if p and q are the homogeneous coordinates of the vanishing points of two orthogonal lines on a plane in the scene, then the normal to the plane in camera coordinates is simply given by (p X q) / |p X q|.
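A minimal MATLAB sketch tying this to the question's formula H = R - t*n'/d, under the assumption that p and q are expressed in normalized (calibrated) coordinates, that d is the distance of the plane from the first camera, and that (R, t) is the relative pose of the second camera (these are assumptions, not something stated above):
n = cross(p, q);
n = n / norm(n);           % unit normal of the plane in camera coordinates
H = R - t * n' / d;        % calibrated plane-induced homography
Hpix = K * H / K;          % pixel-to-pixel homography if the intrinsics K are known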
I'm trying to obtain the 3D metric reconstruction of the points I have in two different views of my scene by means of a pair of iPhone 4S devices (a rudimentary stereo system).
To do so, I calibrated the cameras, estimated the fundamental matrix and obtained an estimate of the essential matrix. Now, in Hartley & Zisserman's "Multiple View Geometry in CV" book, I see that to any given E there correspond 4 canonical camera pairs, of which only one reconstructs the "actual" stereo configuration.
The problem is that they say [cit.]"...the reconstructed point will be in front of both cameras in one of these four solutions only. Thus, testing with a single point to determine if it is in front of both cameras is sufficient to decide between the four different solutions for the camera matrix P'. ..."
Given that I know F, K_left and K_right, how does one establish whether a 3D point is in front of both cameras or not?
Thanks,
Riccardo
You can get the camera rotation and translation from the essential matrix, so you have the camera matrix P = [R, -R*t]. To test whether a point X = (x, y, z, 1) is in front of the camera, compute X' = P*X. X' = (x', y', z') is the position of the point as seen from a camera at the origin looking along the z direction. So, if z' is positive, then X' is in front of the camera, and X is in front of camera P; otherwise X is behind P.
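A minimal MATLAB sketch of this test applied to the four decompositions of E (a sketch under assumptions, not a definitive implementation): x1 and x2 are the normalized image coordinates (K_left\[u;v;1] and K_right\[u;v;1]) of one correspondence, and candidates is a cell array of the four {R, t} hypotheses, with the second camera written as P2 = [R, t]; if your t is the camera centre, build P2 = [R, -R*t] as in the answer above.
P1 = [eye(3), zeros(3,1)];                  % first camera at the origin
for k = 1:numel(candidates)
    R = candidates{k}{1};  t = candidates{k}{2};
    P2 = [R, t];                            % second-camera hypothesis
    % Linear (DLT) triangulation of the single correspondence
    A = [x1(1)*P1(3,:) - P1(1,:);
         x1(2)*P1(3,:) - P1(2,:);
         x2(1)*P2(3,:) - P2(1,:);
         x2(2)*P2(3,:) - P2(2,:)];
    [~, ~, V] = svd(A);
    X = V(:, end);  X = X / X(4);           % homogeneous 3D point (x, y, z, 1)
    z1 = X(3);                              % depth in the first camera
    Xc2 = P2 * X;  z2 = Xc2(3);             % depth in the second camera
    if z1 > 0 && z2 > 0                     % in front of both cameras
        R_best = R;  t_best = t;            % this is the physically valid solution
        break;
    end
end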