Depth-to-world registration on HoloLens 2 with Unity - unity3d

I'm working on a program that uses HoloLens 2 Research Mode with Unity. The HoloLens provides a depth image in which each pixel holds the distance from the depth sensor to the object in front of it.
For every pixel, I project the pixel onto the image plane and then back-project it using the depth captured by the sensor, which gives me the xyz position in the depth sensor's coordinate frame. This coordinate then has to be transformed into the global coordinate system. To do so, I take the camera pose from Unity with cam_pose = Camera.main.transform, and I also have the saved extrinsic matrix of the depth sensor.
From these two matrices I build depth_to_world = cam_pose @ inv(extrinsic). Then, for every xyz from the depth image, I compute global_xyz = depth_to_world @ xyz to get the point in the real world. The problem is that the result is off by 10-15 cm. What am I doing wrong? (The code is in Python.)
x = self.us[Depth_i, Depth_j]  # projection from pixels to image plane
y = self.vs[Depth_i, Depth_j]  # projection from pixels to image plane
D = distance_img[Depth_i, Depth_j]  # distance_img is the depth image
distance = 1000 * float(D) / np.sqrt(x * x + y * y + 1)  # distance according to the spherical image plane; D is in millimeters
depth_to_world = cam_pose @ np.linalg.inv(Constants.camera_extrinsic)
X = np.array([x * distance, y * distance, 1.0 * distance, 1]).reshape(4, 1)
point = (depth_to_world @ X)[0:3, 0]

I got it! Following https://github.com/petergu684/HoloLens2-ResearchMode-Unity, I first pass the Unity world origin to a WinRT plugin, and the correct transform is depth_to_world = inv(extrinsic) * cam_pose, where cam_pose is the pose returned by TryLocateAtTimeStamp. The other point is that Unity's coordinate system is left-handed (surprisingly!), so the z coordinate has to be negated (z <- -z).
My original depth_to_world transformation was close, but not correct.
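For reference, a minimal numpy sketch of the corrected mapping, assuming cam_pose is the 4x4 pose returned by TryLocateAtTimeStamp and camera_extrinsic is the depth sensor's extrinsic matrix (names follow the snippets above; the helper name itself is made up, and the handedness flip at the end is the Unity-specific step):
import numpy as np

def depth_to_unity_world(xyz_depth, cam_pose, camera_extrinsic):
    """Map a point from the depth-sensor frame into Unity world space (sketch)."""
    depth_to_world = np.linalg.inv(camera_extrinsic) @ cam_pose  # note the order
    p = depth_to_world @ np.append(xyz_depth, 1.0)
    p = p[:3] / p[3]
    p[2] = -p[2]  # right-handed -> Unity's left-handed convention (z <- -z)
    return p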

Related

Is there a way to convert earth location geocentric to tangent coordinates at some particular location?

I'm using astropy for some simulations. I have a set of EarthLocations that I want to convert to an XYZ coordinate system pointing towards some sky coordinate. It's a very simple coordinate rotation.
I have some code that is close to working (see below). xyz holds the geocentric coordinates derived from EarthLocation.geocentric, ha is ra minus the local sidereal time, and dec is the declination of the source. ra and dec come from a SkyCoord. I'd prefer to do this with EarthLocations and SkyCoords. The output coordinates should be tangent to the Earth and have one axis pointing towards the SkyCoord.
import numpy

def xyz_to_uvw(xyz, ha, dec):
    """
    Rotate :math:`(x,y,z)` positions in earth coordinates to
    :math:`(u,v,w)` coordinates relative to the astronomical source
    position :math:`(ha, dec)`. Can be used both for antenna positions
    and for baselines.

    Hour angle and declination can be given as single values or as arrays
    of the same length. Angles can be given as radians or as astropy
    quantities with a valid conversion.

    :param xyz: :math:`(x,y,z)` coordinates of antennas in the array
    :param ha: hour angle of the phase tracking centre (:math:`ha = ra - lst`)
    :param dec: declination of the phase tracking centre
    """
    x, y, z = numpy.hsplit(xyz, 3)
    # Two rotations:
    #  1. by 'ha' about the z axis
    #  2. by '90 - dec' about the u axis
    u = x * numpy.cos(ha) - y * numpy.sin(ha)
    v0 = x * numpy.sin(ha) + y * numpy.cos(ha)
    w = z * numpy.sin(dec) - v0 * numpy.cos(dec)
    v = z * numpy.cos(dec) + v0 * numpy.sin(dec)
    return numpy.hstack([u, v, w])
I think that the output should be a set of EarthLocations referred to a particular frame.
I've studied the documentation but cannot see how to do this.
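For reference, here is a rough sketch of how the inputs to xyz_to_uvw above could be assembled from astropy objects. The site, source and time below are made-up example values, and this stays with plain arrays rather than a proper astropy frame:
import numpy
from astropy import units as u
from astropy.coordinates import EarthLocation, SkyCoord
from astropy.time import Time

# Example (made-up) site, source and observation time
site = EarthLocation.from_geodetic(lon=-107.6 * u.deg, lat=34.1 * u.deg, height=2124 * u.m)
source = SkyCoord(ra=80.0 * u.deg, dec=45.0 * u.deg)
t = Time("2023-01-01T00:00:00")

# (N, 3) array of geocentric positions in metres (here N = 1)
xyz = numpy.array([[c.to_value(u.m) for c in site.geocentric]])

lst = t.sidereal_time("apparent", longitude=site.lon)
ha = (lst - source.ra).to_value(u.rad)
dec = source.dec.to_value(u.rad)

uvw = xyz_to_uvw(xyz, ha, dec)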

LiDAR to camera image fusion

I want to fuse LiDAR {X,Y,Z,1} points onto the camera image {u,v}. I have the LiDAR points, the camera matrix (K), the distortion coefficients (D), the positions of the camera and LiDAR (x,y,z), and the rotations of the camera and LiDAR (w+xi+yj+zk). There are three coordinate systems involved: the vehicle axle coordinate system (X: forward, Y: left, Z: up), the LiDAR coordinate system (X: right, Y: forward, Z: up) and the camera coordinate system (X: right, Y: down, Z: forward). I tried the approach below, but the points are not fused properly; all points are plotted in the wrong place.
Coordinate system:
Given the rotation and position of the camera and of the LiDAR, I compute the translations with:
t_lidar = R_lidar * Position_lidar^T
t_camera = R_camera * Position_camera^T
The relative rotation and translation are then computed as follows:
R_relative = R_camera^T * R_lidar
t_relative = t_lidar - t_camera
The final transformation matrix and the mapping between LiDAR points [X,Y,Z,1] and the image frame [u,v,1] are then given by:
T = [ R_relative | t_relative ]
[u,v,1]^T = K * T * [X,Y,Z,1]^T
Is there anything I am missing?
Use OpenCV's projectPoints directly:
https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#projectpoints
C++: void projectPoints(InputArray objectPoints, InputArray rvec, InputArray tvec, InputArray cameraMatrix, InputArray distCoeffs, OutputArray imagePoints, OutputArray jacobian=noArray(), double aspectRatio=0 )
objectPoints – Array of object points, 3xN/Nx3 1-channel or 1xN/Nx1 3-channel (or vector ), where N is the number of points in the view.
rvec – Rotation vector. See Rodrigues() for details.
tvec – Translation vector.
cameraMatrix – Camera matrix
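A hedged Python sketch of that suggestion, letting cv2.projectPoints handle the intrinsics and the distortion. The calibration values, the LiDAR-to-camera rotation/translation and the points below are placeholders, so substitute your own:
import cv2
import numpy as np

# Placeholder calibration (substitute your own values)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
D = np.zeros(5)                              # distortion coefficients

R_relative = np.eye(3)                       # LiDAR -> camera rotation (placeholder)
t_relative = np.array([0.1, -0.2, 0.0])      # LiDAR -> camera translation (placeholder)

# Points expressed in the LiDAR frame (placeholders)
lidar_xyz = np.array([[1.0, 0.5, 10.0],
                      [-2.0, 0.0, 15.0]])

rvec, _ = cv2.Rodrigues(R_relative)          # rotation matrix -> rotation vector
img_pts, _ = cv2.projectPoints(lidar_xyz, rvec, t_relative.reshape(3, 1), K, D)
uv = img_pts.reshape(-1, 2)                  # pixel coordinates (u, v)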

Capture the floor texture with ARCore

I'm trying to capture the "floor texture" based on an ARCore-detected plane and the environment (camera) texture, and then reapply this floor texture to a plane mesh, creating a digital floor based on reality.
I've uploaded an image to illustrate this:
This is not an ARCore-specific question; I think it can be solved with math and graphics programming, maybe something like unprojecting the plane based on the camera matrix, but I do not know exactly how to do that.
Can someone help me?
Thanks!!
Essentially, we have three important coordinate systems in this problem:
We have the usual 3D world coordinate system, which can be defined arbitrarily. We have a 2D coordinate system for the plane. The origin of that coordinate system is at the plane's center (note that I use the term plane synonymously with rectangle for this purpose) and the coordinates range from -1 to +1. And finally, we have a 2D coordinate system for the image. Actually, we have two for the image: an unsigned coordinate system (as shown in the figure) with the origin in the bottom left and coordinates ranging from 0 to 1, and a signed one with coordinates ranging from -1 to 1.
We know the four corners of our plane in world space and the 3x4 view/projection matrix P that allows us to project any point in world space to image space using homogeneous coordinates:
p_image,signed = P * p_world
If your projection matrix is 4x4, simply drop the third row (the last but one), as we are not interested in image-space depth.
We do not really care about world space as this is somewhat arbitrary. Given a 2D point in plane space, we can transform it to world space using:
p_world = 1/4 (p0 + p1 + p2 + p3) + u * 1/2 * (p1 - p0) + v * 1/2 * (p3 - p0)
The first part is the plane's origin and the point differences in the second and third terms are the coordinate axes. We can represent this in matrix form as
/ x \   / 1/2 (p1_x - p0_x)   1/2 (p3_x - p0_x)   1/4 (p0_x + p1_x + p2_x + p3_x) \   / u \
| y | = | 1/2 (p1_y - p0_y)   1/2 (p3_y - p0_y)   1/4 (p0_y + p1_y + p2_y + p3_y) | * | v |
| z |   | 1/2 (p1_z - p0_z)   1/2 (p3_z - p0_z)   1/4 (p0_z + p1_z + p2_z + p3_z) |   \ 1 /
\ 1 /   \          0                     0                          1             /
Let us call this matrix M.
Now, we can go directly from plane space to image space with:
p_image,signed = P * M * p_plane
The matrix P * M is now a 3x3 matrix. It is the homography between your ground plane and the image plane.
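As a concrete illustration, here is a small numpy sketch of building M from the four corner points and composing it with a 3x4 projection matrix P. The corner ordering follows the formula above; the helper name is made up and this is not ARCore-specific code:
import numpy as np

def plane_to_image_homography(P, p0, p1, p2, p3):
    """Build the 3x3 homography P * M from plane space to signed image space (sketch)."""
    center = 0.25 * (p0 + p1 + p2 + p3)                # plane origin
    u_axis = 0.5 * (p1 - p0)                           # plane u axis
    v_axis = 0.5 * (p3 - p0)                           # plane v axis
    M = np.column_stack([np.append(u_axis, 0.0),
                         np.append(v_axis, 0.0),
                         np.append(center, 1.0)])      # 4x3
    return P @ M                                       # 3x3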
So, what can we do with it? We can use it to draw the image in plane space. So, here is what we are going to do:
We will generate a render target that we will fill with a single draw call. This render target will then contain the texture of our plane. To draw this, we:
1. Upload the camera image to the GPU as a texture
2. Bind the render target
3. Draw a full-screen quad with corners (-1, -1), (1, -1), (1, 1), (-1, 1)
4. In the vertex shader, calculate texture coordinates in image space from plane space
5. In the pixel shader, sample the camera image at the interpolated texture coordinates
The interesting part is number 4. We almost know what we need to do. We already know how to go to signed image space. Now, we just need to go to unsigned image space. And this is a simple shift and scale:
                   / 1/2   0    1/2 \
p_image,unsigned = |  0   1/2   1/2 | * p_image,signed
                   \  0    0     1  /
If we call this matrix S, we can then calculate S * P * M to get a single 3x3 matrix T. This matrix can be used in the vertex shader to calculate the texture coordinates from the plane-space points that you pass in:
texCoords = p_image,unsigned = T * p_plane
It is important that you pass the entire 3D vector to the fragment shader and do the perspective divide only in the pixel shader to produce a correct perspective.
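Continuing the numpy sketch from above, the full matrix chain and the mapping of a plane-space point to texture coordinates would look roughly like this (on the GPU, the multiply happens per vertex and the divide per fragment):
import numpy as np

S = np.array([[0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])   # signed -> unsigned image space

def plane_to_texcoords(T, u, v):
    """Map a plane-space point (u, v) through T = S * P * M to texture coordinates (sketch)."""
    p = T @ np.array([u, v, 1.0])
    return p[:2] / p[2]           # perspective divide (done in the pixel shader on the GPU)

# T = S @ plane_to_image_homography(P, p0, p1, p2, p3)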

Triangulate set of points on arbitrary plane in 3D space

I have a set of points in 3D space. With a maximum error of 10^-5 I can fit a plane through them (the error being the distance from a point to the plane).
Is there a way to triangulate these points on this arbitrary plane? I have tried Bowyer-Watson, but it only works when the error is 0; anything else and it either won't triangulate or I won't get a good triangulation (overlapping triangles).
Edit
I think I found the problem. At certain angles the Bowyer-Watson algorithm won't work because my calculation of the circumcenter is off. How can I calculate the circumcenter of a triangle in 3D?
Since I know the points on the plane, I can calculate a vector that lies on the plane. Next I calculate the center of mass of the points.
Using this vector and the center of mass I can create a large enclosing triangle on the plane:
Vertex p1 = new Vertex(dir * 3000 + center);
Vertex p2 = new Vertex(Quaternion.AngleAxis(120, plane.normal) * dir * 3000 + center);
Vertex p3 = new Vertex(Quaternion.AngleAxis(240, plane.normal) * dir * 3000 + center);
Now that I have the enclosing triangle I can just use Bowyer-Watson.
For the circumcenter in 3D I use:
Vector3 ac = p3 - p1;
Vector3 ab = p2 - p1;
Vector3 abXac = Vector3.Cross(ab, ac);
circumcenter = p1 + (Vector3.Cross(abXac, ab) * ac.sqrMagnitude + Vector3.Cross(ac, abXac) * ab.sqrMagnitude) / (2 * abXac.sqrMagnitude);
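For reference, a small numpy transcription of the same circumcenter formula (a sketch, assuming p1, p2 and p3 are 3-element arrays):
import numpy as np

def circumcenter_3d(p1, p2, p3):
    """Circumcenter of the triangle (p1, p2, p3) in 3D."""
    ac = p3 - p1
    ab = p2 - p1
    ab_x_ac = np.cross(ab, ac)
    to_center = (np.cross(ab_x_ac, ab) * ac.dot(ac)
                 + np.cross(ac, ab_x_ac) * ab.dot(ab)) / (2.0 * ab_x_ac.dot(ab_x_ac))
    return p1 + to_center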
And now I have a triangulated set of points on an arbitrary plane in 3D.

Estimating distance to a point using camera calibration

I want to estimate the distance (from the camera to a point on the ground, i.e. Yw = 0) from a given pixel coordinate of that point. For that I used camera calibration methods, but the results are not meaningful.
I have the following details for the calibration:
- focal length in x and y, principal point in x and y, effective pixel size in meters, yaw and pitch angles, camera height, etc.
- I have entered the focal length, principal point and translation vector in pixels for the calculation.
- I have multiplied the image point by the camera matrix and then by the rotation|translation matrix (R|t) to get the world point.
Is my procedure correct? What can be wrong?
Result:
image_point (x, y) = 400, 380
world_point z coordinate (distance) = 12.53
image_point (x, y) = 400, 180
world_point z coordinate (distance) = 5.93
Problem:
I am getting very small values for the z coordinate (in pixels), which means the z coordinate is << 1 m (because the effective pixel size in meters is 10^-5).
This is my MATLAB code:
%positive downward pitch
xR = 0.033;
yR = 0;
zR = pi;
%effective pixel size in meters = 10 ^-5 ; focal_length x & y = 0.012 m
% principal point x & y = 320 and 240
intrinsic_params =[1200,0,320;0,1200,240;0,0,1];
Rx=[1,0,0 ; 0,cos(xR),sin(xR); 0,-sin(xR),cos(xR)];
Ry=[cos(yR),0,-sin(yR) ; 0,1,0 ; sin(yR),0,cos(yR)];
Rz=[cos(zR),sin(zR),0 ; -sin(zR),cos(zR),0 ; 0,0,1];
R= Rx * Ry * Rz ;
% The camera is 1.17m above the ground
t=[0;117000;0];
extrinsic_params = horzcat(R,t);
% extrinsic_params is 3 *4 matrix
P = intrinsic_params * extrinsic_params; % P 3*4 matrix
% make it square ....
P_sq = [P; 0,0,0,1];
%image size is 640 x 480
%An arbitrary pixel 400,380 is entered as input
image_point = [400,380,0,1];
% world point will be in the form X Y Z 1
world_point = P_sq * image_point'
Your procedure is kind of right, however it is going in the wrong direction.
See this link. Using your intrinsic and extrinsic calibration matrix you can find the pixel-space position of a real-world vector, NOT the other way around. The exception to this is if your camera is stationary in the global frame and you have the Z position of the feature in the global space.
Stationary camera, known feature Z case: (see also this link)
%% First we simulate a camera feature measurement
K = [0.5 0 320;
0 0.5 240;
0 0 1]; % Example intrinsics
R = rotx(0)*roty(0)*rotz(pi/4); % orientation of camera in global frame
c = [1; 1; 1]; %Pos camera in global frame
rwPt = [ 10; 10; 5]; %position of a feature in global frame
imPtH = K*R*(rwPt - c); %Homogeneous image point
imPt = imPtH(1:2)/imPtH(3) %Actual image point
%% Now we use the simulated image point imPt and the knowledge of the
% features Z coordinate to determine the features X and Y coordinates
%% First determine the scaling term lambda
imPtH2 = [imPt; 1];
z = R.' * inv(K) * imPtH2;
lambda = (rwPt(3)-c(3))/z(3);
%% Now the RW position of the feature is:
rwPt2 = c + lambda*R.' * inv(K) * imPtH2 % Reconstructed RW point
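For comparison, a numpy transcription of the same simulation and known-Z reconstruction (angles in radians; the values mirror the MATLAB snippet above and are only examples):
import numpy as np

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

K = np.array([[0.5, 0.0, 320.0],
              [0.0, 0.5, 240.0],
              [0.0, 0.0, 1.0]])            # example intrinsics
R = rot_z(np.pi / 4)                       # orientation of camera in the global frame
c = np.array([1.0, 1.0, 1.0])              # camera position in the global frame
rw_pt = np.array([10.0, 10.0, 5.0])        # feature position in the global frame

# Simulate a camera feature measurement
im_pt_h = K @ R @ (rw_pt - c)              # homogeneous image point
im_pt = im_pt_h[:2] / im_pt_h[2]           # actual image point

# Reconstruct the feature using its known Z coordinate
ray = R.T @ np.linalg.inv(K) @ np.append(im_pt, 1.0)
lam = (rw_pt[2] - c[2]) / ray[2]           # scaling term lambda
rw_pt2 = c + lam * ray                     # reconstructed real-world point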
Non-stationary camera case:
To find the real-world position or distance from the camera to a particular feature (given on the image plane) you have to employ some method of reconstructing the 3D data from the 2D image.
The two that come to mind immediately are OpenCV's solvePnP and stereo-vision depth estimation.
solvePnP requires 4 co-planar (in RW space) features to be available in the image, and the positions of the features in RW space known. This may not sound useful as you need to know the RW position of the features, but you can simply define the 4 features with a known offset rather than a position in the global frame - the result will be the relative position of the camera in the frame the features are defined in. solvePnP gives very accurate pose estimation of the camera. See my example.
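As a hedged sketch of that approach: the object points below are four coplanar features with a made-up, known offset pattern, and the camera matrix and detected pixel positions are placeholders.
import cv2
import numpy as np

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])            # placeholder camera matrix
dist = np.zeros(5)                         # placeholder distortion coefficients

# Four coplanar features with a known offset pattern (a 0.5 m square at Z = 0)
obj_pts = np.array([[0.0, 0.0, 0.0],
                    [0.5, 0.0, 0.0],
                    [0.5, 0.5, 0.0],
                    [0.0, 0.5, 0.0]])

# Corresponding pixel detections (placeholders)
img_pts = np.array([[600.0, 400.0],
                    [700.0, 398.0],
                    [702.0, 300.0],
                    [598.0, 302.0]])

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
# rvec/tvec give the pose of the feature frame relative to the camera;
# the camera position expressed in the feature frame is then:
R, _ = cv2.Rodrigues(rvec)
cam_pos = -R.T @ tvec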
Stereo-vision depth estimation requires the same feature to be found in two spatially-separated images, and the transformation between the images in RW space must be known very precisely.
There may be other methods but these are the two I am familiar with.