Google VR Lens Correction Uses Depth? - unity3d

I have been looking at the lens correction shader code from the Google GVR SDK for Unity and have been scratching my head over the use of the z component of the view-space position (UNITY_MATRIX_MV, without the perspective transform of UNITY_MATRIX_MVP) in the undistort() functions (this is one of the simpler variants):
float r2 = clamp(dot(pos.xy, pos.xy) / (pos.z*pos.z), 0, _MaxRadSq);
pos.xy *= 1 + (_Undistortion.x + _Undistortion.y*r2)*r2;
Given my understanding that we want to warp the rendered image in 2D screen space to counteract the distortion that will be applied by the lens the screen is viewed through, what on earth are we doing dividing our radius(?) by the linear depth (pos.z) squared? I can conceive that this is in lieu of dividing by w for perspective, but then why would we want to divide by the square of the z component (how would that ever be more correct than simply dividing by z or w)?

Felt a bit silly in hindsight, as this is just the result of a regular optimisation.
The division is ordinary perspective division (but leaving the z coordinate used for the depth buffer/culling as linear, and presumably w should thus be 1.0 to ensure proper depth interpolation). Reorganising the computation was presumably found to save shader cycles and/or improve accuracy.
This code is equivalent to foreshortening pos.xy by dividing it by pos.z first, then taking the dot product of pos.xy with itself to get its squared length in 2D screen space (and then clamping it, etc.).
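To make the equivalence concrete, here is a small C# check (plain System.Numerics, not the actual shader; pos and maxRadSq stand in for the shader's view-space position and _MaxRadSq) that computes r2 both ways:
using System;
using System.Numerics;

public static class UndistortCheck
{
    public static void Compare(Vector3 pos, float maxRadSq)
    {
        Vector2 xy = new Vector2(pos.X, pos.Y);

        // Optimised form from the shader: divide the squared 2D length by z^2.
        float r2Optimised = Math.Clamp(Vector2.Dot(xy, xy) / (pos.Z * pos.Z), 0f, maxRadSq);

        // Naive form: perspective-divide xy first, then take its squared length.
        Vector2 foreshortened = xy / pos.Z;
        float r2Naive = Math.Clamp(Vector2.Dot(foreshortened, foreshortened), 0f, maxRadSq);

        Console.WriteLine($"optimised: {r2Optimised}, naive: {r2Naive}"); // equal up to rounding
    }
}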


I need help understanding the GGX normal distribution function

After reading the book "Real-Time Rendering", 4th Edition, I've decided to give PBR a try, and I chose GGX for the normal distribution function. The equation shown in the book is:
D(h) = X(nDotH)+ * alpha_g^2 / (pi * ((nDotH)^2 * (alpha_g^2 - 1) + 1)^2)
Now, h is the half vector created from the light and view directions L and V respectively.
X(nDotH)+ is 1 if nDotH is greater than 0, else 0.
alpha-g is the GGX roughness value between 0 and 1.
My question now is the following: as far as I understood the concept of the NDF and roughness, a high alpha value would mean that the (micro)surface is very rough, and a low value means it is smooth. So if I want to render a smooth metallic surface such as the body of a car, I would set my alpha to a low value such as 0.1. By doing so, the result of my D(h) is so low that the object can't even be seen.
Am I missing something or did I not fully understand the value of alpha?
I implemented the NDF in MATLAB to analyse my results. I tried it with the coordinates of a cube placed at the origin without transformations.
Given two coordinates (world space):
N = [0.0 1.0 0.0; 0.0 0.0 1.0]
P = [-1.0 1.0 1.0; -1.0 1.0 1.0]
L-Direction = [0.0 1.0 1.0]
C-Position = [0.0 3.0 4.0]
alpha = 0.1
Results:
D(h) for N1 = 8.6212e-03
D(h) for N2 = 1.7998e-02
As you can see, the values are so low that they aren't visible, especially for the first coordinate, whose normal vector points straight up.
The root problem is that simple point lights often don't suffice for full PBR rendering. Consider the following two renderings of a smooth metallic sphere:
This is the top-left sphere from a glTF sample model rendered in Babylon Sandbox.
On the left side, the sphere is placed in a dark environment against a gray background, and a single point light illuminates the scene. The light is quite bright, but because the sphere is so smooth, and because the "point" nature of the light gives it essentially no radius, the reflection of this light is barely a few pixels, regardless of how bright it may be. The remainder of the sphere has the low D(h) values you mentioned, and is almost black.
On the right side, the same sphere again in the same rendering engine, but this time the engine is using its default environment, which comes from an HDR image. In the case of smooth metal, the resulting render is mostly a mirror reflection of the environment, but rougher and non-metallic surfaces can also have their appearance greatly influenced by colors and intensities in the surrounding environment. With a good quality environment, there's often no need to add point lights at all, and indeed there are no point lights in the right image.
In general, PBR, and particularly metallic PBR, looks best with a full HDRI environment, not just point lights. For some sample code and shaders showing some of this math in action, the Khronos glTF Sample Viewer might be a good place to start. [Disclaimer, I'm a contributor.]
As far as I understood the concept of NDF and roughness, a high alpha value would mean that the (micro)surface is very rough, and a low value smooth. So, if I want to render a smooth metallic surface such as the body of a car, I would set my alpha to a low value such as 0.1. By doing so, the result of my D(h) is so low that the object can't even be seen. Am I missing something or did I not fully understand the value of alpha?
It's true that the numerator of the equation goes to zero.
But so does the denominator, and it does so more rapidly.
Take n = h as an example, so dot(n, h) will be one. If alpha is 0.1:
0.1^2 / (3.141593 * (1 + (0.1^2 - 1))^2)
If you plug that into your calculator you will get ~31.83.
So, as you can see, the whole equation doesn't go to zero.
Actually, if you calculate the limit of the equation as alpha goes to zero, the equation goes to infinity. Which makes sense, because when roughness is zero, all the normals are concentrated in a single direction.
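To see the numbers, here is a hedged C# sketch (plain C#; the names are mine) that evaluates the GGX NDF at nDotH = 1 for a few alpha values; at nDotH = 1 the expression reduces to 1/(pi*alpha^2), which grows as alpha shrinks:
using System;

public static class GgxDemo
{
    // D(h) = X+(nDotH) * alpha^2 / (pi * ((nDotH)^2 * (alpha^2 - 1) + 1)^2)
    static double GgxNdf(double nDotH, double alpha)
    {
        if (nDotH <= 0.0) return 0.0;               // the X+ characteristic function
        double a2 = alpha * alpha;
        double d = nDotH * nDotH * (a2 - 1.0) + 1.0;
        return a2 / (Math.PI * d * d);
    }

    public static void Main()
    {
        foreach (double alpha in new[] { 0.5, 0.1, 0.01 })
        {
            // ~1.27 for alpha = 0.5, ~31.83 for alpha = 0.1, ~3183 for alpha = 0.01
            Console.WriteLine($"alpha = {alpha}: D(h) = {GgxNdf(1.0, alpha)}");
        }
    }
}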

ARKit project point with previous device position

I'm combining ARKit with a CNN to constantly update ARKit nodes when they drift. So:
1. Get an estimate of the node position with ARKit and place a virtual object in the world
2. Use the CNN to get an estimated 2D location of the object
3. Update the node position accordingly (to refine its location in 3D space)
The problem is that #2 takes about 0.3 s. Therefore I can't use sceneView.unprojectPoint, because the point will correspond to a 3D point from the device's world position at the time of #1.
How do I calculate the 3D vector from my old location to the CNN's 2D point?
unprojectPoint is just a matrix-math convenience function similar to those found in many graphics-oriented libraries (like DirectX, old-style OpenGL, Three.js, etc). In SceneKit, it's provided as a method on the view, which means it operates using the model/view/projection matrices and viewport the view currently uses for rendering. However, if you know how that function works, you can implement it yourself.
An Unproject function typically does two things:
Convert viewport coordinates (pixels) to the clip-space coordinate system (-1.0 to 1.0 in all directions).
Reverse the projection transform (assuming some arbitrary Z value in clip space) and the view (camera) transform to get to 3D world-space coordinates.
Given that knowledge, we can build our own function. (Warning: untested.)
import CoreGraphics
import simd

func unproject(screenPoint: float3, // see below for Z depth hint discussion
               modelView: float4x4,
               projection: float4x4,
               viewport: CGRect) -> float3 {
    // viewport to clip: subtract the viewport origin, divide by its size,
    // then scale/offset from the 0...1 to the -1...1 coordinate range
    let clip = (screenPoint - float3(Float(viewport.minX), Float(viewport.minY), 0.0))
        / float3(Float(viewport.width), Float(viewport.height), 1.0)
        * 2 - 1
    // apply the reverse of the model-view-projection transform
    let inversePM = (projection * modelView).inverse
    let result = inversePM * float4(clip.x, clip.y, clip.z, 1.0)
    return float3(result.x, result.y, result.z) / result.w // perspective divide
}
Now, to use it... The modelView matrix you pass to this function is the inverse of ARCamera.transform, and you can also get projectionMatrix directly from ARCamera. So, if you're grabbing a 2D position at one point in time, grab the camera matrices then, too, so that you can work backward to 3D as of that time.
There's still the issue of that "Z depth hint" I mentioned: when the renderer projects 3D to 2D it loses information (one of those D's, actually). So you have to recover or guess that information when you convert back to 3D — the screenPoint you pass in to the above function is the x and y pixel coordinates, plus a depth value between 0 and 1. Zero is closer to the camera, 1 is farther away. How you make use of that sort of depends on how the rest of your algorithm is designed. (At the very least, you can unproject both Z=0 and Z=1, and you'll get the endpoints of a line segment in 3D, with your original point somewhere along that line.)
Of course, whether this can actually be put together with your novel CNN-based approach is another question entirely. But at least you learned some useful 3D graphics math!

Camera: "Linear" zoom

I want to move a camera closer to an object in Unity, so that the zooming is linear. Take a look at the zooming in the Scene window in the Unity Editor. The camera isn't moving with a constant speed. The more zoomed in you are, the slower the camera moves. And I do want to move the camera, I don't want to change the FOV.
First off: no, do not use the FOV for this; it would be a massive pain to control the movement. A low FOV means you are focused on a tiny part of the scene, and the movement would be hard to control, since one unit corresponds to a large apparent movement at a small FOV. A large FOV, on the other hand, creates a fisheye effect. So nope.
One way you could achieve this fast-to-slow effect is to use the distance to the focus point to define the amount of movement.
Let's say you use the mouse wheel to zoom in and out: you listen to its value and create a movement based on that.
float movement = Input.GetAxis("Mouse ScrollWheel");
if (movement == 0f)
{
    return;
}
Transform camTr = Camera.main.transform;
// +1 so the camera can still move when it is very close to the origin
float distance = Vector3.Distance(camTr.position, origin.position) + 1f;
// camTr.forward is a world-space direction, so translate in world space
camTr.Translate(camTr.forward * distance * movement * Time.deltaTime * speed, Space.World);
The speed variable is meant to control how fast the camera moves regardless of the distance. The distance gets a +1 so you can still move while very close to the origin.
The origin position needs to be defined; it could be based on a raycast going forward from the camera and hitting at y = 0, just as an idea. It could also be a focused object; that all depends on what you are doing.
The whole point here is that your camera will move faster when it is far from the focus point. Also, you may want to clamp the movement so that being far away does not result in gigantic movements, as in the sketch below.
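For context, a minimal MonoBehaviour sketch of the above, including the suggested clamp (origin, speed and maxStep are placeholder fields, not from the original snippet):
using UnityEngine;

public class ScrollZoom : MonoBehaviour
{
    public Transform origin;    // focus point to zoom towards (raycast hit, focused object, ...)
    public float speed = 5f;    // overall zoom speed, independent of distance
    public float maxStep = 2f;  // clamp so a faraway camera doesn't jump enormously per frame

    void Update()
    {
        float movement = Input.GetAxis("Mouse ScrollWheel");
        if (movement == 0f)
            return;

        Transform camTr = Camera.main.transform;
        // +1 keeps some movement available even when very close to the origin
        float distance = Vector3.Distance(camTr.position, origin.position) + 1f;
        float step = Mathf.Clamp(distance * movement * speed * Time.deltaTime, -maxStep, maxStep);
        camTr.Translate(camTr.forward * step, Space.World);
    }
}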
I propose you use FOV (because it feels good to the player). Otherwise:
1- Use Transform.localPosition to move the camera nearer or farther; change the z axis to change the camera's distance.
2- Use a camera render texture and change its scale in the GUI (not great, because it needs a Unity Pro license and uses a lot of memory).
But I can't know your reason for wanting to avoid changing the FOV.
One step of the perspective projection of a point is dividing by that point's distance. This means that the projected size of any object does not change linearly with its distance.
projectedSize = originalSize / distance
So if you want to approach an object in a way that its projected size grows linearly, you'll have to change your distance non-linearly.
We do this by also dividing the distance by some value f each step. The exact value of f depends on the step size and (I think) the FOV of your camera.
By using FixedUpdate() instead of Update() we make sure we have a constant step size, which allows us to assume f is constant as well.
void FixedUpdate()
{
    // the camera-to-object distance shrinks by a constant factor every fixed step
    newDistance = oldDistance / f;
}
For the definition of f you can use any value bigger than, but close to 1. For example f = 1.01f. This will approximate the desired behaviour quite well. If you want it to be perfectly linear, you'll have to dig deeper into the calculations behind perspective projection and calculate the exact value of f yourself: https://en.wikipedia.org/wiki/3D_projection#Perspective_projection
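A hedged Unity sketch of this divide-by-f idea, assuming a target Transform to zoom towards and a minimum distance to stop at (all field names here are my own):
using UnityEngine;

public class GeometricZoom : MonoBehaviour
{
    public Transform target;         // the object we are zooming towards
    public float f = 1.01f;          // per-step shrink factor, greater than but close to 1
    public float minDistance = 0.5f;

    void FixedUpdate()
    {
        Vector3 toCamera = transform.position - target.position;
        float distance = toCamera.magnitude;

        // Shrink the camera-to-target distance by a constant factor each fixed step,
        // so the projected size (originalSize / distance) grows by that same factor.
        float newDistance = Mathf.Max(distance / f, minDistance);
        transform.position = target.position + toCamera.normalized * newDistance;
    }
}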

What repulsion direction do we use for Smoothed Particle Hydrodynamics when radius is 0?

When doing SPH, the paper by Kelagar recommends using a particular kernel for pressure-induced forces between particles. The kernel it recommends is the following when the distance is within the kernel radius:
(15/(pi*h^9)) * (h - r)^3
where h is the kernel radius and r is the distance from the kernel center at which we are evaluating the kernel.
The paper then states that the gradient of this function is
(-45/(pi*h^9))*((r_vec)/r)*(h-r)^2
where r_vec is now the vector from the center of the kernel to the point we are interested in. As the length of r_vec goes to 0 from above, the paper states that this gradient approaches:
(-45/(pi*h^6))
But this is a scalar, not a vector. In order for there to be a repulsion between the two points we're interested in, there needs to be a direction to repel in.
What direction should we use for when two particles are right next to each other?
I assume that the first expression is meant to be a potential. The negative gradient (the derivative with respect to r) is then the force. This gradient is a vector, always pointing toward or away from the center, which appears correct for the second expression.
r_vec is, according to what you say, a vector pointing away from the origin to a point at some distance r away. (r_vec/r) is then a unit vector to specify direction. This works at every point except the origin itself, where it can be declared undefined, or declared to be zero. Zero is the average value of (r_vec/r) over all "nearby" points. This means zero force.
Normally in particle simulations with pair-wise forces, we ignore the force of a particle on itself, and of two particles at the exact same position. What about two particles that are very close, when you have a force law that goes like 1/r, 1/(r^2), or similar? Nobody wants a divide-by-zero fault. Usually there's a small radius below which the potential is held constant, matching the given potential formula at the boundary of that radius. Particles too close together have zero force, just so the simulation won't crash. It may seem unphysical for the force to suddenly cease just inside that boundary when it is fiercely strong just outside it, but we strive to avoid such situations. Keep count of such incidences, and if there are too many, the simulation has gone bad; maybe a smaller time step is needed.
Luckily you don't have a 1/r type of force, but still you have that nasty r_vec/r whose direction can swing wildly. The same technique of making force zero below a certain tiny radius will help.
But that third expression bothers me. If it's the force at r = 0, then starting from the force law in the second expression, I'm not sure how the third expression comes about. The problem that it looks scalar, while a force should be a vector, could be resolved by understanding it as the radial component of a force vector: just multiply the expression by (r_vec/r), the familiar unit-magnitude vector. On the other hand, at r = 0 that has no defined direction, so it is nonsense.
Better overall solution: start with a new potential function, one that smoothly levels off and is flat right at r = 0, like exp(-r^2) or 1/(1+r^2). The given potential peaks sharply; you want something that flattens out at the origin instead. Then, instead of declaring the force zero inside some small zone, the force would just naturally be zero at r = 0. Find a flat-at-origin potential that approximates the given one well outside some small radius.
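For reference, a hedged C# sketch of evaluating the gradient exactly as quoted in the question, with the small-radius guard described above (the epsilon value and names are my own choices):
using System;
using System.Numerics;

public static class SpikyKernelGradient
{
    const float Epsilon = 1e-6f; // below this separation we simply return zero force

    // grad W = (-45 / (pi * h^9)) * (r_vec / r) * (h - r)^2, for 0 < r < h (as in the question)
    public static Vector3 Evaluate(Vector3 rVec, float h)
    {
        float r = rVec.Length();
        if (r >= h) return Vector3.Zero;        // outside the kernel support
        if (r < Epsilon) return Vector3.Zero;   // coincident particles: no well-defined direction
        float coefficient = -45f / (MathF.PI * MathF.Pow(h, 9)) * (h - r) * (h - r);
        return coefficient * (rVec / r);
    }
}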

3D trajectory reconstruction from video (taken by a single camera)

I am currently trying to reconstruct a 3D trajectory of a falling object like a ball or a rock out of a sequence of images taken from an iPhone video.
Where should I start looking? I know I have to calibrate the camera (I think I'll use the MATLAB calibration toolbox by Jean-Yves Bouguet) and then find the vanishing point from the same sequence, but then I'm really stuck.
read this: http://www.cs.auckland.ac.nz/courses/compsci773s1c/lectures/773-GG/lectA-773.htm
It explains 3D reconstruction using two cameras. For a simple summary, look at the figure from that site:
You only know pr/pl, the image points. By tracing a line from their respective focal points Or/Ol you get two lines (Pr/Pl) that both contain the point P. Because you know the two cameras' origins and orientations, you can construct 3D equations for these lines. Their intersection is thus the 3D point; voilà, it's that simple.
But when you discard one camera (let's say the left one), you only know the line Pr for sure. What's missing is depth. Luckily, you know the radius of your ball; this extra information can give you the missing depth information. See the next figure (don't mind my paint skills):
Now you know the depth, using the intercept theorem, as sketched below.
I see one last issue: the shape of the ball changes when it is projected at an angle (i.e. not perpendicular to your capture plane). However, you do know the angle, so compensation is possible, but I leave that up to you :p
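As a hedged sketch of that depth-from-size step, assuming a simple pinhole model with the focal length known in pixels from your calibration (all names are placeholders):
public static class DepthFromBallSize
{
    // Similar triangles (intercept theorem): imageRadius / focalLength ~ ballRadius / depth,
    // so depth ~ focalLength * ballRadius / imageRadius. This is an approximation for a
    // sphere near the optical axis.
    public static float EstimateDepth(float focalLengthPixels,
                                      float ballRadiusMeters,
                                      float imageRadiusPixels)
    {
        return focalLengthPixels * ballRadiusMeters / imageRadiusPixels;
    }
}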
edit: reply to @ripkars' comment (the comment box was too small)
1) ok
2) aha, the correspondence problem :D Typically solved by correlation analysis or matching features (mostly matching followed by tracking in a video). (other methods exist too)
I haven't used the image/vision toolbox myself, but there should definitely be some things to help you on the way.
3) = calibration of your cameras. Normally you should only do this once, when installing the cameras (and again whenever you change their relative pose)
4) Yes, just put the Longuet-Higgins equations to work, i.e. solve
P = C1 + mu1*R1*K1^(-1)*p1
P = C2 + mu2*R2*K2^(-1)*p2
with
P = 3D point to find
C = camera center (vector)
R = rotation matrix expressing the orientation of the camera in the world frame.
K = calibration matrix of the camera (containing internal parameters of the camera, not to be confused with the external parameters contained by R and C)
p1 and p2 = the image points
mu = parameter expressing the position of P on the projection line from the camera center C to P (if I'm correct, R*K^(-1)*p expresses a line direction/vector pointing from C towards P)
These are 6 equations in 5 unknowns: mu1, mu2 and the three components of P; a least-squares sketch of solving them follows below.
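As a hedged C# sketch of solving that small system (my own names; d1 and d2 stand for the directions R1*K1^(-1)*p1 and R2*K2^(-1)*p2), the least-squares solution is the midpoint of the two rays' closest approach:
using System;
using System.Numerics;

public static class TwoRayTriangulation
{
    // c1, c2: camera centers; d1, d2: ray directions R*K^-1*p (need not be unit length).
    public static Vector3 Solve(Vector3 c1, Vector3 d1, Vector3 c2, Vector3 d2)
    {
        Vector3 w = c2 - c1;
        float a = Vector3.Dot(d1, d1);
        float b = Vector3.Dot(d1, d2);
        float c = Vector3.Dot(d2, d2);
        float e = Vector3.Dot(d1, w);
        float f = Vector3.Dot(d2, w);

        float det = b * b - a * c;              // zero when the rays are parallel
        if (Math.Abs(det) < 1e-9f)
            throw new ArgumentException("Rays are (nearly) parallel.");

        float mu1 = (b * f - c * e) / det;      // parameter along the first ray
        float mu2 = (a * f - b * e) / det;      // parameter along the second ray

        Vector3 p1 = c1 + mu1 * d1;             // closest point on ray 1
        Vector3 p2 = c2 + mu2 * d2;             // closest point on ray 2
        return 0.5f * (p1 + p2);                // least-squares estimate of P
    }
}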
edit: reply to @ripkars' comment (comment box too small once again)
The only computer vision library that pops into my mind is OpenCV (http://opencv.willowgarage.com/wiki). But that's a C library, not MATLAB... I guess Google is your friend ;)
About the calibration: yes, if those two images contain enough information to match some features. If you change the relative pose of the cameras, you'll have to recalibrate of course.
The choice of the world frame is arbitrary; it only becomes important when you want to analyze the retrieved 3D data afterwards: for example, you could align one of the world planes with the plane of motion, which gives a simplified motion equation if you want to fit one.
This world frame is just a reference frame, changeable with a change-of-reference-frame transformation (a translation and/or rotation).
Unless you have a stereo camera, you will never be able to know the position for sure, even with a calibrated camera, because you don't know whether the ball is small and close or large and far away.
There are other methods with a single camera, based on a series of images with different focus. But I doubt that you can control the camera of your cell phone in that way.
Edit(1):
As @GuntherStruyf correctly points out, you can know the position if one of your inputs is the size of the ball.