I am learning the programmable rendering pipeline by implementing a tiny software renderer. I am trying to implement it in a 'hardware' style. However, I am not familiar with the GPU pipeline and have run into some problems with homogeneous clipping.
According to this thread, suppose we have two points e0, e1 in 3D eye coordinates, which are projected to h0(-70, -70, 118, 120) and h1(-32, -99, -13, -11) in 4D homogeneous clip space. We then interpolate in the 4D homogeneous space: the segment h0-h1 is clipped by the plane w = -x (x = -1 in NDC) at the 4D point h(t) = t*h1 + (1-t)*h0, with t = 0.99. Without loss of generality, suppose the h0-h(0.99) part (which is viewable) is fed to the rasterization stage. We therefore need to generate the corresponding vertex properties of h(0.99) (in the same format as the vertex shader output). My question is: how do I generate these new vertex properties?
Update: I tried using t as the interpolation variable to get the vertex properties of h(t), and got a reasonable result. I am wondering why the t from 4D space gives a good interpolation result for the 3D vertex properties.
I am wondering why the t from 4D space gives a good interpolation result for the 3D vertex properties?
Because that's how math works. Or more to the point, that's how linear math works.
Without getting too deep into the mathematics, a linear transformation is a transformation between two spaces that preserves the linear nature of the original space. For example, two lines that are parallel to one another will remain parallel after a linear transformation. If you perform a 2x scale in the Y direction, the new lines will be longer and farther from the origin. But they will still be parallel.
Let's say we have a line AB, and you define the point C which is the midpoint between A and B. If you perform the same linear transformation on A, B, and C, the new point C1 will still be on the line A1B1. Not only that, C1 will still be the midpoint of the new line.
We can even generalize this. C could be any point which fits the following equation: C = (B-A)t + A, for any t. A linear transformation of A, B, and C will not change t in this equation.
In fact, that is what a linear transformation really means: it's a transformation that preserves t in that equation, for all points A, B, and C in the original space.
The fact that you have 4 dimensions in your space is ultimately irrelevant to the vector equation above. Linear transformations in any space will preserve t. A matrix transform represents a linear transformation from one space to another (usually).
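As a quick numeric illustration (my own sketch, not part of the original answer; the 4x4 matrix M below is arbitrary), the point defined by C = (B-A)t + A maps to the point defined by the same t on the transformed segment:

% Minimal sketch: a matrix (i.e. linear) transform preserves the parameter t.
% M is an arbitrary invertible 4x4 matrix chosen purely for illustration.
M = [2 0 0 1;
     0 3 0 0;
     0 0 1 4;
     0 0 0 1];

A = [1; 2; 3; 1];              % 4D (homogeneous) endpoints
B = [4; -1; 0; 1];
t = 0.3;
C = (B - A)*t + A;             % point at parameter t on segment AB

A1 = M*A;  B1 = M*B;  C1 = M*C;
norm(C1 - ((B1 - A1)*t + A1))  % ~0: C1 sits at the same t on segment A1B1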
Also, your original 3D positions were really 4D positions, with the W assumed to be 1.0.
Do be aware however that the transformation from clip-space (4D homogeneous) to normalized-device-coordinate space (3D non-homogeneous) is non-linear. The division-by-W is not a linear transformation. That's one reason why you do clipping in 4D homogeneous clip-space, where we still preserve a linear relationship between the original positions and clip-space.
This is also why perspective-correct interpolation of per-vertex outputs is important: because the space you're doing your rasterization in (window space) is not a linear transformation of the original space output by the vertex shader (clip space). This means that t is not properly preserved. When interpolating, you usually need to compensate for that in order to maintain the linear relationships of your per-vertex values.
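For reference, here is a minimal MATLAB sketch of that compensation, commonly known as perspective-correct interpolation (my own illustration; the numbers are made up). Per-vertex values are divided by their clip-space w, interpolated linearly in window space together with 1/w, and then recovered by dividing by the interpolated 1/w:

% Minimal sketch of perspective-correct interpolation between two vertices.
% w0, w1 are the clip-space w values of the vertices; a0, a1 are some
% per-vertex output (e.g. a texture coordinate); s is the interpolation
% parameter measured in window space, so a plain lerp of a0 and a1 by s
% would be wrong.
w0 = 120;  w1 = 90;                    % made-up clip-space w values
a0 = 0.0;  a1 = 1.0;                   % made-up per-vertex attribute
s  = 0.5;                              % halfway across the segment on screen

inv_w    = (1 - s)/w0 + s/w1;          % interpolate 1/w linearly
a_over_w = (1 - s)*a0/w0 + s*a1/w1;    % interpolate a/w linearly
a = a_over_w / inv_w                   % perspective-correct value at s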
I have two sets of DICOM image data for one subject, consisting of a PET scan and a CT scan which were taken at the same time. The Frame of Reference UIDs are different, which I think means that their reference origins are different, so the 'Image Position Patient' tags can't be compared directly.
What I want to do is resample both images such that their spatial dimensions are equal and their pixel dimensions are equal. The task seems relatively straightforward, but for the fact that their origins are different.
Download link for data
For any two images A and B deemed to represent the same object, registration is the act of identifying for each pixel / landmark in A the equivalent pixel / landmark in B.
Assuming each pixel in both A and B can be embedded in a coordinate system, registration usually entails transforming A such that, after the transformation, the coordinates of each pixel in A coincide with those of the equivalent pixel in B (i.e. the objective is for the two objects to overlap in that coordinate space).
An isometric transformation is one that preserves distances: the distance between any two pixels in A is the same as the distance between the equivalent two pixels after the transformation has been applied. For instance, rotation in space, reflection (i.e. mirror image), and translation (i.e. shifting the object in a particular direction) are all isometric transformations. A registration algorithm applying only isometric transformations is said to be rigid.
An affine transformation is similar to an isometric one, except scaling may also be involved (i.e. the object can also grow or shrink in size).
In medical imaging, if A and B were obtained at different times, it is highly unlikely that the transformation is a simple affine or isometric one. For instance, say during scan A the patient had their arms down by their side, and in scan B the patient had their arms over their head. There is no rigid registration of A that would result in perfect overlap with B, since distances between equivalent points have changed (e.g. the head-to-hand and hand-to-foot distances in each case). Therefore, more elaborate non-rigid registration algorithms would need to be used.
The fact that in your case A and B were obtained during the same scanning session in the same machine means that it's a reasonable assumption that the transformation will be a simple affine one. I.e. you will probably only need to rotate and translate the object a bit; if the coordinate system of A is 'denser' than B, you might also need to grow / shrink it a bit. But that's it, no weird 'warping' will be necessary to compensate for 'movement' occurring between scans A and B being obtained, since they happened at the same time.
A 3D vector v, denoting a 'magnitude and direction' in 3D space, can be transformed into another 3D vector using a 3x3 transformation matrix T. For example, if you apply the transformation T to the vector v = [x; y; z] (using matrix multiplication), the resulting vector is u = T*v. In other words, the 'new' x-coordinate depends on the old x, y, and z coordinates in a manner specified by the transformation matrix, and similarly for the new y and new z coordinates.
If you apply a 3x3 transformation T to three vectors at the same time, you'll get three transformed vectors out. E.g. for v = [v1, v2, v3] where v1 = [1; 2; 3], v2 = [2; 3; 4], v3 = [3; 4; 5], then T*v will give you a 3x3 matrix u, where each column corresponds to a transformed vector of x, y, z coordinates.
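A quick MATLAB illustration of the above (the matrix T here is arbitrary and chosen just for the example):

T  = [1 0 2; 0 3 0; -1 0 1];      % arbitrary example transformation
v1 = [1; 2; 3];  v2 = [2; 3; 4];  v3 = [3; 4; 5];
v  = [v1, v2, v3];                % one column per input vector

u_single = T*v1;                  % transform a single vector
u_all    = T*v;                   % transform all three at once
% u_all(:,1) equals u_single; each column of u_all is a transformed vector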
Now, consider that the transformation matrix T is unknown and we want to discover it. Say we have a known point p = [x; y; z] and we know that after the transformation it becomes a known point p' = [x'; y'; z']. We have:

[x']   [a b c]   [x]
[y'] = [d e f] * [y]
[z']   [g h i]   [z]
Consider the top row, which says x' = a*x + b*y + c*z; even if you know p and p', it should be clear that you cannot determine a, b, and c from a single point. You have three unknowns and only one equation. Therefore, to solve for a, b, and c, you need at least a system of three equations. The same applies to the other two rows. Therefore, to find the transformation matrix T you need three known points (before and after transformation).
In MATLAB, you can solve such a system of equations where T*v = u by typing T = u/v. For a 3x3 transformation matrix T, u and v need to contain at least 3 vectors, but they can contain more (i.e. the system of equations is overdetermined). The more vectors you pass in, the more accurate the transformation matrix from a numerical point of view; but in theory you only need three.
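For instance, a minimal sketch of recovering a 3x3 T from three point correspondences (the 'true' matrix below is made up just so there is something to recover):

T_true = [0 -1 0; 1 0 0; 0 0 2];   % made-up transformation to recover
v = [1 0 2; 0 1 0; 0 3 1];         % three linearly independent known points (columns)
u = T_true*v;                      % the same points after transformation
T = u/v                            % recovers T_true (up to round-off)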
If your transformation also involves a translation element, then you need to do the trick described in the image you posted, i.e. you represent a 3D vector [x, y, z] as the homogeneous-coordinates vector [x, y, z, 1]. This enables you to add a 4th column to your transformation matrix, which contributes a 'translation' to each point, i.e. an extra term in the new x', y' and z' coordinates that is independent of the input vector. Since the translation coefficients are also unknown, you now have 12 instead of 9 unknowns, and therefore you need 4 points to solve this system, i.e.

[x']   [a b c tx]   [x]
[y'] = [d e f ty] * [y]
[z']   [g h i tz]   [z]
[1 ]   [0 0 0  1]   [1]
To summarise:
To transform your image A to occupy the same space as B, interpret the coordinates of A as if they were in the same coordinate system as B, find four equivalent landmarks in both, and obtain a suitable transformation matrix as above by solving the resulting system of equations using the / (right matrix division) operator. You can then use the transformation matrix T you found to transform all the coordinates in A (expressed as homogeneous coordinates) to the new ones.
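Putting this together, here is a minimal MATLAB sketch of that homogeneous-coordinates solve; the landmark coordinates below are made up purely for illustration (in practice they would be corresponding landmarks identified in A and B):

% Four corresponding landmarks, one per column, in homogeneous coordinates.
A_pts = [0 10  0  5;      % x coordinates in image A
         0  0 10  5;      % y
         0  0  0 10;      % z
         1  1  1  1];     % homogeneous ones
B_pts = [2 12  2  7;      % the equivalent landmarks in image B
         1  1 11  6;
         3  3  3 13;
         1  1  1  1];

T = B_pts/A_pts;          % solves T*A_pts = B_pts (the 12 unknowns above)
% Apply T to any other coordinate from A (as a homogeneous column vector):
p_A = [4; 5; 6; 1];
p_B = T*p_A               % the corresponding location in B's coordinate system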
Consider the following stereo camera system calibration parameters, obtained using the MATLAB stereoCameraCalibrator app.
R1 = stereoParams.CameraParameters1.RotationMatrices(:,:,N);
R2 = stereoParams.CameraParameters2.RotationMatrices(:,:,N);
R12 = stereoParams.RotationOfCamera2;
Where:
R1: rotation from world coordinates (for image N) to camera 1.
R2: rotation from world coordinates (for image N) to camera 2.
R12: rotation from camera 1 coordinates to camera 2, as described in a related SO question.
If that is correct, shouldn't R12*R1 == R2 hold?
But I'm getting different values, so what am I missing here?
Edit
Well, it seems all the matrices are transposed. So: R12'*R1' == R2' !
Why are they transposed?
The reason they are transposed is that, when performing geometric transformations between coordinates, MATLAB uses row vectors to perform the transformation, whereas column vectors are traditionally used in practice.
In other words, to transform a coordinate from one point to another, you typically perform:
x' = A*x
A would be the transformation matrix and x is a column vector of coordinates. The output x' would be another column vector of coordinates. MATLAB in fact uses a row vector, and so if you want to achieve the same effect you must transpose the matrix A (i.e. use A^{T}) and place it after the row vector instead of before a column vector:
x' = x*A^{T}
Here x is a row vector, and to ensure that the weighted combinations of rows and columns are accumulated in the same way, you must transpose A. The output x' is then a row vector as well.
This can also be verified by transposing the product of two matrices. Specifically, if x' = A*x, then in order to transform the output into a row vector x'^{T}, we must transpose the matrix-vector product:
x'^{T} = (A*x)^{T} = x^{T}*A^{T}
The last statement is a natural property of transposing the product of two matrices. See point 3 at the Transpose Wikipedia article for more details: https://en.wikipedia.org/wiki/Transpose#Properties
The reason why the transpose is performed ultimately stems back to the way MATLAB handles how numbers are aligned in memory. MATLAB is a column-major based language which means that numbers are populated in a matrix column-wise. Therefore, if you were to populate a matrix one element at a time, this would be done in a column-wise fashion and so the coefficients are populated per column instead of per row as we are normally used to, ultimately leading to what we've concluded above.
Therefore, when you have transposed both R12 and R1, this brings the representation back into a row-major setting, where these matrices were originally stored column-major for ease of use in MATLAB. The row-major setting thus allows you to use coordinates that are column vectors to facilitate the transformation. This column-vector setting is what we are used to. Therefore, multiplying R12 and R1 after you transpose them both gives you the correct transformation matrix R2 in the standard row-major representation.
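To see this numerically, here is a small sketch (my own; the rotations are made up rather than taken from real calibration output) that mimics the storage convention described above: if A1 and A12 are the usual column-vector rotations (world to camera 1, camera 1 to camera 2), the stereo parameters hold their transposes, and the observed relationship follows:

% Column-vector convention: x_cam1 = A1*x_world, x_cam2 = A12*x_cam1
A1  = [cosd(30) -sind(30) 0; sind(30) cosd(30) 0; 0 0 1];  % made-up rotation
A12 = [1 0 0; 0 cosd(10) -sind(10); 0 sind(10) cosd(10)];  % made-up rotation
A2  = A12*A1;                                              % world to camera 2

% The stereo parameters store the row-vector (transposed) versions:
R1 = A1';  R12 = A12';  R2 = A2';

norm(R12*R1 - R2)       % NOT zero: the naive composition does not hold
norm(R1*R12 - R2)       % ~0: in the row-vector convention R2 == R1*R12
norm(R12'*R1' - R2')    % ~0: equivalently, transpose everything first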
What does MATLAB's estimateUncalibratedRectification do in mathematical/geometrical terms?
What does it calculate exactly ?
As far as I understand, if the camera parameters are unknown then only the fundamental matrix can be computed from two images, not the essential matrix.
So, as far as I understand, estimateUncalibratedRectification's result should be ambiguous up to some kind of a transformation T because the fundamental matrix - that can be computed from two images if the camera's intrinsic parameters are not known - is ambiguous up to an arbitrary projective transformation.
Is this correct ?
My guess is that estimateUncalibratedRectification computes a projective transformation P1 for image1 and another projective transformation P2 for image2 in such a way that, when these two transformations (P1 and P2) are applied to the corresponding images, the resulting images (R1 and R2) are rectified in the sense that corresponding epipolar lines lie in the same rows, as shown in the image below.
My question is: how ambiguous is this result ?
My feeling is that the resulting transformations P1 and P2 are ambiguous up to some transformation T but I don't really see what this T can be.
Could someone please explain how estimateUncalibratedRectification works conceptually/mathematically/geometrically and also tell what T can be ?
In other words, what is the transformation T which when applied to R1 and R2 will result in an image pair TR1 and TR2 which will have the same rectified property as R1 and R2, namely that corresponding epipolar lines appear in matching rows in TR1 and TR2, just like they do in R1 and R2?
What is this T ? Is there such a T ?
PS.: I have read the code of estimateUncalibratedRectification.m before posting this question but it made me no wiser.
If the intrinsics are not known, the result is ambiguous up to a projective transformation. In other words, if you use estimateUncalibratedRectification to rectify a pair of images, and then compute disparity and do the 3D reconstruction, then you will reconstruct the 3D scene up to a projective transformation. Straight lines will be straight, parallel lines will be parallel, but your angles and sizes will likely be wrong.
To determine what that projective transformation is, you would need more information. If you know the camera intrinsics, then you get a reconstruction up to scale; in other words you get the correct angles and relative sizes. To get the correct scale you would either need to know the baseline (the distance between the cameras) or the size of some reference object in the scene.
The more straightforward way to approach this is to calibrate your cameras using the Camera Calibrator app or the Stereo Camera Calibrator app.
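For reference, a rough sketch of the usual uncalibrated workflow in MATLAB looks like the following; the variable names are mine, I1 and I2 are the two input images, and matchedPoints1/matchedPoints2 are assumed to be M-by-2 matrices of corresponding points you obtained beforehand (e.g. by feature matching):

% Sketch only: estimate F from point matches, then rectify both images.
[F, inlierIdx] = estimateFundamentalMatrix(matchedPoints1, matchedPoints2);
inlier1 = matchedPoints1(inlierIdx, :);
inlier2 = matchedPoints2(inlierIdx, :);

% T1, T2 are 3x3 projective transformations for image 1 and image 2
[T1, T2] = estimateUncalibratedRectification(F, inlier1, inlier2, size(I2));

% Warp both images; corresponding epipolar lines end up on matching rows
R1 = imwarp(I1, projective2d(T1));
R2 = imwarp(I2, projective2d(T2));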
I want to create, in Simulink, a homogeneous matrix in order to simulate the rotation and translation of an object in space.
How can I create a 4x4 matrix which takes the given angle as input?
For example, a translation along the X axis combined with a rotation about Z would be, in MATLAB:
% Supposing the input is
in = [a, b];
% Translational part:
transl = eye(4);
transl(1,4) = in(1);
% Rotational part:
rotat = eye(4);
rotat(1:3,1:3) = rotz(in(2));   % rotation about Z by the angle in(2)
move = transl*rotat;
The main problem is that I would like the Simulink model to be as code-free as possible (without Interpreted MATLAB Function blocks, etc.), just blocks.
Thank you.
First off, sometimes code is the better way to accomplish something. Some things are needlessly complicated when done as signal processing.
A Vector Concatenate block can be used to generate a vector, which in turn can be fed into a Matrix Concatenate block to create a matrix. Both blocks are found under Math Operations. There you should also find all the blocks necessary to multiply it by the given values, etc.
Try the 'Rotation Angles to Direction Cosine Matrix' block. It converts rotation angles to direction cosine matrix. The output is a 3x3 matrix, Rxyz, that performs coordinate transformations based on rotation angles from body frame to earth frame.
A multilinear function is one that is linear with respect to each variable separately. For example, x1 + x2*x1 - x4*x3 is a multilinear function. Working with them requires proper data structures and algorithms for fast assignment, factorization and basic arithmetic.
Does there exist a library for processing multilinear functions in MATLAB?
No, not that much so.
For example, interp2 and interpn have 'linear' methods, which are effectively what you describe. But that is about the limit of what is supplied. And there is nothing for more general functions of this form.
Anyway, this class of functions has some significant limitations. For example, as applied to color image processing, they are often a terribly poor choice because of what they do to neutrals in your image. Other functional forms are strongly preferred there.
Of course, there is always the symbolic toolbox for operations such as factorization, etc., but that tool is not a speed demon.
Edit: (other functional forms)
I'll use a bilinear form as the example. This is the scheme that is employed by tools like Photoshop when bilinear interpolation is chosen. Within the square region between a group of four pixels, we have the form
f(x,y) = f_00*(1-x)*(1-y) + f_10*x*(1-y) + f_01*(1-x)*y + f_11*x*y
where x and y vary over the unit square [0,1]X[0,1]. I've written it here as a function parameterized by the values of our function at the four corners of the square. Of course, those values are given in image interpolation as the pixel values at those locations.
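As a quick sanity check (my own sketch, not part of the original answer), the formula above reproduces what MATLAB's interp2 does with the 'linear' method inside a single pixel square:

% Corner values of one pixel square, made up for illustration
f00 = 1;  f10 = 3;  f01 = 2;  f11 = 7;
f = @(x,y) f00*(1-x).*(1-y) + f10*x.*(1-y) + f01*(1-x).*y + f11*x.*y;

F = [f00 f10; f01 f11];      % rows indexed by y, columns by x
x = 0.3;  y = 0.6;
f(x,y)                                     % bilinear form above
interp2([0 1], [0 1], F, x, y, 'linear')   % same value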
As has been said, the bilinear interpolant is indeed linear in x and in y. If you hold either x or y fixed, then the function is linear in the other variable.
An interesting question is what happens along the diagonal of the unit square, i.e. as we follow the path between the points (0,0) and (1,1). Since x = y along this path, substitute x for y in that expression and expand.
f(x,x) = f_00*(1-x)*(1-x) + f_10*x*(1-x) + f_01*(1-x)*x + f_11*x*x
= (f_11 + f_00 - f_10 - f_01)*x^2 + (f_10 + f_01 - 2*f_00)*x + f_00
So we end up with a quadratic polynomial along the main diagonal. Likewise, had we followed the other diagonal, it too would have been quadratic in form. So despite the "linear" nature of this beast, it is not truly linear along an arbitrary straight path. It is only linear along paths that are parallel to the axes of the interpolation variables.
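A quick numerical confirmation of those coefficients (again my own sketch, reusing the corner values from the previous snippet):

% Sample the bilinear interpolant along the diagonal x = y and fit a quadratic.
f00 = 1;  f10 = 3;  f01 = 2;  f11 = 7;
f = @(x,y) f00*(1-x).*(1-y) + f10*x.*(1-y) + f01*(1-x).*y + f11*x.*y;
x = linspace(0, 1, 11);
polyfit(x, f(x,x), 2)                                % fitted quadratic
[f11 + f00 - f10 - f01, f10 + f01 - 2*f00, f00]      % matches the formula above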
In three dimensions, which is where we really care about this behavior for color space interpolation, that main diagonal will now show a cubic behavior along that path, despite that "linear" name for the function.
Why are these diagonals important? What happens along the diagonal? If our mapping takes colors from an RGB color space to some other space, then the neutrals in your image live along the path R=G=B. This is the diagonal of the cube. The problem is when you interpolate an image with a neutral gradient, you will see a gradient in the result after color space conversion that moves from neutral to some non-neutral color as the gradient moves along the diagonal through one cube after another. Sadly, the human eye is very able to see differences from neutrality, so this behavior is critically important. (By the way, this is what happens inside the guts of your color ink jet printer, so people do care about it.)
The alternative chosen is to dissect the unit square into a pair of triangles, with the shared edge along that main diagonal. Linear interpolation now works inside a triangle, and along that edge, the interpolant is purely a function of the endpoints of that shared edge.
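Here is a sketch of that dissection in 2D (my own illustration): split the unit square along the diagonal y = x and interpolate linearly within each triangle. Along the shared diagonal the result depends only on the two endpoints f00 and f11:

% Linear interpolation over the two triangles of the unit square,
% split along the main diagonal y = x (corner values made up as before).
f00 = 1;  f10 = 3;  f01 = 2;  f11 = 7;
f_tri = @(x,y) (y <= x).*(f00 + (f10 - f00).*x + (f11 - f10).*y) + ...
               (y >  x).*(f00 + (f11 - f01).*x + (f01 - f00).*y);

x = linspace(0, 1, 5);
f_tri(x, x)              % along the diagonal...
f00 + (f11 - f00)*x      % ...it is linear and uses only f00 and f11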
In three dimensions, the same thing happens, except we use a dissection of the unit cube into SIX tetrahedra, all of which share the main diagonal of the cube. The difference is indeed critically important, with a dramatic reduction in the deviation of your neutral gradients from neutrality. As it turns out, the eye is NOT so perceptive to deviations along other gradients, so the loss along other paths does not hurt nearly so much. It is neutrals that are crucial, and the colors we must reproduce as accurately as possible.
So IF you do color space interpolation using mappings defined by what are commonly called 3-d lookup tables, this is the agreed way to do that interpolation (agreed upon by the ICC, an acronym for the International Color Consortium.)