Consider the following stereo camera calibration parameters, obtained with MATLAB's stereoCameraCalibrator app.
R1 = stereoParams.CameraParameters1.RotationMatrices(:,:,N);
R2 = stereoParams.CameraParameters2.RotationMatrices(:,:,N);
R12 = stereoParams.RotationOfCamera2;
Where:
R1: rotation from world coordinates (for image N) to camera 1.
R2: rotation from world coordinates (for image N) to camera 2.
R12: rotation from camera 1 coordinates to camera 2, as described in a related SO question.
If that is correct, shouldn't R12*R1 == R2 ?
But I'm getting different values, so what am I missing here?
Edit
Well, it seems all the matrices are transposed. So: R12'*R1' == R2' !
Why are they transposed?
The reason why they are transposed is due to the fact that when performing geometric transformations between coordinates, MATLAB uses row vectors to perform the transformation whereas column vectors are traditionally used in practice.
In other words, to transform a coordinate from one point to another, you typically perform:
x' = A*x
A would be the transformation matrix and x is a column vector of coordinates. The output x' would be another column vector of coordinates. MATLAB in fact uses a row vector, so if you want to achieve the same effect in multiplication, you must transpose the matrix A (i.e. A^{T}) and place it to the right of the vector (post-multiply) instead of to the left:
x' = x*A^{T}
Here x would be a row vector and to ensure that weighted combination of rows and columns is correctly accumulated, you must transpose A to maintain the same calculations. However, the shape of the output x' would be a row vector instead.
This can also be verified by transposing the product of two matrices. Specifically, if x' = A*x, then in order to transform the output into a row vector x'^{T}, we must transpose the matrix-vector product:
x'^{T} = (A*x)^{T} = x^{T}*A^{T}
The last statement is a natural property of transposing the product of two matrices. See point 3 at the Transpose Wikipedia article for more details: https://en.wikipedia.org/wiki/Transpose#Properties
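To make the two conventions concrete, here is a minimal sketch. It does not call the calibration toolbox: the rotation matrices are built by hand, and the names R1m, R12m and R2m are hypothetical stand-ins for the transposed (row-vector) matrices that MATLAB stores.

```matlab
% Column-vector convention: x_new = A * x
theta = pi/6;
A = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];
x = [1; 2; 3];
x_col = A * x;

% Row-vector convention: same point as a row, transposed matrix on the right
x_row = x.' * A.';
err_single = max(abs(x_row - x_col.'));

% Chain of two rotations in the column-vector convention:
% world -> camera 1 (R1), then camera 1 -> camera 2 (R12)
phi = pi/4;
R1  = A;
R12 = [1 0 0; 0 cos(phi) -sin(phi); 0 sin(phi) cos(phi)];
R2  = R12 * R1;

% Hypothetical row-vector storage: every matrix is kept transposed
R1m = R1.';  R12m = R12.';  R2m = R2.';

% In row-vector form the order of the product is reversed ...
err_chain = max(max(abs(R1m * R12m - R2m)));
% ... and transposing everything recovers the column-vector identity
err_transposed = max(max(abs(R12m.' * R1m.' - R2m.')));
```

All three error values should be at round-off level, which is the same observation as R12'*R1' == R2' in the question.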
The reason why the transpose is performed ultimately stems back to the way MATLAB handles how numbers are aligned in memory. MATLAB is a column-major based language which means that numbers are populated in a matrix column-wise. Therefore, if you were to populate a matrix one element at a time, this would be done in a column-wise fashion and so the coefficients are populated per column instead of per row as we are normally used to, ultimately leading to what we've concluded above.
Therefore, transposing both R12 and R1 converts them from the row-vector form that MATLAB stores back into the column-vector form that we are used to, where coordinates are column vectors and the matrix multiplies from the left. Multiplying the transposed R12 by the transposed R1 therefore gives you the transposed R2, i.e. the correct transformation matrix in the standard column-vector convention.
I have 2 sets of DICOM image data for 1 subject, consisting of a PET scan and a CT scan which were taken at the same time. The Frame of Reference UIDs are different, which I think means that their reference origins are different, so the 'Image Position Patient' tags can't be compared directly.
What I want to do is resample both images such that their spatial dimensions are equal and their pixel dimensions are equal. The task seems relatively straightforward, but for the fact that their origins are different.
Download link for data
For any two images A and B deemed to represent the same object, registration is the act of identifying for each pixel / landmark in A the equivalent pixel / landmark in B.
Assuming each pixel in both A and B can be embedded in a coordinate system, registration usually entails transforming A such that after the transformation, the coordinates of each pixel in A coincide with those of the equivalent pixel in B (i.e. the objective is for the two objects to overlap in that coordinate space).
An isometric transformation is one where the distance between any two pixels in A, and the distance between the equivalent two pixels in B does not change after the transformation has been applied. For instance, rotation in space, reflection (i.e. mirror image), and translation (i.e. shifting the object in a particular direction) are all isometric transformations. A registration algorithm applying only isometric transformations is said to be rigid.
An affine transformation is similar to an isometric one, except scaling (and, more generally, shearing) may also be involved (i.e. the object can also grow or shrink in size).
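As a small illustration of the difference, the sketch below applies a rotation plus translation to a few made-up points and then a uniform scaling on top. The points, angle and offset are arbitrary values chosen for the example.

```matlab
% Four 3D points stored as columns (arbitrary example values)
P = [0 1 0 2; 0 0 1 1; 0 0 0 3];

% Rigid (isometric) transformation: rotation about z plus a translation
theta = pi/3;
R = [cos(theta) -sin(theta) 0; sin(theta) cos(theta) 0; 0 0 1];
t = [5; -2; 1];
rigid = R*P + repmat(t, 1, size(P,2));

% Affine but not isometric: uniform scaling by 2 on top of the rigid part
scaled = 2 * rigid;

% Distance between the first and the last point in each point set
d = @(Q) norm(Q(:,1) - Q(:,4));
d_orig   = d(P);
d_rigid  = d(rigid);   % same as d_orig
d_scaled = d(scaled);  % twice d_orig
```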
In medical imaging, if A and B were obtained at different times, it is highly unlikely that the transformation is a simple affine or isometric one. For instance, say during scan A the patient had their arms down by their side, and in scan B the patient had their arms over their head. There is no rigid registration of A that would result in perfect overlap with B, since distances between equivalent points have changed (e.g. the head-to-hand and hand-to-foot distances in each case). Therefore more elaborate non-rigid registration algorithms would need to be used.
The fact that in your case A and B were obtained during the same scanning session in the same machine means that it's a reasonable assumption that the transformation will be a simple affine one. I.e. you will probably only need to rotate and translate the object a bit; if the coordinate system of A is 'denser' than B, you might also need to grow / shrink it a bit. But that's it, no weird 'warping' will be necessary to compensate for 'movement' occurring between scans A and B being obtained, since they happened at the same time.
A 3D vector, denoting a 'magnitude and direction' in 3D space, can be transformed to another 3D vector using a 3x3 transformation matrix T. For example, if you apply the transformation T = [a b c; d e f; g h i] to the vector v = [x; y; z] (using matrix multiplication), the resulting vector is u = T*v = [a*x + b*y + c*z; d*x + e*y + f*z; g*x + h*y + i*z]. In other words, the 'new' x-coordinate depends on the old x, y, and z coordinates in a manner specified by the transformation matrix, and similarly for the new y and new z coordinates.
If you apply a 3x3 transformation T to three vectors at the same time, you'll get three transformed vectors out, e.g. for v = [v1, v2, v3] where v1 = [1; 2; 3], v2 = [2; 3; 4], v3 = [3; 4; 5], T*v will give you a 3x3 matrix u, where each column corresponds to a transformed vector of x,y,z coordinates.
Now, consider that the transformation matrix T is unknown and we want to discover it. Say we have a known point p = [x; y; z] and we know that after the transformation it becomes a known point p' = [x'; y'; z']. We have:
[x'; y'; z'] = [a b c; d e f; g h i] * [x; y; z]
Consider the top row; even if you know p and p', it should be clear that you cannot determine a, b, and c from a single point. You have three unknowns and only one equation. Therefore to solve for a, b, and c, you need at least a system of three equations. The same applies for the other two rows. Therefore, to find the transformation matrix T you need three known points (before and after transformation).
In MATLAB, you can solve such a system of equations, where T*v = u, by typing T = u/v. For a 3x3 transformation matrix T, u and v need to contain at least 3 vectors (and the vectors in v must be linearly independent), but they can contain more (i.e. the system of equations is overdetermined), in which case the / operator returns the least-squares solution. With noisy measurements, the more vectors you pass in, the more accurate the estimated transformation matrix will generally be. But in theory you only need three.
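Here is a minimal sketch of that solve. T_true and the points in v are made-up values for the example; in practice u and v would hold your measured landmarks. Note that the points must be linearly independent: the vectors [1; 2; 3], [2; 3; 4], [3; 4; 5] used for illustration earlier are not, so they could not be used to recover T.

```matlab
% A made-up "unknown" transformation that we will try to recover
T_true = [0 -1 0; 1 0 0; 0 0 2];

% Three linearly independent points as columns, and their images under T_true
v = [1 2 0; 0 1 3; 1 0 1];
u = T_true * v;

% Solve T*v = u for T with right matrix division
T_est = u / v;
err_exact = max(abs(T_est(:) - T_true(:)));

% With more than three points the system is overdetermined and
% the / operator returns the least-squares solution
v5 = [v, [1; 1; 1], [2; 0; 1]];
u5 = T_true * v5;
T_ls = u5 / v5;
err_ls = max(abs(T_ls(:) - T_true(:)));

% The illustrative vectors from the text are linearly dependent (rank 2)
r_text = rank([1 2 3; 2 3 4; 3 4 5]);
```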
If your transformation also involves a translation element, then you need to do the trick described in the image you posted. I.e. you represent a 3D vector [x,y,z] as a homogeneous-coordinates vector [x,y,z,1]. This enables you to add a 4th column in your transformation matrix, which results in a 'translation' for each point, i.e. adding an extra value in the new x', y' and z' coefficients, which is independent of the input vector. Since the translation coefficients are also unknown, you now have 12 instead of 9 unknowns, and therefore you need 4 points to solve this system, i.e. (writing tx, ty, tz for the translation coefficients):
[x1' x2' x3' x4'; y1' y2' y3' y4'; z1' z2' z3' z4'; 1 1 1 1] = [a b c tx; d e f ty; g h i tz; 0 0 0 1] * [x1 x2 x3 x4; y1 y2 y3 y4; z1 z2 z3 z4; 1 1 1 1]
To summarise:
To transform your image A to occupy the same space as B, interpret the coordinates of A as if they were in the same coordinate system as B, find four equivalent landmarks in both, and obtain a suitable transformation matrix as above by solving this system of equations using the / right matrix division operator. You can then use this transformation matrix T you found, to transform all the coordinates in A (expressed as homogeneous coordinates) to the new ones.
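The summary above can be sketched as follows. The landmark coordinates, the rotation R and the translation t are invented for the example; with real data, the columns of A_pts and the corresponding columns of uh would come from four matching landmarks in the two images, and the four landmarks must not all lie in one plane.

```matlab
% Made-up ground truth: 90 degree rotation about z plus a translation
R = [0 -1 0; 1 0 0; 0 0 1];
t = [10; -5; 2];
T_true = [R t; 0 0 0 1];

% Four non-coplanar landmarks in A as columns, in homogeneous coordinates
A_pts = [0 1 0 0; 0 0 1 0; 0 0 0 1];
vh = [A_pts; ones(1, 4)];

% The equivalent landmarks in B (here simulated with T_true)
uh = T_true * vh;

% Recover the 4x4 transformation with right matrix division
T_est = uh / vh;
err_T = max(abs(T_est(:) - T_true(:)));

% Apply it to any other coordinate of A, e.g. the point (2,3,4)
p_new = T_est * [2; 3; 4; 1];   % expected [7; -3; 6; 1]
```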
Consider a discrete dynamical system where x[0]=rand() denotes the initial condition of the system.
I have generated an m-by-N matrix with the following step: generate m vectors with m different initial conditions, each of dimension N (N, written n in the formulas below, is the number of samples or elements). This matrix is called R. Using R, how do I create a Toeplitz matrix T?
Mathematically,
R = [ x_0[0], ....,x_0[n-1];
..., ,.....;
x_m[0],.....,x_m[n-1]]
The toeplitz matrix T =
x[n-1], x[n-2],....,x[0];
x[0], x[n-1],....,x[1];
: : :
x[m-2],x[m-3]....,x[m-1]
I tried working with toeplitz(R) but the dimension changes. The dimension should not change, as seen mathematically.
According to the paper provided (Toeplitz structured chaotic sensing matrix for compressive sensing by Yu et al.) there are two Chaotic Sensing Matrices involved. Let's explore them separately.
The Chaotic Sensing Matrix (Section A)
It is clearly stated that to create such a matrix you have to build m independent signals (sequences) with m different initial conditions (in the range ]0;1[) and then concatenate these signals row by row (that is, one signal = one row). Each of these signals must have length N. This is actually your matrix R, which is correctly evaluated as it is. I'd like to suggest a code improvement, though: instead of building a column and then transposing the matrix, you can build the matrix directly by rows:
R=zeros(m,N);
R(:,1)=rand(m,1); %build the first column with m initial conditions
Please note: randn() draws values from a Gaussian (Normal) distribution, so they might not lie in the range ]0;1[ required by the paper (right below equation 9). By contrast, rand() draws uniformly distributed values in that range.
After that, you can build every row separately according to the for-loop:
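for i=1:m
    for j=2:N %skip first column
        R(i,j)=4*R(i,j-1)*(1-R(i,j-1)); %iterate the map on the unshifted values
    end
end
R=R-0.5; %subtract 0.5 once at the end, so shifted values are not fed back into the map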
The Toeplitz Chaotic Sensing Matrix (Section B)
It is clearly stated at the beginning of Section B that to build the Toeplitz matrix you should consider a single sequence x with a given, single, initial condition. So let's build such sequence:
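x=zeros(1,N);
x(1)=rand();
for j=2:N %skip first element
    x(j)=4*x(j-1)*(1-x(j-1)); %iterate the map on the unshifted values
end
x=x-0.5; %subtract 0.5 once at the end, so shifted values are not fed back into the map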
Now, to build the matrix you can consider:
What does the first row look like? Well, it looks like the sequence itself, but flipped (i.e. instead of going from 0 to n-1, it goes from n-1 to 0)
What does the first column look like? It is the last item from x concatenated with the elements in range 0 to m-2
Let's then build the first row (r) and the first column (c):
r=fliplr(x);
c=[x(end) x(1:m-1)];
Please note: in MATLAB the indices start from 1, not from 0 (so instead of going from 0 to m-2, we go from 1 to m-1). Also, end means the last element of a given array.
Now, by looking at the help for the toeplitz() function, it is clearly stated that you can build a non-square Toeplitz matrix by specifying the first column and the first row. Therefore, finally, you can build such a matrix as:
T=toeplitz(c,r);
Such a matrix will indeed have dimensions m-by-N, as reported in the paper.
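To see the structure without the random values getting in the way, here is a small sketch that replaces the chaotic sequence with a stand-in whose entries equal their own zero-based index (so the entry x[3] is simply the number 3). The sizes m = 4 and N = 6 are arbitrary.

```matlab
m = 4;          % number of rows
N = 6;          % sequence length, i.e. number of columns

% Stand-in sequence: the value stored for x[k] is k itself (k = 0..N-1)
x = 0:N-1;

r = fliplr(x);              % first row:    x[N-1], x[N-2], ..., x[0]
c = [x(end) x(1:m-1)];      % first column: x[N-1], x[0], ..., x[m-2]

T = toeplitz(c, r);

% T =
%      5     4     3     2     1     0
%      0     5     4     3     2     1
%      1     0     5     4     3     2
%      2     1     0     5     4     3
sz = size(T);
```

Each row is the previous one shifted right by one position, and the last row starts with x[m-2] and ends with x[m-1], as in the matrix written out in the question.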
Even though the Authors call both of them \Phi, they actually are two separate matrices.
They do not take the Toeplitz of the Beta-Like Matrix (a Toeplitz matrix is not a function or operator of some kind), nor do they transform the Beta-Like Matrix into a Toeplitz matrix.
You have the Beta-Like Matrix (i.e. the Chaotic Sensing Matrix) at first, and then the Toeplitz-structured Chaotic Sensing Matrix: such structure is typical for Toeplitz matrices, that is a diagonal-constant structure (all elements along a diagonal have the same value).
Suppose there is a matrix B of size 500-by-1000, of type double (here, 500 represents the number of observations and 1000 represents the number of features).
sigma is the covariance matrix of B, and D is a diagonal matrix whose diagonal elements are the eigenvalues of sigma. Assume A is the eigenvectors of the covariance matrix sigma.
I have the following questions:
I need to select the first k = 800 eigenvectors corresponding to the eigenvalues with the largest magnitude to rank the selected features. The final matrix named Aq. How can I do this in MATLAB?
What is the meaning of these selected eigenvectors?
It seems the size of the final matrix Aq is 1000*800 double once I calculate Aq. The time points/observation information of 500 has disappeared. For the final matrix Aq, what does the value 1000 in matrix Aq represent now? Also, what does the value 800 in matrix Aq represent now?
I'm assuming you determined the eigenvectors from the eig function. What I would recommend to you in the future is to use the eigs function. This not only computes the eigenvalues and eigenvectors for you, but it will compute the k largest eigenvalues with their associated eigenvectors for you. This may save computational overhead where you don't have to compute all of the eigenvalues and associated eigenvectors of your matrix as you only want a subset. You simply supply the covariance matrix of your data to eigs and it returns the k largest eigenvalues and eigenvectors for you.
Now, back to your problem, what you are describing is ultimately Principal Component Analysis. The mechanics behind this would be to compute the covariance matrix of your data and find the eigenvalues and eigenvectors of the computed result. Doing it this way is known to be not recommended, due to numerical instability when computing the eigenvalues and eigenvectors of large matrices. The most canonical way to do this now is via Singular Value Decomposition of the mean-subtracted data. Concretely, the columns of the V matrix give you the eigenvectors of the covariance matrix, or the principal components, and the associated eigenvalues are the squares of the singular values found on the diagonal of the matrix S, divided by n-1, where n is the number of observations.
See this informative post on Cross Validated as to why this is preferred:
https://stats.stackexchange.com/questions/79043/why-pca-of-data-by-means-of-svd-of-the-data
I'll throw in another link as well that talks about the theory behind why the Singular Value Decomposition is used in Principal Component Analysis:
https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
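As a quick sanity check of the relationship between the two routes, here is a minimal sketch on a tiny made-up data matrix (6 observations, 3 features). The values in B are arbitrary; only the comparison matters.

```matlab
% Tiny made-up data set: rows are observations, columns are features
B = [2 0 1; 4 1 3; 6 1 2; 8 3 5; 10 2 4; 12 5 9];
n = size(B, 1);

% Route 1: SVD of the mean-subtracted data
Bm = bsxfun(@minus, B, mean(B, 1));
[~, S, V] = svd(Bm, 'econ');
ev_svd = diag(S).^2 / (n - 1);      % eigenvalues of the covariance matrix

% Route 2: eigendecomposition of the covariance matrix
[A, D] = eig(cov(B));
[ev_eig, ind] = sort(diag(D), 'descend');
A = A(:, ind);

% Eigenvalues agree; eigenvectors agree up to sign
err_vals = max(abs(ev_svd - ev_eig));
err_vecs = max(max(abs(abs(V.' * A) - eye(3))));
```

The eigenvectors can differ by a factor of -1 between the two routes, which is why the comparison uses absolute values.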
Now let's answer your questions one at a time.
Question #1
MATLAB's eig does not guarantee that the eigenvalues, and hence the corresponding eigenvectors, are returned in any particular sorted order. If you wish to select out the largest k eigenvalues and associated eigenvectors given the output of eig (800 in your example), you'll need to sort the eigenvalues in descending order, then rearrange the columns of the eigenvector matrix produced from eig, then select out the first k values.
I should also note that using eigs will not guarantee sorted order, so you will have to explicitly sort these too when it comes down to it.
In MATLAB, doing what we described above would look something like this:
sigma = cov(B);
[A,D] = eig(sigma);
vals = diag(D);
[~,ind] = sort(abs(vals), 'descend');
Asort = A(:,ind);
It's a good thing to note that the sorting is done on the absolute value of the eigenvalues, because the question asks for the eigenvalues with the largest magnitude. For a general symmetric matrix, a component whose eigenvalue was, say, -10000 would be a very good indication that this component has some significant meaning, and if we sorted purely on the signed numbers, it would get placed near the lower ranks. A covariance matrix is positive semi-definite, so its eigenvalues are non-negative up to round-off and taking the absolute value changes nothing in practice.
The first line of code finds the covariance matrix of B, even though you said it's already stored in sigma, but let's make this reproducible. Next, we find the eigenvalues of your covariance matrix and the associated eigenvectors. Take note that each column of the eigenvector matrix A represents one eigenvector. Specifically, the ith column / eigenvector of A corresponds to the ith eigenvalue seen in D.
However, the eigenvalues are in a diagonal matrix, so we extract out the diagonals with the diag command, sort them and figure out their ordering, then rearrange A to respect this ordering. I use the second output of sort because it tells you the position of where each value in the unsorted result would appear in the sorted result. This is the ordering we need to rearrange the columns of the eigenvector matrix A. It's imperative that you choose 'descend' as the flag so that the largest eigenvalue and associated eigenvector appear first, just like we talked about before.
You can then pluck out the first k largest vectors and values via:
k = 800;
Aq = Asort(:,1:k);
Question #2
It's a well known fact that the eigenvectors of the covariance matrix are equal to the principal components. Concretely, the first principal component (i.e. the eigenvector associated with the largest eigenvalue) gives you the direction of the maximum variability in your data. Each principal component after that gives you variability of a decreasing nature. It's also good to note that the principal components are mutually orthogonal.
Here's a good example from Wikipedia for two dimensional data:
I pulled the above image from the Wikipedia article on Principal Component Analysis, which I linked you to above. This is a scatter plot of samples that are distributed according to a bivariate Gaussian distribution centred at (1,3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction. The component with a standard deviation of 3 is the first principal component while the one that is orthogonal is the second component. The vectors shown are the eigenvectors of the covariance matrix scaled by the square root of the corresponding eigenvalue, and shifted so their tails are at the mean.
Now let's get back to your question. The reason why we take a look at the k largest eigenvalues is a way of performing dimensionality reduction. Essentially, you would be performing a data compression where you would take your higher dimensional data and project them onto a lower dimensional space. The more principal components you include in your projection, the more it will resemble the original data. It actually begins to taper off at a certain point, but the first few principal components allow you to faithfully reconstruct your data for the most part.
A great visual example of performing PCA (or SVD rather) and data reconstruction is found by this great Quora post I stumbled upon in the past.
http://qr.ae/RAEU8a
Question #3
You would use this matrix to reproject your higher dimensional data onto a lower dimensional space. The number of rows being 1000 is still there, which means that there were originally 1000 features in your dataset. The 800 is what the reduced dimensionality of your data would be. Consider this matrix as a transformation from the original dimensionality of a feature (1000) down to its reduced dimensionality (800).
You would then use this matrix in conjunction with reconstructing what the original data was. Concretely, this would give you an approximation of what the original data looked like with the least amount of error. In this case, you don't need to use all of the principal components (i.e. just the k largest vectors) and you can create an approximation of your data with less information than what you had before.
How you reconstruct your data is very simple. Let's talk about the forward and reverse operations first with the full data. The forward operation is to take your original data and reproject it but instead of the lower dimensionality, we will use all of the components. You first need to have your original data but mean subtracted:
Bm = bsxfun(@minus, B, mean(B,1));
Bm will produce a matrix where each feature of every sample is mean subtracted. bsxfun allows the subtraction of two matrices of unequal dimensions provided that you can broadcast the dimensions so that they both match up. Specifically, what will happen in this case is that the mean of each column / feature of B will be computed and a temporary replicated matrix will be produced that is as large as B. When you subtract this replicated matrix from your original data, every data point has its respective feature mean subtracted, thus centring your data so that the mean of each feature is 0.
Once you do this, the operation to project is simply:
Bproject = Bm*Asort;
The above operation is quite simple. What you are doing is expressing each sample as a linear combination of principal components. For example, given the first sample (the first row of the mean-centred data), the first sample's first feature in the projected domain is the dot product of the row vector that holds the entire sample and the first principal component, which is a column vector. The first sample's second feature in the projected domain is the dot product of the entire sample and the second principal component. You would repeat this for all samples and all principal components. In effect, you are reprojecting the data so that it is with respect to the principal components, which are orthogonal basis vectors that transform your data from one representation to another.
A better description of what I just talked about can be found here. Look at Amro's answer:
Matlab Principal Component Analysis (eigenvalues order)
Now to go backwards, you simply do the inverse operation, but a special property with the eigenvector matrix is that if you transpose this, you get the inverse. To get the original data back, you undo the operation above and add the means back to the problem:
out = bsxfun(@plus, Bproject*Asort.', mean(B, 1));
You want to get the original data back, so you're solving for Bm with respect to the previous operation that I did. However, the inverse of Asort is just the transpose here. What's happening after you perform this operation is that you are getting the original data back, but the data is still mean-centred. To get the original data back, you must add the means of each feature back into the data matrix to get the final result. That's why we're using another bsxfun call here so that you can do this for each sample's feature values.
You should be able to go back and forth from the original domain and projected domain with the above two lines of code. Now where the dimensionality reduction (or the approximation of the original data) comes into play is the reverse operation. What you need to do first is project the data onto the bases of the principal components (i.e. the forward operation), but now to go back to the original domain where we are trying to reconstruct the data with a reduced number of principal components, you simply replace Asort in the above code with Aq and also reduce the amount of features you're using in Bproject. Concretely:
out = bsxfun(@plus, Bproject(:,1:k)*Aq.', mean(B, 1));
Doing Bproject(:,1:k) selects out the k features in the projected domain of your data, corresponding to the k largest eigenvectors. Interestingly, if you just want the representation of the data with regards to a reduced dimensionality, you can just use Bproject(:,1:k) and that'll be enough. However, if you want to go forward and compute an approximation of the original data, we need to compute the reverse step. The above code is simply what we had before with the full dimensionality of your data, but we use Aq as well as selecting out the k features in Bproject. This will give you the original data that is represented by the k largest eigenvectors / eigenvalues in your matrix.
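If you want to convince yourself of the forward and reverse steps on something small before trying it on real data, here is a minimal sketch. The 10-by-4 data matrix is synthetic and the variable names err_full and err_k are just for the example.

```matlab
% Synthetic data: 10 samples (rows), 4 features (columns)
B = sin((1:10).' * (1:4));
mu = mean(B, 1);

% Principal components, sorted by descending eigenvalue magnitude
[A, D] = eig(cov(B));
[~, ind] = sort(abs(diag(D)), 'descend');
Asort = A(:, ind);

% Forward operation: mean-centre and project
Bm = bsxfun(@minus, B, mu);
Bproject = Bm * Asort;

% Reverse operation with all components recovers the data
full = bsxfun(@plus, Bproject * Asort.', mu);
err_full = norm(full - B, 'fro');

% Reverse operation with only the first k components is an approximation
ks = 1:3;
err_k = zeros(size(ks));
for idx = 1:numel(ks)
    k = ks(idx);
    Aq = Asort(:, 1:k);
    approx = bsxfun(@plus, Bproject(:, 1:k) * Aq.', mu);
    err_k(idx) = norm(approx - B, 'fro');
end
```

err_full should be at round-off level, and the entries of err_k should not increase as k grows.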
If you'd like to see an awesome example, I'll mimic the Quora post that I linked to you but using another image. Consider doing this with a grayscale image where each row is a "sample" and each column is a feature. Let's take the cameraman image that's part of the image processing toolbox:
im = imread('cameraman.tif');
imshow(im); %// Using the image processing toolbox
We get this image:
This is a 256 x 256 image, which means that we have 256 data points and each point has 256 features. What I'm going to do is convert the image to double for precision in computing the covariance matrix. Then I'm going to repeat the above code, but with k taking the values 3, 11, 15, 25, 45, 65, 125 and 155 in turn. Therefore, for each k, we are introducing more principal components and we should slowly start to get a reconstruction of our data.
Here's some runnable code that illustrates my point:
%%%%%%%// Pre-processing stage
clear all;
close all;
%// Read in image - make sure we cast to double
B = double(imread('cameraman.tif'));
%// Calculate covariance matrix
sigma = cov(B);
%// Find eigenvalues and eigenvectors of the covariance matrix
[A,D] = eig(sigma);
vals = diag(D);
%// Sort their eigenvalues
[~,ind] = sort(abs(vals), 'descend');
%// Rearrange eigenvectors
Asort = A(:,ind);
%// Find mean subtracted data
Bm = bsxfun(@minus, B, mean(B,1));
%// Reproject data onto principal components
Bproject = Bm*Asort;
%%%%%%%// Begin reconstruction logic
figure;
counter = 1;
for k = [3 11 15 25 45 65 125 155]
    %// Extract out highest k eigenvectors
    Aq = Asort(:,1:k);
    %// Project back onto original domain
    out = bsxfun(@plus, Bproject(:,1:k)*Aq.', mean(B, 1));
    %// Place projection onto right slot and show the image
    subplot(4, 2, counter);
    counter = counter + 1;
    imshow(out,[]);
    title(['k = ' num2str(k)]);
end
As you can see, the majority of the code is the same from what we have seen. What's different is that I loop over all values of k, project back onto the original space (i.e. computing the approximation) with the k highest eigenvectors, then show the image.
We get this nice figure:
As you can see, starting with k=3 doesn't really do us any favours... we can see some general structure, but it wouldn't hurt to add more in. As we start increasing the number of components, we start to get a clearer picture of what the original data looks like. At k=25, we actually can see what the cameraman looks like perfectly, and we don't need components 26 and beyond to see what's happening. This is what I was talking about with regards to data compression where you don't need to work on all of the principal components to get a clear picture of what's going on.
I'd like to end this note by referring you to Chris Taylor's wonderful exposition on the topic of Principal Components Analysis, with code, graphs and a great explanation to boot! This is where I got started on PCA, but the Quora post is what solidified my knowledge.
Matlab - PCA analysis and reconstruction of multi dimensional data
I have a matrix M of size NxP. Its P columns are mutually orthogonal (M is a basis). I also have a vector V of size N.
My objective is to transform the first vector of M into V and to update the others so as to preserve their orthogonality. I know that the origins of V and M are the same, so it is basically a rotation by a certain angle. I assume we can find a matrix T such that T*M = M'. However, I can't figure out the details of how to do it (with MATLAB).
Also, I know there might be an infinite number of transforms doing that, but I'd like to get the simplest one (in which the other vectors of M remain approximately the same, i.e. no rotation around the first vector).
A small picture to illustrate. In my actual case, N and P can be large integers (not necessarily 3):
Thanks in advance for your help!
[EDIT] Alternative solution to Gram-Schmidt (accepted answer)
I managed to get a correct solution by retrieving a rotation matrix R by solving an optimization problem minimizing the 2-norm between M and R*M, under the constraints:
V is orthogonal to R*M[1] ... R*M[P-1] (i.e V'*(R*M[i]) = 0)
R*M[0] = V
Due to the solver constraints, I couldn't indicate that R*M[0] ... R*M[P-1] are all pairwise orthogonal (i.e (R*M)' * (R*M) = I).
Luckily, it seems that with this problem and with my solver (CVX using SDPT3), the resulting R*M[0] ... R*M[P-1] are also pairwise orthogonal.
I believe you want to use the Gram-Schmidt process here, which finds an orthogonal basis for a set of vectors. If V is not orthogonal to M[0], you can simply change M[0] to V and run Gram-Schmidt, to arrive at an orthogonal basis. If it is orthogonal to M[0], instead change another, non-orthogonal vector such as M[1] to V and swap the columns to make it first.
Mind you, the vector V needs to be in the column space of M, or you will always have a different basis than you had before.
Matlab doesn't have a built-in Gram-Schmidt command, although you can use the qr command to get an orthogonal basis. However, this won't work if you need V to be one of the vectors.
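A hand-rolled version is short enough to write yourself. The following is a minimal sketch of (modified) Gram-Schmidt with V swapped in as the first column; M and V are toy values, the code normalises every vector (so the first column comes out as V/norm(V)), and it assumes V is not orthogonal to the column it replaces.

```matlab
% Toy example: an orthonormal basis and a new first direction
M = eye(3);
V = [1; 1; 0];

% Replace the first column with V, keep the others as starting points
W = [V, M(:, 2:end)];

% Modified Gram-Schmidt: remove from each column its components along
% the already processed columns, then normalise
Q = zeros(size(W));
for j = 1:size(W, 2)
    q = W(:, j);
    for i = 1:j-1
        q = q - (Q(:, i).' * q) * Q(:, i);
    end
    Q(:, j) = q / norm(q);
end

% Checks: columns are orthonormal and the first one points along V
err_orth  = norm(Q.' * Q - eye(size(Q, 2)));
err_first = norm(Q(:, 1) - V / norm(V));
```

Because V is processed first, it is left untouched apart from the normalisation, and the remaining columns are adjusted around it.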
Option # 1 : if you have some vector and after some changes you want to rotate matrix to restore its orthogonality then, I believe, this method should work for you in Matlab
http://www.mathworks.com/help/symbolic/mupad_ref/numeric-rotationmatrix.html
(edit by another user: above link is broken, possible redirect: Matrix Rotations and Transformations)
If it does not, then ...
Option # 2 : I did not do this in MATLAB, but part of another task of mine was to find the eigenvalues and eigenvectors of a matrix. To achieve this I used SVD, and part of the SVD algorithm was Jacobi rotation. It says to rotate the matrix until it is almost diagonal, to within some precision, and invertible.
https://math.stackexchange.com/questions/222171/what-is-the-difference-between-diagonalization-and-orthogonal-diagonalization
Approximate algorithm of Jacobi rotation in your case should be similar to this one. I may be wrong at some point so you will need to double check this in relevant docs :
1) change the values in the existing vector
2) compute the angle between the old and the new vector
3) create the rotation matrix and ...
put Cosine(angle) on the diagonal of the rotation matrix
put -Sin(angle) in the top right corner of the matrix
put Sin(angle) in the bottom left corner of the matrix
4) multiply the vector or matrix of vectors by the rotation matrix in a loop until your matrix of vectors is invertible and diagonalized. Invertibility can be checked with the determinant (a check for singularity), and orthogonality (the matrix is diagonalized) can be tested as follows: if the max value in the LU matrix is less than some constant, stop rotating. At this point the new matrix should contain only orthogonal vectors.
Unfortunately, I am not able to find exact pseudo code that I was referring to in the past but these links may help you to understand Jacobi Rotation :
http://www.physik.uni-freiburg.de/~severin/fulltext.pdf
http://web.stanford.edu/class/cme335/lecture7.pdf
https://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2003/rapporter03/maleko_mercy_03003.pdf
The commands
a = magic(3);
b = pascal(3);
c = cat(4,a,b);
produce a 3-by-3-by-1-by-2 array.
Why is the result 3-3-1-2 when the dimension is 4?
Both a and b are two-dimensional matrices of size 3-by-3. When you concatenate them along a fourth dimension, the intervening third dimension is singleton (i.e. 1). So c(:,:,1,1) will be your matrix a and c(:,:,1,2) will be your matrix b.
Here's a link to some documentation that may help with understanding multidimensional arrays.
EDIT:
Perhaps it will help to think of these four dimensions in terms that we humans can more easily relate to...
Let's assume that the four dimensions in the example represent three dimensions in space (x, y, and z) plus a fourth dimension time. Imagine that I'm sampling the temperature in the air at a number of points in space at one given time. I can sample the air temperature in a grid that comprises all combinations of three x positions, three y positions, and one z position. That will give me a 3-by-3-by-1 grid. Normally, we'd probably just say that the data is in a 3-by-3 grid, ignoring the trailing singleton dimension.
However, let's say that I now take another set of samples at these points at a later time. I therefore get another 3-by-3-by-1 grid at a second time point. If I concatenate these sets of data together along the time dimension I get a 3-by-3-by-1-by-2 matrix. The third dimension is singleton because I only sampled at one z value.
So, in the example c=cat(4,a,b), we are concatenating two matrices along the fourth dimension. The two matrices are 3-by-3, with the third dimension implicitly assumed to be singleton. However, when concatenating along the fourth dimension we end up having to explicitly show that the third dimension is still there by listing its size as 1.
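You can check all of this at the command line. The sketch below only uses the arrays from the question plus squeeze, which drops singleton dimensions; the extra variables d and e are just for comparison.

```matlab
a = magic(3);
b = pascal(3);

% Concatenating along dimension 4 leaves dimension 3 as a singleton
c = cat(4, a, b);
sz_c = size(c);                     % [3 3 1 2]
same_a = isequal(c(:,:,1,1), a);
same_b = isequal(c(:,:,1,2), b);

% Concatenating along dimension 3 avoids the singleton altogether
d = cat(3, a, b);
sz_d = size(d);                     % [3 3 2]

% squeeze removes the singleton third dimension from c
e = squeeze(c);
sz_e = size(e);                     % [3 3 2]
same_e = isequal(e, d);
```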