transformation matrix of reduced row echelon form - matlab

I can compute the reduced row echelon form R of a matrix C in Matlab using the command R = rref(C).
However, I would also like to keep track of the performed steps, that is, to obtain the transformation matrix T that gives me TC = R. This matrix should, to the best of my knowledge, be implicitly computed when using Gauss-Jordan elimination.
Is there a way to get T? Maybe a workaround? I couldn't find any information in the MATLAB documentation. Are there rref functions in other programming languages that would return T?

You can use the fact that elementary row operations are equivalent to multiplying by an elementary matrix on the left. Let c be a matrix of size m-by-n:
z= rref([c eye(m)]); % [c I] is multiplied by some matrix T
% the result is [rref(c) T]
r= z(:,1:n); % the reduced row echelon form of c
t= z(:,n+1:end); % now we have T
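As a quick sanity check, here is a minimal sketch with an arbitrary test matrix (magic(4) is just a stand-in for your C):
c = magic(4);                 % any test matrix, here 4x4
[m, n] = size(c);
z = rref([c eye(m)]);
r = z(:, 1:n);
t = z(:, n+1:end);
disp(norm(t*c - r))           % zero up to round-off, since T*C = R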

Related

How exactly does this simple calculation of an ML gradient descent cost function work in Octave/MATLAB?

I am following a machine learning course on Coursera and I am doing the following exercise using Octave (MATLAB should be the same).
The exercise is related to the calculation of the cost function for a gradient descent algorithm.
The course slide gives the cost function that I have to implement using Octave:

J(theta) = (1/(2*m)) * sum_{i=1..m} (h_theta(x^(i)) - y^(i))^2

where the hypothesis (the second equation on the slide) is:

h_theta(x) = theta^T * x

So J is a function of the THETA variables represented by the theta vector (the one appearing in the second equation above).
This is the correct MATLAB/Octave implementation for the J(THETA) computation:
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

    % Initialize some useful values
    m = length(y); % number of training examples

    % You need to return the following variables correctly
    J = 0;

    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the cost of a particular choice of theta
    %               You should set J to the cost.

    J = (1/(2*m))*sum(((X*theta) - y).^2)

    % =========================================================================
end
where:
X is a 2-column matrix with m rows, where all the elements of the first column are set to the value 1:
X =
1.0000 6.1101
1.0000 5.5277
1.0000 8.5186
...... ......
...... ......
...... ......
y is a vector of m elements (the same m as in X):
y =
17.59200
9.13020
13.66200
........
........
........
Finally, theta is a 2-element column vector holding zeros, like this:
theta = zeros(2, 1); % initialize fitting parameters
theta
theta =
0
0
Ok, coming back to my working solution:
J = (1/(2*m))*sum(((X*theta) - y).^2)
specifically this matrix multiplication (between the matrix X and the vector theta): I know it is a valid matrix multiplication because the number of columns of X (2) equals the number of rows of theta (2).
My doubt, which is driving me crazy (it is probably trivial), is related to the course slide above:
As you can see, the second equation, which is used to calculate the current h_theta(x) value, uses the transposed theta vector, not the plain theta vector as done in the code.
Why ?!?!
I suspect that it depends only on how the theta vector was created. It was built this way:
theta = zeros(2, 1); % initialize fitting parameters
which generates a 2-row, 1-column vector instead of a classic 1-row, 2-column vector. So maybe I do not have to transpose it. But I am absolutely not sure about this assertion.
Is my intuition correct or what am I missing?
Your intuition is correct. Effectively it does not matter whether you perform the multiplication as theta.' * X or as X.' * theta, since this either generates a horizontal vector or a vertical vector of the hypothesis representing all observations, and what you're expected to do next is subtract the y vector from the hypothesis vector at each observation, and sum the results. So as long as y has the same orientation as your hypothesis and you subtract at each equivalent point, then the scalar end-result of the summation will be the same.
Often enough, you'll see the X.' * theta version preferred over theta.' * X purely for convenience, to avoid transposing over and over again just to be consistent with the mathematical notation. But this is fine, since the underlying math doesn't really change, only the order of equivalent operations.
I agree it's confusing though, both because it makes it harder to follow the formula when the code effectively looks like it's doing something else, and because it messes with the usual convention that a vertical vector represents 'coordinates' and a horizontal vector represents observations.
In such cases, especially in languages like MATLAB/Octave where the orientation of a vector isn't explicitly defined in the variable's type, it is doubly important to document what you expect the inputs to represent, and preferably there should have been assert statements in the code confirming the input has been passed in the correct orientation. Clearly they felt it wasn't necessary here, because this code is acting under controlled conditions in a predefined exercise environment, but it would have been good practice to do so from a software engineering point of view.
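To see this concretely, here is a minimal sketch (the numbers are made up) showing that the two orientations yield the same values, just transposed:
% X stores one observation per row, as in the exercise
X = [1 6.1101; 1 5.5277; 1 8.5186];
theta = [0.5; 0.25];                     % hypothetical non-zero parameters
h_vertical = X * theta;                  % m-by-1 column of hypotheses
h_horizontal = theta.' * X.';            % 1-by-m row: same values, transposed
disp(norm(h_vertical - h_horizontal.'))  % 0 up to round-off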

How to solve a linear system for only one component in MATLAB

I need to solve the linear system
A x = b
which can be done efficiently by
x = A \ b
But now A is very large and I actually only need one component, say x(1). Is there a way to solve this more efficiently than to compute all components of x?
A is not sparse. Here, efficiency is actually an issue because this is done for many b.
Also, storing the inverse of A and multiplying b by only its first row is not possible, because A is badly conditioned. Using the \ operator employs the LDL solver in this case, and accuracy is lost when the inverse is used explicitly.
I don't think you'd technically get a speed-up over the very optimized MATLAB routine. However, if you understand how the system is solved, you can solve for just one part of x. For example, a traditional QR solve finishes with back substitution, while an LU solve uses both back substitution and forward substitution; the same is true for LDL, which employs both. Unfortunately, back substitution starts at the end of x due to how it solves the triangular system, so the components that come cheaply are the last ones, not the first. That doesn't preclude the fact that there may be more efficient ways of solving whatever you have.
function [Q,R] = qrcgs(A)
% Classical Gram-Schmidt for an m x n matrix
[m,n] = size(A);
% Generates the Q, R matrices
Q = zeros(m,n);
R = zeros(n,n);
for k = 1:n
    % Assign the vector for normalization
    w = A(:,k);
    for j = 1:k-1
        % Gets R entries
        R(j,k) = Q(:,j)'*w;
    end
    for j = 1:k-1
        % Subtracts off orthogonal projections
        w = w - R(j,k)*Q(:,j);
    end
    % Normalize
    R(k,k) = norm(w);
    Q(:,k) = w./R(k,k);
end
end
function x = backsub(R,b)
% Back substitution for an upper triangular matrix.
[m,n] = size(R);
p = min(m,n);
x = zeros(n,1);
for i = p:-1:1
    % Work from the bottom row up, assigning entries of x
    r = b(i);
    for j = (i+1):p
        % Subtract off the already-computed terms
        r = r - R(i,j)*x(j);
    end
    x(i) = r/R(i,i);
end
end
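A minimal usage sketch tying the two routines together (the test system is arbitrary):
A = randn(5); b = randn(5,1);
[Q, R] = qrcgs(A);
x = backsub(R, Q.'*b);    % A = Q*R, so solve R*x = Q'*b
disp(norm(A*x - b))       % small residual
Note that backsub fills x(end) first, so the components that come cheaply are the last ones, not the first.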
The function mldivide, generally represented as \, accepts solving many systems with the same A at once:
x = A\[b1 b2 b3 b4] % where the bi are column vectors with n rows
This solves the system for each b and returns an n-by-4 matrix, where each column is the solution for the corresponding b. Calling mldivide like this should improve efficiency because the decomposition is only done once.
As in many decompositions like LU or LDL' (and in the one you are interested in, in particular), the matrix multiplying x is upper triangular, so the first value to be solved is x(n). However, once the LDL' decomposition has been done, a simple backward substitution algorithm won't be the bottleneck of the code. Therefore, the decomposition can be saved in order to avoid repeating the calculation for every bi. The code would look similar to this:
[LA,DA] = ldl(A);
DA = sparse(DA);
% LA = sparse(LA); %LA can also be converted to sparse matrix
% loop over bi
xi = LA'\(DA\(LA\bi));
% end loop
As you can see in the documentation of mldivide (Algorithms section), it performs some checks on the input matrices, and having defined LA as full and DA as sparse, it should go directly for a triangular solver and a tridiagonal solver. If LA were converted to sparse, it would use a triangular solver too; I don't know whether the conversion to sparse would represent any improvement.
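For reference, a runnable sketch of the loop above, assuming the right-hand sides bi are collected as the columns of a matrix B (the sizes are made up):
K = 500;
A = randn(K); A = A + A.';    % ldl requires a symmetric matrix
B = randn(K, 20);             % 20 right-hand sides
[LA, DA] = ldl(A);            % factor once: A = LA*DA*LA'
DA = sparse(DA);
X = zeros(K, size(B,2));
for i = 1:size(B,2)
    X(:,i) = LA' \ (DA \ (LA \ B(:,i)));
end
disp(norm(A*X - B))           % small residual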

Multiply coefficients into a matrix after meshgrid-ed in matlab

I have a 60x60 matrix A and two coefficients a, b. Since matrix A was moved by a and b, how do I multiply the coefficients into matrix A so that I obtain A_moved? Is there a function to do this?
Here's part of the MATLAB code implemented:
A=rand(60); %where it's in 2D, A(k1,k2)
a=0.5; b=0.8;
[m, n]=size(A);
[M,N] = meshgrid(1:m,1:n);
X = [M(:), N(:)];
A_moved=A(:)(X)*[a b] %I know this is not valid but you get the idea
In other words, A_moved is calculated by A_moved = a*k1 + b*k2.
The line of code A_moved=A(:)(X)*[a b] is meant to represent my idea that a and b multiply back into the original A, because X holds the corresponding coordinates k1 and k2: the first column represents k1 and the second column represents k2. Thus it becomes A_moved = a*k1 + b*k2. But this couldn't get me anywhere.
In the end, A_moved is a 60x60 matrix that has been multiplied by the coefficients a and b correspondingly. To make it clearer: A is the phase of an image, and a, b move its phase.
Appreciate any help. Thank you!
Reference paper: Here
EDIT:
As suggested by Noel, here is an example for better understanding.
Let A=[2 3; 5 7], a=1.5 and b=2.5.
Since A is approximated as a*k1 + b*k2, we get
A_moved = [1.5*k1_1+2.5*k2_1  1.5*k1_1+2.5*k2_2; 1.5*k1_2+2.5*k2_1  1.5*k1_2+2.5*k2_2];
where k1 and k2, if I understand correctly, are the coordinates of the original A matrix, as defined in X above.
In the chat we found that your problem was matrix-algebra related.
What you want to obtain in A_moved is the x coordinate multiplied by a constant a plus the y coordinate multiplied by a constant b.
You already have these coordinates in M and N, so you can obtain A_moved as
A_moved = (a*M) + (b*N);
And it will retain the same shape as A.
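As a minimal check with the 2x2 example from the edit (note that which of M and N plays the role of k1 depends on the meshgrid orientation):
A = [2 3; 5 7];               % only the size of A is actually used
a = 1.5; b = 2.5;
[m, n] = size(A);
[M, N] = meshgrid(1:m, 1:n);  % M(i,j) = j, N(i,j) = i
A_moved = a*M + b*N
% A_moved =
%     4.0000    5.5000
%     6.5000    8.0000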

Computing only necessary rows of a matrix product

Suppose I have a large (but possibly sparse) matrix A, which is K-by-K in dimension. I have another K-by-1 vector, b.
Let Ax=b. If I am only interested in the first n rows, where n < K, of x, then one way of dealing with this in MATLAB is to calculate x=A\b and take the first n elements.
If the dimension K is so large that the entire computation is infeasible, is there any other way to get these elements?
I guess one way would be to rearrange the columns of A and rows of x so that the elements you are interested in occur at the end of x. Then you would reduce [A, b] to row echelon form. Finally, to get the components you are after, you take the lower-right n-by-n submatrix of the modified A (let's call it An) and solve the reduced system An * xn = bn, where xn denotes the subvector of x that you are interested in, and bn denotes the last n rows of b after the row echelon reduction.
I mean, the conversion to echelon form is still expensive, but you don't need to solve for the rest of the components in x, which can save you time. A minimal sketch of this idea is shown below.
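Here is that sketch, using an LU factorization as a stand-in for the row echelon reduction: after permuting the wanted unknowns to the end, the trailing n equations of the triangular system involve only those unknowns (the sizes are made up):
K = 200; n = 5;
A = randn(K); b = randn(K,1);
perm = [n+1:K, 1:n];               % move the unknowns of interest to the end
[Lf, Uf, Pf] = lu(A(:, perm));     % Pf*A(:,perm) = Lf*Uf
y = Lf \ (Pf*b);                   % forward substitution
xn = Uf(end-n+1:end, end-n+1:end) \ y(end-n+1:end);  % trailing n-by-n solve
xref = A \ b;
disp(norm(xn - xref(1:n)))         % agrees up to round-off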
Just an idea: You could try to use Block Matrix inversion: if you block your matrix into A = [A11, A12;A21, A22], where A11 is n x n, you can compute the blocks of its inverse B = inv(A) = [B11, B12;B21, B22] via Block Matrix Inversion. There are different versions of it, you could use the one where the Schur complement you use is only of size n x n. I'm not quite sure whether it is possible to avoid any inversion that scales with K, but you could look into it.
Your solution is then x(1:n) = [B11, B12]*b. It saves you from ever computing B21, B22. Still, I'm not sure if it is worth it. Depends on the dimensions I guess.
Here is one version, though this still needs the inverse of A22 which is (K-n)x(K-n):
K = 100;
n = 10;
A = randn(K,K);
b = randn(K,1);
% reference version: full inverse
xfull = inv(A)*b;
% blocks of A
A11 = A(1:n,1:n);
A12 = A(1:n,n+1:K);
A21 = A(n+1:K,1:n);
A22 = A(n+1:K,n+1:K);
% blocks of inverse
A22i = inv(A22); % not sure if this can be avoided
B11 = inv(A11 - A12*A22i*A21);
B12 = -B11*A12*A22i;
% solution
x_n = [B11,B12]*b;
disp(x_n - xfull(1:n))
edit: Of course, this computes the inverse explicitly, and as such is probably much slower than just solving the linear system. It could be worth it if you had several vectors b to solve for with a fixed A.

Comparing two sets of vectors

I've got matrices A and B
size(A) = [n x]; size(B) = [n y];
Now I need to compute the Euclidean distance of each column vector of A from each column vector of B. I'm using the dist method right now:
Q = dist([A B]); Q = Q(1:x, x+1:end);
But it also does a lot of needless work (like calculating the distances among the vectors of A and among the vectors of B separately).
What is the best way to calculate this?
You are looking for pdist2.
% Compute the ordinary Euclidean distance
D = pdist2(A.',B.','euclidean'); % euclidean distance
You should take the transpose of the matrices since pdist2 assumes the observations are in rows, not in columns.
An alternative solution to pdist2, if you don't have the Statistics Toolbox, is to compute this manually. For example, one way to do it is:
[X, Y] = meshgrid(1:size(A, 2), 1:size(B, 2)); %// or meshgrid(1:x, 1:y)
Q = sqrt(sum((A(:, X(:)) - B(:, Y(:))) .^ 2, 1));
The indices of the columns from A and B for each value in vector Q can be obtained by computing:
[X(:), Y(:)]
where each row contains a pair of indices: the first is the column index in matrix A, and the second is the column index in matrix B.
Another solution, if you don't have pdist2, and one which may also be faster for very large matrices, is to vectorize the following mathematical fact:
||x-y||^2 = ||x||^2 + ||y||^2 - 2*dot(x,y)
where ||a|| is the L2-norm (euclidean norm) of a.
Comments:
C = -2*A'*B (this is an x-by-y matrix) is the vectorization of the dot products.
||x-y||^2 is the square of the euclidean distance which you are looking for.
Is that enough or do you need the explicit code?
The reason this may be faster is that you avoid doing the metric calculation explicitly for all x*y comparisons; instead you make the bottleneck a matrix multiplication, which is highly optimized in MATLAB. You are taking advantage of the fact that this is the Euclidean distance and not just some unknown metric.
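For completeness, here is a minimal sketch of the explicit code, reconstructed from the identity above (A is n-by-x and B is n-by-y, as in the question):
nA = sum(A.^2, 1).';                      % x-by-1 squared norms of A's columns
nB = sum(B.^2, 1);                        % 1-by-y squared norms of B's columns
D2 = bsxfun(@plus, nA, nB) - 2*(A.'*B);   % x-by-y matrix of squared distances
Q = sqrt(max(D2, 0));                     % clamp tiny negatives from round-off
On newer MATLAB versions, bsxfun can be replaced by implicit expansion: D2 = nA + nB - 2*(A.'*B).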