Gram-Schmidt QR factorisation doesn't return Q and R correctly - matlab

I have implemented a MATLAB function for Gram-Schmidt QR factorisation. Q's inverse should be equal to its inverse, but it's not, and I can't see why. I even tried somebody else's function, which is identical, and the result was the same. This is my function:
function [Q, R] = gramschmidt(A)
[n, n] = size(A);
for i = 1:n
    R(i,i) = norm( A(:, i) );
    Q(:, i) = A(:, i) / R(i, i);
    for j = i + 1 : n
        R(i, j) = Q(:, i)' * A(:, j);
        A(:, j) = A(:, j) - Q(:, i) * R(i, j);
    end
end

Firstly, I think what you meant to say is that Q's conjugate transpose should be equal to its inverse, i.e. that it is a unitary matrix.
Secondly, what makes you think that Q returned by your function is not unitary? Let's check.
A = randn(20,20);
[Q, R] = gramschmidt(A);
diff = @(X,Y) max(abs(X(:)-Y(:))); % element-wise max abs difference
diff(Q'*Q, eye(size(A)))
ans =
As you can see, it is unitary to a very good precision.
Also, just in case, Matlab has a built-in and efficient qr function that performs this decomposition, which also handles rectangular matrices, not just square ones like your implementation.
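For readers who want to repeat this check outside MATLAB, here is a minimal sketch in plain Python (standard library only). It mirrors the loop structure of the function in the question and then measures how far Q'*Q is from the identity and Q*R is from A. The helper names (transpose, matmul, gram_schmidt_qr, max_abs_diff) and the 6x6 test size are illustrative assumptions, not part of the original post.

```python
import random


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def gram_schmidt_qr(A):
    """Gram-Schmidt QR of a square real matrix, same loop order as the MATLAB code."""
    n = len(A)
    cols = transpose(A)                    # work on columns, like A(:, i)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = sum(v * v for v in cols[i]) ** 0.5
        q = [v / R[i][i] for v in cols[i]]
        q_cols.append(q)
        for j in range(i + 1, n):
            R[i][j] = sum(a * b for a, b in zip(q, cols[j]))
            cols[j] = [a - R[i][j] * b for a, b in zip(cols[j], q)]
    return transpose(q_cols), R


def max_abs_diff(X, Y):
    """Element-wise maximum absolute difference, like the diff handle above."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


random.seed(0)
n = 6
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
Q, R = gram_schmidt_qr(A)
I = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
orth_err = max_abs_diff(matmul(transpose(Q), Q), I)   # Q'*Q versus eye(n)
recon_err = max_abs_diff(matmul(Q, R), A)             # Q*R versus A
print(orth_err, recon_err)
```

Both numbers should be tiny (close to machine precision) for a well-conditioned random matrix; how small exactly depends on the conditioning of A.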


How can I express this large number of computations without for loops?

I work primarily in MATLAB but I think the answer should not be too hard to carry over from one language to another.
I have a multi-dimensional array X with dimensions [n, p, 3].
I would like to calculate the following multi-dimensional array.
T = zeros(p, p, p);
for i = 1:p
    for j = 1:p
        for k = 1:p
            T(i, j, k) = sum(X(:, i, 1) .* X(:, j, 2) .* X(:, k, 3));
        end
    end
end
The sum is of the elements of a length-n vector. Any help is appreciated!
You only need some permuting of dimensions and multiplication with singleton expansion:
T = sum(bsxfun(@times, bsxfun(@times, permute(X(:,:,1), [2 4 5 3 1]), permute(X(:,:,2), [4 2 5 3 1])), permute(X(:,:,3), [4 5 2 3 1])), 5);
From R2016b onwards, this can be written more easily as
T = sum(permute(X(:,:,1), [2 4 5 3 1]) .* permute(X(:,:,2), [4 2 5 3 1]) .* permute(X(:,:,3), [4 5 2 3 1]), 5);
As I mentioned in a comment, vectorization is not always a huge advantage any more. Therefore there are vectorization methods that slow down the code rather than speed it up. You must always time your solutions. Vectorization often involves the creation of large temporary arrays, or the copy of large amounts of data, which are avoided in loop code. It depends on the architecture, the size of the input, and many other factors if such a solution is going to be faster.
Nonetheless, in this case it seems vectorization approaches can yield a large speedup.
The first thing to notice about the original code is that X(:, i, 1) .* X(:, j, 2) gets re-computed in the inner loop, though it is a constant value there. Rewriting the inner loop as this will save time:
Y = X(:, i, 1) .* X(:, j, 2);
for k = 1:p
    T(i, j, k) = sum(Y .* X(:, k, 3));
end
Now we notice that the inner loop is a dot product, and can be written as follows:
Y = X(:, i, 1) .* X(:, j, 2);
T(i, j, :) = Y.' * X(:, :, 3);
The .' transpose on Y does not copy the data, as Y is a vector. Next, we notice that X(:, :, 3) is indexed repeatedly. Let's move this out of the outer loop. Now I'm left with the following code:
T = zeros(p, p, p);
X1 = X(:, :, 1);
X2 = X(:, :, 2);
X3 = X(:, :, 3);
for i = 1:p
    for j = 1:p
        Y = X1(:, i) .* X2(:, j);
        T(i, j, :) = Y.' * X3;
    end
end
It is likely that removing the loop over j is equally easy, which would leave a single loop over i. But this is where I stop.
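The refactoring steps above can be sanity-checked on a small array. The following is a rough sketch in plain Python (nested lists instead of MATLAB arrays, so it says nothing about speed); it only confirms that hoisting the i-j product and turning the k loop into a vector-matrix product gives the same T as the triple loop. The function names triple_naive and triple_hoisted and the sizes n=4, p=3 are made up for this illustration.

```python
import random


def triple_naive(X, n, p):
    """T[i][j][k] = sum over a of X[a][i][0] * X[a][j][1] * X[a][k][2]."""
    T = [[[0.0] * p for _ in range(p)] for _ in range(p)]
    for i in range(p):
        for j in range(p):
            for k in range(p):
                T[i][j][k] = sum(
                    X[a][i][0] * X[a][j][1] * X[a][k][2] for a in range(n)
                )
    return T


def triple_hoisted(X, n, p):
    """Same T: the i-j product is formed once, the k loop becomes y' * X3."""
    # n-by-p slice X(:, :, 3), extracted once outside the loops
    X3 = [[X[a][k][2] for k in range(p)] for a in range(n)]
    T = [[None] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            y = [X[a][i][0] * X[a][j][1] for a in range(n)]
            # row vector times matrix: one dot product per column of X3
            T[i][j] = [sum(y[a] * X3[a][k] for a in range(n)) for k in range(p)]
    return T


random.seed(1)
n, p = 4, 3
X = [[[random.gauss(0, 1) for _ in range(3)] for _ in range(p)] for _ in range(n)]
T1 = triple_naive(X, n, p)
T2 = triple_hoisted(X, n, p)
max_diff = max(
    abs(T1[i][j][k] - T2[i][j][k])
    for i in range(p) for j in range(p) for k in range(p)
)
print(max_diff)
```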
These are the timings I see (R2017a, 3-year old iMac with 4 cores). For n=10, p=20:
original: 0.0206
moving Y out the inner loop: 0.0100
removing inner loop: 0.0016
moving indexing out of loops: 7.6294e-04
Luis' answer: 1.9196e-04
For a larger array with n=50, p=100:
original: 2.9107
moving Y out the inner loop: 1.3488
removing inner loop: 0.0910
moving indexing out of loops: 0.0361
Luis' answer: 0.1417
"Luis' answer" is the permute-based one-liner shown earlier. It is by far the fastest for small arrays, but for larger arrays it shows the cost of the permutation. Moving the computation of the first product out of the inner loop saves a bit over half the computation cost. But removing the inner loop reduces the cost quite dramatically (which I hadn't expected; I presume the single matrix product can use parallelism better than the many small element-wise products). We then get a further time reduction by reducing the number of indexing operations within the loop.
This is the timing code:
function so()
n = 10; p = 20;
%n = 50; p = 100;
X = randn(n,p,3);
T1 = method1(X);
T2 = method2(X);
T3 = method3(X);
T4 = method4(X);
T5 = method5(X);

% original triple loop
function T = method1(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
    for j = 1:p
        for k = 1:p
            T(i, j, k) = sum(X(:, i, 1) .* X(:, j, 2) .* X(:, k, 3));
        end
    end
end

% moving Y out of the inner loop
function T = method2(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
    for j = 1:p
        Y = X(:, i, 1) .* X(:, j, 2);
        for k = 1:p
            T(i, j, k) = sum(Y .* X(:, k, 3));
        end
    end
end

% removing the inner loop
function T = method3(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
    for j = 1:p
        Y = X(:, i, 1) .* X(:, j, 2);
        T(i, j, :) = Y.' * X(:, :, 3);
    end
end

% moving indexing out of the loops
function T = method4(X)
p = size(X,2);
T = zeros(p, p, p);
X1 = X(:, :, 1);
X2 = X(:, :, 2);
X3 = X(:, :, 3);
for i = 1:p
    for j = 1:p
        Y = X1(:, i) .* X2(:, j);
        T(i, j, :) = Y.' * X3;
    end
end

% Luis' answer
function T = method5(X)
T = sum(permute(X(:,:,1), [2 4 5 3 1]) .* permute(X(:,:,2), [4 2 5 3 1]) .* permute(X(:,:,3), [4 5 2 3 1]), 5);
You have mentioned you are open to other languages, and NumPy's syntax is very close to MATLAB's, so here is a NumPy-based solution.
Now, these tensor-related sum-reductions, especially the matrix-multiplication ones, are easily expressed in Einstein notation, and NumPy luckily has a function for exactly that: np.einsum. Under the hood, it's implemented in C and is pretty efficient. Recently it's been optimized further to leverage BLAS-based matrix-multiplication implementations.
So, a translation of the stated code into NumPy territory, keeping in mind that it follows 0-based indexing and that the axes are visualized differently than the dimensions in MATLAB, would be -
import numpy as np
# X is a NumPy array of shape : (n,p,3). So, a random one could be
# generated with : `X = np.random.rand(n,p,3)`.
T = np.zeros((p, p, p))
for i in range(p):
    for j in range(p):
        for k in range(p):
            T[i, j, k] = np.sum(X[:, i, 0] * X[:, j, 1] * X[:, k, 2])
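The einsum way to solve it would be -
T = np.einsum('ia,ib,ic->abc', X[...,0], X[...,1], X[...,2])
To leverage matrix-multiplication, use the optimize flag -
T = np.einsum('ia,ib,ic->abc', X[...,0], X[...,1], X[...,2], optimize=True)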
Timings (with large sizes)
In [27]: n,p = 100,100
...: X = np.random.rand(n,p,3)
In [28]: %%timeit
...: T = np.zeros((p, p, p))
...: for i in range(p):
...:     for j in range(p):
...:         for k in range(p):
...:             T[i, j, k] = np.sum(X[:, i, 0] * X[:, j, 1] * X[:, k, 2])
1 loop, best of 3: 6.23 s per loop
In [29]: %timeit np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2])
1 loop, best of 3: 353 ms per loop
In [31]: %timeit np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2],optimize=True)
100 loops, best of 3: 10.5 ms per loop
In [32]: 6230.0/10.5
Out[32]: 593.3333333333334
Around 600x speedup there!

Compute weighted summation of matrix power (matrix polynomial) in Matlab

Given an nxn matrix A_k and an nx1 vector x, is there any smart way to compute
J = \sum_{i=1}^{n} x_i A_k^i
using Matlab? The x_i are the elements of the vector x, therefore J is a sum of matrices. So far I have used a for loop, but I was wondering if there was a smarter way.
Short answer: you can use the builtin matlab function polyvalm for matrix polynomial evaluation as follows:
x = x(end:-1:1); % flip the order of the elements
x(end+1) = 0; % append 0
J = polyvalm(x, A);
Long answer: Matlab uses a loop internally, so you don't gain much from the builtin; it can even perform worse than your own implementation once you optimise that (see my calcJ_loopOptimised function):
% construct random input
n = 100;
A = rand(n);
x = rand(n, 1);
% calculate the result using different methods
Jbuiltin = calcJ_builtin(A, x);
Jloop = calcJ_loop(A, x);
JloopOptimised = calcJ_loopOptimised(A, x);
% check if the functions are mathematically equivalent (should be in the order of `eps`)
relativeError1 = max(max(abs(Jbuiltin - Jloop)))/max(max(Jbuiltin))
relativeError2 = max(max(abs(Jloop - JloopOptimised)))/max(max(Jloop))
% measure the execution time
t_loopOptimised = timeit(@() calcJ_loopOptimised(A, x))
t_builtin = timeit(@() calcJ_builtin(A, x))
t_loop = timeit(@() calcJ_loop(A, x))
% check if builtin function is faster
builtinFaster = t_builtin < t_loopOptimised
% calculate J using Matlab builtin function
function J = calcJ_builtin(A, x)
    x = x(end:-1:1);
    x(end+1) = 0;
    J = polyvalm(x, A);
end
% naive loop implementation
function J = calcJ_loop(A, x)
    n = size(A, 1);
    J = zeros(n,n);
    for i=1:n
        J = J + A^i * x(i);
    end
end
% optimised loop implementation (cache result of matrix power)
function J = calcJ_loopOptimised(A, x)
    n = size(A, 1);
    J = zeros(n,n);
    A_ = eye(n);
    for i=1:n
        A_ = A_*A;
        J = J + A_ * x(i);
    end
end
For n=100, I get the following:
t_loopOptimised = 0.0077
t_builtin = 0.0084
t_loop = 0.0295
For n=5, I get the following:
t_loopOptimised = 7.4425e-06
t_builtin = 4.7399e-05
t_loop = 1.0496e-04
Note that my timings fluctuate somewhat between different runs, but the optimised loop is almost always faster (up to 6x for small n) than the builtin function.
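As a side note on the structure of the sum, here is a small plain-Python sketch (standard library only, not MATLAB) that compares the cached-power loop with a Horner-style evaluation, J = A*(x_1*I + A*(x_2*I + ... + A*(x_n*I))). The function names matpoly_cached_powers and matpoly_horner and the 2x2 example are assumptions for illustration; no claim is made here about which variant is faster in MATLAB, so time it if that matters.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def matpoly_cached_powers(A, x):
    """J = sum_i x_i * A^i, keeping a running power of A."""
    n = len(A)
    power = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    J = [[0.0] * n for _ in range(n)]
    for xi in x:
        power = matmul(power, A)          # power is now A^i
        J = [[J[r][c] + xi * power[r][c] for c in range(n)] for r in range(n)]
    return J


def matpoly_horner(A, x):
    """Same J via Horner's scheme, one matrix product per coefficient."""
    n = len(A)
    J = [[0.0] * n for _ in range(n)]
    for xi in reversed(x):
        for r in range(n):
            J[r][r] += xi                 # J <- J + xi * I
        J = matmul(A, J)                  # J <- A * J
    return J


A = [[0.5, 0.1], [0.2, 0.3]]
x = [1.0, 2.0]                            # J = 1*A + 2*A^2
J_loop = matpoly_cached_powers(A, x)
J_horner = matpoly_horner(A, x)
print(J_loop)
print(J_horner)
```

For this example both results agree with the hand-computed value [[1.04, 0.26], [0.52, 0.52]] up to rounding.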

Finding matrix inverse by Gaussian Elimination With Partial Pivoting

Hello guys, I am writing a program to compute the determinant (this part I already did) and the inverse of a matrix with GEPP. Here a problem arises, since I have no idea how to invert a matrix using GEPP; I only know how to invert using Gaussian elimination ([A|I]=>[I|B]). I have searched the internet but still have no clue. Could you please explain it to me?
Here is my matlab code (maybe someone will find it useful), as of now it solves AX=b and computes determinant:
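function [det1, X] = gauss_czesciowy(A, b)
[m, n] = size(A);
if m ~= n || length(b) ~= n
    error('vector has wrong size');
end
sgn = 1; % sign of the row permutation (each swap flips the determinant's sign)
for j = 1:n
    % choice of main element
    p = j;
    for i = j:n
        if abs(A(i,j)) >= abs(A(p,j))
            p = i;
        end
    end
    if A(p,j) == 0
        error('Matrix A is singular');
    end
    % rows permutation
    if p ~= j
        t = A(p,:);
        A(p,:) = A(j,:);
        A(j,:) = t;
        t = b(p);
        b(p) = b(j);
        b(j) = t;
        sgn = -sgn;
    end
    % reduction
    for i = j+1:n
        t = (A(i,j)/A(j,j));
        A(i,:) = A(i,:)-A(j,:)*t;
        b(i) = b(i)-b(j)*t;
    end
end
% determinant: product of the pivots times the permutation sign
det1 = sgn;
for i=1:n
    det1 = det1 * A(i,i);
end
% solution (back substitution, from the last row upwards)
X = zeros(1,n);
X(n) = b(n)/A(n,n);
if (det1~=0)
    for i = n-1:-1:1
        s = sum( A(i, (i+1):n) .* X((i+1):n) );
        X(i) = (b(i) - s) / A(i,i);
    end
end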
Here is the algorithm for Gaussian elimination with partial pivoting. Basically you do Gaussian elimination as usual, but at each step you exchange rows to pick the pivot with the largest absolute value available.
To get the inverse, you have to keep track of how you are switching rows and create a permutation matrix P. The permutation matrix is just the identity matrix of the same size as your A-matrix, but with the same row switches performed. Then you have:
[A] --> GEPP --> [B] and [P]
[A]^(-1) = [B]*[P]
I would try this on a couple of matrices just to be sure.
EDIT: Rather than empirically testing this, let's reason it out. Basically what you are doing when you switch rows in A is you are multiplying it by your permutation matrix P. You could just do this before you started GE and end up with the same result, which would be:
[P*A|I] --> GE --> [I|B] or
(P*A)^(-1) = B
Due to the properties of the inverse operation, this can be rewritten:
A^(-1) * P^(-1) = B
And you can multiply both sides by P on the right to get:
A^(-1) * P^(-1)*P = B*P
A^(-1) * I = B*P
A^(-1) = B*P
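To make the relation A^(-1) = B*P concrete, here is a minimal plain-Python sketch (standard library only). It runs Gauss-Jordan elimination with partial pivoting on the augmented matrix [A|I] and also records the permutation matrix P. Note one detail: if the row swaps are applied to the whole augmented matrix, as done here, the right-hand block is already A^(-1); the product B*P is what you need when B is the inverse of the pre-permuted matrix P*A. The function name gepp_inverse and the 3x3 example are assumptions for illustration only.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def gepp_inverse(A):
    """Return (inverse of A, permutation matrix P) using partial pivoting on [A|I]."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(A)]
    perm = list(range(n))                 # perm[r] = original row now at position r
    for j in range(n):
        # pick the row with the largest absolute value in column j
        p = max(range(j, n), key=lambda r: abs(M[r][j]))
        if M[p][j] == 0.0:
            raise ValueError("matrix is singular")
        if p != j:
            M[j], M[p] = M[p], M[j]
            perm[j], perm[p] = perm[p], perm[j]
        pivot = M[j][j]
        M[j] = [v / pivot for v in M[j]]
        # eliminate column j from every other row (Gauss-Jordan)
        for i in range(n):
            if i != j and M[i][j] != 0.0:
                f = M[i][j]
                M[i] = [vi - f * vj for vi, vj in zip(M[i], M[j])]
    inverse = [row[n:] for row in M]
    P = [[1.0 if c == perm[r] else 0.0 for c in range(n)] for r in range(n)]
    return inverse, P


A = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 0.0],
     [2.0, 0.0, 3.0]]
inv_A, P = gepp_inverse(A)
B, _ = gepp_inverse(matmul(P, A))         # B = (P*A)^(-1)
B_times_P = matmul(B, P)                  # should equal A^(-1)
check = matmul(A, inv_A)                  # should be the identity
print(inv_A)
print(B_times_P)
```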

matlab generate fixed degree undirected graph

I would like to generate the adjacency matrix of an undirected graph with N nodes.
In particular, this graph should have a fixed degree (each node is connected to a fixed number of nodes, d).
If I set d = N-1, the solution is trivial:
A = ones(N) - eye(N);
How can I generalize it for any d?
Here is a solution (thanks to Oli Charlesworth):
function A = fixedDegreeGraph(N, d)
A = zeros(N);
for i=1:N
    b = i;
    f = i;
    % floor(d/2) neighbours on each side, so an odd d gives degree d-1
    for k=1:floor(d/2)
        f = f + 1;
        if (f == N + 1)
            f = 1;
        end
        A(i, f) = 1;
        A(f, i) = 1;
        b = b - 1;
        if (b == 0)
            b = N;
        end
        A(i, b) = 1;
        A(b, i) = 1;
    end
end
For even d, here's a way to visualise the approach.
Draw the vertices out arranged in a circle.
Each vertex is connected to its immediate (d/2) left-hand neighbours, and its immediate (d/2) right-hand neighbours.
It should be fairly obvious how to turn this into an adjacency matrix (hint: it will be a circulant matrix, so you may find the toeplitz function useful).
Extending this to odd d is not much harder... (although note there is no solution if both N and d are odd)
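The circle construction, including one possible extension to odd d, can be sketched in a few lines of plain Python. For odd d this sketch additionally connects each vertex to the one directly opposite on the circle, which needs N to be even; that particular choice is an assumption of this example (other constructions exist), and the function name fixed_degree_graph is made up here.

```python
def fixed_degree_graph(N, d):
    """Adjacency matrix (list of rows) of a d-regular circulant graph on N nodes."""
    if not 0 <= d < N:
        raise ValueError("need 0 <= d < N")
    if N % 2 == 1 and d % 2 == 1:
        raise ValueError("no d-regular graph exists when N and d are both odd")
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        # d//2 nearest neighbours going one way round the circle;
        # symmetry fills in the neighbours on the other side
        for k in range(1, d // 2 + 1):
            j = (i + k) % N
            A[i][j] = A[j][i] = 1
        if d % 2 == 1:
            # odd d (N is even here): also link to the opposite vertex
            j = (i + N // 2) % N
            A[i][j] = A[j][i] = 1
    return A


A = fixed_degree_graph(8, 3)
for row in A:
    print(row)
```

Every row of the result sums to d, the matrix is symmetric, and the diagonal is zero, which is easy to verify for a few (N, d) pairs.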

Lagrange interpolation method

I use convolution and for loops (too many for loops) for calculating the interpolation using
Lagrange's method; here's the main code:
function [p] = lagrange_interpolation(X,Y)
n = length(X);
L = zeros(n);
p = zeros(1,n);
% computing the L matrix, so that each row i holds the polynomial L_i
% Now we compute li(x) for i=0....n, and we build the polynomial
for k=1:n
    multiplier = 1;
    outputConv = ones(1,1);
    for index = 1:n
        if(index ~= k && X(index) ~= X(k))
            outputConv = conv(outputConv,[1,-X(index)]);
            multiplier = multiplier * ((X(k) - X(index))^-1);
        end
    end
    polynimialSize = length(outputConv);
    for index = 1:polynimialSize
        L(k,n - index + 1) = outputConv(polynimialSize - index + 1);
    end
    L(k,:) = multiplier .* L(k,:);
end
% continues
That is too many for loops for computing the l_i(x) (this is done before the last calculation of P_n(x) = Sigma of y_i * l_i(x)).
Any suggestions for making it more idiomatic MATLAB?
Yeah, several suggestions (implemented in version 1 below): the if can be combined with the for loop above it (just make index skip k via something like jr(jr~=j) below); polynomialSize is always equal to length(outputConv), which is always equal to n (because you have n data points, hence an (n-1)th-order polynomial with n coefficients), so the last for loop and the next line can also be replaced with a simple L(k,:) = multiplier * outputConv;
So I replicated the example on Wikipedia (and adopted their j-m notation, but for me j goes 1:n and m is 1:n and m~=j), hence my initialization looks like
clear; clc;
X=[-9 -4 -1 7]; %example taken from Wikipedia
Y=[ 5 2 -2 9];
n=length(X); %Lagrange basis polynomials are (n-1)th order, have n coefficients
lj = zeros(1,n); %storage for numerator of Lagrange basis polynomials - each w/ n coeff
Lj = zeros(n); %matrix of Lagrange basis polynomial coeffs (lj(x))
L = zeros(1,n); %the Lagrange polynomial coefficients (L(x))
then v 1.0 looks like
jr=1:n; %j-range: 1<=j<=n
for j=jr %my j is your k
    multiplier = 1;
    outputConv = 1; %numerator of lj(x)
    mr=jr(jr~=j); %m-range: 1<=m<=n, m~=j
    for m = mr %my m is your index
        outputConv = conv(outputConv,[1 -X(m)]);
        multiplier = multiplier * ((X(j) - X(m))^-1);
    end
    Lj(j,:) = multiplier * outputConv; %jth Lagrange basis polynomial lj(x)
end
L = Y*Lj; %coefficients of Lagrange polynomial L(x)
which can be further simplified if you realize that numerator of l_j(x) is just a polynomial with specific roots - for that there is a nice command in matlab - poly. Similarly the denominator is just that polyn evaluated at X(j) - for that there is polyval. Hence, v 1.9:
jr=1:n; %j-range: 1<=j<=n
for j=jr
    mr=jr(jr~=j); %m-range: 1<=m<=n, m~=j
    lj=poly(X(mr)); %numerator of lj(x)
    mult=1/polyval(lj,X(j)); %1/denominator of lj(x)
    Lj(j,:) = mult * lj; %jth Lagrange basis polynomial lj(x)
end
L = Y*Lj; %coefficients of Lagrange polynomial L(x)
Why version 1.9 and not 2.0? well, there is probably a way to get rid of this last for loop, and write it all in 1 line, but I can't think of it right now - it's a todo for v 2.0 :)
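For anyone porting version 1.9 to another language, here is a rough plain-Python equivalent (standard library only). The helpers poly_from_roots and polyval are hand-written stand-ins for MATLAB's poly and polyval, and lagrange_coeffs is a hypothetical name; coefficients are stored highest power first, as in MATLAB.

```python
def poly_from_roots(roots):
    """Coefficients (highest power first) of the monic polynomial with these roots."""
    coeffs = [1.0]
    for r in roots:
        # multiply by (x - r): x*c(x) minus r*c(x), aligned by power
        coeffs = [a - r * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs


def polyval(coeffs, x):
    """Evaluate a polynomial (highest power first) with Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def lagrange_coeffs(X, Y):
    """Coefficients of the Lagrange interpolating polynomial through (X, Y)."""
    n = len(X)
    L = [0.0] * n
    for j in range(n):
        others = [X[m] for m in range(n) if m != j]
        lj = poly_from_roots(others)      # numerator of l_j(x)
        denom = polyval(lj, X[j])         # numerator evaluated at X[j]
        for k in range(n):
            L[k] += Y[j] * lj[k] / denom
    return L


X = [-9.0, -4.0, -1.0, 7.0]
Y = [5.0, 2.0, -2.0, 9.0]
L = lagrange_coeffs(X, Y)
print(L)
print([polyval(L, x) for x in X])         # reproduces Y up to rounding
```

Expanding the polynomial into monomial coefficients like this is convenient for a handful of points, but it can lose accuracy as the number of nodes grows, so it is worth checking the result against the data points.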
And, for dessert, if you want to get the same picture as wikipedia:
hold on
hold off
xlim([-10 10])
ylim([-10 10])
grid on
enjoy and feel free to reuse/improve
Take X=0:1/20:1; Y=cos(X), then create L and apply polyval(L,1).
Why is there a huge difference?