I am writing the following code for Gram-Schmidt orthogonalization. MATLAB says that there's an error in calling the function. What's the error and how do I rectify it?
A = [1,1,1,1; -1,4,4,-1; 4,-2,2,0];
A = A';
B = myGramschmidt(A);
function [B] = myGramschmidt(A)
    x1 = A(:,1);
    x2 = A(:,2);
    x3 = A(:,3);
    v1 = x1;
    c = dot(v1);
    v2 = x2 - ((dot(x2,v1)/c) * v1);
    d = dot(v2);
    v3 = x3 - ((dot(x3,v1)/c) * v1) - ((dot(x3,v2)/d) * v2);
    C = [v1, v2, v3];
    V1 = normc(v1);
    V2 = normc(v2);
    V3 = normc(v3);
    B = [V1, V2, V3];
end
As Luis Mendo pointed out, the error is in how you call dot: MATLAB's dot requires two input vectors, so c = dot(v1) and d = dot(v2) fail.
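The two offending lines should therefore read:

c = dot(v1, v1);   % dot needs two arguments; this is the squared norm of v1
d = dot(v2, v2);

That said, here is a more general implementation, based on the Wikipedia Gram-Schmidt page: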
function [B] = myGramschmidt(A)
    B = A;
    for k = 1:size(A, 1)
        for j = 1:k-1
            B(k, :) = B(k, :) - proj(B(j, :), A(k, :));
        end
    end
end

function p = proj(u, v)
    % https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process#The_Gram.E2.80.93Schmidt_process
    % projection of v onto u
    p = dot(v, u) / dot(u, u) * u;
end
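For example, with the matrix from the question (note that this version treats the rows of A as the vectors, so skip the transpose):

A = [1,1,1,1; -1,4,4,-1; 4,-2,2,0];
B = myGramschmidt(A);   % the rows of B are mutually orthogonal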
Try this vectorized implementation in Python. I would also suggest going through David C. Lay's book for the theory.
import numpy as np

def replace_zero(array):
    for i in range(len(array)):
        if array[i] == 0:
            array[i] = 1
    return array

def gram_schmidt(A, norm=True, row_vect=False):
    """Orthonormalizes vectors by the Gram-Schmidt process

    Parameters
    ----------
    A : ndarray
        Matrix having vectors in its columns
    norm : bool
        Do you need normalized vectors?
    row_vect : bool
        Does matrix A have its vectors in its rows?

    Returns
    -------
    G : ndarray
        Matrix of orthogonal vectors

    Gram-Schmidt Process
    --------------------
    The Gram-Schmidt process is a simple algorithm for
    producing an orthogonal or orthonormal basis for any
    nonzero subspace of R^n.
    Given a basis {x1, ..., xp} for a nonzero subspace W of R^n,
    define

        v1 = x1
        v2 = x2 - (x2.v1/v1.v1) * v1
        v3 = x3 - (x3.v1/v1.v1) * v1 - (x3.v2/v2.v2) * v2
        ...
        vp = xp - (xp.v1/v1.v1) * v1 - (xp.v2/v2.v2) * v2
                - ... - (xp.v(p-1)/v(p-1).v(p-1)) * v(p-1)

    Then {v1, ..., vp} is an orthogonal basis for W.
    In addition,
        Span {v1, ..., vk} = Span {x1, ..., xk} for 1 <= k <= p

    References
    ----------
    Linear Algebra and Its Applications - David C. Lay
    """
    if row_vect:
        # if True, transpose it to make a column-vector matrix
        A = A.T
    no_of_vectors = A.shape[1]
    G = A[:, 0:1].copy()  # copy the first vector of the matrix
    # 0:1 keeps the dimensions consistent - [[1, 2, 3]]
    # iterate from the 2nd vector to the last
    for i in range(1, no_of_vectors):
        # calculate the weights (coefficients) for every vector in G
        numerator = A[:, i].dot(G)
        denominator = np.diag(np.dot(G.T, G))  # elements on the diagonal
        weights = np.squeeze(numerator / denominator)
        # projection of the vector onto the subspace spanned by G
        projected_vector = np.sum(weights * G, axis=1, keepdims=True)
        # component orthogonal to the subspace spanned by G
        orthogonalized_vector = A[:, i:i+1] - projected_vector
        # add the orthogonal vector to our set
        G = np.hstack((G, orthogonalized_vector))
    if norm:
        # to get orthonormal vectors (unit orthogonal vectors),
        # replace zeros by 1 to avoid division by 0 when the matrix
        # contains a zero vector or a normalization value comes out zero
        G = G / replace_zero(np.linalg.norm(G, axis=0))
    if row_vect:
        return G.T
    return G
G = np.array([[1,0,0],[1,1,0],[1,1,1],[1,1,1]])
gram_schmidt(G)

array([[ 0.5       , -0.8660254 ,  0.        ],
       [ 0.5       ,  0.28867513, -0.81649658],
       [ 0.5       ,  0.28867513,  0.40824829],
       [ 0.5       ,  0.28867513,  0.40824829]])
I have an n x p matrix - mX which is composed of n points in R^p.
I have another m x p matrix - mY which is composed of m reference points in R^p.
I would like to create an n x m matrix - mD which is the Mahalanobis Distance matrix.
D(i, j) means the Mahalanobis Distance between point i in mX, mX(i, :) and point j in mY, mY(j, :).
Namely, it computes the following:
mD(i, j) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).';
Where mC is the given Mahalanobis Distance PSD Matrix.
It is easy to do in a loop, but is there a way to vectorize it?
Namely, is there a function whose inputs are mX, mY and mC and whose output is mD, fully vectorized, without using any MATLAB toolbox?
Thank you.
Approach #1
Assuming infinite resources, here's one vectorized solution using bsxfun and matrix-multiplication -
A = reshape(bsxfun(@minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),n,m);
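Since only the diagonal is needed, a less memory-hungry equivalent of that last step (my own variant, not from the original answer) is:

out = reshape(sum((A*inv(mC)).*A, 2), n, m);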
Approach #2
Here's a compromise solution trying to reduce the loop complexity -
A = reshape(bsxfun(@minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
imC = inv(mC);
out = zeros(n*m,1);
for ii = 1:n*m
    out(ii) = A(ii,:)*imC*A(ii,:).';
end
out = reshape(out,n,m);
Sample run -
>> n = 3; m = 4; p = 5;
mX = rand(n,p);
mY = rand(m,p);
mC = rand(p,p);
imC = inv(mC);
>> %// Original solution
for i = 1:n
    for j = 1:m
        mD(i, j) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).';
    end
end
>> mD
mD =
-8.4256 10.032 2.8929 7.1762
-44.748 -4.3851 -13.645 -9.6702
-4.5297 3.2928 0.11132 2.5998
>> %// Approach #1
A = reshape(bsxfun(@minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),n,m);
>> out
out =
-8.4256 10.032 2.8929 7.1762
-44.748 -4.3851 -13.645 -9.6702
-4.5297 3.2928 0.11132 2.5998
>> %// Approach #2
A = reshape(bsxfun(@minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
imC = inv(mC);
out1 = zeros(n*m,1);
for ii = 1:n*m
    out1(ii) = A(ii,:)*imC*A(ii,:).';
end
out1 = reshape(out1,n,m);
>> out1
out1 =
-8.4256 10.032 2.8929 7.1762
-44.748 -4.3851 -13.645 -9.6702
-4.5297 3.2928 0.11132 2.5998
If instead you had:
mD(j, i) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).';
the solutions would translate to the versions listed next.
Approach #1
A = reshape(bsxfun(@minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),m,n);
Approach #2
A = reshape(bsxfun(@minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
imC = inv(mC);
out = zeros(m*n,1);
for ii = 1:n*m
    out(ii) = A(ii,:)*imC*A(ii,:).';
end
out = reshape(out,m,n);
Sample run -
>> n = 3; m = 4; p = 5;
mX = rand(n,p); mY = rand(m,p); mC = rand(p,p); imC = inv(mC);
>> %// Original solution
for i = 1:n
    for j = 1:m
        mD(j, i) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).';
    end
end
>> mD
mD =
0.81755 0.33205 0.82254
1.7086 1.3363 2.4209
0.36495 0.78394 -0.33097
0.17359 0.3889 -1.0624
>> %// Approach #1
A = reshape(bsxfun(@minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),m,n);
>> out
out =
0.81755 0.33205 0.82254
1.7086 1.3363 2.4209
0.36495 0.78394 -0.33097
0.17359 0.3889 -1.0624
>> %// Approach #2
A = reshape(bsxfun(@minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
imC = inv(mC);
out1 = zeros(m*n,1);
for i = 1:n*m
    out1(i) = A(i,:)*imC*A(i,:).';
end
out1 = reshape(out1,m,n);
>> out1
out1 =
0.81755 0.33205 0.82254
1.7086 1.3363 2.4209
0.36495 0.78394 -0.33097
0.17359 0.3889 -1.0624
Here is one solution that eliminates one of the loops:
function d = mahalanobis(mX, mY)
    n = size(mX, 2);
    m = size(mY, 2);
    data = [mX, mY];
    mc = cov(transpose(data));   % note: covariance estimated from the data itself
    dist = zeros(n, m);
    for i = 1 : n
        diff = repmat(mX(:,i), 1, m) - mY;
        dist(i,:) = sum((mc\diff).*diff, 1);
    end
    d = sqrt(dist);
end
You would invoke it as:
d = mahalanobis(transpose(X),transpose(Y))
Reduce to L2
It seems that Mahalanobis Distance can be reduced to ordinary L2 distance if you are allowed to preprocess matrix mC and you are not afraid of numerical differences.
First of all, compute Cholesky decomposition of mC:
mR = chol(mC) % C = R^t * R, where R is upper-triangular
Now we can use these factors to reformulate Mahalanobis Distance:
(Xi - Yj) * inv(C) * (Xi - Yj)^t = || (Xi - Yj) * inv(R) ||^2 = || TXi - TYj ||^2
where:  TXi = Xi * inv(R)
        TYj = Yj * inv(R)
So the idea is to transform points Xi, Yj to TXi, TYj first, and then compute euclidean distances between them. Here is the algorithm outline:
Compute mR - Cholesky factor of covariance matrix mC (takes O(p^3) time).
Invert triangular matrix mR (takes O(p^3) time).
Multiply both mX and mY by inv(mR) on the right (takes O(p^2 (m+n)) time).
Compute squared L2 distances between pairs of points (takes O(m n p) time).
Total time is O(m n p + (m + n) p^2 + p^3), versus the original O(m n p^2). It should work faster when 1 << p << n,m. In that case step 4 takes most of the time and should be vectorized.
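A minimal MATLAB sketch of this reduction (assuming mC is symmetric positive definite, as chol requires):

mR = chol(mC);       % mC = mR.' * mR, with mR upper-triangular
TX = mX / mR;        % same as mX * inv(mR), but more stable
TY = mY / mR;
% squared L2 distances between all pairs of rows, giving the n x m matrix mD
mD = bsxfun(@plus, sum(TX.^2, 2), sum(TY.^2, 2).') - 2 * TX * TY.';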
Vectorization
I have little experience with MATLAB, but quite a lot with SIMD vectorization on x86 CPUs. In raw computations, it would be enough to vectorize along one sufficiently large array dimension and write trivial loops for the other dimensions.
If you expect p to be large enough, it is probably OK to vectorize along the coordinates of the points and keep two nested loops over i <= n and j <= m. That's similar to what @Daniel posted.
If p is not sufficiently large, you can vectorize along one of the point sequences instead. This is similar to the solution posted by @dpmcmlxxvi: subtract a single row of one matrix from all the rows of the second matrix, then compute the squared norms of the resulting rows. Repeat n times (or m times), as sketched below.
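A minimal sketch of that row-wise scheme, in the notation of the question (my own illustration, assuming the rows of mX and mY are the points):

imC = inv(mC);
mD = zeros(n, m);
for i = 1:n
    Dif = bsxfun(@minus, mX(i,:), mY);     % m x p differences to all rows of mY
    mD(i,:) = sum((Dif*imC).*Dif, 2).';    % squared Mahalanobis distances
end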
As for me, full vectorization (which means rewriting with matrix operations instead of loops in MATLAB) does not sound like a clever performance goal. Most likely partially vectorized solutions would be optimally fast.
I came to the conclusion that vectorizing this problem is not efficient. My best idea for vectorizing it would require m x n x p x p working memory, at least if everything is processed at once. This means that with n = m = p = 152, the code would already require 4 GB of RAM. At these dimensions, my system can run the loop in less than a second:
mD = zeros(size(mX,1), size(mY,1));
ImC = inv(mC);
for i = 1:size(mX,1)
    for j = 1:size(mY,1)
        d = mX(i, :) - mY(j, :);
        mD(i, j) = d * ImC * d.';
    end
end
I need to create an arbitrary vector n with components (a, b, c) perpendicular to another known vector k with components (x, y, z).
The following code creates an arbitrary vector n, but I need the random components to range over (-inf, inf). How can I achieve that? (Otherwise the vector components created may not exceed some value, in the given case 10^11.) Or maybe the concept of an "arbitrary vector" does not require that?
function [a,b,c] = randomOrthogonalVector(x,y,z)
    a = 0;
    b = 0;
    c = 0;
    randomDistr = rand * 10^11 * 2 - 10^11; % issue 1
    % excluding the trivial solution
    if x == 0 && y == 0 && z == 0
        a = NaN; b = a; c = a;
    else
        if z ~= 0
            a = randomDistr;
            b = randomDistr;
            c = - (x * a + b * y) / z;
        else
            if z == 0 && x ~= 0
                c = randomDistr;
                b = randomDistr;
                a = - (z * c + b * y) / x;
            else
                if z == 0 && x == 0 && y ~= 0
                    c = randomDistr;
                    a = randomDistr;
                    b = - (z * c + a * x) / y;
                end
            end
        end
    end
end
The easiest solution I see is to first find a random vector that is orthogonal to your original vector, and then give it a random length. In Matlab, this can be done by defining the following function
function [a, b, c] = orthoVector(x, y, z)
    xin = [x; y; z];
    e = xin;
    while ((e'*xin) == xin'*xin)
        e = 2.*rand(3,1) - 1;
    end
    xout = cross(xin, e);
    xout = 1.0/(rand()) * xout;
    a = xout(1);
    b = xout(2);
    c = xout(3);
end
Line by line, here's what I'm doing:
1. You asked for the format [a,b,c] = f(x,y,z). I would recommend using function xout = orthoVector(xin) instead, which would make this code even shorter.
2. Since Matlab handles vectors best as vectors, I'm creating the vector xin.
3. e will be a random vector, different from xin, used to compute the orthogonal vector. Since we're dealing with random vectors, we initialize it to be equal to xin.
4. For this algorithm to work, we need to make sure that e and xin are pointing in different directions. Until this is the case...
5. ...create a new random vector e. Note that rand gives values between 0 and 1, so each component of 2.*rand(3,1)-1 lies between -1 and 1.
6. When the loop ends, e and xin are pointing in different directions.
7. Our vector xout = cross(xin, e) will be orthogonal to both xin and e.
8. Let's multiply the vector xout by a random number between 1 and "very large".
9. a is the first component of xout.
10. b is the second component of xout.
11. c is the third component of xout.
12. All done.
Optional: if you want to have very large vectors, you could replace line 8 by
xout = exp(1./rand())/(rand()) * xout;
This will give you a very large spread of values.
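A quick sanity check of the result (with hypothetical input values):

[a, b, c] = orthoVector(1, 2, 3);
dot([a; b; c], [1; 2; 3])   % should be 0 up to round-off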
Hope this helps, cheers!
Could the following equation be solved in MATLAB?
Suppose we are given data of length N, and we want to consider a linear equation among some L of the numbers and find the coefficients ai. Is this possible? If yes, then the coefficients can be solved by
a = pinv(D)*d
where D is a given matrix and d is the vector of left-hand sides.
The above equation comes from the following linear model, for k = L, L+1, ..., N-1:
y[k] = a1*y[k-1] + a2*y[k-2] + ... + aL*y[k-L]
I have tested this code with some fixed f.
function [a] = find_coefficient(y,N,L)
    Lp = L + 1;
    Np = N + 1;
    d = y(L:N-1);
    D = [];
    for ii = Lp:(Np-1)
        % Index into the y vector for each row of D
        D = vertcat(D, y(ii:-1:(ii-Lp+1))');
    end
    a = D\d;
end
Is it correct?
This is absolutely possible in MATLAB. However, 0-based indexing is not natively supported. You will need to do a "change of variables" by letting the index of each element be index+1. Here's a bit of an example:
% Generate some data
N = 40;
y = 10 * randn(N,1);
% Select an L value
L = N - 4 + 1;
d = y(L:N);
D = reshape(y,4,10);
% Solve the equation using the '\' rather than the pseudo inverse
b = D\d
For more information on the divide operator, see Systems of Linear Equations.
OK, I've thought through this a bit more. Part of the confusion here is the change of variable limits. The substitution applies to the indexing variable, not the size of the data, so L and N are unchanged, but the index is adjusted to keep it from falling off the edge of the array. So in the formula, just add 1 to every element index.
y[L]   = [ y[L-1] y[L-2] ... y[0]     ] * [a1; a2; ...; aL]
  .
  .
y[N-1] = [ y[N-2] y[N-3] ... y[N-L-1] ] * [a1; a2; ...; aL]

becomes:

y[L+1] = [ y[L]   y[L-1] ... y[1]   ] * [a1; a2; ...; aL]
  .
  .
y[N]   = [ y[N-1] y[N-2] ... y[N-L] ] * [a1; a2; ...; aL]
Which we can then use to complete our script:
function a = find_coefficient(y,N,L)
    d = y((L+1):N);
    D = [];
    for ii = L:(N-1)
        % index into the y vector for each row of D
        D = vertcat(D, y(ii:-1:(ii-L+1))');
    end
    a = D\d;
end
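A quick usage example with synthetic (hypothetical) data:

N = 40;
y = randn(N, 1);                  % synthetic data
a = find_coefficient(y, N, 4);    % fit L = 4 coefficients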
Suppose we are given a training dataset {yᵢ, xᵢ}, for i = 1, ..., n, where yᵢ can either be -1 or 1 and xᵢ can be e.g. a 2D or 3D point.
In general, when the input points are linearly separable, the SVM model can be defined as follows
min 1/2*||w||²
w,b
subject to the constraints (for i = 1, ..., n)
yᵢ*(w*xᵢ - b) >= 1
This is often called the hard-margin SVM model, which is thus a constrained minimization problem, where the unknowns are w and b. We can also omit 1/2 in the function to be minimized, given it's just a constant.
Now, the documentation about Matlab's quadprog states
x = quadprog(H, f, A, b) minimizes 1/2*x'*H*x + f'*x subject to the restrictions A*x ≤ b. A is a matrix of doubles, and b is a vector of doubles.
We can implement the hard-margin SVM model using quadprog function, to get the weight vector w, as follows
H becomes an identity matrix.
f becomes a vector of zeros.
A is the left-hand side of the constraints.
b is a vector of -1s, because the original constraint had >= 1 and becomes <= -1 when we multiply both sides by -1 (a sketch of this setup follows).
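A minimal sketch of that hard-margin setup (my own illustration, not from the original question; the data is hypothetical, rows of X are the points, y holds the labels, and the unknown vector is z = [w; b] with b left unpenalized):

% hypothetical toy data: rows of X are points, y in {-1, +1}
X = [2 2; 3 1; -2 -2; -1 -3];
y = [1; 1; -1; -1];
[n, p] = size(X);
H = diag([ones(1, p), 0]);   % quadratic term: (1/2)*||w||^2, b unpenalized
f = zeros(p + 1, 1);         % no linear term
A = [-diag(y) * X, y];       % encodes -y_i*(w*x_i - b) <= -1
bineq = -ones(n, 1);
z = quadprog(H, f, A, bineq);
w = z(1:p);
b = z(p + 1);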
Now, I am trying to implement a soft-margin SVM model. The minimization equation here is
min (1/2)*||w||² + C*(∑ ζᵢ)
w,b
subject to the constraints (for i = 1, ..., n)
yᵢ*(w*xᵢ - b) >= 1 - ζᵢ
such that ζᵢ >= 0, where ∑ is the summation symbol, ζᵢ = max(0, 1 - yᵢ*(w*xᵢ - b)) and C is a hyper-parameter.
How can this optimization problem be solved using the Matlab's quadprog function? It's not clear to me how the equation should be mapped to the parameters of the quadprog function.
The "primal" form of the soft-margin SVM model (i.e. the definition above) can be converted to a "dual" form. I did that, and I am able to get the Lagrange variable values (in the dual form). However, I would like to know if I can use quadprog to solve directly the primal form without needing to convert it to the dual form.
I don't see how it can be a problem. Let z be our vector of (2n + 1) variables:
z = (w, eps, b)
Then, H becomes a diagonal matrix with the first n values on the diagonal equal to 1 and the last n + 1 set to zero:
H = diag([ones(1, n), zeros(1, n + 1)])
Vector f can be expressed as:
f = [zeros(1, n), C * ones(1, n), 0]'
The first set of constraints, rewritten as -yᵢ*(w*xᵢ - b) - ζᵢ <= -1, becomes:
Aineq = [A1, -eye(n), y]
bineq = -ones(n, 1)
where A1 is the same matrix as in the hard-margin form and y is the column vector of labels.
The second set of constraints (ζᵢ >= 0) becomes lower bounds:
lb = [-inf(n, 1); zeros(n, 1); -inf]
Then you can call MATLAB:
z = quadprog(H, f, Aineq, bineq, [], [], lb);
P.S. I can be mistaken in some small details, but the general idea is right.
I wanted to clarify @vharavy's answer because you could get lost while trying to deduce what 'n' means in his code. Here is my version, according to his answer and the SVM Wikipedia article. I assume we have a file named "test.dat" which holds the coordinates of test points and their class membership in the last column.
Example content of "test.dat" with 3D points:
-3,-3,-2,-1
-1,3,2,1
5,4,1,1
1,1,1,1
-2,5,4,1
6,0,1,1
-5,-5,-3,-1
0,-6,1,-1
-7,-2,-2,-1
Here is the code:
data = readtable("test.dat");
tableSize = size(data);
numOfPoints = tableSize(1);
dimension = tableSize(2) - 1;
PointsCoords = data(:, 1:dimension);
PointsSide = data.(dimension+1);
C = 0.5;            % can be changed
n = dimension;
m = numOfPoints;    % can also be interpreted as the number of constraints
% z = [w, eps, b]; the number of variables in z is n + m + 1
H = diag([ones(1, n), zeros(1, m + 1)]);
f = [zeros(1, n), C * ones(1, m), 0];
Aineq = [-diag(PointsSide)*table2array(PointsCoords), -eye(m), PointsSide];
bineq = -ones(m, 1);
lb = [-inf(1, n), zeros(1, m), -inf];
z = quadprog(H, f, Aineq, bineq, [], [], lb);
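After solving, the pieces of z can be unpacked following the layout defined above:

w = z(1:n);             % weight vector
eps = z(n+1 : n+m);     % slack variables
b = z(n+m+1);           % bias; with this formulation a new point x0 is classified with sign(x0*w - b)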
Let z = (w; w0; eps) be the long vector with n + 1 + m elements (m is the number of points). Then:
H = diag([ones(1, n), zeros(1, m + 1)])
f = [zeros(1, n + 1), C * ones(1, m)]'
The inequality constraints can be specified as:
A = -diag(y) * [X', ones(m, 1), zeros(m, m)] - [zeros(m, n + 1), eye(m)]
b = -ones(m, 1)
where X is the n x m input matrix from the primal form and y is the vector of labels. Of the two terms of A, the first covers w and w0, and the second covers eps.
The equality constraints:
Aeq = zeros(1, n + 1 + m)
beq = 0
Bounds:
lb = [-inf * ones(n + 1, 1); zeros(m, 1)]
ub = inf * ones(n + 1 + m, 1)
Now, z = quadprog(H, f, A, b, Aeq, beq, lb, ub)
Complete code. The idea is the same as above.
n = size(X,1);                   % number of points (rows of X)
m = size(X,2);                   % dimension
H = diag([ones(1, m), zeros(1, n + 1)]);
f = [zeros(1, m+1), c*ones(1, n)]';   % c is the hyper-parameter C
p = diag(Y) * X;
A = -[p, Y, eye(n)];
B = -ones(n, 1);
lb = [-inf * ones(m+1, 1); zeros(n, 1)];
z = quadprog(H, f, A, B, [], [], lb);
w = z(1:m);                      % weight vector
b = z(m+1);                      % bias
eps = z(m+2 : m+n+1);            % slack variables
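For a hypothetical end-to-end run, define the inputs first and then execute the snippet above; with this answer's conventions, a new point x0 is classified with sign(x0*w + b):

X = [2 2; 3 1; -2 -2; -1 -3];   % rows are points
Y = [1; 1; -1; -1];             % labels in {-1, +1}
c = 0.5;                        % the hyper-parameter C
% ... run the code above, then:
x0 = [1 2];
predicted = sign(x0 * w + b)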
for i = 0:255
    m(i+1) = sum((0:i)' .* p(1:i+1));
end
Can anyone explain what is happening? p is an array of 256 elements, the same size as m.
p = (0:255)';
m = zeros(1, 256);
for i = 0:255
    m(i+1) = sum((0:i)' .* p(1:i+1));
end
m[i+1] contains the scalar product of [0,1,2,..,i] with (p[1],...,p[i+1])
You can write it as:
p = (0:255);
m = zeros(1, 256);
for i = 0:255
    m(i+1) = sum((0:i) .* p(1:i+1));
end
Or:
p = (0:255);
m = zeros(1, 256);
for i = 0:255
    m(i+1) = (0:i) * p(1:i+1)';
end
In case you don't recall, that is the definition of the scalar product.
Whatever p is, you can calculate m by:
dm = (0 : length(p) - 1)' .* p(:);   % process as a column vector
m = cumsum(dm);
Hint: write the formula for m[n], then for m[n-1], then subtract to get:
m[n] - m[n-1] = (n - 1) * p[n]
and this is dm.
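A quick check that the vectorized version matches the loop (with random p):

p = rand(256, 1);
m_loop = zeros(1, 256);
for i = 0:255
    m_loop(i+1) = sum((0:i)' .* p(1:i+1));
end
m_vec = cumsum((0:length(p)-1)' .* p(:)).';
max(abs(m_loop - m_vec))   % should be ~0 (round-off only)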