Feature mapping using multi-variable polynomial - matlab

Suppose we have a data matrix of data points and want to map those points into a higher-dimensional feature space. We can do this with polynomials of degree d, so that a sequence of data points yields a new data matrix whose columns are the polynomial terms.
I have studied a relevant script (from Andrew Ng's online course) that performs such a transform for 2-dimensional data points. However, I could not figure out how to generalize it to samples of arbitrary higher dimension. Here is the code:
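d = 6;
m = size(D,1);
x1 = D(:,1); x2 = D(:,2); % assumed: x1 and x2 are the two columns of D
new = ones(m,1); % column of ones (ones(m) would create an m-by-m matrix)
for k = 1:d
    for l = 0:k
        new(:, end+1) = (x1.^(k-l)).*(x2.^l);
    end
end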
Can this code be vectorized? Also, given a data matrix, could you suggest how to transform data points of arbitrary dimension to a higher-dimensional space using a degree-d polynomial?
PS: A generalization to d-dimensional data points would be very helpful.

This solution can handle k variables and generate all the terms of a degree d polynomial where k and d are non-negative integers. Most of the code length is due to the combinatoric complexity of generating all the terms of a degree d polynomial in k variables.
It takes an n_obs by k data matrix X where n_obs is the number of observations and k is the number of variables.
Helper function
This function generates all possible rows of length n_numbers such that every entry is a non-negative integer and the row sums to d. Each row holds the exponents of one polynomial term; for example,
the row [0, 1, 3, 0, 1] corresponds to (x1^0)*(x2^1)*(x3^3)*(x4^0)*(x5^1)
The function (which almost certainly could be written more efficiently) is:
function result = mg_sums(n_numbers, d)
    if(n_numbers<=1)
        result = d;
    else
        result = zeros(0, n_numbers);
        for(i = d:-1:0)
            rc = mg_sums(n_numbers - 1, d - i);
            result = [result; i * ones(size(rc,1), 1), rc];
        end
    end
end
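As a possible alternative to the recursion, the same exponent rows can be produced with the "stars and bars" construction and the built-in nchoosek. This is only a sketch and not part of the original answer: the names k, d, bars, padded and expo are illustrative, it assumes k >= 2, and the rows may come out in a different order than mg_sums produces.

```matlab
% Sketch: all exponent rows of length k that sum to d (assumes k >= 2).
k = 3;  % number of variables (illustrative value)
d = 2;  % degree of the terms (illustrative value)

% Choose k-1 "bar" positions among d+k-1 slots; the gaps between
% consecutive bars are the exponents.
bars = nchoosek(1:(d + k - 1), k - 1);
n_terms = size(bars, 1);

% Add sentinel bars at positions 0 and d+k, then measure the gaps.
padded = [zeros(n_terms, 1), bars, (d + k) * ones(n_terms, 1)];
expo = diff(padded, 1, 2) - 1;

disp(expo)  % each row sums to d; there are nchoosek(d+k-1, k-1) rows
```

Stacking the result for d = 1:max_degree would then play the role of the stacked matrix built in the initialization code.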
Initialization code
n_obs = 1000; % number of observations
n_vars = 3; % number of variables
max_degree = 4; % order of polynomial
X = rand(n_obs, n_vars); % generate random, strictly positive data
stacked = zeros(0, n_vars); % this will collect the exponent rows of all terms
for(d = 1:max_degree) % for degree 1 up to degree max_degree
    stacked = [stacked; mg_sums(n_vars, d)];
end
Final Step: Method 1
newX = zeros(size(X,1), size(stacked,1));
for(i = 1:size(stacked,1))
    accumulator = ones(n_obs, 1);
    for(j = 1:n_vars)
        accumulator = accumulator .* X(:,j).^stacked(i,j);
    end
    newX(:,i) = accumulator;
end
Use either method 1 or method 2.
Final Step: Method 2 (requires that all data in the data matrix X be strictly positive; if X has 0 elements, log returns -Inf, which does not propagate properly through the matrix multiplication)
newX = real(exp(log(X) * stacked')); % multiplying log of data matrix by the
% matrix of all possible exponent combinations
% effectively raises terms to powers and multiplies them!
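If the data may contain zeros or negative values, one further option is to vectorize Method 1 directly instead of going through log and exp. The following is a sketch rather than part of the original answer: X and stacked are small hard-coded illustrative values, and Xp and Ep are hypothetical helper names. It builds an n_obs-by-n_terms-by-n_vars intermediate array, so it trades memory for the absence of loops.

```matlab
% Sketch: vectorized Method 1 that also works for zero or negative data.
X = [2 3 5; 0 -1 2];                            % 2 observations, 3 variables
stacked = [1 0 0; 0 1 0; 0 0 1; 2 0 0; 1 1 0];  % exponent rows (illustrative subset)

Xp = permute(X, [1 3 2]);        % n_obs x 1       x n_vars
Ep = permute(stacked, [3 1 2]);  % 1     x n_terms x n_vars

% Raise every variable to every exponent, then multiply across variables.
newX = prod(bsxfun(@power, Xp, Ep), 3);  % n_obs x n_terms

disp(newX)
```

For large n_obs and many terms the three-dimensional intermediate can become large, in which case the loop of Method 1 is likely the safer choice.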
Example Run
X = [2, 3, 5];
max_degree = 3;
The rows of the stacked matrix, the polynomial term each row represents, and the value of that term for X = [2, 3, 5] are:
stacked   term          value
1 0 0     x1            2
0 1 0     x2            3
0 0 1     x3            5
2 0 0     x1.^2         4
1 1 0     x1.*x2        6
1 0 1     x1.*x3        10
0 2 0     x2.^2         9
0 1 1     x2.*x3        15
0 0 2     x3.^2         25
3 0 0     x1.^3         8
2 1 0     x1.^2.*x2     12
2 0 1     x1.^2.*x3     20
1 2 0     x1.*x2.^2     18
1 1 1     x1.*x2.*x3    30
1 0 2     x1.*x3.^2     50
0 3 0     x2.^3         27
0 2 1     x2.^2.*x3     45
0 1 2     x2.*x3.^2     75
0 0 3     x3.^3         125
If data matrix X is [2, 3, 5] this correctly generates:
newX = [2, 3, 5, 4, 6, 10, 9, 15, 25, 8, 12, 20, 18, 30, 50, 27, 45, 75, 125];
Where the 1st column is x1, 2nd is x2, 3rd is x3, 4th is x1.^2, 5th is x1.*x2 etc...

Related

What does std(A, 0, 3) mean?

I am running the following code, which generates ten 4 x 4 matrices with random values.
A = zeros(4,4,10);
for idx = 1:size(A,3)
    A(:,:,idx) = [1 2 3 4; 5 6 7 8; 9 10 11 12; 0 0 0 1].*randn(4,4);
end
X = std(A, 0, 3)
X = std(A, 0, 1) gives the standard deviation of each column and
X = std(A, 0, 2) gives the standard deviation of each row.
What does X = std(A, 0, 3) give?
I am getting a 4x4 matrix value answer as follows
4.0479 2.7137 1.8706 1.2579
4.9812 9.0766 7.2079 4.1866
1.0548 2.7205 3.3140 3.8712
0 0 0 0.8496
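X = std(A, 0, 3) is the standard deviation along the third dimension: each element X(i,j) is the standard deviation of the ten values A(i,j,1), ..., A(i,j,10), which is why the result is 4x4.
The 0 argument is the weight, which selects the normalization. With 0, the denominator is N-1.
If you use 1, the denominator is N.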
From the documentation:
w - Weight
0 (default) | 1 | vector
Weight, specified as one of these values:
0 - Normalize by N-1, where N is the number of observations. If there is only one observation, then the weight is 1.
1 - Normalize by N.
Vector made up of nonnegative scalar weights corresponding to the dimension of A along which the standard deviation is calculated.
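To make the meaning of the third-dimension argument concrete, the result can be reproduced by hand. This is a sketch for illustration only; dev, manual, X1 and manual1 are names introduced here, and bsxfun is used so that the subtraction does not rely on implicit expansion.

```matlab
% Sketch: std(A, 0, 3) computed by hand along the third dimension.
A = randn(4, 4, 10);   % ten 4 x 4 matrices of random values
N = size(A, 3);        % number of observations per (row, column) position

X = std(A, 0, 3);      % 4 x 4 result, normalized by N-1

% Deviations from the mean taken across the ten slices.
dev = bsxfun(@minus, A, mean(A, 3));
manual = sqrt(sum(dev.^2, 3) / (N - 1));

% With weight 1 the denominator is N instead of N-1.
X1 = std(A, 1, 3);
manual1 = sqrt(sum(dev.^2, 3) / N);

disp(max(abs(X(:) - manual(:))))    % close to zero
disp(max(abs(X1(:) - manual1(:))))  % close to zero
```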

How to reduce coefficients to their lowest possible integers using Matlab - Balancing Chemical Equations

I am attempting to develop a Matlab program to balance chemical equations. I am able to balance them via solving a system of linear equations. Currently my output is a column vector with the coefficients.
My problem is that I need to return the smallest integer values of these coefficients. For example, if [10, 20, 30] were returned, I would want [1, 2, 3] instead.
What is the best way to accomplish this?
I want this program to be fully autonomous once it is fed a matrix with the linear system, so I cannot adjust the values by hand; this has to be automated in the code. Thanks!
% Chemical Equation in Matrix Form
Chem = [1 0 0 -1 0 0 0; 1 0 1 0 0 -3 0; 0 2 0 0 -1 0 0; 0 10 0 0 0 -1 0; 0 35 4 -4 0 12 1; 0 0 2 -1 -3 0 2]
%set x4 = 1 then Chem(:, 4) = b and
b = Chem(:, 4); % Arbitrarily set x4 = 1 and set its column equal to b
Chem(:,4) = [] % Delete the x4 column from Chem and shift over
g = 1; % Initialize variable for LCM
x = Chem\b % This is equivalent to the reduced row echelon form of
% Chem | b
% Below is my sad attempt at factoring the values, I divide by the smallest decimal to raise all the values to numbers greater than or equal to 1
for n = 1:numel(x)
    g = x(n)*g
    M = -min(abs(x))
    y = x./M
end
I want code that will take some vector with coefficients, and return an equivalent coefficient vector with the lowest possible integer coefficients. Thanks!
I was able to find a solution without using integer programming. I converted the non-integer values to rational expressions and used the built-in MATLAB function rat to extract the denominator of each one. I then used the built-in function lcm to find the least common multiple of these denominators. Finally, I multiplied the solution vector by this least common multiple to obtain the integer coefficients.
% Chemical Equation in Matrix Form
clear, clc
% Enter chemical equation as a linear system in matrix form as Chem
Chem = [1 0 0 -1 0 0 0; 1 0 1 0 0 -3 0; 0 2 0 0 -1 0 0; 0 10 0 0 0 -1 0; 0 35 4 -4 0 -12 -1; 0 0 2 -1 -3 0 -2];
% row reduce the system
C = rref(Chem);
% parametrize the system by setting the last variable xend (e.g. x7) = 1
x = [C(:,end);1];
% extract numerator and denominator from the rational expressions of these
% values
[N,D] = rat(x);
% take the least common multiple of the first pair, set this to the
% variable least
least = lcm(D(1),D(2));
% loop through taking the lcm of the previous values with the next value
% through x
for n = 3:numel(x)
    least = lcm(least,D(n));
end
% give answer as column vector with the coefficients (now factored to their
% lowest possible integers)
coeff = abs(least.*x)
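The code above clears fractions, which is enough there because the last variable is fixed to 1. For a general coefficient vector such as the [10, 20, 30] example in the question, a common factor may remain and has to be divided out with gcd. The following is a hedged sketch of both steps together; the input x and the names scale, ints and g are illustrative, and it assumes the entries of x are close enough to rational numbers for rat to recover them.

```matlab
% Sketch: scale a coefficient vector to its lowest positive integers.
x = [2.5; 5; 7.5];          % illustrative input (equivalent to [1; 2; 3])

% Step 1: clear fractions with the lcm of the denominators.
[~, D] = rat(x);            % x is approximately N ./ D
scale = 1;
for n = 1:numel(D)
    scale = lcm(scale, D(n));
end
ints = round(abs(x) * scale);   % integer-valued: [5; 10; 15]

% Step 2: divide out the greatest common divisor.
g = 0;                      % gcd(0, a) is a, so 0 is a neutral start
for n = 1:numel(ints)
    g = gcd(g, ints(n));
end
coeff = ints / g;

disp(coeff)
```

The round call guards against tiny floating-point errors after scaling; it is not a substitute for checking that the system really has a rational solution.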

Creating new matrix from combining two other by using an if-statement in MATLAB

I have a matrix of dimension 8x6. Half of the elements in this matrix are 0, which is totally fine. Now, I would like to refer to another matrix which is 160x6. The 8x6 matrix is based on the 160x6 matrix and results from a rolling window (20 observations).
I would like to create a new matrix (again 160x6). Whenever an element in my 8x6 matrix equals 0, I would like the 20 observations from the original 160x6 matrix referring to this element (being 0) to be 0 as well.
I have tried the following:
for t=1:T
    for i=1:N
        if B(:,i) == 0;
            C(t,i) = 0;
        else
            C(t,i) = A(t,i);
        end
    end
end
where I have:
A being the 160x6 matrix
B being the 8x6 matrix
C being the new output as 160x6 matrix
At the moment, I obtain a "new" 160x6 matrix (C), but it exactly replicates the original 160x6 matrix (A). So the looping or the if statement is incorrect.
I will give a small example based on my understanding of your problem.
>> B = randi(10,8,6) - 5; % Sample B matrix
B =
-4 0 -4 4 5 1
-2 3 3 1 3 -4
-1 3 1 -3 1 4
2 5 0 -2 0 4
-3 4 5 4 -4 3
3 -1 2 -4 2 -3
-3 2 2 0 -4 2
2 -3 4 -3 -4 1
In this matrix you want to identify the locations that contain 0, e.g. (1,2), (4,3), (4,5), and set the corresponding blocks of the 160 x 6 matrix, here (1:20,2), (61:80,3), (61:80,5), to zero. You can use the repelem function to get such indices.
>> zeroIdx = repelem(B == 0,20,1)
zeroIdx contains true wherever B is zero, with each row repeated twenty times.
>> C = A
>> C(zeroIdx) = 0 % Assign 0 to C using zeroIdx
Check the following:
%Initialize A matrix with ones for testing.
A = ones(160, 6);
B = ones(8, 6);
%Put few zeros in B
B(1:2:end,1) = 0;
B(5:3:end,3) = 0;
T = 160;
N = 6;
for t=1:T
    for i=1:N
        %The formula k = floor((t-1)/20)+1 equals 1, 1, 1, 1... 20 times, then 2, 2, 2, 2... 20 times
        k = floor((t-1)/20)+1;
        if B(k,i) == 0;
            C(t,i) = 0;
        else
            C(t,i) = A(t,i);
        end
    end
end
%Display C as an image (for testing).
figure;imagesc(C);colormap gray
Image for testing result:
Values of k are demonstrated in the following graph:
T=160;t = 1:T;k = floor((t-1)/20)+1;figure;plot(t, k, 'x');grid on;
Most compact solution I could achieve:
C = A.*imresize((B ~= 0), size(A), 'nearest');
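If imresize is not available (it belongs to the Image Processing Toolbox), a similar one-line mask can be built with kron. This is a sketch under the same block interpretation as the answers above, where each row of B covers 20 consecutive rows of A; the test data and the names win and mask are illustrative.

```matlab
% Sketch: block-wise masking with kron, no toolbox needed.
A = reshape(1:960, 160, 6);   % illustrative data without any zeros
B = ones(8, 6);
B(1, 2) = 0;                  % should zero out A(1:20, 2)
B(4, 3) = 0;                  % should zero out A(61:80, 3)

win = size(A, 1) / size(B, 1);              % 20 rows of A per row of B
mask = kron(double(B ~= 0), ones(win, 1));  % 160 x 6, each row of B repeated 20 times
C = A .* mask;

disp(nnz(C == 0))   % 40 entries were zeroed
```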

Matlab operation without loops

I have a matrix distance = [d11,d12,d13,d14,d15; ...; dn1,dn2,dn3,dn4,dn5]; and a vector index(n,1). The values of index are between 1 and 5.
I want to get the sums of the distances according to the index, in a vector R(1,5).
An example :
distance = [1,2,4,1,2 ; 4,5,6,1,6 ; 7,8,9,5,8] and index = [1;1;3]
So, I want R(1) = 1+4 = 5, R(2) = 0, R(3) = 9, and R(4) = R(5) = 0
The condition is to not use a loop over 1:5 with an if condition, in order to minimise the execution time when there are billions of points.
Maybe it is possible with arrayfun, but I have not succeeded.
Best regards
To find out which elements you have to sum up, you can use bsxfun to create a matrix containing a 1 if the value is relevant, and a 0 otherwise. This can be done with
bsxfun(@eq, index, 1:5)
which will create a vector [1, 2, 3, 4, 5] and do an element-wise comparison between index and that vector. The result of this function is
ans =
1 0 0 0 0
1 0 0 0 0
0 0 1 0 0
Now you can multiply this matrix with the distance matrix (element-wise!) and finally sum over each column:
>> R = sum(bsxfun(@eq, index, 1:5) .* distance, 1);
which results in
R =
5 0 9 0 0
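For very large n, the n-by-5 comparison matrix can be avoided. One possible alternative, sketched here and not part of the original answer, picks the single relevant entry of each row with sub2ind and sums the picked values per index with accumarray. The names n and picked are illustrative.

```matlab
% Sketch: per-index sums with accumarray instead of an n-by-5 mask.
distance = [1,2,4,1,2; 4,5,6,1,6; 7,8,9,5,8];
index = [1; 1; 3];
n = size(distance, 1);

% For each row i, pick the single entry distance(i, index(i)).
picked = distance(sub2ind(size(distance), (1:n).', index));

% Sum the picked values into 5 bins; bins that never occur stay 0.
R = accumarray(index, picked, [5 1]).';

disp(R)   % 5 0 9 0 0
```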

Finding the column index for the 1 in each row of a matrix

I have the following matrix in Matlab:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
Each row has exactly one 1. How can I (without looping) determine a column vector so that the first element is a 2 if there is a 1 in the second column, the second element is a 3 for a one in the third column etc.? The above example should turn into:
M = [ 3
1
2
1
3];
You can actually solve this with simple matrix multiplication.
result = M * (1:size(M, 2)).';
3
1
2
1
3
This works by multiplying your M x 3 matrix with a 3 x 1 array where the elements of the 3x1 are simply [1; 2; 3]. Briefly, for each row of M, element-wise multiplication is performed with the 3 x 1 array. Only the 1's in the row of M will yield anything in the result. Then the result of this element-wise multiplication is summed. Because you only have one "1" per row, the result is going to be the column index where that 1 is located.
So for example for the first row of M.
element_wise_multiplication = [0 0 1] .* [1 2 3]
[0, 0, 3]
sum(element_wise_multiplication)
3
Update
Based on the solutions provided by @rayryeng and @Luis below, I decided to run a comparison to see how the various methods perform.
To set up the test matrix (M), I created a matrix of the form specified in the original question and varied the number of rows. Which column had the 1 was chosen randomly using randi([1 nCols], size(M, 1)). Execution times were analyzed using timeit.
When run using M of type double (MATLAB's default) you get the following execution times.
If M is a logical, then the matrix multiplication takes a hit due to the fact that it has to be converted to a numerical type prior to matrix multiplication, whereas the other two have a bit of a performance improvement.
Here is the test code that I used.
sizes = round(linspace(100, 100000, 100));
times = zeros(numel(sizes), 3);
for k = 1:numel(sizes)
    M = generateM(sizes(k));
    times(k,1) = timeit(@()M * (1:size(M, 2)).');
    M = generateM(sizes(k));
    times(k,2) = timeit(@()max(M, [], 2), 2);
    M = generateM(sizes(k));
    times(k,3) = timeit(@()find(M.'), 2);
end
figure
plot(sizes, times * 1000); % timeit returns seconds; convert to milliseconds
legend({'Multiplication', 'Max', 'Find'})
xlabel('Number of rows in M')
ylabel('Execution Time (ms)')
function M = generateM(nRows)
    M = zeros(nRows, 3);
    col = randi([1 size(M, 2)], 1, size(M, 1));
    M(sub2ind(size(M), 1:numel(col), col)) = 1;
end
You can also abuse find and observe the row positions of the transpose of M. You have to transpose the matrix first as find operates in column major order:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
[out,~] = find(M.');
Not sure if this is faster than matrix multiplication though.
Yet another approach: use the second output of max:
[~, result] = max(M.', [], 1);
Or, as suggested by @rayryeng, use max along the second dimension instead of transposing M:
[~, result] = max(M, [], 2);
For
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
this gives
result =
3 1 2 1 3
If M contains more than one 1 in a given row, this will give the index of the first such 1.
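The three approaches in this thread agree only when every row has exactly one 1. The following sketch, with an illustrative matrix M2 and illustrative variable names, shows how each one behaves when that assumption is broken, together with a simple check that flags the rows where the assumption holds.

```matlab
% Sketch: how the three approaches behave when a row breaks the assumption.
% Row 1 has two 1s, row 2 has exactly one 1, row 3 has no 1 at all.
M2 = [0 1 1; 1 0 0; 0 0 0];

by_mult = M2 * (1:size(M2, 2)).';   % sums the column indices: [5; 1; 0]
[~, by_max] = max(M2, [], 2);       % first maximum per row:   [2; 1; 1]
[by_find, ~] = find(M2.');          % one entry per 1 found:   [2; 3; 1]

% Rows for which all three approaches are guaranteed to agree.
valid = sum(M2 ~= 0, 2) == 1;

disp(valid.')   % 0 1 0
```

Note that by_find has one element per 1 in the matrix rather than one per row, so its length matches the number of rows only by coincidence here.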