Appropriate zero variance handling for vectorised feature normalization? - matlab

Problem: when doing feature normalisation in Octave, zero-variance input causes div-zero errors.
Question: Is there a nice(r) way to handle div-zero when working with vectorised data?
Example:
Input is a matrix containing multiple datasets in columns:
X = [1 3.5 7.5 9 ;
1 4 8 9 ;
1 4.5 8.5 9]
So X contains four series: x_1 = [1, 1, 1], x_2 = [3.5, 4, 4.5], x_3 = [7.5, 8, 8.5], and x_4 = [9, 9, 9]. To normalise each series using vectorisation, the following approach seems sensible:
mu = mean(X);
sigma = std(X);
X_norm = (1 ./ sigma) .* (X - mu);
However, this approach fails because both x_1 and x_4 have zero variance, so division by zero occurs and the affected columns come out as NaN.
My preferred handling of zero variance data is to set sigma to 1. Currently I'm using the following kludge:
dataset_size = length(sigma);
for index = 1:dataset_size
if sigma(index) == 0
sigma(index) = 1;
endif
end
Notes:
Broadcasting is used twice here, in the division and subtraction operations.
This example is based on Octave, but the question may be equally applicable to MATLAB.
This example is simple for illustration; 'real' usage would have more, larger datasets.
This approach treats zero-variance data differently from regular data (imperfect but pragmatic).
zscore sounded relevant, but is (as the name suggests) better suited to calculating a z-score...

Why not just this?
mu = mean(X);
sigma = std(X);
sigma(sigma==0) = 1; %// add this line to remove zeros
X_norm = (1 ./ sigma) .* (X - mu);
Or, to save some operations:
mu = mean(X);
sigma = std(X);
ind = sigma~=0; %// detect zero values
X_norm = X - mu;
X_norm(:,ind) = X_norm(:,ind) ./ sigma(ind) ;
In general, it may be preferable to use
sigma(sigma<=tol) = 1; %// add this line to remove values close to zero
in the first approach, or
ind = sigma>tol; %// detect values close to zero
in the second, for a given tolerance tol (for example tol = 1e-10). This is a better way in applications where finite-precision errors can produce values such as 1e-15 instead of zero.
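As a minimal sketch combining the pieces above, the tolerance-based variant can be applied to the matrix from the question. The name tol and its value follow the answer; the use of bsxfun instead of implicit broadcasting is an assumption, made only so that the snippet also runs on MATLAB releases before R2016b.

```matlab
% Sketch: normalise columns, leaving (near-)constant columns unscaled.
X = [1 3.5 7.5 9;
     1 4   8   9;
     1 4.5 8.5 9];
tol = 1e-10;                       % threshold below which sigma counts as zero

mu = mean(X);                      % 1 x N row of column means
sigma = std(X);                    % 1 x N row of column standard deviations
sigma(sigma <= tol) = 1;           % avoid dividing by (near-)zero

X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
disp(X_norm)
```

With this input, columns 1 and 4 come out as all zeros instead of NaN, and columns 2 and 3 become [-1; 0; 1].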

Related

How do I normalize the data sets in matlab? [duplicate]

I have a provided standardize function for a machine learning course that wasn't well documented and I'm still new to MATLAB so I'm just trying to break down the function. Any explanation of the syntax or the general idea of standardizing would greatly help. We use this function to standardize a set of training data provided in a large matrix. A break down of most of the lines of the code snippet would help me greatly. Thank you so much.
function [X, mean_X, std_X] = standardize(varargin)
switch nargin
case 1
mean_X = mean(varargin{1});
std_X = std(varargin{1});
X = varargin{1} - repmat(mean_X, [size(varargin{1}, 1) 1]);
for i = 1:size(X, 2)
X(:, i) = X(:, i) / std(X(:, i));
end
case 3
mean_X = varargin{2};
std_X = varargin{3};
X = varargin{1} - repmat(mean_X, [size(varargin{1}, 1) 1]);
for i = 1:size(X, 2)
X(:, i) = X(:, i) / std_X(:, i);
end
end
This code accepts a data matrix of size M x N, where M is the dimensionality of one data sample from this matrix and N is the total number of samples. Therefore, one column of this matrix is one data sample. Data samples are all stacked horizontally and are columns.
Now, the true purpose of this code is to take all of the columns of your matrix and standardize / normalize the data so that each data sample exhibits zero mean and unit variance. This means that after this transform, if you found the mean value of any column in this matrix, it would be 0 and the variance would be 1. This is a very standard method for normalizing values in statistical analysis, machine learning, and computer vision.
This comes from the z-score in statistical analysis. Specifically, the equation for normalization is:
$$z = \frac{x - \mu}{\sigma}$$
Given a set of data points, we subtract the mean of those points ($\mu$) from the value in question ($x$), then divide by their standard deviation ($\sigma$). Given this matrix, which we will call X, there are two ways you can call this code:
Method #1: [X, mean_X, std_X] = standardize(X);
Method #2: [X, mean_X, std_X] = standardize(X, mu, sigma);
The first method automatically infers the mean of each column of X and the standard deviation of each column of X. mean_X and std_X will both return 1 x N vectors that give you the mean and standard deviation of each column in the matrix X. The second method allows you to manually specify a mean (mu) and standard deviation (sigma) for each column of X. This is possibly for use in debugging, but you would specify both mu and sigma as 1 x N vectors in this case. What is returned for mean_X and std_X is identical to mu and sigma.
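One plausible use of the second calling form, beyond debugging, is to reuse the statistics of one data set on another. The sketch below does not call standardize itself; it reproduces the same arithmetic inline so that it runs on its own, and the names Xtrain, Xtest, Ztrain and Ztest are hypothetical.

```matlab
% Sketch: compute column statistics on one matrix, reuse them on another.
Xtrain = [1 10; 2 20; 3 30; 4 40];     % hypothetical data, 4 rows x 2 columns
Xtest  = [2.5 25; 5 50];               % hypothetical new data, same columns

mu    = mean(Xtrain);                  % what Method #1 infers
sigma = std(Xtrain);

% Method #1 arithmetic: standardise with the matrix's own statistics
Ztrain = bsxfun(@rdivide, bsxfun(@minus, Xtrain, mu), sigma);

% Method #2 arithmetic: standardise with supplied mu and sigma
Ztest = bsxfun(@rdivide, bsxfun(@minus, Xtest, mu), sigma);

disp(mean(Ztrain));   % approximately 0 for every column
disp(std(Ztrain));    % approximately 1 for every column
```

Only Ztrain is guaranteed to have zero mean and unit standard deviation per column; Ztest is merely expressed on the same scale.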
The code is a bit poorly written IMHO, because you can certainly achieve this vectorized, but the gist is this: if we are using Method #1, it finds the mean of every column of the matrix X, duplicates this vector so that it becomes an M x N matrix, then subtracts this matrix from X. This subtracts from each column its respective mean. We also compute the standard deviation of each column before the mean subtraction.
Once we do that, we normalize X by dividing each column by its respective standard deviation. BTW, doing std_X(:, i) is superfluous, as std_X is already a 1 x N vector. std_X(:, i) means to grab all of the rows at the ith column, which is a bit overkill for my taste; with a 1 x N vector it can simply be replaced with std_X(i).
Method #2 performs the same thing as Method #1, but we provide our own mean and standard deviation for each column of X.
For the sake of documentation, this is how I would have commented the code:
function [X, mean_X, std_X] = standardize(varargin)
switch nargin %// Check how many input variables we have input into the function
case 1 %// If only one variable - this is the input matrix
mean_X = mean(varargin{1}); %// Find mean of each column
std_X = std(varargin{1}); %// Find standard deviation of each column
%// Take each column of X and subtract by its corresponding mean
%// Take mean_X and duplicate M times vertically
X = varargin{1} - repmat(mean_X, [size(varargin{1}, 1) 1]);
%// Next, for each column, normalize by its respective standard deviation
for i = 1:size(X, 2)
X(:, i) = X(:, i) / std(X(:, i));
end
case 3 %// If we provide three inputs
mean_X = varargin{2}; %// Second input is a mean vector
std_X = varargin{3}; %// Third input is a standard deviation vector
%// Apply the code as seen in the first case
X = varargin{1} - repmat(mean_X, [size(varargin{1}, 1) 1]);
for i = 1:size(X, 2)
X(:, i) = X(:, i) / std_X(:, i);
end
end
If I can suggest another way to write this code, I would use the mighty and powerful bsxfun function. This avoids explicitly duplicating any elements; the expansion happens under the hood. I would rewrite this function so that it looks like this:
function [X, mean_X, std_X] = standardize(varargin)
switch nargin
    case 1
        mean_X = mean(varargin{1}); %// Find mean of each column
        std_X = std(varargin{1}); %// Find std. dev. of each column
        X = bsxfun(@minus, varargin{1}, mean_X); %// Subtract each column by its respective mean
        X = bsxfun(@rdivide, X, std_X); %// Take each column and divide by its respective std dev.
    case 3
        mean_X = varargin{2};
        std_X = varargin{3};
        %// Same code as above
        X = bsxfun(@minus, varargin{1}, mean_X);
        X = bsxfun(@rdivide, X, std_X);
end
I would expect the new code above to be faster than the version using for and repmat, especially for larger matrices, because bsxfun does not need to build the replicated matrix in memory.

Image convolution in MATLAB - how is conv 360x faster than my hand-coded version?

I am playing with image processing algorithms in MATLAB. One of the basic ones is convolving an image with a Gaussian. I ran the following test on a grayscale 800x600 image:
function [Y1, Y2] = testConvolveTime(inputImage)
[m,n] = size(inputImage);
% casting...
inputImage = cast(inputImage, 'single');
Gvec = [1 4 6 4 1]; % sigma=1;
Y1 = zeros(size(inputImage)); % modify it
Y2 = zeros(size(inputImage)); % modify it
%%%%%%%%%%%%%%%%%%% MATLAB CONV %%%%%%%%%%%%%%%%%%%%%
t1 = cputime;
for i=1:m
Y1(i,:) = conv(inputImage(i,:),Gvec,'same');
end
for j=1:n
Y1(:,j) = conv(inputImage(:,j),Gvec','same');
end
Y1 = round(Y1 / 16);
e1 = cputime - t1
%%%%%%%%%%%%%%%%%%% HAND-CODED CONV %%%%%%%%%%%%%%%%%%%%%
t2 = cputime;
for i=1:m
Y2(i,:) = myConv(inputImage(i,:),Gvec)';
end
for j=1:n
Y2(:,j) = myConv(inputImage(:,j),Gvec');
end
Y2 = round(Y2 / 16);
e2 = cputime - t2
end
Here is the code I wrote implementing convolution of 2 vectors:
% mimic MATLAB's conv(u,v,'same') function
% always returns column vec
function y = myConv(u_in, v_in)
len1 = length(u_in);
len2 = length(v_in);
if (len1 >= len2)
u = u_in;
v = v_in;
else
u = v_in;
v = u_in;
end
% from here on: v is the shorter vector (len1 >= len2)
len1 = length(u);
len2 = length(v);
maxLen = len1 + len2 - 1;
ytemp = zeros(maxLen,1);
% first part -- partial overlap
for i=1:len2-1
sum = 0;
for j=1:i
sum = sum + u(i-j+1)*v(j);
end
ytemp(i) = sum;
end
% main part -- complete overlap
for i=len2:len1
sum = 0;
for j=1:len2
sum = sum + u(i-j+1)*v(j);
end
ytemp(i) = sum;
end
% finally -- end portion
for i=len1+1:maxLen
%i-len1+1
sum = 0;
for j=i-len1+1:len2
sum = sum + u(i-j+1)*v(j);
end
ytemp(i) = sum;
end
%y = ytemp;
startIndex = round((maxLen - length(u_in))/2 + 1);
y = ytemp(startIndex:startIndex+length(u_in)-1);
% ^ note: to match MATLAB's conv(u,v,'same'), the output length must be
% that of the first argument.
end
Here are my test results:
>> [Y1, Y2] = testConvolveTime(A1);
e1 =
0.5313
e2 =
195.8906
>> norm(Y1 - Y2)
ans =
0
The norm being 0 verifies mathematical correctness. My questions are as follows:
How can my hand-coded function be >360x slower than the one that uses MATLAB's conv?
Even MATLAB's conv is still "slow" for image processing. If convolving with a Gaussian takes 0.5 of a second, what hope is there for running any image processing algorithms in real-time (e.g. at 24 FPS)? For reference my CPU is an Intel N3540 @ 2.16 GHz with 8 GB of RAM.
^ The real question: when I switch to OpenCV on C++, will operations like this become much faster?
1) conv is so much faster because it is a built-in native function, while your function is interpreted MATLAB code with nested loops.
2) Try imfilter in the Image Processing Toolbox. It may be faster than conv, and it works on uint8 arrays. Or, if you get a more recent version of MATLAB, and if you only need the Gaussian filter, try imgaussfilt.
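For the separable kernel in the question, the per-row and per-column loops can also be collapsed into a single built-in call, with no toolbox required. This is a minimal sketch, assuming the form conv2(u, v, A, shape) that both MATLAB and Octave provide; img is a hypothetical random stand-in for the real image, and the division by 256 assumes the row pass and the column pass are applied one after the other.

```matlab
% Sketch: separable 5-tap smoothing with one conv2 call instead of two loops.
img  = rand(60, 80);               % hypothetical stand-in for the grayscale image
Gvec = [1 4 6 4 1];                % kernel from the question

% conv2(u, v, A) convolves the columns of A with u, then the rows with v.
Yfast = conv2(Gvec', Gvec, img, 'same') / 256;   % 256 = 16 * 16, one factor per pass

% Reference: explicit row pass, then a column pass over the row-pass result.
[m, n] = size(img);
tmp = zeros(m, n);
Yloop = zeros(m, n);
for i = 1:m
    tmp(i, :) = conv(img(i, :), Gvec, 'same');
end
for j = 1:n
    Yloop(:, j) = conv(tmp(:, j), Gvec', 'same');
end
Yloop = Yloop / 256;

disp(max(abs(Yfast(:) - Yloop(:))));   % difference should be at rounding level
```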
Because (discrete) convolution is usually expressed via linear algebra rather than via for loops. In fact, every time you walk through rows or columns, you should look for a way to express the work as an algebraic operation.
The typical way is to do it via Toeplitz matrices, but this can be extended to much faster algorithms. Once you have the Toeplitz structure, you can optimize the multiplication even further:
https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution
http://www.netlib.org/utk/people/JackDongarra/etemplates/node384.html
Note that native MATLAB functions can still be slow. Being built in is an indication of maintenance level, not of speed. Often the documentation links to the algorithm used, so you can decide whether to go with a custom implementation or the standard one.
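To make the Toeplitz formulation concrete, here is a minimal sketch with a short hypothetical signal u and the kernel from the question. It builds the matrix explicitly, which is only sensible for illustration and not for large inputs.

```matlab
% Sketch: full 1-D convolution written as a Toeplitz matrix-vector product.
u = [3 1 4 1 5 9];                 % hypothetical signal
v = [1 4 6 4 1];                   % kernel from the question
nu = numel(u);
nv = numel(v);

% First column is v padded with zeros; first row is v(1) followed by zeros.
T = toeplitz([v(:); zeros(nu - 1, 1)], [v(1), zeros(1, nu - 1)]);

yMat  = T * u(:);                  % (nu + nv - 1) x 1 result
yConv = conv(u, v);                % built-in full convolution

disp(max(abs(yMat(:) - yConv(:))));    % the two results should coincide
```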
Why is convolution in MATLAB faster?
The implementation itself is very efficient.
Fast techniques are used to perform multiplication and convolution.
Look at the BLAS and ATLAS packages if you want to see the tricks used to do these things very fast.
In practice, convolution in the original domain (time or space) and multiplication in the frequency domain are equivalent. A fast implementation can transform to the frequency domain using the FFT (Fast Fourier Transform), perform the multiplication there, and then transform back to the original domain.
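The equivalence between convolution and frequency-domain multiplication can be checked numerically. The sketch below uses a hypothetical signal u and makes no claim about how the built-in conv is implemented internally; it only shows that the two routes give the same numbers.

```matlab
% Sketch: convolution theorem check, direct convolution vs FFT-based product.
u = [3 1 4 1 5 9 2 6];             % hypothetical signal
v = [1 4 6 4 1];                   % kernel from the question
N = numel(u) + numel(v) - 1;       % length of the full linear convolution

yDirect = conv(u, v);                          % convolution in the original domain
yFft    = real(ifft(fft(u, N) .* fft(v, N)));  % zero-pad to N, multiply, invert

disp(max(abs(yDirect - yFft)));    % agreement up to rounding error
```

Zero-padding both inputs to length N is what turns the circular convolution computed by the FFT into the ordinary linear one.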

Using size of scatter points to weight line of best fit in MATLAB

Is it possible to use the size (s) of the points to 'weight' the line of best fit?
x = [1 2 3 4 5];
y = [2 4 5 3 4];
s = [10 15 20 2 5];
scatter(x,y,s)
hold on
weight = s;
p = polyfit(x,y,1); %how do I take into account the size of the points?
f = polyval(p,x);
plot(x,f,'-r')
Using Marcin's suggestion, you can incorporate lscov into polyfit. As the documentation explains, polynomial fitting is done by computing the Vandermonde matrix V of x, and then executing p = V\y. This is the standard formalism of any least-squares solution, and lends itself to weighted-least-squares in MATLAB through lscov.
Taking your x, y and weight vectors, instead of calling polyfit(x,y,n) you can do the following:
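n = 1; % polynomial degree (1 gives a straight line)
x = x(:); y = y(:); % use column vectors so the element-wise product below is well-defined
% Construct Vandermonde matrix. This code is taken from polyfit.m
V(:,n+1) = ones(length(x),1,class(x));
for j = n:-1:1
    V(:,j) = x.*V(:,j+1);
end
% Solve using weighted-least-squares
p = lscov(V,y,weight(:));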
You can even go one step further, and modify polyfit.m itself to include this functionality, or add another function polyfitw.m if you are not inclined to modify original MATLAB functions. Note however that polyfit has some more optional outputs for structure, computed using QR decomposition as detailed in the documentation. Generalization of these outputs to the weighted case will require some more work.
x = (1:10)';
y = (3 * x + 5) + rand(length(x),1)*5;
w = ones(1,length(y));
A = [x ones(length(x),1)];
p = lscov(A,y,w);
plot(x,y,'.');
hold on
plot(x,p(1)*x + p(2),'-r');
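For readers who want to see what the weighting does, the same kind of fit can be sketched without lscov by scaling each row of the system by the square root of its weight. This uses the data from the question; treating the marker sizes s directly as weights is an assumption, and the names A, sw, pW and pU are hypothetical.

```matlab
% Sketch: weighted straight-line fit by scaling rows with sqrt(weights).
x = [1 2 3 4 5]';
y = [2 4 5 3 4]';
s = [10 15 20 2 5]';               % marker sizes from the question, used as weights

A  = [x, ones(size(x))];           % design matrix for p(1)*x + p(2)
sw = sqrt(s);                      % minimising sum(s .* r.^2) is least squares on sw .* r

pW = bsxfun(@times, sw, A) \ (sw .* y);   % weighted coefficients
pU = A \ y;                               % unweighted fit for comparison

disp([pW, pU]);                    % columns: weighted, unweighted
```

Points with large s pull the line towards themselves; with all weights equal, pW reduces to the ordinary least-squares result pU.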

Fast technique for normalizing a matrix in MATLAB

I want to normalise each column of a matrix in Matlab. I have tried two implementations:
Option A:
mx=max(x);
mn=min(x);
mmd=mx-mn;
for i=1:size(x,1)
xn(i,:)=((x(i,:)-mn+(mmd==0))./(mmd+(mmd==0)*2))*2-1;
end
Option B:
mn=mean(x);
sdx=std(x);
for i=1:size(x,1)
xn(i,:)=(x(i,:)-mn)./(sdx+(sdx==0));
end
However, these options take too much time for my data, e.g. 3-4 seconds on a 5000x53 matrix. Thus, is there any better solution?
Use bsxfun instead of the loop. This may be a bit faster; however, it may also use more memory (which may be an issue in your case; if you're paging, everything'll be really slow).
To normalize with mean and std, you'd write
mn = mean(x);
sd = std(x);
sd(sd==0) = 1;
xn = bsxfun(@minus,x,mn);
xn = bsxfun(@rdivide,xn,sd);
Remember, in MATLAB, vectorizing = speed.
If A is an m x n matrix,
A = rand(m,n);
minA = repmat(min(A), [size(A, 1), 1]);
normA = max(A) - min(A); % this is a row vector with one range per column
normA = repmat(normA, [size(A, 1), 1]); % replicate over the rows so it is
% of the same size as A
normalizedA = (A - minA)./normA; % your normalized matrix
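The min-max version above divides by zero when a column is constant. A minimal sketch of the same scaling with a guard, using bsxfun instead of repmat; the matrix A here is hypothetical, and mapping constant columns to 0 is a design choice rather than the only option.

```matlab
% Sketch: column-wise min-max scaling to [0, 1] with a guard for constant columns.
A = [7 2 5; 3 2 4; 5 2 9];         % hypothetical data; column 2 is constant

minA   = min(A);                   % 1 x N column minima
rangeA = max(A) - minA;            % 1 x N column ranges
rangeA(rangeA == 0) = 1;           % constant columns: divide by 1 instead of 0

normalizedA = bsxfun(@rdivide, bsxfun(@minus, A, minA), rangeA);
disp(normalizedA);
```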
Note: I am not providing a new answer; I am comparing the proposed answers.
Option A: Using bsxfun()
function xn = normalizeBsxfun(x)
mn = mean(x);
sd = std(x);
sd(sd==0) = eps;
xn = bsxfun(@minus,x,mn);
xn = bsxfun(@rdivide,xn,sd);
end
Option B: Using a for-loop
function xn = normalizeLoop(x)
xn = zeros(size(x));
for ii=1:size(x,2)
    xaux = x(:,ii);
    xn(:,ii) = (xaux - mean(xaux))./std(xaux); % divide by the standard deviation, as in Option A
end
end
We compare both implementations for different matrix sizes:
expList = 2:0.5:5;
for ii=1:numel(expList)
expNum = round(10^expList(ii));
x = rand(expNum,expNum);
tic;
xn = normalizeBsxfun(x);
ts(ii) = toc;
tic;
xn = normalizeLoop(x);
tl(ii) = toc;
end
figure;
hold on;
plot(round(10.^expList),ts,'b');
plot(round(10.^expList),tl,'r');
legend('bsxfun','loop');
set(gca,'YScale','log')
The results show that for small matrices bsxfun is faster, but the difference is negligible for larger matrices, as was also found in other posts.
The x-axis is the square root of the number of matrix elements, while the y-axis is the computation time in seconds.
Let X be an m x n matrix that you want to normalize column-wise.
The following MATLAB code does it:
m = size(X, 1);
XMean = repmat(mean(X),m,1);
XStd = repmat(std(X),m,1);
X_norm = (X - XMean)./(XStd);
The element-wise ./ operator is explained here: http://www.mathworks.in/help/matlab/ref/arithmeticoperators.html
Note: As the OP mentioned, this performs the same task as looping through the matrix, only faster, because the built-in repmat and element-wise operators work on the whole matrix at once.
Note: This code works in Octave and MATLAB versions R2016b or higher.
function X_norm = normalizeMatrix(X)
mu = mean(X); %mean
sigma = std(X); %standard deviation
X_norm = (X - mu)./sigma;
end
How about using
normc(X)
which normalizes the columns of X to unit length (so it rescales each column rather than subtracting the mean and dividing by the standard deviation). You need the Neural Network Toolbox in your install, though.
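If the toolbox is not available, unit-length column scaling is easy to sketch by hand. The snippet below is not normc itself: the matrix X is hypothetical, and leaving all-zero columns unchanged is an assumption made here, not a statement about how normc treats them.

```matlab
% Sketch: scale each column to unit Euclidean length without any toolbox.
X = [3 0 1; 4 0 2; 0 0 2];         % hypothetical data; column 2 is all zeros

len = sqrt(sum(X .^ 2, 1));        % 1 x N Euclidean length of each column
len(len == 0) = 1;                 % leave all-zero columns unchanged

Xn = bsxfun(@rdivide, X, len);
disp(sqrt(sum(Xn .^ 2, 1)));       % 1 for every non-zero column
```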
How about this?
A = [7, 2, 6; 3, 8, 4]; % a 2x3 matrix
Asum = sum(A); % sum the columns
Anorm = A./Asum(ones(size(A, 1), 1), :); % normalise the columns