Min-max normalization of individual columns in a 2D matrix - matlab

I have a dataset which has 4 columns/attributes and 150 rows. I want to normalize this data using min-max normalization. So far, my code is:
minData=min(min(data1))
maxData=max(max(data1))
minmaxeddata=((data1-minData)./(maxData-minData))
Here, minData and maxData return the global minimum and maximum values. Therefore, this code applies min-max normalization over all values in the 2D matrix, so that the global minimum maps to 0 and the global maximum maps to 1.
However, I would like to perform the same operation on each column individually. Specifically, each column of the 2D matrix should be min-max normalized independently from the other columns.
I tried using just min(data1) and max(data1), but got an error saying that the matrix dimensions must agree.
However, by using the global minimum and maximum, I got values in the range [0, 1] and have run experiments on this normalized dataset. Is there any problem with my results? Is there a problem with my understanding as well? Any guidance would be appreciated.

If I understand you correctly, you wish to normalize each column of data1. Since each column is an independent attribute and most likely has a different dynamic range, a global min-max operation is probably not what you want. I would recommend going with your initial idea of normalizing each column individually.
As for your error, in MATLAB releases without implicit expansion you can't subtract min(data1) from data1 directly, because min(data1) produces a row vector while data1 is a matrix. You are subtracting a vector from a matrix, which is why you are getting that error.
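To make the size mismatch concrete, here is a small sketch that replicates the row vectors of column minima and maxima with repmat so that both operands have the same size. This is effectively what broadcasting does for you. The matrix X is a made-up stand-in for data1, and the variable names are only illustrative.

```matlab
% Hypothetical 3-by-2 stand-in for data1
X = [1 10; 2 20; 4 40];

colMin = min(X, [], 1);   % 1-by-2 row vector of column minima
colMax = max(X, [], 1);   % 1-by-2 row vector of column maxima

% Replicate each row vector down the rows so the sizes match X
nRows  = size(X, 1);
minRep = repmat(colMin, nRows, 1);
maxRep = repmat(colMax, nRows, 1);

% Element-wise min-max normalization, column by column
Xnorm = (X - minRep) ./ (maxRep - minRep);
```

The repmat copies cost extra memory, which is the main reason to prefer broadcasting on larger matrices.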
To achieve what you're asking, use bsxfun to broadcast the row vector across all the rows of data1. Therefore:
mindata = min(data1);
maxdata = max(data1);
minmaxdata = bsxfun(@rdivide, bsxfun(@minus, data1, mindata), maxdata - mindata);
With later versions of MATLAB (R2016b onward), broadcasting is built into the language as implicit expansion, so you can simply do:
mindata = min(data1);
maxdata = max(data1);
minmaxdata = (data1 - mindata) ./ (maxdata - mindata);
It's a lot easier to read and still does the same job.
Example
>> data1 = [5 9 9 9 3 3; 3 10 2 1 10 1; 2 4 4 6 5 5]
data1 =
5 9 9 9 3 3
3 10 2 1 10 1
2 4 4 6 5 5
When I run the above normalization code, I get:
minmaxdata =
1.0000 0.8333 1.0000 1.0000 0 0.5000
0.3333 1.0000 0 0 1.0000 0
0 0 0.2857 0.6250 0.2857 1.0000
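One caveat the code above does not handle: if a column is constant, maxdata - mindata is zero for that column and the division yields NaN (0/0). A minimal sketch of one possible guard follows. Mapping constant columns to 0 is an assumption I'm making for illustration, not something the question specifies, and X and colRange are made-up names.

```matlab
% Hypothetical data; the second column is constant
X = [1 5 2; 3 5 4; 5 5 8];

mindata  = min(X, [], 1);
colRange = max(X, [], 1) - mindata;

% Assumption: replace a zero range by 1 so constant columns map to 0
colRange(colRange == 0) = 1;

minmaxdata = bsxfun(@rdivide, bsxfun(@minus, X, mindata), colRange);
```

Whether 0, 0.5 or NaN is the right value for a constant column depends on what you do with the data afterwards.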

Related

Replacing outlier values with NaN in MATLAB

I have an n x m data matrix with n samples and m measurements per sample. I'm dealing with data from mass spectrometry, measuring the concentration of different metabolites. Each column is the concentrations of a single metabolite. The rows are the samples. Some of the samples have a few metabolite measurements that are much higher than the rest of the samples.
I want to find these outlier values, and replace them with NaN. Is there a way to do this automatically, maybe by looking for values higher than X column SDs and making them NaN? I have found relevant questions for R and Python, but not for MATLAB.
Addendum: dfri's solution worked perfectly for me. However, I couldn't use the column SD as a cutoff-measure, because the outliers made the SD so large that the outlier values were still within the threshold (they were 10 000 times larger than the rest). I ended up using 100 x the column median as a threshold for removal.
You can compare the elements of your data against some threshold to identify your outliers, and use the resulting logical indices to replace the outlier values with NaN. E.g.
data = randi(4,5); %// values in {1, 2, 3, 4}
threshold = 3; %// decide upon your threshold
data(data > threshold) = NaN
data =
NaN 3 NaN 2 2
3 1 3 2 2
2 2 2 NaN 3
3 1 NaN NaN 3
1 1 1 1 NaN
If you want to replace outliers w.r.t. some threshold column per column, you can make use of e.g. bsxfun (thanks @Dan):
data = randi(4,5) %// values in {1, 2, 3, 4}
threshold = mean(data)+1*std(data) %// per column
data(bsxfun(@(x, y) x > y, data, threshold)) = NaN
%// example:
threshold =
4.7416 3.7416 4.0000 2.8954 1.9477
data =
4 3 2 NaN NaN
4 NaN 3 1 1
1 3 4 1 NaN
4 1 4 1 1
4 1 2 NaN 1
Note that the most important (non-MATLAB-technical) part in your case, as mentioned by @Dan in his comments above, is deciding how to create the threshold values for each of the columns. The simple thresholds in the examples above are included only to show the technical aspects of how to "remove" outliers (set them to NaN) given an array of per-column thresholds.
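The addendum in the question replaces the standard-deviation cutoff with 100 times the column median, since the median is barely affected by a few extreme values. A sketch of that variant, using the same bsxfun pattern, could look like this. The data and the name multiplier are illustrative assumptions.

```matlab
% Hypothetical data: one extreme value in each column
data = [1 2; 2 3; 3 4; 2 50000; 30000 3];

multiplier = 100;                          % cutoff as a multiple of the median
threshold  = multiplier * median(data, 1); % 1-by-m row vector, one cutoff per column

% Compare every column against its own cutoff and blank out the outliers
data(bsxfun(@gt, data, threshold)) = NaN;
```

Here the column medians are 2 and 3, so the cutoffs are 200 and 300, and only the two extreme entries become NaN.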

Normalization of inputs of a feedforward Neural network

Let's say I have an m-by-n matrix of different features of a time series signal (column 1 represents the linear regression of the last n samples, column 2 represents the average of the last n samples, column 3 represents the local max values of a different but correlated time series signal, etc). How should I normalize these inputs? All the inputs fall into different categories, so they have different ranges. One ranges from 0 to 1, another from -5 to 50, and so on.
Should I normalize the WHOLE matrix? Or should I normalize each set of inputs one by one individually?
Note: I usually use the mapminmax function from MATLAB for the normalization.
You should normalise each column of your matrix individually; the columns represent different kinds of data and shouldn't be mixed together.
Since mapminmax normalizes each row, you could for example transpose your matrix so that your 3 different data types are in the rows instead of the columns, and still use mapminmax:
A = [0 0.1 -5; 0.2 0.3 50; 0.8 0.8 10; 0.7 0.9 20];
A =
0 0.1000 -5.0000
0.2000 0.3000 50.0000
0.8000 0.8000 10.0000
0.7000 0.9000 20.0000
B = mapminmax(A')
B =
-1.0000 -0.5000 1.0000 0.7500
-1.0000 -0.5000 0.7500 1.0000
-1.0000 1.0000 -0.4545 -0.0909
You should normalize each feature independently.
"column 1 represents linear regression of the last n samples, column 2 represents the average of the last n samples, column 3 represents the local max values of a different time series but correlated signal, etc"
I can't say for sure about your particular problem, but generally, you should normalize each feature independently. So normalize column 1, then column 2 etc.
"Should I normalize the WHOLE matrix? Or should I normalize each set of inputs one by one individually?"
I'm not sure what you mean here. What is an input? If by that you mean an instance (a row of your matrix), then no, you should not normalize rows individually, but columns.
I don't know how you would do this in Matlab, but I took your question more as a theoretical one than an implementation one.
If you want each column normalized to the range [0,1] independently of the other columns, you can use mapminmax like so (assuming A is the 2D input array) -
out = mapminmax(A.',0,1).'
You can also use bsxfun for the same output, like so -
Aoffsetted = bsxfun(@minus,A,min(A,[],1))
out = bsxfun(@rdivide,Aoffsetted,max(Aoffsetted,[],1))
Sample run -
>> A
A =
3 7 4 2 7
1 3 4 5 7
1 9 7 5 3
8 1 8 6 7
>> mapminmax(A.',0,1).'
ans =
0.28571 0.75 0 0 1
0 0.25 0 0.75 1
0 1 0.75 0.75 0
1 0 1 1 1
>> Aoffsetted = bsxfun(@minus,A,min(A,[],1));
>> bsxfun(@rdivide,Aoffsetted,max(Aoffsetted,[],1))
ans =
0.28571 0.75 0 0 1
0 0.25 0 0.75 1
0 1 0.75 0.75 0
1 0 1 1 1
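mapminmax ships with a toolbox (the Deep Learning Toolbox in recent releases, as far as I know). If it is not available, the same column-wise scaling onto an arbitrary range [lo, hi] can be sketched with plain arithmetic. The helper name rescaleCols is hypothetical, and the sketch assumes implicit expansion (R2016b or later) and that no column is constant.

```matlab
% Hypothetical helper: scale each column of A linearly onto [lo, hi].
% Assumes implicit expansion and non-constant columns.
rescaleCols = @(A, lo, hi) lo + (hi - lo) .* ...
    (A - min(A, [], 1)) ./ (max(A, [], 1) - min(A, [], 1));

A = [3 7 4 2 7; 1 3 4 5 7; 1 9 7 5 3; 8 1 8 6 7];

out01 = rescaleCols(A, 0, 1);    % same target range as mapminmax(A.',0,1).'
out11 = rescaleCols(A, -1, 1);   % same target range as the mapminmax default
```

For the sample A above, out01 should match the output shown in the sample run.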

Find the same values in another column in matlab

I want to find values that appear in every column of a matrix.
For example, I have this matrix:
A = [1 11 0.17
2 1 78
3 4 90
45 5 14
10 10 1]
As you can see, the number 1 in column 1 also appears in column 2 and column 3, so I want to pick that number and put it into another matrix:
B= [1]
and perform another operation, C/B, where C is equal to:
C= [1
3
5
7
9]
and you will have:
D= [1 11 0.17 1
2 1 78 3
3 4 90 5
45 5 14 7
10 10 1 9]
After that, the values in column 4 map to numbers that we define, but we only keep the mappings for rows that contain the number 1 (i.e. B):
define:
1-->23
3 -->56
9 --> 78
then we have, see image below:
So how can I do that? Is it possible? Thanks.
Let's break your problem into steps.
Step #1 - Determine if there is a value shared by all columns
We can do this with bsxfun, unique, permute, any and all.
We first need to use unique so that we can generate all possible unique values in the matrix A. Once we do this, we can look at each value of the unique values and see if all columns in A contain this value. If this is the case, then this is the number we need to focus on.
As such, do something like this first:
Aun = unique(A);
eqs_mat = bsxfun(@eq, A, permute(Aun, [3 2 1]));
eqs_mat is a 3D logical matrix where each slice marks where one particular value of the unique array appears in A. Within a slice, a column is all false unless that value occurs in it, in which case the matching positions are true. The next thing you'll want to do is go through each slice of this result and determine whether there is at least one true value in each column.
For a value to be shared along all columns, its slice should have at least one true value per column.
We can determine which value we need to extract with:
ind = squeeze(all(any(eqs_mat,1),2));
Given your example data, we have this for our unique values:
>> Aun
Aun =
0.1700
1.0000
2.0000
3.0000
4.0000
5.0000
10.0000
11.0000
14.0000
45.0000
78.0000
90.0000
Also, the last statement I executed above gives us:
>> ind
ind =
0
1
0
0
0
0
0
0
0
0
0
0
The above means that the second location of the unique array is the value we want, and this corresponds to 1. Therefore, we can extract the particular value we want by:
val = Aun(ind);
val contains the value that is shared along all columns.
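If the 3D comparison is hard to follow, the same test can be written as a loop over the unique values. This is only an equivalent sketch for illustration, and shared is a made-up variable name.

```matlab
A = [1 11 0.17; 2 1 78; 3 4 90; 45 5 14; 10 10 1];

Aun    = unique(A);
shared = false(size(Aun));

for k = 1:numel(Aun)
    % any(..., 1): does each column contain Aun(k)?
    % all(...):    is that true for every column?
    shared(k) = all(any(A == Aun(k), 1));
end

val = Aun(shared);   % value(s) present in every column
```

The loop trades the memory of the 3D array for one pass over A per unique value, which is fine for small matrices like this one.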
Step #2 - Given the value B, take a vector C and divide by B.
That's pretty straightforward. Make sure that C has as many elements as A has rows, so:
C = [1 3 5 7 9].';
B = val;
col = C / B;
Step #3 - For each location in A that shares the common value, we want to generate a new fifth column that gives a new value for each corresponding row.
You can do that by declaring a vector of... say... zeroes, then find the right rows that share the common value and replace the values in this fifth column with the values you want:
zer = zeros(size(A,1), 1);
D = [23; 56; 78];
ind2 = any(A == val, 2);
zer(ind2) = D;
%// Create final matrix
fin = [A col zer];
We finally get:
>> fin
fin =
1.0000 11.0000 0.1700 1.0000 23.0000
2.0000 1.0000 78.0000 3.0000 56.0000
3.0000 4.0000 90.0000 5.0000 0
45.0000 5.0000 14.0000 7.0000 0
10.0000 10.0000 1.0000 9.0000 78.0000
Take note that the vector you assign into the fifth column (D here) must have exactly as many elements as there are rows of A that contain the shared value, i.e. nnz(ind2).

Average for all elements in a row except the element itself - MATLAB

For a given matrix A, how can I create a matrix B of the same size where every column is the mean (or any other function) of all the other columns?
example:
a function on
A = [
1 1 1
2 3 4
4 5 6]
should result in
B = [
1 1 1
3.5 3 2.5
5.5 5 4.5]
Perfect setup for bsxfun -
B = bsxfun(@minus,sum(A,2),A)./(size(A,2)-1)
Explanation: Breaking it down to two steps
Given
>> A
A =
1 1 1
2 3 4
4 5 6
Step #1: For each element in A, calculate the sum of all elements except the element itself -
>> bsxfun(@minus,sum(A,2),A)
ans =
2 2 2
7 6 5
11 10 9
Step #2: Divide each element result by the number of elements responsible for the summations, which would be the number of columns minus 1, i.e. (size(A,2)-1) -
>> bsxfun(@minus,sum(A,2),A)./(size(A,2)-1)
ans =
1.0000 1.0000 1.0000
3.5000 3.0000 2.5000
5.5000 5.0000 4.5000
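On R2016b or later, where implicit expansion is available, the same computation can be sketched without bsxfun:

```matlab
A = [1 1 1; 2 3 4; 4 5 6];

% Row sum minus the element itself, divided by the number of other columns
B = (sum(A, 2) - A) ./ (size(A, 2) - 1);
```

For the A in the question this should reproduce the B shown there.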
Using your example:
[m,n]=size(A);
B=zeros(m,n);
for k=1:n
B(:,k) = mean(A(:,[1:k-1 k+1:end]),2);
end
It may not be as quick or efficient as @Divakar's answer, but I tend to prefer a for loop for readability. It also makes it easier to call a function other than mean.
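As a sketch of that last point, the loop can take a function handle so that mean is swapped out without touching the loop body. The handle f and its row-wise maximum are just an illustrative choice.

```matlab
A = [1 1 1; 2 3 4; 4 5 6];

% Hypothetical choice of function: row-wise maximum of the other columns
f = @(X) max(X, [], 2);

[m, n] = size(A);
B = zeros(m, n);
for k = 1:n
    others  = A(:, [1:k-1, k+1:n]);  % every column except the k-th
    B(:, k) = f(others);             % f must return one value per row
end
```

Any function that maps an m-by-(n-1) matrix to an m-by-1 column works here, for example @(X) mean(X, 2) to get back the original behaviour.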
For an arbitrary function, you can use a vectorized approach if you don't mind using up more memory. Specifically, this requires generating a 3D array of size r-by-c-by-c, where r and c are the number of rows and columns of A.
f = @(x) prod(x,2); %// any function which reduces along the second dimension
c = size(A,2); %// number of columns
B = repmat(A, [1 1 c]);
B(:,1:c+1:end) = []; %// remove a different column in each 3D-layer
B = reshape(B, [], c-1, c); %// each 3D-layer of B contains a set of c-1 columns
result = f(B); %// apply function
result = squeeze(result); %// remove singleton dimension
As noted by Divakar in comments, anonymous functions tend to slow things down. It may be better to define the function f in a file.

Checking values of two vectors against each other and then using the column location of equal entries to extract columns from a matrix in matlab

I'm doing a curve fitting problem in MATLAB and so far I've set up some orthonormal polynomials along a specified range of x-values with x = (0:0.0001:40);
The polynomials themselves are each a manipulation of that x vector and are stored as rows in a matrix. I also have some data entries in the form of two vectors: one for the data x-coords and one for the actual values. I need a way to use the x-coords of my data points to find the same values in my continuous x-vector and then take the corresponding columns from my polynomial matrix and add them to a new matrix.
EDIT: To be more clear. I have, for example:
x = [0 1 2 3 4 5]
Polynomial =
1 1 1 1 1 1
0 1 2 3 4 5
0 1 4 9 16 25
% Data values:
xcoord = [1 3 4]
values = [5 3 8]
I want to check the xcoord values against 'x' to find the corresponding columns and then pull out those columns from the polynomial matrix to get:
Polynomial =
1 1 1
1 3 4
1 9 16
If your x, Polynomial, and xcoord were the same length, you could use logical indexing on the columns, which is elegant; something along the lines of Polynomial(:, x==xcoord). But since this doesn't seem to be the case, there's a less fancy solution with a for loop and find(xcoord(i)==x) to look up each column index.
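As an alternative to the loop, here is a sketch using ismember, which handles vectors of different lengths. It uses the small example from the question. The tolerance-based variant is my own assumption for grids like 0:0.0001:40, where exact equality can fail because of floating-point rounding; the value of tol is arbitrary.

```matlab
x = [0 1 2 3 4 5];
Polynomial = [1 1 1 1 1 1; 0 1 2 3 4 5; 0 1 4 9 16 25];
xcoord = [1 3 4];

% Logical mask over the entries of x that also appear in xcoord
mask = ismember(x, xcoord);
selected = Polynomial(:, mask);

% Tolerance-based alternative for non-integer grids (assumed tolerance)
tol = 1e-9;
diffs = abs(bsxfun(@minus, x(:), xcoord(:).'));  % numel(x)-by-numel(xcoord)
mask2 = any(diffs < tol, 2).';
selected2 = Polynomial(:, mask2);
```

Note that both variants return the columns in the order they occur in x, not in the order of xcoord, and the tolerance version builds a numel(x)-by-numel(xcoord) matrix, so it is only practical when xcoord is short.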