Normalization of inputs of a feedforward Neural network - matlab

Let's say I have a mxn matrix of different features of a time series signal (column 1 represents linear regression of the last n samples, column 2 represents the average of the last n samples, column 3 represents the local max values of a different time series but correlated signal, etc). How should I normalize these inputs? All the inputs fall into different categories, so they have a different range. One ranges from 0,1, the other ranges from -5 to 50, etc etc.
Should I normalize the WHOLE matrix? Or should I normalize each set of inputs one by one individually?
Note: I usually use mapminmax function from MATLAB for the normalization.

You should normalise each vector/column of your matrix individually, they represent different data types and shouldn't be mixed up together.
You could for example transpose your matrix to have your 3 different data types in the rows instead of in the columns of your matrix and still use mapminmax:
A = [0 0.1 -5; 0.2 0.3 50; 0.8 0.8 10; 0.7 0.9 20];
A =
0 0.1000 -5.0000
0.2000 0.3000 50.0000
0.8000 0.8000 10.0000
0.7000 0.9000 20.0000
B = mapminmax(A')
B =
-1.0000 -0.5000 1.0000 0.7500
-1.0000 -0.5000 0.7500 1.0000
-1.0000 1.0000 -0.4545 -0.0909

You should normalize each feature independently.
column 1 represents linear regression of the last n samples, column 2 represents the average of the last n samples, column 3 represents the local max values of a different time series but correlated signal, etc
I can't say for sure about your particular problem, but generally, you should normalize each feature independently. So normalize column 1, then column 2 etc.
Should I normalize the WHOLE matrix? Or should I normalize each set of inputs one by one individually?
I'm not sure what you mean here. What is an input? If by that you mean an instance (a row of your matrix), then no, you should not normalize rows individually, but columns.
I don't know how you would do this in Matlab, but I took your question more as a theoretical one than an implementation one.

If you want to have a range of [0,1] for all the columns that normalized within each column, you can use mapminmax like so (assuming A as the 2D input array) -
out = mapminmax(A.',0,1).'
You can also use bsxfun for the same output, like so -
Aoffsetted = bsxfun(#minus,A,min(A,[],1))
out = bsxfun(#rdivide,Aoffsetted,max(Aoffsetted,[],1))
Sample run -
>> A
A =
3 7 4 2 7
1 3 4 5 7
1 9 7 5 3
8 1 8 6 7
>> mapminmax(A.',0,1).'
ans =
0.28571 0.75 0 0 1
0 0.25 0 0.75 1
0 1 0.75 0.75 0
1 0 1 1 1
>> Aoffsetted = bsxfun(#minus,A,min(A,[],1));
>> bsxfun(#rdivide,Aoffsetted,max(Aoffsetted,[],1))
ans =
0.28571 0.75 0 0 1
0 0.25 0 0.75 1
0 1 0.75 0.75 0
1 0 1 1 1

Related

Ranking (ordering value) of an observation in a matrix

I am trying to get the rank of an observation in a matrix, taking into account NaN's and values that can repeat themselfs.
E.g. if we have
A = [0.1 0.15 0.3; 0.5 0.15 0.1; NaN 0.2 0.4];
A =
0.1000 0.1500 0.3000
0.5000 0.1500 0.1000
NaN 0.2000 0.4000
Then I want to get the following output:
B =
1 2 4
6 2 1
NaN 3 5
Thus 0.1 is the lowest value (rank=1), whereas 0.5 is the highest value (rank = 6).
Ideally an efficient solution without loops.
You can use unique. This sorts data by default, and you can get the index of the sorted unique values. This would replicate your tie behaviour, since identical values will have the same index. You can omit NaN values with logical indexing.
r = A; % or NaN(size(A))
nanIdx = isnan(A); % Get indices of NaNs in A to ignore
[~, ~, r(~nanIdx)] = unique(A(~nanIdx)) % Assign non-NaN values to their 'unique' index
>> r =
[ 1 2 4
6 2 1
NaN 3 5 ]
If you have the stats toolbox you can use tiedrank function for a similar result.
r = reshape(tiedrank(A(:)), size(A)) % Have to use reshape or rank will be per-column
>> r =
[ 1.5, 3.5, 6.0
8.0, 3.5, 1.5
NaN, 5.0, 7.0 ]
This is not your desired result (as per your example). You can see that tiedrank actually uses a more conventional ranking system than yours, where a tie gives each result the average rank. For example a tied 1st and 2nd gives each rank 1.5, and the next rank is 3.

How to accumulate (average) data based on multiple criteria

I have a set of data where I have recorded values in sets of 3 readings (so as to be able to obtain a general idea of the SEM). I have them recorded in a list that looks as follows, which I am trying to collapse into averages of each set of 3 points:
I want to collapse essentially each 3 rows into one row where the average data value is given for that set. In essence, it would look as follows:
This is something I know how to do basically in Excel (i.e. using a Pivot table) but I am not sure how to do the same in MATLAB. I have tried using accumarray but struggle with knowing how to incorporate multiple conditions essentially. I would need to create a subs array where its number corresponds to each unique set of 3 data points. By brute force, I could create an array such as:
subs = [1 1 1; 2 2 2; 3 3 3; 4 4 4; ...]'
using some looping and have that as my subs array, but since it isn't tied to the data itself, and there may be strange hiccups throughout (i.e. more than 3 data points per set, or missing data, etc.). I know there must be some way to have this sort of Pivot-table-esque grouping for something like this, but need some help to get it off the ground. Thanks.
Here is the input data in text form:
Subject Flow On/Off Values
1 10 1 2.20
1 10 1 2.50
1 10 1 2.60
1 20 1 5.50
1 20 1 6.10
1 20 1 5.90
1 30 1 10.10
1 30 1 10.50
1 30 1 10.50
1 10 0 1.90
1 10 0 2.20
1 10 0 2.30
1 20 0 5.20
1 20 0 5.80
1 20 0 5.60
1 30 0 9.80
1 30 0 10.20
1 30 0 10.20
2 10 1 5.70
2 10 1 6.00
2 10 1 6.10
2 20 1 9.00
2 20 1 9.60
2 20 1 9.40
2 30 1 13.60
2 30 1 14.00
2 30 1 14.00
2 10 0 5.40
2 10 0 5.70
2 10 0 5.80
2 20 0 8.70
2 20 0 9.30
2 20 0 9.10
2 30 0 13.30
2 30 0 13.70
2 30 0 13.70
You can use unique and accumarray like so to maintain the order of your rows of data:
[newData, ~, subs] = unique(data(:, 1:3), 'rows', 'stable');
newData(:, 4) = accumarray(subs, data(:, 4), [], #mean);
newData =
1.0000 10.0000 1.0000 2.4333
1.0000 20.0000 1.0000 5.8333
1.0000 30.0000 1.0000 10.3667
1.0000 10.0000 0 2.1333
1.0000 20.0000 0 5.5333
1.0000 30.0000 0 10.0667
2.0000 10.0000 1.0000 5.9333
2.0000 20.0000 1.0000 9.3333
2.0000 30.0000 1.0000 13.8667
2.0000 10.0000 0 5.6333
2.0000 20.0000 0 9.0333
2.0000 30.0000 0 13.5667
I assume that
You want to average based on unique values of the first three columns (not on groups of three rows, although the two criteria coincide in your example);
Order is determined by column 1, then 3, then 2.
Then, denoting your data as x,
[~, ~, subs] = unique(x(:, [1 3 2]), 'rows', 'sorted');
result = accumarray(subs, x(:,end), [], #mean);
gives
result =
2.1333
5.5333
10.0667
2.4333
5.8333
10.3667
5.6333
9.0333
13.5667
5.9333
9.3333
13.8667
As you see, I am using the third output of unique with the 'rows' and 'sorted' options. This creates the subs grouping vector based on first three columns of your data in the desired order. Then, passing that to accumarray computes the means.
accumarray is indeed the way to go. First, you'll need assign an index to each set of values with unique :
[unique_subjects, ~, ind_subjects] = unique(vect_subjects);
[unique_flows, ~, ind_flows] = unique(vect_flows);
[unique_on_off, ~, ind_on_off] = unique(vect_on_off);
So basically, you now got ind_subjects, ind_flows and ind_on_off that are values in [1..2], [1..3] and [1..2].
Now, you can compute the mean values in a [3x2x2] array (in you example) :
mean_values = accumarray([ind_flows, ind_on_off, ind_subjects], vect_values, [], #mean);
mean_values = mean_values(:);
Nota : order is set accordingly to your example.
Then you can construct the summary :
[ind1, ind2, ind3] = ndgrid(1:numel(unique_flows), 1:numel(unique_on_off), 1:numel(unique_subjects));
flows_summary = unique_flows(ind1(:));
on_off_summary = unique_on_off(ind2(:));
subjects_summary = unique_subjects(ind3(:));
Nota : Also works with non numeric values.
You should also try checking out the findgroups and splitapply reference pages. The easiest way to use them here is probably to place your data in a table:
>> T = array2table(data, 'VariableNames', { 'Subject', 'Flow', 'On_Off', 'Values'});
>> [gid,Tgrp] = findgroups(T(:,1:3));
>> Tgrp.MeanValue = splitapply(#mean, T(:,4), gid)
Tgrp =
12×4 table
Subject Flow On_Off MeanValue
_______ ____ ______ _________
1 10 0 2.1333
1 10 1 2.4333
1 20 0 5.5333
1 20 1 5.8333
1 30 0 10.067
1 30 1 10.367
2 10 0 5.6333
2 10 1 5.9333
2 20 0 9.0333
2 20 1 9.3333
2 30 0 13.567
2 30 1 13.867

Rearranging elements in a row matlab

I have two matrices in Matlab.
A =
and
B =
I want to assign the elements having the same cell-value according to it's corresponding column number in A matrix and move the elements there. I want to map the elements of B with A so that B elements also moves in that position.
I want this
A =
And therefore,
B =
Is there a way to do this?!
Thanks.
Easiest way I can think of is to create row/column pairs where the rows correspond row locations of the matrix and column locations are the actual elements of the matrix themselves. The values seen at these row/column pairs are again just the matrix values themselves.
You can very easily do this with sparse. Recreating the matrix above and storing this in A:
A = [1 2 5 8; 1 2 4 7];
... I would do it this way:
r = repmat((1:size(A,1)).', 1, size(A,2)); %'
S = full(sparse(r(:),A(:),A(:)));
The first line of code generates row locations for each value in the matrix A, then using sparse to specify row/column pairs and the associated values and we use full to convert to a proper numeric matrix.
We get:
S =
1 2 0 0 5 0 0 8
1 2 0 4 0 0 7 0
You can also do the same for the matrix B. You'd use sparse and specify the third parameter to be B instead:
B = [0.5 0.2 0.6 0.8; 0.4 0.6 0.8 0.9];
S2 = full(sparse(r(:),A(:),B(:)));
We get:
>> S2
S2 =
0.5000 0.2000 0 0 0.6000 0 0 0.8000
0.4000 0.6000 0 0.8000 0 0 0.9000 0

Min-max normalization of individual columns in a 2D matrix

I have a dataset which has 4 columns/attributes and 150 rows. I want to normalize this data using min-max normalization. So far, my code is:
minData=min(min(data1))
maxData=max(max(data1))
minmaxeddata=((data1-minData)./(maxData))
Here, minData and maxData returns the global minimum and maximum values. Therefore, this code actually applies a min-max normalization over all values in the 2D matrix so that the global minimum is 0 and the global maximum is 1.
However, I would like to perform the same operation on each column individually. Specifically, each column of the 2D matrix should be min-max normalized independently from the other columns.
I tried using just using min(data1) and max(data1), but got the error saying that the Matrix dimensions must agree.
However, by using the global minimum and maximum, I got the values in the range of [0-1] and have done experimentations using this normalized dataset. I would like to know whether there is any problem in my results? Is there a problem in my understanding as well? Any guidance would be appreciated.
If I understand you correctly, you wish to normalize each column of data1. Also, as each column is an independent data set and most likely having different dynamic ranges, doing a global min-max operation is probably not recommended. I would recommend that you go with your initial thoughts in normalizing each column individually.
Going with your error, you can't subtract data1 with min(data1) because min(data1) would produce a row vector while data1 is a matrix. You are subtracting a matrix with a vector which is why you are getting that error.
If you want to achieve what you're asking, use bsxfun to broadcast the vector and repeat it for as many rows as you have data1. Therefore:
mindata = min(data1);
maxdata = max(data1);
minmaxdata = bsxfun(#rdivide, bsxfun(#minus, data1, mindata), maxdata - mindata);
With later versions of MATLAB, broadcasting is built-in to the language, so you can simply do:
mindata = min(data1);
maxdata = max(data1);
minmaxdata = (data1 - mindata) ./ (maxdata - mindata);
It's a lot easier to read and still does the same job.
Example
>> data1 = [5 9 9 9 3 3; 3 10 2 1 10 1; 2 4 4 6 5 5]
data1 =
5 9 9 9 3 3
3 10 2 1 10 1
2 4 4 6 5 5
When I run the above normalization code, I get:
minmaxdata =
1.0000 0.8333 1.0000 1.0000 0 0.5000
0.3333 1.0000 0 0 1.0000 0
0 0 0.2857 0.6250 0.2857 1.0000

grouping using dendrogram matlab

I have a matrix A composed by 4 vectors (columns) of 12 elements each
A = [ 0 0 0 0;
0.0100 0.0100 0.0100 0;
0.3000 0.2700 0.2400 0.2400;
0.0400 0 0.0200 0.0200;
0.1900 0.0400 0.0800 0.0800;
0.1600 0.6500 0.2100 0.3800;
0.0600 0.0100 0.0300 0.0200;
0.1500 0.0100 0.0600 0.1700;
0 0 0 0.0800;
0.0300 0 0.0200 0.0100;
0.0700 0 0.1200 0.0100;
0 0 0.2300 0]
I also have a similarity matrix that states how much a vector is similar to the others
SIM =[1.00 0.6400 0.7700 0.8300;
0.6400 1.0000 0.6900 0.9100;
0.7700 0.6900 1.0000 0.7500;
0.8300 0.9100 0.7500 1.0000]
reading the rows of this matrix
vetor 1 is similar to vector 2 for 64%
vector 1 is similar to vector 3 for the 77%
...
I would like to create a dendrogram graph that shows me how many different groups there are in A considering a threshold of 0.95 for similarity (i.e. if 2 groups have a similarity >0.7 connect them)
I didn't really understand how to use this function with my data...
Not sure I understood correctly you question, but for what I've understood I will do that:
DSIM = squareform(1-SIM); % convert to a dissimilarity vector
it gives the result:
% DSIM = 0.3600 0.2300 0.1700 0.3100 0.0900 0.2500
% DSIM = 1 vs 2 , 1 vs 3 , 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4 ;
After, compute the linkage:
Z = linkage (DSIM,'average'); % there is other grouping option than average
You can plot the dendrogram with:
dendrogram(Z)
However, you want to split the groups according to a threshold so:
c = 0.1;
This is the dissimilarity at which to cut, here it means that two groups will be connected if they have a similarity higher than 0.9
T = cluster(tree,'cutoff',c,'criterion','distance')
The result of T in that case is:
T =
1
2
3
2
This means that at this level your vectors 1, 2, 3, 4 (call them A B C D) are organized in 3 groups:
A
B,D
C
Also, with c = 0.3, or 0.7 similarity:
T = 1 1 1 1
So there is just one group here.
To have that on the dendrogram you can calculate the number of groups:
num_grp = numel(unique(T));
After:
dendrogram(tree,num_grp,'labels',{'A','B','C','D'})
In that case the dendrogram won't display all groups because you set the maximum of nodes equal to the number of groups.