Mean and Standard Deviation of a column, ignoring zero values - Matlab

I am trying to find the mean and standard deviation of a column, but I am having trouble getting sensible output from the code I wrote. My code is below; I cannot see what mistake I have made.
for j=1:48;
C_f2 = V(V(:,3) == j,:);
C_f2(C_f2==0)=NaN;
m=mean(C_f2(:,4));
s=std(C_f2(:,4));
row=[j,m,s];
s1=[s1;row];
end
I have checked the matrix C_f2 and it is full of values, so the result should not be NaN. However, my output for the matrix s1 is
1 NaN NaN
2 NaN NaN
3 NaN NaN
. ... ...
48 NaN NaN
Can anyone see my issue? Help would be much appreciated!
The matrix C_f2 looks like,
1 185 01 5003
1 185 02 5009
. ... .. ....
1 259 48 5001

On line 3 of your code you set all values which are zero to NaN. The mean function returns NaN if any element is NaN. If you want to ignore the NaN values, you have to use the nanmean function, which comes with the Statistics toolbox. See the following example:
a = [1 NaN 2 3];
mean(a)
ans =
NaN
nanmean(a)
ans =
2
If you don't have the Statistics toolbox, you can exclude NaN elements with logical indexing
mean(a(~isnan(a)))
ans =
2
Possibly the easiest option is to exclude the zero elements directly instead of replacing them with NaN:
mean(a(a~=0))
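Applied to the loop from the question, that last idea could look like the sketch below. The matrix V here is made-up sample data (group id in column 3, measurement in column 4, as in the question), and the preallocated s1 is my own addition, not part of the original code.

```matlab
% Hypothetical sample data: column 3 = group id, column 4 = measurement.
V = [1 185 1 5003;
     1 185 2 5009;
     1 186 1    0;
     1 186 2 5011;
     1 187 1 5005];
groups = unique(V(:,3));
s1 = zeros(numel(groups), 3);          % preallocate: [group, mean, std]
for k = 1:numel(groups)
    vals = V(V(:,3) == groups(k), 4);  % column 4 for this group only
    vals = vals(vals ~= 0);            % drop zeros instead of turning them into NaN
    s1(k,:) = [groups(k), mean(vals), std(vals)];
end
```

Only column 4 is filtered here, so zeros in other columns never enter the calculation.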

Your line C_f2(C_f2==0)=NaN; will put NaNs into C_f2. Then, your mean and std operations will see those NaNs and output NaNs themselves.
To have the mean and std ignore NaN, you need to use the alternate version nanmean and nanstd.
These are part of a toolbox, however, so you might not have them if you just have the base Matlab installation.
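In newer MATLAB releases (R2015a onward, as far as I know) the base mean and std functions accept an 'omitnan' flag, so no toolbox is needed. A minimal sketch, assuming such a release:

```matlab
a = [1 NaN 2 3];
m = mean(a, 'omitnan');      % mean of [1 2 3]
s = std(a, 0, 'omitnan');    % 0 selects the default n-1 normalisation
```

Here m is 2 and s is 1, the same values nanmean and nanstd would give.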

Don't set the zeros to NaN: any computation involving NaN returns NaN unless the function has special rules for it.
Instead, index the nonzero part of your column directly.
Say the column n is your input:
N = n(n~=0) % logical indexing; wrapping the mask in find is not needed
Now do your mean calculation on N.

To compute the mean and standard deviation of each column excluding zeros:
A = [1 2;
3 0;
4 5;
6 7;
0 0]; %// example data
den = sum(A~=0); %// number of nonzero values in each column
mean_nz = bsxfun(@rdivide, sum(A), den);
mean2_nz = bsxfun(@rdivide, sum(A.^2), den);
std_nz = sqrt(bsxfun(@times, mean2_nz-mean_nz.^2, den./(den-1)));
The results for the example are
mean_nz =
3.5000 4.6667
std_nz =
2.0817 2.5166
The above uses the "corrected" definition of standard deviation (which divides by n-1, where n is the number of values). If you want the "uncorrected" version (i.e. divide by n):
std_nz = sqrt(mean2_nz-mean_nz.^2);
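The sum-of-squares formula above is compact, but it subtracts two large, nearly equal numbers when the mean is large compared with the spread (as with values around 5000 in the question), which can cost precision. A per-column loop with logical indexing is slower but avoids that and makes a handy cross-check. This is only a sketch; mean_chk and std_chk are names I made up.

```matlab
A = [1 2;
     3 0;
     4 5;
     6 7;
     0 0];                              % same example data as above
mean_chk = zeros(1, size(A,2));
std_chk  = zeros(1, size(A,2));
for c = 1:size(A,2)
    col = A(:,c);
    col = col(col ~= 0);                % keep only the nonzero entries of this column
    mean_chk(c) = mean(col);
    std_chk(c)  = std(col);             % "corrected" version (divides by n-1)
end
```

For the example data this reproduces mean_nz and std_nz from the vectorised version.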

Related

Storing the digits after the comma in Matlab

I have a double between 0 and 1 stored in an array B in Matlab.
I want to create a vector t storing the first N digits after the comma (the decimal point). If there are fewer than N digits after the comma, the remaining elements of t should be 0.
Suppose N=10 and B=[0.908789]. Then,
t=[9;0;8;7;8;9;0;0;0;0];
This is the code I am using at the moment
n = fix(rem(B,1)*10^N);
s1 = sprintf('%.0f',n);
ttemp = (s1-'0')';
t=zeros(N,1);
t(1:size(ttemp,1))=ttemp;
but it gives me wrong results.
Indeed, suppose
B=[7.0261e-05] and N=5. The code above gives me
t=[7;0;0;0;0], without taking the e-05 exponent into account.
Any suggestion on how to fix this?

You need to tell sprintf that you'd like all leading 0's to actually be shown if there are fewer than N digits:
Your current way:
sprintf('%.0f', n);
% '7'
The correct way:
s1 = sprintf('%05.f', n);
% '00007'
The general example for any N would be:
s1 = sprintf(['%0', num2str(N), '.f'],n);
The way that you currently have it written, the output of the sprintf command is simply a '7', which, when you fill in your output starting at the beginning, yields a 7 followed by all 0's (the value you initialized the output to).
If we initialize it to NaN values instead of 0's, you can see what the issue is:
N = 5;
B = 7.0261e-05;
n = fix(rem(B,1)*10^N);
% 7
s1 = sprintf('%.0f',n);
% '7'
ttemp = (s1 - '0').';
% 7
t = nan(N, 1);
% NaN NaN NaN NaN NaN
t(1:size(ttemp,1)) = ttemp;
% 7 NaN NaN NaN NaN
Alternately, you can keep everything you have and just modify t from the end rather than the beginning
t = zeros(N, 1);
t((end-numel(ttemp)+1):end) = ttemp;
Unsolicited Pointers
' is the complex conjugate transpose, not the plain transpose; .' is the plain transpose.
Use numel rather than size to determine the number of elements in a vector, since it works for both row and column vectors.
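Putting the pieces together for the failing example, a minimal sketch could look like this. The second variant (t2) is my own arithmetic cross-check, not part of the answer above.

```matlab
N = 5;
B = 7.0261e-05;
n  = fix(rem(B,1) * 10^N);                     % integer made of the first N decimals
s1 = sprintf(['%0', num2str(N), '.0f'], n);    % zero-padded to N characters
t  = (s1 - '0').';                             % N-by-1 column of digits

% Cross-check without strings: peel off each decimal digit arithmetically.
t2 = mod(floor(n ./ 10.^(N-1:-1:0)), 10).';
```

Both t and t2 come out as [0;0;0;0;7] here. Because of binary floating point, fix(B*10^N) can come out one lower than expected for some inputs, so treat the last digit with care.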

How to interpolate only less than 3 consecutive Nan values in Matlab?

I have a 500x600 matrix containing some NaN values. I want to interpolate places where there are fewer than three consecutive NaNs (possibly using an average of the preceding and following values), and leave every place with three or more consecutive NaN values as NaN. I have already looked at http://uk.mathworks.com/matlabcentral/answers/34481-interpolate-nans-only-if-less-than-4-consecutive-nans but even the accepted answer doesn't work. (I realise this one is for 4 consecutive values but it doesn't work either way).

If by 3 consecutive NaNs you mean 3 consecutive NaNs in a row or column, you can use the following approach:
For each row, use convolution to determine, for each sequence of NaNs, whether it is shorter than 3.
Fill each row of the matrix by interpolation, leaving the longer sequences as NaN.
Fill the columns by transposing the result and executing the function again.
Code:
%generates example array
data = rand(5,5);
data (1,2:4) = nan;
data (2:5,2) = nan;
data (:,4) = nan;
%fills all relevant nans in a row
data2 = interpolateNanRows(data );
%fills all relevant nans in a column
out= interpolateNanRows(data2')';
Auxiliary functions:
function res = interpolateNanRows(data)
%zero padding
dataPad = zeros(size(data,1)+2,size(data,2)+2);
dataPad(2:end-1,2:end-1)=data;
%generates relevant nan maps
nansMap = isnan(dataPad);
irrelevantNans = conv2(double(nansMap),[1,0,0,0,1],'same')>0 & nansMap;
%fills each row
for ii=1:size(dataPad,1)
filledRow = interpolateRow(dataPad(ii,:));
%ignores irrelevant values (more than 3 consecutive nans)
filledRow(irrelevantNans(ii,:)) = nan;
dataPad(ii,:) = filledRow;
end
%generates output
res = dataPad(2:end-1,2:end-1);
end
function filledRow = interpolateRow(row)
%receives a vector of values, and perform interpolation in regions of nans
if sum(isnan(row))==0 || sum(isnan(row))==length(row)
filledRow = row;
return;
end
nanData = isnan(row);
index = 1:numel(row);
filledRow = row;
filledRow(nanData) = interp1(index(~nanData), row(~nanData), index(nanData));
end
results:
data2=
0.6386 NaN NaN NaN 0.6671
0.4805 NaN 0.3171 NaN 0.7771
0.1184 NaN 0.0124 NaN 0.6860
0.2455 NaN 0.3011 NaN 0.8014
0.7761 NaN 0.7239 NaN 0.2833
out =
0.6386 0.6457 0.6528 0.6599 0.6671
0.4805 0.3988 0.3171 0.5471 0.7771
0.1184 0.0654 0.0124 0.3492 0.6860
0.2455 0.2733 0.3011 0.5512 0.8014
0.7761 0.7500 0.7239 0.5036 0.2833
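As an alternative way to see the idea for a single row, the sketch below measures each NaN run explicitly with diff instead of convolution, fills everything with interp1, then restores the runs that are too long. It is my own illustration, not the code above; maxRun, runStart and runEnd are hypothetical names.

```matlab
row = [1 NaN 3 NaN NaN NaN 7 8 NaN NaN 11];
maxRun = 2;                               % fill runs of fewer than 3 NaNs
isn = isnan(row);
d = diff([false isn false]);
runStart = find(d == 1);                  % first index of each NaN run
runEnd   = find(d == -1) - 1;             % last index of each NaN run
idx = 1:numel(row);
filled = row;
filled(isn) = interp1(idx(~isn), row(~isn), idx(isn));   % linear fill of every NaN
for r = 1:numel(runStart)
    if runEnd(r) - runStart(r) + 1 > maxRun
        filled(runStart(r):runEnd(r)) = NaN;              % put long runs back
    end
end
```

In this example the single NaN and the pair of NaNs are filled, while the run of three stays NaN. Applying the same routine to every row and then to every column of the transposed matrix mirrors the structure of the answer above.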

turn elements to NaN after first negative

I've got a three dimensional array in Matlab. The first dimension is time, the second is Humidity, and the third is Temperature. If a Temperature value is < 0, I want every subsequent temperature value to be turned to NaN.
For example if the array is:
>> sampl = randn(4,3,2)
sampl(:,:,1) =
0.79487 0.71017 -0.39167
0.51754 -1.3068 0.84166
0.49461 0.74159 0.082784
0.66393 1.4677 0.31467
sampl(:,:,2) =
0.78981 1.3096 1.0434
-0.80122 0.16037 -1.0682
-0.32565 -2.1182 -0.31723
0.28468 0.70708 1.4797
What's the most efficient way to turn this into:
sampl(:,:,1) =
0.79487 0.71017 NaN
0.51754 NaN NaN
0.49461 NaN NaN
0.66393 NaN NaN
sampl(:,:,2) =
0.78981 1.3096 1.0434
NaN 0.16037 NaN
NaN NaN NaN
NaN NaN NaN
Specifically, for a particular slice, we want to process along each column, and as soon as we encounter a negative number in one column, we want that location to be NaN as well as all row locations for that same column that follow this NaN value to also be NaN.

Another easy way is to find the locations that are negative in the original matrix and create another matrix in which those values are set to NaN. Then invoke cumsum (a cumulative sum) along the rows of each column in each slice of this new matrix, and finally set the locations that are NaN in this cumsum result to NaN in the original matrix:
>> out = sampl;
>> out(out < 0) = NaN;
>> out = cumsum(out);
>> sampl(isnan(out)) = NaN
sampl(:,:,1) =
0.7949 0.7102 NaN
0.5175 NaN NaN
0.4946 NaN NaN
0.6639 NaN NaN
sampl(:,:,2) =
0.7898 1.3096 1.0434
NaN 0.1604 NaN
NaN NaN NaN
NaN NaN NaN
The reason why cumsum is useful here is because we would essentially examine each column independently along its rows and keep accumulating over all of the rows for each column which has valid entries until we hit a NaN value for a column. After this value, subsequent values in the cumsum would become NaN for each column in each slice independently. As such, after we hit the first NaN in a column, no matter what values we encounter after (NaN or a valid number), the result in the cumsum would still be NaN. This effectively propagates NaN values after we encounter the first negative in a column for your matrix. The last bit is to find those locations in this matrix and set the corresponding locations in the original matrix to NaN, thus giving our result.
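A variant of the same idea, sketched on a small made-up 2D matrix: instead of pushing NaN through cumsum, accumulate the logical mask x < 0 directly. Once the running count of negatives in a column is above zero, every later row is flagged.

```matlab
x = [ 0.5  1.0;
     -0.2  0.3;
      0.7 -0.1;
      0.4  0.9];
mask = cumsum(x < 0, 1) > 0;   % true from the first negative downwards, per column
x(mask) = NaN;
```

This leaves 0.5 in the first column and 1.0, 0.3 in the second; everything from the first negative onwards becomes NaN. It should also work on a 3D array, since cumsum along dimension 1 treats every column of every slice independently.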

Here is a solution using accumarray.
First, get the number of rows and reshape sampl to get a 2D array; it's easier to work with:
NumRow = size(sampl,1);
a = reshape(sampl,NumRow,[])
a looks like this:
a =
0.7949 0.7102 -0.3917 0.7898 1.3096 1.0434
0.5175 -1.3068 0.8417 -0.8012 0.1604 -1.0682
0.4946 0.7416 0.0828 -0.3256 -2.1182 -0.3172
0.6639 1.4677 0.3147 0.2847 0.7071 1.4797
Then find the first row index, for each column, that is negative:
[row,col] = find(a<0);
b = accumarray(col,row,[size(a,2) 1],@min); %// fixed output size, so trailing columns without negatives still get a 0 entry
Now b looks like this:
b =
0
2
1
2
3
2
Before inserting NaN's, change the 0 so that whole columns are not filled with NaN's using the colon operator (see next step):
b(b==0) = NumRow+1;
Finally loop through your array and insert NaN's starting from the corresponding index in b until the last row for every column. Also reshape a to get the same size as your initial array:
for k = 1:size(a,2)
a(b(k):NumRow,k) = NaN;
end
out = reshape(a,size(sampl))
Out:
out(:,:,1) =
0.7949 0.7102 NaN
0.5175 NaN NaN
0.4946 NaN NaN
0.6639 NaN NaN
out(:,:,2) =
0.7898 1.3096 1.0434
NaN 0.1604 NaN
NaN NaN NaN
NaN NaN NaN
Here is the whole code that you can copy/paste to run:
clc
% sampl must already exist in the workspace (a leading "clear" would delete it)
NumRow = size(sampl,1);
a = reshape(sampl,NumRow,[])
[row,col] = find(a<0);
b = accumarray(col,row,[size(a,2) 1],@min)
b(b==0) = NumRow+1;
for k = 1:size(a,2)
a(b(k):NumRow,k) = NaN;
end
out = reshape(a,size(sampl))

Missing a bsxfun-based solution, anyone?
[val, ind] = max(sampl<0); %// ind gives row index of first negative value, if any
ind(~val) = inf; %// if no negative values, set ind to inf so it has no effect
sampl(bsxfun(@ge, (1:size(sampl,1)).', ind)) = NaN; %'// logical indexing to fill NaNs
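To see how the Inf trick handles a column without any negative value, here is a small sketch on made-up 2D data (the third column is all positive). The conversion to double before max is my own precaution, not part of the answer above.

```matlab
x = [ 0.5  1.0  2;
     -0.2  0.3  3;
      0.7 -0.1  4;
      0.4  0.9  5];
[val, ind] = max(double(x < 0), [], 1);   % val = [1 1 0], ind = [2 3 1]
ind(val == 0) = Inf;                      % no negative in column 3: never reached
rows = (1:size(x,1)).';
mask = bsxfun(@ge, rows, ind);            % R2016b+ also allows: rows >= ind
x(mask) = NaN;
```

Column 3 stays untouched because no row index is ever greater than or equal to Inf.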

How do I construct an Esri grid?

I have read a lot about this subject, but I can't find a solution to my problem.
First, I have a file with 3 columns: X Y Z
In MATLAB, I did this:
data = load('data.txt');
X = data(:,1);
Y = data(:,2);
Z = data(:,3);
This file is like this:
7037 6032 3
7036 6028 5
7037 6029 4
7037 6030 3
7038 6031 6
7039 6031 2
7037 6033 7
And I want to obtain the following matrix from the above matrix:
5 NaN NaN NaN NaN NaN
NaN 4 3 NaN 3 7
NaN NaN NaN 6 NaN NaN
NaN NaN NaN 2 NaN NaN
The rule is that the first column of the output corresponds to Y(1) = min(Y), the second column to Y(2) = Y(1) + 1, and so on.
Likewise, the first row corresponds to X(1) = min(X), the second to X(2) = X(1) + 1. Essentially, the first column of the file acts as a row index, the second column acts as a column index, and for each row and column pair, the third column gets mapped to that location in the matrix. As such, out(1,1) holds the value for (X(1), Y(1)), out(1,2) holds the value for (X(1), Y(2)), and so on.
At first, I thought about creating a matrix out like so:
xr = sort(unique(X));
yr = sort(unique(Y));
a = length(xr);
b = length(yr);
out = NaN(a,b);
Afterwards, I tried to place the data into this out matrix with a loop, but this obviously doesn't work.
For more information on an Esri grid, here's a Wikipedia article about it. The example grid in that page is what I desire. http://en.wikipedia.org/wiki/Esri_grid

I now understand what you want. The link that you posted from Wikipedia is very useful. You are trying to build what is known as an Esri grid; the Wikipedia article includes a pictorial representation of one.
What you are given is an N x 3 matrix where the first column denotes the row IDs of this matrix, the second column denotes the column IDs of this matrix, and the third column denotes the values at each pair of IDs. So for example, looking at the right of the figure in that article, your text file could look like:
275 125 5
275 175 2
...
...
25 75 5
25 125 1
Each row consists of a row index, a column index and a value that maps to this location in the grid. You had the right approach in that you should use unique - specifically the third output. We need to obtain a unique ID for the first two columns of your data independently. Once we do this, I'm going to show you the very powerful accumarray function. We are basically going to use the unique IDs found in the previous step, and we use these to index into our grid and place each value that corresponds to each unique pair of row and column IDs into this grid. Therefore, your code is very simply:
data = load('data.txt');
%// Or you can do this for reproducing the results
%data = [7037 6032 3;
%7036 6028 5;
%7037 6029 4;
%7037 6030 3;
%7038 6031 6;
%7039 6031 2;
%7037 6033 7];
[~,~,id1] = unique(data(:,1));
[~,~,id2] = unique(data(:,2));
out = accumarray([id1 id2], data(:,3), [], [], NaN);
out produces the desired Esri grid, and we get:
out =
5 NaN NaN NaN NaN NaN
NaN 4 3 NaN 3 7
NaN NaN NaN 6 NaN NaN
NaN NaN NaN 2 NaN NaN
So how does this work? accumarray accepts a matrix of row and column locations that you want to use to access the output. At each of these locations, you provide a value that gets mapped to that bin. We use the last column of your matrix as the values (second input) and pass [] as the third input so that accumarray infers the size of the output. By default accumarray sums the values that get mapped to each bin, but I'm going to assume that each row and column pair appears only once in your text file, so nothing is actually summed and we can keep the default behaviour by passing [] as the fourth input. Finally, any location that doesn't get a value is filled with NaN (fifth input).
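If the uniqueness assumption does not hold, the default summing becomes visible. The sketch below uses made-up subscripts in which the pair (1,1) appears twice, and shows how passing a function handle as the fourth input (here @max, chosen only for illustration) changes what lands in the bin.

```matlab
subs = [1 1;
        2 2;
        1 1];                                   % (1,1) appears twice
vals = [10; 20; 5];
G_sum = accumarray(subs, vals, [], [], NaN);    % default: duplicates are summed
G_max = accumarray(subs, vals, [], @max, NaN);  % keep the largest value per bin
```

G_sum(1,1) is 15 while G_max(1,1) is 10; the two bins that receive no value are NaN in both results.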

matrix get min values of a matrix before max values occurred

I am trying to get the min values of a matrix that occur before its max values. I have two matrices: matrix data and matrix a. Matrix a is a subset of matrix data and is composed of the max values of matrix data. I have the following code, but I am obviously doing something wrong.
edit:
Matrix a are the max values of matrix data. I derived it from:
for x=1:size(data,1)
a(x)=max(data(x,:));
end
a=a'
clear x
matrix b code:
for x=1:size(data,1)
b(x)=min(data(x,(x<data==a)));
end
b=b'
clear x
matrix data    matrix a    matrix b
1 2  3 4       4           1
6 5  4 7       7           4
9 6 12 5       12          6
I need, for each row, the minimum of the values that occur before that row's maximum (the value in matrix a) in matrix data.

Short and simple:
[a,idxmax] = max(data,[],2);
b = arrayfun(@(ii) min(data(ii,1:idxmax(ii))), 1:size(data,1));
which is the same as
b=NaN(1,size(data,1)); % preallocation!
for ii=1:size(data,1)
b(ii) = min(data(ii,1:idxmax(ii)));
end
Ignore maximum itself
If you want the minimum of everything strictly before the maximum (not including the maximum itself), the maximum may be the first number, in which case you would be taking the minimum of an empty matrix. The solution then is to use cell output, which can be empty:
b = arrayfun(@(ii) min(data(ii,1:idxmax(ii)-1)), 1:size(data,1),'uni',false);
Replace empty cells with NaN
If you want to replace the empty cells with NaN and then convert back to a matrix, use this:
b(cellfun(@isempty,b))={NaN};
b=cell2mat(b);
or simply use the earlier version and replace b(ii) with NaN when it is equal to a(ii), which gives the same outcome:
b = arrayfun(@(ii) min(data(ii,1:idxmax(ii))), 1:size(data,1));
b(b'==a) = NaN
Example:
data=magic(4)
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
outputs:
a' = 16 11 12 15
b =
16 5 6 4
and
b =[1x0 double] [5] [6] [4]
for the 2nd solution using cell output and ignoring the maximum itself also.
And btw:
for x=1:size(data,1)
a(x)=max(data(x,:));
end
a=a'
clear x
can be replaced with
a=max(data,[],2);

It's not pretty, but this is the only way I have found so far of doing this kind of thing without a loop.
If loops are OK, I would recommend Gunther Struyf's answer as the most compact use of MATLAB's built-in array looping function, arrayfun.
Some of the transposition etc. may be superfluous if you want column mins instead of row mins...
[mx, imx] = max(data');
inds = repmat(1:size(data,2), [size(data,1),1]);
imx2 = repmat(imx', [1, size(data,2)]);
data2 = data;
data2(inds >= imx2) = inf;
min(data2');
NOTE: if data is not needed we can remove the additional data2 variable, and reduce the line count.
So to demonstrate what this does, (and see if I understood the question correctly):
for input
>> data = [1,3,-1; 5,2,1]
I get minima:
>> min(data2')
ans = [1, inf]
I.e. it only found the min values before the max values for each row, and anything else was set to inf.
In words:
For each row, get the index of the maximum.
Generate a matrix of column indices.
Use repmat to generate a matrix, the same size as data, where each row holds the index of that row's maximum.
Set data to infinity where the column index >= the index of the maximum.
Find the min as usual.
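Another loop-free option, assuming a release that has cummin (R2014b or later, as far as I know), is to take the running minimum along each row and read it off at the position of the maximum. Unlike the repmat approach above, this variant includes the maximum itself, like the first arrayfun version. The data below is the example from the question; runMin and lin are names I made up.

```matlab
data = [1 2  3 4;
        6 5  4 7;
        9 6 12 5];
[a, idxmax] = max(data, [], 2);                          % row-wise max and its column
runMin = cummin(data, 2);                                % running minimum along each row
lin = sub2ind(size(data), (1:size(data,1)).', idxmax);   % linear index of (row, idxmax)
b = runMin(lin);                                         % min up to and including the max
```

For this data a is [4;7;12] and b is [1;4;6], matching the table in the question.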