turn elements to NaN after first negative - matlab

I've got a three dimensional array in Matlab. The first dimension is time, the second is Humidity, and the third is Temperature. If a Temperature value is < 0, I want every subsequent temperature value to be turned to NaN.
For example if the array is:
>> sampl = randn(4,3,2)
sampl(:,:,1) =
0.79487 0.71017 -0.39167
0.51754 -1.3068 0.84166
0.49461 0.74159 0.082784
0.66393 1.4677 0.31467
sampl(:,:,2) =
0.78981 1.3096 1.0434
-0.80122 0.16037 -1.0682
-0.32565 -2.1182 -0.31723
0.28468 0.70708 1.4797
What's the most efficient way to turn this into:
sampl(:,:,1) =
0.79487 0.71017 NaN
0.51754 NaN NaN
0.49461 NaN NaN
0.66393 NaN NaN
sampl(:,:,2) =
0.78981 1.3096 1.0434
NaN 0.16037 NaN
NaN NaN NaN
NaN NaN NaN
Specifically, for a particular slice, we want to process along each column, and as soon as we encounter a negative number in one column, we want that location to be NaN as well as all row locations for that same column that follow this NaN value to also be NaN.

Another easy way is to mark the negative locations in a copy of the matrix as NaN, take a cumulative sum (cumsum) down the rows of each column in each slice of that copy, and then set the original matrix to NaN wherever the cumsum result is NaN:
>> out = sampl;
>> out(out < 0) = NaN;
>> out = cumsum(out);
>> sampl(isnan(out)) = NaN
sampl(:,:,1) =
0.7949 0.7102 NaN
0.5175 NaN NaN
0.4946 NaN NaN
0.6639 NaN NaN
sampl(:,:,2) =
0.7898 1.3096 1.0434
NaN 0.1604 NaN
NaN NaN NaN
NaN NaN NaN
cumsum is useful here because it accumulates down the rows of each column, independently for every column in every slice. Once the running sum reaches a NaN, every later entry in that column is NaN as well, no matter what values follow (NaN or a valid number). This effectively propagates NaN from the first negative in a column to the end of that column. The last step is to find those NaN locations in the cumsum result and set the corresponding locations in the original matrix to NaN, which gives the result.
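To see the propagation in isolation, here is a minimal sketch on a single made-up column (the values in v are hypothetical, not taken from sampl):

```matlab
v = [0.5; 1.2; -0.3; 0.8];   % hypothetical column with one negative entry
t = v;
t(t < 0) = NaN;              % mark the negative entry
c = cumsum(t);               % running sum: NaN from the first marked row onward
v(isnan(c)) = NaN;           % copy the NaN pattern back onto the original
disp(v)
```

One thing to keep in mind: a NaN that is already present in the input propagates down its column in exactly the same way as a negative value does.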

Here is a solution using accumarray.
First, get the number of rows and reshape sampl to get a 2D array; it's easier to work with:
NumRow = size(sampl,1);
a = reshape(sampl,NumRow,[])
a looks like this:
a =
0.7949 0.7102 -0.3917 0.7898 1.3096 1.0434
0.5175 -1.3068 0.8417 -0.8012 0.1604 -1.0682
0.4946 0.7416 0.0828 -0.3256 -2.1182 -0.3172
0.6639 1.4677 0.3147 0.2847 0.7071 1.4797
Then find the first row index, for each column, that is negative:
[row,col] = find(a<0);
b = accumarray(col,row,[],@min);
Now b looks like this:
b =
0
2
1
2
3
2
Before inserting NaN's, change the 0 so that whole columns are not filled with NaN's using the colon operator (see next step):
b(b==0) = NumRow+1;
Finally loop through your array and insert NaN's starting from the corresponding index in b until the last row for every column. Also reshape a to get the same size as your initial array:
for k = 1:size(a,2)
a(b(k):NumRow,k) = NaN;
end
out = reshape(a,size(sampl))
Out:
out(:,:,1) =
0.7949 0.7102 NaN
0.5175 NaN NaN
0.4946 NaN NaN
0.6639 NaN NaN
out(:,:,2) =
0.7898 1.3096 1.0434
NaN 0.1604 NaN
NaN NaN NaN
NaN NaN NaN
Here is the whole code that you can copy/paste to run:
%// sampl must already exist in the workspace, so it is not cleared here
clc
NumRow = size(sampl,1);
a = reshape(sampl,NumRow,[])
[row,col] = find(a<0);
b = accumarray(col,row,[],@min)
b(b==0) = NumRow+1;
for k = 1:size(a,2)
a(b(k):NumRow,k) = NaN;
end
out = reshape(a,size(sampl))

Missing a bsxfun-based solution, anyone?
[val, ind] = max(sampl<0); %// ind gives row index of first negative value, if any
ind(~val) = inf; %// if no negative values, set ind to inf so it has no effect
sampl(bsxfun(@ge, (1:size(sampl,1)).', ind)) = NaN; %'// logical indexing to fill NaNs
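On releases with implicit expansion (R2016b and later, as far as I know), the same comparison can be written without bsxfun. This is only a sketch of that variant; the small sampl array below is made up for illustration:

```matlab
% Hypothetical 3x3x2 array, used only to illustrate the idea
sampl = cat(3, [1 2 -3; 4 -5 6; 7 8 9], [1 -2 3; -4 5 6; 7 8 -9]);

[val, ind] = max(sampl < 0);   % row index of the first negative per column
ind(~val) = inf;               % columns without negatives are left untouched
rows = (1:size(sampl,1)).';    % column vector of row numbers
sampl(rows >= ind) = NaN;      % implicit expansion replaces bsxfun(@ge, ...)
```

The comparison expands the column vector of row numbers against the 1-by-columns-by-slices array of indices, so the mask has the same size as sampl.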

Related

MATLAB: How to ignore NaN values in the CORR Function?

My problem in MATLAB is the opposite of the other problems reported here between NAN values in CORR function.
If I have a matrix A = [1;2;3;4] and a matrix B = [3;5;7;8], the correlation corr(A,B) is 0.9898. That is fine.
But if there is a NaN value in B, such as B = [3;5;7;NaN], the correlation corr(A,B) will be NaN instead of 1.0000 (the correlation of the non-NaN values, A = (1;2;3) and B = (3;5;7)).
What can I do to make corr ignore these NaN values, so that it returns something other than NaN?
Many statistics functions have variants that ignore NaN values, I don't know if corr does too. But you can always fake it:
indx = ~(isnan(A) | isnan(B));
corr(A(indx),B(indx));
You can call the corr function with the rows parameter set to complete. From the official documentation:
'complete' uses only rows with no missing values
For example:
A = [1;2;3;4];
B = [3;5;7;NaN];
r = corr(A,B,'rows','complete')
Output:
r =
1.0000
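If corr is not available (it ships with the Statistics Toolbox), base MATLAB's corrcoef accepts, to my knowledge, a similar 'Rows' option. A minimal sketch with the same vectors:

```matlab
A = [1;2;3;4];
B = [3;5;7;NaN];
R = corrcoef(A, B, 'Rows', 'complete');  % 2x2 correlation matrix, NaN rows dropped
r = R(1,2)                               % off-diagonal entry is the correlation of A and B
```

Note that corrcoef returns the full correlation matrix rather than a scalar, so the value of interest is the off-diagonal element.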

How to interpolate only less than 3 consecutive Nan values in Matlab?

I have a 500x600 matrix containing some NaN values. I want to interpolate places where there are fewer than three consecutive NaNs (possibly using an average of the preceding and following values), and leave all the places with three or more consecutive NaN values as NaN. I have already looked at http://uk.mathworks.com/matlabcentral/answers/34481-interpolate-nans-only-if-less-than-4-consecutive-nans but even the accepted answer doesn't work. (I realise this one is for 4 consecutive values but it doesn't work either way).
If by 3 consecutive NaNs you mean 3 consecutive NaNs in a row or column, you can use the following approach:
For each row, use convolution to determine whether each run of NaNs is shorter than 3.
Fill each row of the matrix using the functions below.
Fill the columns by transposing the result and running the function again.
Code:
%generates example array
data = rand(5,5);
data (1,2:4) = nan;
data (2:5,2) = nan;
data (:,4) = nan;
%fills all relevant nans in a row
data2 = interpolateNanRows(data );
%fills all relevant nans in a column
out= interpolateNanRows(data2')';
Auxiliary functions:
function res = interpolateNanRows(data)
%zero padding
dataPad = zeros(size(data,1)+2,size(data,2)+2);
dataPad(2:end-1,2:end-1)=data;
%generates relevant nan maps
nansMap = isnan(dataPad);
irrelevantNans = conv2(double(nansMap),[1,0,0,0,1],'same')>0 & nansMap;
%fills each row
for ii=1:size(dataPad,1)
filledRow = interpolateRow(dataPad(ii,:));
%ignores irrelevant values (more than 3 consecutive nans)
filledRow(irrelevantNans(ii,:)) = nan;
dataPad(ii,:) = filledRow;
end
%generates output
res = dataPad(2:end-1,2:end-1);
end
function filledRow = interpolateRow(row)
%receives a vector of values, and perform interpolation in regions of nans
if sum(isnan(row))==0 || sum(isnan(row))==length(row)
filledRow = row;
return;
end
nanData = isnan(row);
index = 1:numel(row);
filledRow = row;
filledRow(nanData) = interp1(index(~nanData), row(~nanData), index(nanData));
end
results:
data2=
0.6386 NaN NaN NaN 0.6671
0.4805 NaN 0.3171 NaN 0.7771
0.1184 NaN 0.0124 NaN 0.6860
0.2455 NaN 0.3011 NaN 0.8014
0.7761 NaN 0.7239 NaN 0.2833
out =
0.6386 0.6457 0.6528 0.6599 0.6671
0.4805 0.3988 0.3171 0.5471 0.7771
0.1184 0.0654 0.0124 0.3492 0.6860
0.2455 0.2733 0.3011 0.5512 0.8014
0.7761 0.7500 0.7239 0.5036 0.2833
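As a different way to express the same idea, the run lengths can be computed explicitly with diff instead of convolution. The following is only a sketch for a single row, with made-up values; it fills interior NaN runs shorter than 3 and leaves longer runs, and runs touching either end, untouched:

```matlab
row = [1 NaN NaN 4 NaN NaN NaN NaN 9];   % hypothetical data
isn = isnan(row);
d = diff([0 isn 0]);                     % +1 where a NaN run starts, -1 just after it ends
starts = find(d == 1);
stops  = find(d == -1) - 1;
lens   = stops - starts + 1;             % length of each NaN run

idx = 1:numel(row);
filled = row;
for k = 1:numel(starts)
    isInterior = starts(k) > 1 && stops(k) < numel(row);
    if lens(k) < 3 && isInterior
        gap = starts(k):stops(k);
        % linear interpolation from the valid samples of the original row
        filled(gap) = interp1(idx(~isn), row(~isn), gap);
    end
end
disp(filled)
```

For a matrix, this could be applied to every row and then to every column of the result, mirroring the transpose trick used above.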

How do I construct an Esri grid?

I have read a lot about this subject but I can't find a solution to my problem.
First, I have a file with 3 columns: X Y Z
In MATLAB, I did this:
data = load('data.txt');
X = data(:,1);
Y = data(:,2);
Z = data(:,3);
This file is like this:
7037 6032 3
7036 6028 5
7037 6029 4
7037 6030 3
7038 6031 6
7039 6031 2
7037 6033 7
And I want to obtain the following matrix from the above matrix:
5 NaN NaN NaN NaN NaN
NaN 4 3 NaN 3 7
NaN NaN NaN 6 NaN NaN
NaN NaN NaN 2 NaN NaN
The rule is that the first column corresponds to Y(1) = min(Y), the second column to Y(2) = Y(1) + 1, and so on.
The first line is X(1) = min(X), X(2) = X(1) + 1. Essentially, the first column acts as a row index, the second column acts as a column index, and for each row and column pair, the third column gets mapped to a location in this matrix. As such, the output matrix will be like so: out(1,1)=X(1) Y(1) ; out(1,2) = X(1) Y(2)
To start, I thought about creating a matrix out like so:
xr = sort(unique(X));
yr = sort(unique(Y));
a = length(xr);
b = length(yr);
out = NaN(a,b);
Afterwards, I tried to place the data into this out matrix with a loop, but this obviously doesn't work.
For more information on an Esri grid, here's a Wikipedia article about it. The example grid in that page is what I desire. http://en.wikipedia.org/wiki/Esri_grid
I now understand what you want. The link that you posted from Wikipedia is very useful. You are trying to build what is known as an Esri grid, and the Wikipedia article includes a pictorial representation of one.
What you are given is an N x 3 matrix where the first column denotes the row IDs of this matrix, the second column denotes the column IDs, and the third column denotes the values at each pair of IDs. For example, looking at the right-hand side of the figure in that article, your text file could look like:
275 125 5
275 175 2
...
...
25 75 5
25 125 1
Each row consists of a row index, a column index and a value that maps to this location in the grid. You had the right approach in that you should use unique, specifically its third output. We need to obtain a unique ID for each of the first two columns of your data independently. Once we have these, we can use the very powerful accumarray function: the unique IDs found in the previous step index into the grid, and each value is placed at the location given by its pair of row and column IDs. Therefore, your code is very simple:
data = load('data.txt');
%// Or you can do this for reproducing the results
%data = [7037 6032 3;
%7036 6028 5;
%7037 6029 4;
%7037 6030 3;
%7038 6031 6;
%7039 6031 2;
%7037 6033 7];
[~,~,id1] = unique(data(:,1));
[~,~,id2] = unique(data(:,2));
out = accumarray([id1 id2], data(:,3), [], [], NaN);
out produces the desired Esri grid, and we get:
out =
5 NaN NaN NaN NaN NaN
NaN 4 3 NaN 3 7
NaN NaN NaN 6 NaN NaN
NaN NaN NaN 2 NaN NaN
So how does this work? accumarray accepts in a matrix of row and column locations that you want to use to access the output. At each of the corresponding row and column locations, you provide a value that gets mapped to this bin. Now, by default accumarray sums up the values that get mapped to each bin, but I'm going to assume that your values in your text file are all unique in that only one value gets mapped to each row and column index. Therefore, we can certainly get away with the default behaviour, and so you'd specify a [] for this behaviour (fourth input). Therefore, we will use the last column of your matrix as the values that get put into this matrix, use the [] input to allow accumarray to infer the size of your matrix (third input), then any values that don't get mapped to anything, we will fill this in with NaN. We aren't going to sum anything.
With that explanation, the code above should be clear.
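To make the default summation and the fill value concrete, here is a small sketch with made-up subscripts and values (subs and vals are hypothetical, chosen so that two values land in the same bin):

```matlab
subs = [1 1; 2 2; 1 1];   % two entries map to bin (1,1)
vals = [10; 20; 5];

summed = accumarray(subs, vals)                % default: sum per bin, empty bins are 0
filled = accumarray(subs, vals, [], [], NaN)   % same sums, empty bins are NaN
```

Here bin (1,1) holds 15 in both outputs, which is why the grid construction relies on each row and column pair appearing only once in the data.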

Mean and Standard Deviation of a column, ignoring zero values - Matlab

I am trying to find the mean of a column however I am having trouble getting an output for a function I created. My code is below, I cannot see what mistake I have made.
for j=1:48;
C_f2 = V(V(:,3) == j,:);
C_f2(C_f2==0)=NaN;
m=mean(C_f2(:,4));
s=std(C_f2(:,4));
row=[j,m,s];
s1=[s1;row];
end
I have checked the matrix, C_f2 and that is full of values so should not be returning NaN. However my output for the matrix s1 is
1 NaN NaN
2 NaN NaN
3 NaN NaN
. ... ...
48 NaN NaN
Can anyone see my issue? Help would be much appreciated!
The matrix C_f2 looks like,
1 185 01 5003
1 185 02 5009
. ... .. ....
1 259 48 5001
On line 3 you set all values which are zero to NaN. The mean function will return NaN as mean if any element is NaN. If you want to ignore the NaN values, you have to use the nanmean function, which comes with the Statistics toolbox. See the following example:
a = [1 NaN 2 3];
mean(a)
ans =
NaN
nanmean(a)
ans =
2
If you don't have the Statistics toolbox, you can exclude NaN elements with logical indexing
mean(a(~isnan(a)))
ans =
2
or, possibly easiest of all, directly exclude all elements which are zero instead of replacing them with NaN.
mean(a(a~=0))
Your line C_f2(C_f2==0)=NaN; will put NaNs into C_f2. Then, your mean and std operations will see those NaNs and output NaNs themselves.
To have the mean and std ignore NaN, you need to use the alternate version nanmean and nanstd.
These are part of a toolbox, however, so you might not have them if you just have the base Matlab installation.
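As far as I know, releases from R2015a onwards also let the base mean and std functions skip NaN through an 'omitnan' flag, which avoids the toolbox dependency. A small sketch with a made-up vector:

```matlab
a = [1 NaN 2 3];              % hypothetical data containing one NaN
m = mean(a, 'omitnan')        % mean of the non-NaN values
s = std(a, 0, 'omitnan')      % sample standard deviation of the non-NaN values
```

The second argument of std (0) selects the default normalisation by n-1.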
Don't set the zeros to NaN; any computation involving NaN returns NaN unless the function has special handling for it.
Use find to index the nonzero part of your column instead.
Say column n is your input:
N = n(find(n~=0))
Now do your mean calculation on N.
To compute the mean and standard deviation of each column excluding zeros:
A = [1 2;
3 0;
4 5;
6 7;
0 0]; %// example data
den = sum(A~=0); %// number of nonzero values in each column
mean_nz = bsxfun(@rdivide, sum(A), den);
mean2_nz = bsxfun(@rdivide, sum(A.^2), den);
std_nz = sqrt(bsxfun(@times, mean2_nz-mean_nz.^2, den./(den-1)));
The results for the example are
mean_nz =
3.5000 4.6667
std_nz =
2.0817 2.5166
The above uses the "corrected" definition of standard deviation (which divides by n-1, where n is the number of values). If you want the "uncorrected" version (i.e. divide by n):
std_nz = sqrt(mean2_nz-mean_nz.^2);

segregating elements and its index from matrix

I have a matrix like A=[NAN 0.9 0.8 0.7; NAN NAN 0.7 0; NAN NAN NAN NAN] and
I want to tell MATLAB that-
For all columns in A- If column contains only NAN then return the index of last NAN element, else find the maximum value from each column and return the value and index.
Thus, ultimately I will have vectors like-
value vector = 0.9,0.7,NA and index vector = 2, 3, 4 for this particular example.
I think I could try an "if else" statement inside a for loop, but I don't know how to do it. Can anyone help?
Thanks in advance.
You can do this fairly easily using max:
A = [NaN 0.9 0.8 0.7; NaN NaN 0.7 0; NaN NaN NaN NaN];
[max_val,max_ind] = max(A,[],2);
max_ind(isnan(max_val)) = size(A,2);
The second output of max is the index of the maximum value. By default it will ignore NaN values, unless every value is NaN, in which case it returns 1. The 3rd line of this snippet simply finds values where the maximum value is NaN (i.e. the whole row is NaN), and replaces the index with the length of the row.
Here is a simple, brute-force method. I applaud the finesse of MrAzzaman.
for j = 1:size(A,2)
if sum(isnan(A(:,j))) == size(A,1)
valueVec(j) = NaN;
indexVec(j) = size(A,1);
else
[valueVec(j),indexVec(j)] = max(A(:,j));
end
end
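The loop above works column by column, whereas the expected output in the question (values 0.9, 0.7, NaN with indices 2, 3, 4) corresponds to one result per row. A per-row variant of the same brute-force idea, offered as a sketch with preallocated outputs:

```matlab
A = [NaN 0.9 0.8 0.7; NaN NaN 0.7 0; NaN NaN NaN NaN];
nRows = size(A,1);
valueVec = NaN(1,nRows);     % stays NaN for rows that are entirely NaN
indexVec = zeros(1,nRows);
for i = 1:nRows
    if all(isnan(A(i,:)))
        indexVec(i) = size(A,2);                    % index of the last NaN in the row
    else
        [valueVec(i), indexVec(i)] = max(A(i,:));   % max ignores NaN here
    end
end
```

For this A, the loop yields the same values and indices as the max-based answer.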