replace zero values with previous non-zero values - matlab

I need a fast way in Matlab to do something like this (I am dealing with huge vectors, so a normal loop takes forever!):
from a vector like
[0 0 2 3 0 0 0 5 0 0 7 0]
I need to get this:
[NaN NaN 2 3 3 3 3 5 5 5 7 7]
Basically, each zero value is replaced with the previous non-zero value. The first two are NaN because there is no previous non-zero element in the vector.

Try this, not sure about speed though. Got to run so explanation will have to come later if you need it:
interp1(1:nnz(A), A(A ~= 0), cumsum(A ~= 0), 'nearest')
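Since the answer above comes without an explanation, here is a minimal sketch of how the one-liner seems to work, traced on the vector from the question. The names vals, grp and out are only for illustration, and 'nearest' is the documented interp1 method name.

```matlab
A = [0 0 2 3 0 0 0 5 0 0 7 0];

vals = A(A ~= 0);        % the non-zero values, in order: [2 3 5 7]
grp  = cumsum(A ~= 0);   % number of non-zeros seen so far at each position
% grp = [0 0 1 2 2 2 2 3 3 3 4 4]

% Use grp as a lookup key into the table (1:nnz(A)) -> vals, so every position
% gets the most recent non-zero value. The leading zeros have grp == 0, which
% lies outside the table range, so interp1 returns NaN for them.
out = interp1(1:nnz(A), vals, grp, 'nearest');
% out = [NaN NaN 2 3 3 3 3 5 5 5 7 7]
```

As far as I know, interp1 needs at least two sample points, so a vector with only one non-zero value would probably need special handling.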

Try this (it uses the cummax function, introduced in R2014b):
i1 = x==0;
i2 = cummax((1:numel(x)).*~i1);
x(i1&i2) = x(i2(i1&i2));
x(~i2) = NaN;
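To see what the intermediate variables hold, here is the same idea traced on the vector from the question. This is only a sketch: it assumes R2014b or newer for cummax, it assumes the index on the right-hand side is meant to be i2(i1 & i2), and the helper name tofill is mine.

```matlab
x  = [0 0 2 3 0 0 0 5 0 0 7 0];

i1 = x == 0;                          % true where x is zero
i2 = cummax((1:numel(x)) .* ~i1);     % index of the last non-zero so far (0 if none)
% i2 = [0 0 3 4 4 4 4 8 8 8 11 11]

tofill = i1 & i2;                     % zeros that do have a previous non-zero
x(tofill) = x(i2(tofill));            % copy the value found at that earlier index
x(~i2) = NaN;                         % leading zeros have no predecessor
% x = [NaN NaN 2 3 3 3 3 5 5 5 7 7]
```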

Just for reference, here are some similar or identical functions from the File Exchange and/or SO:
nearestpoint
Or try the knnimpute function.
Or, best of all, a function designed to do exactly your task:
repnan (obviously, first replace your zero values with NaN)

I had a similar problem once, and decided that the most effective way to deal with it was to write a MEX file. The C++ loop is extremely trivial. Once you figure out how to work with the MEX interface, it will be very easy.
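The loop itself isn't shown in the answer. As a rough sketch of the logic, written in plain MATLAB rather than as a MEX file (variable names are my own), it could look like this:

```matlab
x = [0 0 2 3 0 0 0 5 0 0 7 0];

out  = x;
last = NaN;                 % no previous non-zero value seen yet
for k = 1:numel(x)
    if x(k) ~= 0
        last = x(k);        % remember the most recent non-zero value
    else
        out(k) = last;      % fill the zero with it (NaN at the start)
    end
end
% out = [NaN NaN 2 3 3 3 3 5 5 5 7 7]
```

A MEX version would presumably run the same single pass over the input data, just in compiled code.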

Related

How to add two matrices and get rid of Nans

How can I add two matrices and keep only the numbers ignoring the NaN values?
for example:
A=[NaN 2 NaN];
B=[1 NaN 3];
I want some form of plus C=A+B such that:
C=[1 2 3]
You can achieve this without using any specific function call just by setting the NaNs to 0s and then performing the sum:
A(A~=A)=0
B(B~=B)=0
C=A+B
Edit: Another way of achieving this, as @rayryeng suggested in the first comment, is to use isnan:
A(isnan(A))=0
B(isnan(B))=0
C=A+B
You can use nansum (you need Statistics and Machine Learning Toolbox):
C = nansum([A;B])
and get:
C =
1 2 3
Alternatively, you can use sum with the 'omitnan' flag, which excludes NaN values:
C = sum([A;B],'omitnan')
And you will get the same result.
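One thing to watch with the NaN-skipping sum: where both inputs are NaN, the result is 0 rather than NaN. A small sketch with made-up data, in case you want to keep NaN in those positions (bothNaN is just an illustrative name):

```matlab
A = [NaN 2 NaN 4];
B = [1 NaN NaN 5];

C = sum([A; B], 1, 'omitnan');     % column-wise sum that skips NaNs
% C = [1 2 0 9]

bothNaN = isnan(A) & isnan(B);     % positions with no valid value at all
C(bothNaN) = NaN;                  % restore NaN there
% C = [1 2 NaN 9]
```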

Is there any general way to remove NaNs from a matrix?

Is there any general way to remove NaNs from a matrix? Sometimes I come across this problem in the middle of some code and then it creates problems to get appropriate outputs. Is there any way to generate any kind of check to avoid NaNs arising in a MATLAB code? It will be really helpful if someone can kindly give me an example with some idea related to it.
You can detect nan values with the isnan function:
A = [1 NaN 3];
A(~isnan(A))
1 3
This actually removes nan values, however this is not always possible, e.g.
A = [1 nan; 2 3];
A(~isnan(A))
1
2
3
as you can see this destroys the matrix structure. You can avoid this by preallocating first and thereby setting the nan values to zero:
B = zeros(size(A));
B(~isnan(A))=A(~isnan(A))
B =
1 0
2 3
or, overwriting our original matrix A
A(isnan(A))=0
A =
1 0
2 3
There are several functions that work with NaNs: isnan and nanmean, for example. max() and min() also accept a NaN flag ('omitnan' or 'includenan') that controls whether NaNs are included in the min or max evaluation.
Pay attention, though: sometimes the NaNs are generated by your own code, e.g. by 0/0, or when performing the standardization (x-mean(x))/std(x) if x contains either a single value or several equal values.
You cannot avoid NaN entirely, since some computations produce it as a result. For example, if you compute 1/0-1/0 you will get NaN. You should deal with NaNs at the code level, using built-in functions like isnan.
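As a small illustration of the cases mentioned above, together with one possible check, here is a sketch. The data is made up and the warning text is only an example.

```matlab
x = [5 5 5];                       % constant data, so std(x) is 0
z = (x - mean(x)) ./ std(x);       % 0/0 in every entry, so z is all NaN

d = 1/0 - 1/0;                     % Inf - Inf is NaN as well

% A simple guard to place after a computation that might produce NaN
if any(isnan(z(:)))
    warning('Standardization produced NaN values (std(x) = %g).', std(x));
end
```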
Several situations that come up with a matrix A containing NaN values:
(1) Construct a new matrix where all rows with a NaN are removed.
row_mask = ~any(isnan(A),2);
A_nonans = A(row_mask,:);
(2) Construct a new matrix where all columns with a NaN are removed.
column_mask = ~any(isnan(A),1);
A_nonans = A(:, column_mask);
(3) Construct a new matrix where all NaN entries are replaced with 0.
A_nans_replaced = A;
A_nans_replaced(isnan(A_nans_replaced)) = 0;
Easy:
A=[1 2; nan 4];
A(isnan(A))=0;

vector which skips a step

I want to create a vector without the number 1.
x=-10:1:10;
To avoid this:
for(n=0:21)
if(x(n)==1)
x(n)=[];
end
end
What can I do ?
I would use setdiff
>> setdiff(-5:5,1)
ans =
-5 -4 -3 -2 -1 0 2 3 4 5
Instead of manually generating a vector from -10 to 10 and removing the entry that has the value of 1, you can always use colon / : and not include 1 in the vector instead. Something like:
x = [-10:0 2:10];
Because it's such a small vector, you probably won't gain much by doing it this way in comparison to fully generating the vector and removing one entry as per David's suggestion. I do agree with David though. Learn logical indexing! It's one of the backbones for making any MATLAB code fast.
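The logical indexing approach is mentioned but not shown here. A minimal sketch of what is presumably meant:

```matlab
x = -10:1:10;

x(x == 1) = [];     % x == 1 is a logical mask; delete every element where it is true
```

Unlike setdiff, which returns sorted unique values, this keeps the original order and any duplicates of the other values.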
You can try deleting it manually by setting it to [].
e.g. x(12)=[]; % in x=-10:1:10 the value 1 is at index 12

What does the command A(~A) really do in matlab

I was looking to find the most efficient way to find the non zero minimum of a matrix and found this on a forum :
Let the data be a matrix A.
A(~A) = nan;
minNonZero = min(A);
This is very short and efficient (at least in number of code lines) but I don't understand what happens when we do this. I can't find any documentation about this since it's not an operation on matrices like +,-,\,... would be.
Could anyone explain this to me, or give me a link or something that could help me understand what is done?
Thank you !
It uses logical indexing
~ in Matlab is the not operator. When used on a double array, it finds all elements equal to zero. e.g.:
~[0 3 4 0]
Results in the logical matrix
[1 0 0 1]
i.e. it's a quick way to find all the zero elements
So if A = [0 3 4 0] then ~A = [1 0 0 1] so now A(~A) = A([1 0 0 1]). A([1 0 0 1]) uses logical indexing to only affect the elements that are true so in this case element 1 and element 4.
Finally A(~A) = NaN will replace all the elements in A that were equal to 0 with NaN which min ignores and thus you find the smallest non-zero element.
The code you provided:
A(~A) = NaN;
minNonZero = min(A);
Does the following:
Create a logical index
Apply the logical index on A
Change A, by assigning NaN values
Get the minimum of all values, while not including NaN values
Note that this leaves you with a changed A, which may be undesirable. More importantly, it has some inefficiencies: you spend time changing A, and possibly more because you then take the minimum of a large matrix.
Therefore you could speed things up (and even reduce one line) by doing:
minNonZero = min(A(logical(A)))
Basically you have now skipped step 3 and possibly reduced step 4.
Furthermore, you seem to get an additional small speedup by doing:
minNonZero = min(A(A~=0))
I don't have any good reason for this, but it seems like step 1 is now done more efficiently.
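One difference worth knowing when A is a matrix rather than a vector: min(A) works column by column, while indexing with a mask first flattens the selection into a single column. A small sketch with made-up data (B, colMin and allMin are illustrative names):

```matlab
A = [0 3; 4 0; 7 2];

B = A;                      % work on a copy so A stays unchanged
B(~B) = NaN;                % zeros become NaN, which min skips
colMin = min(B);            % per-column non-zero minima: [4 2]

allMin = min(A(A ~= 0));    % one overall non-zero minimum: 2
```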

Performance Issue on MATLAB's vector addressing

I'm wondering, what is faster for addressing a single Element of a vector:
1) direct access via
result = a(index)
or
2) access an element via a matrix multiplication e.g
a = [1 2 3 4]';
b = [0 0 1 0];
result = b*a; % Would return 3
In my opinion (which comes from "classic" programming like C++) the first method must be more performant, because of the direct access... the second method would need an iteration through both vectors(?).
The reason why I'm asking is, that matlab is very performant on matrix and vector operations, maybe I am missing any aspect and the second method is more effective...
A quick test:
function [] = fun1()
a = [1 2 3 4]';
b = [0 0 1 0];
tic;
for i=1:1000000
r = a(3);
end
toc;
end
Elapsed time: 0.006 seconds
Change a(3) to b*a
Elapsed time: 0.9 seconds
The performance difference is quite obvious (and you should have done that test yourself before asking this question).
Reason behind that:
No matter how efficient MATLAB's calculation is, MATLAB still needs to fetch the number 1 by 1, and do multiplication 1 by 1, and sum up. There is no hope to be faster than a single access.
In your special case, all entries are 0 except one, but in my opinion it is useless to optimize for a single special case, and the best optimization I can come up with still needs to access every element at least once.
EDIT:
It seems I am in quite good mood today....
Change a(3) to a(1)*b(1)+a(2)*b(2)+a(3)*b(3)+a(4)*b(4)
Elapsed time: 0.02 seconds
It seems that boundary checking (and/or other errands) take more time than the access and calculation.
Why would you think that multiplying a lot of numbers by zeros would be at all efficient? Even if MATLAB could be smart enough to do a test first before the multiply, it must then still do many tests.
I'm asking this question to make a point, that the dot product cannot possibly be at all efficient. Even if MATLAB were smart enough to know that there was only one element that was non-zero, to know that, it would need to do a search for the non-zero element. And how would MATLAB be smart enough to know that what you have written as a vector*vector dot product is actually intended just to access a single element, instead of a true dot product for nefarious purposes unknown to it?
How about
3) access an element by a boolean index matrix:
a = [1 2 3 4]';
b = logical([0 0 1 0]); % must be logical; a double 0 is not a valid index
result = a(b)
It's almost certainly going to be faster than (2), slower than (1).
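If you want to check that ordering on your own machine, a rough tic/toc sketch in the style of the quick test above could look like this. The loop count n and the variable names are arbitrary, and the timings will depend on your MATLAB version and hardware.

```matlab
a = [1 2 3 4]';
b = [0 0 1 0];
mask = logical(b);      % logical indexing needs a logical array, not a double
n = 1e5;

tic; for k = 1:n, r1 = a(3);    end; t1 = toc;   % (1) direct index
tic; for k = 1:n, r2 = b * a;   end; t2 = toc;   % (2) dot product
tic; for k = 1:n, r3 = a(mask); end; t3 = toc;   % (3) logical index

fprintf('direct: %.4f s, dot product: %.4f s, logical: %.4f s\n', t1, t2, t3);
```

All three return the value 3 here; only the cost differs.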