I was looking for the most efficient way to find the nonzero minimum of a matrix and found this on a forum:
Let the data be a matrix A.
A(~A) = nan;
minNonZero = min(A(:));
This is very short and efficient (at least in lines of code), but I don't understand what happens when we do this. I can't find any documentation about it, since it's not a matrix operation like +, -, \, ... would be.
Could anyone explain it to me, or give me a link or anything else that could help me understand what is being done?
Thank you!
It uses logical indexing.
~ is MATLAB's logical NOT operator. Applied to a numeric array, it returns true for every element equal to zero and false otherwise, e.g.:
~[0 3 4 0]
returns the logical array
[1 0 0 1]
That is, it's a quick way to find all the zero elements.
So if A = [0 3 4 0], then ~A is the logical array [1 0 0 1], and A(~A) selects only the elements where that mask is true, in this case elements 1 and 4. The mask has to be of class logical: written with doubles, A([1 0 0 1]) would be read as numeric indices and fail on the index 0.
Finally, A(~A) = NaN replaces every element of A that was equal to 0 with NaN. Since min ignores NaN values by default, min then returns the smallest nonzero element.
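A small sketch of the same steps on a 2-D matrix, using a made-up example matrix. It works on a copy B so the original A is left untouched, and it calls min(B(:)) because min on a matrix otherwise returns one minimum per column:

```matlab
% Hypothetical example matrix; the zeros are the values to skip.
A = [0 3 4; 5 0 2];
mask = ~A;               % logical array, true where A == 0
B = A;                   % work on a copy so that A stays unchanged
B(mask) = NaN;           % zeros become NaN
minNonZero = min(B(:))   % min skips NaN by default, so this gives 2
```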
The code you provided:
A(~A) = NaN;
minNonZero = min(A(:));
does the following:
1. Creates a logical index
2. Applies the logical index to A
3. Changes A by assigning NaN values
4. Gets the minimum of all values, excluding NaN values
Note that this leaves you with a modified A, which may be undesirable. More importantly, it is somewhat inefficient: you spend time writing NaN values into A, and min then has to scan the whole matrix, NaNs included.
You can therefore speed things up (and even save a line) by doing:
minNonZero = min(A(logical(A)))
This skips step 3 entirely and shrinks step 4, since min now only sees the nonzero elements.
Furthermore, you seem to get an additional small speedup by doing:
minNonZero = min(A(A~=0))
I don't have a definite explanation for this, but it seems that step 1 (building the logical index) is done more efficiently this way.
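A quick check, on a hypothetical test matrix, that both masked versions give the same answer without modifying A. One practical difference worth knowing: logical(A) (and likewise ~A) raises an error if A already contains NaN, whereas A ~= 0 simply evaluates to true for NaN, and since min skips NaN by default the comparison form still returns the nonzero minimum:

```matlab
% Hypothetical test matrix with a negative entry and several zeros.
A = [0 -1 4; 7 0 0];
m1 = min(A(logical(A)));   % mask built with logical()
m2 = min(A(A ~= 0));       % mask built with an explicit comparison
% Both index A with a logical mask, so both give -1 and A itself is not modified.

% With a NaN present, only the comparison form still works:
C = [0 NaN 4; 7 0 0];
m3 = min(C(C ~= 0));       % NaN ~= 0 is true, but min skips the NaN, giving 4
```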
Related
Is there a general way to remove NaNs from a matrix? Sometimes NaNs appear in the middle of my code and then make it hard to get correct outputs. Is there a way to add some kind of check to prevent NaNs from arising in MATLAB code? An example illustrating the idea would be really helpful.
You can detect NaN values with the isnan function:
A = [1 NaN 3];
A(~isnan(A))
1 3
This does remove the NaN values, but it doesn't always give what you want, e.g.:
A = [1 nan; 2 3];
A(~isnan(A))
1
2
3
As you can see, this destroys the matrix structure: the result is a column vector. You can avoid this by preallocating a matrix of zeros and copying over only the non-NaN values, which effectively sets the NaN values to zero:
B = zeros(size(A));
B(~isnan(A))=A(~isnan(A))
B =
1 0
2 3
or, by overwriting the original matrix A:
A(isnan(A))=0
A =
1 0
2 3
Several functions are designed to work with NaNs: isnan, nanmean, and max() and min(), which accept a NaN flag ('omitnan' or 'includenan') controlling whether NaNs are included in the min or max evaluation.
Pay attention, though: NaNs can also be generated by your own code, e.g. 0/0, or standardization (x-mean(x))/std(x) when x contains a single value or several equal values (std(x) is then 0, so every element becomes 0/0).
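To illustrate the standardization case, here is a sketch with a hypothetical constant input. Guarding on std(x) == 0 is one possible check; mapping a constant input to all zeros is only an assumed convention, not the only reasonable choice:

```matlab
x = [5 5 5];                     % hypothetical input: all values equal
z = (x - mean(x)) / std(x);      % std(x) is 0, so every element is 0/0 = NaN

% One possible guard (the zero convention is an assumption, not a standard):
s = std(x);
if s == 0
    zSafe = zeros(size(x));      % constant input: every value is 0 deviations from the mean
else
    zSafe = (x - mean(x)) / s;
end
```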
You cannot avoid NaN entirely, since some computations produce it as a result. For example, 1/0 - 1/0 is Inf - Inf, which gives NaN. You should deal with NaNs at the code level, using built-in functions like isnan.
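A sketch of a code-level check along these lines: count the NaNs with isnan, and put an assert right after a computation so execution stops where NaNs first appear rather than further downstream. The vector v and the message text are made up for illustration; the error is caught here only so the example runs to completion:

```matlab
v = [1/0 - 1/0, 2, 0/0];          % Inf - Inf and 0/0 both give NaN
nanCount = nnz(isnan(v));         % number of NaN entries: 2

% Check placed right after the computation; assert throws an error with the
% given message when the condition is false. Caught here only so the example finishes.
try
    assert(~any(isnan(v(:))), 'NaN detected in v');
    caught = false;
catch err
    caught = true;
    disp(err.message);
end
```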
Several common situations with a matrix A that contains NaN values:
(1) Construct a new matrix where all rows with a NaN are removed.
row_mask = ~any(isnan(A),2);
A_nonans = A(row_mask,:);
(2) Construct a new matrix where all columns with a NaN are removed.
column_mask = ~any(isnan(A),1);
A_nonans = A(:, column_mask);
(3) Construct a new matrix where all NaN entries are replaced with 0.
A_nans_replaced = A;
A_nans_replaced(isnan(A_nans_replaced)) = 0;
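Applied to a small hypothetical matrix, the three recipes above give:

```matlab
A = [1 NaN 3; 4 5 6; NaN 8 9];    % hypothetical example matrix

% (1) drop every row that contains a NaN
row_mask = ~any(isnan(A), 2);
A_rows = A(row_mask, :);          % only row 2 survives: [4 5 6]

% (2) drop every column that contains a NaN
column_mask = ~any(isnan(A), 1);
A_cols = A(:, column_mask);       % only column 3 survives: [3; 6; 9]

% (3) keep the shape, replace NaN with 0
A_zero = A;
A_zero(isnan(A_zero)) = 0;        % [1 0 3; 4 5 6; 0 8 9]
```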
Easy:
A=[1 2; nan 4];
A(isnan(A))=0;
I'm still fairly new to MATLAB and programming. I have a dataset with n trials, of which (in this case) m are relevant, so I have an m-by-1 vector rel with the indices of the relevant trials. I have another n-by-1 vector, Correct, that consists of 0s and 1s. n is always bigger than m. I need to know which of the m relevant trials have a 1 in Correct. I have tried for-loops, but I always get the error 'Index exceeds matrix dimensions.'
Here is my code:
for i=1:length(rel);
CC=rel(find(Correct==1));
end;
I think it should be fairly simple but I don't know yet how to explain to Matlab what I want...
Thank you all for your answers. I realized that my question was not as clear as I thought (also a learning process, I guess), so your suggestions weren't exactly what I needed. I'm sorry for being unclear.
Correct is not a logical; it does contain 0s and 1s, but these mean a correct or incorrect answer (I'm not sure if this matters, but I thought I'd let you know).
rel is a subset of the original data containing all n trials, and Correct has the same length as the original vector with all trials (n trials). So rel contains the indices of the relevant trials in the original data, which is how it is connected to Correct.
I hope this makes my question a bit more clear, if not, let me know!
Thank you!
It's not quite clear from your question what you are trying to do, but I think I have an idea.
You have a vector n similar to
>> n = round(rand(1, 10))
n =
0 1 1 0 0 0 1 0 0 1
and m is a vector of indices into it, similar to
>> m = [1 3 7 9];
Now we use m to index n as n(m), which returns the values of n at the positions listed in m. Next we check these for equality with 1 with n(m) == 1, and finally we use that logical result to index m again, which picks out the values of m where n equals 1. Putting this all together, we get
>> m(n(m) == 1)
ans =
3 7
To find the positions within m of the returned values, you can use
>> find(n(m) == 1)
ans =
2 3
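Given the clarification that rel holds indices into the full n-trial vector and that Correct holds numeric 0s and 1s, the same pattern maps directly onto those variables. A sketch with hypothetical data mirroring the example above; because the comparison == 1 produces a logical mask, it does not matter that Correct is not of class logical:

```matlab
% Hypothetical data: n = 10 trials, m = 4 of them relevant.
Correct = [0 1 1 0 0 0 1 0 0 1]';   % n-by-1, 1 = correct answer (double, not logical)
rel     = [1 3 7 9]';               % m-by-1 indices of the relevant trials

isCorrect = Correct(rel) == 1;      % m-by-1 logical: is each relevant trial correct?
CC  = rel(isCorrect);               % trial numbers in the full data: [3; 7]
pos = find(isCorrect);              % positions within rel: [2; 3]
```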
I'm assuming that Correct is of type logical (i.e. it contains trues and falses instead of 0s and 1s).
You don't actually need a loop here (a clue is that your loop runs over i but never uses i):
m = numel(rel)
CC = rel(Correct(1:m))
You get that error because find(Correct==1) can return indices up to n, while rel has only m elements, so you are trying to access elements beyond the end of rel. I avoid this above by considering only the first m elements of Correct.
I need a fast way in Matlab to do something like this (I am dealing with huge vectors, so a normal loop takes forever!):
from a vector like
[0 0 2 3 0 0 0 5 0 0 7 0]
I need to get this:
[NaN NaN 2 3 3 3 3 5 5 5 7 7]
Basically, each zero is replaced with the value of the previous nonzero element. The leading zeros become NaN because there is no previous nonzero element in the vector.
Try this; I'm not sure about speed, though. I have to run, so an explanation will have to come later if you need it:
interp1(1:nnz(A), A(A ~= 0), cumsum(A ~= 0), 'nearest')
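Unpacked step by step on the vector from the question (a sketch of how this one-liner can be read, using 'nearest' as the interp1 method name): cumsum(A ~= 0) counts the nonzeros seen so far, which is exactly the position in A(A ~= 0) of the most recent nonzero value. Positions before the first nonzero get a count of 0, which lies outside 1:nnz(A), so interp1 returns NaN there. Because the counts are already integers, plain indexing gives the same result without interp1:

```matlab
A = [0 0 2 3 0 0 0 5 0 0 7 0];     % vector from the question
vals = A(A ~= 0);                  % nonzero values in order: [2 3 5 7]
k = cumsum(A ~= 0);                % nonzeros seen so far: [0 0 1 2 2 2 2 3 3 3 4 4]

% k(i) is the position in vals of the latest nonzero; k == 0 lies outside
% 1:nnz(A), so interp1 returns NaN there (its default outside the sample range).
B1 = interp1(1:nnz(A), vals, k, 'nearest');

% Same result with plain indexing, since k already holds integer positions.
B2 = NaN(size(A));
B2(k > 0) = vals(k(k > 0));
```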
Try this (it uses the cummax function, introduced in R2014b):
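i1 = x == 0;                         % true at the zeros to be filled
i2 = cummax((1:numel(x)) .* ~i1);    % index of the most recent nonzero (0 if none yet); assumes x is a row vector
i3 = i1 & i2;                        % zeros that do have a previous nonzero
x(i3) = x(i2(i3));                   % copy the previous nonzero value into them
x(~i2) = NaN;                        % leading zeros have no previous nonzero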
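For checking either vectorized version, a plain loop is a useful reference. This is the slow approach the question wants to avoid, so treat it as a sketch for validating results on small inputs (e.g. with isequaln), not as a replacement:

```matlab
x = [0 0 2 3 0 0 0 5 0 0 7 0];   % vector from the question
y = NaN(size(x));                % NaN until the first nonzero appears
last = NaN;                      % most recent nonzero value seen so far
for k = 1:numel(x)
    if x(k) ~= 0
        last = x(k);             % update on every nonzero
    end
    y(k) = last;                 % nonzeros keep their value, zeros take 'last'
end
```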
Just for reference, here are some similar or identical functions from the MATLAB Central File Exchange and/or Stack Overflow:
nearestpoint
knnimpute
Or, best of all, a function designed to do exactly your task: repnan (after first replacing your zero values with NaN).
I had a similar problem once and decided that the most effective way to deal with it was to write a MEX file. The C++ loop is trivial, and once you've figured out how to work with the MEX interface, it's very easy.
I have a line of MATLAB code that selects a subset of a matrix:
A(3:5,1:3);
Now I want to adapt this line to select only the rows in which all three values are larger than zero:
(A(3:5,1:3) > 0);
But apparently I'm not doing this right. How do I select part of the matrix and also make sure that only the rows in which all three values are larger than zero are selected?
EDIT: To clarify, let's say that I have a matrix of coordinates called A that looks like this:
Matrix A [5,3]
3 4 0
0 1 0
0 3 1
0 0 0
4 8 7
Now I want to select only the part A(3:5,1:3), and of that part I only want to select rows 3 and 5. How do I do that?
The code:
B = A(3:5,:);
B(sum(B,2)~=0,:)
will return only the rows of A(3:5,:) that have a nonzero row sum.
If you had posted syntactically correct Matlab it would have been easier for me to cut and paste your test data into my Matlab session.
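Here is the example matrix from the question written as MATLAB, as a sketch comparing the row-sum test above with a strict "all three values > 0" test. With this data the row-sum test keeps rows 3 and 5 as the edit asks, while the all-positive test keeps only row 5. Note that find on the submatrix returns row numbers relative to A(3:5,:), so an offset of 2 is needed to get row numbers in A:

```matlab
A = [3 4 0; 0 1 0; 0 3 1; 0 0 0; 4 8 7];   % the example matrix from the question
B = A(3:5, 1:3);                          % rows 3-5 of A
nonzeroRows  = B(sum(B, 2) ~= 0, :);      % [0 3 1; 4 8 7], i.e. rows 3 and 5 of A
positiveRows = B(all(B > 0, 2), :);       % [4 8 7]: only row 5 has all three values > 0
rowIdx = 2 + find(sum(B, 2) ~= 0);        % add the offset to get row numbers in A: [3; 5]
```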
I'm modelling this answer on A(find(A > 0)):
distances = pdist(find( pdist(medoidContainer(i,1:3)) > 0 ));
This will give you a vector of values in the distances variable. The reason pdist(medoidContainer(i,1:3) > 0) does not work is that it first finds the elements specified by i,1:3 in medoidContainer, and then finds the elements of medoidContainer(i,1:3) that are greater than 0. However, since medoidContainer(i,1:3) and the output of pdist likely have different dimensions, the comparison does not give the right indices.
I'm wondering which is faster for accessing a single element of a vector:
1) direct access via
result = a(index)
or
2) access via matrix multiplication, e.g.
a = [1 2 3 4]';
b = [0 0 1 0];
result = b*a; % Would return 3
In my opinion (coming from "classic" programming like C++), the first method must be faster because of the direct access; the second method would need to iterate through both vectors(?).
The reason I'm asking is that MATLAB is very fast at matrix and vector operations, so maybe I'm missing something and the second method is more efficient...
A quick test:
function [] = fun1()
a = [1 2 3 4]';
b = [0 0 1 0];
tic;
for i=1:1000000
r = a(3);
end
toc;
end
Elapsed time: 0.006 seconds
Change a(3) to b*a
Elapsed time: 0.9 seconds
The performance difference is quite obvious (and you should have tested this yourself before asking the question).
Reason behind that:
No matter how efficient MATLAB's arithmetic is, it still needs to fetch the numbers one by one, multiply them one by one, and sum the products. There is no way that can be faster than a single access.
In your particular case all elements except one are 0, but in my opinion it is pointless to optimize for a single special case, and the best optimization I can come up with would still need to access every element at least once.
EDIT:
It seems I'm in quite a good mood today...
Change a(3) to a(1)*b(1)+a(2)*b(2)+a(3)*b(3)+a(4)*b(4)
Elapsed time: 0.02 seconds
It seems that bounds checking (and/or other overhead) takes more time than the access and arithmetic themselves.
Why would you think that multiplying a lot of numbers by zeros would be at all efficient? Even if MATLAB were smart enough to test for zeros before multiplying, it would still have to do many tests.
I'm asking this question to make a point: the dot product cannot possibly be efficient here. Even if MATLAB were smart enough to know that only one element was nonzero, it would first have to search for that element to know it. And how would MATLAB know that what you have written as a vector*vector dot product is actually intended just to access a single element, rather than a true dot product for nefarious purposes unknown to it?
How about
3) access an element with a logical index vector:
a = [1 2 3 4]';
b = logical([0 0 1 0]);   % must be logical; a double 0 is not a valid index
result = a(b)
It's almost certainly going to be faster than (2), but slower than (1).
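A sketch that runs the three variants side by side: it confirms they return the same value and times each with tic/toc in the same way as the test above. The loop count N is an arbitrary choice, and the printed times will vary by machine and MATLAB version, so no particular numbers are implied:

```matlab
a = [1 2 3 4]';
b = [0 0 1 0];                 % numeric selector row used by the dot product
mask = logical(b);             % the same selector as a logical mask

r1 = a(3);                     % (1) direct indexing
r2 = b * a;                    % (2) dot product: 0*1 + 0*2 + 1*3 + 0*4
r3 = a(mask);                  % (3) logical indexing; a(b) with the double b would error on index 0

% Rough timing of each variant (numbers vary by machine and MATLAB version).
N = 1e5;
tic; for k = 1:N, r = a(3);    end; t1 = toc;
tic; for k = 1:N, r = b * a;   end; t2 = toc;
tic; for k = 1:N, r = a(mask); end; t3 = toc;
fprintf('direct %.4f s, dot %.4f s, logical %.4f s\n', t1, t2, t3);
```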