MATLAB Remove Only Certain Zeros from Matrix - matlab

I have seen plenty of answers regarding how to remove leading and/or trailing zeros, and how to remove all zeros from a vector or matrix. What I need to do, though, is only remove some of them. I have two matrices, and I only want to remove the entries where both of them are zero. They are two-dimensional x and y coordinates, solved using characteristics (I can give more detail if needed) and I just want to remove the values where both matrices contain zeros at the same indices. I can easily convert the matrices into vectors and work with vectors, so any help in either case would be greatly appreciated.

For the sake of simplicity, let's assume you're using vectors called X and Y (of the same length), and you want to remove only those entries where both vectors are zero. Here's how (not tested):
% Find the indexes where either X or Y is different from zero
% (these are the indexes of the components we want to keep)
I = find(X~=0 | Y~=0);
% Select the desired components from X and Y
X=X(I);
Y=Y(I);
Edit: As Oli has pointed out below (and stefano explained further), you should use logical indexing for better performance.

Related

Purpose of matrix length

Matlab defines the matrix function length to return
Length of largest array dimension
What is an example of a use of knowing the largest dimension? Knowing number of rows or columns has obvious uses... but I don't know why someone would want the largest dimension regardless of whether it is rows or cols.
Thank You
In fact, most of my code wants to do things exactly once for each row, for each column or for each element.
Therefore, I typically use one of these
size(M,1)
size(M,2)
numel(V)
In particular do not depend on length to match the number of elements in a vector!
The only real convenience that I found {in older versions of matlab} for length is if I need a repeat statement rather than a while. Then it is convenient that length of vectors usually returns at least one.
Some other uses that I had for length:
A quick rough check whether something is big.
Making something square as mentioned by #Mike
This question addresses a good point and I have seen programs fail because of applying the length command on matrices (for looping). Especially when one expects to get size(M, n) because the n-th dimension should be the largest. In total, I can not see an advantage of allowing length to be applied on matrices, in fact I only see risks from probably unexpected behavior.
If I want to know the largest dimension of any matrix, I would prefer to be more explicit and use max(size(M)), which also should be much clearer for anyone reading this code.
I am not sure, whether the following example should be in this answer, but It somehow addresses the same point.
It is also useful to be explicit with dimension, when averaging over matrices. Consider the case, where you always want to average over the first dimension, i.e. over the columns of a matrix. As long as your matrix is of size n x m, where n is greater than 1, you do not have to care about specifying a dimension. But for unforseen cases, where your matrix happens to be a row-vector, things get messy:
%// good case, where num of rows is 2 or greater
size(mean(rand(2, 4), 1)) %// [1, 4]
size(mean(rand(2, 4))) %// [1, 4]
%// bad case, where num of rows is 1
size(mean(rand(1, 4), 1)) %// [1, 4]
size(mean(rand(1, 4))) %// [1, 1], returns the average of that row
If you want to create a square matrix B that can contain the input matrix A which is non-square, you can take the latter's length and use it to initialize the matrix B with zeros where the rows and columns would be of A's length, then copy the input matrix into the new zeroed matrix.
Another example - the one I use most - is when working with vectors. There it is very convenient to work with length instead of size(vec,1) or size(vec,2) as it doesn't matter if it is a row or a column vector.
As #Dennis Jaheruddin pointed out, length gave wrong results for empty vectors in some versions of MATLAB. Using numel instead of length might therefore be convenient for better backward compatibility. The readibility of the code is almost the same IMHO.
This question compares length and numel and their performance, and comes to the result that they perform similarly up to 100k elements in a vector. With more than 100k elements, numel appears to be faster. I tried to verify this (with MATLAB R2014a) and came to the following results:
Here, length is a bit slower, but as it is in the range of micro seconds, I guess it won't be a real difference in speed.

Remove duplicates in correlations in matlab

Please see the following issue:
P=rand(4,4);
for i=1:size(P,2)
for j=1:size(P,2)
[r,p]=corr(P(:,i),P(:,j))
end
end
Clearly, the loop will cause the number of correlations to be doubled (i.e., corr(P(:,1),P(:,4)) and corr(P(:,4),P(:,1)). Does anyone have a suggestion on how to avoid this? Perhaps not using a loop?
Thanks!
I have four suggestions for you, depending on what exactly you are doing to compute your matrices. I'm assuming the example you gave is a simplified version of what needs to be done.
First Method - Adjusting the inner loop index
One thing you can do is change your j loop index so that it only goes from 1 up to i. This way, you get a lower triangular matrix and just concentrate on the values within the lower triangular half of your matrix. The upper half would essentially be all set to zero. In other words:
for i = 1 : size(P,2)
for j = 1 : i
%// Your code here
end
end
Second Method - Leave it unchanged, but then use unique
You can go ahead and use the same matrix like you did before with the full two for loops, but you can then filter the duplicates by using unique. In other words, you can do this:
[Y,indices] = unique(P);
Y will give you a list of unique values within the matrix P and indices will give you the locations of where these occurred within P. Note that these are column major indices, and so if you wanted to find the row and column locations of where these locations occur, you can do:
[rows,cols] = ind2sub(size(P), indices);
Third Method - Use pdist and squareform
Since you're looking for a solution that requires no loops, take a look at the pdist function. Given a M x N matrix, pdist will find distances between each pair of rows in a matrix. squareform will then transform these distances into a matrix like what you have seen above. In other words, do this:
dists = pdist(P.', 'correlation');
distMatrix = squareform(dists);
Fourth Method - Use the corr method straight out of the box
You can just use corr in the following way:
[rho, pvals] = corr(P);
corr in this case will produce a m x m matrix that contains the correlation coefficient between each pair of columns an n x m matrix stored in P.
Hopefully one of these will work!
this works ?
for i=1:size(P,2)
for j=1:i
Since you are just correlating each column with the other, then why not just use (straight from the documentation)
[Rho,Pval] = corr(P);
I don't have the Statistics Toolbox, but according to http://www.mathworks.com/help/stats/corr.html,
corr(X) returns a p-by-p matrix containing the pairwise linear correlation coefficient between each pair of columns in the n-by-p matrix X.

Matlab code to compare two histograms

I want to compare two image histograms. They are as follows:
h1 --> double valued 1 dimension vector .4096 in length.
h2 --> double valued 1 dimension vector .4096 in length.
I am using this matlab function here:
http://clickdamage.com/sourcecode/code/compareHists.m
It is as follows:
% s = compareHists(h1,h2)
% returns a histogram similarity in the range 0..1
%
% Compares 2 normalised histograms using the Bhattacharyya coefficient.
% Assumes that sum(h1) == sum(h2) == 1
%
function s = compareHists(h1,h2)
s = sum(sum(sum(sqrt(h1).*sqrt(h2))));
My question is :
Is there a need for multiple sums?
Even if there is only one sum in the above equation, it would suffice..right?
like this: sum(sqrt(h1).*sqrt(h2)) --> ?
Can some one please explain the code above? Also, tell me if I use a single sum will it be all right?
I tried both ways and got the same answer for two image histograms. I did this with only two histograms not more and hence want to be sure.
Thanks!
In general, sum does the sum along one dimension only. If you want to sum along multiple dimensions you either
use sum several times; or
use linear indexing to reduce to a single dimension and then use sum once: sum(sqrt(h1(:)).*sqrt(h2(:))).
In your case, if there's only one dimension, yes, a single sum would suffice.
You are right. Only one sum is needed. However, if either h1 or h2 is a multidimensional matrix, then you may want to sum as many as the dimensions. For example:
A=magic(4); % a 4 by 4 matrix of magic numbers.
sum(A) % returns [34,34,34,34], i.e. the sum of elements in each column.
sum(sum(A)) % returns 136, i.e. the sum of all elements in A.
I believe the code you downloaded originaly was written to handle multiple histograms stacked as columns of a matrix. This is (IMHO) the reason for the multiple sums.
In your case you can leave it with only one sum.
You can do even better - without any sum
Hover here to see the answer
s = sqrt(h1(:)')*sqrt(h2(:));
The trick is to use vector multiplication!
I don't see any points in 3 sums too, but if you have not a vector with histogram but a matrix you will need 2 sums like this sum(sum(sqrt(h1).*sqrt(h2))) to compare them. First one will calculate the sum of the rows, the second - the sum of the columns.

MATLAB: why transpose single dimension array

I have the following Matlab code snippet that I'm having to translate to VBScript. However, I'm not understanding why the last line is even necessary.
clear i
for i = 1:numb_days
doy(i) = floor(dt_daily(i) - datenum(2012,12,31,0,0,0));
end
doy = doy';
Looking over the rest of the code, this happens in a lot of other places where there are single dimension arrays (?) being transposed in place. I'm a newbie when it comes to both these languages, as well as posting a question on Stack, as I'm a sleuth when it comes to finding answers, just not in this case. Thanks in advance.
All "arrays" in MATLAB have at least two dimensions, and can be treated as having any number of dimensions you wish. The transpose operator here is converting between a row (size [1 N] array) and a column (size [N 1] array). This can be significant when it comes to either concatenating the arrays, or performing other operations.
Conceptually, the dimension vector of a MATLAB array has as many trailing 1s as is required to perform an operation. This means that you can index any MATLAB array with any number of subscripts, providing you don't exceed the bounds, like so:
x = magic(4); % 4-by-4 square matrix
x(2,3,1,1,1) % pick an element
One final note: the ' operator is the complex-conjugate transpose CTRANSPOSE. The .' operator is the ordinary TRANSPOSE operator.

Matlab returning only NANs from a vector that has NaNs and non-NaNs

I have simulation data in a vector of size 50,000 x 1, which has NaNs and non-NaNs. I would like to average the non-NaNs, but the function nanmean returns NAN. I have tried removing the NANs, but I only get a vector of zeros. Visual inspection of the vector leads me to doubt that the true mean of this vector is really NaN.
Also, I would like to use this vector to compute covariance with several other vectors (at some point). My alternative is doing this in Excel, which would be painful.
Any thoughts?
Thank you
Let's say your data in stored in a vector A, you can take the mean of the vector excluding the NaNs as well as any Inf and -Inf values via:
meanA = mean( A(isfinite(A)) );
Assuming you have a vector that only contains finite numeric values, and a NaN here and there, the solution is very simple
nanmean(A)
This should only bring trouble if there are non finite values in your vector.
In this case you could filter them out as suggested by #Ryan, but then you need to realize that you are not actually calculating the mean of the vector.
Ask yourself whether you may instead be interested in something like
nanmedian(A)
About the calculation of covariances and the likes, assuming you have vectors v and w, then I would recommend you to do something like this:
idx = isfinite(v) & isfinite(w);
cov(v(idx),w(idx))