Different results using == and find in MATLAB - matlab

I have created a sparse matrix using MEX and also created a sparse matrix using MATLAB. To fill in the values of the matrix i have used same formula.
Now to check if the both the matrices are equal I used result=(A==B). result returns 1 for all indices, which implies that all the matrix elements are equal.
But if I do find(A-B) it returns some indices, which indicates that at these indices the values are non-zero. How is this possible?
Note: When i compare the value at these indices it shows equal !

I'm guessing you have values of infinity cropping up in your matrices at the same points. For example:
>> A = Inf;
>> B = Inf;
>> A == B
ans =
1 %# They are treated as equal...
>> A-B
ans =
NaN %# ...but their difference actually results in NaN...
>> find(A-B)
ans =
1 %# ...which is treated as a non-zero value.
The discrepancy here results from the fact that certain operations involving infinity result in NaN values. You can check to see if you have any infinities in A and B by using the function ISINF like so:
any(isinf(A(:)))
any(isinf(B(:)))
And if you get a value of 1 (i.e. true), then the presence of infinities is likely the source of your discrepancy.

Related

numel(isnan(A)) == numel(~isnan(A)) == numel(A)?

I have a 121x601 matrix with some NaN values.
I cannot understand the reason of the following inconsistency:
>> size(A,1)*size(A,2)
ans =
72721
>> numel(~isnan(A))
ans =
72721
>> numel(isnan(A))
ans =
72721
Can anybody point it to me, please?
numel returns the number of elements of the matrix, independently of what they are. isnan(A) transforms each element in A to a boolean, depending if the corresponding element is NaN or not. But both matrices isnan(A) and its complement ~isnan(A) have the same number of elements, namely, the number of elements of the original matrix, A.
See more about numel and isnan.

Generate random matrix with specific rank and cardinality

I would like to generate a rectangular matrix A, with entries in the closed interval [0,1], which satisfies the following properties:
(1) size(A) = (200,2000)
(2) rank(A) = 50
(3) nnz(A) = 100000
It will be best if the non-zero elements in A will decay exponentially, or at least polynomially (I want significantly more small values than large).
Obviously (I think...), normalizing to [0,1] in the end is not the major issue here.
Things I tried that didn't work:
First generating a random matrix with A=abs(randn(200,2000)) and thresholding
th = prctile(A(:),(1-(100000/(200*2000)))*100);
A = A.*(A>th);
Now that property (3) is satisfied, I lowered the rank
[U,S,V] = svd(A);
for i=51:200 S(i,i)=0; end
A = U*S/V;
But this matrix has almost full cardinality (I lost propery (3)).
First generating a matrix with the specified rank with A=rand(200,50)*rand(50,2000). Now that condition (2) is satisfied, I threshoded like before. Only now I lost property (2) as the matrix has almost full rank.
So... Is there a way to make sure both properties (2) and (3) are satisfied simultaneously?
P.S. I would like the non-zero entries in the matrix to be distributed in some random/non-structural manner (just making 50 non-zero columns or rows is not my aim...).
This satisfies all conditions, with very high probability:
A = zeros(200,2000);
A(:,1:500) = repmat(rand(200,50),1,10);
You could then then suffle the nonzero columns if desired:
A = A(:,randperm(size(A,2)));
The matrix has a vertical structure: in 500 colums all elements are nonzero, whereas in the remaining 1500 columns all elements are zero. (Not sure if that's acceptable for your purpose).
Trivial approach:
>> A= rand(200,50);
>> B= zeros(200,1950);
>> A = [A B];
>> A = A(:,randperm(size(A,2)));
>> rank(A)
ans =
50
>> nnz(A)
ans =
10000

For large sparse matrices in MATLAB, compute the cumulative sum across the columns for non-zero entries?

In MATLAB have a large matrix with transition probabilities transition_probs, and an adjacency matrix adj_mat. I want to compute the cumulative sum of the transition matrix along the columns and then element wise multiply it against the adjacency matrix which acts as a mask in this way:
cumsumTransitionMat = cumsum(transition_probs,2) .* adj_mat;
I get a MEMORY error because with the cumsum all the entries of the matrix are then non-zero.
I would like to avoid this problem by only having the cumulative sum entries where there are non zero entries in the first place. How can this be done without the use of a for loop?
when CUMSUM is applied on rows, for each row it will go and fill with values starting with the first nonzero column it finds up until the last column, thats what it does by definition.
The worst case in terms of storage is when the sparse matrix contains values at the first column, the best case is when all nonzero values occur at the last column. Example:
% worst case
>> M = sparse([ones(5,1) zeros(5,4)]);
>> MM = cumsum(M,2); % completely dense matrix
>> nnz(MM)
ans =
25
% best case
>> MM = cumsum(fliplr(M),2);
If the resulting matrix does not fit in memory, I dont see what else you can do, except maybe use a for-loop over the rows, and process the matrix is smaller batches...
Note that you cannot apply the masking operation before computing the cumulative sum, since this will alter the results. So you cant say cumsum(transition_probs .* adj_mat, 2).
You can apply cumsum on the non-zero elements only. Here is some code:
A = sparse(round(rand(100,1))); %some sparse data
A_cum = A; %instantiate A_cum by copy A
idx_A = find(A); %find non-zeros
A_cum(idx_A) = cumsum(A(idx_A)); %cumsum on non-zeros elements only
You can check the output with
B = cumsum(A);
A_cum B
1 1
0 1
0 1
2 2
3 3
4 4
5 5
0 5
0 5
6 6
and isequal(A_cum(find(A_cum)), B(find(A_cum))) gives 1.

Octave and Matlab "wat" matrix/vector inconsistencies

I've noticed various cases in Matlab and octave where functions accept both matrices and vectors, but doesn't do the same thing with vectors as it does with matrices.
This can be frustrating because when you input a matrix with a variable number of rows/columns, it could be interpreted as a vector and do something you don't expect when the height/width is 1 making for difficult debugging and weird conditional edge cases.
I'll list a few I've found, but I'm curious what others people have run into
(Note: I'm only looking for cases where code accepts matrices as valid input. Anything that raises an exception when a non-vector matrix is given as an argument doesn't count)
1) "diag" can be used to mean diagonal of a matrix or turn a vector into a diagonal matrix
Since the former is generally only used for square matrices this isn't so egregious in matlab, but in Octave it can be particularly painful when Octave interperets a vector beginning with a nonzero element and everything else zeros as a "diagonal matrix" ie
t=eye(3);
size(diag(t(:,3))) == [3,3]
size(diag(t(:,2))) == [3,3]
size(diag(t(:,1))) == [1,1]
2) Indexing into a row-vector with logicals returns a row-vector
Indexing into anything else with logicals returns a column vector
a = 1:3;
b = true(1,3);
size(a(b)) == [1, 3]
a = [a; a];
b = [b; b];
size(a(b)) == [6, 1]
3) Indexing into a vector v with an index vector i returns a vector of the same (row/col) type as v. But if either v or i is a matrix, the return value has the same size as i.
a = 1:3;
b = a';
size(a(b)) == [1, 3]
b = [b,b];
size(a(b)) == [3, 2]
4) max, min, sum etc. operate on the columns of a matrix M individiually unless M is 1xn in which case they operate on M as a single row-vector
a = 1:3
size(max(a)) == [1, 1]
a = [a;a]
size(max(a)) == [1, 3]
max is particularly bad since it can't even take a dimension as an argument (unlike sum)
What other such cases should I watch out for when writing octave/matlab code?
Each language has its own concepts. An important point of this language is to very often think of matrices as an array of vectors, each column an entry. Things will start to make sense then. If you don't want that behavior, use matrix(:) as the argument to those functions which will pass a single vector, rather than a matrix. For example:
octave> a = magic (5);
octave> max (a)
ans =
23 24 25 21 22
octave> max (a(:))
ans = 25
1) This is not true with at least Octave 3.6.4. I'm not 100% sure but may be related related to this bug which has already been fixed.
2) If you index with boolean values, it will considered to be a mask and treated as such. If you index with non-boolean values, then it's treated as the indexes for the values. This makes perfect sense to me.
3) This is not true. The returned has always the same size of the index, independent if it's a matrix or vector. The only exception is that if the index is a vector, the output will be a single row. The idea is that indexing with a single vector/matrix returns something of the same size:
octave> a = 4:7
a =
4 5 6 7
octave> a([1 1])
ans =
4 4
octave> a([1 3])
ans =
4 6
octave> a([1 3; 3 1])
ans =
4 6
6 4
4) max does take dimension as argument at least in Octave. From the 3.6.4 help text of max:
For a vector argument, return the maximum value. For a matrix
argument, return the maximum value from each column, as a row vector,
or over the dimension DIM if defined, in which case Y should be set to
the empty matrix (it's ignored otherwise).
The rest applies like I said on the intro. If you supply a matrix, it will think of each column as a dataset.
1) As pointed out by the other user, this is not true with at Octave >= 3.6.4.
In case 2) the rule is for vectors, return always the same shape of vector, for anything else return a column vector, consider:
>> a = reshape (1:3, 1,1,3)
a(:,:,1) =
1.0000e+000
a(:,:,2) =
2.0000e+000
a(:,:,3) =
3.0000e+000
>> b = true(1,3)
b =
1×3 logical array
1 1 1
>> a(b)
ans(:,:,1) =
1.0000e+000
ans(:,:,2) =
2.0000e+000
ans(:,:,3) =
3.0000e+000
>> a = [a;a]
a(:,:,1) =
1.0000e+000
1.0000e+000
a(:,:,2) =
2.0000e+000
2.0000e+000
a(:,:,3) =
3.0000e+000
3.0000e+000
>> b = [b;b]
b =
2×3 logical array
1 1 1
1 1 1
>> a(b)
ans =
1.0000e+000
1.0000e+000
2.0000e+000
2.0000e+000
3.0000e+000
3.0000e+000
You can see that this makes sense since vectors have a clear 'direction' but other shaped matrices do not when you remove elements. EDIT: actually I just checked and Octave doesn't seem work this way exactly, but probably should.
3) This is consistent with 2). Essentially if you supply a list of indices the direction of the indexed vector is preserved. If you supply indices with a shape like a matrix, the new information is the index matrix shape is used. This is more flexible, since you can always do a(b(:)) to preserve the shape of a if you so wish. You may say it is not consistent, but remember indexing with logicals may reduce the number of elements to be returned, so they cannot be reshaped in this way.
4) As pointed out in a comment, you can specify dimension for max/min to operate on: min(rand(3),[],1) or max(rand(3),[],2), but in this case there are 'legacy' issues with these functions which data back to when they were first created and now are very difficult to change without upsetting people.

Find highest/lowest value in matrix

very basic question: How can I find the highest or lowest value in a random matrix.
I know there is a possibility to say:
a = find(A>0.5)
but what I'm looking for would be more like this:
A = rand(5,5)
A =
0.9388 0.9498 0.6059 0.7447 0.2835
0.6338 0.0104 0.5179 0.8738 0.0586
0.9297 0.1678 0.9429 0.9641 0.8210
0.0629 0.7553 0.7412 0.9819 0.1795
0.3069 0.8338 0.7011 0.9186 0.0349
% find highest (or lowest) value
ans = A(19)%for the highest or A(7) %for the lowest value in this case
Have a look at the min() and max() functions. They can return both the highest/lowest value, and its index:
[B,I]=min(A(:)); %# note I fixed a bug on this line!
returns I=7 and B=A(7)=A(2,2). The expression A(:) tells MATLAB to treat A as a 1D array for now, so even though A is 5x5, it returns the linear index 7.
If you need the 2D coordinates, i.e. the "2,2" in B=A(7)=A(2,2), you can use [I,J] = ind2sub(size(A),I) which returns I=2,J=2, see here.
Update
If you need all the entries' indices which reach the minimum value, you can use find:
I = find(A==min(A(:));
I is now a vector of all of them.
For matrices you need to run the MIN and MAX functions twice since they operate column-wise, i.e. max(A) returns a vector with each element being the maximum element in the corresponding column of A.
>> A = rand(4)
A =
0.421761282626275 0.655740699156587 0.678735154857773 0.655477890177557
0.915735525189067 0.0357116785741896 0.757740130578333 0.171186687811562
0.792207329559554 0.849129305868777 0.743132468124916 0.706046088019609
0.959492426392903 0.933993247757551 0.392227019534168 0.0318328463774207
>> max(max(A))
ans =
0.959492426392903
>> min(min(A))
ans =
0.0318328463774207
Note that this only works for matrices. Higher dimensional arrays would require running MIN and MAX as many times as there are dimensions which you can get using NDIMS.
Try this out
A=magic(5)
[x,y]=find(A==max(max(A))) %index maximum of the matrix A
A_max=A(x,y)
[x1,y1]=find(A==min(max(A))) %index minimum of the matrix A
A_min=A(x1,y1)