For large sparse matrices in MATLAB, how can I compute the cumulative sum across the columns only for the non-zero entries?

In MATLAB I have a large matrix of transition probabilities, transition_probs, and an adjacency matrix, adj_mat. I want to compute the cumulative sum of the transition matrix along the columns and then multiply it element-wise by the adjacency matrix, which acts as a mask:
cumsumTransitionMat = cumsum(transition_probs,2) .* adj_mat;
I get an out-of-memory error because, after cumsum, all the entries of the matrix are non-zero.
I would like to avoid this problem by only computing the cumulative sum entries where there are non-zero entries in the first place. How can this be done without using a for loop?

When CUMSUM is applied along rows, for each row it fills in values starting from the first non-zero column it finds up to the last column; that is what it does by definition.
The worst case in terms of storage is when the sparse matrix contains values in the first column, and the best case is when all non-zero values occur in the last column. Example:
% worst case
>> M = sparse([ones(5,1) zeros(5,4)]);
>> MM = cumsum(M,2); % completely dense matrix
>> nnz(MM)
ans =
25
% best case
>> MM = cumsum(fliplr(M),2);
>> nnz(MM)
ans =
5
If the resulting matrix does not fit in memory, I don't see what else you can do, except perhaps use a for-loop over the rows and process the matrix in smaller batches.
Note that you cannot apply the masking operation before computing the cumulative sum, since this would alter the results. So you can't write cumsum(transition_probs .* adj_mat, 2).
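If only the masked entries are needed, as in the question, a loop-free sketch is possible without ever forming the dense cumsum. This is a different technique from the answers here, and it assumes the mask positions are the non-zeros of adj_mat: merge the non-zeros of transition_probs and adj_mat into one list sorted by row and column, take a single cumsum over that list, and reset it at the start of every row. Memory then scales with nnz(transition_probs) + nnz(adj_mat). The small example matrices are made up for illustration; because the per-row reset subtracts from one global running total, expect some round-off on very large matrices, and the sketch assumes at least one of the two matrices has a non-zero entry.

```matlab
% Hypothetical small example; replace with your own sparse matrices
transition_probs = sparse([0.2 0 0.3 0.5; 0 0.6 0 0.4; 0 0 0 0]);
adj_mat          = sparse([1 1 0 1; 0 0 1 1; 1 0 0 0]);

[rt, ct, vt] = find(transition_probs);   % non-zeros to accumulate
[rm, cm, vm] = find(adj_mat);            % positions (and weights) of the mask
nt = numel(vt);
nm = numel(vm);

% Merge both lists: flag 0 = probability entry, flag 1 = mask entry, so that at
% equal (row, column) the probability is added before the mask entry reads the sum
rows = [rt; rm];
cols = [ct; cm];
flag = [zeros(nt,1); ones(nm,1)];
vals = [vt; zeros(nm,1)];    % mask entries add nothing to the running sum
mult = [zeros(nt,1); vm];    % mask weight (1 for a binary adjacency matrix)

% Sort by row, then column, then flag
[~, order] = sortrows([rows cols flag]);
rows = rows(order); cols = cols(order);
vals = vals(order); mult = mult(order); flag = flag(order);

% One global running sum, then subtract what had accumulated before each row
run = cumsum(vals);
rowStart = [true; diff(rows) ~= 0];
before = run - vals;                 % running sum just before each entry
starts = find(rowStart);             % first list position of each row block
group = cumsum(rowStart);            % row block index of each entry
run = run - before(starts(group));

% Keep only the mask entries: this equals cumsum(transition_probs,2) .* adj_mat
isMask = flag == 1;
cumsumTransitionMat = sparse(rows(isMask), cols(isMask), ...
    run(isMask) .* mult(isMask), size(adj_mat,1), size(adj_mat,2));
```

The result stays sparse, and sparse() drops masked positions whose running sum is zero, just as the dense product would contain zeros there.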

You can apply cumsum on the non-zero elements only. Here is some code:
A = sparse(round(rand(100,1))); %some sparse data
A_cum = A; %instantiate A_cum by copy A
idx_A = find(A); %find non-zeros
A_cum(idx_A) = cumsum(A(idx_A)); %cumsum on non-zero elements only
You can check the output with
B = cumsum(A);
A_cum B
1 1
0 1
0 1
2 2
3 3
4 4
5 5
0 5
0 5
6 6
and isequal(A_cum(find(A_cum)), B(find(A_cum))) gives 1.

Related

Matlab matrix with fixed sum over rows

I'm trying to construct a matrix in Matlab where the sum over the rows is constant, but every combination is taken into account.
For example, take an NxM matrix where M is a fixed number and N depends on K, the value to which all rows must sum.
For example, say K = 3 and M = 3, this will then give the matrix:
[1,1,1
2,1,0
2,0,1
1,2,0
1,0,2
0,2,1
0,1,2
3,0,0
0,3,0
0,0,3]
At the moment I do this by first creating the matrix of all possible combinations, without regard for the sum (for example, it also contains [2,2,1] and [3,3,3]), and then throwing away the rows whose sum is not equal to K.
However, this is very memory-inefficient (especially for larger K and M), but I couldn't think of a nice way to construct this matrix without first constructing the total matrix.
Is this possible in a nice way? Or should I use a whole bunch of for-loops?
Here is a very simple version using dynamic programming. The basic idea of dynamic programming is to build up a data structure (here S) which holds the intermediate results for smaller instances of the same problem.
M=3;
K=3;
%S(k+1,m) will hold the intermediate result for k and m
S=cell(K+1,M);
%Initialisation, for M=1 there is only a trivial solution using one number.
S(:,1)=num2cell(0:K);
for iM=2:M
    for temporary_k=0:K
        for new_element=0:temporary_k
            h=S{temporary_k-new_element+1,iM-1};
            h(:,end+1)=new_element;
            S{temporary_k+1,iM}=[S{temporary_k+1,iM};h];
        end
    end
end
final_result=S{K+1,M}
This may be more efficient than your original approach, although it still generates (and then discards) more rows than needed.
Let M denote the number of columns, and S the desired sum. The problem can be interpreted as partitioning an interval of length S into M subintervals with non-negative integer lengths.
The idea is to generate not the subinterval lengths, but the subinterval edges; and from those compute the subinterval lengths. This can be done in the following steps:
Step 1: The subinterval edges are M-1 integer values (not necessarily distinct) between 0 and S. These can be generated as a Cartesian product using, for example, this answer.
Step 2: Sort the interval edges and remove duplicate sets of edges. This is why the algorithm is not totally efficient: it produces duplicates. But hopefully the number of discarded tentative solutions will be smaller than in your original approach, because this approach does take the fixed sum into account.
Step 3: Compute the subinterval lengths from their edges. Each length is the difference between two consecutive edges, including a fixed initial edge at 0 and a final edge at S.
Code:
%// Data
S = 3; %// desired sum
M = 3; %// number of pieces
%// Step 1 (adapted from linked answer):
combs = cell(1,M-1);
[combs{end:-1:1}] = ndgrid(0:S);
combs = cat(M+1, combs{:});
combs = reshape(combs,[],M-1);
%// Step 2
combs = unique(sort(combs,2), 'rows');
%// Step 3
combs = [zeros(size(combs,1),1) combs repmat(S, size(combs,1),1)];
result = diff(combs,[],2);
The result is sorted in lexicographical order. In your example,
result =
0 0 3
0 1 2
0 2 1
0 3 0
1 0 2
1 1 1
1 2 0
2 0 1
2 1 0
3 0 0
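The duplicates produced in step 2 can be avoided altogether with the "stars and bars" correspondence, which is a different technique from the answer above: choosing M-1 bar positions out of S+M-1 slots yields each non-decreasing set of edges exactly once, so every row is generated once and nothing is discarded. A minimal sketch, assuming M >= 2:

```matlab
S = 3;  % desired sum
M = 3;  % number of pieces (assumed >= 2)

% Each row of bars holds M-1 strictly increasing bar positions in 1..S+M-1
bars = nchoosek(1:S+M-1, M-1);

% Subtracting 1..M-1 turns bar positions into non-decreasing edges in 0..S
edges = bsxfun(@minus, bars, 1:M-1);

% As in step 3, the piece lengths are the gaps between consecutive edges
n = size(edges, 1);
result = diff([zeros(n,1) edges repmat(S, n, 1)], 1, 2);
```

The number of rows is nchoosek(S+M-1, M-1), which is 10 for S = M = 3, the same count as the listing above.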

MATLAB: find the means of the other rows in a matrix without a loop

I'm optimizing my code. I have an MxN matrix, and I want to generate an MxN matrix in which each row is the mean of the other rows.
For example, if I have matrix A:
1 2 3
3 4 5
2 3 2
In the new matrix B, I want each row to be the mean of the other rows:
mean(row2,row3)
mean(row1,row3)
mean(row1,row2)
I've thought of many ways, but can't avoid a loop:
for row=1:3
    temp = A;
    temp(row,:) = [];
    B(row,:) = mean(temp);
end
Any thoughts?
Simple with bsxfun -
B = (bsxfun(@minus,sum(A,1),A))./(size(A,1)-1)
The trick is to use bsxfun to subtract the current row from the sum of all rows in a vectorized manner, which gives the sum of all rows except the current one. Finally, get the averages by dividing by the number of rows minus 1.
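On MATLAB R2016b or later, implicit expansion makes the bsxfun call unnecessary. A sketch using the example matrix from the question, checked against the loop version:

```matlab
A = [1 2 3;
     3 4 5;
     2 3 2];

% Sum of all rows minus the current row, divided by the number of other rows
B = (sum(A, 1) - A) ./ (size(A, 1) - 1);

% Reference result from the loop in the question
Bloop = zeros(size(A));
for row = 1:size(A, 1)
    temp = A;
    temp(row, :) = [];
    Bloop(row, :) = mean(temp, 1);
end
```

For this A, both give B = [2.5 3.5 3.5; 1.5 2.5 2.5; 2 3 4]. The formula needs at least two rows, otherwise it divides by zero.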

Find the number of zero elements in a matrix in MATLAB [duplicate]

I have an NxM matrix, for example named A. After some processing I want to count the zero elements.
How can I do this in one line of code? I tried A==0, which returns a 2D logical matrix.
There is a function, nnz, that returns the number of non-zero matrix elements. You can use it on a logical matrix, in which case it returns the number of true elements.
Here we apply nnz to the logical matrix A==0, whose elements are true where the original element was 0 and false everywhere else.
A = [1, 3, 1;
0, 0, 2;
0, 2, 1];
nnz(A==0) %// returns 3, i.e. the number of zeros of A (the amount of true in A==0)
The credit for the benchmarking belongs to Divakar.
Benchmarking
Using the following parameters and inputs, one can benchmark the solutions presented here with timeit.
Input sizes (N, for an NxN matrix A):
Small datasizes: N = 1:10:100
Medium datasizes: N = 50:50:1000
Large datasizes: N = 500:500:4000
Varying percentage of zeros:
~10% zeros: A = round(rand(N)*5);
~50% zeros: A = rand(N); A(A<=0.5)=0;
~90% zeros: A = rand(N); A(A<=0.9)=0;
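A rough sketch of such a comparison for a single, arbitrarily chosen N. It uses tic/toc averaged over repeated calls rather than timeit (which the benchmark above used and which is more reliable), and first checks that all three counting methods agree; N, reps and the variable names are illustrative choices:

```matlab
N = 500;      % arbitrary matrix size
reps = 20;    % arbitrary number of repetitions per method

% The three inputs from the list above
inputs = {round(rand(N)*5), rand(N), rand(N)};
inputs{2}(inputs{2} <= 0.5) = 0;
inputs{3}(inputs{3} <= 0.9) = 0;
labels = {'~10% zeros', '~50% zeros', '~90% zeros'};

% Counting methods discussed in the answers
counters = {@(A) nnz(A==0), @(A) sum(A(:)==0), @(A) sum(~A(:))};
names = {'nnz(A==0)', 'sum(A(:)==0)', 'sum(~A(:))'};

for i = 1:numel(inputs)
    A = inputs{i};
    counts = cellfun(@(f) f(A), counters);
    assert(all(counts == counts(1)));   % all methods must agree
    for j = 1:numel(counters)
        tic;
        for r = 1:reps
            counters{j}(A);
        end
        fprintf('%s, %s: %.6f s per call\n', labels{i}, names{j}, toc / reps);
    end
end
```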
[Figures: timing results for 1) small, 2) medium and 3) large datasizes]
Observations
If you look closely at the NNZ and SUM performance plots for medium and large datasizes, you will notice that their performances get closer to each other in the 10% and 90% zeros cases. For the 50% zeros case, the performance gap between the SUM and NNZ methods is comparatively wider.
As a general observation across all datasizes and all three fractions of zeros, the SUM method seems to be the clear winner. Interestingly, the general-case solution sum(A(:)==0) also seems to perform better than sum(~A(:)).
Some basic MATLAB to know: the (:) operator flattens any matrix into a column vector, and ~ is the NOT operator, which turns zeros into ones and non-zero values into zeros. Then we just use sum:
sum(~A(:))
This should also be about 10 times faster than the length(find(...)) approach, in case efficiency is important.
Edit: if A may contain NaN values, use the following instead, since ~A raises an error for NaN:
sum(A(:)==0)
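A small example of why the NaN case matters: the comparison A == 0 simply treats NaN as non-zero, while applying ~ to a NaN raises an error. The matrix is made up for illustration:

```matlab
A = [0 NaN 3;
     1 0   0];

% NaN == 0 is false, so the NaN is not counted as a zero
nZeros = sum(A(:) == 0);
fprintf('zeros counted with A==0: %d\n', nZeros);

try
    sum(~A(:));              % NaN cannot be converted to logical
catch err
    disp(err.message);
end
```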
I'll add something to the mix as well. You can use histc to compute a histogram of the entire matrix. The second parameter specifies the bin edges at which the numbers are collected. If we just want to count the number of zeros, we can simply specify 0 as the second parameter. However, if you pass a matrix to histc, it operates along the columns, whereas we want to operate on the entire matrix. So simply turn the matrix into a column vector with A(:) and use histc. In other words, do this:
histc(A(:), 0)
This should be equivalent to counting the number of zeroes in the entire matrix A.
I'm not sure this answers the question, but you could also code it as an explicit loop:
% Example matrix
M = [1 0 4 8 0 6;
     0 0 7 4 8 0;
     8 7 4 0 6 0];
n = size(M,1); % Number of rows of M
p = size(M,2); % Number of columns of M
nbrOfZeros = 0; % counter
for i=1:n
    for j=1:p
        if M(i,j) == 0
            nbrOfZeros = nbrOfZeros + 1;
        end
    end
end
nbrOfZeros

Generate random matrix with specific rank and cardinality

I would like to generate a rectangular matrix A, with entries in the closed interval [0,1], which satisfies the following properties:
(1) size(A) = (200,2000)
(2) rank(A) = 50
(3) nnz(A) = 100000
It would be best if the non-zero elements in A decayed exponentially, or at least polynomially (I want significantly more small values than large ones).
Obviously (I think...), normalizing to [0,1] in the end is not the major issue here.
Things I tried that didn't work:
First, generating a random matrix with A=abs(randn(200,2000)) and thresholding:
th = prctile(A(:),(1-(100000/(200*2000)))*100);
A = A.*(A>th);
Now that property (3) was satisfied, I lowered the rank:
[U,S,V] = svd(A);
for i=51:200 S(i,i)=0; end
A = U*S/V;
But this matrix has almost full cardinality (I lost property (3)).
First, generating a matrix with the specified rank with A=rand(200,50)*rand(50,2000). Now that condition (2) was satisfied, I thresholded like before. Only now I lost property (2), as the matrix has almost full rank.
So... Is there a way to make sure both properties (2) and (3) are satisfied simultaneously?
P.S. I would like the non-zero entries in the matrix to be distributed in some random/non-structural manner (just making 50 non-zero columns or rows is not my aim...).
This satisfies all conditions, with very high probability:
A = zeros(200,2000);
A(:,1:500) = repmat(rand(200,50),1,10);
You could then shuffle the non-zero columns if desired:
A = A(:,randperm(size(A,2)));
The matrix has a vertical structure: in 500 columns all elements are non-zero, whereas in the remaining 1500 columns all elements are zero. (Not sure if that's acceptable for your purpose.)
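A sketch of the same construction that also checks properties (1) to (3) and addresses the wish for more small values than large ones. Raising the uniform entries to a power p > 1 keeps them in [0,1] but skews them toward small values; p = 3 is an arbitrary choice, not part of the answer above, and gives polynomial rather than exponential decay:

```matlab
p = 3;                                % arbitrary exponent; larger p gives more small values
base = rand(200, 50) .^ p;            % 200x50 block, of rank 50 with very high probability
A = zeros(200, 2000);
A(:, 1:500) = repmat(base, 1, 10);    % 500 non-zero columns, 200*500 = 100000 non-zeros
A = A(:, randperm(size(A, 2)));       % shuffle the columns

fprintf('size: %dx%d, rank: %d, nnz: %d, range: [%g, %g]\n', ...
    size(A, 1), size(A, 2), rank(A), nnz(A), min(A(:)), max(A(:)));
```

Since rand never returns exactly 0, every entry of the 500 copied columns is non-zero.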
Trivial approach (note that this satisfies the rank condition but gives only 200*50 = 10000 non-zeros, not 100000):
>> A= rand(200,50);
>> B= zeros(200,1950);
>> A = [A B];
>> A = A(:,randperm(size(A,2)));
>> rank(A)
ans =
50
>> nnz(A)
ans =
10000

Calculating the elements of a matrix from differently sized matrices in MATLAB

Can anybody help me find a method to calculate the elements of a matrix from two matrices of different sizes in MATLAB?
Let's say that I have two matrices of numbers.
Example:
A=[1 2 3;
4 5 6;
7 8 9]
B=[10 20 30;
40 50 60]
First, we need to find the maximum number in each column.
In this case, Ans=[40 50 60].
Then we need to find the coefficient k.
The coefficient k is equal to 1 divided by the number of columns of matrix A.
In this case, k = 1/3 = 0.33.
I want to create a matrix C filled in with the following calculation.
Example in MS Excel:
H4 = ABS((C2-C6)/C9)*0.33+ABS((D2-D6)/D9)*0.33+ABS((E2-E6)/E9)*0.33
I4 = ABS((C3-C6)/C9)*0.33+ABS((D3-D6)/D9)*0.33+ABS((E3-E6)/E9)*0.33
J4 = ABS((C4-C6)/C9)*0.33+ABS((D4-D6)/D9)*0.33+ABS((E4-E6)/E9)*0.33
And then (Like above)
H5 = ABS((C2-C7)/C9)*0.33+ABS((D2-D7)/D9)*0.33+ABS((E2-E7)/E9)*0.33
I5 = ABS((C3-C7)/C9)*0.33+ABS((D3-D7)/D9)*0.33+ABS((E3-E7)/E9)*0.33
J5 = ABS((C4-C7)/C9)*0.33+ABS((D4-D7)/D9)*0.33+ABS((E4-E7)/E9)*0.33
C =
0.34 =|(1-10)|/40*0.33+|(2-20)|/50*0.33+|(3-30)|/60*0.33
0.28 =|(4-10)|/40*0.33+|(5-20)|/50*0.33+|(6-30)|/60*0.33
0.22 =|(7-10)|/40*0.33+|(8-20)|/50*0.33+|(9-30)|/60*0.33
0.95 =|(1-40)|/40*0.33+|(2-50)|/50*0.33+|(3-60)|/60*0.33
0.89 =|(4-40)|/40*0.33+|(5-50)|/50*0.33+|(6-60)|/60*0.33
0.83 =|(7-40)|/40*0.33+|(8-50)|/50*0.33+|(9-60)|/60*0.33
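Written as a formula, under the reading that the column maxima are taken over B (which matches Ans=[40 50 60] in the example), the entry for row i of B and row j of A is as follows, where n is the number of columns shared by A and B:

```latex
C_{ij} = k \sum_{c=1}^{n} \frac{\lvert A_{jc} - B_{ic} \rvert}{m_c},
\qquad m_c = \max_{r} B_{rc},
\qquad k = \frac{1}{n}
```

For example, with k = 1/3, C_{11} = (9/40 + 18/50 + 27/60)/3 = 0.345, which the listing above rounds to 0.34.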
Actually A is a 15x4 matrix and B is a 5x4 matrix.
The matrices may be even larger than this.
How can I write this in MATLAB?
Thank you!
You can do it like so. Let's assume that A and B are defined as you did before:
A = vec2mat(1:9, 3)
B = vec2mat(10:10:60, 3)
A =
1 2 3
4 5 6
7 8 9
B =
10 20 30
40 50 60
vec2mat (from the Communications Toolbox) transforms a vector into a matrix. You simply specify how many columns you want, and it automatically determines the right number of rows to turn the vector into a correctly shaped matrix (thanks @LuisMendo!). Let's also define more things based on your post:
maxCol = max(B); %// Finds maximum of each column in B
coefK = 1 / size(A,2); %// 1 divided by number of columns in A
I am going to assume that coefK multiplies every element in A. You would thus compute your desired matrix like so:
cellMat = arrayfun(@(x) sum(coefK*(bsxfun(@rdivide, ...
    abs(bsxfun(@minus, A, B(x,:))), maxCol)), 2), 1:size(B,1), ...
    'UniformOutput', false);
outputMatrix = cell2mat(cellMat).'
You thus get:
outputMatrix =
0.3450 0.2833 0.2217
0.9617 0.9000 0.8383
That may seem like a lot to chew on, so let's go through it slowly.
Let's start with the bsxfun(@minus, A, B(x,:)) call. Here we take the matrix A and subtract a particular row of B, indexed by x. In our case, x is either 1 or 2, since B has two rows. What is convenient about bsxfun is that it subtracts B(x,:) from every row of A.
Next, we divide every number in this result by the corresponding column maximum, stored in maxCol. To do so, we call another bsxfun that divides every element of the matrix from the first step by its corresponding column element in maxCol.
Once we do this, we weight all of the values of each row by coefK (or actually every value in the matrix). In our case, this is 1/3.
After that, we sum across the columns (the sum(..., 2) call), which gives one value per row of A; these values make up row x of the final output matrix.
As we wish to do this for all of the rows of B, we apply arrayfun, which substitutes values of x going from 1, 2, 3, ... up to the number of rows in B. For each value of x, we get a size(A,1) x 1 vector, with one entry per row of A. This code will only work if A and B have the same number of columns; I have not placed any error checking here. In this case, both matrices have 3 columns. We need to set UniformOutput to false because the output of arrayfun for each x is not a single number, but a vector.
After we do this, each row of the output matrix is returned as an element of a cell array. We need to use cell2mat to turn these cell array elements into a single matrix.
You'll notice that this is the result we want, but transposed, because each call returns a column vector and cell2mat places these side by side. As such, simply transpose the result and we get our final answer.
Good luck!
Dedication
This post is dedicated to Luis Mendo and Divakar - The bsxfun masters.
Assuming by maximum number in each column, you mean columnwise maximum after vertically concatenating A and B, you can try this one-liner -
sum(abs(bsxfun(@rdivide,bsxfun(@minus,permute(A,[3 1 2]),permute(B,[1 3 2])),permute(max(vertcat(A,B)),[1 3 2]))),3)./size(A,2)
Output -
ans =
0.3450 0.2833 0.2217
0.9617 0.9000 0.8383
If by maximum number in each column, you mean columnwise maximum of B, you can try -
sum(abs(bsxfun(@rdivide,bsxfun(@minus,permute(A,[3 1 2]),permute(B,[1 3 2])),permute(max(B),[1 3 2]))),3)./size(A,2)
The output for this case is the same as in the previous case, because for these A and B the column-wise maximum of vertcat(A,B) equals that of B.
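On MATLAB R2016b or later the bsxfun calls can be replaced by implicit expansion. A sketch of the second variant (column maxima of B), split into steps and reusing the example matrices; the output layout is rows of B by rows of A, as in the outputs above:

```matlab
A = [1 2 3; 4 5 6; 7 8 9];
B = [10 20 30; 40 50 60];

% 1 x size(A,1) x nCols minus size(B,1) x 1 x nCols -> size(B,1) x size(A,1) x nCols
diffs = permute(A, [3 1 2]) - permute(B, [1 3 2]);

% Divide each original column (now the third dimension) by the column maximum of B
scaled = abs(diffs) ./ permute(max(B, [], 1), [1 3 2]);

% Sum along dimension 3 and multiply by k = 1/size(A,2)
C = sum(scaled, 3) / size(A, 2);
```

This reproduces the 2x3 result 0.3450 0.2833 0.2217 / 0.9617 0.9000 0.8383 shown above.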