Scipy sparse matrix become dense matrix after assignment

Scipy sparse matrix become dense matrix after assignment - scipy

alpha = csr_matrix((1000,1000),dtype=np.float32)
beta = csr_matrix((1,1000),dtype=np.float32)
alpha[0,:] = beta
After initiation, alpha and beta should be sparse matrixes with no element stored there. But after assigning beta to the first row of alpha, alpha become non-sparse, with 1000 zeros stored in alpha. I know I can use eliminate_zeros() to turn alpha back to sparse matrix but is there any better way to do this?

When I copy your steps I get
In [131]: alpha[0,:]=beta
/usr/lib/python3/dist-packages/scipy/sparse/compressed.py:730:
SparseEfficiencyWarning: Changing the sparsity structure of a
csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
So that's the first indicator that you are doing something that the developers consider unwise.
We could dig into the csr __setitem__ code, but my guess is that it is converting your beta to dense, and then doing the assignment. And isn't automatically doing the eliminate_zeros step (either during or after the assignment).
Normally why would people be doing a[...]=...? Usually it's to build the sparse matrix. Zeroing out non-zero values is possible, but not frequent enough to treat as a special case.
It's possible for a variety of reasons to have 0 values in a sparse matrix. You could even insert the 0s into alpha.data directly. That's why there are 'cleanup' methods like eliminate_zeros and prune. Even nonzero applies a !=0 mask
# convert to COOrdinate format
A = self.tocoo()
nz_mask = A.data != 0
return (A.row[nz_mask],A.col[nz_mask])
In normal sparse practice you build the data in coo or other format, and then convert to csr for calculations. Matrix multiplication is it's strong point. That constructs a new sparse matrix. Modification of a csr is possible, but not encouraged.
====================
alpha.__setitem__?? (in Ipython) shows
def __setitem__(self, index, x):
# Process arrays from IndexMixin
i, j = self._unpack_index(index)
i, j = self._index_to_arrays(i, j)
if isspmatrix(x):
x = x.toarray()
....
self._set_many(i, j, x.ravel())
So yes, it converts the RHS to a dense array before doing the assignment.

Related

Alternate method for bin2dec function [duplicate]

I have a 12-bit binary that I need to convert to a decimal. For example:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
Bit 1 is the most significant bit, Bit 12 is the least significant bit.

Note: This answer applies primarily to unsigned data types. For converting to signed types, a few extra steps are necessary, discussed here.
The bin2dec function is one option, but requires you to change the vector to a string first. bin2dec can also be slow compared to computing the number yourself. Here's a solution that's about 75 times faster:
>> A = [0,1,1,0,0,0,0,0,1,1,0,0];
>> B = sum(A.*2.^(numel(A)-1:-1:0))
B =
1548
To explain, A is multiplied element-wise by a vector of powers of 2, with the exponents ranging from numel(A)-1 down to 0. The resulting vector is then summed to give the integer represented by the binary pattern of zeroes and ones, with the first element in the array being considered the most significant bit. If you want the first element to be considered the least significant bit, you can do the following:
>> B = sum(A.*2.^(0:numel(A)-1))
B =
774
Update: You may be able to squeeze even a little more speed out of MATLAB by using find to get the indices of the ones (avoiding the element-wise multiplication and potentially reducing the number of exponent calculations needed) and using the pow2 function instead of 2.^...:
B = sum(pow2(find(flip(A))-1)); % Most significant bit first
B = sum(pow2(find(A)-1)); % Least significant bit first
Extending the solution to matrices...
If you have a lot of binary vectors you want to convert to integers, the above solution can easily be modified to convert all the values with one matrix operation. Suppose A is an N-by-12 matrix, with one binary vector per row. The following will convert them all to an N-by-1 vector of integer values:
B = A*(2.^(size(A, 2)-1:-1:0)).'; % Most significant bit first
B = A*(2.^(0:size(A, 2)-1)).'; % Least significant bit first
Also note that all of the above solutions automatically determine the number of bits in your vector by looking at the number of columns in A.

Dominic's answer assumes you have access to the Data Acquisition toolbox. If not use bin2dec:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A) )
or (in reverse)
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A(end:-1:1)) )
depending on what you intend to be bit 1 and 12!

If the MSB is right-most (I'm not sure what you mean by Bit 1, sorry if that seems stupid):
Try:
binvec2dec(A)
Output should be:
ans =
774
If the MSB is left-most, use fliplr(A) first.

How to speed up sparse matrix index operation in Matlab?

I need to create spare matrices with variable elements. Unfortunately, sparse matrix index operations are very slow.
Is there any way to speed up the process? Maybe there are some tricks that I don't know of?
Here is a minimal working example:
N = 400;
A = eye(7)+ [ zeros(3,3) eye(3) zeros(3,1) ;
zeros(3,3) zeros(3,3) zeros(3,1) ;
zeros(1,3) zeros(1,3) zeros(1,1) ] ;
B = [ zeros(3,3) zeros(3,1) ;
eye(3) zeros(3,1) ;
zeros(1,3) -2 ] ;
Su = sparse(7*N, 4*N);
for i=1:N
for j=0:i-1
Su(((i)*7 + 1) : ((i+1) * 7), (j*4 + 1) : ...
((j+1) * 4)) = A^(i-j-1) * B;
end
end
Sx = sparse(N*7, N*7);
for i=0:N
Sx((1 + i*7 : (i+1)*7), (1 + i*7 : (i+1)*7)) = A^i;
end
Su and Sx are matrices that are partitioned by (products of) A and B.

MATLABs code analyzer already mentions that this method of assigning sparse matrices is very slow as the nonzero pattern is changed which incurs significant overhead. Forgoing the sparse matrices and just using Su = zeros(.. instead makes the function execute in a reasonable time, even if you convert the matrices to sparse matrices afterwards.
If the memory used is of any concern, you could consider defining the zeros matrix as a different type, e.g. int16, as all values appear to be signed integers smaller than 400.
Otherwise, if you can predict the nonzero pattern in you're sparse matrix you can define it correctly upfront and won't suffer performance problems. You don't need to know the exact values, just whether a value is nonzero or not. Taking the following from MATLAB's Code Analyzer:
Explanation
Code Analyzer detects an indexing pattern for a sparse array that is likely to be slow. An assignment that changes the nonzero pattern of a sparse array can cause this error because such assignments result in considerable overhead.
Suggested Action
If possible, build sparse arrays using sparse as follows, and do not use indexed assignments (such as C(4) = B) to build them:
Create separate index and value arrays.
Call sparse to assemble the index and value arrays.
If you must use indexed assignments to build sparse arrays, you can optimize performance by first preallocating the sparse array with spalloc.
If the code changes only array elements that are already nonzero, then the overhead is reasonable. Suppress this message as described in Adjust Code Analyzer Message Indicators and Messages.
For more information, see “Constructing Sparse Matrices”.
So, if you can create an array of indices that will contain nonzero values, you can call sparse(i,j,s[,m,n,nzmax]) to predefine the nonzero pattern. If you only know the number of nonzero values, but not their pattern, using spalloc should also improve your performance.

Matlab fast neighborhood operation

I have a Problem. I have a Matrix A with integer values between 0 and 5.
for example like:
x=randi(5,10,10)
Now I want to call a filter, size 3x3, which gives me the the most common value
I have tried 2 solutions:
fun = #(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',#mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?

+1 to #Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better though. hist is based on histc, which can be used instead. histc is a compiled function, i.e., not written in Matlab, which is why the solution is much faster.
Here's a small function that attempts to generalize what #Floris did (also that solution returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require that the input have particular dimensions and uses im2col to efficiently rearrange the data. In fact, the the first three lines and the call to im2col are virtually identical to what colfit does in your case.
function a=intmodefilt(a,nhood)
[ma,na] = size(a);
aa(ma+nhood(1)-1,na+nhood(2)-1) = 0;
aa(floor((nhood(1)-1)/2)+(1:ma),floor((nhood(2)-1)/2)+(1:na)) = a;
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'),min(a(:))-1:max(a(:))));
a = a-1;
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.

One suggestion would be to reshape your array so each 3x3 block becomes a column vector. If your initial array dimensions are divisible by 3, this is simple. If they don't, you need to work a little bit harder. And you need to repeat this nine times, starting at different offsets into the matrix - I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi(0,5*ones(3*N,3*N));
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
Here are the plots:
The strange thing, when you run this code, is that the distribution of mi is not flat, but skewed towards smaller values. When you inspect the histograms, you will see that is because you will frequently have more than one bin with the "max" value in it. In that case, you get the first bin with the max number. This is obviously going to skew your results badly; something to think about. A much better filter might be a median filter - the one that has equal numbers of neighboring pixels above and below. That has a unique solution (while mode can have up to four values, for nine pixels - namely, four bins with two values each).
Something to think about.
Can't show you a mex example today (wrong computer); but there are ample good examples on the Mathworks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/

How can I convert a binary to a decimal without using a loop?

I have a 12-bit binary that I need to convert to a decimal. For example:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
Bit 1 is the most significant bit, Bit 12 is the least significant bit.

Note: This answer applies primarily to unsigned data types. For converting to signed types, a few extra steps are necessary, discussed here.
The bin2dec function is one option, but requires you to change the vector to a string first. bin2dec can also be slow compared to computing the number yourself. Here's a solution that's about 75 times faster:
>> A = [0,1,1,0,0,0,0,0,1,1,0,0];
>> B = sum(A.*2.^(numel(A)-1:-1:0))
B =
1548
To explain, A is multiplied element-wise by a vector of powers of 2, with the exponents ranging from numel(A)-1 down to 0. The resulting vector is then summed to give the integer represented by the binary pattern of zeroes and ones, with the first element in the array being considered the most significant bit. If you want the first element to be considered the least significant bit, you can do the following:
>> B = sum(A.*2.^(0:numel(A)-1))
B =
774
Update: You may be able to squeeze even a little more speed out of MATLAB by using find to get the indices of the ones (avoiding the element-wise multiplication and potentially reducing the number of exponent calculations needed) and using the pow2 function instead of 2.^...:
B = sum(pow2(find(flip(A))-1)); % Most significant bit first
B = sum(pow2(find(A)-1)); % Least significant bit first
Extending the solution to matrices...
If you have a lot of binary vectors you want to convert to integers, the above solution can easily be modified to convert all the values with one matrix operation. Suppose A is an N-by-12 matrix, with one binary vector per row. The following will convert them all to an N-by-1 vector of integer values:
B = A*(2.^(size(A, 2)-1:-1:0)).'; % Most significant bit first
B = A*(2.^(0:size(A, 2)-1)).'; % Least significant bit first
Also note that all of the above solutions automatically determine the number of bits in your vector by looking at the number of columns in A.

Dominic's answer assumes you have access to the Data Acquisition toolbox. If not use bin2dec:
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A) )
or (in reverse)
A = [0,1,1,0,0,0,0,0,1,1,0,0];
bin2dec( sprintf('%d',A(end:-1:1)) )
depending on what you intend to be bit 1 and 12!

If the MSB is right-most (I'm not sure what you mean by Bit 1, sorry if that seems stupid):
Try:
binvec2dec(A)
Output should be:
ans =
774
If the MSB is left-most, use fliplr(A) first.

Compact MATLAB matrix indexing notation

I've got an n-by-k sized matrix, containing k numbers per row. I want to use these k numbers as indexes into a k-dimensional matrix. Is there any compact way of doing so in MATLAB or must I use a for loop?
This is what I want to do (in MATLAB pseudo code), but in a more MATLAB-ish way:
for row=1:1:n
finalTable(row) = kDimensionalMatrix(indexmatrix(row, 1),...
indexmatrix(row, 2),...,indexmatrix(row, k))
end

If you want to avoid having to use a for loop, this is probably the cleanest way to do it:
indexCell = num2cell(indexmatrix, 1);
linearIndexMatrix = sub2ind(size(kDimensionalMatrix), indexCell{:});
finalTable = kDimensionalMatrix(linearIndexMatrix);
The first line puts each column of indexmatrix into separate cells of a cell array using num2cell. This allows us to pass all k columns as a comma-separated list into sub2ind, a function that converts subscripted indices (row, column, etc.) into linear indices (each matrix element is numbered from 1 to N, N being the total number of elements in the matrix). The last line uses these linear indices to replace your for loop. A good discussion about matrix indexing (subscript, linear, and logical) can be found here.
Some more food for thought...
The tendency to shy away from for loops in favor of vectorized solutions is something many MATLAB users (myself included) have become accustomed to. However, newer versions of MATLAB handle looping much more efficiently. As discussed in this answer to another SO question, using for loops can sometimes result in faster-running code than you would get with a vectorized solution.
I'm certainly NOT saying you shouldn't try to vectorize your code anymore, only that every problem is unique. Vectorizing will often be more efficient, but not always. For your problem, the execution speed of for loops versus vectorized code will probably depend on how big the values n and k are.

To treat the elements of the vector indexmatrix(row, :) as separate subscripts, you need the elements as a cell array. So, you could do something like this
subsCell = num2cell( indexmatrix( row, : ) );
finalTable( row ) = kDimensionalMatrix( subsCell{:} );
To expand subsCell as a comma-separated-list, unfortunately you do need the two separate lines. However, this code is independent of k.

Convert your sub-indices into linear indices in a hacky way
ksz = size(kDimensionalMatrix);
cksz = cumprod([ 1 ksz(1:end-1)] );
lidx = ( indexmatrix - 1 ) * cksz' + 1; #'
% lindx is now (n)x1 linear indices into kDimensionalMatrix, one index per row of indexmatrix
% access all n values:
selectedValues = kDimensionalMatrix( lindx );
Cheers!

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse