Element wise exp() of scipy sparse matrix - scipy

I have a very big sparse csc_matrix x. I want to do an elementwise exp() on it. Basically, I want the same result I would get from numpy.exp(x.toarray()). But I can't do that (I don't have enough memory to convert the sparse matrix into an array). Is there any way out? Thanks in advance!

If you don't have the memory to hold x.toarray(), you don't have the memory to hold the output you're asking for. The output won't be sparse; in fact, unless your input has negative infinities in it, the output probably won't have a single 0.
It'd probably be better to compute exp(x)-1, which is as simple as
x.expm1()
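A minimal sketch of why expm1 is the friendlier operation, assuming a tiny stand-in matrix (the names dense, same_nnz and matches_exp are only for illustration): since expm1(0) is 0, the implicit zeros stay implicit and the result remains sparse.

```python
import numpy as np
from scipy import sparse

# Small stand-in for the large matrix in the question.
dense = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, -2.0]])
x = sparse.csc_matrix(dense)

# exp(x) - 1, evaluated on the stored entries only.
y = x.expm1()

# The result is still sparse, with the same number of stored entries.
same_nnz = (y.nnz == x.nnz)

# Adding 1 to the dense form recovers exp(x); only feasible for small inputs.
matches_exp = np.allclose(y.toarray() + 1.0, np.exp(dense))
```

Downstream code then has to remember that every entry is offset by 1 relative to the true exp(x).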

If you want to do something on nonzeros only: the data attribute is writable at least in some representations including csr and csc. Some representations allow for duplicate entries, so make sure you are acting on a "normalised" form.
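To illustrate the duplicate-entry caveat, here is a small sketch assuming a COO matrix built by hand with one position stored twice (the values are made up for the example). Applying exp to .data before merging would give exp(1) + exp(2) at that position instead of exp(3), so the duplicates are summed first.

```python
import numpy as np
from scipy import sparse

# Entry (0, 0) is stored twice; its logical value is 1.0 + 2.0 = 3.0.
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 2])
vals = np.array([1.0, 2.0, 5.0])
x = sparse.coo_matrix((vals, (rows, cols)), shape=(2, 3))

n_before = x.data.size   # three stored values, one of them a duplicate
x.sum_duplicates()       # merge duplicate positions in place
n_after = x.data.size    # two stored values remain

# Now each stored value is the full entry, so acting on .data is safe.
np.exp(x.data, out=x.data)
result = x.toarray()
```

For csr and csc matrices, sum_duplicates() plays the same role.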

To change non-zero elements, maybe this would work for you:
# x is some big sparse matrix (e.g. csr or csc)
np.exp( x.data, out=x.data ) # ask np.exp() to store results in existing x.data
The above seems more efficient (no new memory allocation). This alternative is presumably slower:
x.data = np.exp( x.data )
I've been wrestling with how to get an element-wise log2() of each non-zero array element. I ended up doing something like:
np.log2( x.data, out=x.data )
The two techniques above seem like exactly what I was looking for. My matrix is sparse, but it still has plenty of non-zero elements.
Credit to @DSM here for the idea of directly changing x.data; I think that is a superb insight about sparse matrices.
Credit to @Mike Müller for the idea of using "out" as itself. In the same thread, @kmario23 points out an important caveat about promoting .data to floats (inputs could be int or something else) so that it is compatible with np.exp() or whatever function is applied. I would want to do that if I were writing something for general use.
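A hedged sketch of that dtype caveat, assuming an integer-valued csr matrix made up for the example: writing the float result of np.exp into an integer .data buffer via out= fails with a casting error, so the matrix is promoted to float64 first.

```python
import numpy as np
from scipy import sparse

# Integer-valued sparse matrix; .data has an integer dtype.
x = sparse.csr_matrix(np.array([[0, 1], [2, 0]], dtype=np.int64))

# Promote to float unless the data is already float or complex.
if not np.issubdtype(x.dtype, np.inexact):
    x = x.astype(np.float64)

# Safe now: the output buffer can hold floating-point results.
np.exp(x.data, out=x.data)
result = x.toarray()
```

Note that astype returns a new matrix, so this promotion step does allocate once; only the exp itself runs in place.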
note: I'm just starting to learn about sparse matrices, so would like to know if this is a bad idea for reason(s) I'm not seeing. Please do let me know if I'm on thin ice with this.
Normally I wouldn't mess with private attributes, but .data shows up pretty clearly in the attributes documentation for the various sparse matrices I've looked at.

Related

How to use Shuffle.c index mode for complex numbers

I want to shuffle a 3d array on the third dimension using Shuffle.c.
Till now I have used Shuffle(arr,3) with great performance. Now I try to do the same, but with array of Complex numbers and get this Error:
*** Shuffle[mex]: Use index mode for complex input!
I haven't found the proper way to use index mode.
Thank you.
Did you verify the better performance for your input sizes? For 3D matrices where you shuffle one dimension, you are better off not using the MEX function with the syntax mentioned above. It forces MATLAB to copy the whole matrix to MEX and back. For comparison:
arr=rand(100,100,100);
t.mexshuffle=timeit(@()(Shuffle(arr,3)));
t.randperm=timeit(@()arr(:,:,randperm(size(arr,3))));
t.mexindexshuffle=timeit(@()arr(:,:,Shuffle(size(arr,3),'index')));
The results are:
struct with fields:
mexshuffle: 0.0183
randperm: 0.0038
mexindexshuffle: 0.0037
It does not really matter whether you use Shuffle with the index option or randperm, but using Shuffle directly is slower. A nice side effect: the latter two options support complex numbers.
The code above might be a bit hard to read, so here is a cleaner version of the suggested solution:
P=randperm(size(arr,3)); % Permutation vector, use whichever generator you prefer.
%P=Shuffle(size(arr,3),'index');
arr_out=arr(:,:,P);

Matlab: need some help for a seemingly simple vectorization of an operation

I would like to optimize this piece of Matlab code but so far I have failed. I have tried different combinations of repmat and sums and cumsums, but all my attempts seem to not give the correct result. I would appreciate some expert guidance on this tough problem.
S=1000; T=10;
X=rand(T,S);
X=sort(X,1,'ascend');
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(cc,:)-X(c,:))-(cc-c)/T;
Result=Result+abs(d');
end
end
Basically I create 1000 vectors of 10 random numbers, and for each vector I calculate, for each pair of values (say the mth and the nth), the absolute value of the difference between them minus (n-m)/T. I sum over all possible pairs and return the result for every vector.
I hope this explanation is clear,
Thanks a lot in advance.
It is at least easy to vectorize your inner loop:
Result=zeros(S,1);
for c=1:T-1
d=(X(c+1:T,:)-X(c,:))-((c+1:T)'-c)./T;
Result=Result+sum(abs(d),1)';
end
Here, I'm using the new automatic singleton expansion. If you have an older version of MATLAB you'll need to use bsxfun for two of the subtraction operations. For example, X(c+1:T,:)-X(c,:) is the same as bsxfun(@minus,X(c+1:T,:),X(c,:)).
What is happening in the bit of code is that instead of looping cc=c+1:T, we take all of those indices at once. So I simply replaced cc for c+1:T. d is then a matrix with multiple rows (9 in the first iteration, and one fewer in each subsequent iteration).
Surprisingly, this is slower than the double loop, and similar in speed to Jodag's answer.
Next, we can try to improve indexing. Note that the code above extracts data row-wise from the matrix. MATLAB stores data column-wise. So it's more efficient to extract a column than a row from a matrix. Let's transpose X:
X=X';
Result=zeros(S,1);
for c=1:T-1
d=(X(:,c+1:T)-X(:,c))-((c+1:T)-c)./T;
Result=Result+sum(abs(d),2);
end
This is more than twice as fast as the code that indexes row-wise.
But of course the same trick can be applied to the code in the question, speeding it up by about 50%:
X=X';
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(:,cc)-X(:,c))-(cc-c)/T;
Result=Result+abs(d);
end
end
My takeaway message from this exercise is that MATLAB's JIT compiler has improved things a lot. Back in the day, any sort of loop would grind code to a halt. Today it's not necessarily the worst approach, especially if all you do is use built-in functions.
The nchoosek(v,k) function generates all combinations of the elements in v taken k at a time. We can use this to generate all possible pairs of indices and then use those to vectorize the loops. It appears that in this case the vectorization doesn't actually improve performance (at least on my machine with 2017a). Maybe someone will come up with a more efficient approach.
idx = nchoosek(1:T,2);
d = bsxfun(@minus,(X(idx(:,2),:) - X(idx(:,1),:)), (idx(:,2)-idx(:,1))/T);
Result = sum(abs(d),1)';
Update: here are the results for the running times for the different proposals (10^5 trials):
So it looks like transposing the matrix is the most effective intervention, and my original double-loop implementation is, amazingly, the best compared to the vectorized versions. However, in my hands (2017a) the improvement is only 16.6% compared to the original using the mean (18.2% using the median).
Maybe there is still room for improvement?

Test for Duplicate Quickly in Matlab Array

I have two matrices S and T which have n columns and a row vector v of length n. By my construction, I know that S does not have any duplicates. What I'm looking for is a fast way to find out whether or not the row vector v appears as one of the rows of S. Currently I'm using the test
if min([sum(abs(S - repmat(v,size(S,1),1)),2);sum(abs(T - repmat(v,size(T,1),1)),2)]) ~= 0 ....
When I first wrote it, I had a for loop testing each (I knew this would be slow, I was just making sure the whole thing worked first). I then changed this to defining a matrix diff by the two components above and then summing, but this was slightly slower than the above.
All the stuff I've found online says to use the function unique. However, this is very slow because it also sorts my matrix. I don't need that, and it's a massive waste of time (it makes the process really slow). This is a bottleneck in my code, taking nearly 90% of the run time. If anyone has any advice on how to speed this up, I'd be most appreciative!
I imagine there's a fairly straightforward way, but I'm not that experienced with Matlab (fairly, just not lots). I know how to use basic stuff, but not some of the more specialist functions.
Thanks!
To clarify following Sardar_Usama's comment, I want this to work for a matrix with any number of rows and a single vector. I'd forgotten to mention that the elements are all in the set {0,1,...,q-1}. I don't know whether that helps or not to make it faster!
You may want this:
ismember(v,S,'rows')
and swap the arguments S and v to get the logical indices of the duplicates:
ismember(S,v,'rows')
Or, to test whether v is a member of S:
any(all(bsxfun(@eq,S,v),2))
This returns the logical indices of all duplicates:
all(bsxfun(@eq,S,v),2)

MATLAB: How to apply a vectorized function using sparsity structure?

I need to (repeatedly) build a vector of length 200 from a vector of length 2500. I can describe this operation using multiplication by a matrix which is extremely sparse: it is 200x2500 and has only one entry in each row. But I have very little control over where this entry is. My actual problem is that I need to apply this matrix not to the vector that I currently have, but rather to some componentwise function of this vector. Since I have all this sparsity, it is wasteful to apply this componentwise function to all 2500 components of my vector. Instead I would rather apply it only to the 200 components that actually contribute.
A program (with randomly chosen numbers in place of my actual numbers) which would have a similar problem would be something like this:
ind=randi(2500,200,1);
coefficients=randn(200,1);
A=sparse(1:200,ind,coefficients,200,2500);
x=randn(2500,1);
y=A*subplus(x);
What I don't like here is applying subplus to all of x; I would rather only have to apply it to x(ind), since only that contributes to the matrix product.
Right now the only way I can see to work around this is to replace my sparse matrix with a 200-component vector of coefficients and a 200-component vector of indices. Working this way, the code above would become:
ind=randi(2500,200,1);
coefficients=randn(200,1);
x=randn(2500,1);
y=coefficients.*subplus(x(ind))
Is there a better way to do this, preferably one that would work when A contains a few elements per row instead of just one?
The code in your question throws an exception, I think it should be:
n=2500;
m=200;
ind=randi(n,m,1);
coefficients=randn(m,1);
A=sparse(1:m,ind,coefficients,m,n);
x=randn(n,1);
Your idea using x(ind) was basically right, but ind would reorder x which is not intended. Instead you could use sort(unique(ind)). I opted to use the sparse logical index any(A~=0) because I expect it to be faster, but you could compare both versions.
%original code
y=A*subplus(x);

%multiplication using sparse logical indexing:
relevant=any(A~=0);
y=A(:,relevant)*subplus(x(relevant));

%fixed version of your code
relevant=sort(unique(ind));
y=A(:,relevant)*subplus(x(relevant));

How do I calculate result for every value in a matrix in MATLAB

Keeping simple, take a matrix of ones i.e.
U_iso = ones(72,37)
and some parameters
ThDeg = 0:5:180;
dtheta = 5*pi/180;
dphi = 5*pi/180;
Th = ThDeg*pi/180;
Now the code is
omega_iso = 0;
for i = 1:72
for j=1:37
omega_iso = omega_iso + U_iso(i,j)*sin(Th(j))*dphi*dtheta;
end
end
and
D_iso = (4 * pi)/omega_iso
This code is fine. It takes a matrix of dimension 72*37. The loop approximates the integral, and 4*pi is then divided by the result to get ONE value for the directivity of the antenna.
Now this code gives one value which will be around 1.002.
My problem is that I don't need one value. I need a 72*37 matrix as my answer, where the above integral approximation is implemented on each cell of the 72*37 matrix, and thus the directivity 'D' also results in a matrix of the same size with each cell giving the same value.
So all we have to do is instead of getting 1 value, we need value at each cell.
Can anyone please help.
You talk about creating a result that is a function essentially of the elements of U. However, in no place is that code dependent on the elements of U. Look carefully at what you have written. While you do use the variable U_iso, never is any element of U employed anywhere in that code as you have written it.
So while you talk about defining this for a matrix U, that definition is meaningless. So far, it appears that a call to repmat at the very end would create a matrix of the desired size, and clearly that is not what you are looking for.
Perhaps you tried to make the problem simple for ease of explanation. But what you did was to over-simplify, not leaving us with something that even made any sense. Please explain your problem more clearly and show code that is consistent with your explanation, for a better answer than I can provide so far.
(Note: One option MIGHT be to use arrayfun. Or the answer to this question might be more trivial, using simple vectorized operations. I cannot know at this point.)
EDIT:
Your question is still unanswerable. This loop creates a single scalar result, essentially summing over the entire array. You don't say what you mean for the integral to be computed for each element of U_iso, since you are already summing over the entire array. Please learn to be accurate in your questions, otherwise we are just guessing as to what you mean.
My best guess at the moment is that you might wish to compute a cumulative integral, in two dimensions. cumtrapz can help you there, IF that is your goal. But I'm not sure it is your goal, since your explanation is so incomplete.
You say that you wish to get the same value in each cell of the result. If that is what you wish, then a call to repmat at the end will do what you wish.