In SciPy, how do I check for differences between two sparse matrices?

This question is related, and there is another dealing with equality, but if I have two sparse matrices (both in COO format), how do I find which positions in the matrices are different?
If I simply subtract the two matrices, I still have to figure out which entries of the result are nonzero.
I am seeing some non-deterministic values when forming my matrices, so I am trying to find out which cells change between runs (a small number) and which are consistent across runs (the overwhelming majority).

scipy.sparse has a built-in function called find for determining which entries are non-zero.
Subtracting the two matrices from one another and feeding the result into scipy.sparse.find will return all the entries which have changed (and thus were different in the original matrices).
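A minimal sketch of that approach (the two example matrices are made up for illustration; SciPy converts COO to CSR internally for the subtraction):

import numpy as np
from scipy.sparse import coo_matrix, find

# Two sparse matrices that differ in a single cell.
A = coo_matrix(np.array([[1, 0, 2], [0, 3, 0]]))
B = coo_matrix(np.array([[1, 0, 2], [0, 4, 0]]))

# find() returns the row indices, column indices, and values of all
# nonzero entries, i.e. the positions where A and B differ.
for r, c, v in zip(*find(A - B)):
    print(r, c, v)   # prints: 1 1 -1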

Related

Is using matrices with many 0's and 1's considered vectorizing?

Suppose that for some needed transformation I have an m×n matrix A that consists of a few 1's and many 0's. If I transform an n×1 vector by A, is this considered a vectorized implementation?
While the matrix A mainly consists of 0's, does this still cause the same number of FLOPs to occur?
Would it be wiser and more efficient to do the needed transformation another way, one that won't cause needless calculations such as 0*c?
The matrix is sparse. If you store it in a sparse format and use the matching routines, the operation count for most computations scales with the number of nonzeros rather than with m×n.
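For example, in SciPy (a sketch; the sizes and nonzero positions are made up):

import numpy as np
from scipy.sparse import coo_matrix

# A 1000x1000 matrix containing only three 1's.
rows = [0, 5, 42]
cols = [7, 99, 500]
A = coo_matrix((np.ones(3), (rows, cols)), shape=(1000, 1000)).tocsr()

x = np.random.rand(1000)

# Sparse matvec touches only the stored entries: O(nnz) work instead
# of O(m*n), so none of the 0*c products are ever evaluated.
y = A @ x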

What's the best way to obtain cosine similarity from two vectors in MATLAB?

I'll need to repeat this process multiple times, and the number of values will vary from ~10 to ~1000. I don't have access to all the vectors at once; they'll become accessible to me two vectors at a time.
In each instance there will always be the same number of values in each of the pair of vectors. However, from instance to instance the number of values will vary.
For column vectors a and b I might try,
a.'*b/(norm(a)*norm(b))
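For comparison, the same computation in NumPy (a sketch; the example vectors are made up):

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = a.b / (||a|| * ||b||); assumes 1-D arrays of equal
    # length with nonzero norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(cosine_similarity(a, b))   # ~0.9746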
Ideally you would combine all or a subset of your vectors into arrays and do the operations at once, taking advantage of MATLAB's multithreading. Vectors of different lengths are a challenge, though...
Do you have access to all the vectors at once?

Kullback-Leibler Divergence of 2 Histograms in MATLAB

I would like a function to calculate the KL distance between two histograms in MATLAB. I tried this code:
http://www.mathworks.com/matlabcentral/fileexchange/13089-kldiv
It says that I should have two distributions P and Q of size n x nbins. However, I am having trouble understanding how the author of the package wants me to arrange the histograms. I thought that providing the discretized values of the random variable together with the number of bins would suffice (I would assume the algorithm would use an arbitrary support to evaluate the expectations).
Any help is appreciated.
Thanks.
The function you link to requires that the two histograms passed in be aligned and thus have the same size, NBIN x N (not N x NBIN); that is, if N > 1, the number of rows in the inputs should equal the number of bins in the histograms. If you are just going to compare two histograms (that is, if N = 1), it doesn't really matter: you can pass either row or column vectors, as long as you are consistent and the order of bins matches.
A generic call to the function looks like this:
dists = kldiv(bins,P,Q)
The implementation allows comparison of multiple histograms to each other (that is, N>1), in which case pairs of columns (with matching column index) in each array are compared and the result is a row vector with distances for each matching pair.
The array bins should be the same size as P and Q; it is used only for a minimal check that the inputs are of the same size and takes no part in the computation. The routine expects bins to contain the numeric labels of your bins so that it can warn you if repeated labels occur, but otherwise doesn't use the information.
You could do away with bins and compute the distance with
KL = sum(P .* (log2(P)-log2(Q)));
without using the MATLAB Central version. However, the version you link to performs the minimal checks mentioned above and in addition allows computation of two alternative distances (consult the documentation).
The version linked to by eigenchris checks whether any histogram bins are empty (which would make the computation blow up numerically) and, if there are, removes their contribution to the sum (I am not sure this is entirely appropriate; consult an expert on the subject). You should also be aware of the exact form of the formula: note the use of log2 above versus the natural logarithm in the version linked to by eigenchris.
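For reference, the direct computation above translated to NumPy (a sketch; it assumes P and Q are aligned, normalized histograms with no empty bins):

import numpy as np

def kl_divergence(P, Q):
    # D(P||Q) = sum_i P_i * log2(P_i / Q_i), measured in bits.
    # Assumes P and Q are aligned probability vectors (each sums to 1)
    # with no zero entries; zero bins would have to be masked first.
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return np.sum(P * (np.log2(P) - np.log2(Q)))

P = np.array([0.5, 0.25, 0.25])
Q = np.array([0.25, 0.5, 0.25])
print(kl_divergence(P, Q))   # 0.25 bits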

Controlled random number/dataset generation in MATLAB

Say I have a 1x1x1 cube spanning the coordinates (0,0,0) to (1,1,1). I want to generate a random set of points (say 10) within this cube that are somewhat uniformly distributed (i.e. within a certain minimum and maximum distance from each other and also not too close to the boundaries). How do I go about this without using loops? If this is not possible with vector/matrix operations, a solution with loops will also do.
Let me provide some more background on my problem (this will clarify what exactly I need and why). I want to integrate a function, F(x,y,z), inside a polyhedron. I want to do it numerically as follows:
$\int_V F(x,y,z)\, dV \approx \sum_{i} F(x_i,y_i,z_i)\, V_i$
Here, $F(x_i,y_i,z_i)$ is the value of the function at point $(x_i,y_i,z_i)$ and $V_i$ is the corresponding weight. So, to calculate the integral accurately, I need to pick a set of random points that are neither too close to each other nor too far apart (sorry, but I don't yet know what this range is; I will only be able to figure it out with a parametric study once I have a working code). Also, I need to do this for a 3D mesh with multiple polyhedrons, hence I want to avoid loops to speed things up.
Check out this nice random vector generator with fixed sum on the MATLAB File Exchange (FEX).
The code "generates m random n-element column vectors of values, [x1;x2;...;xn], each with a fixed sum, s, and subject to a restriction a<=xi<=b. The vectors are randomly and uniformly distributed in the n-1 dimensional space of solutions. This is accomplished by decomposing that space into a number of different types of simplexes (the many-dimensional generalizations of line segments, triangles, and tetrahedra.) The 'rand' function is used to distribute vectors within each simplex uniformly, and further calls on 'rand' serve to select different types of simplexes with probabilities proportional to their respective n-1 dimensional volumes. This algorithm does not perform any rejection of solutions - all are generated so as to already fit within the prescribed hypercube."
Use i = rand(3,10), where each column corresponds to one point and each row corresponds to the coordinate along one axis (x, y, z).
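If you also need the boundary margin and a minimum pairwise spacing, a simple rejection scheme on top of the uniform draw works; here is a NumPy sketch (margin, dmin, and max_tries are placeholder values you would tune):

import numpy as np

rng = np.random.default_rng()

def spaced_points(n=10, margin=0.1, dmin=0.2, max_tries=1000):
    # Draw n points uniformly in [margin, 1-margin]^3 and reject the
    # whole draw until every pairwise distance exceeds dmin. Simple,
    # but not guaranteed to terminate if dmin is too large for n.
    for _ in range(max_tries):
        pts = margin + (1 - 2 * margin) * rng.random((n, 3))
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        if d[np.triu_indices(n, k=1)].min() > dmin:
            return pts
    raise RuntimeError("could not satisfy the spacing constraint")

print(spaced_points())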

MATLAB SVD singular value ordering

The MATLAB documentation of svd states that the returned diagonal matrix has the singular values in decreasing order. Is there a way to find out what the natural ordering of the singular values would be?
The reason I ask is that the singular values correspond to dimensions associated with the rows of the input matrix.
No, the very definition of the SVD does not introduce an ordering. Restricting the discussion to square matrices X and adopting the notation of the cited MATLAB documentation: if X = U*S*V' is an SVD of X, then for every permutation matrix P we can form another valid SVD as X = (U*P)*(P'*S*P)*(V*P)'. Presenting the matrix S with descending values is just a matter of convenience: every permutation P'*S*P would do the same job.
As a side note, P*X = (P*U)*S*V', showing that a row permutation of the matrix X does not change the singular values S; they can be considered independent of any row (or column) permutation of X.
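A quick NumPy check of both claims (a sketch; the permutation chosen is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((4, 4))

U, s, Vt = np.linalg.svd(X)

# Reordering all three factors with the same permutation matrix P
# still reproduces X, so the factorization itself fixes no ordering.
P = np.eye(4)[:, [2, 0, 3, 1]]
X_rebuilt = (U @ P) @ (P.T @ np.diag(s) @ P) @ (Vt.T @ P).T
print(np.allclose(X, X_rebuilt))    # True

# A row permutation of X leaves the singular values unchanged.
print(np.allclose(np.linalg.svd(P @ X, compute_uv=False), s))  # True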
I was hoping to get some idea of what is being asked here before responding. For example, the eigenshuffle tool I've posted on the File Exchange allows you to reorder the eigenvalues and eigenvectors of a sequence of eigen-problems so that they are maximally consistent with each other in sequence. Perhaps your problem is similar; you might think of the singular values as functions that vary along with some parameter that drives a system.
But really, there is no natural ordering of the singular values that comes from the method used to compute the SVD. In fact, the only ordering that makes sense is the one that comes out: decreasing order. The order of the singular values does not depend on the order of the rows of your matrix, as the question seems to vaguely imply, so I'm not sure what is meant there.
Feel free to edit the question if you can make your needs clearer.