How to calculate in-degree and out-degree for each vertex of a directed graph in SML code? - smlnj

I have a list of tuples like (1,2),(3,4),(4,5), where the edges are 1->2, 3->4 and so on.
How do I calculate the in-degree and out-degree for each vertex?

You can write a function that takes in a list of tuples (the edges) and accumulates another list of tuples, or records, with the format [(node, inDegree, outDegree), ...], i.e. records of the form [{node = int, inDegree = int, outDegree = int}, ...]:
fun degrees ((a, b) :: rest) = ...

The out-degree of vertex v is the number of pairs (x, y) where x == v, since each such pair corresponds to an edge starting at v. Likewise, the in-degree of v is the number of pairs (x, y) where y == v.
Does that give you enough of the basic idea?
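To make that concrete, here is a minimal sketch in SML (assuming the edges come in as an (int * int) list; the names outDegree, inDegree, vertices and degrees are just illustrative):
(* Out-degree: count edges leaving v; in-degree: count edges entering v. *)
fun outDegree edges v = length (List.filter (fn (x, _) => x = v) edges)
fun inDegree  edges v = length (List.filter (fn (_, y) => y = v) edges)

(* Collect every vertex that appears in the edge list, once each. *)
fun vertices edges =
  foldr (fn ((x, y), acc) =>
           let
             val acc' = if List.exists (fn v => v = x) acc then acc else x :: acc
           in
             if List.exists (fn v => v = y) acc' then acc' else y :: acc'
           end)
        [] edges

(* One record per vertex, in the [{node, inDegree, outDegree}, ...] format suggested above. *)
fun degrees edges =
  map (fn v => {node = v, inDegree = inDegree edges v, outDegree = outDegree edges v})
      (vertices edges)
For the edges in the question, degrees [(1, 2), (3, 4), (4, 5)] produces one record per vertex, e.g. {node = 4, inDegree = 1, outDegree = 1} for vertex 4.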

Related

Cosine similarity of two sparse vectors in Scala Spark

I have a dataframe with two columns where each row has a sparse vector. I'm trying to find a proper way to calculate the cosine similarity (or just the dot product) of the two vectors in each row.
However, I haven't been able to find any library or tutorial to do it for Sparse vectors.
The only way I found is the following:
Create a k X n matrix, where the n items are described as k-dimensional vectors. To represent each item as a k-dimensional vector, you can use ALS, which represents each entity in a latent factor space. The dimension of this space (k) can be chosen by you. This k X n matrix can be represented as RDD[Vector].
Convert this k X n matrix to RowMatrix.
Use columnSimilarities() function to get a n X n matrix of similarities between n items.
I feel it is overkill to calculate all the cosine similarities for each pair when I need it only for specific pairs in my (quite big) dataframe.
In Spark 3 there is now a dot method on SparseVector, which takes another vector as its argument.
If you want to do this in earlier versions, you could create a user-defined function that follows this algorithm:
Take the intersection of your vectors' indices.
Get two subarrays of your vectors' values based on the indices from the intersection.
Do pairwise multiplication of the elements of those two subarrays.
Sum the resulting values from the pairwise multiplication.
Here's my implementation of it:
import org.apache.spark.ml.linalg.SparseVector

def dotProduct(vec: SparseVector, vecOther: SparseVector) = {
  val commonIndices = vec.indices intersect vecOther.indices
  commonIndices.map(x => vec(x) * vecOther(x)).reduce(_ + _)
}
I guess you know how to turn it into a Spark UDF from here and apply it to your dataframe's columns.
And if you normalize your sparse vectors with org.apache.spark.ml.feature.Normalizer before computing your dot product, you'll get cosine similarity in the end (by definition).
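For reference, cosine similarity is defined as cos(u, v) = (u . v) / (||u|| * ||v||), so once both vectors have unit L2 norm the denominator is 1 and the dot product above is exactly the cosine similarity.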
Great answer above by @Sergey-Zakharov, +1.
A few add-ons:
reduce doesn't work on empty sequences.
Make sure you compute the L2 normalization:
import org.apache.spark.ml.feature.Normalizer

val normalizer = new Normalizer()
  .setInputCol("features")
  .setOutputCol("normFeatures")
  .setP(2.0)
val l2NormData = normalizer.transform(df_features)
and
val dotProduct = udf { (v1: SparseVector, v2: SparseVector) =>
  v1.indices.intersect(v2.indices).map(x => v1(x) * v2(x)).reduceOption(_ + _).getOrElse(0.0)
}
and then
val df = dfA.crossJoin(broadcast(dfB))
  .withColumn("dot", dotProduct(col("featuresA"), col("featuresB")))
If the number of vectors you want to calculate the dot product with is small, cache the RDD[Vector] table. Create a new table [cosine_vectors] that is a filter on the original table to only select the vectors you want the cosine similarities for. Broadcast join those two together and calculate.

calculate conservative interpolation of two vectors in matlab

G'day
Firstly, apologies for the poor wording; I'm at a bit of a loss as to how to describe this problem. I'm trying to calculate the conservative interpolation between two different vertical coordinate systems.
I have a vector of ocean transport values Ts, that describe the amount of transport at different depth values S. These depths are unevenly spaced (and size(S) is equal to size(Ts)+1 as the values in S are the depths at the top and bottom over which the transport value applies). I want to interpolate(/project?) this onto a vector of regularly spaced depths Z, where each new transport value Tz is formed from the values of Ts but weighted by the amount of overlap.
I've drawn a picture of what I mean (sorry for the bad-quality webcam picture). I want to go from Ts1, Ts2, Ts3, ..., TsN (bottom lines) to Tz1, Tz2, ..., TzN (top lines). The locations in the x direction for these are s0, s1, s2, ..., sN and z0, z1, z2, ..., zN. An example of the 'weighted overlap' would be:
Tz1 = (a/(s1-s0))*Ts1 + (b/(s2-s1))*Ts2 + (c/(s3-s2))*Ts3
where a, b and c are shown in the image as the length of overlap.
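Written out generally (reading a, b and c as overlap lengths), that example corresponds to
Tz_j = sum over i of ( overlap_ij / (s_i - s_(i-1)) ) * Ts_i, with overlap_ij = max(0, min(z_j, s_i) - max(z_(j-1), s_(i-1))),
i.e. each source cell contributes its transport scaled by the fraction of that cell covered by the target cell.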
Some more details:
Examples of z and s follow:
z = 0:5:720;
s = [222.69; 223.74; 225.67; 228.53; 232.39; 237.35; 243.56; 251.17; ...
     260.41; 271.5; 284.73; 300.42; 318.9; 340.54; 365.69; 394.69; ...
     427.78; 465.11; 506.62; 551.98; 600.54; 651.2];
Note that I'm free to define z, but not s. Typically, z will span a bigger range than s (i.e. the smallest value in z will be smaller than the smallest value in s, while the largest value in z will be larger than the largest value in s).
Help or tips greatly appreciated. Cheers,
Dave
I don't think there is an easy solution, as stated in the comments. I'll give it a go though:
One hypothesis first: we assume z0 > s0 in order for your problem to be well defined.
The idea (for your example) would be to get to the array below:
1 (s1-z0) s1-s0 Ts1
1 (s2-s1) s2-s1 Ts2
1 (z1-s2) s3-s2 Ts3
2 (s3-z1) s3-s2 Ts3
2 (z2-s3) s4-s3 Ts4
3 (z3-z2) s4-s3 Ts4
......
Then, for each row, we can compute column2*column4/column3 and use accumarray to sum the results with respect to the indices in the first column.
Now the hardest part is to get this array.
Suppose you have:
an Nx1 vector Ts,
two (N+1)x1 vectors s and z, with z(1) > s(1).
Vectsz=sort([s(2:end);z]); % Sorted vector of s and z values
In your case this vector should look like:
z0
s1
s2
z1
s3
z2
z3
...
The first column will serve as the subscript for accumarray, so we'll want it to increase each time there is a z value in our vector Vectsz:
First=interp1(z,1:length(z),Vectsz,'previous');
Second=[diff(Vectsz);0]; % Padded with a 0 to keep the right size
Temp=diff(s);
Third=interp1(s(1:end-1),Temp,Vectsz,'previous');
This will just repeat the diff value every time you have a z value in your vector Vectsz.
The last column is built exactly like the third one:
Fourth=interp1(s(1:end-1),Ts,Vectsz,'previous');
Now that the array is built, a call to accumarray is enough to get the final result:
Res=accumarray(First,Second.*Fourth./Third);
EDIT: There is actually no need to use interp1 with the 'previous' option:
Vectsz=sort([s(2:end);z]);
First=cumsum(ismember(Vectsz,z));
Second=[diff(Vectsz);0];
idx=cumsum(ismember(Vectsz,s(2:end)))+1;
Diffs=[diff(s);0];
Third=Diffs(idx);
Fourth=Ts(idx);
Res=accumarray(First,Second.*Fourth./Third);

Plotting a function from points that result from combinations of other functions

I have 2 functions declared in wxmaxima: f1(x, y) and f2(x, y). Both contain if-then-else statements and basic arithmetic operations: addition, subtraction, multiplication and division.
For example (just an example, real functions look much more complicated):
f1(x, y) := block([],
    if x * y < 123 then x + y
    else if x / y > 7 then x - y
);
In both functions x and y change from 0.1 to 500000.
I need a 3D plot (graph) of the following points:
(x, y, z), where f1(x, y) == f2(z, x)
Note that it's impossible to extract z out from the equation above (and get a new shiny function f3(x, y)), since f1 and f2 are too complex.
Is this possible to achieve using any computational software?
Thanks in advance!
EDIT:
What I need is the plot for
F(x, y, z) = 0
where
F(x, y, z) = f1(x, y) - f2(z, x)
For Maxima, try implicit_plot(f1(x, y) = f2(x, y), [x, <x0>, <x1>], [y, <y0>, <y1>]) where <x0>, <x1>, <y0>, <y1> are some floating point numbers which are the range of the plot. Note that load(implicit_plot) is needed since implicit_plot is not loaded by default.
As an aside, I see that your function f1 has the form if <condition1> then ... else if <condition2> then ... and that's all. That means that if both <condition1> and <condition2> are false, the function will return false, not a number. Either make sure the conditions are exhaustive, or put an else ... at the end of the if (e.g. if x * y < 123 then x + y else if x / y > 7 then x - y else 0) so that it returns a number no matter what the input is.
In Mathematica:
set = Table[{i, j, 0}, {i, 1, 10}, {j, 0, 10}];
This gives a list of the desired x and y values; then use those with ReplaceAll (/.):
set = set /. {a_, b_, c_} -> {a, b, f1[a, b] - f2[a, b]} (*Simplified of course*)
set is a 2D list of lists, so it needs to be flattened by one dimension:
set = Flatten[set,1];
ListPlot3D[set (*add plot options*)]

What is the Haskell / hmatrix equivalent of the MATLAB pos function?

I'm translating some MATLAB code to Haskell using the hmatrix library. It's going well, but
I'm stumbling on the pos function, because I don't know what it does or what its Haskell equivalent would be.
The MATLAB code looks like this:
[U,S,V] = svd(Y,0);
diagS = diag(S);
...
A = U * diag(pos(diagS-tau)) * V';
E = sign(Y) .* pos( abs(Y) - lambda*tau );
M = D - A - E;
My Haskell translation so far:
(u,s,v) = svd y
diagS = diag s
a = u `multiply` (diagS - tau) `multiply` v
This actually type checks ok, but of course, I'm missing the "pos" call, and it throws the error:
inconsistent dimensions in matrix product (3,3) x (4,4)
So I'm guessing pos does something with matrix size? Googling "matlab pos function" didn't turn up anything useful, so any pointers are very much appreciated! (Obviously I don't know much MATLAB)
Incidentally this is for the TILT algorithm to recover low rank textures from a noisy, warped image. I'm very excited about it, even if the math is way beyond me!
Looks like the pos function is defined in a different MATLAB file:
function P = pos(A)
    P = A .* double( A > 0 );
I can't quite decipher what this is doing. Assuming that boolean values cast to doubles with true == 1.0 and false == 0.0, does it turn negative values to zero and leave positive numbers unchanged?
It looks as though pos finds the positive part of a matrix. You could implement this directly with mapMatrix:
pos :: (Storable a, Num a, Ord a) => Matrix a -> Matrix a
pos = mapMatrix go
  where
    go x | x > 0     = x
         | otherwise = 0
Note, though, that Matlab makes no distinction between Matrix and Vector, unlike Haskell.
But it's worth analyzing that Matlab fragment more. Per http://www.mathworks.com/help/matlab/ref/svd.html the first line computes the "economy-sized" Singular Value Decomposition of Y, i.e. three matrices such that
U * S * V' = Y
where, assuming Y is m x n, U is m x n, S is n x n and diagonal, and V is n x n. Further, both U and V should be orthonormal. In linear algebraic terms this separates the linear transformation Y into two "rotation" components and the central singular-value scaling component.
Since S is diagonal, we extract that diagonal as a vector using diag(S) and then subtract a term tau, which must also be a vector. This might produce a diagonal containing negative values, which cannot be properly interpreted as singular values, so pos is there to trim out the negative values, setting them to 0. We then use diag to convert the resulting vector back into a diagonal matrix and multiply the pieces back together to get A, a modified form of Y.
Note that we can skip some steps in Haskell, as svd (and its "economy-sized" partner thinSVD) return vectors of singular values instead of mostly-zero diagonal matrices.
(u, s, v) = thinSVD y
-- note the trans here, that was the ' in Matlab
a = u `multiply` diag (fmap (max 0) s) `multiply` trans v
Above, fmap maps max 0 over the Vector of singular values s, and then diag (from Numeric.Container) reinflates the Vector into a Matrix prior to the multiplies. With a little thought it's easy to see that max 0 is just pos applied to a single element.
(A > 0) returns a matrix of the same size as A, with ones at the positions of the elements of A which are larger than zero and zeros elsewhere.
So, for example, if you have
A = [ -1  2  -3   4
       5  6  -7  -8 ]
then B = (A > 0) returns
B = [ 0  1  0  1
      1  1  0  0 ]
Note that we have ones corresponding to elements of A which are larger than zero, and 0 otherwise.
Now if you multiply this elementwise with A using the .* notation, you are multiplying each element of A that is larger than zero by 1, and by zero otherwise. That is, A .* B means
[ -1*0   2*1  -3*0   4*1
   5*1   6*1  -7*0  -8*0 ]
giving finally
[ 0  2  0  4
  5  6  0  0 ]
So you need to write your own function that returns positive values intact and sets negative values to zero.
Also, u and v do not match in dimension for a general SVD decomposition, so you actually need to re-diagonalize pos(diagS - tau), so that the product u * diag(pos(diagS - tau)) * v' is dimensionally consistent.

How to access multidimensional arrays in MATLAB with mixed index format

Suppose I have two arrays, M1 and M2. Both have dimensions m x n x p. I'm interested in the mxn array of M1 corresponding to the maximum element along the third dimension, so I do:
[M1max, indices]=max(M1,[],3);
Both M1max and indices are m x n arrays. But now suppose I want to access the elements of M2 that correspond to those maximum elements in M1 (that is, I want all the elements of M2 with the same index as an element of M1 that ended up in M1max). How do I do this?
I think this should do it:
[y, x] = ndgrid(1:size(M1,1), 1:size(M1,2));
reshape(M2(sub2ind(size(M1), y(:), x(:), indices(:))), [size(M1,1), size(M1,2)]);
You want all the indices idx <-> (y, x, indices(y,x)); this computes them, and then M2(idx) is reshaped back to m x n.
Another way is to ignore the indices output from max:
indices2 = M1 == repmat(M1max,[1,1,size(M1,3)]);
result = reshape(M2(indices2),size(M1max));
There might be a precision issue with comparing doubles. In this case you can do
indices2 = repmat(M1max,[1,1,size(M1,3)]) - M1 < eps;
In addition, there will be a problem with this code if multiple identical max values exist in M1 in the 3rd dimension. We can catch this case with
assert(sum(indices2(:))==numel(M1max),'Multiple maximum values found')
This might be slightly faster than @Oli's suggestion, but they're basically equivalent:
[M1max, indices] = max(M1,[],3);
[m n p] = size(M1);
idx = (1:m*n).' + (indices(:)-1)*m*n;
M2max = reshape(M2(idx), m, n);
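For reference, this works because of MATLAB's column-major linear indexing: the element at subscript (i, j, k) of an m x n x p array has linear index i + (j-1)*m + (k-1)*m*n. The term (1:m*n).' enumerates every (i, j) position on one page, and (indices(:)-1)*m*n shifts each of those positions to the page that holds the maximum.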