[apache-spark][GraphX]Is there a global vertex aggregation function in GraphX?

Is there a global vertex aggregation function in GraphX? I am hoping for a function that counts how many vertices hold each distinct attribute value, like 'collections.Counter' in Python.
For example, I have a graph like this (each line represents a vertex; the weight of every edge is 1):
[source vertex, source vertex attribute value, [dst vertex1, dst vertex2, ...]]
{
[1, 1, [2, 3]]
[2, 1, [1, 3, 4]]
[4, 1, [2, 3]]
[3, 1, [1, 2, 4]]
[5, 2, [4, 6, 7]]
[6, 2, [5, 7]]
[7, 2, [5, 6]]
}
and the output would look like this:
{1: 4, 2: 3}
or, even better, the list of vertex ids could be returned for each value:
{1: [1, 2, 3, 4], 2: [5, 6, 7]}
What's more, it would be perfect if this function could work together with the Pregel API. For example, Pregel could use it to decide when to stop: when the count for some vertex attribute value reaches a threshold (for example, the count for value 1 reaches 4, meaning there are 4 vertices whose attribute = 1), the supersteps stop.
P.S. This function should look like "AggregatorXXX(vertex) -> Message or something", not like an RDD-based method such as filter/map, etc.
Sorry for my poor English. :)
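To pin down the result being asked for, here is a minimal plain-Python sketch of the two desired outputs and the threshold check. It is not GraphX code: the names vertex_attr and threshold_reached are made up for illustration, and the attribute map is copied from the example graph above.

```python
from collections import Counter, defaultdict

# vertex id -> attribute value, copied from the example graph in the question
vertex_attr = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2}

# First desired output: number of vertices per attribute value
counts = Counter(vertex_attr.values())

# Second desired output: vertex ids grouped by attribute value
members = defaultdict(list)
for vertex_id, attr in sorted(vertex_attr.items()):
    members[attr].append(vertex_id)

def threshold_reached(counts, value, threshold):
    """Hypothetical stop condition: enough vertices carry `value`."""
    return counts[value] >= threshold

print(dict(counts))                     # {1: 4, 2: 3}
print(dict(members))                    # {1: [1, 2, 3, 4], 2: [5, 6, 7]}
print(threshold_reached(counts, 1, 4))  # True
```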

Related

Why can SVD predict the score?

I have a matrix A, and I am looking at the entry in its second row and fourth column:
A = array([[5, 5, 3, 0, 5, 5],
[5, 0, 4, 0, 4, 4],
[0, 3, 0, 5, 4, 5],
[5, 4, 3, 3, 5, 5]]
)
The matrix reconstructed from the decomposition looks like this, but its entry in the second row and fourth column is -0.6417. Can this value also be used as the predicted score?
[[ 5.28849366 3.27680993 3.53241833 1.14752376 5.07268712 5.10856603]
[ 5.16272816 1.90208542 3.54790449 -0.64171367 3.6639954 3.40187912]
[ 0.21491233 3.74001967 -0.13316888 4.94723591 3.78868964 4.61660489]
[ 4.45908022 3.80580974 2.8984041 2.38455041 5.31300379 5.58222367]]
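For context, a reconstruction like the one above is typically obtained by keeping only the largest singular values. The sketch below shows that step with NumPy. The helper name low_rank_approx and the choice k=2 are assumptions for illustration; the question does not say which rank was used, so the printed entry is not claimed to match -0.6417.

```python
import numpy as np

A = np.array([[5, 5, 3, 0, 5, 5],
              [5, 0, 4, 0, 4, 4],
              [0, 3, 0, 5, 4, 5],
              [5, 4, 3, 3, 5, 5]], dtype=float)

def low_rank_approx(matrix, k):
    """Rebuild `matrix` from its k largest singular values (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# k=2 is an assumed rank, not taken from the question
A_hat = low_rank_approx(A, k=2)

# Entry in the second row, fourth column (0-based index [1, 3])
print(A_hat[1, 3])
```

Keeping every singular value (k=4 here) reproduces A exactly, so any "prediction" comes entirely from the information discarded by the truncation.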

Seeking vectorized solution to sum up elements using accumarray in Matlab/Numpy

(To anyone who reads this, just to not waste your time, I wrote up this question and then came up with a solution to it right after I wrote it. I am posting this here just to help out anyone who happened to also be thinking about something like this.)
I have a vector with elements that I would like to sum up. The elements that I would like to add up are elements that share the same "triggerNumber". For example:
vector = [0, 1, 1, 1, 1]
triggerNumber = [1, 1, 1, 2, 2]
I will sum up the numbers that share a triggerNumber of 1 (so 0+1+1 = 2) and those that share a triggerNumber of 2 (so 1+1 = 2). Therefore my desiredOutput is the array [2, 2].
accumarray accomplishes this task, and if I give it those two inputs:
output = accumarray(triggerNumber.',vector.').'
which returns [2, 2]. But, while my "triggerNumbers" are always increasing, they are not necessarily always increasing by one. So for example I might have the following situation:
vector = [0, 1, 1, 1, 1]
triggerNumber = [4, 4, 4, 6, 6]
output = accumarray(triggerNumber.',vector.').'
But now this returns the output:
output = [0, 0, 0, 2, 0, 2]
Which is not what I want. I want to just sum up elements with the same trigger number (in order), so the desired output is still [2, 2]. Naively I thought that just deleting the zeros would be sufficient, but then that messes up the situation with the inputs:
vector = [0, 0, 0, 1, 1]
triggerNumber = [4, 4, 4, 6, 6]
which if I deleted the zeroes would return just [2] instead of the desired [0, 2].
Any ideas for how I can accomplish this task (in a vectorized way of course)?

I just needed to turn things like [4, 4, 4, 6, 6] into [1, 1, 1, 2, 2], which can be done with a combination of cumsum and diff.
vector = [0, 0, 0, 1, 1];
triggerNumber = [4, 4, 4, 6, 6];
vec1 = cumsum(diff(triggerNumber)>0);
append1 = [0, vec1];
magic = append1+1;
output = accumarray(magic.',vector.').'
which returns [0, 2] for this input (and [2, 2] for the earlier vector = [0, 1, 1, 1, 1])... and hopefully my method works for all cases.
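Since the title also mentions NumPy, the same cumsum/diff relabeling can be sketched there, with np.bincount standing in for accumarray. The function name grouped_sums is made up for illustration, and, like the Matlab version, it assumes the trigger numbers are non-decreasing.

```python
import numpy as np

def grouped_sums(vector, trigger_number):
    """Sum `vector` over runs of equal, non-decreasing trigger numbers."""
    vector = np.asarray(vector, dtype=float)
    trigger_number = np.asarray(trigger_number)
    # A new group starts wherever the trigger number increases;
    # cumsum turns those jumps into 0-based consecutive group labels.
    jumps = np.diff(trigger_number) > 0
    labels = np.concatenate(([0], np.cumsum(jumps)))
    # bincount with weights adds up the vector entries per label.
    return np.bincount(labels, weights=vector)

print(grouped_sums([0, 1, 1, 1, 1], [4, 4, 4, 6, 6]))  # [2. 2.]
print(grouped_sums([0, 0, 0, 1, 1], [4, 4, 4, 6, 6]))  # [0. 2.]
```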

Prolog: dividing a number

I wanted to make a predicate that returns the list of a number's divisors (its prime factors).
Example: 72 = 2*2*2*3*3.
prdel(A,[],_):-
A is 1.
prdel(P,[D|L],D):-
0 is mod(P,D),
P1 is P/D,
prdel(P1,L,D).
prdel(P,L,D):-
D1 is D+1,
prdel(P,L,D1).
This works and returns the right list. The problem is that it does not stop after that but returns the same list over and over again if I press space (I am sorry, I don't know the English term for re-running the same predicate to get a different answer). I want it to stop after the first answer.
I tried to edit the last one like that,
prdel(P,L,D):-
D1 is D+1,
D1<P,
prdel(P,L,D1).
but now it returns only false and not the list.
EDIT:
I am looking for an answer without cut.

One problem in your code is that it keeps trying to divide the number P by D even when it is clear that the division is not going to succeed because D is too high. This lets D "run away" without a limit.
Adding a check for D1 to be below or equal to P fixes this problem:
prdel(1,[],_).
prdel(P,[D|L],D):-
0 is mod(P,D),
P1 is P/D,
prdel(P1,L,D).
prdel(P,L,D):-
D1 is D+1,
D1 =< P,
prdel(P,L,D1).
This produces all combinations of divisors, including non-prime ones (demo).
[[2, 2, 2, 3, 3], [2, 2, 2, 9], [2, 2, 3, 6],
[2, 2, 18], [2, 3, 3, 4], [2, 3, 12], [2, 4, 9],
[2, 6, 6], [2, 36], [3, 3, 8], [3, 4, 6], [3, 24],
[4, 18], [6, 12], [8, 9], [72]]
If you do not want that, add the condition that mod(P,D) > 0 in the last clause:
prdel(1,[],_).
prdel(P,[D|L],D):-
0 is mod(P,D),
P1 is P/D,
prdel(P1,L,D).
prdel(P,L,D):-
mod(P,D) > 0,
D1 is D+1,
D1 =< P,
prdel(P,L,D1).
This produces only [2, 2, 2, 3, 3] (demo).
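The control flow of that last version can also be read as plain trial division. Here is a short Python sketch of the same idea, not a translation of the Prolog semantics; the function name prime_factors is made up for illustration.

```python
def prime_factors(n, d=2):
    """Trial division: divide out d while it fits, otherwise try d + 1."""
    factors = []
    while n > 1:
        if n % d == 0:
            # d divides n: record it and keep the same d (second clause)
            factors.append(d)
            n //= d
        else:
            # d does not divide n: move on to the next candidate (third clause)
            d += 1
    return factors

print(prime_factors(72))  # [2, 2, 2, 3, 3]
```

Because the divisor only advances when it does not divide the remaining number, every recorded factor is prime, which corresponds to the mod(P,D) > 0 guard above.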

Why does Qhull error when computing convex hull of a few points?

I'm trying to compute the convex hull of 9 points in 10 dimensional space. Through the scipy interface, I'm calling scipy.spatial.ConvexHull(points) and getting QH6214 qhull input error: not enough points(9) to construct initial simplex (need 12)
I think the definition of convex hull is well defined regardless of the dimension. What is going on here? Is there a different function I can call that might fix this?

Maybe projecting the points on a hyperplane before computing the hull will do the trick.
Use for example the Principal Component Analysis class sklearn.decomposition.PCA from the scikit-learn toolkit, to reduce dimension.
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA
vertices = np.random.randn(9, 10)  # 9 random points in 10 dimensions
model = PCA(n_components=8).fit(vertices)
You can now transform back and forth between your vertices and the projected points using model.transform and model.inverse_transform.
proj_vertices = model.transform(vertices)
hull_kinda = ConvexHull(proj_vertices)
hull_kinda.simplices
This outputs something like this:
array([[6, 4, 3, 8, 0, 7, 5, 1],
[2, 4, 3, 8, 0, 7, 5, 1],
[2, 6, 3, 8, 0, 7, 5, 1],
[2, 6, 4, 8, 0, 7, 5, 1],
[2, 6, 4, 3, 0, 7, 5, 1],
[2, 6, 4, 3, 8, 7, 5, 1],
[2, 6, 4, 3, 8, 0, 5, 1],
[2, 6, 4, 3, 8, 0, 7, 1],
[2, 6, 4, 3, 8, 0, 7, 5]], dtype=int32)
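Each row of simplices lists point indices, so it indexes the original 10-dimensional points directly (vertices[hull_kinda.simplices]). If you need to map projected coordinates back to your 10 dimensions, use model.inverse_transform.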

Getting single dimensional array, not multi dimensional

When I run
[w*2 for w in [1, 2, 3]]
I get
[[2, 4, 6]]
but actually I want
[2, 4, 6]
Live example
Obviously the following is an option, but I do not want to rely on it:
[w*2 for w in [1, 2, 3]][0]

I found a solution myself:
(w*2 for w in [1, 2, 3])
// -> [2, 4, 6]
