How to get all vertices of all outgoing edges from a vertex scala gremlin - scala

I need to get the labels of the vertices at the other end of all outgoing edges from a vertex, using Scala Gremlin.
My code looks like this:
val names :ListBuffer[String] = ListBuffer()
val toList: List[Vertex] = graph.V().hasLabel(100).outE().outV().toList()
for(vertex <- toList){
names += vertex.label()
}
It returns the same label for every vertex.
E.g.:
Vertex A has outgoing edges to B, C, and D, but the query returns the label of A.
Output:
ListBuffer(100, 100, 100)
Am I missing anything?

I believe you are asking for the wrong vertex at the end. Honestly, I often make the same mistake. This may be the traversal you are looking for:
graph.V().hasLabel(100).outE().inV().label().toList()
If, like me, you often get confused by inV() and outV(), you can use otherV(), which gets the vertex at the opposite end of the edge. Like so:
graph.V().hasLabel(100).outE().otherV().label().toList()
Finally you can even shorten your traversal by not explicitly stating the edge part:
graph.V().hasLabel(100).out().label().toList()
By using out() instead of outE() you don't need to specify that you want the vertex; out() gets you the adjacent vertex directly.
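To make the direction naming concrete, here is a small plain-Scala sketch that does not use TinkerPop at all. The `Edge` case class and the sample data are hypothetical stand-ins. An edge A -> B leaves its out-vertex A and arrives at its in-vertex B, so asking the outgoing edges of A for their out-vertex gives A back every time.

```scala
// Hypothetical stand-in for a graph edge: it leaves outV and arrives at inV.
final case class Edge(outV: String, inV: String)

object EdgeDirectionSketch {
  val edges: List[Edge] = List(Edge("A", "B"), Edge("A", "C"), Edge("A", "D"))

  // Analogue of V("A").outE(): the edges that leave A.
  val outgoing: List[Edge] = edges.filter(_.outV == "A")

  // Analogue of outE().outV(): we are back at the source vertex.
  val sources: List[String] = outgoing.map(_.outV)

  // Analogue of outE().inV(), or simply out(): the neighbouring vertices.
  val targets: List[String] = outgoing.map(_.inV)

  def main(args: Array[String]): Unit = {
    println(sources) // List(A, A, A)
    println(targets) // List(B, C, D)
  }
}
```

This mirrors the question's output: three copies of the source label instead of the three neighbours.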

Related

How to convert a type Any List to a type Double (Scala)

I am new to Scala and I would like to understand some basic stuff.
First of all, I need to calculate the average of a certain column of a DataFrame and use the result as a double type variable.
After some Internet research I was able to calculate the average and at the same time store it in a List of type Any by using the following command:
val avgX_List = mainDataFrame.groupBy().agg(mean("_c1")).collect().map(_(0)).toList
where "_c1" is the second column of my dataframe. This line of code returns a List with type List[Any].
To pass the result into a variable I used the following command:
var avgX = avgX_List(0)
hoping that the var avgX would automatically have type Double, but obviously that didn't happen.
So now let the questions begin:
What does map(_(0)) do? I know the basic definition of the map() transformation but I can't find an explanation with this exact argument
I know that by using .toList method in the end of the command my result will be a List with type Any. Is there a way that I could change this into List which contains type Double elements? Or even convert this one
Do you think that it would be much more appropriate to pass the column of my Dataframe into a List[Double] and then calculate the average of its elements?
Is the solution I showed above a correct approach to my problem? I know that "it is working" is different from "correct solution".
Summing up, I need to calculate the average of a certain column of a Dataframe and have the result as a double type variable.
Note that: I am Greek and I find it hard sometimes to understand some English coding "slang".
map(_(0)) is a shortcut for map( (r: Row) => r(0) ), which is in turn a shortcut for map( (r: Row) => r.apply(0) ). The apply method returns Any, and so you are losing the right type. Try using map(_.getAs[Double](0)) or map(_.getDouble(0)) instead.
Collecting all entries of the column and then computing the average would be highly counterproductive, because you'd have to send huge amounts of data to the master node, and then do all the calculations on this single central node. That would be the exact opposite of what Spark is good for.
You also don't need collect(...).toList, because you can access the 0-th entry directly (it doesn't matter whether you get it from an Array or from a List). Since you are collapsing everything into a single Row anyway, you could get rid of the map step entirely by reordering the methods a little bit:
val avgX = mainDataFrame.groupBy().agg(mean("_c1")).collect()(0).getDouble(0)
It can be written even shorter using the first method:
val avgX = mainDataFrame.groupBy().agg(mean("_c1")).first().getDouble(0)
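The type loss described above can be reproduced without Spark. The following is only a sketch: a `Seq[Any]` plays the role of a Spark `Row` (an assumption made so the example runs standalone), and a pattern match stands in for `getDouble` to recover the static type.

```scala
object AnyToDoubleSketch {
  // Hypothetical stand-in for collected rows: each "row" is a Seq[Any].
  val rows: Array[Seq[Any]] = Array(Seq(2.5, "ignored"))

  // map(_(0)) calls apply(0) on every row, so the element type is Any.
  val untyped: List[Any] = rows.map(_(0)).toList

  // A pattern match recovers Double without a detour through String.
  val typed: List[Double] = untyped.collect { case d: Double => d }

  val avgX: Double = typed.head

  def main(args: Array[String]): Unit =
    println(avgX) // 2.5
}
```

With a real `Row`, the typed getters shown in the answer do the same job in one call.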
A value of type Any cannot be used as a Double directly in Scala; it needs an explicit conversion.
One workaround is to call toString and then toDouble on the final captured result.
E.g.:
scala> x
res22: Any = 1.0
scala> x.toString.toDouble
res23: Double = 1.0
Note: instead of using map(...).toList, index with (0)(0) to get the final value straight from the result set.
Test sample (Scala):
val wa = Array("one","two","two")
val wrdd = sc.parallelize(wa,3).map(x=>(x,1))
val wdf = wrdd.toDF("col1","col2")
val x = wdf.groupBy().agg(mean("col2")).collect()(0)(0).toString.toDouble
Output:
scala> val x = wdf.groupBy().agg(mean("col2")).collect()(0)(0).toString.toDouble
x: Double = 1.0

Tinkerpop3 - degree centrality

I'm looking to find the most liked nodes, so basically the degree centrality recipe. This query kind of works, but I'd like to return the full vertex (including properties) rather than just the ids.
(I am using TinkerPop 3.0.1-incubating.)
g.V()
.where( inE("likes") )
.group()
.by()
.by( inE("likes").count() )
Result
{
"8240": [
2
],
"8280": [
1
],
"12376": [
1
],
"24704": [
1
],
"40976": [
1
]
}
You're probably looking for the order step, using an anonymous traversal passed to the by() modulator:
g.V().order().by(inE('likes').count(), decr)
Note: in Titan v1.0.0 this query cannot be optimized and requires iterating over all vertices, so in OLTP it is only practical on smaller graphs.
To get the 10 most liked:
g.V().order().by(inE('likes').count(), decr).limit(10)
If you want to get the full properties, simply chain .valueMap() or .valueMap(true) (for id and label) on the query.
See also:
http://tinkerpop.apache.org/docs/3.0.1-incubating/#order-step
https://groups.google.com/d/topic/gremlin-users/rt3qRKyAqts/discussion
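What the ordered traversal computes can be sketched in plain Scala, independent of TinkerPop. The edge list below is hypothetical sample data. One difference from `g.V().order()`: vertices with no incoming "likes" edge do not appear in this sketch, because it starts from the edges rather than from the vertices.

```scala
object InDegreeSketch {
  // Hypothetical "likes" edges as (liker, liked) pairs.
  val likes: List[(String, String)] =
    List(("a", "x"), ("b", "x"), ("c", "y"), ("a", "z"), ("b", "z"), ("c", "z"))

  // Count incoming edges per liked vertex, then sort by count, highest first.
  val ranked: List[(String, Int)] =
    likes
      .groupBy(_._2)
      .map { case (vertex, incoming) => (vertex, incoming.size) }
      .toList
      .sortBy { case (_, count) => -count }

  // Analogue of limit(2) on the ordered result.
  val top2: List[(String, Int)] = ranked.take(2)

  def main(args: Array[String]): Unit =
    println(top2) // List((z,3), (x,2))
}
```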
GraphSON, being JSON based, does not support complex objects as keys. A key in JSON must be a string and, as in this case, cannot be a map. To deal with this JSON limitation, GraphSON converts complex objects used as keys via Java's toString(), or by other methods for certain objects like graph elements (which yields a string representation of the element identifier, and explains the output you received).
If you want to return properties of elements while using GraphSON, you will have to figure out a workaround. In this specific case, you might do:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().group().
......1> by(id).
......2> by(union(__(), outE('knows').count()).fold())
==>[1:[v[1],2],2:[0,v[2]],3:[v[3],0],4:[0,v[4]],5:[v[5],0],6:[0,v[6]]]
In this way you get the vertex identifier as the map key, and the value holds the full vertex plus the count. TinkerPop is working on improving this issue, but I don't expect a fast fix.

Is it possible to return a map of key values using gremlin scala

Currently I have two Gremlin queries that fetch two different values, and I populate a map from their results.
Scenario : A->B , A->C , A->D
My queries below,
graph.V().has(ID,A).out().label().toList()
Fetches the labels of the vertices that A's outgoing edges point to.
Result : List(B,C,D)
graph.traversal().V().has("ID",A).outE("interference").as("x").otherV().has("ID",B).select("x").values("value").headOption()
Given A and B, gets the edge property value (A->B).
Return : 10
Is it possible to combine both of these queries so that they return a Map[(B,10)(C,11)(D,12)]?
I am facing a performance issue with two separate queries; they take too long.
There is probably a better way to do this but I managed to get something with the following traversal:
gremlin> graph.traversal().V().has("ID","A").outE("interference").as("x").otherV().has("ID").label().as("y").select("x").by("value").as("z").select("y", "z").select(values);
==>[B,1]
==>[C,2]
I would wait for more answers though as I suspect there is a better traversal out there.
The following works in Scala:
val b = StepLabel[Edge]()
val y = StepLabel[Label]()
val z = StepLabel[Integer]()
graph.traversal().V().has("ID",A).outE("interference").as(b)
.otherV().label().as(y)
.select(b).values("name").as(z)
.select((y,z)).toMap[String,Integer]
This will return a Map[String, Integer].
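The shape of the combined result can be illustrated in plain Scala, without the gremlin-scala API. The `Interference` case class and its sample values are hypothetical, taken from the A->B, A->C, A->D scenario in the question. The point is that a single pass over A's outgoing edges yields both the neighbour label and the edge value, so a second query per neighbour is not needed.

```scala
object EdgeValueMapSketch {
  // Hypothetical edge: source ID, target label, and the edge's "value" property.
  final case class Interference(from: String, toLabel: String, value: Int)

  val edges: List[Interference] = List(
    Interference("A", "B", 10),
    Interference("A", "C", 11),
    Interference("A", "D", 12),
    Interference("X", "B", 99) // belongs to another vertex, must be ignored
  )

  // Keep only edges leaving A and pair each neighbour label with the edge value.
  val result: Map[String, Int] =
    edges.collect { case Interference("A", label, v) => label -> v }.toMap

  def main(args: Array[String]): Unit =
    println(result)
}
```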

Ullman’s Subgraph Isomorphism Algorithm

Could somebody give me a working MATLAB implementation of Ullman's graph isomorphism algorithm, or a link to one? Even a C implementation would help; I would then try to port it to MATLAB.
Thanks
I'm looking for it too. I've been searching the web with no luck so far, but I found this:
Algorithm, where the algorithm is explained.
On the other hand, I found this:
# Note: update_possible_assignments and deep_copy are not defined in this excerpt.
def search(graph,subgraph,assignments,possible_assignments):
  update_possible_assignments(graph,subgraph,possible_assignments)
  i=len(assignments)
  # Make sure that every edge between assigned vertices in the subgraph is also an
  # edge in the graph.
  for edge in subgraph.edges:
    if edge.first<i and edge.second<i:
      if not graph.has_edge(assignments[edge.first],assignments[edge.second]):
        return False
  # If all the vertices in the subgraph are assigned, then we are done.
  if i==subgraph.n_vertices:
    return True
  for j in possible_assignments[i]:
    if j not in assignments:
      assignments.append(j)
      # Create a new set of possible assignments, where graph node j is the only
      # possibility for the assignment of subgraph node i.
      new_possible_assignments = deep_copy(possible_assignments)
      new_possible_assignments[i] = [j]
      if search(graph,subgraph,assignments,new_possible_assignments):
        return True
      assignments.pop()
    possible_assignments[i].remove(j)
    update_possible_assignments(graph,subgraph,possible_assignments)
def find_isomorphism(graph,subgraph):
  assignments=[]
  possible_assignments = [[True]*graph.n_vertices for i in range(subgraph.n_vertices)]
  if search(graph,subgraph,assignments,possible_assignments):
    return assignments
  return None
here: implementation. I do not have the skills to translate this into MATLAB; if you do, I would really appreciate it if you could share your code when you're done.

How to groupBy groupBy?

I need to map through a List[(A,B,C)] to produce an html report. Specifically, a
List[(Schedule,GameResult,Team)]
Schedule contains a gameDate property that I need to group on to get a
Map[JodaTime, List[(Schedule,GameResult,Team)]]
which I use to display gameDate table row headers. Easy enough:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate)
Now the tricky bit (for me) is, how to further refine the grouping in order to enable mapping through the game results as pairs? To clarify, each GameResult consists of a team's "version" of the game (i.e. score, location, etc.), sharing a common Schedule gameID with the opponent team.
Basically, I need to display a game result outcome on one row as:
3 London Dragons vs. Paris Frogs 2
Grouping on gameDate lets me do something like:
data.map{case(date,games) =>
// game date row headers
<tr><td>{date.toString("MMMM dd, yyyy")}</td></tr>
// print out game result data rows
games.map{case(schedule,result, team)=>
...
// BUT (result,team) slice is ungrouped, need grouped by Schedule gameID
}
}
In the old version of the existing application (PHP) I used to
for($x = 0; $x < $this->gameCnt; $x = $x + 2) {...}
but I'd prefer to refer to variable names and not the come-back-later-wtf-is-that-inducing:
games._._2(rowCnt).total games._._3(rowCnt).name games._._1(rowCnt).location games._._2(rowCnt+1).total games._._3(rowCnt+1).name
Maybe zip, or doubling up with for(t1 <- data; t2 <- data) yield(?), or something else entirely will do the trick. Regardless, there must be a concise solution; it's just not coming to me right now...
Maybe I'm misunderstanding your requirements, but it seems to me that all you need is an additional groupBy:
repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID))
The result will be of type:
Map[JodaTime, Map[GameId, List[(Schedule,GameResult,Team)]]]
(where GameId is the return type of Schedule.gameID)
Update: if you want the results as pairs, then pattern matching is your friend, as shown by Arjan. This would give us:
val byDate = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate)
val data = byDate.mapValues(_.groupBy(_._1.gameID).mapValues{ case List((sa, ra, ta), (sb, rb, tb)) => (sa, (ta, ra), (tb, rb)) }.values)
This time the result is of type:
Map[JodaTime, Iterable[ (Schedule,(Team,GameResult),(Team,GameResult))]]
Note that this will throw a MatchError if there are not exactly 2 entries with the same gameID. In real code you will definitely want to check for this case.
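One way to guard against that MatchError is sketched below, under simplifying assumptions: the `Row` tuple of (gameId, teamName, score) is a hypothetical stand-in for (Schedule, GameResult, Team). `collect` takes a partial function and silently skips groups that do not match the two-element pattern, and the skipped groups are kept separately so they can be logged or reported.

```scala
object PairGamesSketch {
  // Hypothetical simplified row: (gameId, teamName, score).
  type Row = (Int, String, Int)

  val rows: List[Row] = List(
    (1, "London Dragons", 3),
    (1, "Paris Frogs", 2),
    (2, "Rome Wolves", 1) // the opponent's row is missing
  )

  val grouped: List[List[Row]] = rows.groupBy(_._1).values.toList

  // collect only applies the pattern where it matches, so no MatchError is thrown.
  val pairs: List[(Row, Row)] = grouped.collect { case List(a, b) => (a, b) }

  // Groups that are not exactly two rows, kept for error handling.
  val incomplete: List[List[Row]] = grouped.filter(_.size != 2)

  def main(args: Array[String]): Unit = {
    pairs.foreach { case ((_, teamA, scoreA), (_, teamB, scoreB)) =>
      println(s"$scoreA $teamA vs. $teamB $scoreB") // 3 London Dragons vs. Paris Frogs 2
    }
    println(s"incomplete games: ${incomplete.size}") // incomplete games: 1
  }
}
```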
OK, a solution from Régis Jean-Gilles:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID))
You said it was not correct; maybe you just didn't use it the right way?
Every List in the result is a pair of games with the same gameID.
You could produce HTML like this:
data.map{case(date,games) =>
// game date row headers
<tr><td>{date.toString("MMMM dd, yyyy")}</td></tr>
// print out game result data rows
games.map{case (gameId, List((scheduleA, resultA, teamA), (scheduleB, resultB, teamB))) =>
...
}
}
And since you don't need the gameID, you can return just the paired games:
val data = repo.games.findAllByDate(fooDate).groupBy(_._1.gameDate).mapValues(_.groupBy(_._1.gameID).values)
The type of the result is now:
Map[JodaTime, Iterable[List[(Schedule,GameResult,Team)]]]
Every list is again a pair of two games with the same gameID.