How to get node type in graph scala - scala

I am generating a graph using the graph-scala library and I need to set the coordinates after building the graph.
My objects are Ball and Figure, both extending GraphNode, and I generate my graph from these GraphNode objects:
val ball1=new Ball(1,"BALL-A")
val figure1=new Figure(1)
val figure2=new Figure(2)
val figure3=new Figure(3)
val edges = Seq(
(ball1, figure1),
(figure2, ball1),
(ball1, figure3)
)
val graph1: Graph[GraphNode, HyperEdge] = edges
.map({ case (node1, node2) =>
Graph[GraphNode, HyperEdge](node1 ~> node2)
})
.reduce(_ ++ _)
And now I want to set X, Y and Width properties for each node:
graph1.nodes
.map(node => {
node match {
case b: Ball =>
println("is a ball!")
if (b.nodeType.equals("BALL-A"))
b.copy(x = 0, y = 0, width = 100)
else
b.copy(x = 30, y = 30, width = 200)
case otherType =>
val name = otherType.getClass.getSimpleName
println(name)
}
node.toJson
})
.foreach(println)
But I get the type "NodeBase" for every node, so the Ball case never matches and nothing is set. Any suggestions on how to set properties once the graph is built? My underlying problem is that I cannot get the type of each node, which I need in order to set the properties.
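One thing that can be said independently of the library: in the snippet above the result of b.copy(...) is thrown away, because the lambda returns node.toJson of the original node. The match has to be the value that is kept. Below is a minimal plain-Scala sketch of that idea. The case classes and the helper withLayout are hypothetical stand-ins, not the real classes from the question. As for "NodeBase": the elements of graph1.nodes appear to be the library's inner-node wrappers rather than Ball or Figure instances; if I remember correctly the 1.x API exposes the wrapped object as node.value, but treat that as an assumption to check against your version.

```scala
// Hypothetical stand-ins for the classes in the question.
sealed trait GraphNode { def id: Int }
final case class Ball(id: Int, nodeType: String, x: Int = 0, y: Int = 0, width: Int = 0) extends GraphNode
final case class Figure(id: Int, x: Int = 0, y: Int = 0, width: Int = 0) extends GraphNode

// Returns an updated copy. The match is an expression, so its result
// must be returned (or stored) instead of being discarded.
def withLayout(node: GraphNode): GraphNode = node match {
  case b: Ball if b.nodeType == "BALL-A" => b.copy(x = 0, y = 0, width = 100)
  case b: Ball                           => b.copy(x = 30, y = 30, width = 200)
  case f: Figure                         => f // left unchanged in this sketch
}

// Assumed to be the outer nodes, i.e. already unwrapped from the graph.
val outerNodes: Seq[GraphNode] = Seq(Ball(1, "BALL-A"), Figure(1), Ball(2, "BALL-B"))
val positioned: Seq[GraphNode] = outerNodes.map(withLayout)
positioned.foreach(println)
```

Since the nodes are immutable case classes, the graph would then have to be rebuilt from the updated nodes; alternatively the coordinates can be computed before the graph is created.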

Related

What is the proper way to compute graph diameter in GraphX

I'm implementing an algorithm on GraphX for which I need to also compute the diameter of some relatively small graphs.
The problem is that GraphX doesn't have any notion of undirected graphs, so the built-in ShortestPaths method obviously returns the shortest directed paths. This doesn't help much in computing a graph diameter (the longest shortest undirected path between any pair of nodes).
I thought of duplicating the edges of my graph (so that instead of |E| I would have 2|E| edges), but that doesn't feel like the right way to do it. So, is there a better way to do it, specifically in GraphX?
Here is my code for a directed graph:
// computing the query diameter
def getDiameter(graph: Graph[String, Int]):Long = {
// Get ids of vertices of the graph
val vIds = graph.vertices.collect.toList.map(_._1)
// Compute list of shortest paths for every vertex in the graph
val shortestPaths = lib.ShortestPaths.run(graph, vIds).vertices.collect
// extract only the distance values from a list of tuples <VertexId, Map> where map contains <key, value>: <dst vertex, shortest directed distance>
val values = shortestPaths.map(element => element._2).map(element => element.values)
// diameter is the longest shortest undirected distance between any pair of nodes in the graph
val diameter = values.map(m => m.max).max
diameter
}
GraphX actually has no notion of direction if you don't tell it to use it.
If you look at the inner workings of the ShortestPaths library you'll see that it uses Pregel with the default direction (EdgeDirection.Either). This means that for all triplets it will add both source and destination to the active set.
However, if you specify in the sendMsg function of Pregel to keep only the srcId in the active set (as happens in the ShortestPaths lib), certain vertices (those with only outgoing edges) will not be re-evaluated.
Anyway, a solution is to write your own Diameter object/library, maybe looking like this (heavily based on ShortestPaths, so maybe there are even better solutions):
import org.apache.spark.graphx._
import scala.reflect.ClassTag
object Diameter extends Serializable {
type SPMap = Map[VertexId, Int]
def makeMap(x: (VertexId, Int)*) = Map(x: _*)
def incrementMap(spmap: SPMap): SPMap = spmap.map { case (v, d) => v -> (d + 1) }
def addMaps(spmap1: SPMap, spmap2: SPMap): SPMap = {
(spmap1.keySet ++ spmap2.keySet).map {
k => k -> math.min(spmap1.getOrElse(k, Int.MaxValue), spmap2.getOrElse(k, Int.MaxValue))
}(collection.breakOut) // more efficient alternative to [[collection.Traversable.toMap]]
}
// Removed landmarks, since all paths have to be taken in consideration
def run[VD, ED: ClassTag](graph: Graph[VD, ED]): Int = {
val spGraph = graph.mapVertices { (vid, _) => makeMap(vid -> 0) }
val initialMessage:SPMap = makeMap()
def vertexProgram(id: VertexId, attr: SPMap, msg: SPMap): SPMap = {
addMaps(attr, msg)
}
def sendMessage(edge: EdgeTriplet[SPMap, _]): Iterator[(VertexId, SPMap)] = {
// added the concept of updating the dstMap based on the srcMap + 1
val newSrcAttr = incrementMap(edge.dstAttr)
val newDstAttr = incrementMap(edge.srcAttr)
List(
if (edge.srcAttr != addMaps(newSrcAttr, edge.srcAttr)) Some((edge.srcId, newSrcAttr)) else None,
if (edge.dstAttr != addMaps(newDstAttr, edge.dstAttr)) Some((edge.dstId, newDstAttr)) else None
).flatten.toIterator
}
val pregel = Pregel(spGraph, initialMessage)(vertexProgram, sendMessage, addMaps)
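// each vertex's map holds the shortest distances to every vertex it can reach,
// so its largest value is that vertex's eccentricity; the diameter is the largest eccentricity
pregel.vertices.map(_._2.values.max).max()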
}
}
val diameter = Diameter.run(graph)
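Since the graphs in the question are relatively small, the result can also be cross-checked without Pregel. The following is a minimal plain-Scala sketch, not GraphX code: it treats every edge as undirected, runs a breadth-first search from each vertex and takes the largest eccentricity. The name undirectedDiameter and the edge-list input are assumptions made for illustration, and the sketch assumes a non-empty, connected graph (on a disconnected graph it only measures distances inside each component).

```scala
// Plain-Scala cross-check, not GraphX: diameter of a small graph whose
// edges are all treated as undirected.
def undirectedDiameter(edges: Seq[(Long, Long)]): Int = {
  // Store each edge in both directions.
  val adjacency: Map[Long, Set[Long]] =
    edges
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2).toSet }

  // Number of BFS levels needed to reach every vertex reachable from `start`.
  def eccentricity(start: Long): Int = {
    @annotation.tailrec
    def bfs(frontier: Set[Long], seen: Set[Long], depth: Int): Int = {
      val next = frontier.flatMap(v => adjacency(v)) -- seen
      if (next.isEmpty) depth else bfs(next, seen ++ next, depth + 1)
    }
    bfs(Set(start), Set(start), 0)
  }

  // The diameter is the largest eccentricity over all vertices.
  adjacency.keys.map(eccentricity).max
}

// Hypothetical path graph 1 - 2 - 3 - 4 with mixed edge directions.
val pathGraph = Seq((1L, 2L), (3L, 2L), (3L, 4L))
val pathDiameter = undirectedDiameter(pathGraph)
println(pathDiameter)
```

For a GraphX graph small enough to collect, the edge list could be obtained with something like graph.edges.map(e => (e.srcId, e.dstId)).collect().toSeq.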

GraphX - How to get all connected vertices from vertexId (not just the firsts adjacents)?

Considering this graph:
(image: example graph)
How can I get all connected vertices from a vertexID?
For example, from VertexId 5, it should return 5-3-7-8-10
CollectNeighbors only returns the first adjacent ones.
I'm trying to use pregel, but I don't know how to start from a specific vertex. I don't want to calculate all the nodes.
Thanks!
I just noticed that the graph is directed. In that case you can use the code of the shortest path example: if the distance to a specific node is not infinity, then you can reach that node.
Or, a better idea: you can modify the shortest path algorithm to fit your needs, so that it only propagates a "reachable" flag:
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.graphx.util.GraphGenerators
// A graph with edge attributes containing distances
val graph: Graph[Long, Double] =
GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble)
val sourceId: VertexId = 42 // The ultimate source
// Initialize the graph such that all vertices except the root have canReach = false.
val initialGraph: Graph[Boolean, Double] = graph.mapVertices((id, _) => id == sourceId)
val sssp = initialGraph.pregel(false)(
(id, canReach, newCanReach) => canReach || newCanReach, // Vertex Program
triplet => { // Send Message
if (triplet.srcAttr && !triplet.dstAttr) {
Iterator((triplet.dstId, true))
} else {
Iterator.empty
}
},
(a, b) => a || b // Merge Message
)
println(sssp.vertices.collect.mkString("\n"))
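To make the propagation logic easier to follow, here is the same idea as a minimal plain-Scala sketch without Spark: start with only the source marked as reachable and keep following outgoing edges until nothing new is found. The function name reachableFrom and the edge list are hypothetical; the edges are made up to reproduce the 5-3-7-8-10 result from the question, since the original picture is not available.

```scala
// Plain-Scala sketch of directed reachability, mirroring the Pregel logic:
// a vertex becomes reachable once any of its in-neighbours is reachable.
def reachableFrom(source: Long, edges: Seq[(Long, Long)]): Set[Long] = {
  // Outgoing neighbours per vertex (direction is respected).
  val outgoing: Map[Long, Set[Long]] =
    edges.groupBy(_._1).map { case (v, es) => v -> es.map(_._2).toSet }

  @annotation.tailrec
  def loop(frontier: Set[Long], reached: Set[Long]): Set[Long] = {
    val next = frontier.flatMap(v => outgoing.getOrElse(v, Set.empty[Long])) -- reached
    if (next.isEmpty) reached else loop(next, reached ++ next)
  }

  loop(Set(source), Set(source))
}

// Hypothetical directed edges; vertices 1 and 2 are not reachable from 5.
val exampleEdges = Seq((5L, 3L), (3L, 7L), (7L, 8L), (8L, 10L), (1L, 2L), (2L, 5L))
val fromFive = reachableFrom(5L, exampleEdges)
println(fromFive.toSeq.sorted.mkString("-"))
```

In the GraphX version above, the equivalent result would be the vertices whose attribute is true after pregel finishes.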

Scala - how to make the SortedSet with custom ordering hold multiple different objects that have the same value by which we sort?

As mentioned in the title, I have a SortedSet with custom ordering. The set holds objects of class Edge (representing an edge in a graph). Each Edge has a cost associated with it as well as its start and end point.
case class Edge(firstId : Int, secondId : Int, cost : Int) {}
My ordering for SortedSet of edges looks like this (it's for the A* algorithm) :
object Ord {
val edgeCostOrdering: Ordering[Edge] = Ordering.by { edge : Edge =>
if (edge.secondId == goalId) graphRepresentation.calculateStraightLineCost(edge.firstId, goalId) else edge.cost + graphRepresentation.calculateStraightLineCost(edge.secondId, goalId)
}
}
However, after I apply said ordering to the set and try to sort edges that have different start/end points but the same cost, only the last encountered edge remains in the set.
For example :
val testSet : SortedSet[Edge] = SortedSet[Edge]()(Ord.edgeCostOrdering)
val testSet2 = testSet + Edge(1,4,2)
val testSet3 = testSet2 + Edge(3,2,2)
println(testSet3)
Only prints (3,2,2)
Aren't these distinct objects? They only share the same value for one field so shouldn't the Set be able to handle this?
Consider using a mutable.PriorityQueue instead, it can keep multiple elements that have the same order. Here is a simpler example where we order pairs by the second component:
import collection.mutable.PriorityQueue
implicit val twoOrd = math.Ordering.by{ (t: (Int, Int)) => t._2 }
val p = new PriorityQueue[(Int, Int)]()(twoOrd)
p += ((1, 2))
p += ((42, 2))
Even though both pairs are mapped to 2, and therefore have the same priority, the queue does not lose any elements:
p foreach println
(1,2)
(42,2)
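One caveat when using this for A*: mutable.PriorityQueue dequeues the element with the highest priority first, so for "cheapest edge first" the ordering has to be reversed. A minimal sketch, assuming a simplified Edge whose priority is just its cost (the heuristic from the question is left out, and the name cheapestFirst is made up here):

```scala
import scala.collection.mutable.PriorityQueue

final case class Edge(firstId: Int, secondId: Int, cost: Int)

// PriorityQueue is a max-heap, so reverse the ordering to get the lowest cost first.
val cheapestFirst: Ordering[Edge] = Ordering.by[Edge, Int](_.cost).reverse

val open = PriorityQueue.empty[Edge](cheapestFirst)
open += Edge(1, 4, 2)
open += Edge(3, 2, 2) // same cost as the previous edge, both are kept
open += Edge(7, 8, 1)

val sizeBefore = open.size
val first = open.dequeue() // the edge with the lowest cost
println(s"$sizeBefore elements queued, dequeued $first")
```

The order in which the two edges with equal cost come out is not specified, so nothing should rely on it.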
To retain all the distinct Edges with the same ordering cost value in the SortedSet, you can modify your Ordering.by's function to return a Tuple that includes the edge Ids as well:
val edgeCostOrdering: Ordering[Edge] = Ordering.by { edge: Edge =>
val cost = if (edge.secondId == goalId) ... else ...
(cost, edge.firstId, edge.secondId)
}
A quick proof of concept below:
import scala.collection.immutable.SortedSet
case class Foo(a: Int, b: Int)
val fooOrdering: Ordering[Foo] = Ordering.by(_.b)
val ss = SortedSet(Foo(2, 2), Foo(2, 1), Foo(1, 2))(fooOrdering)
// ss: scala.collection.immutable.SortedSet[Foo] = TreeSet(Foo(2,1), Foo(1,2))
val fooOrdering: Ordering[Foo] = Ordering.by(foo => (foo.b, foo.a))
val ss = SortedSet(Foo(2, 2), Foo(2, 1), Foo(1, 2))(fooOrdering)
// ss: scala.collection.immutable.SortedSet[Foo] = TreeSet(Foo(1,2), Foo(2,1), Foo(2,2))

How to take different types of elements from Map

Here I got two hash sets:
var vertexes = new HashSet[String]()
var edges = new HashSet[RDFTriple]() //RDFTriple is a class
I want to put them into a map like this:
var graph = Map[String, HashSet[_]]()
graph.put("e", edges)
graph.put("v", vertexes)
But now I want to get vertexes and edges back out of the map, and I can't get it to work. I have tried something like the following:
val a = graph.get("v")
a match {
case _ => val v = a
}
val b = graph.get("e")
b match {
case _ => val e = b
}
But v and e are typed as Option[HashSet[_]], while what I want are HashSet[String] and HashSet[RDFTriple].
How can I do this?
I would appreciate any help, because this has been bothering me for a long time.
It is not recommended to use different types in the same Map; however, you could solve the problem by using Some and asInstanceOf like this:
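val v: HashSet[String] = a match {
case Some(set) => set.asInstanceOf[HashSet[String]] // unchecked cast: the element type is erased at runtime
case None => HashSet.empty[String] // or handle the missing key differently
}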
Note that the assignment val v = ... is done outside the match to allow usage of the variable afterwards. The match for the edges is similar.
However, a better solution would be to use a case class for the graph. Then you would avoid a lot of hassle.
case class Graph(vertexes: HashSet[String], edges: HashSet[RDFTriple])
val graph = Graph(vertexes, edges)
val v = graph.vertexes // HashSet[String]
val e = graph.edges // HashSet[RDFTriple]

applying a function to graph data using mapReduceTriplets in spark and graphx

I'm having some problems applying the mapReduceTriplets to my graph network in spark using graphx.
I've been following the tutorials and read in my own data which is put together as [Array[String],Int], so for example my vertices are:
org.apache.spark.graphx.VertexRDD[Array[String]] e.g. (3999,Array(17, Low, 9))
And my edges are:
org.apache.spark.graphx.EdgeRDD[Int]
e.g. Edge(3999,4500,1)
I'm trying to apply an aggregate-type function using mapReduceTriplets that counts, for each vertex, how many times the last integer in its array (9 in the example above) is the same as or different from the first integer (17 in the example above) of each connected vertex.
So you would end up with a list of counts for the number of matches or non-matches.
The problem I am having is applying any function using mapReduceTriplets. I am quite new to Scala, so this may be really obvious. The example in the GraphX tutorials uses a graph of type Graph[Double, Int], whereas my graph is of type Graph[Array[String], Int], so as a first step I'm just trying to figure out how to use my graph in the example and then work from there.
The example on the graphx website is as follows:
val olderFollowers: VertexRDD[(Int, Double)] = graph.mapReduceTriplets[(Int, Double)](
triplet => { // Map Function
if (triplet.srcAttr > triplet.dstAttr) {
// Send message to destination vertex containing counter and age
Iterator((triplet.dstId, (1, triplet.srcAttr)))
} else {
// Don't send a message for this triplet
Iterator.empty
}
},
// Add counter and age
(a, b) => (a._1 + b._1, a._2 + b._2) // Reduce Function
)
Any advice would be most appreciated, or if you think there is a better way than using mapreducetriplets I would be happy to hear it.
Edited new code
val nodes = (sc.textFile("C~nodeData.csv")
.map(line => line.split(",")).map( parts => (parts.head.toLong, parts.tail) ))
val edges = GraphLoader.edgeListFile(sc, "C:~edges.txt")
val graph = edges.outerJoinVertices(nodes) {
case (uid, deg, Some(attrList)) => attrList
case (uid, deg, None) => Array.empty[String]
}
val countsRdd = graph.collectNeighbors(EdgeDirection.Either).leftOuterJoin(graph.vertices).map {
case (id, t) => {
val neighbors: Array[(VertexId, Array[String])] = t._1
val nodeAttr = (t._2)
neighbors.map(_._2).count( x => x.apply(x.size - 1) == nodeAttr(0))
}
}
I think you want to use GraphOps.collectNeighbors instead of either mapReduceTriplets or aggregateMessages.
collectNeighbors will give you an RDD with, for every VertexId in your graph, the connected nodes as an array. Just reduce the Array based on your needs. Something like:
val countsRdd = graph.collectNeighbors(EdgeDirection.Either)
.join(graph.vertices)
.map { case (vid, t) =>
val neighbors = t._1 // Array[(VertexId, Array[String])]
val nodeAttr = t._2 // attributes of the vertex itself
neighbors.map(_._2).filter( <add logic here> ).size
}
If this doesn't get you going in the right direction, or you get stuck, let me know (the "<add logic here>" part, for example).
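As a starting point for the filter logic, here is a minimal plain-Scala sketch of the counting step on its own, without Spark. It assumes that "match" means the last element of the vertex's array equals the first element of a neighbour's array, as described in the question. The name matchCounts is made up, and empty arrays (which the outerJoinVertices code above produces for vertices without attributes) are simply skipped.

```scala
// Counts (matches, nonMatches) between the last attribute of a vertex and
// the first attribute of each neighbour. Neighbours without attributes are ignored.
def matchCounts(nodeAttr: Array[String], neighborAttrs: Seq[Array[String]]): (Int, Int) =
  if (nodeAttr.isEmpty) (0, 0)
  else {
    val usable = neighborAttrs.filter(_.nonEmpty)
    val (same, different) = usable.partition(_.head == nodeAttr.last)
    (same.size, different.size)
  }

// Hypothetical data in the shape used in the question, e.g. Array(17, Low, 9).
val node = Array("17", "Low", "9")
val neighbours = Seq(
  Array("9", "High", "3"),
  Array("17", "Low", "9"),
  Array("9", "Low", "1"),
  Array.empty[String]
)
val counts = matchCounts(node, neighbours)
println(counts)
```

Inside the map above this would be called as matchCounts(nodeAttr, neighbors.map(_._2).toSeq), which gives one (matches, nonMatches) pair per vertex.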