Recursive DFS graph traversal in Scala

I've seen the following code to traverse a graph depth first, in Scala:
def dfs(node: Node, seen: Set[Node]): Unit = {
visit(node)
node.neighbours.filterNot(seen).foreach(neighbour => dfs(neighbour, seen + node))
}
It seems to me this code is not correct, as the following example shows.
Nodes are 1, 2, 3.
Edges are 1 -> 3, 1 -> 2, 2 -> 3
dfs(1, Set.empty) would visit node 1, then node 3, then node 2, then node 3 again because we don't maintain a global Set of seen nodes, but only add them in the recursive call to dfs.
What would instead be a correct implementation of DFS in Scala, without using mutable structures?

Something like this should work (though you might consider the foldLeft to be cheating a little):
def dfs(node: Node): Set[Node] = {
def helper(parent: Node, seen: Set[Node]): Set[Node] = {
neighbors(parent).foldLeft(seen)({ case (nowSeen, child) =>
if (nowSeen.contains(child)) nowSeen else {
visit(child)
helper(child, nowSeen + child)
}
})
}
visit(node)
helper(node, Set(node))
}
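For anyone who wants to run this without the Node class from the question, here is a minimal self-contained sketch of the same foldLeft idea. The Map-based graph encoding and the names DfsSketch and Graph are assumptions made for illustration; instead of calling visit, it returns the visit order so the result can be checked.

```scala
object DfsSketch {
  // Hypothetical graph encoding: adjacency lists keyed by node id.
  type Graph = Map[Int, List[Int]]

  // Returns the nodes in the order a depth-first traversal visits them.
  // The seen set is threaded through foldLeft, so siblings share it.
  def dfs(graph: Graph, start: Int): List[Int] = {
    def helper(node: Int, seen: Set[Int], order: List[Int]): (Set[Int], List[Int]) =
      graph.getOrElse(node, Nil).foldLeft((seen, order)) {
        case ((nowSeen, nowOrder), child) =>
          if (nowSeen.contains(child)) (nowSeen, nowOrder)
          else helper(child, nowSeen + child, child :: nowOrder)
      }
    // order is built in reverse, so flip it at the end
    helper(start, Set(start), List(start))._2.reverse
  }
}
```

With the edges from the question (1 -> 3, 1 -> 2, 2 -> 3), node 3 is visited only once because the set returned from the first branch is passed on to the second.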

Related

Mutating a Node of a general tree in Scala

Consider the following standard class:
class Node(val data: NodeData, val children: Seq[Node])
The NodeData class is also simple:
case class NodeData(text: String, foo: List[Bar])
Also, the tree has arbitrary depth, it's not fixed.
Clearly, implementing a breadth-first or depth-first search on that structure is trivial with idiomatic Scala. However, consider that I want not only to visit each of these nodes, but I also want to mutate them on each visit. More concretely, I want to mutate an object in that foo list. How would I go about implementing this? One way I thought about this is to somehow update the nodes and build a new tree while traversing it, but my intuition tells me there is a simpler solution than that.
If you really want to stay immutable, I would define a function recMap on Node, like this:
def recMap(f: NodeData => NodeData) : Node = Node(f(data), children.map(_.recMap(f)))
You could then use it like in this example (I made Node a case class too):
type Bar = Int
case class NodeData(text: String, foo: List[Bar])
case class Node(data: NodeData, children: Seq[Node]) {
def recMap(f: NodeData => NodeData) : Node = Node(f(data), children.map(_.recMap(f)))
}
val tree = new Node(NodeData("parent", List(1, 2, 3, 4)), Seq(
Node(NodeData("a child", List(5, 6, 7, 8)), Seq.empty),
Node(NodeData("another child", List(9, 10, 11, 12)), Seq.empty)
))
val modifiedTree = tree.recMap(
data => NodeData(
if(data.text == "parent") "I am the parent!" else "I am a child!",
data.foo.filter(_ % 2 == 0)
)
)
println(modifiedTree)
Maybe that's what you are searching for.

Find all paths from a vertex until there are no more direct successors

I am working on an app that models a workflow as a graph. I am using Scala-Graph to achieve this and I want to find all the possible directed paths from a vertex until there are no more direct successors.
For example, for this graph:
val z = Graph(1~>2, 2~>3, 2~>4, 3~>4, 5~>4)
I would like to find out all the possible paths from vertex 1 to vertices with no more direct connections. So the output of the logic should be similar to
1~>2~>3~>4
2~>4
Main questions:
Is there a native API provided by scala-graph to achieve this?
Should I write a custom traversal method?
In regards to question 2, I have written an initial version of the code to achieve it but it's always returning an empty value ;( I would appreciate some feedback on this too
def getAllPaths(g: z.NodeT, paths: List[Any]): Unit = {
val directs = g.diSuccessors.toList
if (directs.length == 0) {
paths
} else {
getAllPaths(directs(0), paths :+ directs(0))
}
}
val accum = List[Any]()
println(getAllPaths(z.get(1), accum)) // nothing
So the idea is, by passing in a starting point to the method getAllPaths, it will traverse according to diSuccessors and stops when its length is 0. Ideal output for the example graph z is
[
[1~2, 2~3, 3~4]
[2~4]
]
Why does the custom method return an empty list?
So I've written this to achieve it
// Example graph 1
val z = Graph(1~>2, 2~>3, 2~>4, 3~>4, 5~>4)
// Example graph 2
val z1 = Graph(1~>2, 2~>3, 2~>4, 3~>4, 4~>6, 6~>7, 5~>4)
def getAllPaths(g: z1.NodeT, paths: List[z1.NodeT]): List[Any] = {
val directs = g.diSuccessors.toList
if (directs.length == 0) {
// No more direct successor left, return the array itself
paths
} else if (directs.length == 1) {
// Node with single direction, simply returns itself
if (paths.length == 0) {
// appends g itself and its direct successor for the first iteration
getAllPaths(directs(0), paths :+ g :+ directs(0))
} else {
// Appends only the direct successor
getAllPaths(directs(0), paths :+ directs(0))
}
} else {
directs.map(d => {
getAllPaths(d, paths :+ d)
})
}
}
val accum = List[z1.NodeT]()
println(getAllPaths(z1.get(1), accum))
// Results in: List(List(1, 2, 3, 4, 6, 7), List(1, 2, 4, 6, 7))
Leaving it here in case anyone is interested in solving the same problem!
Also please help me write it more elegantly ;) .. I'm still a beginner in Scala
Follow up questions:
How do I reference the type NodeT without going through a variable like above example, which is accessing it via z1.NodeT?
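As a possible starting point for a more compact version, here is a sketch that does not use the scala-graph API at all. It assumes the graph is encoded as a plain adjacency Map and that it is acyclic (on a cyclic graph this recursion would not terminate); the names PathsSketch and allPaths are made up for the example.

```scala
object PathsSketch {
  // Hypothetical adjacency-map encoding; nodes without an entry have no successors.
  def allPaths(adj: Map[Int, List[Int]], start: Int): List[List[Int]] = {
    // path holds the nodes walked so far, most recent first
    def go(node: Int, path: List[Int]): List[List[Int]] =
      adj.getOrElse(node, Nil) match {
        case Nil        => List((node :: path).reverse) // no successors: the path is complete
        case successors => successors.flatMap(next => go(next, node :: path))
      }
    go(start, Nil)
  }
}
```

For the first example graph, starting from 1, this yields the two node paths 1, 2, 3, 4 and 1, 2, 4. The flatMap replaces the separate one-successor and many-successor branches.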

Functional Breadth First Search in Scala with the State Monad

I'm trying to implement a functional Breadth First Search in Scala to compute the distances between a given node and all the other nodes in an unweighted graph. I've used a State Monad for this with the signature as :-
case class State[S,A](run:S => (A,S))
Other functions such as map, flatMap, sequence, modify etc etc are similar to what you'd find inside a standard State Monad.
Here's the code :-
case class Node(label: Int)
case class BfsState(q: Queue[Node], nodesList: List[Node], discovered: Set[Node], distanceFromSrc: Map[Node, Int]) {
val isTerminated = q.isEmpty
}
case class Graph(adjList: Map[Node, List[Node]]) {
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
val initialBfsState = BfsState(Queue(src), List(src), Set(src), Map(src -> 0))
val output = bfsComp(initialBfsState)
(output.nodesList,output.distanceFromSrc)
}
@tailrec
private def bfsComp(currState:BfsState): BfsState = {
if (currState.isTerminated) currState
else bfsComp(searchNode.run(currState)._2)
}
private def searchNode: State[BfsState, Unit] = for {
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
s <- get
_ <- sequence(adjList(node).filter(!s.discovered(_)).map(n => {
modify[BfsState](s => {
s.copy(s.q.enqueue(n), n :: s.nodesList, s.discovered + n, s.distanceFromSrc + (n -> (s.distanceFromSrc(node) + 1)))
})
}))
} yield ()
}
Could you please advise on:
Should the State Transition on dequeue in the searchNode function be a member of BfsState itself?
How do I make this code more performant/concise/readable?
First off, I suggest moving all the private defs related to bfs into bfs itself. This is the convention for methods that are solely used to implement another.
Second, I suggest simply not using State for this matter. State (like most monads) is about composition. It is useful when you have many things that all need access to the same global state. In this case, BfsState is specialized to bfs, will likely never be used anywhere else (it might be a good idea to move the class into bfs too), and the State itself is always run, so the outer world never sees it. (In many cases, this is fine, but here the scope is too small for State to be useful.) It'd be much cleaner to pull the logic of searchNode into bfsComp itself.
Third, I don't understand why you need both nodesList and discovered, when you can just call _.toList on discovered once you've done your computation. I've left it in in my reimplementation, though, in case there's more to this code that you haven't displayed.
def bfsComp(old: BfsState): BfsState = {
if(old.q.isEmpty) old // You don't need isTerminated, I think
else {
val (currNode, newQ) = old.q.dequeue
val newState = old.copy(q = newQ)
// Fold the undiscovered neighbours into the state, then recurse on the result
bfsComp(
adjList(currNode)
.filterNot(old.discovered) // Set[T] <: T => Boolean and filterNot means you don't need to write !old.discovered(_)
.foldLeft(newState) { case (BfsState(q, nodes, discovered, distance), adjNode) =>
BfsState(
q.enqueue(adjNode),
adjNode :: nodes,
discovered + adjNode,
distance + (adjNode -> (distance(currNode) + 1))
)
}
)
}
}
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
// I suggest moving BfsState and bfsComp into this method
val output = bfsComp(BfsState(Queue(src), List(src), Set(src), Map(src -> 0)))
(output.nodesList, output.distanceFromSrc)
// Could get rid of nodesList and say output.discovered.toList
}
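To make the "no State needed" point concrete, here is a small self-contained sketch of the same breadth-first distance computation. It assumes plain Int labels and a Map of adjacency lists instead of the Node and Graph classes from the question, and it uses the distance map itself as the discovered set; the names BfsSketch and loop are illustrative only.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object BfsSketch {
  // Returns the distance (in edges) from src to every reachable node.
  def bfs(adj: Map[Int, List[Int]], src: Int): Map[Int, Int] = {
    @tailrec
    def loop(queue: Queue[Int], dist: Map[Int, Int]): Map[Int, Int] =
      queue.dequeueOption match {
        case None => dist // queue exhausted: every reachable node has a distance
        case Some((node, rest)) =>
          // neighbours not discovered yet; distinct guards against repeated edges
          val fresh = adj.getOrElse(node, Nil).distinct.filterNot(dist.contains)
          val newDist = fresh.foldLeft(dist)((d, n) => d + (n -> (dist(node) + 1)))
          loop(rest ++ fresh, newDist)
      }
    loop(Queue(src), Map(src -> 0))
  }
}
```

The whole state is just the two arguments of loop, which is why a dedicated state class and a State monad add little here.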
In the event that you think you do have a good reason for using State here, here are my thoughts.
You use def searchNode. The point of a State is that it is pure and immutable, so it should be a val, or else you reconstruct the same State every use.
You write:
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
First off, Scala's syntax was designed so that you don't need to have both a () and {} surrounding an anonymous function:
node <- State[BfsState, Node] { s =>
// ...
}
Second, this doesn't look quite right to me. One benefit of using for-syntax is that the anonymous functions are hidden from you and there is minimal indentation. I'd just write it out
oldState <- get
(node, newQ) = oldState.q.dequeue
newState = oldState.copy(q = newQ)
Footnote: would it make sense to make Node an inner class of Graph? Just a suggestion.

.with alternative in scala

I come from Groovy and it has a .with method on every type which accepts a single-argument closure; the argument is the object on which the .with method is being called. This allows a very cool technique of extending the functional chaining capabilities, which releases you from obligation to introduce temporary variables, factors your code, makes it easier to read and does other niceties.
I want to be able to do something like this:
Seq(1, 2, 3, 4, 5)
.filter(_ % 2 == 0)
.with(it => if (!it.isEmpty) println(it))
Instead of
val yetAnotherMeaninglessNameForTemporaryVariable =
Seq(1, 2, 3, 4, 5).filter(_ % 2 == 0)
if (!yetAnotherMeaninglessNameForTemporaryVariable.isEmpty)
println(yetAnotherMeaninglessNameForTemporaryVariable)
In other words, in the first example .with is somewhat similar to .foreach, but instead of iterating through the items of the object it is called once on the object itself. So it (the closure argument) is equal to Seq(1, 2, 3, 4, 5).filter(_ % 2 == 0).
Since I was very surprised not to find anything like that in Scala, my questions are:
am I missing something?
are there any alternative techniques native to Scala?
if not, are there any decent reasons why this feature is not implemented in Scala?
Update:
An appropriate feature request has been posted on the Scala issue tracker: https://issues.scala-lang.org/browse/SI-5324. Please vote and promote
There doesn't exist any such method in the standard library, but it's not hard to define your own.
implicit def aW[A](a: A) = new AW(a)
class AW[A](a: A) {
def tap[U](f: A => U): A = {
f(a)
a
}
}
val seq = Seq(2, 3, 11).
map(_ * 3).tap(x => println("After mapping: " + x)).
filter(_ % 2 != 0).tap(x => println("After filtering: " + x))
EDIT: (in response to the comment)
Oh, I misunderstood. What you need is there in the Scalaz library. It comes under name |> (referred to as pipe operator). With that, your example would look like shown below:
Seq(1, 2, 3, 4, 5).filter(_ % 2 == 0) |> { it => if(!it.isEmpty) println(it) }
If you cannot use Scalaz, you can define the operator on your own:
implicit def aW[A](a: A) = new AW(a)
class AW[A](a: A) {
def |>[B](f: A => B): B = f(a)
}
And it's not a bad practice to pimp useful method(s) on existing types. You should use implicit conversions sparingly, but I think these two combinators are common enough for their pimps to be justifiable.
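For readers on Scala 2.13 or later: the standard library now ships both combinators in scala.util.chaining, as tap and pipe, so the hand-written wrapper is only needed on older versions. A short sketch follows; the object and value names are made up for the example.

```scala
import scala.util.chaining._

object ChainingSketch {
  val evens: Seq[Int] = Seq(1, 2, 3, 4, 5).filter(_ % 2 == 0)

  // pipe applies the function to the whole value and returns the function's result
  val described: String =
    evens.pipe(it => if (it.nonEmpty) it.mkString(",") else "empty")

  // tap runs the side effect and returns the original value unchanged
  val tapped: Seq[Int] =
    evens.tap(it => println("After filtering: " + it))
}
```

pipe is the closer match for Groovy's .with, since it returns whatever the closure returns.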
There is some syntax for this pattern included in Scala:
Seq(1, 2, 3, 4, 5).filter(_ % 2 == 0) match { case it => if (!it.isEmpty) println(it) }
However, this is not an accepted idiom, so you should maybe refrain from (ab)using it.
If you dislike inventing loads and loads of names for dummy variables, remember that you can use scope braces:
val importantResult = {
val it = Seq(1,2,3).filter(_ % 2 == 0)
if (!it.isEmpty) println(it)
it
}
val otherImportantResultWithASpeakingVariableName = {
val it = // ...
/* ... */
it
}
Try something like this.
println(Seq(1, 2, 3, 4, 5).filter(_ % 2 == 0).ensuring(!_.isEmpty))
Throws an AssertionError if the condition is not met.
Remember call by name? Perhaps it gives you the capability you want:
object Test {
def main(args: Array[String]): Unit = {
delayed(time())
}
def time() = {
println("Getting time in nano seconds")
System.nanoTime
}
def delayed( t: => Long ) = {
println("In delayed method")
println("Param: " + t)
t
}
}
as described in http://www.tutorialspoint.com/scala/functions_call_by_name.htm
Although I like other solutions better (as they are more local and therefore easier to follow), do not forget that you can
{ val x = Seq(1,2,3,4,5).filter(_ % 2 == 0); println(x); x }
to avoid name collisions on your meaningless variables and keep them constrained to the appropriate scope.
This is just function application f(x) flipped on its head: x.with(f)... If you're looking for an idiomatic way of doing with in Scala, un-flip it:
((it: Seq[Int]) => if (!it.isEmpty) println(it))(Seq(1, 2, 3, 4, 5).filter(_ % 2 == 0))
Similarly, if you want x.with(f).with(g), just use g(f(x))...

Scala Graph Cycle Detection Algo 'return' needed?

I have implemented a small cycle detection algorithm for a DAG in Scala.
The 'return' bothers me - I'd like to have a version without the return...possible?
def isCyclic() : Boolean = {
lock.readLock().lock()
try {
nodes.foreach(node => node.marker = 1)
nodes.foreach(node => {if (1 == node.marker && visit(node)) return true})
} finally {
lock.readLock().unlock()
}
false
}
private def visit(node: MyNode): Boolean = {
node.marker = 3
val nodeId = node.id
val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
children.foreach(child => {
if (3 == child.marker || (1 == child.marker && visit(child))) return true
})
node.marker = 2
false
}
Yes, by using '.exists' instead of 'foreach' + 'return':
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Seq
def isCyclic() : Boolean = {
def visit(node: MyNode): Boolean = {
node.marker = 3
val nodeId = node.id
val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
val found = children.exists(child => (3 == child.marker || (1 == child.marker && visit(child))))
node.marker = 2
found
}
lock.readLock().lock()
try {
nodes.foreach(node => node.marker = 1)
nodes.exists(node => 1 == node.marker && visit(node))
} finally {
lock.readLock().unlock()
}
}
Summary:
I have originated two solutions as generic FP functions which detect cycles within a directed graph. And per your implied preference, the use of an early return to escape the recursive function has been eliminated. The first, isCyclic, simply returns a Boolean as soon as the DFS (Depth First Search) repeats a node visit. The second, filterToJustCycles, returns a copy of the input Map filtered down to just the nodes involved in any/all cycles, and returns an empty Map when no cycles are found.
Details:
For the following, please consider a directed graph encoded as such:
val directedGraphWithCyclesA: Map[String, Set[String]] =
Map(
"A" -> Set("B", "E", "J")
, "B" -> Set("E", "F")
, "C" -> Set("I", "G")
, "D" -> Set("G", "L")
, "E" -> Set("H")
, "F" -> Set("G")
, "G" -> Set("L")
, "H" -> Set("J", "K")
, "I" -> Set("K", "L")
, "J" -> Set("B")
, "K" -> Set("B")
)
In both functions below, the type parameter N refers to whatever "Node" type you care to provide. It is important the provided "Node" type be both immutable and have stable equals and hashCode implementations (all of which occur automatically with use of immutable case classes).
The first function, isCyclic, is similar in nature to the version of the solution provided by @the-archetypal-paul. It assumes the directed graph has been transformed into a Map[N, Set[N]] where N is the identity of a node in the graph.
If you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.
Calling the isCyclic function as such:
val isCyclicResult = isCyclic(directedGraphWithCyclesA)
will return:
`true`
No further information is provided. And the DFS (Depth First Search) is aborted at detection of the first repeated visit to a node.
def isCyclic[N](nsByN: Map[N, Set[N]]) : Boolean = {
def hasCycle(nAndNs: (N, Set[N]), visited: Set[N] = Set[N]()): Boolean =
if (visited.contains(nAndNs._1))
true
else
nAndNs._2.exists(
n =>
nsByN.get(n) match {
case Some(ns) =>
hasCycle((n, ns), visited + nAndNs._1)
case None =>
false
}
)
nsByN.exists(hasCycle(_))
}
The second function, filterToJustCycles, uses the set reduction technique to recursively filter away unreferenced root nodes in the Map. If there are no cycles in the supplied graph of nodes, then .isEmpty will be true on the returned Map. If however, there are any cycles, all of the nodes required to participate in any of the cycles are returned with all of the other non-cycle participating nodes filtered away.
Again, if you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.
Calling the filterToJustCycles function as such:
val cycles = filterToJustCycles(directedGraphWithCyclesA)
will return:
Map(E -> Set(H), J -> Set(B), B -> Set(E), H -> Set(J, K), K -> Set(B))
It's trivial to then create a traversal across this Map to produce any or all of the various cycle pathways through the remaining nodes.
def filterToJustCycles[N](nsByN: Map[N, Set[N]]): Map[N, Set[N]] = {
def recursive(nsByNRemaining: Map[N, Set[N]], referencedRootNs: Set[N] = Set[N]()): Map[N, Set[N]] = {
val (referencedRootNsNew, nsByNRemainingNew) = {
val referencedRootNsNewTemp =
nsByNRemaining.values.flatten.toSet.intersect(nsByNRemaining.keySet)
(
referencedRootNsNewTemp
, nsByNRemaining.collect {
case (t, ts) if referencedRootNsNewTemp.contains(t) && referencedRootNsNewTemp.intersect(ts.toSet).nonEmpty =>
(t, referencedRootNsNewTemp.intersect(ts.toSet))
}
)
}
if (referencedRootNsNew == referencedRootNs)
nsByNRemainingNew
else
recursive(nsByNRemainingNew, referencedRootNsNew)
}
recursive(nsByN)
}
So, how does one generically transform a custom directed graph implementation into a Map[N, Set[N]]?
In essence, "Go Scala case classes!"
First, let's define an example case of a real node in a pre-existing directed graph:
class CustomNode (
val equipmentIdAndType: String //"A387.Structure" - identity is embedded in a string and must be parsed out
, val childrenNodes: List[String] //children are referenced by their id strings; even though Set is implied, for whatever reason this implementation used List
, val otherImplementationNoise: Option[Any] = None
)
Again, this is just an example. Yours could involve subclassing, delegation, etc. The purpose is to have access to something that will be able to fetch the two essential things to make this work:
the identity of a node; i.e. something to distinguish it and makes it unique from all other nodes in the same directed graph
a collection of the identities of the immediate children of a specific node - if the specific node doesn't have any children, this collection will be empty
Next, we define a helper object, DirectedGraph, which will contain the infrastructure for the conversion:
Node: an adapter trait which will wrap CustomNode
toMap: a function which will take a List[CustomNode] and convert it to a Map[Node, Set[Node]] (which is type equivalent to our target type of Map[N, Set[N]])
Here's the code:
object DirectedGraph {
trait Node[S, I] {
def source: S
def identity: I
def children: Set[I]
}
def toMap[S, I, N <: Node[S, I]](ss: List[S], transformSToN: S => N): Map[N, Set[N]] = {
val (ns, nByI) = {
val iAndNs =
ss.map(
s => {
val n =
transformSToN(s)
(n.identity, n)
}
)
(iAndNs.map(_._2), iAndNs.toMap)
}
ns.map(n => (n, n.children.map(nByI(_)))).toMap
}
}
Now, we must generate the actual adapter, CustomNodeAdapter, which will wrap each CustomNode instance. This adapter uses a case class in a very specific way; i.e. specifying two constructor parameters lists. It ensures the case class conforms to a Set's requirement that a Set member have correct equals and hashCode implementations. For more details on why and how to use a case class this way, please see this StackOverflow question and answer:
object CustomNodeAdapter extends (CustomNode => CustomNodeAdapter) {
def apply(customNode: CustomNode): CustomNodeAdapter =
new CustomNodeAdapter(fetchIdentity(customNode))(customNode) {}
def fetchIdentity(customNode: CustomNode): String =
fetchIdentity(customNode.equipmentIdAndType)
def fetchIdentity(eiat: String): String =
eiat.takeWhile(char => char.isLetter || char.isDigit)
}
abstract case class CustomNodeAdapter(identity: String)(customNode: CustomNode) extends DirectedGraph.Node[CustomNode, String] {
val children =
customNode.childrenNodes.map(CustomNodeAdapter.fetchIdentity).toSet
val source =
customNode
}
We now have the infrastructure in place. Let's define a "real world" directed graph consisting of CustomNode:
val customNodeDirectedGraphWithCyclesA =
List(
new CustomNode("A.x", List("B.a", "E.a", "J.a"))
, new CustomNode("B.x", List("E.b", "F.b"))
, new CustomNode("C.x", List("I.c", "G.c"))
, new CustomNode("D.x", List("G.d", "L.d"))
, new CustomNode("E.x", List("H.e"))
, new CustomNode("F.x", List("G.f"))
, new CustomNode("G.x", List("L.g"))
, new CustomNode("H.x", List("J.h", "K.h"))
, new CustomNode("I.x", List("K.i", "L.i"))
, new CustomNode("J.x", List("B.j"))
, new CustomNode("K.x", List("B.k"))
, new CustomNode("L.x", Nil)
)
Finally, we can now do the conversion which looks like this:
val transformCustomNodeDirectedGraphWithCyclesA =
DirectedGraph.toMap[CustomNode, String, CustomNodeAdapter](customNodeDirectedGraphWithCyclesA, customNode => CustomNodeAdapter(customNode))
And we can take transformCustomNodeDirectedGraphWithCyclesA, which is of type Map[CustomNodeAdapter,Set[CustomNodeAdapter]], and submit it to the two original functions.
Calling the isCyclic function as such:
val isCyclicResult = isCyclic(transformCustomNodeDirectedGraphWithCyclesA)
will return:
`true`
Calling the filterToJustCycles function as such:
val cycles = filterToJustCycles(transformCustomNodeDirectedGraphWithCyclesA)
will return:
Map(
CustomNodeAdapter(B) -> Set(CustomNodeAdapter(E))
, CustomNodeAdapter(E) -> Set(CustomNodeAdapter(H))
, CustomNodeAdapter(H) -> Set(CustomNodeAdapter(J), CustomNodeAdapter(K))
, CustomNodeAdapter(J) -> Set(CustomNodeAdapter(B))
, CustomNodeAdapter(K) -> Set(CustomNodeAdapter(B))
)
And if needed, this Map can then be converted back to Map[CustomNode, List[CustomNode]]:
cycles.map {
case (customNodeAdapter, customNodeAdapterChildren) =>
(customNodeAdapter.source, customNodeAdapterChildren.toList.map(_.source))
}
If you have any questions, issues or concerns, please let me know and I will address them ASAP.
I think the problem can be solved without changing the state of the node through the marker field. The following is a rough sketch of what I think isCyclic should look like. I am currently storing the node objects that have been visited; you can store the node ids instead if the node doesn't have equality based on node id.
def isCyclic(): Boolean = nodes.exists(hasCycle(_, Set.empty))
def hasCycle(node: Node, visited: Set[Node]): Boolean = visited.contains(node) || children(node).exists(hasCycle(_, visited + node))
def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
Answer added just to show that the mutable-visited isn't too unreadable either (untested, though!)
def isCyclic() : Boolean =
{
var visited = Set.empty[Node]
def hasCycle(node: Node): Boolean = {
if (visited.contains(node)) {
true
} else {
visited += node
children(node).exists(hasCycle(_))
}
}
nodes.exists(hasCycle(_))
}
def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
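One caveat worth spelling out: a single shared visited set cannot tell "already finished" apart from "still on the current path", so a diamond such as A -> B, A -> C, B -> D, C -> D would be reported as cyclic even though it is a DAG. The markers 1, 2 and 3 in the question make exactly that distinction. Below is a sketch of the same idea without mutation or return. It assumes the graph is given as a Map[N, Set[N]], and the names CycleSketch, path and done are chosen for the example.

```scala
object CycleSketch {
  def isCyclic[N](graph: Map[N, Set[N]]): Boolean = {
    // path: ancestors on the current DFS branch (marker 3 in the question)
    // done: nodes whose descendants are fully explored (marker 2)
    // Returns (cycleFound, updatedDone).
    def visit(node: N, path: Set[N], done: Set[N]): (Boolean, Set[N]) =
      if (path.contains(node)) (true, done)       // back edge: a cycle
      else if (done.contains(node)) (false, done) // explored earlier, no cycle through here
      else {
        val (found, doneAfter) =
          graph.getOrElse(node, Set.empty[N]).foldLeft((false, done)) {
            case ((true, d), _)      => (true, d) // a cycle was already found, skip the rest
            case ((false, d), child) => visit(child, path + node, d)
          }
        (found, doneAfter + node)
      }

    val (found, _) = graph.keys.foldLeft((false, Set.empty[N])) {
      case ((true, d), _)     => (true, d)
      case ((false, d), node) => visit(node, Set.empty[N], d)
    }
    found
  }
}
```

Because done is carried across branches, each node is expanded at most once, unlike the versions that pass only a per-path visited set.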
If p = node => node.marker==1 && visit(node) and assuming nodes is a List you can pick any of the following:
nodes.filter(p).length>0
nodes.count(p)>0
nodes.exists(p) (I think the most relevant)
I am not sure of the relative complexity of each method and would appreciate a comment from fellow members of the community
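On the complexity question: all three are linear in the worst case, but filter and count always evaluate the predicate on every element (and filter also builds an intermediate list), while exists stops at the first match. The small sketch below counts predicate calls to show the difference; the names ExistsSketch and calls are made up for the example, and a plain List[Int] stands in for nodes.

```scala
object ExistsSketch {
  val nodes = List(1, 2, 3, 4, 5)

  // Runs check with a predicate that matches the element 2 and
  // reports how many times that predicate was evaluated.
  def calls(check: (Int => Boolean) => Boolean): Int = {
    var n = 0
    check { x => n += 1; x == 2 }
    n
  }

  val viaFilter: Int = calls(p => nodes.filter(p).length > 0) // 5 evaluations
  val viaCount: Int  = calls(p => nodes.count(p) > 0)         // 5 evaluations
  val viaExists: Int = calls(p => nodes.exists(p))            // 2 evaluations
}
```

Since the predicate here calls visit, which is itself a traversal, the short-circuiting of exists matters more than it would for a cheap check.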