Functional Breadth First Search in Scala with the State Monad - scala

I'm trying to implement a functional Breadth First Search in Scala to compute the distances between a given node and all the other nodes in an unweighted graph. I've used a State Monad for this with the signature as :-
case class State[S,A](run:S => (A,S))
Other functions such as map, flatMap, sequence, modify etc etc are similar to what you'd find inside a standard State Monad.
Here's the code :-
case class Node(label: Int)
case class BfsState(q: Queue[Node], nodesList: List[Node], discovered: Set[Node], distanceFromSrc: Map[Node, Int]) {
val isTerminated = q.isEmpty
}
case class Graph(adjList: Map[Node, List[Node]]) {
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
val initialBfsState = BfsState(Queue(src), List(src), Set(src), Map(src -> 0))
val output = bfsComp(initialBfsState)
(output.nodesList,output.distanceFromSrc)
}
#tailrec
private def bfsComp(currState:BfsState): BfsState = {
if (currState.isTerminated) currState
else bfsComp(searchNode.run(currState)._2)
}
private def searchNode: State[BfsState, Unit] = for {
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
s <- get
_ <- sequence(adjList(node).filter(!s.discovered(_)).map(n => {
modify[BfsState](s => {
s.copy(s.q.enqueue(n), n :: s.nodesList, s.discovered + n, s.distanceFromSrc + (n -> (s.distanceFromSrc(node) + 1)))
})
}))
} yield ()
}
Please can you advice on :-
Should the State Transition on dequeue in the searchNode function be a member of BfsState itself?
How do I make this code more performant/concise/readable?

First off, I suggest moving all the private defs related to bfs into bfs itself. This is the convention for methods that are solely used to implement another.
Second, I suggest simply not using State for this matter. State (like most monads) is about composition. It is useful when you have many things that all need access to the same global state. In this case, BfsState is specialized to bfs, will likely never be used anywhere else (it might be a good idea to move the class into bfs too), and the State itself is always run, so the outer world never sees it. (In many cases, this is fine, but here the scope is too small for State to be useful.) It'd be much cleaner to pull the logic of searchNode into bfsComp itself.
Third, I don't understand why you need both nodesList and discovered, when you can just call _.toList on discovered once you've done your computation. I've left it in in my reimplementation, though, in case there's more to this code that you haven't displayed.
def bfsComp(old: BfsState): BfsState = {
if(old.q.isEmpty) old // You don't need isTerminated, I think
else {
val (currNode, newQ) = old.q.dequeue
val newState = old.copy(q = newQ)
adjList(curNode)
.filterNot(s.discovered) // Set[T] <: T => Boolean and filterNot means you don't need to write !s.discovered(_)
.foldLeft(newState) { case (BfsState(q, nodes, discovered, distance), adjNode) =>
BfsState(
q.enqueue(adjNode),
adjNode :: nodes,
discovered + adjNode,
distance + (adjNode -> (distance(currNode) + 1)
)
}
}
}
def bfs(src: Node): (List[Node], Map[Node, Int]) = {
// I suggest moving BfsState and bfsComp into this method
val output = bfsComp(BfsState(Queue(src), List(src), Set(src), Map(src -> 0)))
(output.nodesList, output.distanceFromSrc)
// Could get rid of nodesList and say output.discovered.toList
}
In the event that you think you do have a good reason for using State here, here are my thoughts.
You use def searchNode. The point of a State is that it is pure and immutable, so it should be a val, or else you reconstruct the same State every use.
You write:
node <- State[BfsState, Node](s => {
val (n, newQ) = s.q.dequeue
(n, s.copy(q = newQ))
})
First off, Scala's syntax was designed so that you don't need to have both a () and {} surrounding an anonymous function:
node <- State[BfsState, Node] { s =>
// ...
}
Second, this doesn't look quite right to me. One benefit of using for-syntax is that the anonymous functions are hidden from you and there is minimal indentation. I'd just write it out
oldState <- get
(node, newQ) = oldState.q.dequeue
newState = oldState.copy(q = newQ)
Footnote: would it make sense to make Node an inner class of Graph? Just a suggestion.

Related

Scala: for-comprehension for chain of operations

I have a task to transform the following code-block:
val instance = instanceFactory.create
val result = instance.ackForResult
to for-comprehension expression.
As for-comprehension leans on enumeration of elements, I tried to get around it with wrapper class:
case class InstanceFactoryWrapper(value:InstanceFactory) {
def map(f: InstanceFactory => Instance): Instance
= value.create()
}
where map-method must handle only one element and return a single result: Instance
I tested this approach with this expression:
for {
mediationApi <- InstanceFactoryWrapper(instanceFactoryWrapper)
}
But it does't work: IDEA recommends me to use foreach in this part. But "foreach" doesn't return anything, as opposed to map.
What am I doing wrong?
Simply put when working with List\Option\Either or other lang types comprehensions are useful to transform nested map\flatMap\withFilter into sequences.
Use custom classes in for-comprehension
But what about your own classes or other 3rd party ones?
You need to implement monadic operations in order to use them in for-comprehensions.
The bare minimum: map and flatMap.
Take the following example with a custom Config class:
case class Config[T](content: T) {
def flatMap[S](f: T => Config[S]): Config[S] =
f(content)
def map[S](f: T => S): Config[S] =
this.copy(content = f(content))
}
for {
first <- Config("..")
_ = println("Going through a test")
second <- Config(first + "..")
third <- Config(second + "..")
} yield third
This is how you enable for-comprehension.

Chaining a number of transitions with the state Monad

I am starting to use the state monad to clean up my code. I have got it working for my problem where I process a transaction called CDR and modify the state accordingly.
It is working perfectly fine for individual transactions, using this function to perform the state update.
def addTraffic(cdr: CDR): Network => Network = ...
Here is an example:
scala> val processed: (CDR) => State[Network, Long] = cdr =>
| for {
| m <- init
| _ <- modify(Network.addTraffic(cdr))
| p <- get
| } yield p.count
processed: CDR => scalaz.State[Network,Long] = $$Lambda$4372/1833836780#1258d5c0
scala> val r = processed(("122","celda 1", 3))
r: scalaz.State[Network,Long] = scalaz.IndexedStateT$$anon$13#4cc4bdde
scala> r.run(Network.empty)
res56: scalaz.Id.Id[(Network, Long)] = (Network(Map(122 -> (1,0.0)),Map(celda 1 -> (1,0.0)),Map(1 -> Map(1 -> 3)),1,true),1)
What i want to do now is to chain a number of transactions on an iterator. I have found something that works quite well but the state transitions take no inputs (state changes through RNG)
import scalaz._
import scalaz.std.list.listInstance
type RNG = scala.util.Random
val f = (rng:RNG) => (rng, rng.nextInt)
val intGenerator: State[RNG, Int] = State(f)
val rng42 = new scala.util.Random
val applicative = Applicative[({type l[Int] = State[RNG,Int]})#l]
// To generate the first 5 Random integers
val chain: State[RNG, List[Int]] = applicative.sequence(List.fill(5)(intGenerator))
val chainResult: (RNG, List[Int]) = chain.run(rng42)
chainResult._2.foreach(println)
I have unsuccessfully tried to adapt this, but I can not get they types signatures to match because my state function requires the cdr (transaction) input
Thanks
TL;DR
you can use traverse from the Traverse type-class on a collection (e.g. List) of CDRs, using a function with this signature: CDR => State[Network, Long]. The result will be a State[Network, List[Long]]. Alternatively, if you don't care about the List[Long] there, you can use traverse_ instead, which will return State[Network, Unit]. Finally, should you want to "aggregate" the results T as they come along, and T forms a Monoid, you can use foldMap from Foldable, which will return State[Network, T], where T is the combined (e.g. folded) result of all Ts in your chain.
A code example
Now some more details, with code examples. I will answer this using Cats State rather than Scalaz, as I never used the latter, but the concept is the same and, if you still have problems, I will dig out the correct syntax.
Assume that we have the following data types and imports to work with:
import cats.implicits._
import cats.data.State
case class Position(x : Int = 0, y : Int = 0)
sealed trait Move extends Product
case object Up extends Move
case object Down extends Move
case object Left extends Move
case object Right extends Move
As it is clear, the Position represents a point in a 2D plane and a Move can move such point up, down, left or right.
Now, lets create a method that will allow us to see where we are at a given time:
def whereAmI : State[Position, String] = State.inspect{ s => s.toString }
and a method to change our position, given a Move:
def move(m : Move) : State[Position, String] = State{ s =>
m match {
case Up => (s.copy(y = s.y + 1), "Up!")
case Down => (s.copy(y = s.y - 1), "Down!")
case Left => (s.copy(x = s.x - 1), "Left!")
case Right => (s.copy(x = s.x + 1), "Right!")
}
}
Notice that this will return a String, with the name of the move followed by an exclamation mark. This is just to simulate the type change from Move to something else, and show how the results will be aggregated. More on this in a bit.
Now let's try to play with our methods:
val positions : State[Position, List[String]] = for{
pos1 <- whereAmI
_ <- move(Up)
_ <- move(Right)
_ <- move(Up)
pos2 <- whereAmI
_ <- move(Left)
_ <- move(Left)
pos3 <- whereAmI
} yield List(pos1,pos2,pos3)
And we can feed it an initial Position and see the result:
positions.runA(Position()).value // List(Position(0,0), Position(1,2), Position(-1,2))
(you can ignore the .value there, it's a quirk due to the fact that State[S,A] is really just an alias for StateT[Eval,S,A])
As you can see, this behaves as you would expect, and you can create different "blueprints" (e.g. sequences of state modifications), which will be applied once an initial state is provided.
Now, to actually answer to you question, say we have a List[Move] and we want to apply them sequentially to an initial state, and get the result: we use traverse from the Traverse type-class.
val moves = List(Down, Down, Left, Up)
val result : State[Position, List[String]] = moves.traverse(move)
result.run(Position()).value // (Position(-1,-1),List(Down!, Down!, Left!, Up!))
Alternatively, should you not need the A at all (the List in you case), you can use traverse_, instead of traverse and the result type will be:
val result_ : State[Position, List[String]] = moves.traverse_(move)
result_.run(Position()).value // (Position(-1,-1),Unit)
Finally, if your A type in State[S,A] forms a Monoid, then you could also use foldMap from Foldable to combine (e.g. fold) all As as they are calculated. A trivial example (probably useless, because this will just concatenate all Strings) would be this:
val result : State[Position,String] = moves.foldMap(move)
result.run(Position()).value // (Position(-1,-1),Down!Down!Left!Up!)
Whether this final approach is useful or not to you, really depends on what A you have and if it makes sense to combine it.
And this should be all you need in your scenario.

Encoding recursive tree-creation with while loop + stacks

I'm a bit embarassed to admit this, but I seem to be pretty stumped by what should be a simple programming problem. I'm building a decision tree implementation, and have been using recursion to take a list of labeled samples, recursively split the list in half, and turn it into a tree.
Unfortunately, with deep trees I run into stack overflow errors (ha!), so my first thought was to use continuations to turn it into tail recursion. Unfortunately Scala doesn't support that kind of TCO, so the only solution is to use a trampoline. A trampoline seems kinda inefficient and I was hoping there would be some simple stack-based imperative solution to this problem, but I'm having a lot of trouble finding it.
The recursive version looks sort of like (simplified):
private def trainTree(samples: Seq[Sample], usedFeatures: Set[Int]): DTree = {
if (shouldStop(samples)) {
DTLeaf(makeProportions(samples))
} else {
val featureIdx = getSplittingFeature(samples, usedFeatures)
val (statsWithFeature, statsWithoutFeature) = samples.partition(hasFeature(featureIdx, _))
DTBranch(
trainTree(statsWithFeature, usedFeatures + featureIdx),
trainTree(statsWithoutFeature, usedFeatures + featureIdx),
featureIdx)
}
}
So basically I'm recursively subdividing the list into two according to some feature of the data, and passing through a list of used features so I don't repeat - that's all handled in the "getSplittingFeature" function so we can ignore it. The code is really simple! Still, I'm having trouble figuring out a stack-based solution that doesn't just use closures and effectively become a trampoline. I know we'll at least have to keep around little "frames" of arguments in the stack but I would like to avoid closure calls.
I get that I should be writing out explicitly what the callstack and program counter handle for me implicitly in the recursive solution, but I'm having trouble doing that without continuations. At this point it's hardly even about efficiency, I'm just curious. So please, no need to remind me that premature optimization is the root of all evil and the trampoline-based solution will probably work just fine. I know it probably will - this is basically a puzzle for it's own sake.
Can anyone tell me what the canonical while-loop-and-stack-based solution to this sort of thing is?
UPDATE: Based on Thipor Kong's excellent solution, I've coded up a while-loops/stacks/hashtable based implementation of the algorithm that should be a direct translation of the recursive version. This is exactly what I was looking for:
FINAL UPDATE: I've used sequential integer indices, as well as putting everything back into arrays instead of maps for performance, added maxDepth support, and finally have a solution with the same performance as the recursive version (not sure about memory usage but I would guess less):
private def trainTreeNoMaxDepth(startingSamples: Seq[Sample], startingMaxDepth: Int): DTree = {
// Use arraybuffer as dense mutable int-indexed map - no IndexOutOfBoundsException, just expand to fit
type DenseIntMap[T] = ArrayBuffer[T]
def updateIntMap[#specialized T](ab: DenseIntMap[T], idx: Int, item: T, dfault: T = null.asInstanceOf[T]) = {
if (ab.length <= idx) {ab.insertAll(ab.length, Iterable.fill(idx - ab.length + 1)(dfault)) }
ab.update(idx, item)
}
var currentChildId = 0 // get childIdx or create one if it's not there already
def child(childMap: DenseIntMap[Int], heapIdx: Int) =
if (childMap.length > heapIdx && childMap(heapIdx) != -1) childMap(heapIdx)
else {currentChildId += 1; updateIntMap(childMap, heapIdx, currentChildId, -1); currentChildId }
// go down
val leftChildren, rightChildren = new DenseIntMap[Int]() // heapIdx -> childHeapIdx
val todo = Stack((startingSamples, Set.empty[Int], startingMaxDepth, 0)) // samples, usedFeatures, maxDepth, heapIdx
val branches = new Stack[(Int, Int)]() // heapIdx, featureIdx
val nodes = new DenseIntMap[DTree]() // heapIdx -> node
while (!todo.isEmpty) {
val (samples, usedFeatures, maxDepth, heapIdx) = todo.pop()
if (shouldStop(samples) || maxDepth == 0) {
updateIntMap(nodes, heapIdx, DTLeaf(makeProportions(samples)))
} else {
val featureIdx = getSplittingFeature(samples, usedFeatures)
val (statsWithFeature, statsWithoutFeature) = samples.partition(hasFeature(featureIdx, _))
todo.push((statsWithFeature, usedFeatures + featureIdx, maxDepth - 1, child(leftChildren, heapIdx)))
todo.push((statsWithoutFeature, usedFeatures + featureIdx, maxDepth - 1, child(rightChildren, heapIdx)))
branches.push((heapIdx, featureIdx))
}
}
// go up
while (!branches.isEmpty) {
val (heapIdx, featureIdx) = branches.pop()
updateIntMap(nodes, heapIdx, DTBranch(nodes(child(leftChildren, heapIdx)), nodes(child(rightChildren, heapIdx)), featureIdx))
}
nodes(0)
}
Just store the binary tree in an array, as described on Wikipedia: For node i, the left child goes into 2*i+1 and the right child in to 2*i+2. When doing "down", you keep a collection of todos, that still have to be splitted to reach a leaf. Once you've got only leafs, to go upward (from right to left in the array) to build the decision nodes:
Update: A cleaned up version, that also supports the features stored int the branches (type parameter B) and that is more functional/fully pure and that supports sparse trees with a map as suggested by ron.
Update2-3: Make economical use of name space for node ids and abstract over type of ids to allow of large trees. Take node ids from Stream.
sealed trait DTree[A, B]
case class DTLeaf[A, B](a: A, b: B) extends DTree[A, B]
case class DTBranch[A, B](left: DTree[A, B], right: DTree[A, B], b: B) extends DTree[A, B]
def mktree[A, B, Id](a: A, b: B, split: (A, B) => Option[(A, A, B)], ids: Stream[Id]) = {
#tailrec
def goDown(todo: Seq[(A, B, Id)], branches: Seq[(Id, B, Id, Id)], leafs: Map[Id, DTree[A, B]], ids: Stream[Id]): (Seq[(Id, B, Id, Id)], Map[Id, DTree[A, B]]) =
todo match {
case Nil => (branches, leafs)
case (a, b, id) :: rest =>
split(a, b) match {
case None =>
goDown(rest, branches, leafs + (id -> DTLeaf(a, b)), ids)
case Some((left, right, b2)) =>
val leftId #:: rightId #:: idRest = ids
goDown((right, b2, rightId) +: (left, b2, leftId) +: rest, (id, b2, leftId, rightId) +: branches, leafs, idRest)
}
}
#tailrec
def goUp[A, B](branches: Seq[(Id, B, Id, Id)], nodes: Map[Id, DTree[A, B]]): Map[Id, DTree[A, B]] =
branches match {
case Nil => nodes
case (id, b, leftId, rightId) :: rest =>
goUp(rest, nodes + (id -> DTBranch(nodes(leftId), nodes(rightId), b)))
}
val rootId #:: restIds = ids
val (branches, leafs) = goDown(Seq((a, b, rootId)), Seq(), Map(), restIds)
goUp(branches, leafs)(rootId)
}
// try it out
def split(xs: Seq[Int], b: Int) =
if (xs.size > 1) {
val (left, right) = xs.splitAt(xs.size / 2)
Some((left, right, b + 1))
} else {
None
}
val tree = mktree(0 to 1000, 0, split _, Stream.from(0))
println(tree)

Functional Alternative to Game Loop

I'm just starting out with the Scala and am trying a little toy program - in this case a text based TicTacToe. I wrote a working version based on what I know about scala, but noticed it was mostly imperative and my classes were mutable.
I'm going through and trying to implement some functional idioms and have managed to at least make the classes representing the game state immutable. However, I'm left with a class responsible for performing the game loop relying on mutable state and imperative loop as follows:
var board: TicTacToeBoard = new TicTacToeBoard
def start() {
var gameState: GameState = new XMovesNext
outputState(gameState)
while (!gameState.isGameFinished) {
val position: Int = getSelectionFromUser
board = board.updated(position, gameState.nextTurn)
gameState = getGameState(board)
outputState(gameState)
}
}
What would be a more idiomatic way to program what I'm doing imperatively in this loop?
Full source code is here https://github.com/whaley/TicTacToe-in-Scala/tree/master/src/main/scala/com/jasonwhaley/tictactoe
imho for Scala, the imperative loop is just fine. You can always write a recursive function to behave like a loop. I also threw in some pattern matching.
def start() {
def loop(board: TicTacToeBoard) = board.state match {
case Finished => Unit
case Unfinished(gameState) => {
gameState.output()
val position: Int = getSelectionFromUser()
loop(board.updated(position))
}
}
loop(new TicTacToeBoard)
}
Suppose we had a function whileSome : (a -> Option[a]) a -> (), which runs the input function until its result is None. That would strip away a little boilerplate.
def start() {
def step(board: TicTacToeBoard) = {
board.gameState.output()
val position: Int = getSelectionFromUser()
board.updated(position) // returns either Some(nextBoard) or None
}
whileSome(step, new TicTacToeBoard)
}
whileSome should be trivial to write; it is simply an abstraction of the former pattern. I'm not sure if it's in any common Scala libs, but in Haskell you could grab whileJust_ from monad-loops.
You could implement it as a recursive method. Here's an unrelated example:
object Guesser extends App {
val MIN = 1
val MAX = 100
readLine("Think of a number between 1 and 100. Press enter when ready")
def guess(max: Int, min: Int) {
val cur = (max + min) / 2
readLine("Is the number "+cur+"? (y/n) ") match {
case "y" => println("I thought so")
case "n" => {
def smallerGreater() {
readLine("Is it smaller or greater? (s/g) ") match {
case "s" => guess(cur - 1, min)
case "g" => guess(max, cur + 1)
case _ => smallerGreater()
}
}
smallerGreater()
}
case _ => {
println("Huh?")
guess(max, min)
}
}
}
guess(MAX, MIN)
}
How about something like:
Stream.continually(processMove).takeWhile(!_.isGameFinished)
where processMove is a function that gets selection from user, updates board and returns new state.
I'd go with the recursive version, but here's a proper implementation of the Stream version:
var board: TicTacToeBoard = new TicTacToeBoard
def start() {
def initialBoard: TicTacToeBoard = new TicTacToeBoard
def initialGameState: GameState = new XMovesNext
def gameIterator = Stream.iterate(initialBoard -> initialGameState) _
def game: Stream[GameState] = {
val (moves, end) = gameIterator {
case (board, gameState) =>
val position: Int = getSelectionFromUser
val updatedBoard = board.updated(position, gameState.nextTurn)
(updatedBoard, getGameState(board))
}.span { case (_, gameState) => !gameState.isGameFinished }
(moves ::: end.take(1)) map { case (_, gameState) => gameState }
}
game foreach outputState
}
This looks weirder than it should. Ideally, I'd use takeWhile, and then map it afterwards, but it won't work as the last case would be left out!
If the moves of the game could be discarded, then dropWhile followed by head would work. If I had the side effect (outputState) instead the Stream, I could go that route, but having side-effect inside a Stream is way worse than a var with a while loop.
So, instead, I use span which gives me both takeWhile and dropWhile but forces me to save the intermediate results -- which can be real bad if memory is a concern, as the whole game will be kept in memory because moves points to the head of the Stream. So I had to encapsulate all that inside another method, game. That way, when I foreach through the results of game, there won't be anything pointing to the Stream's head.
Another alternative would be to get rid of the other side effect you have: getSelectionFromUser. You can get rid of that with an Iteratee, and then you can save the last move and reapply it.
OR... you could write yourself a takeTo method and use that.

Scala Graph Cycle Detection Algo 'return' needed?

I have implemented a small cycle detection algorithm for a DAG in Scala.
The 'return' bothers me - I'd like to have a version without the return...possible?
def isCyclic() : Boolean = {
lock.readLock().lock()
try {
nodes.foreach(node => node.marker = 1)
nodes.foreach(node => {if (1 == node.marker && visit(node)) return true})
} finally {
lock.readLock().unlock()
}
false
}
private def visit(node: MyNode): Boolean = {
node.marker = 3
val nodeId = node.id
val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
children.foreach(child => {
if (3 == child.marker || (1 == child.marker && visit(child))) return true
})
node.marker = 2
false
}
Yes, by using '.find' instead of 'foreach' + 'return':
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Seq
def isCyclic() : Boolean = {
def visit(node: MyNode): Boolean = {
node.marker = 3
val nodeId = node.id
val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
val found = children.exists(child => (3 == child.marker || (1 == child.marker && visit(child))))
node.marker = 2
found
}
lock.readLock().lock()
try {
nodes.foreach(node => node.marker = 1)
nodes.exists(node => node.marker && visit(node))
} finally {
lock.readLock().unlock()
}
}
Summary:
I have originated two solutions as generic FP functions which detect cycles within a directed graph. And per your implied preference, the use of an early return to escape the recursive function has been eliminated. The first, isCyclic, simply returns a Boolean as soon as the DFS (Depth First Search) repeats a node visit. The second, filterToJustCycles, returns a copy of the input Map filtered down to just the nodes involved in any/all cycles, and returns an empty Map when no cycles are found.
Details:
For the following, please Consider a directed graph encoded as such:
val directedGraphWithCyclesA: Map[String, Set[String]] =
Map(
"A" -> Set("B", "E", "J")
, "B" -> Set("E", "F")
, "C" -> Set("I", "G")
, "D" -> Set("G", "L")
, "E" -> Set("H")
, "F" -> Set("G")
, "G" -> Set("L")
, "H" -> Set("J", "K")
, "I" -> Set("K", "L")
, "J" -> Set("B")
, "K" -> Set("B")
)
In both functions below, the type parameter N refers to whatever "Node" type you care to provide. It is important the provided "Node" type be both immutable and have stable equals and hashCode implementations (all of which occur automatically with use of immutable case classes).
The first function, isCyclic, is a similar in nature to the version of the solution provided by #the-archetypal-paul. It assumes the directed graph has been transformed into a Map[N, Set[N]] where N is the identity of a node in the graph.
If you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.
Calling the isCyclic function as such:
val isCyclicResult = isCyclic(directedGraphWithCyclesA)
will return:
`true`
No further information is provided. And the DFS (Depth First Search) is aborted at detection of the first repeated visit to a node.
def isCyclic[N](nsByN: Map[N, Set[N]]) : Boolean = {
def hasCycle(nAndNs: (N, Set[N]), visited: Set[N] = Set[N]()): Boolean =
if (visited.contains(nAndNs._1))
true
else
nAndNs._2.exists(
n =>
nsByN.get(n) match {
case Some(ns) =>
hasCycle((n, ns), visited + nAndNs._1)
case None =>
false
}
)
nsByN.exists(hasCycle(_))
}
The second function, filterToJustCycles, uses the set reduction technique to recursively filter away unreferenced root nodes in the Map. If there are no cycles in the supplied graph of nodes, then .isEmpty will be true on the returned Map. If however, there are any cycles, all of the nodes required to participate in any of the cycles are returned with all of the other non-cycle participating nodes filtered away.
Again, if you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.
Calling the filterToJustCycles function as such:
val cycles = filterToJustCycles(directedGraphWithCyclesA)
will return:
Map(E -> Set(H), J -> Set(B), B -> Set(E), H -> Set(J, K), K -> Set(B))
It's trivial to then create a traversal across this Map to produce any or all of the various cycle pathways through the remaining nodes.
def filterToJustCycles[N](nsByN: Map[N, Set[N]]): Map[N, Set[N]] = {
def recursive(nsByNRemaining: Map[N, Set[N]], referencedRootNs: Set[N] = Set[N]()): Map[N, Set[N]] = {
val (referencedRootNsNew, nsByNRemainingNew) = {
val referencedRootNsNewTemp =
nsByNRemaining.values.flatten.toSet.intersect(nsByNRemaining.keySet)
(
referencedRootNsNewTemp
, nsByNRemaining.collect {
case (t, ts) if referencedRootNsNewTemp.contains(t) && referencedRootNsNewTemp.intersect(ts.toSet).nonEmpty =>
(t, referencedRootNsNewTemp.intersect(ts.toSet))
}
)
}
if (referencedRootNsNew == referencedRootNs)
nsByNRemainingNew
else
recursive(nsByNRemainingNew, referencedRootNsNew)
}
recursive(nsByN)
}
So, how does one generically transform a custom directed graph implementation into a Map[N, Set[N]]?
In essence, "Go Scala case classes!"
First, let's define an example case of a real node in a pre-existing directed graph:
class CustomNode (
val equipmentIdAndType: String //"A387.Structure" - identity is embedded in a string and must be parsed out
, val childrenNodes: List[CustomNode] //even through Set is implied, for whatever reason this implementation used List
, val otherImplementationNoise: Option[Any] = None
)
Again, this is just an example. Yours could involve subclassing, delegation, etc. The purpose is to have access to a something that will be able to fetch the two essential things to make this work:
the identity of a node; i.e. something to distinguish it and makes it unique from all other nodes in the same directed graph
a collection of the identities of the immediate children of a specific node - if the specific node doesn't have any children, this collection will be empty
Next, we define a helper object, DirectedGraph, which will contain the infrastructure for the conversion:
Node: an adapter trait which will wrap CustomNode
toMap: a function which will take a List[CustomNode] and convert it to a Map[Node, Set[Node]] (which is type equivalent to our target type of Map[N, Set[N]])
Here's the code:
object DirectedGraph {
trait Node[S, I] {
def source: S
def identity: I
def children: Set[I]
}
def toMap[S, I, N <: Node[S, I]](ss: List[S], transformSToN: S => N): Map[N, Set[N]] = {
val (ns, nByI) = {
val iAndNs =
ss.map(
s => {
val n =
transformSToN(s)
(n.identity, n)
}
)
(iAndNs.map(_._2), iAndNs.toMap)
}
ns.map(n => (n, n.children.map(nByI(_)))).toMap
}
}
Now, we must generate the actual adapter, CustomNodeAdapter, which will wrap each CustomNode instance. This adapter uses a case class in a very specific way; i.e. specifying two constructor parameters lists. It ensures the case class conforms to a Set's requirement that a Set member have correct equals and hashCode implementations. For more details on why and how to use a case class this way, please see this StackOverflow question and answer:
object CustomNodeAdapter extends (CustomNode => CustomNodeAdapter) {
def apply(customNode: CustomNode): CustomNodeAdapter =
new CustomNodeAdapter(fetchIdentity(customNode))(customNode) {}
def fetchIdentity(customNode: CustomNode): String =
fetchIdentity(customNode.equipmentIdAndType)
def fetchIdentity(eiat: String): String =
eiat.takeWhile(char => char.isLetter || char.isDigit)
}
abstract case class CustomNodeAdapter(identity: String)(customNode: CustomNode) extends DirectedGraph.Node[CustomNode, String] {
val children =
customNode.childrenNodes.map(CustomNodeAdapter.fetchIdentity).toSet
val source =
customNode
}
We now have the infrastructure in place. Let's define a "real world" directed graph consisting of CustomNode:
val customNodeDirectedGraphWithCyclesA =
List(
new CustomNode("A.x", List("B.a", "E.a", "J.a"))
, new CustomNode("B.x", List("E.b", "F.b"))
, new CustomNode("C.x", List("I.c", "G.c"))
, new CustomNode("D.x", List("G.d", "L.d"))
, new CustomNode("E.x", List("H.e"))
, new CustomNode("F.x", List("G.f"))
, new CustomNode("G.x", List("L.g"))
, new CustomNode("H.x", List("J.h", "K.h"))
, new CustomNode("I.x", List("K.i", "L.i"))
, new CustomNode("J.x", List("B.j"))
, new CustomNode("K.x", List("B.k"))
, new CustomNode("L.x", Nil)
)
Finally, we can now do the conversion which looks like this:
val transformCustomNodeDirectedGraphWithCyclesA =
DirectedGraph.toMap[CustomNode, String, CustomNodeAdapter](customNodes1, customNode => CustomNodeAdapter(customNode))
And we can take transformCustomNodeDirectedGraphWithCyclesA, which is of type Map[CustomNodeAdapter,Set[CustomNodeAdapter]], and submit it to the two original functions.
Calling the isCyclic function as such:
val isCyclicResult = isCyclic(transformCustomNodeDirectedGraphWithCyclesA)
will return:
`true`
Calling the filterToJustCycles function as such:
val cycles = filterToJustCycles(transformCustomNodeDirectedGraphWithCyclesA)
will return:
Map(
CustomNodeAdapter(B) -> Set(CustomNodeAdapter(E))
, CustomNodeAdapter(E) -> Set(CustomNodeAdapter(H))
, CustomNodeAdapter(H) -> Set(CustomNodeAdapter(J), CustomNodeAdapter(K))
, CustomNodeAdapter(J) -> Set(CustomNodeAdapter(B))
, CustomNodeAdapter(K) -> Set(CustomNodeAdapter(B))
)
And if needed, this Map can then be converted back to Map[CustomNode, List[CustomNode]]:
cycles.map {
case (customNodeAdapter, customNodeAdapterChildren) =>
(customNodeAdapter.source, customNodeAdapterChildren.toList.map(_.source))
}
If you have any questions, issues or concerns, please let me know and I will address them ASAP.
I think the problem can be solved without changing the state of the node with the marker field. The following is a rough code of what i think the isCyclic should look like. I am currently storing the node objects which are visited instead you can store the node ids if the node doesnt have equality based on node id.
def isCyclic() : Boolean = nodes.exists(hasCycle(_, HashSet()))
def hasCycle(node:Node, visited:Seq[Node]) = visited.contains(node) || children(node).exists(hasCycle(_, node +: visited))
def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
Answer added just to show that the mutable-visited isn't too unreadable either (untested, though!)
def isCyclic() : Boolean =
{
var visited = HashSet()
def hasCycle(node:Node) = {
if (visited.contains(node)) {
true
} else {
visited :+= node
children(node).exists(hasCycle(_))
}
}
nodes.exists(hasCycle(_))
}
def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
If p = node => node.marker==1 && visit(node) and assuming nodes is a List you can pick any of the following:
nodes.filter(p).length>0
nodes.count(p)>0
nodes.exists(p) (I think the most relevant)
I am not sure of the relative complexity of each method and would appreciate a comment from fellow members of the community