Handle Akka stream's first element specially - scala

Is there an idiomatic way of handling Akka stream's Source first element in a special way? What I have now is:
var firstHandled = false
source.map { elem =>
if(!firstHandled) {
//handle specially
firstHandled = true
} else {
//handle normally
}
}
Thanks

While I would generally go with Ramon's answer, you could also use prefixAndTail, with a prefix of 1, together with flatMapConcat to achieve something similar:
val src = Source(List(1, 2, 3, 4, 5))
val fst = Flow[Int].map(i => s"First: $i")
val rst = Flow[Int].map(i => s"Rest: $i")
val together = src.prefixAndTail(1).flatMapConcat { case (head, tail) =>
// `head` is a Seq of the prefix elements, which in our case is
// just the first one. We can convert it to a source of just
// the first element, processed via our fst flow, and then
// concatenate `tail`, which is the remainder...
Source(head).via(fst).concat(tail.via(rst))
}
Await.result(together.runForeach(println), 10.seconds)
// First: 1
// Rest: 2
// Rest: 3
// Rest: 4
// Rest: 5
This of course works not just for the first item, but for the first N items, with the proviso that those items will be taken up as a strict collection.

Using zipWith
You could zip the original Source with a Source of Booleans that only returns true the first time. This zipped Source can then be processed.
First we'll need a Source that emits the Booleans:
//true, false, false, false, ...
def firstTrueIterator() : Iterator[Boolean] =
(Iterator single true) ++ (Iterator continually false)
def firstTrueSource : Source[Boolean, _] =
Source fromIterator firstTrueIterator
We can then define a function that handles the two different cases:
type Data = ???
type OutputData = ???
def processData(data : Data, firstRun : Boolean) : OutputData =
if(firstRun) { ... }
else { ... }
This function can then be used in a zipWith of your original Source:
val originalSource : Source[Data,_] = ???
val contingentSource : Source[OutputData,_] =
originalSource.zipWith(firstTrueSource)(processData)
Using Stateful Flow
You could create a Flow that contains state similar to the example in the question but with a more functional approach:
def firstRunner(firstCall : (Data) => OutputData,
otherCalls : (Data) => OutputData) : (Data) => OutputData = {
var firstRun = true
(data : Data) => {
if(firstRun) {
firstRun = false
firstCall(data)
}
else
otherCalls(data)
}
}//end def firstRunner
def firstRunFlow(firstCall : (Data) => OutputData,
otherCalls : (Data) => OutputData) : Flow[Data, OutputData, _] =
Flow[Data] map firstRunner(firstCall, otherCalls)
This Flow can then be applied to your original Source:
def firstElementFunc(data : Data) : OutputData = ???
def remainingElsFunc(data : Data) : OutputData = ???
val firstSource : Source[OutputData, _] =
originalSource via firstRunFlow(firstElementFunc,remainingElsFunc)
"Idiomatic Way"
Answering your question directly requires dictating the "idiomatic way". I answer that part last because it is the least verifiable by the compiler and is therefore closer to opinion. I would never claim to be a valid classifier of idiomatic code.
My personal experience with akka-streams has been that it is best to switch my perspective to imagining an actual stream (I think of a train with boxcars) of Data elements. Do I need to break it up into multiple fixed size trains? Do only certain boxcars make it through? Can I attach another train side-by-side that contains Boolean cars which can signal the front? I would prefer the zipWith method due to my regard of streams (trains). My initial approach is always to use other stream parts connected together.
Also, I find it best to embed as little code in an akka Stream component as possible. firstTrueIterator and processData have no dependency on akka at all. Conversely, the firstTrueSource and contingentSource definitions have virtually no logic. This allows you to test the logic independently of a clunky ActorSystem, and the guts can be reused in Futures or Actors.
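To illustrate the point about testing without an ActorSystem, here is a minimal sketch that exercises the same logic with plain Scala iterators. The concrete Data and OutputData types and the body of processData are placeholders I made up for the example; only firstTrueIterator comes from the answer above.

```scala
object FirstElementLogicDemo {
  // Same iterator as in the answer: true once, then false forever.
  def firstTrueIterator(): Iterator[Boolean] =
    Iterator.single(true) ++ Iterator.continually(false)

  // Hypothetical stand-ins for the answer's Data and OutputData types.
  type Data = Int
  type OutputData = String

  // Hypothetical processing: tag the first element differently.
  def processData(data: Data, firstRun: Boolean): OutputData =
    if (firstRun) s"First: $data" else s"Rest: $data"

  // Zipping plain iterators mirrors what zipWith does on the stream;
  // the zip ends when the shorter (finite) side is exhausted.
  val results: List[OutputData] =
    Iterator(1, 2, 3)
      .zip(firstTrueIterator())
      .map { case (d, first) => processData(d, first) }
      .toList

  def main(args: Array[String]): Unit = results.foreach(println)
}
```

No akka import is needed here, so the logic can be covered by an ordinary unit test.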

You can use prepend to put a source in front of a Source or Flow. Just prepend a single-item source; after it is drained, the rest of the original source continues.
https://doc.akka.io/docs/akka/current/stream/operators/Source-or-Flow/prepend.html
Source(List(1, 2, 3))
.prepend(Source.single(0))
.runWith(Sink.foreach(println))
// 0
// 1
// 2
// 3

While I prefer the approach with zip, one can also use statefulMapConcat:
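source
  .statefulMapConcat { () =>
    // statefulMapConcat takes a factory () => (elem => Iterable), so each
    // materialization gets its own state and each call returns a collection
    var firstRun = true
    elem => {
      if (firstRun) {
        firstRun = false
        List(elem) // first element: handle specially
      } else {
        List(elem) // not first: handle normally
      }
    }
  }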

Related

request timeout from flatMapping over cats.effect.IO

I am attempting to transform some data that is encapsulated in cats.effect.IO with a Map that also is in an IO monad. I'm using http4s with blaze server and when I use the following code the request times out:
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// get the shifts
var getDbShifts: IO[List[Shift]] = shiftModel.findByUserId(userId)
// use the userRoleId to get the RoleId then get the tasks for this role
val taskMap : IO[Map[String, Double]] = taskModel.findByUserId(userId).flatMap {
case tskLst: List[Task] => IO(tskLst.map((task: Task) => (task.name -> task.standard)).toMap)
}
val traversed: IO[List[Shift]] = for {
shifts <- getDbShifts
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: IO[List[ShiftJson]] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) =>
taskMap.flatMap((tm: Map[String, Double]) =>
IO(ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / tm.get(sj.name).get)))
).sequence
//TODO: this flatMap is bricking my request
lstShiftJson.flatMap((sjLst: List[ShiftJson]) => {
IO(Shift(shift.id, shift.shiftDate, shift.shiftStart, shift.shiftEnd,
shift.lunchDuration, shift.shiftDuration, shift.breakOffProd, shift.systemDownOffProd,
shift.meetingOffProd, shift.trainingOffProd, shift.projectOffProd, shift.miscOffProd,
write[List[ShiftJson]](sjLst), shift.userRoleId, shift.isApproved, shift.score, shift.comments
))
})
})
} yield traversed
traversed.flatMap((sLst: List[Shift]) => Ok(write[List[Shift]](sLst)))
}
As you can see from the TODO comment, I've narrowed the problem down to the flatMap below it. If I remove that flatMap and merely return "IO(shift)" to the traversed variable, the request does not time out. However, that doesn't help me much, because I need to make use of the lstShiftJson variable, which holds my transformed JSON.
My intuition tells me I'm abusing the IO monad somehow, but I'm not quite sure how.
Thank you for your time in reading this!
So with the guidance of Luis's comment I refactored my code to the following. I don't think it is optimal (i.e. the flatMap at the end seems unnecessary, but I couldn't figure out how to remove it), but it's the best I've got.
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
implicit val shiftJsonReader = new Reader[ShiftJson] {
def read(value: JValue): ShiftJson = value.extract[ShiftJson]
}
implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
// FOR EACH SHIFT
// - read the shift.roleTasks into a ShiftJson object
// - divide each task value by the task.standard where task.name = shiftJson.name
// - write the list of shiftJson back to a string
val traversed = for {
taskMap <- taskModel.findByUserId(userId).map((tList: List[Task]) => tList.map((task: Task) => (task.name -> task.standard)).toMap)
shifts <- shiftModel.findByUserId(userId)
traversed <- shifts.traverse((shift: Shift) => {
val lstShiftJson: List[ShiftJson] = read[List[ShiftJson]](shift.roleTasks)
.map((sj: ShiftJson) => ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / taskMap.get(sj.name).get ))
shift.roleTasks = write[List[ShiftJson]](lstShiftJson)
IO(shift)
})
} yield traversed
traversed.flatMap((t: List[Shift]) => Ok(write[List[Shift]](t)))
}
Luis mentioned that mapping my List[Task] to a Map[String, Double] is a pure operation, so we want to use map instead of flatMap.
He also mentioned that I was wrapping every operation that comes from the database in IO, which was causing a great deal of recomputation (including DB transactions).
To solve this issue I moved all of the database operations inside my for comprehension. Using the "<-" operator to flatMap each of the return values lets the variables being used reside within the IO monad, hence preventing the recomputation experienced before.
I do think there must be a better way of returning my return value. flatMapping the "traversed" variable to get back inside of the IO monad seems to be unnecessary recomputation, so please anyone correct me.
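To make the recomputation point concrete, here is a small sketch using a toy effect type. MiniIO is a stand-in I wrote for illustration; it is not cats.effect.IO, and findStandards is a hypothetical "database" call. It only shows the general behaviour of a lazily evaluated description: referring to it inside the per-element step re-runs it for every element, while binding it once does not.

```scala
object RecomputationDemo {
  // Toy stand-in for an effect type: a description that is executed
  // every time run() is called. Not the real cats.effect.IO.
  final case class MiniIO[A](run: () => A) {
    def map[B](f: A => B): MiniIO[B] = MiniIO(() => f(run()))
    def flatMap[B](f: A => MiniIO[B]): MiniIO[B] = MiniIO(() => f(run()).run())
  }

  var lookups = 0 // counts simulated database hits

  // Hypothetical database call returning the standard per task name.
  val findStandards: MiniIO[Map[String, Double]] =
    MiniIO { () => lookups += 1; Map("pick" -> 2.0, "pack" -> 4.0) }

  val values = List("pick" -> 10.0, "pack" -> 8.0, "pick" -> 6.0)

  // Style 1: flatMap over the lookup for every element.
  def perElement: MiniIO[List[Double]] =
    values.foldRight(MiniIO(() => List.empty[Double])) { case ((name, v), acc) =>
      findStandards.flatMap(m => acc.map(rest => v / m(name) :: rest))
    }

  // Style 2: bind the lookup once, then use plain map for the pure work.
  def boundOnce: MiniIO[List[Double]] =
    for (m <- findStandards) yield values.map { case (name, v) => v / m(name) }

  // Runs an effect and reports its result plus the number of lookups it caused.
  def countLookups[A](io: MiniIO[A]): (A, Int) = {
    lookups = 0
    val a = io.run()
    (a, lookups)
  }

  def main(args: Array[String]): Unit = {
    println(countLookups(perElement)) // one lookup per element
    println(countLookups(boundOnce))  // a single lookup
  }
}
```

Both styles compute the same list; they differ only in how often the simulated lookup runs.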

What is the best way, if at all, to implement a sorting function with internal state?

What is the best way to implement a sorting function that has internal state received from elsewhere?
Something like:
type Sorter[Item] = (Item, Item) => Boolean
type StringSorter = Sorter[String]
def customSorter : StringSorter = (i1,i2) =>
{
val i1Cnt = itemCountMap.get(i1)
val i2Cnt = itemCountMap.get(i2)
if (i1Cnt==None || i2Cnt==None ) {
i1<i2
} else {
i1Cnt.get<i2Cnt.get
}
}
And here is an example:
val l1 = List("a","b","c")
val itemCountMap = Map("a"->2,"b"->3,"c"->1)
l1.sortWith(customSorter)
//The returned list will be ["c","a","b"]
I am pretty new to Scala, and in general, lambda functions are not supposed to have state (right?).
Why, you ask? Because I am using generically typed lists in Spark which later, deep in the code of the executors, I want to analyze in a specific order. This order may depend on some static list, and I also want to control the ordering function.
First, this i1Cnt==null will never be true because get on Map never returns null, it returns an Option WHICH IS NOT A NULL, it is very different.
Second, there is no problem with state in a lambda, there is a problem with mutable shared state (and not only in a lambda, but everywhere).
Third, your function would be better if it receives the Map to use instead of relying on a global variable.
Fourth, here is the fixed code.
type Sorter[Item] = (Item, Item) => Boolean
def customSorter(map: Map[String, Int]): Sorter[String] = { (s1, s2) =>
(map.get(s1), map.get(s2)) match {
case (Some(i1), Some(i2)) => i1 < i2
case _ => s1 < s2
}
}
You can see it running here.
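For completeness, a minimal usage sketch of the fixed sorter, applied to the example data from the question. The wrapper object and the names counts and sorted are just mine for the demo.

```scala
object CustomSorterDemo {
  type Sorter[Item] = (Item, Item) => Boolean

  // Same function as above: the map is passed in instead of being global.
  def customSorter(map: Map[String, Int]): Sorter[String] = { (s1, s2) =>
    (map.get(s1), map.get(s2)) match {
      case (Some(i1), Some(i2)) => i1 < i2 // both known: compare counts
      case _                    => s1 < s2 // otherwise: compare the strings
    }
  }

  val counts = Map("a" -> 2, "b" -> 3, "c" -> 1)
  val sorted = List("a", "b", "c").sortWith(customSorter(counts))

  def main(args: Array[String]): Unit = println(sorted) // List(c, a, b)
}
```

One thing to keep in mind: if only some of the items are in the map, the two criteria can contradict each other (count order for one pair, string order for another), so the comparison may not be a consistent ordering.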

How to create a play.api.libs.iteratee.Enumerator which inserts some data between the items of a given Enumerator?

I use Play framework with ReactiveMongo. Most of ReactiveMongo APIs are based on the Play Enumerator. As long as I fetch some data from MongoDB and return it "as-is" asynchronously, everything is fine. Also the transformation of the data, like converting BSON to String, using Enumerator.map is obvious.
But today I faced a problem which boils down to the following code. I wasted half a day trying to create an Enumerator which would consume items from a given Enumerator and insert some items between them. It is important not to load all the items at once, as there could be many of them (the code example has only two items, "1" and "2"). Semantically, it is similar to mkString on collections. I am sure it can be done very easily, but the best I could come up with was this code. Very similar code creating an Enumerator using Concurrent.broadcast serves me well for WebSockets, but here even that does not work: the HTTP response never comes back. Enumeratee looks like it is supposed to provide such functionality, but I could not find a way to do the trick.
P.S. I tried calling chan.eofAndEnd in Iteratee.mapDone, and chunked(enums >>> Enumerator.eof) instead of chunked(enums); neither helped. Sometimes the response comes back, but it does not contain the correct data. What am I missing?
def trans(in:Enumerator[String]):Enumerator[String] = {
val (res, chan) = Concurrent.broadcast[String]
val iter = Iteratee.fold(true) { (isFirst, curr:String) =>
if (!isFirst)
chan.push("<-------->")
chan.push(curr)
false
}
in.apply(iter)
res
}
def enums:Enumerator[String] = {
val en12 = Enumerator[String]("1", "2")
trans(en12)
//en12 //if I comment the previous line and uncomment this, it prints "12" as expected
}
def enum = Action {
Ok.chunked(enums)
}
Here is my solution which I believe to be correct for this type of problem. Comments are welcome:
def fill[From](
prefix: From => Enumerator[From],
infix: (From, From) => Enumerator[From],
suffix: From => Enumerator[From]
)(implicit ec:ExecutionContext) = new Enumeratee[From, From] {
override def applyOn[A](inner: Iteratee[From, A]): Iteratee[From, Iteratee[From, A]] = {
//type of the state we will use for fold
case class State(prev:Option[From], it:Iteratee[From, A])
Iteratee.foldM(State(None, inner)) { (prevState, newItem:From) =>
val toInsert = prevState.prev match {
case None => prefix(newItem)
case Some(prevItem) => infix (prevItem, newItem)
}
for(newIt <- toInsert >>> Enumerator(newItem) |>> prevState.it)
yield State(Some(newItem), newIt)
} mapM {
case State(None, it) => //this is possible when our input was empty
Future.successful(it)
case State(Some(lastItem), it) =>
suffix(lastItem) |>> it
}
}
}
// if there are missing integers between from and to, fill that gap with 0
def fillGap(from:Int, to:Int)(implicit ec:ExecutionContext) = Enumerator enumerate List.fill(to-from-1)(0)
def fillFrom(x:Int)(input:Int)(implicit ec:ExecutionContext) = fillGap(x, input)
def fillTo(x:Int)(input:Int)(implicit ec:ExecutionContext) = fillGap(input, x)
val ints = Enumerator(10, 12, 15)
val toStr = Enumeratee.map[Int] (_.toString)
val infill = fill(
fillFrom(5),
fillGap,
fillTo(20)
)
val res = ints &> infill &> toStr // res will have 0,0,0,0,10,0,12,0,0,15,0,0,0,0
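If it helps to see the prefix/infix/suffix idea without the iteratee machinery, here is a rough analogue on a plain scala Iterator. This is not the Play API, only a sketch of the same insertion logic; fill and gap are names I chose for the example. It stays lazy, so the input is never loaded all at once.

```scala
object FillIteratorDemo {
  // Lazily inserts elements before the first item, between neighbours,
  // and after the last item of `in`.
  def fill[A](in: Iterator[A])(
      prefix: A => Iterator[A],
      infix: (A, A) => Iterator[A],
      suffix: A => Iterator[A]
  ): Iterator[A] = {
    var prev: Option[A] = None // last item seen so far
    val body = in.flatMap { item =>
      val inserted = prev match {
        case None    => prefix(item)
        case Some(p) => infix(p, item)
      }
      prev = Some(item)
      inserted ++ Iterator.single(item)
    }
    // `++` takes its argument by name, so `prev` is only inspected
    // after `body` has been exhausted.
    body ++ prev.map(suffix).getOrElse(Iterator.empty[A])
  }

  // Zeros for the integers missing strictly between from and to.
  def gap(from: Int, to: Int): Iterator[Int] = Iterator.fill(to - from - 1)(0)

  val result: List[Int] =
    fill(Iterator(10, 12, 15))(gap(5, _), gap, gap(_, 20)).toList

  def main(args: Array[String]): Unit = println(result.mkString(","))
}
```

With the same inputs as above (5, then 10, 12, 15, then 20) this produces the same sequence of values as the comment on res.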
You wrote that you are working with WebSockets, so why don't you use a dedicated solution for that? What you wrote is better suited to Server-Sent Events than to WS. As I understand it, you want to filter your results before sending them back to the client? If that's correct, then you need an Enumeratee instead of an Enumerator. An Enumeratee is a from-to transformation. This is a very good piece of code showing how to use Enumeratee. It may not be directly what you need, but I found inspiration there for my project. Maybe when you analyze the given code you will find the best solution.

Scala: Read some data of an Enumerator[T] and return the remaining Enumerator[T]

I am using the asynchronous I/O library of the Play framework, which uses Iteratees and Enumerators. I now have an Iteratee[T] as a data sink (for simplicity, say it's an Iteratee[Byte] which stores its content into a file). This Iteratee[Byte] is passed to the function which handles the writing.
But before writing I want to add some statistical information at the beginning of the file (for simplicity, say it's one Byte), so I transform the iteratee the following way before passing it to the write function:
def write(value: Byte, output: Iteratee[Byte]): Iteratee[Byte] =
Iteratee.flatten(output.feed(Input.El(value)))
When I now read the stored file from the disk, I get an Enumerator[Byte] for it.
At first I want to read and remove the additional data and then I want to pass the rest of the Enumerator[Byte] to a function which handles the reading.
So I also need to transform the enumerator:
def read(input: Enumerator[Byte]): (Byte, Enumerator[Byte]) = {
val firstEnumeratorEntry = ...
val remainingEnumerator = ...
(firstEnumeratorEntry, remainingEnumerator)
}
But I have no idea, how to do this. How can I read some bytes from an Enumerator and get the remaining Enumerator?
Replacing Iteratee[Byte] with OutputStream and Enumerator[Byte] with InputStream, this would be very easy:
def write(value: Byte, output: OutputStream) = {
output.write(value)
output
}
def read(input: InputStream) = (input.read,input)
But I need the asynchronous I/O of the play framework.
I wonder if you can tackle your goal from another angle.
That function that would use the remaining enumerator, let's call it remaining, presumably it applies to an iteratee to do the processing of the remainder: remaining |>> iteratee yielding another iteratee. Let's call that resulting iteratee iteratee2... Can you check whether you can get a reference to iteratee2? If that's the case, then you can get and process the first byte using a first iteratee head, then combine head and iteratee2 through flatMap:
val head = Enumeratee.take[Byte](1) &>> Iteratee.foreach[Byte](println)
val processing = for { h <- head; i <- iteratee2 } yield (h, i)
Iteratee.flatten(processing).run
If you cannot get a hold of iteratee2 - which would be the case if your enumerator combines with an enumeratee that you did not implement - then this approach won't work.
Here is one way to achieve this by folding within the Iteratee and an appropriate (kind-of) State accumulator (a tuple here)
I read the routes file: the first byte is read as a Char and the rest is appended to a String, decoded as UTF-8.
def index = Action {
/*let's do everything asyncly*/
Async {
/*for comprehension for read-friendly*/
for (
i <- read; /*read the file */
(r:(Option[Char], String)) <- i.run /*"create" the related Promise and run it*/
) yield Ok("first : " + r._1.get + "\n" + "rest" + r._2) /* map the Promised result in a correct Request's Result*/
}
}
def read = {
//get the routes file in an Enumerator
val file: Enumerator[Array[Byte]] = Enumerator.fromFile(Play.getFile("/conf/routes"))
//apply the enumerator with an Iteratee that folds the data as wished
file(Iteratee.fold((None, ""):(Option[Char], String)) { (acc, b) =>
acc._1 match {
/*on the first chunk*/ case None => (Some(b(0).toChar), acc._2 + new String(b.tail, Charset.forName("utf-8")))
/*on other chunks*/ case x => (x, acc._2 + new String(b, Charset.forName("utf-8")))
}
})
}
EDIT
I found yet another way using Enumeratee, but it needs to create 2 Enumerators (one short-lived). However, it is a bit more elegant. We use a "kind-of" Enumeratee, the Traversable one, which works at a finer level than Enumeratee (chunk level).
We use take 1, which takes only 1 byte and then closes the stream. On the other one, we use drop, which simply drops the first byte (because we're using an Enumerator[Array[Byte]]).
Furthermore, read2 now has a signature much closer to what you wished, because it returns 2 enumerators (not so far from Promise, Enumerator).
def index = Action {
Async {
val (first, rest) = read2
val enee = Enumeratee.map[Array[Byte]] {bs => new String(bs, Charset.forName("utf-8"))}
def useEnee(enumor:Enumerator[Array[Byte]]) = Iteratee.flatten(enumor &> enee |>> Iteratee.consume[String]()).run.asInstanceOf[Promise[String]]
for {
f <- useEnee(first);
r <- useEnee(rest)
} yield Ok("first : " + f + "\n" + "rest" + r)
}
}
def read2 = {
def create = Enumerator.fromFile(Play.getFile("/conf/routes"))
val file: Enumerator[Array[Byte]] = create
val file2: Enumerator[Array[Byte]] = create
(file &> Traversable.take[Array[Byte]](1), file2 &> Traversable.drop[Array[Byte]](1))
}
Actually we like Iteratees because they compose. So instead of creating multiple Enumerators from your original one, you would rather compose the two Iteratees sequentially (read-first and read-rest), and feed the result with your single Enumerator.
For this you need a sequential composition method, which I call andThen here. Here is a rough implementation. Note that returning the unconsumed input is a bit harsh; maybe the behavior could be customized with a typeclass based on the Input type. Also, it doesn't handle passing the leftover input from the first iteratee to the second one (Exercise :).
object Iteratees {
def andThen[E, A, B](a: Iteratee[E, A], b: Iteratee[E, B]): Iteratee[E, (A,B)] = new Iteratee[E, (A,B)] {
def fold[C](
done: ((A, B), Input[E]) => Promise[C],
cont: ((Input[E]) => Iteratee[E, (A, B)]) => Promise[C],
error: (String, Input[E]) => Promise[C]): Promise[C] = {
a.fold(
(ra, aleft) => b.fold(
(rb, bleft) => done((ra, rb), aleft /* could be magicop(aleft, bleft)*/),
(bcont) => cont(e => bcont(e) map (rb => (ra, rb))),
(s, err) => error(s, err)
),
(acont) => cont(e => andThen[E, A, B](acont(e), b)),
(s, err) => error(s, err)
)
}
}
}
Now you can just use the following:
object Application extends Controller {
def index = Action { Async {
val strings: Enumerator[String] = Enumerator("1","2","3","4")
val takeOne = Cont[String, String](e => e match {
case Input.El(e) => Done(e, Input.Empty)
case x => Error("not enough", x)
})
val takeRest = Iteratee.consume[String]()
val firstAndRest = Iteratees.andThen(takeOne, takeRest)
val futureRes = strings(firstAndRest) flatMap (_.run)
futureRes.map(x => Ok(x.toString)) // prints (1,234)
} }
}

Scala Graph Cycle Detection Algo 'return' needed?

I have implemented a small cycle detection algorithm for a DAG in Scala.
The 'return' bothers me - I'd like to have a version without the return...possible?
def isCyclic() : Boolean = {
lock.readLock().lock()
try {
nodes.foreach(node => node.marker = 1)
nodes.foreach(node => {if (1 == node.marker && visit(node)) return true})
} finally {
lock.readLock().unlock()
}
false
}
private def visit(node: MyNode): Boolean = {
node.marker = 3
val nodeId = node.id
val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
children.foreach(child => {
if (3 == child.marker || (1 == child.marker && visit(child))) return true
})
node.marker = 2
false
}
Yes, by using '.exists' instead of 'foreach' + 'return':
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Seq
def isCyclic() : Boolean = {
def visit(node: MyNode): Boolean = {
node.marker = 3
val nodeId = node.id
val children = vertexMap.getChildren(nodeId).toList.map(nodeId => id2nodeMap(nodeId))
val found = children.exists(child => (3 == child.marker || (1 == child.marker && visit(child))))
node.marker = 2
found
}
lock.readLock().lock()
try {
nodes.foreach(node => node.marker = 1)
nodes.exists(node => 1 == node.marker && visit(node))
} finally {
lock.readLock().unlock()
}
}
Summary:
I have written two solutions as generic FP functions which detect cycles within a directed graph. Per your implied preference, the use of an early return to escape the recursive function has been eliminated. The first, isCyclic, simply returns a Boolean as soon as the DFS (Depth First Search) repeats a node visit. The second, filterToJustCycles, returns a copy of the input Map filtered down to just the nodes involved in any/all cycles, and returns an empty Map when no cycles are found.
Details:
For the following, please consider a directed graph encoded as such:
val directedGraphWithCyclesA: Map[String, Set[String]] =
Map(
"A" -> Set("B", "E", "J")
, "B" -> Set("E", "F")
, "C" -> Set("I", "G")
, "D" -> Set("G", "L")
, "E" -> Set("H")
, "F" -> Set("G")
, "G" -> Set("L")
, "H" -> Set("J", "K")
, "I" -> Set("K", "L")
, "J" -> Set("B")
, "K" -> Set("B")
)
In both functions below, the type parameter N refers to whatever "Node" type you care to provide. It is important the provided "Node" type be both immutable and have stable equals and hashCode implementations (all of which occur automatically with use of immutable case classes).
The first function, isCyclic, is similar in nature to the version of the solution provided by @the-archetypal-paul. It assumes the directed graph has been transformed into a Map[N, Set[N]] where N is the identity of a node in the graph.
If you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.
Calling the isCyclic function as such:
val isCyclicResult = isCyclic(directedGraphWithCyclesA)
will return:
`true`
No further information is provided. And the DFS (Depth First Search) is aborted at detection of the first repeated visit to a node.
def isCyclic[N](nsByN: Map[N, Set[N]]) : Boolean = {
def hasCycle(nAndNs: (N, Set[N]), visited: Set[N] = Set[N]()): Boolean =
if (visited.contains(nAndNs._1))
true
else
nAndNs._2.exists(
n =>
nsByN.get(n) match {
case Some(ns) =>
hasCycle((n, ns), visited + nAndNs._1)
case None =>
false
}
)
nsByN.exists(hasCycle(_))
}
The second function, filterToJustCycles, uses the set reduction technique to recursively filter away unreferenced root nodes in the Map. If there are no cycles in the supplied graph of nodes, then .isEmpty will be true on the returned Map. If however, there are any cycles, all of the nodes required to participate in any of the cycles are returned with all of the other non-cycle participating nodes filtered away.
Again, if you need to see how to generically transform your custom directed graph implementation into a Map[N, Set[N]], I have outlined a generic solution towards the end of this answer.
Calling the filterToJustCycles function as such:
val cycles = filterToJustCycles(directedGraphWithCyclesA)
will return:
Map(E -> Set(H), J -> Set(B), B -> Set(E), H -> Set(J, K), K -> Set(B))
It's trivial to then create a traversal across this Map to produce any or all of the various cycle pathways through the remaining nodes.
def filterToJustCycles[N](nsByN: Map[N, Set[N]]): Map[N, Set[N]] = {
def recursive(nsByNRemaining: Map[N, Set[N]], referencedRootNs: Set[N] = Set[N]()): Map[N, Set[N]] = {
val (referencedRootNsNew, nsByNRemainingNew) = {
val referencedRootNsNewTemp =
nsByNRemaining.values.flatten.toSet.intersect(nsByNRemaining.keySet)
(
referencedRootNsNewTemp
, nsByNRemaining.collect {
case (t, ts) if referencedRootNsNewTemp.contains(t) && referencedRootNsNewTemp.intersect(ts.toSet).nonEmpty =>
(t, referencedRootNsNewTemp.intersect(ts.toSet))
}
)
}
if (referencedRootNsNew == referencedRootNs)
nsByNRemainingNew
else
recursive(nsByNRemainingNew, referencedRootNsNew)
}
recursive(nsByN)
}
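As for the traversal across the filtered Map mentioned above, here is one possible sketch. The function name cycleFrom is mine, and it only returns the first cycle it finds through a given start node, not all of them. The Map literal is the result shown earlier for directedGraphWithCyclesA.

```scala
object CyclePathDemo {
  // The filtered map returned for directedGraphWithCyclesA.
  val cycles: Map[String, Set[String]] = Map(
    "E" -> Set("H"),
    "J" -> Set("B"),
    "B" -> Set("E"),
    "H" -> Set("J", "K"),
    "K" -> Set("B")
  )

  // Follows edges depth-first from `start` until an edge leads back to
  // `start`; returns the first such path found (start ... start), or None.
  def cycleFrom[N](graph: Map[N, Set[N]], start: N): Option[List[N]] = {
    def go(current: N, path: List[N]): Option[List[N]] =
      graph
        .getOrElse(current, Set.empty[N])
        .iterator // lazy, so the search stops at the first hit
        .map { next =>
          if (next == start) Some((next :: path).reverse)
          else if (path.contains(next)) None // already on this path: skip
          else go(next, next :: path)
        }
        .collectFirst { case Some(p) => p }
    go(start, List(start))
  }

  def main(args: Array[String]): Unit =
    println(cycleFrom(cycles, "B")) // e.g. a path B, E, H, J, B
}
```

Which of the two cycles through H comes back first depends on the iteration order of the Set, so treat the exact path as unspecified.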
So, how does one generically transform a custom directed graph implementation into a Map[N, Set[N]]?
In essence, "Go Scala case classes!"
First, let's define an example case of a real node in a pre-existing directed graph:
class CustomNode (
val equipmentIdAndType: String //"A387.Structure" - identity is embedded in a string and must be parsed out
, val childrenNodes: List[String] //even though Set is implied, for whatever reason this implementation used List
, val otherImplementationNoise: Option[Any] = None
)
Again, this is just an example. Yours could involve subclassing, delegation, etc. The purpose is to have access to a something that will be able to fetch the two essential things to make this work:
the identity of a node; i.e. something to distinguish it and makes it unique from all other nodes in the same directed graph
a collection of the identities of the immediate children of a specific node - if the specific node doesn't have any children, this collection will be empty
Next, we define a helper object, DirectedGraph, which will contain the infrastructure for the conversion:
Node: an adapter trait which will wrap CustomNode
toMap: a function which will take a List[CustomNode] and convert it to a Map[Node, Set[Node]] (which is type equivalent to our target type of Map[N, Set[N]])
Here's the code:
object DirectedGraph {
trait Node[S, I] {
def source: S
def identity: I
def children: Set[I]
}
def toMap[S, I, N <: Node[S, I]](ss: List[S], transformSToN: S => N): Map[N, Set[N]] = {
val (ns, nByI) = {
val iAndNs =
ss.map(
s => {
val n =
transformSToN(s)
(n.identity, n)
}
)
(iAndNs.map(_._2), iAndNs.toMap)
}
ns.map(n => (n, n.children.map(nByI(_)))).toMap
}
}
Now, we must generate the actual adapter, CustomNodeAdapter, which will wrap each CustomNode instance. This adapter uses a case class in a very specific way; i.e. specifying two constructor parameters lists. It ensures the case class conforms to a Set's requirement that a Set member have correct equals and hashCode implementations. For more details on why and how to use a case class this way, please see this StackOverflow question and answer:
object CustomNodeAdapter extends (CustomNode => CustomNodeAdapter) {
def apply(customNode: CustomNode): CustomNodeAdapter =
new CustomNodeAdapter(fetchIdentity(customNode))(customNode) {}
def fetchIdentity(customNode: CustomNode): String =
fetchIdentity(customNode.equipmentIdAndType)
def fetchIdentity(eiat: String): String =
eiat.takeWhile(char => char.isLetter || char.isDigit)
}
abstract case class CustomNodeAdapter(identity: String)(customNode: CustomNode) extends DirectedGraph.Node[CustomNode, String] {
val children =
customNode.childrenNodes.map(CustomNodeAdapter.fetchIdentity).toSet
val source =
customNode
}
We now have the infrastructure in place. Let's define a "real world" directed graph consisting of CustomNode:
val customNodeDirectedGraphWithCyclesA =
List(
new CustomNode("A.x", List("B.a", "E.a", "J.a"))
, new CustomNode("B.x", List("E.b", "F.b"))
, new CustomNode("C.x", List("I.c", "G.c"))
, new CustomNode("D.x", List("G.d", "L.d"))
, new CustomNode("E.x", List("H.e"))
, new CustomNode("F.x", List("G.f"))
, new CustomNode("G.x", List("L.g"))
, new CustomNode("H.x", List("J.h", "K.h"))
, new CustomNode("I.x", List("K.i", "L.i"))
, new CustomNode("J.x", List("B.j"))
, new CustomNode("K.x", List("B.k"))
, new CustomNode("L.x", Nil)
)
Finally, we can now do the conversion which looks like this:
val transformCustomNodeDirectedGraphWithCyclesA =
DirectedGraph.toMap[CustomNode, String, CustomNodeAdapter](customNodeDirectedGraphWithCyclesA, customNode => CustomNodeAdapter(customNode))
And we can take transformCustomNodeDirectedGraphWithCyclesA, which is of type Map[CustomNodeAdapter,Set[CustomNodeAdapter]], and submit it to the two original functions.
Calling the isCyclic function as such:
val isCyclicResult = isCyclic(transformCustomNodeDirectedGraphWithCyclesA)
will return:
`true`
Calling the filterToJustCycles function as such:
val cycles = filterToJustCycles(transformCustomNodeDirectedGraphWithCyclesA)
will return:
Map(
CustomNodeAdapter(B) -> Set(CustomNodeAdapter(E))
, CustomNodeAdapter(E) -> Set(CustomNodeAdapter(H))
, CustomNodeAdapter(H) -> Set(CustomNodeAdapter(J), CustomNodeAdapter(K))
, CustomNodeAdapter(J) -> Set(CustomNodeAdapter(B))
, CustomNodeAdapter(K) -> Set(CustomNodeAdapter(B))
)
And if needed, this Map can then be converted back to Map[CustomNode, List[CustomNode]]:
cycles.map {
case (customNodeAdapter, customNodeAdapterChildren) =>
(customNodeAdapter.source, customNodeAdapterChildren.toList.map(_.source))
}
If you have any questions, issues or concerns, please let me know and I will address them ASAP.
I think the problem can be solved without changing the state of the node with the marker field. The following is a rough sketch of what I think isCyclic should look like. I am currently storing the visited node objects; you can store the node ids instead if the node doesn't have equality based on node id.
def isCyclic() : Boolean = nodes.exists(hasCycle(_, Seq()))
def hasCycle(node:Node, visited:Seq[Node]) : Boolean = visited.contains(node) || children(node).exists(hasCycle(_, node +: visited))
def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
Answer added just to show that the mutable-visited isn't too unreadable either (untested, though!)
def isCyclic() : Boolean =
{
var visited = HashSet[Node]()
def hasCycle(node:Node) : Boolean = {
if (visited.contains(node)) {
true
} else {
visited += node
children(node).exists(hasCycle(_))
}
}
nodes.exists(hasCycle(_))
}
def children(node:Node) = vertexMap.getChildren(node.id).toList.map(nodeId => id2nodeMap(nodeId))
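A related variant, in case it is useful: the sketch below keeps two sets, the nodes on the current DFS path and the nodes already fully explored, so a node that is merely reachable along two different paths is not reported as a cycle. It works on a Map[N, Set[N]] adjacency map rather than the Node/vertexMap types from the question, and the names are mine.

```scala
object ThreeStateCycleDemo {
  // Returns true if the directed graph (adjacency map) contains a cycle.
  def isCyclic[N](graph: Map[N, Set[N]]): Boolean = {
    // None means a cycle was found; Some(done) carries the set of nodes
    // that have been fully explored without finding a cycle.
    def visit(node: N, onPath: Set[N], done: Set[N]): Option[Set[N]] =
      if (onPath(node)) None          // back edge to the current path
      else if (done(node)) Some(done) // already explored, nothing new
      else
        graph
          .getOrElse(node, Set.empty[N])
          .foldLeft(Option(done)) { (acc, child) =>
            acc.flatMap(d => visit(child, onPath + node, d))
          }
          .map(_ + node)

    graph.keys
      .foldLeft(Option(Set.empty[N])) { (acc, n) =>
        acc.flatMap(d => visit(n, Set.empty, d))
      }
      .isEmpty
  }

  // A "diamond": D is reachable via B and via C, but there is no cycle.
  val diamond: Map[String, Set[String]] = Map(
    "A" -> Set("B", "C"),
    "B" -> Set("D"),
    "C" -> Set("D"),
    "D" -> Set.empty[String]
  )

  val triangle: Map[String, Set[String]] =
    Map("A" -> Set("B"), "B" -> Set("C"), "C" -> Set("A"))

  def main(args: Array[String]): Unit = {
    println(isCyclic(diamond))  // false
    println(isCyclic(triangle)) // true
  }
}
```

Like the versions above, it needs no return statement; the Option threaded through foldLeft plays the role of the early exit.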
If p = node => node.marker==1 && visit(node) and assuming nodes is a List you can pick any of the following:
nodes.filter(p).length>0
nodes.count(p)>0
nodes.exists(p) (I think the most relevant)
I am not sure of the relative complexity of each method and would appreciate a comment from fellow members of the community
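On the relative cost, a quick way to see the difference is to count how often the predicate gets called. The sketch below uses a made-up list and predicate (callsNeeded is my own helper name), assuming nodes is a List as stated above.

```scala
object ShortCircuitDemo {
  // Runs `search` with a predicate that counts its own invocations and
  // returns that count. The predicate is true only for the element 2.
  def callsNeeded(search: (List[Int], Int => Boolean) => Boolean): Int = {
    var calls = 0
    val p: Int => Boolean = n => { calls += 1; n == 2 }
    search(List(1, 2, 3, 4, 5), p)
    calls
  }

  val viaFilter = callsNeeded((xs, p) => xs.filter(p).length > 0) // 5 calls
  val viaCount  = callsNeeded((xs, p) => xs.count(p) > 0)         // 5 calls
  val viaExists = callsNeeded((xs, p) => xs.exists(p))            // 2 calls

  def main(args: Array[String]): Unit =
    println(s"filter=$viaFilter count=$viaCount exists=$viaExists")
}
```

filter and count always walk the whole list (and filter also builds a new list), while exists stops at the first match, which is what you want when the predicate is an expensive recursive visit.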