Akka-stream group only Right elements of Either - scala

I have a source which emits Either[String, MyClass].
I want to call an external service with batches of MyClass and continue downstream with Either[String, ExternalServiceResponse], which is why I need to group the stream's elements.
If the stream emitted only MyClass elements, it would be easy: just call grouped:
val source: Source[MyClass, NotUsed] = <custom implementation>
source
  .grouped(10)                 // Seq[MyClass]
  .map(callExternalService(_)) // ExternalServiceResponse
But how do I group only the Right elements of the Either in my scenario?
val source: Source[Either[String, MyClass], NotUsed] = <custom implementation>
source
  .??? // Either[String, Seq[MyClass]]
  .map {
    case Right(myClasses) => Right(callExternalService(myClasses))
    case Left(string)     => Left(string)
  } // Either[String, ExternalServiceResponse]
The following works, but is there a more idiomatic way?
val source: Source[Either[String, MyClass], NotUsed] = <custom implementation>
source
  .groupBy(2, either => either.isRight)
  .grouped(10)
  .map(input => input.headOption match {
    case Some(Right(_)) =>
      callExternalService(input.map(item => item.right.get))
    case _ =>
      input
  })
  .mapConcat(_.to[scala.collection.immutable.Iterable])
  .mergeSubstreams

This should transform a source of Either[L, R] into a source of Either[L, Seq[R]] with a configurable grouping of Rights.
def groupRights[L, R](groupSize: Int)(in: Source[Either[L, R], NotUsed]): Source[Either[L, Seq[R]], NotUsed] =
  in.map(Option(_))              // yep, an Option[Either[L, R]]
    .concat(Source.single(None)) // to emit when `in` completes
    .statefulMapConcat { () =>
      val buffer = new scala.collection.mutable.ArrayBuffer[R](groupSize)
      def dumpBuffer(): List[Either[L, Seq[R]]] = {
        // Don't emit an empty group, e.g. when the stream ends exactly on a group boundary
        if (buffer.isEmpty) Nil
        else {
          val out = List(Right(buffer.toList))
          buffer.clear()
          out
        }
      }
      (incoming: Option[Either[L, R]]) => {
        incoming.map {
          _.fold(
            l => List(Left(l)), // unfortunate that we have to re-wrap
            r => {
              buffer += r
              if (buffer.size == groupSize) dumpBuffer() else Nil
            }
          )
        }.getOrElse(dumpBuffer()) // end of stream
      }
    }
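A usage sketch against the question's source (types as in the question):
val grouped: Source[Either[String, Seq[MyClass]], NotUsed] = groupRights(10)(source)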
Beyond that, I'll note that the downstream code to call the external service could be rewritten as
.map(_.right.map(callExternalService))
If you can reliably call the external service with parallelism n, it may also be worth doing that with:
.mapAsync(n) { e =>
  e.fold(
    l => Future.successful(Left(l)),
    r => Future { Right(callExternalService(r)) }
  )
}
You could even, if you're willing to give up output ordering in exchange for throughput, replace mapAsync with mapAsyncUnordered.
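For example (a sketch, assuming an implicit ExecutionContext is in scope):
.mapAsyncUnordered(n) { e =>
  e.fold(
    l => Future.successful(Left(l)),
    r => Future { Right(callExternalService(r)) }
  )
} // results are emitted as soon as each Future completes, so order may change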

You could divide your source of eithers into two branches, process the rights their own way, and then merge the two sub-flows back together:
// case class MyClass(x: Int)
// case class ExternalServiceResponse(xs: Seq[MyClass])
// def callExternalService(xs: Seq[MyClass]): ExternalServiceResponse =
// ExternalServiceResponse(xs)
// val source: Source[Either[String, MyClass], _] =
// Source(List(Right(MyClass(1)), Left("2"), Right(MyClass(3)), Left("4"), Right(MyClass(5))))
val lefts: Source[Either[String, Nothing], _] =
  source
    .collect { case Left(l) => Left(l) }
val rights: Source[Either[Nothing, ExternalServiceResponse], _] =
  source
    .collect { case Right(x: MyClass) => x }
    .grouped(2)
    .map(callExternalService)
    .map(Right(_))
val out: Source[Either[String, ExternalServiceResponse], _] = rights.merge(lefts)
// out.runForeach(println)
// Left(2)
// Right(ExternalServiceResponse(Vector(MyClass(1), MyClass(3))))
// Left(4)
// Right(ExternalServiceResponse(Vector(MyClass(5))))
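One caveat: lefts and rights each wire source into their own graph, so source is materialized twice. That's fine for a replayable source like the Source(List(...)) above, but for a one-shot source you'd want a single-materialization fan-out. A sketch using Partition (same types as above; an alternative wiring, not the answer's code):
// Single-materialization variant: Partition splits the eithers, Merge rejoins them.
import akka.stream.SourceShape
import akka.stream.scaladsl.{GraphDSL, Merge, Partition}

val single: Source[Either[String, ExternalServiceResponse], _] =
  Source.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val part  = b.add(Partition[Either[String, MyClass]](2, e => if (e.isRight) 1 else 0))
    val merge = b.add(Merge[Either[String, ExternalServiceResponse]](2))
    source ~> part.in
    part.out(0).collect { case Left(l) => Left(l) } ~> merge.in(0)
    part.out(1).collect { case Right(x) => x }
      .grouped(2)
      .map(callExternalService)
      .map(Right(_)) ~> merge.in(1)
    SourceShape(merge.out)
  })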

Related

FS2 through2 closing all resources when the first stream is finished?

Let's suppose we have two fs2 Streams:
import cats.effect.IO

val stream1 = fs2.Stream.bracket(IO { println("Acquire 1"); 2 })(_ => IO { println("Release 1") })
  .flatMap(p => fs2.Stream.range(1, p))
val stream2 = fs2.Stream.bracket(IO { println("Acquire 2"); 4 })(_ => IO { println("Release 2") })
  .flatMap(p => fs2.Stream.range(1, p))
which I would like to connect with each other:
import fs2.Pull

def connect[F[_]]: (fs2.Stream[F, Int], fs2.Stream[F, Int]) => fs2.Stream[F, Int] = {
  def go(stream1: fs2.Stream[F, Int], stream2: fs2.Stream[F, Int]): Pull[F, Int, Unit] =
    stream1.pull.uncons1.flatMap { stream1Element =>
      stream2.pull.uncons1.flatMap { stream2Element =>
        (stream1Element, stream2Element) match {
          case (Some((stream1Head, stream1Tail)), Some((stream2Head, stream2Tail))) =>
            println("Some, Some")
            Pull.output1(stream1Head + stream2Head) >> go(stream1Tail, stream2Tail)
          case (Some((stream1Head, stream1Tail)), None) =>
            println("1 Stream still available")
            Pull.output1(stream1Head) >> go(fs2.Stream.empty, stream1Tail)
          case (None, Some((stream2Head, stream2Tail))) =>
            println("2 Stream still available")
            Pull.output1(stream2Head) >> go(fs2.Stream.empty, stream2Tail)
          case _ => Pull.output1(-1)
        }
      }
    }
  (one, two) => go(one, two).stream
}
Now, checking the logs, I see:
Acquire 1
Acquire 2
Some, Some
Release 2
Release 1
2 Stream still available
2 Stream still available
which is a bit surprising to me, because it seems that once the first stream is finished, the resources of the second one are closed as well. Suppose the resource is a connection to a database; then the elements from the second stream cannot be fetched anymore.
Is this correct behavior? Is there any way to avoid closing the second stream's resource? Surprisingly, if the first stream has more elements than the second one, everything works as expected (stream 1's resource is not closed when the second stream is finished).
By checking the implementation of the zipAllWith function, I found out that uncons1 should indeed be avoided in such cases: it steps the stream inside the enclosing scope, so when pulls from two streams are interleaved, one stream's finalizers can end up tied to the other's lifetime. The solution is to use stepLeg instead of uncons1, which gives each stream its own independently managed scope (a "leg"). The function from above should then look like this:
def connect[F[_]]: (fs2.Stream[F, Int], fs2.Stream[F, Int]) => fs2.Stream[F, Int] = {
  def go(stream1: fs2.Stream[F, Int], stream2: fs2.Stream[F, Int]): Pull[F, Int, Unit] =
    stream1.pull.stepLeg.flatMap { stream1Element =>
      stream2.pull.stepLeg.flatMap { stream2Element =>
        (stream1Element, stream2Element) match {
          case (Some(sl1), Some(sl2)) =>
            println("Some, Some")
            val one = sl1.head(0)
            val two = sl2.head(0)
            Pull.output1(one + two) >> go(sl1.stream, sl2.stream)
          case (Some(sl1), None) =>
            val one = sl1.head(0)
            println("1 Stream still available")
            Pull.output1(one) >> go(sl1.stream, fs2.Stream.empty)
          case (None, Some(sl2)) =>
            val two = sl2.head(0)
            println("2 Stream still available")
            Pull.output1(two) >> go(fs2.Stream.empty, sl2.stream)
          case _ => Pull.output1(-1)
        }
      }
    }
  (one, two) =>
    // flatMap(emit) re-chunks the inputs into singleton chunks, so taking
    // head(0) and continuing with leg.stream never drops the rest of a chunk.
    go(one.flatMap(fs2.Stream.emit), two.flatMap(fs2.Stream.emit)).stream
}
And the logs:
Acquire 1
Acquire 2
Some, Some
Release 1
2 Stream still available
2 Stream still available
Release 2
An additional example of this issue can be found here:
uncons vs stepLeg
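For this particular element-wise combination, fs2's built-in zipAllWith (whose implementation prompted the stepLeg insight above) can also do the job directly, modulo the -1 sentinel; a sketch:
// Element-wise sum, zero-padding the shorter stream; emits no -1 terminator.
val summed: fs2.Stream[IO, Int] = stream1.zipAllWith(stream2)(0, 0)(_ + _)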

Iterate data source asynchronously in batch and stop while remote return no data in Scala

Let's say we have a fake data source which will return the data it holds in batches:
import scala.concurrent.Future
import scala.util.Random

class DataSource(size: Int) {
  private var s = 0
  implicit val g = scala.concurrent.ExecutionContext.global
  def getData(): Future[List[Int]] = {
    s = s + 1
    Future {
      Thread.sleep(Random.nextInt(s * 100))
      if (s <= size) {
        List.fill(100)(s)
      } else {
        List()
      }
    }
  }
}
import scala.util.Success

object Test extends App {
  val source = new DataSource(100)
  implicit val g = scala.concurrent.ExecutionContext.global
  def process(v: List[Int]): Unit = {
    println(v)
  }
  def next(f: (List[Int]) => Unit): Unit = {
    val fut = source.getData()
    fut.onComplete {
      case Success(v) =>
        f(v)
        v match {
          // no case for Nil: the recursion "stops" with a swallowed MatchError
          case h :: t => next(f)
        }
    }
  }
  next(process)
  Thread.sleep(1000000000)
}
I have my own version above, but the problem is that some portions of it are not pure. Ideally, I would like to wrap the Future for each batch into one big Future, with the wrapper Future succeeding when the last batch returns a zero-size list. My situation differs a little from this post: the next() there is a synchronous call, while mine is async.
Or is what I want even possible? The next batch should only be fetched once the previous one has resolved, and whether to fetch another batch at all depends on the size returned.
What's the best way to walk through this type of data source? Are there any existing Scala frameworks that provide the feature I am looking for? Are Play's Iteratee, Enumerator, and Enumeratee the right tools? If so, can anyone provide an example of how to use those facilities to implement what I am looking for?
Edit:
With help from chunjef, I tried his suggestion out, and it worked for me, with one small change based on his answer:
Source.fromIterator(() => Iterator.continually(source.getData()))
  .mapAsync(1)(f => f.filter(_.size > 0))
  .via(Flow[List[Int]].takeWhile(_.nonEmpty))
  .runForeach(println)
However, can someone compare Akka Streams and Play's Iteratee? Is it worth also trying out Iteratee?
Code snip 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
  .mapAsync(1)(identity) // line 2
  .takeWhile(_.nonEmpty) // line 3
  .runForeach(println)   // line 4
Code snip 2: Assume getData depends on some other output of another flow, and I would like to concat it with the flow below. However, it yields a "too many open files" error. I'm not sure what causes this error; mapAsync has been limited to a throughput of 1, if I understood correctly.
Flow[Int].mapConcat[Future[List[Int]]](c => {
  Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
  implicit val system = ActorSystem("Sandbox")
  implicit val materializer = ActorMaterializer()
  val ds = new DataSource(100)
  Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
    .mapAsync(1)(identity) // line 2
    .takeWhile(_.nonEmpty) // line 3
    .runForeach(println)   // line 4
}
class DataSource(size: Int) {
  ...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
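As a side note, runForeach materializes a Future[Done] that completes when the stream does; a sketch of using it to shut the ActorSystem down afterwards (same system and ds as above):
import akka.Done
import scala.concurrent.Future
import system.dispatcher // ExecutionContext for the callback

val done: Future[Done] =
  Source.fromIterator(() => Iterator.continually(ds.getData))
    .mapAsync(1)(identity)
    .takeWhile(_.nonEmpty)
    .runForeach(println)
done.onComplete(_ => system.terminate())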
Ideally, I would like to wrap the Future for each batch into a big future, and the wrapper future success when last batch returned 0 size list?
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success(()).
Something like
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
  val promise = Promise[Unit]()
  def next(): Unit = source.getData().onComplete {
    case Success(v) =>
      f(v)
      v match {
        case h :: t => next()
        case _      => promise.success(())
      }
    case Failure(e) => promise.failure(e)
  }
  // get going
  next()
  // return the Future for everything
  promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
You are probably looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it would work with DataSource unchanged:
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.eval.Task
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global

object Test extends App {
  val source = new DataSource(100)
  val completed = // <- this is Future[Unit], completes when foreach is done
    Observable
      .repeatEval(source.getData()) // re-evaluated for every element; a single Future would be memoized
      .mapEval(Task.fromFuture(_))  // run the batches strictly one after another
      .takeWhile(_.nonEmpty)
      .foreach(println)
  Await.result(completed, Duration.Inf)
}
I just figured out that flatMapConcat can achieve what I wanted. There is no point starting another question since I already have the answer, so I'm putting my sample code here in case someone is looking for something similar.
This type of API is very common in integrations between traditional enterprise applications. The DataSource mocks the API, while the Test object demonstrates how client code can use Akka Streams to consume it.
In my small project the API was provided over SOAP, and I used scalaxb to transform the SOAP interface into Scala async style. With the client calls demonstrated in the Test object, we can consume the API with Akka Streams. Thanks to all for the help.
import scala.collection.mutable
import scala.concurrent.Future
import scala.util.Random

class DataSource(size: Int) {
  private var transactionId: Long = 0
  private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
  private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
  implicit val g = scala.concurrent.ExecutionContext.global
  case class TransactionId(id: Long)
  case class ReadCursorId(id: Long)
  def startTransaction(): Future[TransactionId] = {
    Future {
      synchronized {
        transactionId += 1
      }
      val t = TransactionId(transactionId)
      transactionCursorMap.update(t, Set(ReadCursorId(0)))
      t
    }
  }
  def createCursorId(t: TransactionId): ReadCursorId = {
    synchronized {
      val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
      val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
      val cId = ReadCursorId(currentId + 1)
      transactionCursorMap.update(t, c + cId)
      cursorIteratorMap.put(cId, createIterator())
      cId
    }
  }
  def createIterator(): Iterator[List[Int]] = {
    (for { i <- 1 to 100 } yield List.fill(100)(i)).toIterator
  }
  def startRead(t: TransactionId): Future[ReadCursorId] = {
    Future {
      createCursorId(t)
    }
  }
  def getData(cursorId: ReadCursorId): Future[List[Int]] = {
    synchronized {
      Future {
        Thread.sleep(Random.nextInt(100))
        cursorIteratorMap.get(cursorId) match {
          case Some(i) if i.hasNext => i.next()
          case _                    => List() // cursor exhausted or unknown: signal "no more data"
        }
      }
    }
  }
}
object Test extends App {
  val source = new DataSource(10)
  implicit val system = ActorSystem("Sandbox")
  implicit val materializer = ActorMaterializer()
  implicit val g = scala.concurrent.ExecutionContext.global
  val s = Source.fromFuture(source.startTransaction())
    .map { e => source.startRead(e) }
    .mapAsync(1)(identity)
    .flatMapConcat { e =>
      Source.fromIterator(() => Iterator.continually(source.getData(e)))
    }
    .mapAsync(5)(identity)
    .via(Flow[List[Int]].takeWhile(_.nonEmpty))
    .runForeach(println)
}

Find difference between two enumerators with sorted entries in scala

Given two Play enumerators A and B that each provide sorted integers, is there a way to derive an enumerator of the integers that exist in B but don't exist in A?
For example:
val A: Enumerator[Int] = Enumerator(1,3,5,9,11,13)
and
val B: Enumerator[Int] = Enumerator(1,3,5,7,9,11,13)
I would somehow get:
val C: Enumerator[Int] // This enumerator will output 7
Doing it in a reactive way with enumerators/iteratees/enumeratees is preferred.
One solution I've thought of is to interleave the enumerators and use Iteratee.fold to maintain a buffer for comparing the two streams, but that seems like it should be unnecessary.
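For intuition, here is the sorted-merge difference on plain iterators (a sketch of the idea only; the answer below implements it reactively with enumerators and promises):
// B \ A on sorted inputs, using buffered iterators.
def diffSorted[E](a: Iterator[E], b: Iterator[E])(implicit ord: Ordering[E]): List[E] = {
  val (ba, bb) = (a.buffered, b.buffered)
  val out = List.newBuilder[E]
  while (bb.hasNext) {
    if (!ba.hasNext) out += bb.next() // A exhausted: the rest of B is missing from A
    else ord.compare(ba.head, bb.head) match {
      case c if c < 0 => ba.next()            // only in A: skip
      case 0          => ba.next(); bb.next() // in both: drop
      case _          => out += bb.next()     // only in B: emit
    }
  }
  out.result()
}
// diffSorted(Iterator(1, 3, 5, 9, 11, 13), Iterator(1, 3, 5, 7, 9, 11, 13)) == List(7)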
I had a somewhat similar question: How to merge 2 Enumerators in one, based on merge rule. I modified the given answer to fit your needs:
object Disjunction {
  def disjunction[E: Ordering](enumA: Enumerator[E], enumB: Enumerator[E])(implicit ec: ExecutionContext) = new Enumerator[E] {
    def apply[A](iter: Iteratee[E, A]) = {
      case class IterateeReturn(o: Option[(Promise[Promise[IterateeReturn]], E)])
      val failP: Promise[Nothing] = Promise()    // fail promise
      val failPF: Future[Nothing] = failP.future // fail promise future
      val initState1: Future[Seq[IterateeReturn]] = Future.traverse(Seq(enumA, enumB)) { enum =>
        val p: Promise[IterateeReturn] = Promise[IterateeReturn]()
        // The flow to transform an Enumerator into IterateeReturn form
        enum.run(
          Iteratee.foldM(p) { (oldP: Promise[IterateeReturn], elem: E) =>
            val p = Promise[Promise[IterateeReturn]]()
            // Return an IterateeReturn pointing at the next foldM future, and the current element
            oldP success IterateeReturn(Some((p, elem)))
            // Return the new future as the result of foldM
            p.future
          }.map { promise =>
            promise success IterateeReturn(None) // finish the last promise with an empty IterateeReturn
          }
        ).onFailure {
          // In case of failure the main flow needs to be informed
          case t => failP failure t
        }
        p.future
      }
      val initState: Future[List[(Promise[Promise[IterateeReturn]], E)]] =
        initState1 map (_.map(_.o).flatten.toList)
      val newEnum: Enumerator[Option[E]] = Enumerator.unfoldM(initState) { fstate =>
        // Whatever happens first: fstate is returned, or a failure happened during iteration
        Future.firstCompletedOf(Seq(fstate, failPF)) map { state =>
          // state is a List[(Promise[Promise[IterateeReturn]], E)]; sort the elements by E
          if (state.isEmpty) {
            None
          } else if (state.length == 1) {
            val (oldP, elem) = state.head
            val p = Promise[IterateeReturn]()
            oldP success p
            // Return the new state, with this iterator moved
            val newState: Future[List[(Promise[Promise[IterateeReturn]], E)]] =
              p.future.map(ir => ir.o.map(List(_)).getOrElse(Nil))
            Some((newState, Some(elem)))
          } else {
            val sorted = state.sortBy(_._2)
            val (firstP, fe) = sorted.head
            val (secondP, se) = sorted.tail.head
            if (fe != se) {
              // Move the first and combine with the second
              val p = Promise[IterateeReturn]()
              firstP success p
              val newState: Future[List[(Promise[Promise[IterateeReturn]], E)]] =
                p.future.map(ir => ir.o.map(List(_, (secondP, se))).getOrElse(List((secondP, se))))
              // Return the new state
              Some((newState, Some(fe)))
            } else {
              // Move future 1
              val p1 = Promise[IterateeReturn]()
              firstP success p1
              val fState: Future[Option[(Promise[Promise[IterateeReturn]], E)]] = p1.future.map(ir => ir.o)
              // Move future 2
              val p2 = Promise[IterateeReturn]()
              secondP success p2
              val sState: Future[Option[(Promise[Promise[IterateeReturn]], E)]] = p2.future.map(ir => ir.o)
              // Combine into a new state
              val newState = Future.sequence(List(fState, sState)).map(_.flatten)
              // Return
              Some((newState, None))
            }
          }
        }
      }
      newEnum &>
        Enumeratee.filter(_.isDefined) &>
        Enumeratee.map(_.get) apply iter
    }
  }
}
I checked, it works.
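Note that, as written, disjunction emits the symmetric difference of the two sorted streams; for the question's data that is exactly the 7 we want. A usage sketch (assumes an implicit ExecutionContext):
val C: Enumerator[Int] = Disjunction.disjunction(A, B)
C |>>> Iteratee.foreach(println) // prints: 7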

How to combine two Enumerators based on a key (Maintaining Iteratee state between Dones)?

I'm trying to combine two Play Framework Enumerators, merging values that come through which have the same key. For the most part it works, except that the Map used to keep previous values that don't yet have a match gets lost each time a match is found and a Done Iteratee is returned.
Is there a way to provide the state to the next invocation of step after a Done has been returned?
Any examples I've found thus far seem to be about grouping consecutive values together and then passing the whole grouping along, and none about grouping arbitrary values from the stream and only passing specific values along once grouped.
Ideally, once a match is made, it will send the matched values along.
What I've gotten to thus far (pretty much based off of Creating a time-based chunking Enumeratee):
def virtualSystemGrouping[E](system: ParentSystem): Iteratee[Detail, Detail] = {
  def step(state: Map[String, Detail])(input: Input[Detail]): Iteratee[Detail, Detail] = {
    input match {
      case Input.EOF   => Done(null, Input.EOF)
      case Input.Empty => Cont[Detail, Detail](i => step(state)(i))
      case Input.El(e) =>
        if (!system.isVirtual) Done(e) // must be part of the if/else chain, otherwise its result is discarded
        else if (state.exists(k => k._1.equals(e.name))) {
          val other = state(e.name)
          // ??? should have a; state - e.name
          // And pass new state and merged value out.
          Done(e + other)
        } else {
          Cont[Detail, Detail](i => step(state + (e.name -> e))(i))
        }
    }
  }
  Cont(step(Map[String, Detail]()))
}
The calling code looks like this:
val systems: List[ParentSystem] = getSystems()
val start = Enumerator.empty[Detail]
val send = systems.foldLeft(start) { (b, p) =>
  b interleave Concurrent.unicast[Detail] { channel =>
    implicit val timeout = Timeout(1 seconds)
    val actor = SystemsActor.lookupActor(p.name + "/details")
    actor map {
      case Some(a) => a ! SendDetailInformation(channel)
      case None    => channel.eofAndEnd()
    } recover {
      case t: Throwable => channel.eofAndEnd()
    }
  }
} &> Enumeratee.grouped(virtualSystemGrouping(parent)) |>> Iteratee.foreach(e => output.push(e))
send.onComplete(t => output.eofAndEnd())
The one method that I've been able to come up with that works is to use a Concurrent.unicast and pass the channel into the combining function. I'm sure there is a way to create an Iteratee/Enumerator that does the work all in one nice neat package, but that is eluding me for the time being.
Updated combining function:
def virtualSystemGrouping[E](system: ParentSystem, output: Channel): Iteratee[Detail, Detail] = {
  def step(state: Map[String, Detail])(input: Input[Detail]): Iteratee[Detail, Detail] = {
    input match {
      case Input.EOF =>
        // Flush the leftovers. Note: values.foreach, not mapValues; mapValues
        // is lazy, so side effects inside it would never actually run.
        state.values.foreach(r => output.push(r))
        output.eofAndEnd()
        Done(null, Input.EOF)
      case Input.Empty => Cont[Detail, Detail](i => step(state)(i))
      case Input.El(e) =>
        if (!system.isVirtual) { output.push(e); Done(e, Input.Empty) }
        else if (state.exists(k => k._1.equals(e.name))) {
          val other = state(e.name)
          output.push(e + other)
          Cont[Detail, Detail](i => step(state - e.name)(i))
        } else {
          Cont[Detail, Detail](i => step(state + (e.name -> e))(i))
        }
    }
  }
  Cont(step(Map[String, Detail]()))
}
Here, any combined values are pushed into the output channel and subsequently processed.
The usage looks like the following:
val systems: List[ParentSystem] = getSystems(parent)
val start = Enumerator.empty[Detail]
val concatDetail = systems.foldLeft(start) { (b, p) =>
  b interleave Concurrent.unicast[Detail] { channel =>
    implicit val timeout = Timeout(1 seconds)
    val actor = SystemsActor.lookupActor(p.name + "/details")
    actor map {
      case Some(a) => a ! SendRateInformation(channel)
      case None    => channel.eofAndEnd()
    } recover {
      case t: Throwable => channel.eofAndEnd()
    }
  }
}
val combinedDetail = Concurrent.unicast[Detail] { channel =>
  concatDetail &> Enumeratee.grouped(virtualSystemGrouping(parent, channel)) |>> Iteratee.ignore
}
val send = combinedDetail |>> Iteratee.foreach(e => output.push(e))
send.onComplete(t => output.eofAndEnd())
Very similar to the original, except that the call to the combining function is now made within the unicast onStart block (where channel is defined). concatDetail is the Enumerator created from the interleaved results of the child systems. It is fed through the grouping function, which in turn pushes any combined results (and, at EOF, any remaining values) through the provided channel.
The combinedDetail Enumerator is then consumed and pushed into the upstream output channel.
EDIT:
virtualSystemGrouping can be generalized as:
def enumGroup[E >: Null, K, M](
  key: (E) => K,
  merge: (E, Option[E]) => M,
  output: Concurrent.Channel[M]
): Iteratee[E, E] = {
  def step(state: Map[K, E])(input: Input[E]): Iteratee[E, E] = {
    input match {
      case Input.EOF =>
        // Push along any remaining values. Note: values.foreach, not mapValues;
        // mapValues is lazy, so side effects inside it would never actually run.
        state.values.foreach(f => output.push(merge(f, None)))
        output.eofAndEnd()
        Done(null, Input.EOF)
      case Input.Empty => Cont[E, E](i => step(state)(i))
      case Input.El(e) =>
        if (state.contains(key(e))) {
          output.push(merge(e, state.get(key(e))))
          Cont[E, E](i => step(state - key(e))(i))
        } else {
          Cont[E, E](i => step(state + (key(e) -> e))(i))
        }
    }
  }
  Cont(step(Map[K, E]()))
}
With a call such as:
Enumeratee.grouped(
  enumGroup(
    (k => k.name),
    ((e1, e2) => e2.fold(e1)(v => e1 + v)),
    channel
  )
)

Finally closing stream using Scala exception catching

Does anybody know a solution to this problem? I rewrote a try/catch/finally construct in a functional way, but now I can't close the stream :-)
import scala.util.control.Exception._
def gunzip() = {
  logger.info(s"Gunziping file ${f.getAbsolutePath}")
  catching(classOf[IOException], classOf[FileNotFoundException]).
    andFinally(println("how can I close the stream ?")).
    either({
      val is = new GZIPInputStream(new FileInputStream(f))
      Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
    }) match {
      case Left(e) =>
        val msg = s"IO error reading file ${f.getAbsolutePath} ! on host ${Setup.smtpHost}"
        logger.error(msg, e)
        MailClient.send(msg, msg)
        new Array[Byte](0)
      case Right(v) => v
    }
}
I rewrote it based on Senia's solution like this:
def gunzip() = {
  logger.info(s"Gunziping file ${file.getAbsolutePath}")
  def closeAfterReading(c: InputStream)(f: InputStream => Array[Byte]) = {
    catching(classOf[IOException], classOf[FileNotFoundException])
      .andFinally(c.close())
      .either(f(c)) match {
        case Left(e) =>
          val msg = s"IO error reading file ${file.getAbsolutePath} ! on host ${Setup.smtpHost}"
          logger.error(msg, e)
          new Array[Byte](0)
        case Right(v) => v
      }
  }
  closeAfterReading(new GZIPInputStream(new FileInputStream(file))) { is =>
    Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
  }
}
I prefer this construction for such cases:
import java.io.Closeable
import scala.util.control.Exception.allCatch

def withCloseable[T <: Closeable, R](t: T)(f: T => R): R = {
  allCatch.andFinally { t.close() } apply { f(t) }
}
def read(f: File) =
  withCloseable(new GZIPInputStream(new FileInputStream(f))) { is =>
    Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
  }
Now you could wrap it with Try and recover on some exceptions:
val result =
  Try { read(f) }.recover {
    case e: IOException           => recover(e) // logging, default value
    case e: FileNotFoundException => recover(e)
  }
val array = result.get // Exception here!
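For what it's worth, on Scala 2.13+ the standard library bundles this pattern as scala.util.Using, which closes the resource and captures any exception in a Try in one step; a minimal sketch:
import scala.util.{Try, Using}

def readUsing(f: File): Try[Array[Byte]] =
  Using(new GZIPInputStream(new FileInputStream(f))) { is =>
    Iterator.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
  }

The same recover calls as above then apply to its result unchanged.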
take "scala-arm"
take Apache "commons-io"
then do the following
val result =
  for {
    fis <- resource.managed(new FileInputStream(f))
    gis <- resource.managed(new GZIPInputStream(fis))
  } yield IOUtils.toString(gis, "UTF-8")
result.acquireFor(identity) fold (reportExceptions _, v => v)
One way to handle it would be to use a mutable list of the things that have been opened and need to be closed later:
import scala.collection.mutable.{ArrayBuffer, Buffer}

val cs: Buffer[Closeable] = new ArrayBuffer()
def addClose[C <: Closeable](c: C) = { cs += c; c }
catching(classOf[IOException], classOf[FileNotFoundException]).
  andFinally(cs.foreach(_.close())).
  either({
    val is = addClose(new GZIPInputStream(new FileInputStream(f)))
    Stream.continually(is.read()).takeWhile(-1 !=).map(_.toByte).toArray
  }) // ...
Update: You could use the scala-conduit library (I'm the author) for this purpose. (The library is currently not considered production-ready.) The main aim of pipes (AKA conduits) is to construct composable components with well-defined resource handling. Each pipe repeatedly receives input and produces output. Optionally, it also produces a final result when it finishes. Pipes have finalizers that are run after a pipe finishes, either on its own or when its downstream pipe finishes. Your example could be reworked (using Java NIO) as follows:
/**
 * Filters buffers until a given character is found. The last buffer
 * (truncated up to the character) is also included.
 */
def untilPipe(c: Byte): Pipe[ByteBuffer, ByteBuffer, Unit] = ...
// Create a new source that chunks a file as ByteBuffers.
// (Note that the buffer changes on every step.)
val source: Source[ByteBuffer, Unit] = ...
// Sink that prints bytes to the standard output.
// You would create your own sink doing whatever you want.
val sink: Sink[ByteBuffer, Unit] =
  NIO.writeChannel(Channels.newChannel(System.out))
runPipe(source >-> untilPipe(-1) >-> sink)
As soon as untilPipe(-1) finds -1 and finishes, its upstream source pipe's finalizer is run and the input is closed. If an exception occurs anywhere in the pipeline, the input is closed as well.
The full example can be found here.
I have one more proposition for cases when a closeable object like java.io.Socket may run for a long time, so one has to wrap it in a Future. This is handy when you also want to control a timeout, for when the Socket is not responding.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // or your own ExecutionContext
import scala.language.reflectiveCalls

object CloseableFuture {
  type Closeable = {
    def close(): Unit
  }
  private def withClose[T, F1 <: Closeable](f: => F1, andThen: F1 => Future[T]): Future[T] =
    Future(f).flatMap { closeable =>
      val internal = andThen(closeable)
      internal.onComplete(_ => closeable.close()) // close on success and on failure alike
      internal
    }
  def apply[T, F1 <: Closeable](f: => F1, andThen: F1 => T): Future[T] =
    withClose(f, { c: F1 => Future(andThen(c)) })
  def apply[T, F1 <: Closeable, F2 <: Closeable](f1: => F1, thenF2: F1 => F2, andThen: (F1, F2) => T): Future[T] =
    withClose(f1, { c1: F1 => CloseableFuture(thenF2(c1), { c2: F2 => andThen(c1, c2) }) })
}
After I open a java.io.Socket and a java.io.InputStream for it, and then execute code which reads from the whois server, both of them finally get closed. Full code:
CloseableFuture(
  { new Socket(server.address, WhoisPort) },
  (s: Socket) => s.getInputStream,
  (socket: Socket, inputStream: InputStream) => {
    val streamReader = new InputStreamReader(inputStream)
    val bufferReader = new BufferedReader(streamReader)
    val outputStream = socket.getOutputStream
    val writer = new OutputStreamWriter(outputStream)
    val bufferWriter = new BufferedWriter(writer)
    bufferWriter.write(urlToAsk + System.getProperty("line.separator"))
    bufferWriter.flush()
    def readBuffer(acc: List[String]): List[String] = bufferReader.readLine() match {
      case null => acc
      case str  => readBuffer(str :: acc)
    }
    val result = readBuffer(Nil).reverse.mkString("\r\n")
    WhoisResult(urlToAsk, result)
  }
)