Akka streams change return type of 3rd party flow/stage - scala

I have a graph that reads from sqs, writes to another system and then deletes from sqs. In order to delete from sqs i need a receipt handle on the SqsMessage object
In the case of Http flows the signature of the flow allows me to say which type gets emitted downstream from the flow,
Flow[(HttpRequest, T), (Try[HttpResponse], T), HostConnectionPool]
In this case i can set T to SqsMessage and i still have all the data i need.
However some connectors e.g google cloud pub sub emits a completely useless (to me) pub sub id.
Downstream of the pub sub flow I need to be able to access the sqs message id which i had prior to the pub sub flow.
What is the best way to work around this without rewriting the pub sub connector
I conceptually want something a bit like this:
Flow[SqsMessage] //i have my data at this point
within(
.map(toPubSubMessage)
.via(pubSub))
... from here i have the same type i had before within however it still behaves like a linear graph with back pressure etc

You can use PassThrough integration pattern.
As example of usage look on akka-streams-kafka -> Class akka.kafka.scaladsl.Producer -> Mehtod def flow[K, V, PassThrough]
So just implement your own Stage with PassThrough element, example inakka.kafka.internal.ProducerStage[K, V, PassThrough]
package my.package
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.Future
import scala.util.{Failure, Success, Try}
import akka.stream._
import akka.stream.ActorAttributes.SupervisionStrategy
import akka.stream.stage._
final case class Message[V, PassThrough](record: V, passThrough: PassThrough)
final case class Result[R, PassThrough](result: R, message: PassThrough)
class PathThroughStage[R, V, PassThrough]
extends GraphStage[FlowShape[Message[V, PassThrough], Future[Result[R, PassThrough]]]] {
private val in = Inlet[Message[V, PassThrough]]("messages")
private val out = Outlet[Result[R, PassThrough]]("result")
override val shape = FlowShape(in, out)
override protected def createLogic(inheritedAttributes: Attributes) = {
val logic = new GraphStageLogic(shape) with StageLogging {
lazy val decider = inheritedAttributes.get[SupervisionStrategy]
.map(_.decider)
.getOrElse(Supervision.stoppingDecider)
val awaitingConfirmation = new AtomicInteger(0)
#volatile var inIsClosed = false
var completionState: Option[Try[Unit]] = None
override protected def logSource: Class[_] = classOf[PathThroughStage[R, V, PassThrough]]
def checkForCompletion() = {
if (isClosed(in) && awaitingConfirmation.get == 0) {
completionState match {
case Some(Success(_)) => completeStage()
case Some(Failure(ex)) => failStage(ex)
case None => failStage(new IllegalStateException("Stage completed, but there is no info about status"))
}
}
}
val checkForCompletionCB = getAsyncCallback[Unit] { _ =>
checkForCompletion()
}
val failStageCb = getAsyncCallback[Throwable] { ex =>
failStage(ex)
}
setHandler(out, new OutHandler {
override def onPull() = {
tryPull(in)
}
})
setHandler(in, new InHandler {
override def onPush() = {
val msg = grab(in)
val f = Future[Result[R, PassThrough]] {
try {
Result(// TODO YOUR logic
msg.record,
msg.passThrough)
} catch {
case exception: Exception =>
decider(exception) match {
case Supervision.Stop =>
failStageCb.invoke(exception)
case _ =>
Result(exception, msg.passThrough)
}
}
if (awaitingConfirmation.decrementAndGet() == 0 && inIsClosed) checkForCompletionCB.invoke(())
}
awaitingConfirmation.incrementAndGet()
push(out, f)
}
override def onUpstreamFinish() = {
inIsClosed = true
completionState = Some(Success(()))
checkForCompletion()
}
override def onUpstreamFailure(ex: Throwable) = {
inIsClosed = true
completionState = Some(Failure(ex))
checkForCompletion()
}
})
override def postStop() = {
log.debug("Stage completed")
super.postStop()
}
}
logic
}
}

Related

Scala stream and ExecutionContext issue

I'm new in Scala and i'm facing a few problems in my assignment :
I want to build a stream class that can do 3 main tasks : filter,map,and forEach.
My streams data is an array of elements. Each of the 3 main tasks should run in 2 different threads on my streams array.
In addition, I need to divde the logic of the action and its actual run to two different parts. First declare all tasks in stream and only when I run stream.run() I want the actual actions to happen.
My code :
class LearningStream[A]() {
val es: ExecutorService = Executors.newFixedThreadPool(2)
val ec = ExecutionContext.fromExecutorService(es)
var streamValues: ArrayBuffer[A] = ArrayBuffer[A]()
var r: Runnable = () => "";
def setValues(streamv: ArrayBuffer[A]) = {
streamValues = streamv;
}
def filter(p: A => Boolean): LearningStream[A] = {
var ls_filtered: LearningStream[A] = new LearningStream[A]()
r = () => {
println("running real filter..")
val (l,r) = streamValues.splitAt(streamValues.length/2)
val a:ArrayBuffer[A]=es.submit(()=>l.filter(p)).get()
val b:ArrayBuffer[A]=es.submit(()=>r.filter(p)).get()
ms_filtered.setValues(a++b)
}
return ls_filtered
}
def map[B](f: A => B): LearningStream[B] = {
var ls_map: LearningStream[B] = new LearningStream[B]()
r = () => {
println("running real map..")
val (l,r) = streamValues.splitAt(streamValues.length/2)
val a:ArrayBuffer[B]=es.submit(()=>l.map(f)).get()
val b:ArrayBuffer[B]=es.submit(()=>r.map(f)).get()
ls_map.setValues(a++b)
}
return ls_map
}
def forEach(c: A => Unit): Unit = {
r=()=>{
println("running real forEach")
streamValues.foreach(c)}
}
def insert(a: A): Unit = {
streamValues += a
}
def start(): Unit = {
ec.submit(r)
}
def shutdown(): Unit = {
ec.shutdown()
}
}
my main :
def main(args: Array[String]): Unit = {
var factorial=0
val s = new LearningStream[String]
s.filter(str=>str.startsWith("-")).map(s=>s.toInt*(-1)).forEach(i=>factorial=factorial*i)
for(i <- -5 to 5){
s.insert(i.toString)
}
println(s.streamValues)
s.start()
println(factorial)
}
The main prints only the filter`s output and the factorial isnt changed (still 1).
What am I missing here ?
My solution: #Levi Ramsey left a few good hints in the comments if you want to get hints and not the real solution.
First problem: Only one command (filter) run and the other didn't. solution: insert to the runnable of each command a call for the next stream via:
ec.submit(ms_map.r)
In order to be able to close all sessions, we need to add another LearningStream data member to the class. However we can't add just a regular LearningStream object because it depends on parameter [A]. Therefore, I implemented a trait that has the close function and my data member was of that trait type.

How to unit test BroadcastProcessFunction in flink when processElement depends on broadcasted data

I implemented a flink stream with a BroadcastProcessFunction. From the processBroadcastElement I get my model and I apply it on my event in processElement.
I don't find a way to unit test my stream as I don't find a solution to ensure the model is dispatched prior to the first event.
I would say there are two ways for achieving this:
1. Find a solution to have the model pushed in the stream first
2. Have the broadcast state filled with the model prio to the execution of the stream so that it is restored
I may have missed something, but I have not found an simple way to do this.
Here is a simple unit test with my issue:
import org.apache.flink.api.common.state.MapStateDescriptor
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector
import org.scalatest.Matchers._
import org.scalatest.{BeforeAndAfter, FunSuite}
import scala.collection.mutable
class BroadCastProcessor extends BroadcastProcessFunction[Int, (Int, String), String] {
import BroadCastProcessor._
override def processElement(value: Int,
ctx: BroadcastProcessFunction[Int, (Int, String), String]#ReadOnlyContext,
out: Collector[String]): Unit = {
val broadcastState = ctx.getBroadcastState(broadcastStateDescriptor)
if (broadcastState.contains(value)) {
out.collect(broadcastState.get(value))
}
}
override def processBroadcastElement(value: (Int, String),
ctx: BroadcastProcessFunction[Int, (Int, String), String]#Context,
out: Collector[String]): Unit = {
ctx.getBroadcastState(broadcastStateDescriptor).put(value._1, value._2)
}
}
object BroadCastProcessor {
val broadcastStateDescriptor: MapStateDescriptor[Int, String] = new MapStateDescriptor[Int, String]("int_to_string", classOf[Int], classOf[String])
}
class CollectSink extends SinkFunction[String] {
import CollectSink._
override def invoke(value: String): Unit = {
values += value
}
}
object CollectSink { // must be static
val values: mutable.MutableList[String] = mutable.MutableList[String]()
}
class BroadCastProcessTest extends FunSuite with BeforeAndAfter {
before {
CollectSink.values.clear()
}
test("add_elem_to_broadcast_and_process_should_apply_broadcast_rule") {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
val dataToProcessStream = env.fromElements(1)
val ruleToBroadcastStream = env.fromElements(1 -> "1", 2 -> "2", 3 -> "3")
val broadcastStream = ruleToBroadcastStream.broadcast(BroadCastProcessor.broadcastStateDescriptor)
dataToProcessStream
.connect(broadcastStream)
.process(new BroadCastProcessor)
.addSink(new CollectSink())
// execute
env.execute()
CollectSink.values should contain("1")
}
}
Update thanks to David Anderson
I went for the buffer solution. I defined a process function for the synchronization:
class SynchronizeModelAndEvent(modelNumberToWaitFor: Int) extends CoProcessFunction[Int, (Int, String), Int] {
val eventBuffer: mutable.MutableList[Int] = mutable.MutableList[Int]()
var modelEventsNumber = 0
override def processElement1(value: Int, ctx: CoProcessFunction[Int, (Int, String), Int]#Context, out: Collector[Int]): Unit = {
if (modelEventsNumber < modelNumberToWaitFor) {
eventBuffer += value
return
}
out.collect(value)
}
override def processElement2(value: (Int, String), ctx: CoProcessFunction[Int, (Int, String), Int]#Context, out: Collector[Int]): Unit = {
modelEventsNumber += 1
if (modelEventsNumber >= modelNumberToWaitFor) {
eventBuffer.foreach(event => out.collect(event))
}
}
}
And so I need to add it to my stream:
dataToProcessStream
.connect(ruleToBroadcastStream)
.process(new SynchronizeModelAndEvent(3))
.connect(broadcastStream)
.process(new BroadCastProcessor)
.addSink(new CollectSink())
Thanks
There isn't an easy way to do this. You could have processElement buffer all of its input until the model has been received by processBroadcastElement. Or run the job once with no event traffic and take a savepoint once the model has been broadcast. Then restore that savepoint into the same job, but with its event input connected.
By the way, the capability you are looking for is often referred to as "side inputs" in the Flink community.
Thanks to David Anderson and Matthieu I wrote this generic CoFlatMap function that makes the requested delay on the event stream:
import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.util.Collector
import scala.collection.mutable
class SynchronizeEventsWithRules[A,B](rulesToWait: Int) extends CoProcessFunction[A, B, A] {
val eventBuffer: mutable.MutableList[A] = mutable.MutableList[A]()
var processedRules = 0
override def processElement1(value: A, ctx: CoProcessFunction[A, B, A]#Context, out: Collector[A]): Unit = {
if (processedRules < rulesToWait) {
println("1 item buffered")
println(rulesToWait+"--"+processedRules)
eventBuffer += value
return
}
eventBuffer.clear()
println("send input to output without buffering:")
out.collect(value)
}
override def processElement2(value: B, ctx: CoProcessFunction[A, B, A]#Context, out: Collector[A]): Unit = {
processedRules += 1
println("1 rule processed processedRules:"+processedRules)
if (processedRules >= rulesToWait && eventBuffer.length>0) {
println("send buffered data to output")
eventBuffer.foreach(event => out.collect(event))
eventBuffer.clear()
}
}
}
but unfortunately, it does not help at all in my case, because the subject under the test was a KeyedBroadcastProcessFunction that makes the delay on event data irrelevant because of that I tried to apply a flatmap that makes the rule stream n times larger, n was the number of CPUs. so I will sure that the resulting event stream will be always sync with the rule stream and will be arrived after that but it does not help either.
after all, I came to this simple solution that of course it is not deterministic but because of the nature of parallelism and concurrency the problem itself is not deterministic either.
If we set the delayMilis big enough (>100) the result will be deterministic:
val delayMilis = 100
val synchronizedInput = inputEventStream.map(x=>{
Thread.sleep(delayMilis)
x
}).keyBy(_.someKey)
you can also change the mapping function to this to apply the delay only on the first element:
package util
import org.apache.flink.api.common.functions.MapFunction
class DelayEvents[T](delayMilis: Int) extends MapFunction[T,T] {
var delayed = false
override def map(value: T): T = {
if (!delayed) {
delayed=true
Thread.sleep(delayMilis)
}
value
}
}
val delayMilis = 100
val synchronizedInput = inputEventStream.map(new DelayEvents(100)).keyBy(_.someKey)

How to deal with future inside a customized akka Sink?

I am trying to implement a customized Akka Sink, but I could not find a way to handle future inside it.
class EventSink(...) {
val in: Inlet[EventEnvelope2] = Inlet("EventSink")
override val shape: SinkShape[EventEnvelope2] = SinkShape(in)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = {
new GraphStageLogic(shape) {
// This requests one element at the Sink startup.
override def preStart(): Unit = pull(in)
setHandler(in, new InHandler {
override def onPush(): Unit = {
val future = handle(grab(in))
Await.ready(future, Duration.Inf)
/*
future.onComplete {
case Success(_) =>
logger.info("pulling next events")
pull(in)
case Failure(failure) =>
logger.error(failure.getMessage, failure)
throw failure
}*/
pull(in)
}
})
}
}
private def handle(envelope: EventEnvelope2): Future[Unit] = {
val EventEnvelope2(query.Sequence(offset), _/*persistenceId*/, _/*sequenceNr*/, event) = envelope
...
db.run(statements.transactionally)
}
}
I have to go with blocking future at the moment, which does not look good. The non-blocking one I commented out only works for the first event. Could anyone please help?
Updated Thanks #ViktorKlang. It seems to be working now.
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
{
new GraphStageLogic(shape) {
val callback = getAsyncCallback[Try[Unit]] {
case Success(_) =>
//completeStage()
pull(in)
case Failure(error) =>
failStage(error)
}
// This requests one element at the Sink startup.
override def preStart(): Unit = {
pull(in)
}
setHandler(in, new InHandler {
override def onPush(): Unit = {
val future = handle(grab(in))
future.onComplete { result =>
callback.invoke(result)
}
}
})
}
}
I am trying to implement a Rational DB event sink connnecting to ReadJournal.eventsByTag. So this is a continuous stream, which will never end unless there is an error - This is what I want. Is my approach correct?
Two more questions:
Will the GraphStage never end unless I manually invoke completeStage or failStage?
Am I right or normal to declare callback outside preStart method? and Am I right to invoke pull(in) in preStart in this case?
Thanks,
Cheng
Avoid Custom Stages
In general, you should try to exhaust all possibilities with the given methods of the library's Source, Flow, and Sink. Custom stages are almost never necessary and make your code difficult to maintain.
Writing Your "Custom" Stage Using Standard Methods
Based on the details of your question's example code I don't see any reason why you would use a custom Sink to begin with.
Given your handle method, you could slightly modify it to do the logging that you specified in the question:
val loggedHandle : (EventEnvelope2) => Future[Unit] =
handle(_) transform {
case Success(_) => {
logger.info("pulling next events")
Success(Unit)
}
case Failure(failure) => {
logger.error(failure.getMessage, failure)
Failure(failure)
}
}
Then just use Sink.foreachParallel to handle the envelopes:
val createEventEnvelope2Sink : (Int) => Sink[EventEnvelope2, Future[Done]] =
(parallelism) =>
Sink[EventEnvelope2].foreachParallel(parallelism)(handle _)
Now, even if you want each EventEnvelope2 to be sent to the db in order you can just use 1 for parallelism:
val inOrderDBInsertSink : Sink[EventEnvelope2, Future[Done]] =
createEventEnvelope2Sink(1)
Also, if the database throws an exception you can still get a hold of it when the foreachParallel completes:
val someEnvelopeSource : Source[EventEnvelope2, _] = ???
someEnvelopeSource
.to(createEventEnvelope2Sink(1))
.run()
.andThen {
case Failure(throwable) => { /*deal with db exception*/ }
case Success(_) => { /*all inserts succeeded*/ }
}

Asynchronous Iterable over remote data

There is some data that I have pulled from a remote API, for which I use a Future-style interface. The data is structured as a linked-list. A relevant example data container is shown below.
case class Data(information: Int) {
def hasNext: Boolean = ??? // Implemented
def next: Future[Data] = ??? // Implemented
}
Now I'm interested in adding some functionality to the data class, such as map, foreach, reduce, etc. To do so I want to implement some form of IterableLike such that it inherets these methods.
Given below is the trait Data may extend, such that it gets this property.
trait AsyncIterable[+T]
extends IterableLike[Future[T], AsyncIterable[T]]
{
def hasNext : Boolean
def next : Future[T]
// How to implement?
override def iterator: Iterator[Future[T]] = ???
override protected[this] def newBuilder: mutable.Builder[Future[T], AsyncIterable[T]] = ???
override def seq: TraversableOnce[Future[T]] = ???
}
It should be a non-blocking implementation, which when acted on, starts requesting the next data from the remote data source.
It is then possible to do cool stuff such as
case class Data(information: Int) extends AsyncIterable[Data]
val data = Data(1) // And more, of course
// Asynchronously print all the information.
data.foreach(data => println(data.information))
It is also acceptable for the interface to be different. But the result should in some way represent asynchronous iteration over the collection. Preferably in a way that is familiar to developers, as it will be part of an (open source) library.
In production I would use one of following:
Akka Streams
Reactive Extensions
For private tests I would implement something similar to following.
(Explanations are below)
I have modified a little bit your Data:
abstract class AsyncIterator[T] extends Iterator[Future[T]] {
def hasNext: Boolean
def next(): Future[T]
}
For it we can implement this Iterable:
class AsyncIterable[T](sourceIterator: AsyncIterator[T])
extends IterableLike[Future[T], AsyncIterable[T]]
{
private def stream(): Stream[Future[T]] =
if(sourceIterator.hasNext) {sourceIterator.next #:: stream()} else {Stream.empty}
val asStream = stream()
override def iterator = asStream.iterator
override def seq = asStream.seq
override protected[this] def newBuilder = throw new UnsupportedOperationException()
}
And if see it in action using following code:
object Example extends App {
val source = "Hello World!";
val iterator1 = new DelayedIterator[Char](100L, source.toCharArray)
new AsyncIterable(iterator1).foreach(_.foreach(print)) //prints 1 char per 100 ms
pause(2000L)
val iterator2 = new DelayedIterator[String](100L, source.toCharArray.map(_.toString))
new AsyncIterable(iterator2).reduceLeft((fl: Future[String], fr) =>
for(l <- fl; r <- fr) yield {println(s"$l+$r"); l + r}) //prints 1 line per 100 ms
pause(2000L)
def pause(duration: Long) = {println("->"); Thread.sleep(duration); println("\n<-")}
}
class DelayedIterator[T](delay: Long, data: Seq[T]) extends AsyncIterator[T] {
private val dataIterator = data.iterator
private var nextTime = System.currentTimeMillis() + delay
override def hasNext = dataIterator.hasNext
override def next = {
val thisTime = math.max(System.currentTimeMillis(), nextTime)
val thisValue = dataIterator.next()
nextTime = thisTime + delay
Future {
val now = System.currentTimeMillis()
if(thisTime > now) Thread.sleep(thisTime - now) //Your implementation will be better
thisValue
}
}
}
Explanation
AsyncIterable uses Stream because it's calculated lazily and it's simple.
Pros:
simplicity
multiple calls to iterator and seq methods return same iterable with all items.
Cons:
could lead to memory overflow because stream keeps all prevously obtained values.
first value is eagerly gotten during creation of AsyncIterable
DelayedIterator is very simplistic implementation of AsyncIterator, don't blame me for quick and dirty code here.
It's still strange for me to see synchronous hasNext and asynchronous next()
Using Twitter Spool I've implemented a working example.
To implement spool I modified the example in the documentation.
import com.twitter.concurrent.Spool
import com.twitter.util.{Await, Return, Promise}
import scala.concurrent.{ExecutionContext, Future}
trait AsyncIterable[+T <: AsyncIterable[T]] { self : T =>
def hasNext : Boolean
def next : Future[T]
def spool(implicit ec: ExecutionContext) : Spool[T] = {
def fill(currentPage: Future[T], rest: Promise[Spool[T]]) {
currentPage foreach { cPage =>
if(hasNext) {
val nextSpool = new Promise[Spool[T]]
rest() = Return(cPage *:: nextSpool)
fill(next, nextSpool)
} else {
val emptySpool = new Promise[Spool[T]]
emptySpool() = Return(Spool.empty[T])
rest() = Return(cPage *:: emptySpool)
}
}
}
val rest = new Promise[Spool[T]]
if(hasNext) {
fill(next, rest)
} else {
rest() = Return(Spool.empty[T])
}
self *:: rest
}
}
Data is the same as before, and now we can use it.
// Cool stuff
implicit val ec = scala.concurrent.ExecutionContext.global
val data = Data(1) // And others
// Print all the information asynchronously
val fut = data.spool.foreach(data => println(data.information))
Await.ready(fut)
It will trow an exception on the second element, because the implementation of next was not provided.

What Automatic Resource Management alternatives exist for Scala?

I have seen many examples of ARM (automatic resource management) on the web for Scala. It seems to be a rite-of-passage to write one, though most look pretty much like one another. I did see a pretty cool example using continuations, though.
At any rate, a lot of that code has flaws of one type or another, so I figured it would be a good idea to have a reference here on Stack Overflow, where we can vote up the most correct and appropriate versions.
Chris Hansen's blog entry 'ARM Blocks in Scala: Revisited' from 3/26/09 talks about about slide 21 of Martin Odersky's FOSDEM presentation. This next block is taken straight from slide 21 (with permission):
def using[T <: { def close() }]
(resource: T)
(block: T => Unit)
{
try {
block(resource)
} finally {
if (resource != null) resource.close()
}
}
--end quote--
Then we can call like this:
using(new BufferedReader(new FileReader("file"))) { r =>
var count = 0
while (r.readLine != null) count += 1
println(count)
}
What are the drawbacks of this approach? That pattern would seem to address 95% of where I would need automatic resource management...
Edit: added code snippet
Edit2: extending the design pattern - taking inspiration from python with statement and addressing:
statements to run before the block
re-throwing exception depending on the managed resource
handling two resources with one single using statement
resource-specific handling by providing an implicit conversion and a Managed class
This is with Scala 2.8.
trait Managed[T] {
def onEnter(): T
def onExit(t:Throwable = null): Unit
def attempt(block: => Unit): Unit = {
try { block } finally {}
}
}
def using[T <: Any](managed: Managed[T])(block: T => Unit) {
val resource = managed.onEnter()
var exception = false
try { block(resource) } catch {
case t:Throwable => exception = true; managed.onExit(t)
} finally {
if (!exception) managed.onExit()
}
}
def using[T <: Any, U <: Any]
(managed1: Managed[T], managed2: Managed[U])
(block: T => U => Unit) {
using[T](managed1) { r =>
using[U](managed2) { s => block(r)(s) }
}
}
class ManagedOS(out:OutputStream) extends Managed[OutputStream] {
def onEnter(): OutputStream = out
def onExit(t:Throwable = null): Unit = {
attempt(out.close())
if (t != null) throw t
}
}
class ManagedIS(in:InputStream) extends Managed[InputStream] {
def onEnter(): InputStream = in
def onExit(t:Throwable = null): Unit = {
attempt(in.close())
if (t != null) throw t
}
}
implicit def os2managed(out:OutputStream): Managed[OutputStream] = {
return new ManagedOS(out)
}
implicit def is2managed(in:InputStream): Managed[InputStream] = {
return new ManagedIS(in)
}
def main(args:Array[String]): Unit = {
using(new FileInputStream("foo.txt"), new FileOutputStream("bar.txt")) {
in => out =>
Iterator continually { in.read() } takeWhile( _ != -1) foreach {
out.write(_)
}
}
}
Daniel,
I've just recently deployed the scala-arm library for automatic resource management. You can find the documentation here: https://github.com/jsuereth/scala-arm/wiki
This library supports three styles of usage (currently):
1) Imperative/for-expression:
import resource._
for(input <- managed(new FileInputStream("test.txt")) {
// Code that uses the input as a FileInputStream
}
2) Monadic-style
import resource._
import java.io._
val lines = for { input <- managed(new FileInputStream("test.txt"))
val bufferedReader = new BufferedReader(new InputStreamReader(input))
line <- makeBufferedReaderLineIterator(bufferedReader)
} yield line.trim()
lines foreach println
3) Delimited Continuations-style
Here's an "echo" tcp server:
import java.io._
import util.continuations._
import resource._
def each_line_from(r : BufferedReader) : String #suspendable =
shift { k =>
var line = r.readLine
while(line != null) {
k(line)
line = r.readLine
}
}
reset {
val server = managed(new ServerSocket(8007)) !
while(true) {
// This reset is not needed, however the below denotes a "flow" of execution that can be deferred.
// One can envision an asynchronous execuction model that would support the exact same semantics as below.
reset {
val connection = managed(server.accept) !
val output = managed(connection.getOutputStream) !
val input = managed(connection.getInputStream) !
val writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(output)))
val reader = new BufferedReader(new InputStreamReader(input))
writer.println(each_line_from(reader))
writer.flush()
}
}
}
The code makes uses of a Resource type-trait, so it's able to adapt to most resource types. It has a fallback to use structural typing against classes with either a close or dispose method. Please check out the documentation and let me know if you think of any handy features to add.
Here's James Iry solution using continuations:
// standard using block definition
def using[X <: {def close()}, A](resource : X)(f : X => A) = {
try {
f(resource)
} finally {
resource.close()
}
}
// A DC version of 'using'
def resource[X <: {def close()}, B](res : X) = shift(using[X, B](res))
// some sugar for reset
def withResources[A, C](x : => A #cps[A, C]) = reset{x}
Here are the solutions with and without continuations for comparison:
def copyFileCPS = using(new BufferedReader(new FileReader("test.txt"))) {
reader => {
using(new BufferedWriter(new FileWriter("test_copy.txt"))) {
writer => {
var line = reader.readLine
var count = 0
while (line != null) {
count += 1
writer.write(line)
writer.newLine
line = reader.readLine
}
count
}
}
}
}
def copyFileDC = withResources {
val reader = resource[BufferedReader,Int](new BufferedReader(new FileReader("test.txt")))
val writer = resource[BufferedWriter,Int](new BufferedWriter(new FileWriter("test_copy.txt")))
var line = reader.readLine
var count = 0
while(line != null) {
count += 1
writer write line
writer.newLine
line = reader.readLine
}
count
}
And here's Tiark Rompf's suggestion of improvement:
trait ContextType[B]
def forceContextType[B]: ContextType[B] = null
// A DC version of 'using'
def resource[X <: {def close()}, B: ContextType](res : X): X #cps[B,B] = shift(using[X, B](res))
// some sugar for reset
def withResources[A](x : => A #cps[A, A]) = reset{x}
// and now use our new lib
def copyFileDC = withResources {
implicit val _ = forceContextType[Int]
val reader = resource(new BufferedReader(new FileReader("test.txt")))
val writer = resource(new BufferedWriter(new FileWriter("test_copy.txt")))
var line = reader.readLine
var count = 0
while(line != null) {
count += 1
writer write line
writer.newLine
line = reader.readLine
}
count
}
For now Scala 2.13 has finally supported: try with resources by using Using :), Example:
val lines: Try[Seq[String]] =
Using(new BufferedReader(new FileReader("file.txt"))) { reader =>
Iterator.unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
}
or using Using.resource avoid Try
val lines: Seq[String] =
Using.resource(new BufferedReader(new FileReader("file.txt"))) { reader =>
Iterator.unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
}
You can find more examples from Using doc.
A utility for performing automatic resource management. It can be used to perform an operation using resources, after which it releases the resources in reverse order of their creation.
I see a gradual 4 step evolution for doing ARM in Scala:
No ARM: Dirt
Only closures: Better, but multiple nested blocks
Continuation Monad: Use For to flatten the nesting, but unnatural separation in 2 blocks
Direct style continuations: Nirava, aha! This is also the most type-safe alternative: a resource outside withResource block will be type error.
There is light-weight (10 lines of code) ARM included with better-files. See: https://github.com/pathikrit/better-files#lightweight-arm
import better.files._
for {
in <- inputStream.autoClosed
out <- outputStream.autoClosed
} in.pipeTo(out)
// The input and output streams are auto-closed once out of scope
Here is how it is implemented if you don't want the whole library:
type Closeable = {
def close(): Unit
}
type ManagedResource[A <: Closeable] = Traversable[A]
implicit class CloseableOps[A <: Closeable](resource: A) {
def autoClosed: ManagedResource[A] = new Traversable[A] {
override def foreach[U](f: A => U) = try {
f(resource)
} finally {
resource.close()
}
}
}
How about using Type classes
trait GenericDisposable[-T] {
def dispose(v:T):Unit
}
...
def using[T,U](r:T)(block:T => U)(implicit disp:GenericDisposable[T]):U = try {
block(r)
} finally {
Option(r).foreach { r => disp.dispose(r) }
}
Another alternative is Choppy's Lazy TryClose monad. It's pretty good with database connections:
val ds = new JdbcDataSource()
val output = for {
conn <- TryClose(ds.getConnection())
ps <- TryClose(conn.prepareStatement("select * from MyTable"))
rs <- TryClose.wrap(ps.executeQuery())
} yield wrap(extractResult(rs))
// Note that Nothing will actually be done until 'resolve' is called
output.resolve match {
case Success(result) => // Do something
case Failure(e) => // Handle Stuff
}
And with streams:
val output = for {
outputStream <- TryClose(new ByteArrayOutputStream())
gzipOutputStream <- TryClose(new GZIPOutputStream(outputStream))
_ <- TryClose.wrap(gzipOutputStream.write(content))
} yield wrap({gzipOutputStream.flush(); outputStream.toByteArray})
output.resolve.unwrap match {
case Success(bytes) => // process result
case Failure(e) => // handle exception
}
More info here: https://github.com/choppythelumberjack/tryclose
Here is #chengpohi's answer, modified so it works with Scala 2.8+, instead of just Scala 2.13 (yes, it works with Scala 2.13 also):
def unfold[A, S](start: S)(op: S => Option[(A, S)]): List[A] =
Iterator
.iterate(op(start))(_.flatMap{ case (_, s) => op(s) })
.map(_.map(_._1))
.takeWhile(_.isDefined)
.flatten
.toList
def using[A <: AutoCloseable, B](resource: A)
(block: A => B): B =
try block(resource) finally resource.close()
val lines: Seq[String] =
using(new BufferedReader(new FileReader("file.txt"))) { reader =>
unfold(())(_ => Option(reader.readLine()).map(_ -> ())).toList
}
While Using is OK, I prefer the monadic style of resource composition. Twitter Util's Managed is pretty nice, except for its dependencies and its not-very-polished API.
To that end, I've published https://github.com/dvgica/managerial for Scala 2.12, 2.13, and 3.0.0. Largely based on the Twitter Util Managed code, no dependencies, with some API improvements inspired by cats-effect Resource.
The simple example:
import ca.dvgi.managerial._
val fileContents = Managed.from(scala.io.Source.fromFile("file.txt")).use(_.mkString)
But the real strength of the library is composing resources via for comprehensions.
Let me know what you think!