Suppose that I want to convert some legacy asynchronous API into FS2 Streams.
The API provides an interface with 3 callbacks: next element, success, error.
I'd like the Stream to emit all the elements and then complete upon receiving the success or error callback.
The FS2 guide (https://functional-streams-for-scala.github.io/fs2/guide.html) suggests using fs2.Queue for such situations.
It works great for enqueueing, but all the examples I've seen so far assume that the stream returned by queue.dequeue never completes,
so there's no obvious way to handle the success/error callbacks in my situation.
I've tried queue.dequeue.interruptWhen(...here goes the signal...), but if the success/error callback arrives before the client has read all the data from the stream,
the stream is terminated prematurely while unread elements remain. I'd like the consumer to finish reading them before the stream completes.
Is it possible to do that with FS2? With Akka Streams it's trivial - SourceQueueWithComplete has complete and fail methods.
UPDATE:
I was able to get good enough result by wrapping elements in Option and considering None as a signal to stop reading the stream, and additionally by using a Promise to propagate errors:
queue.dequeue
  .interruptWhen(interruptingPromise.get)
  .takeWhile(_.isDefined).map(_.get)
However, did I overlook a more natural way of doing such things?
One idiomatic way to do this is to create a Queue[Option[A]] instead of a Queue[A]. When enqueueing, wrap each element in Some, and explicitly enqueue None to signal completion. On the dequeueing side, call q.dequeue.unNoneTerminate, which gives you a Stream[F, A] that terminates once the Queue emits None.
Answer to your update: combine unNoneTerminate with rethrow, which takes a Stream[F, Either[Throwable, A]] and returns a Stream[F, A] that fails with Stream.raiseError when it encounters a Left.
Your complete stack would then be a Stream[F, Either[Throwable, Option[A]]], which you unwrap into a Stream[F, A] by calling .rethrow.unNoneTerminate.
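A minimal sketch of that approach, assuming fs2 2.x on cats-effect 2 (in other versions Queue lives elsewhere and callbacks are usually bridged differently). The Listener trait and the register function are hypothetical stand-ins for the legacy API, not part of any library:

```scala
import cats.effect.{ContextShift, IO}
import fs2.Stream
import fs2.concurrent.Queue
import scala.concurrent.ExecutionContext

object CallbackBridge {
  // cats-effect 2 needs a ContextShift to derive Concurrent[IO] for the queue.
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

  // Hypothetical stand-in for the legacy API's three callbacks.
  trait Listener[A] {
    def onNext(a: A): Unit
    def onSuccess(): Unit
    def onError(t: Throwable): Unit
  }

  // Each callback enqueues exactly one message:
  // Right(Some(a)) for an element, Right(None) for success, Left(t) for failure.
  def listenerFor[A](q: Queue[IO, Either[Throwable, Option[A]]]): Listener[A] =
    new Listener[A] {
      def onNext(a: A): Unit = q.enqueue1(Right(Some(a))).unsafeRunSync()
      def onSuccess(): Unit = q.enqueue1(Right(None)).unsafeRunSync()
      def onError(t: Throwable): Unit = q.enqueue1(Left(t)).unsafeRunSync()
    }

  // Registers the listener, then drains the queue until None or a Left arrives.
  def toStream[A](register: Listener[A] => Unit): Stream[IO, A] =
    Stream.eval(Queue.unbounded[IO, Either[Throwable, Option[A]]]).flatMap { q =>
      Stream.eval(IO(register(listenerFor(q)))) >> q.dequeue.rethrow.unNoneTerminate
    }
}
```

Because completion travels through the same queue as the elements, everything enqueued before the success or error message is still delivered to the consumer, and anything enqueued after it is ignored.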
I need to fill in the methods next and hasNext while preserving laziness:
new Iterator[T] {
  val stream: fs2.Stream[IO, T] = ...
  def next(): T = ???
  def hasNext: Boolean = ???
}
But I cannot figure out how on earth to do this from a fs2.Stream. All the methods on a Stream (or on the "compiled" thing) are fairly useless for this purpose.
If this is simply impossible to do in a reasonable amount of code, then that itself is a satisfactory answer and we will just rip out fs2.Stream from the codebase - just want to check first!
fs2.Stream, while similar in concept to Iterator, cannot be converted to one while preserving laziness. I'll try to elaborate on why...
Both represent a pull-based series of items, but the way in which they represent that series and implement the laziness differs too much.
As you already know, Iterator represents its pull in terms of the next() and hasNext methods, both of which are synchronous and blocking. To consume the iterator and return a value, you can directly call those methods e.g. in a loop, or use one of its many convenience methods.
fs2.Stream supports two capabilities that make it incompatible with that interface:
cats.effect.Resource can be included in the construction of a Stream. For example, you could construct a fs2.Stream[IO, Byte] representing the contents of a file. When consuming that stream, even if you abort early or do some strange flatMap, the underlying Resource is honored and your file handle is guaranteed to be closed. If you were trying to do the same thing with iterator, the "abort early" case would pose problems, forcing you to do something like Iterator[Byte] with Closeable and the caller would have to make sure to .close() it, or some other pattern.
Evaluation of "effects". In this context, effects are types like IO or Future, where the process of obtaining the value may perform some possibly-asynchronous action, and may perform side-effects. Asynchrony poses a problem when trying to force the process into a synchronous interface, since it forces you to block your current thread to wait for the asynchronous answer, which can cause deadlocks if you aren't careful. Libraries like cats-effect strongly discourage you from calling methods like unsafeRunSync.
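To make the first point concrete, here is a rough sketch of the "Iterator with Closeable" pattern, using only the standard library. ResourceIterator and firstElement are hypothetical names, and the closed flag merely stands in for a real file handle:

```scala
import java.io.Closeable

object CloseableIteratorSketch {
  // Hypothetical iterator that owns a resource; `closed` stands in for a file handle.
  final class ResourceIterator[A](items: List[A]) extends Iterator[A] with Closeable {
    private val underlying = items.iterator
    private var closed = false

    def isClosed: Boolean = closed
    def hasNext: Boolean = !closed && underlying.hasNext
    def next(): A = underlying.next()
    def close(): Unit = closed = true
  }

  // "Abort early": read one element, then stop.
  // The caller, not the iterator, has to remember to release the resource.
  def firstElement[A](it: ResourceIterator[A]): Option[A] =
    try { if (it.hasNext) Some(it.next()) else None }
    finally it.close()
}
```

Nothing in the Iterator interface forces callers to write the try/finally, which is the guarantee a Stream built from a Resource provides for you.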
fs2.Stream does allow for some special cases that prevent the inclusion of Resource and Effects, via its Pure type alias which you can use in place of IO. That gets you access to Stream.PureOps, but that only gets you methods that consume the whole stream by building a collection; the laziness you want to preserve would be lost.
Side note: you can convert an Iterator to a Stream.
The only way to "convert" a Stream to an Iterator is to consume it to some collection type via e.g. .compile.toList, which would get you an IO[List[T]], then .map(_.iterator) that to get an IO[Iterator[T]]. But ultimately that doesn't fit what you're asking for since it forces you to consume the stream to a buffer, breaking laziness.
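As a sketch, that buffered conversion looks like this (bufferedIterator is a hypothetical helper name, and the example assumes cats-effect IO as the effect type):

```scala
import cats.effect.IO
import fs2.Stream

object BufferedIteratorSketch {
  // Runs the whole stream into a List first, then exposes that List as an Iterator.
  // The Iterator only exists after every element has been produced, so laziness is lost.
  def bufferedIterator[T](stream: Stream[IO, T]): IO[Iterator[T]] =
    stream.compile.toList.map(_.iterator)
}
```

Note that this never terminates for an infinite stream, since compile.toList has to see the end of the stream before it can return.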
@Dima mentioned the "XY Problem". That comment was poorly received because they didn't (initially) elaborate on the incompatibility, but they're right. It would be helpful to know why you're trying to make a Stream-to-Iterator conversion, in case there's some other approach that would serve your overall goal instead.
I have an infinite fs2.Stream which may encounter errors. I'd like to skip those errors, doing nothing beyond perhaps logging them, and keep streaming further elements. Example:
//An example
val stream = fs2.Stream
  .awakeEvery[IO](1.second)
  .evalMap(_ => IO.raiseError(new RuntimeException))
In this specific case, I'd like to get infinite fs2.Stream of Left(new RuntimeException) emitting every second.
There is a Stream.attempt method, but the stream it produces terminates after the first error is encountered. Is there a way to just skip errors and keep pulling further elements?
Attempting each effect individually, as in IO.raiseError(new RuntimeException).attempt, won't work in general, since it would require attempting every effect at every place in the stream pipeline.
There's no way to handle errors in the way you described.
When a stream encounters its first error, it is terminated. Please check this gitter question.
You can handle it in two ways:
Attempt the effect (but you already mentioned it is not possible in your case).
Restart stream after it is terminated:
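import cats.effect.IO
import fs2.Stream
import scala.concurrent.duration._
// awakeEvery needs an implicit Timer[IO] in scope when using cats-effect 2.
// lazy val, so that the self-reference below also compiles in a local scope
lazy val stream: Stream[IO, Either[Throwable, Unit]] = Stream
  .awakeEvery[IO](1.second)
  .evalMap(_ => IO.raiseError(new RuntimeException))
  .handleErrorWith(t => Stream(Left(t)) ++ stream) // emit the error as a Left, then restart the stream
// the stream restarts forever, so take only the first 3 values
println(stream.take(3).compile.toList.unsafeRunSync())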
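For comparison, the first option listed above (attempting each effect) can be sketched as follows, assuming fs2 on cats-effect 2. The `failing` effect is a hypothetical placeholder for whatever may raise an error, and the interval is shortened so the example finishes quickly:

```scala
import cats.effect.{IO, Timer}
import fs2.Stream
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object AttemptPerEffect {
  // cats-effect 2 needs a Timer for awakeEvery.
  implicit val timer: Timer[IO] = IO.timer(ExecutionContext.global)

  // Hypothetical effect that always fails.
  val failing: IO[Unit] = IO.raiseError(new RuntimeException("boom"))

  // attempt turns the failure into a Left value, so the stream itself never fails
  // and keeps ticking.
  val stream: Stream[IO, Either[Throwable, Unit]] =
    Stream.awakeEvery[IO](100.millis).evalMap(_ => failing.attempt)
}
```

Unlike the restart approach, this keeps the original stream running (the ticks are not reset), but it only covers the effects you explicitly attempt.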
It must be damn simple. But for some reason I cannot make it work.
If I do io.linesR(...), I have a stream of lines of the file, it's ok.
If I do Process.emitAll(), I have a stream of pre-defined values. It also works.
But what I actually need is to produce values for scalaz-stream asynchronously (well, from an Akka actor).
I have tried:
async.unboundedQueue[String]
async.signal[String]
Then I called queue.enqueueOne(...).run or signal.set(...).run and listened to queue.dequeue or signal.discrete, just with .map and .to, using an example that is proven to work with the other kinds of stream (either Process.emitAll or the lines of a file).
What is the secret? What is the preferred way to create a channel to be streamed later? How to feed it with values from another context?
Thanks!
If the values are produced asynchronously but in a way that can be driven from the stream, I've found it easiest to use the "primitive" await method and construct the process "by hand". You need an indirectly recursive function:
def processStep(v: Int): Process[Future, Int] =
  Process.emit(v) ++ Process.await(myActor ? NextValuePlease())(w => processStep(w))
But if you need a truly async process, driven from elsewhere, I've never done that.
While processing a message, is it possible to send a message to another actor, wait for that actor to reply, consume the reply, and then continue? Is something like the following doable?
var lineMap = HashMap[String, Int]()
receive {
  case bigTaskMap =>
    for (line <- readSomeFile) {
      if (lineMap.get(line) == None) {
        anotherActor ! line // that actor will reply with a HashMap which contains the key for line
        receive {
          case x: HashMap[String, Int] => lineMap = x
        }
      }
      lineMap.get(line) // use that value to do further work
    }
}
This answer is for Akka (old Scala actors are deprecated in Scala 2.10).
Yes. You can use ask to get a future (rather than creating a fully-fledged actor yourself) and then call onComplete on the Future returned to set an action which will be executed when the Future's value (or an error) becomes available. Don't worry about how quickly the Future might yield a value - it doesn't matter, because the onComplete action will be executed even if the Future is already available when onComplete is called!
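The claim about already-completed Futures is easy to check with plain scala.concurrent, without any actors involved. In this small sketch, observe is a hypothetical helper, and Await is used only so the demo can return the observed value:

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object OnCompleteSketch {
  // A Future that is already completed before any callback is registered.
  val alreadyDone: Future[Int] = Future.successful(42)

  // Registers an onComplete callback and reports what the callback received.
  def observe[A](f: Future[A]): Try[A] = {
    val seen = Promise[Try[A]]()
    f.onComplete(result => seen.success(result))
    Await.result(seen.future, 1.second) // blocking is for demonstration only
  }
}
```

Calling OnCompleteSketch.observe(OnCompleteSketch.alreadyDone) yields Success(42), so the callback does run even though the Future was complete before onComplete was called.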
However, be very careful: you should not directly access any of the state (i.e. the variables) in the containing actor in your action(s), because the onComplete action(s) will not run in the same execution context as the actor (i.e. they could be running at the same time as the original actor is processing a message). Instead, send further messages back to the original actor, or forward them on.
In fact, in some cases you may find the simplest thing to do is simply to send a message, and let the original actor handle the reply. become and unbecome may help here. However, again, be careful if using become, that in the actor's new behaviour, it doesn't drop "ordinary" messages that should be handled in the ordinary way.
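One way to follow that advice is sketched below, assuming classic (untyped) Akka actors. The Lookup and LookupResult messages, the LineWorker class and the onProcessed hook are all hypothetical and only illustrate the shape: the Future returned by ask is piped back to the actor as an ordinary message, so the actor's state is only ever touched inside receive.

```scala
import akka.actor.{Actor, ActorRef, Status}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._

// Hypothetical message protocol, not part of the original question.
final case class Lookup(line: String)
final case class LookupResult(line: String, key: Int)

class LineWorker(anotherActor: ActorRef, onProcessed: (String, Int) => Unit) extends Actor {
  import context.dispatcher // ExecutionContext used by the Future machinery
  implicit val timeout: Timeout = Timeout(5.seconds)

  // Actor state: read and written only inside receive, never in a Future callback.
  private var lineMap = Map.empty[String, Int]

  def receive: Receive = {
    case line: String =>
      lineMap.get(line) match {
        case Some(key) => onProcessed(line, key)
        case None =>
          // ask returns a Future; pipeTo delivers its result to this actor as a message.
          (anotherActor ? Lookup(line)).mapTo[LookupResult].pipeTo(self)
      }
    case LookupResult(line, key) =>
      lineMap += (line -> key)
      onProcessed(line, key)
    case Status.Failure(_) =>
      () // the ask failed or timed out; a real actor would log or retry here
  }
}
```

The actor stays free to handle other messages while the reply is pending, which is the main difference from the nested receive in the question.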
What is a good way of creating non-blocking methods in Scala? One way I can think of is to create a thread/actor so that the method just sends a message to the thread and returns. Is there a better way of creating a non-blocking method?
Use scala.actors.Future:
import actors._
def asyncify[A, B](f: A => B): A => Future[B] = (a => Futures.future(f(a)))
// normally blocks when called
def sleepFor(seconds: Int) = {
  Thread.sleep(seconds * 1000)
  seconds
}
val asyncSleepFor = asyncify(sleepFor)
val future = asyncSleepFor(5) // now it does NOT block
println("waiting...") // prints "waiting..." right away
println("future returns %d".format(future())) // prints "future returns 5" after 5 seconds
Overloaded "asyncify" that takes a function with more than one parameter is left as an exercise.
One caveat, however, is exception handling. The function that is being "asyncified" has to handle all exceptions itself by catching them. Behavior for exceptions thrown out of the function is undefined.
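Since the old scala.actors library is deprecated, the same idea can be sketched with scala.concurrent.Future instead. This is a substitute technique, not the code from the answer above. One difference worth noting: a non-fatal exception thrown by the wrapped function is captured as a failed Future rather than being left undefined.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Try

object AsyncifySketch {
  // Same shape as the asyncify above, but built on scala.concurrent.Future.
  def asyncify[A, B](f: A => B)(implicit ec: ExecutionContext): A => Future[B] =
    a => Future(f(a))

  // Blocks the calling thread when invoked directly.
  def sleepFor(millis: Int): Int = {
    Thread.sleep(millis.toLong)
    millis
  }

  def demo(): (Int, Boolean) = {
    import ExecutionContext.Implicits.global
    val asyncSleepFor: Int => Future[Int] = asyncify((ms: Int) => sleepFor(ms))
    val failing: Int => Future[Int] =
      asyncify[Int, Int](_ => throw new IllegalStateException("boom"))

    val slept = asyncSleepFor(50) // returns immediately, work runs on the pool
    val failed = failing(0)       // the exception ends up inside the Future

    // Await is used here only to observe the results in a small demo.
    val result = Await.result(slept, 2.seconds)
    val didFail = Try(Await.result(failed, 2.seconds)).isFailure
    (result, didFail)
  }
}
```

In real code you would normally compose the returned Future with map, flatMap or onComplete rather than blocking on it with Await.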
Learn about actors.
It depends on your definition of "blocking." Strictly speaking, anything that requires acquisition of a lock is blocking. All operations that are dependent on an actor's internal state acquire a lock on the actor. This includes message sends. If lots of threads try to send a message to an actor all at once, they have to get in line.
So if you really need non-blocking, there are various options in java.util.concurrent.
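As one illustration of those options, the sketch below uses ConcurrentLinkedQueue and AtomicInteger, both of which avoid locks by relying on compare-and-swap operations. The produce helper and the mailbox naming are hypothetical and only mimic many threads "sending" at once:

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

object LockFreeSketch {
  // Starts `threads` producers that each offer `perThread` messages,
  // then returns (number of sends counted, number of messages queued).
  def produce(threads: Int, perThread: Int): (Int, Int) = {
    val counter = new AtomicInteger(0)
    val mailbox = new ConcurrentLinkedQueue[String]()
    val workers = (1 to threads).map { id =>
      new Thread(new Runnable {
        def run(): Unit =
          (1 to perThread).foreach { n =>
            mailbox.offer(s"$id-$n")  // lock-free enqueue
            counter.incrementAndGet() // atomic increment, no synchronized block
          }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    (counter.get(), mailbox.size())
  }
}
```

No producer ever waits on a lock held by another; under contention a thread simply retries its compare-and-swap.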
That being said, from a practical perspective actors give you something that is close enough to non-blocking, because none of the synchronized operations do a significant amount of work, so chances are actors meet your needs.