I'm a newbie trying to grasp the intuition behind fs2 Queues.
I'm trying to do a basic example for pulling data from a Stream[IO, Int]. But the documentation for me is not enough as it directly dives into advanced stuff directly.
Here what I've done so far:
import cats.effect.{ ExitCode, IO, IOApp}
import fs2._
import fs2.concurrent.Queue
class QueueInt(q: Queue[IO, Int]) {
def startPushingtoQueue: Stream[IO, Unit] = {
Stream(1, 2, 3).covary[IO].through(q.enqueue)
q.dequeue.evalMap(n => IO.delay(println(s"Pulling element $n from Queue")))
}
}
object testingQueues extends IOApp {
override def run(args: List[String]): IO[ExitCode] = {
val stream = for {
q <- Queue.bounded(10)
b = new QueueInt(q)
_ <- b.startPushingtoQueue.drain
} yield ()
}
}
Question 1: I'm getting No implicit argument of type Concurrent[F_],
Knowing that I'm not using any concurrent effect I can not seem to figure out what a am I missing?
Question 2: What can I do to print the results.
Question 3: Can someone direct me to some resources to learn fs2
I found several issues in your code:
If you're getting an error about missing implicits you can very often fix them with explicitly stating type arguments:
q <- Queue.bounded[IO, Unit](10) // it will fix your error with implicits
The resulting type of your for comprehension is IO[Unit], but in order to make it run you'd have to return it from run method. You'd also need to change the type from unit to ExitCode:
stream.as(ExitCode.Success)
In your method startPushingToQueue you're creating Steam but you're not assigning it anywhere. It will just create a description of the stream, but it won't be run.
What I think you wanted to achieve is to create on the method which will push elements to queue and another which would get elements from the queue and print them. Please check my solution:
import cats.effect.{ ExitCode, IO, IOApp}
import fs2._
import fs2.concurrent.Queue
import scala.concurrent.duration._
class QueueInt(q: Queue[IO, Int])(implicit timer: Timer[IO]) { //I need implicit timer for metered
def startPushingToQueue: Stream[IO, Unit] = Stream(1, 2, 3)
.covary[IO]
.evalTap(n => IO.delay(println(s"Pushing element $n to Queue"))) //eval tap evaluates effect on an element but doesn't change stream
.metered(500.millis) //it will create 0.5 delay between enqueueing elements of stream,
// I added it to make visible that elements can be pushed and pulled from queue concurrently
.through(q.enqueue)
def pullAndPrintElements: Stream[IO, Unit] = q.dequeue.evalMap(n => IO.delay(println(s"Pulling element $n from Queue")))
}
object testingQueues extends IOApp {
override def run(args: List[String]): IO[ExitCode] = {
val program = for {
q <- Queue.bounded[IO, Int](10)
b = new QueueInt(q)
_ <- b.startPushingToQueue.compile.drain.start //start at the end will start running stream in another Fiber
_ <- b.pullAndPrintElements.compile.drain //compile.draing compiles stream into io byt pulling all elements.
} yield ()
program.as(ExitCode.Success)
}
}
On console, you will see lines telling about pushing and pulling from queue interleaved.
If you remove start you will see that firstly stream from startPushingToQueue finishes after pushing all elements, and only then pullAndPrintElements starts.
If you're looking for good sources to learn fs2, I would suggest that you should start with checking out fs2-related talks. Prefer newer talks, than the old one, because they could reference the older API.
You should also check guide on fs2 documentation.
Related
I have a small test of fs2 streams, process elements, waiting and then write them to a file.
I'm getting a type error saying and I could not figure out what it does mean:
Error : required: fs2.Stream[[x]cats.effect.IO[x],Unit] => fs2.Stream[[+A]cats.effect.IO[A],Unit],
found : [F[_]]fs2.Pipe[F,Byte,Unit]
import java.nio.file.Paths
import cats.effect.{Blocker, ExitCode, IO, IOApp, Timer}
import fs2.Stream
import fs2.io
import fs2.concurrent.Queue
import scala.concurrent.duration._
import scala.util.Random
class StreamTypeIntToDouble(q: Queue[IO, Int])(implicit timer: Timer[IO]) {
import core.Processing._
val blocker: Blocker =
Blocker.liftExecutionContext(
scala.concurrent.ExecutionContext.Implicits.global
)
def storeInQueue: Stream[IO, Unit] = {
Stream(1, 2, 3)
.covary[IO]
.evalTap(n => IO.delay(println(s"Pushing $n to Queue")))
.metered(Random.between(1, 20).seconds)
.through(q.enqueue)
}
def getFromQueue: Stream[IO, Unit] = {
q.dequeue
.evalMap(n => IO.delay(println(s"Pulling from queue $n")))
.through(
io.file
.writeAll(Paths.get("file.txt"), blocker)
)
}
}
object Five extends IOApp {
override def run(args: List[String]): IO[ExitCode] = {
val program = for {
q <- Queue.bounded[IO, Int](10)
b = new StreamTypeIntToDouble(q)
_ <- b.storeInQueue.compile.drain.start
_ <- b.getFromQueue.compile.drain
} yield ()
program.as(ExitCode.Success)
}
}
There are a couple of issues here, and the first is the most confusing. writeAll is polymorphic in its context F[_], but it requires a ContextShift instance for F (as well as Sync). You don't currently have a ContextShift[IO] in scope, so the compiler won't infer that the F for writeAll should be IO. If you add something like this:
implicit val ioContextShift: ContextShift[IO] =
IO.contextShift(scala.concurrent.ExecutionContext.Implicits.global)
…then the compiler will infer IO as you'd expect.
My suggestion for cases like this is to skip the type inference. Writing it out with the type parameter is only marginally more verbose:
.through(
io.file
.writeAll[IO](Paths.get("file.txt"), blocker)
)
…and it means that you'll get useful error messages for things like missing type class instances.
Once you've fixed that issue there are a couple of others. The next is that using evalMap in this context means that you'll have a stream of () values. If you change it to evalTap, the logging side effects will still happen appropriately, but you won't lose the actual values of the stream you call it on.
The last issue is that writeAll requires a stream of bytes, while you've given it a stream of Ints. How you want to deal with that discrepancy depends on the intended semantics, but for the sake of example something like .map(_.toByte) would make it compile.
I have an external (that is, I cannot change it) Java API which looks like this:
public interface Sender {
void send(Event e);
}
I need to implement a Sender which accepts each event, transforms it to a JSON object, collects some number of them into a single bundle and sends over HTTP to some endpoint. This all should be done asynchronously, without send() blocking the calling thread, with some fixed-size buffer and dropping new events if the buffer is full.
With akka-streams this is quite simple: I create a graph of stages (which uses akka-http to send HTTP requests), materialize it and use the materialized ActorRef to push new events to the stream:
lazy val eventPipeline = Source.actorRef[Event](Int.MaxValue, OverflowStrategy.fail)
.via(CustomBuffer(bufferSize)) // buffer all events
.groupedWithin(batchSize, flushDuration) // group events into chunks
.map(toBundle) // convert each chunk into a JSON message
.mapAsyncUnordered(1)(sendHttpRequest) // send an HTTP request
.toMat(Sink.foreach { response =>
// print HTTP response for debugging
})(Keep.both)
lazy val (eventsActor, completeFuture) = eventPipeline.run()
override def send(e: Event): Unit = {
eventsActor ! e
}
Here CustomBuffer is a custom GraphStage which is very similar to the library-provided Buffer but tailored to our specific needs; it probably does not matter for this particular question.
As you can see, interacting with the stream from non-stream code is very simple - the ! method on the ActorRef trait is asynchronous and does not need any additional machinery to be called. Each event which is sent to the actor is then processed through the entire reactive pipeline. Moreover, because of how akka-http is implemented, I even get connection pooling for free, so no more than one connection is opened to the server.
However, I cannot find a way to do the same thing with FS2 properly. Even discarding the question of buffering (I will probably need to write a custom Pipe implementation which does additional things that we need) and HTTP connection pooling, I'm still stuck with a more basic thing - that is, how to push the data to the reactive stream "from outside".
All tutorials and documentation that I can find assume that the entire program happens inside some effect context, usually IO. This is not my case - the send() method is invoked by the Java library at unspecified times. Therefore, I just cannot keep everything inside one IO action, I necessarily have to finalize the "push" action inside the send() method, and have the reactive stream as a separate entity, because I want to aggregate events and hopefully pool HTTP connections (which I believe is naturally tied to the reactive stream).
I assume that I need some additional data structure, like Queue. fs2 does indeed have some kind of fs2.concurrent.Queue, but again, all documentation shows how to use it inside a single IO context, so I assume that doing something like
val queue: Queue[IO, Event] = Queue.unbounded[IO, Event].unsafeRunSync()
and then using queue inside the stream definition and then separately inside the send() method with further unsafeRun calls:
val eventPipeline = queue.dequeue
.through(customBuffer(bufferSize))
.groupWithin(batchSize, flushDuration)
.map(toBundle)
.mapAsyncUnordered(1)(sendRequest)
.evalTap(response => ...)
.compile
.drain
eventPipeline.unsafeRunAsync(...) // or something
override def send(e: Event) {
queue.enqueue(e).unsafeRunSync()
}
is not the correct way and most likely would not even work.
So, my question is, how do I properly use fs2 to solve my problem?
Consider the following example:
import cats.implicits._
import cats.effect._
import cats.effect.implicits._
import fs2._
import fs2.concurrent.Queue
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
object Answer {
type Event = String
trait Sender {
def send(event: Event): Unit
}
def main(args: Array[String]): Unit = {
val sender: Sender = {
val ec = ExecutionContext.global
implicit val cs: ContextShift[IO] = IO.contextShift(ec)
implicit val timer: Timer[IO] = IO.timer(ec)
fs2Sender[IO](2)
}
val events = List("a", "b", "c", "d")
events.foreach { evt => new Thread(() => sender.send(evt)).start() }
Thread sleep 3000
}
def fs2Sender[F[_]: Timer : ContextShift](maxBufferedSize: Int)(implicit F: ConcurrentEffect[F]): Sender = {
// dummy impl
// this is where the actual logic for batching
// and shipping over the network would live
val consume: Pipe[F, Event, Unit] = _.evalMap { event =>
for {
_ <- F.delay { println(s"consuming [$event]...") }
_ <- Timer[F].sleep(1.seconds)
_ <- F.delay { println(s"...[$event] consumed") }
} yield ()
}
val suspended = for {
q <- Queue.bounded[F, Event](maxBufferedSize)
_ <- q.dequeue.through(consume).compile.drain.start
sender <- F.delay[Sender] { evt =>
val enqueue = for {
wasEnqueued <- q.offer1(evt)
_ <- F.delay { println(s"[$evt] enqueued? $wasEnqueued") }
} yield ()
enqueue.toIO.unsafeRunAsyncAndForget()
}
} yield sender
suspended.toIO.unsafeRunSync()
}
}
The main idea is to use a concurrent Queue from fs2. Note, that the above code demonstrates that neither the Sender interface nor the logic in main can be changed. Only an implementation of the Sender interface can be swapped out.
I don't have much experience with exactly that library but it should look somehow like that:
import cats.effect.{ExitCode, IO, IOApp}
import fs2.concurrent.Queue
case class Event(id: Int)
class JavaProducer{
new Thread(new Runnable {
override def run(): Unit = {
var id = 0
while(true){
Thread.sleep(1000)
id += 1
send(Event(id))
}
}
}).start()
def send(event: Event): Unit ={
println(s"Original producer prints $event")
}
}
class HackedProducer(queue: Queue[IO, Event]) extends JavaProducer {
override def send(event: Event): Unit = {
println(s"Hacked producer pushes $event")
queue.enqueue1(event).unsafeRunSync()
println(s"Hacked producer pushes $event - Pushed")
}
}
object Test extends IOApp{
override def run(args: List[String]): IO[ExitCode] = {
val x: IO[Unit] = for {
queue <- Queue.unbounded[IO, Event]
_ = new HackedProducer(queue)
done <- queue.dequeue.map(ev => {
println(s"Got $ev")
}).compile.drain
} yield done
x.map(_ => ExitCode.Success)
}
}
We can create a bounded queue that will consume elements from sender and make them available to fs2 stream processing.
import cats.effect.IO
import cats.effect.std.Queue
import fs2.Stream
trait Sender[T]:
def send(e: T): Unit
object Sender:
def apply[T](bufferSize: Int): IO[(Sender[T], Stream[IO, T])] =
for
q <- Queue.bounded[IO, T](bufferSize)
yield
val sender: Sender[T] = (e: T) => q.offer(e).unsafeRunSync()
def stm: Stream[IO, T] = Stream.eval(q.take) ++ stm
(sender, stm)
Then we'll have two ends - one for Java worlds, to send new elements to Sender. Another one - for stream processing in fs2.
class TestSenderQueue:
#Test def testSenderQueue: Unit =
val (sender, stream) = Sender[Int](1)
.unsafeRunSync()// we have to run it preliminary to make `sender` available to external system
val processing =
stream
.map(i => i * i)
.evalMap{ ii => IO{ println(ii)}}
sender.send(1)
processing.compile.toList.start//NB! we start processing in a separate fiber
.unsafeRunSync() // immediately right now.
sender.send(2)
Thread.sleep(100)
(0 until 100).foreach(sender.send)
println("finished")
Note that we push data in the current thread and have to run fs2 in a separate thread (.start).
This is what I'm trying right now but it only prints "hey" and not metrics.
I don't want to add metric related stuff in the main function.
import java.util.Date
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.Await
import scala.concurrent.duration.Duration
class A {
def fellow(): Task[Unit] = {
val result = Task {
println("hey")
Thread.sleep(1000)
}
result
}
}
trait AA extends A {
override def fellow(): Task[Unit] = {
println("AA")
val result = super.fellow()
val start = new Date()
result.foreach(e => {
println("AA", new Date().getTime - start.getTime)
})
result
}
}
val a = new A with AA
val res: Task[Unit] = a.fellow()
Await.result(res.runAsync, Duration.Inf)
You can describe a function such as this:
def measure[A](task: Task[A], logMillis: Long => Task[Unit]): Task[A] =
Task.deferAction { sc =>
val start = sc.clockMonotonic(TimeUnit.MILLISECONDS)
val stopTimer = Task.suspend {
val end = sc.clockMonotonic(TimeUnit.MILLISECONDS)
logMillis(end - start)
}
task.redeemWith(
a => stopTimer.map(_ => a)
e => stopTimer.flatMap(_ => Task.raiseError(e))
)
}
Some piece of advice:
Task values should be pure, along with the functions returning Tasks — functions that trigger side effects and return Task as results are broken
Task is not a 1:1 replacement for Future; when describing a Task, all side effects should be suspended (wrapped) in Task
foreach triggers the Task's evaluation and that's not good, because it triggers the side effects; I was thinking of deprecating and removing it, since its presence is tempting
stop using trait inheritance and just use plain functions — unless you deeply understand OOP and subtyping, it's best to avoid it if possible; and if you're into the Cake pattern, stop doing it and maybe join a support group 🙂
never measure time duration via new Date(), you need a monotonic clock for that and on top of the JVM that's System.nanoTime, which can be accessed via Monix's Scheduler by clockMonotonic, as exemplified above, the Scheduler being given to you via deferAction
stop blocking threads, because that's error prone — instead of doing Thread.sleep, do Task.sleep and all Await.result calls are problematic, unless they are in main or in some other place where dealing with asynchrony is not possible
Hope this helps.
Cheers,
Like #Pierre mentioned, latest version of Monix Task has Task.timed, you can do
timed <- task.timed
(duration, t) = timed
I am a newbie to scala futures and I have a doubt regarding the return value of scala futures.
So, generally syntax for a scala future is
def downloadPage(url: URL) = Future[List[Int]] {
}
I want to know how to access the List[Int] from some other method which calls this method.
In other words,
val result = downloadPage("localhost")
then what should be the approach to get List[Int] out of the future ?
I have tried using map method but not able to do this successfully.`
The case of Success(listInt) => I want to return the listInt and I am not able to figure out how to do that.
The best practice is that you don't return the value. Instead you just pass the future (or a version transformed with map, flatMap, etc.) to everyone who needs this value and they can add their own onComplete.
If you really need to return it (e.g. when implementing a legacy method), then the only thing you can do is to block (e.g. with Await.result) and you need to decide how long to await.
You need to wait for the future to complete to get the result given some timespan, here's something that would work:
import scala.concurrent.duration._
def downloadPage(url: URL) = Future[List[Int]] {
List(1,2,3)
}
val result = downloadPage("localhost")
val myListInt = result.result(10 seconds)
Ideally, if you're using a Future, you don't want to block the executing thread, so you would move your logic that deals with the result of your Future into the onComplete method, something like this:
result.onComplete({
case Success(listInt) => {
//Do something with my list
}
case Failure(exception) => {
//Do something with my error
}
})
I hope you already solved this since it was asked in 2013 but maybe my answer can help someone else:
If you are using Play Framework, it support async Actions (actually all Actions are async inside). An easy way to create an async Action is using Action.async(). You need to provide a Future[Result]to this function.
Now you can just make transformations from your Future[List[Int]] to Future[Result] using Scala's map, flatMap, for-comprehension or async/await. Here an example from Play Framework documentation.
import play.api.libs.concurrent.Execution.Implicits.defaultContext
def index = Action.async {
val futureInt = scala.concurrent.Future { intensiveComputation() }
futureInt.map(i => Ok("Got result: " + i))
}
You can do something like that. If The wait time that is given in Await.result method is less than it takes the awaitable to execute, you will have a TimeoutException, and you need to handle the error (or any other error).
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{Try, Success, Failure}
import scala.concurrent.duration._
object MyObject {
def main(args: Array[String]) {
val myVal: Future[String] = Future { silly() }
// values less than 5 seconds will go to
// Failure case, because silly() will not be done yet
Try(Await.result(myVal, 10 seconds)) match {
case Success(extractedVal) => { println("Success Happened: " + extractedVal) }
case Failure(_) => { println("Failure Happened") }
case _ => { println("Very Strange") }
}
}
def silly(): String = {
Thread.sleep(5000)
"Hello from silly"
}
}
The best way I’ve found to think of a Future is a box that will, at some point, contain the thing that you want. The key thing with a Future is that you never open the box. Trying to force open the box will lead you to blocking and grief. Instead, you put the Future in another, larger box, typically using the map method.
Here’s an example of a Future that contains a String. When the Future completes, then Console.println is called:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
object Main {
def main(args:Array[String]) : Unit = {
val stringFuture: Future[String] = Future.successful("hello world!")
stringFuture.map {
someString =>
// if you use .foreach you avoid creating an extra Future, but we are proving
// the concept here...
Console.println(someString)
}
}
}
Note that in this case, we’re calling the main method and then… finishing. The string’s Future, provided by the global ExecutionContext, does the work of calling Console.println. This is great, because when we give up control over when someString is going to be there and when Console.println is going to be called, we let the system manage itself. In constrast, look what happens when we try to force the box open:
val stringFuture: Future[String] = Future.successful("hello world!")
val someString = Future.await(stringFuture)
In this case, we have to wait — keep a thread twiddling its thumbs — until we get someString back. We’ve opened the box, but we’ve had to commandeer the system’s resources to get at it.
It wasn't yet mentioned, so I want to emphasize the point of using Future with for-comprehension and the difference of sequential and parallel execution.
For example, for sequential execution:
object FuturesSequential extends App {
def job(n: Int) = Future {
Thread.sleep(1000)
println(s"Job $n")
}
val f = for {
f1 <- job(1)
f2 <- job(2)
f3 <- job(3)
f4 <- job(4)
f5 <- job(5)
} yield List(f1, f2, f3, f4, f5)
f.map(res => println(s"Done. ${res.size} jobs run"))
Thread.sleep(6000) // We need to prevent main thread from quitting too early
}
And for parallel execution (note that the Future are before the for-comprehension):
object FuturesParallel extends App {
def job(n: Int) = Future {
Thread.sleep(1000)
println(s"Job $n")
}
val j1 = job(1)
val j2 = job(2)
val j3 = job(3)
val j4 = job(4)
val j5 = job(5)
val f = for {
f1 <- j1
f2 <- j2
f3 <- j3
f4 <- j4
f5 <- j5
} yield List(f1, f2, f3, f4, f5)
f.map(res => println(s"Done. ${res.size} jobs run"))
Thread.sleep(6000) // We need to prevent main thread from quitting too early
}
My code which uses mapAync(1) doesn't work what I want it to do. But when I changed the mapAsync(1) to map by using Await.result, it works. So I have a question.
Does the following (A) Use map and (B) use mapAsync(1) yield the same result at anytime?
// (A) Use map
someSource
.map{r =>
val future = makeFuture(r) // returns the same future if r is the same
Await.result(future, Duration.Inf)
}
// (B) Use mapAsync(1)
someSource
.mapAsync(1){r =>
val future = makeFuture(r) // returns the same future if r is the same
future
}
Actually, I want to paste my real code, but it is too long to paste and has some dependencies of my original stages.
While semantically the type of both streams ends up being the same (Source[Int, NotUsed]), the style displayed in example (A) is very very bad – please don't block (Await) inside streams.
Such cases are exactly the use case for mapAsync. Your operation returns a Future[T], and you want to push that value downwards through the stream once the future completes. Please note that there is no blocking in mapAsync, it schedules a callback to push the value of the future internally and does so once it completes.
To answer your question about "do they do the same thing?", technically yes but the first one will cause performance issues in the threadpool you're running on, avoid map+blocking when mapAsync can do the job.
These calls are semantically very similar, although blocking by using Await is probably not a good idea. The type signature of both these calls is, of course, the same (Source[Int, NotUsed]), and in many cases these calls will produce the same results (blocking aside). The following, for example, which includes scheduled futures and a non-default supervision strategy for failures, gives the same results for both map with an Await inside and mapAsync:
import akka.actor._
import akka.stream.ActorAttributes.supervisionStrategy
import akka.stream.Supervision.resumingDecider
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent._
import scala.concurrent.duration._
import scala.language.postfixOps
object Main {
def main(args: Array[String]) {
implicit val system = ActorSystem("TestSystem")
implicit val materializer = ActorMaterializer()
import scala.concurrent.ExecutionContext.Implicits.global
import system.scheduler
def makeFuture(r: Int) = {
akka.pattern.after(2 seconds, scheduler) {
if (r % 3 == 0)
Future.failed(new Exception(s"Failure for input $r"))
else
Future(r + 100)
}
}
val someSource = Source(1 to 20)
val mapped = someSource
.map { r =>
val future = makeFuture(r)
Await.result(future, Duration.Inf)
}.withAttributes(supervisionStrategy(resumingDecider))
val mappedAsync = someSource
.mapAsyncUnordered(1) { r =>
val future = makeFuture(r)
future
}.withAttributes(supervisionStrategy(resumingDecider))
mapped runForeach println
mappedAsync runForeach println
}
}
It is possible that your upstream code is relying on the blocking behaviour in your map call in some way. Can you produce a concise reproduction of the issue that you are seeing?