FS2 stream to unread InputStream - scala

I'd like to convert fs2.Stream to java.io.InputStream so I can pass that input stream to an HTTP framework (Finch and Akka HTTP).
I found fs2.io.toInputStream, but it doesn't work (it prints nothing):
import java.io.{ByteArrayInputStream, InputStream}
import cats.effect.IO
import scala.concurrent.ExecutionContext.Implicits.global
object IOTest {
  def main(args: Array[String]): Unit = {
    val is: InputStream = new ByteArrayInputStream("test".getBytes)
    val stream: fs2.Stream[IO, Byte] = fs2.io.readInputStream(IO(is), 128)
    val test: Seq[InputStream] = stream.through(fs2.io.toInputStream).compile.toList.unsafeRunSync()
    println(scala.io.Source.fromInputStream(test.head).mkString)
  }
}
As far as I understand, when I run .unsafeRunSync() it consumes the whole stream, so even though it returns a Seq[InputStream], the underlying input stream is already consumed.
Is there any way I can convert fs2.Stream[IO, Byte] to java.io.InputStream without it being consumed?
Thanks!

The problem is that compile is being invoked prematurely. I'm sure that under the hood fs2.io.toInputStream does the right thing and brackets the created InputStream, which means that the InputStream must be accessed inside the Stream itself (e.g., in a map/flatMap call):
val wire: fs2.Stream[IO, Byte] = ???
val result: fs2.Stream[IO, String] = for {
  is <- wire.through(fs2.io.toInputStream)
  str = scala.io.Source.fromInputStream(is).mkString // <-- use the InputStream here
} yield str
println(result.compile.lastOrError.unsafeRunSync()) // <-- compile at the _very_ end
Outputs:
test

It looks like Finch has fs2 support (https://github.com/finagle/finch/tree/master/fs2), and Akka has its own stream implementation, with fs2/Akka Streams interop libraries such as https://github.com/krasserm/streamz/tree/master/streamz-converter
So I recommend taking a look at those implementations, because they take care of the resource life cycle. You probably don't need the whole library, but it serves as a guideline.
And if you are starting out in the "safe zone" with fs2, why move out of it? :)

Related

Stop the fs2-stream after a timeout

I want to use a function similar to take(n: Int), but in the time dimension: consume(period: Duration). So I want a stream to terminate when a timeout occurs. I know that I could compile the stream to something like IO[List[T]] and cancel it, but then I'd lose the result. In reality I want to convert an endless stream into a limited one and preserve the results.
More on the wider scope of the problem: I have an endless stream of events from a messaging broker, but I also have rotating credentials to connect to that broker. So what I want is to consume the stream of events for some time, then stop, acquire new credentials, connect to the broker again to create a new stream, and concatenate the two streams into one.
There is a method that does exactly this:
/**
* Interrupts this stream after the specified duration has passed.
*/
def interruptAfter[F2[x] >: F[x]: Concurrent: Timer](duration: FiniteDuration): Stream[F2, O]
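For instance, a minimal sketch (assuming fs2 1.x with cats-effect; the endless tick stream is only for illustration):

import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
import cats.effect.{ContextShift, IO, Timer}
import fs2.Stream

implicit val timer: Timer[IO] = IO.timer(ExecutionContext.global)
implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

// An endless stream of ticks becomes a finite one after the timeout,
// and the elements seen so far are preserved.
val results: List[Long] =
  Stream.awakeEvery[IO](100.millis)
    .map(_.toMillis)
    .interruptAfter(2.seconds)
    .compile.toList
    .unsafeRunSync()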
If you'd rather build it by hand, you need something like this, using a SignallingRef and interruptWhen:
import scala.util.Random
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
import cats.effect.{ContextShift, IO, Timer}
import fs2._
import fs2.concurrent.SignallingRef

implicit val ex = ExecutionContext.global
implicit val t: Timer[IO] = IO.timer(ex)
implicit val cs: ContextShift[IO] = IO.contextShift(ex)

val effect: IO[Long] = IO.sleep(1.second).flatMap(_ => IO {
  val next = Random.nextLong()
  println("NEXT: " + next)
  next
})

val signal = SignallingRef[IO, Boolean](false).unsafeRunSync()

val timer = Stream.sleep(10.seconds).flatMap(_ =>
  Stream.eval(signal.set(true)).flatMap(_ =>
    Stream.emit(println("Finish")).covary[IO]))

val stream = timer concurrently
  Stream.repeatEval(effect).interruptWhen(signal)

stream.compile.drain.unsafeRunSync()
Also, if you want to keep hold of the results you publish, you need an unbounded Queue from fs2: publish the data into the queue, and read your results back via the queue's stream.
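A minimal sketch of that idea (assuming fs2 1.x, where the queue's output is read back as q.dequeue):

import scala.concurrent.ExecutionContext
import cats.effect.{ContextShift, IO}
import fs2.Stream
import fs2.concurrent.Queue

implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

val collected: IO[List[Long]] = for {
  q  <- Queue.unbounded[IO, Long]
  _  <- Stream.range(0, 5).map(_.toLong)
          .evalMap(q.enqueue1)           // publish each result into the queue
          .compile.drain
  xs <- q.dequeue.take(5).compile.toList // read the saved results back
} yield xs

println(collected.unsafeRunSync()) // List(0, 1, 2, 3, 4)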

Change a materialized value in a source using the contents of the stream

Alpakka provides a great way to access dozens of different data sources. File-oriented sources such as HDFS and FTP sources are delivered as Source[ByteString, Future[IOResult]]. However, HTTP requests via Akka HTTP are delivered as entity streams of Source[ByteString, NotUsed]. In my use case, I would like to retrieve content from HTTP sources as Source[ByteString, Future[IOResult]] so I can build a unified resource fetcher that works with multiple schemes (hdfs, file, ftp and S3 in this case).
In particular, I would like to convert the Source[ByteString, NotUsed] source to
Source[ByteString, Future[IOResult]], where I am able to calculate the IOResult from the incoming byte stream. There are plenty of methods like flatMapConcat and viaMat, but none seem to be able to extract details from the input stream (such as the number of bytes read) or initialise the IOResult structure properly. Ideally, I am looking for a method with the following signature that will update the IOResult as the stream comes in.
def matCalc(src: Source[ByteString, Any]): Source[ByteString, Future[IOResult]] = {
  src.someMatFoldMagic[ByteString, IOResult](IOResult.createSuccessful(0))((m, b) => m.withCount(m.count + b.length))
}
I can't recall any existing functionality that can do this out of the box, but you can use the alsoToMat flow function (surprisingly, I didn't find it in the Akka Streams docs, although you can look it up in the source code documentation and the Java API) together with Sink.fold to accumulate some value and deliver it at the very end. E.g.:
def magic(source: Source[Int, Any]): Source[Int, Future[Int]] =
  source.alsoToMat(Sink.fold(0)((acc, _) => acc + 1))((_, f) => f)
The thing is that alsoToMat combines the input materialized value with the one provided in alsoToMat. At the same time, the values produced by the source are not affected by the sink in alsoToMat:
def alsoToMat[Mat2, Mat3](that: Graph[SinkShape[Out], Mat2])(matF: (Mat, Mat2) ⇒ Mat3): ReprMat[Out, Mat3] =
  viaMat(alsoToGraph(that))(matF)
It's not that hard to adapt this function to return IOResult, which according to the source code is:
final case class IOResult(count: Long, status: Try[Done]) { ... }
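For example, a sketch of that adaptation (the name withIoResult is mine; it counts bytes with Sink.fold and wraps the final count with IOResult.createSuccessful):

import akka.stream.IOResult
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString
import scala.concurrent.{ExecutionContext, Future}

def withIoResult(src: Source[ByteString, Any])(implicit ec: ExecutionContext): Source[ByteString, Future[IOResult]] =
  src.alsoToMat(
    Sink.fold(0L)((count, bytes: ByteString) => count + bytes.length) // tally bytes seen
  )((_, total) => total.map(IOResult.createSuccessful))               // wrap the final count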
One last thing you need to pay attention to: you want your source to be like:
Source[ByteString, Future[IOResult]]
But if you want to carry this materialized value to the very end of the stream definition and then do something based on that future's completion, that might be an error-prone approach. E.g., in this example I finish the work based on that future, so the last value will not be processed:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
object App extends App {
  private implicit val sys: ActorSystem = ActorSystem()
  private implicit val mat: ActorMaterializer = ActorMaterializer()
  private implicit val ec: ExecutionContext = sys.dispatcher

  val source: Source[Int, Any] = Source((1 to 5).toList)

  def magic(source: Source[Int, Any]): Source[Int, Future[Int]] =
    source.alsoToMat(Sink.fold(0)((acc, _) => acc + 1))((_, f) => f)

  val f = magic(source).throttle(1, 1.second).toMat(Sink.foreach(println))(Keep.left).run()
  f.onComplete(t => println(s"f1 completed - $t"))

  Await.ready(f, 5.minutes)
  mat.shutdown()
  sys.terminate()
}
This can be done by using a Promise for the materialized value propagation.
val completion = Promise[IOResult]()
val httpWithIoResult = http.mapMaterializedValue(_ => completion.future)
What is left now is to complete the completion promise when the relevant data becomes available.
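For instance, a sketch that tallies bytes in a side sink and fulfils the promise when that sink completes (http: Source[ByteString, NotUsed] and the implicit ExecutionContext are assumptions here):

import akka.NotUsed
import akka.stream.IOResult
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString
import scala.concurrent.{ExecutionContext, Future, Promise}

def propagate(http: Source[ByteString, NotUsed])(implicit ec: ExecutionContext): Source[ByteString, Future[IOResult]] = {
  val completion = Promise[IOResult]()
  http
    .alsoTo(
      Sink.fold(0L)((n, bs: ByteString) => n + bs.length)               // tally bytes as they pass
        .mapMaterializedValue { total =>
          completion.completeWith(total.map(IOResult.createSuccessful)) // fulfil the promise at completion
          total
        })
    .mapMaterializedValue(_ => completion.future)
}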
An alternative approach would be to drop down to the GraphStage API, where you get lower-level control over materialized value propagation. But even there, using Promises is often the chosen implementation for materialized value propagation. Take a look at built-in operator implementations like Ignore.

FS2 stream run till the end of InputStream

I'm very new to FS2 and need some help with the design. I'm trying to design a stream which will pull chunks from the underlying InputStream until it's exhausted. Here is what I tried:
import java.io.{File, FileInputStream, InputStream}
import cats.effect.IO
import cats.effect.IO._
object Fs2 {
  def main(args: Array[String]): Unit = {
    val is = new FileInputStream(new File("/tmp/my-file.mf"))
    val stream = fs2.Stream.eval(read(is))
    stream.compile.drain.unsafeRunSync()
  }

  def read(is: InputStream): IO[Array[Byte]] = IO {
    val buf = new Array[Byte](4096)
    is.read(buf)
    println(new String(buf))
    buf
  }
}
And the program prints only the first chunk. This is reasonable, but I want to find a way to "signal" where to stop reading and where not to stop, i.e., keep calling read(is) until its end. Is there a way to achieve that?
I also tried repeatEval(read(is)), but it keeps reading forever... I need something in between.
Use fs2.io.readInputStream or fs2.io.readInputStreamAsync. The former blocks the current thread; the latter blocks a thread in the ExecutionContext. For example:
val is: InputStream = new FileInputStream(new File("/tmp/my-file.mf"))
val stream = fs2.io.readInputStreamAsync(IO(is), 128)
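Continuing that snippet, a sketch of draining it to the end of the file (assuming fs2 0.10.x, where readInputStreamAsync exists and an implicit ExecutionContext such as scala.concurrent.ExecutionContext.Implicits.global is already in scope):

stream
  .through(fs2.text.utf8Decode)       // decode the bytes as UTF-8 text
  .evalMap(chunk => IO(print(chunk))) // print each decoded chunk as it arrives
  .compile
  .drain                              // keeps pulling until the InputStream is exhausted
  .unsafeRunSync()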

How do you write to and read from an external process using scalaz streams

I would like to be able to send data from a scalaz stream into an external program and then get the result of that item back about 100ms later. Although I was able to do this with the code below, by zipping the output stream Sink with the input stream Process and then throwing away the Sink side effect, I feel like this solution may be very brittle.
If the external program has an error for one of the input items, everything will be out of sync. I feel like the best bet would be to send some sort of incremental ID into the external program, which it can echo back in the future, so that if an error occurs we can resync.
The main trouble I am having is joining together the result of sending data into the external program, Process[Task, Unit], with the output of the program, Process[Task, String]. I feel like I should be using something from wye, but I'm not really sure.
import java.io.PrintStream
import scalaz._
import scalaz.concurrent.Task
import scalaz.stream.Process._
import scalaz.stream._
object Main extends App {
  /*
  # echo.sh just prints to stdout what it gets on stdin
  while read line; do
    sleep 0.1
    echo $line
  done
  */
  val p: java.lang.Process = Runtime.getRuntime.exec("/path/to/echo.sh")

  val source: Process[Task, String] = Process.repeatEval(Task {
    Thread.sleep(1000)
    System.currentTimeMillis().toString
  })

  val linesR: stream.Process[Task, String] = stream.io.linesR(p.getInputStream)
  val printLines: Sink[Task, String] = stream.io.printLines(new PrintStream(p.getOutputStream))

  val in: Process[Task, Unit] = source to printLines
  val zip: Process[Task, (Unit, String)] = in.zip(linesR)
  val out: Process[Task, String] = zip.map(_._2) observe stream.io.stdOutLines

  out.run.run
}
After delving a little deeper into the more advanced types, it looks like Exchange does exactly what I want.
import java.io.PrintStream
import scalaz._
import scalaz.concurrent.Task
import scalaz.stream._
import scalaz.stream.io._
object Main extends App {
  /*
  # echo.sh just prints to stdout what it gets on stdin
  while read line; do
    sleep 0.1
    echo $line
  done
  */
  val program: java.lang.Process = Runtime.getRuntime.exec("./echo.sh")

  val source: Process[Task, String] = Process.repeatEval(Task {
    Thread.sleep(100)
    System.currentTimeMillis().toString
  })

  val read: stream.Process[Task, String] = linesR(program.getInputStream)
  val write: Sink[Task, String] = printLines(new PrintStream(program.getOutputStream))
  val exchange: Exchange[String, String] = Exchange(read, write)

  println(exchange.run(source).take(10).runLog.run)
}

I just want to get an HTML page using spray

But their documentation seems to assume I'm already familiar with Scala, Akka and Spray itself. I mean, I couldn't find out how to do this simple, basic thing, which I would love to have as one snippet of code on their home page...
The only thing I could find is how to build a request with their spray-httpx:
import spray.httpx.RequestBuilder._
val req = Get("http://url")
The object doesn't have an operation to send itself anywhere, so I'm sure I'm supposed to use Akka things to do it, but their documentation doesn't show the process. Please tell me how to do it. If spray-can can do the same thing (I know it can), I would prefer that way.
There is an example here: http://spray.io/documentation/1.1-SNAPSHOT/spray-client/
import akka.actor.ActorSystem
import scala.concurrent.Future
import spray.http._
import spray.client.pipelining._

implicit val system = ActorSystem()
import system.dispatcher // execution context for futures

val pipeline: HttpRequest => Future[HttpResponse] = sendReceive
val response: Future[HttpResponse] = pipeline(Get("http://spray.io/"))
and an even simpler example here: https://github.com/spray/spray/wiki/spray-client
val conduit = new HttpConduit("github.com")
val responseFuture = conduit.sendReceive(HttpRequest(GET, uri = "/"))
In both cases you have to process the result like you normally process a Future, e.g.:
for { response <- responseFuture } yield { someFunction(response) }
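For instance, a sketch that prints the HTML body once the response arrives (it relies on the execution context imported via system.dispatcher above; entity.asString is spray's accessor for the raw body):

import scala.util.{Failure, Success}

response.onComplete {
  case Success(res) => println(res.entity.asString)     // the raw HTML page
  case Failure(err) => println(s"request failed: $err") // connection or request error
}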