How to produce a Traversable with cats-effect's IO given an async that will call multiple times - scala

What I'm really trying to do is monitor multiple files and when any of them is modified I'd like to update some state and produce a side effect using this state. I imagine what I want is a scan over a Traversable that produces a Traversable[IO[_]]. But I don't see the path there.
As a minimal attempt to produce this, I wrote:
package example

import better.files.{File, FileMonitor}
import cats.implicits._
import com.monovore.decline._
import cats.effect.IO

import java.nio.file.{Files, Path}
import scala.concurrent.ExecutionContext.Implicits.global

object Hello extends CommandApp(
  name = "cats-effects-playground",
  header = "welcome",
  main = {
    val filesOpts = Opts.options[Path]("input", help = "input files")

    filesOpts.map { files =>
      IO.async[File] { cb =>
        val watchers = files.map { path =>
          new FileMonitor(path, recursive = false) {
            override def onModify(file: File, count: Int) = cb(Right(file))
          }
        }
        watchers.toList.foreach(_.start)
      }
      .flatMap(f => IO { println(f) })
      .unsafeRunSync
    }
  }
)
but this has two major flaws. First, it creates a thread for each file I'm watching, which is a little heavy. More importantly, the program finishes as soon as a single file is modified, even though onModify would be called more times if the program stayed running.
I'm not married to using better-files, it just seemed like the path of least resistance. But I do require using Cats IO.

This solution doesn't solve the issue of creating a bunch of threads, and it doesn't strictly produce a Traversable, but it solves the underlying use case. I'm very open to this being critiqued and a better solution provided.
package example

import better.files.{File, FileMonitor}
import cats.implicits._
import com.monovore.decline._
import cats.effect.IO

import java.nio.file.{Files, Path}
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.ExecutionContext.Implicits.global

object Hello extends CommandApp(
  name = "cats-effects-playground",
  header = "welcome",
  main = {
    val filesOpts = Opts.options[Path]("input", help = "input files")

    filesOpts.map { files =>
      // bridge the callback-based FileMonitor API into IO via a blocking queue
      val bq: LinkedBlockingQueue[IO[File]] = new LinkedBlockingQueue()

      val watchers = files.map { path =>
        new FileMonitor(path, recursive = false) {
          override def onModify(file: File, count: Int) = bq.put(IO(file))
        }
      }

      // take the next modification off the queue, print it, and loop forever
      def ioLoop(): IO[Unit] = bq.take()
        .flatMap(f => IO(println(f)))
        .flatMap(_ => ioLoop())

      watchers.toList.foreach(_.start)
      ioLoop().unsafeRunSync
    }
  }
)
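One direction for the critique you invited, assuming you can take fs2 (the streaming library built on cats-effect; the sketch below uses its 2.x API together with cats-effect 2) as a dependency: push each callback into an fs2 queue and expose the modifications as a Stream[IO, File]. It still starts one FileMonitor per path, so it doesn't address the thread count, but it replaces the hand-rolled blocking loop and gives you a stream that yields every modification. A minimal sketch:
package example

import better.files.{File, FileMonitor}
import cats.effect.{ContextShift, IO}
import fs2.Stream
import fs2.concurrent.Queue

import java.nio.file.Path
import scala.concurrent.ExecutionContext

object Watch {
  implicit val cs: ContextShift[IO] = IO.contextShift(ExecutionContext.global)

  // A stream of every modification event: each callback enqueues,
  // and the dequeued stream yields every event instead of only the first.
  def modifiedFiles(paths: List[Path])(implicit ec: ExecutionContext): Stream[IO, File] =
    for {
      q <- Stream.eval(Queue.unbounded[IO, File])
      _ <- Stream.eval(IO {
             paths.foreach { path =>
               new FileMonitor(path, recursive = false) {
                 override def onModify(file: File, count: Int) =
                   q.enqueue1(file).unsafeRunSync()
               }.start()
             }
           })
      file <- q.dequeue
    } yield file
}
Running modifiedFiles(paths).evalMap(f => IO(println(f))).compile.drain then yields a single IO[Unit] that reacts to every modification instead of finishing after the first one.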

Related

How to reserve ZIO response inside a custom method in

I have this method
import ClientServer.*
import zio.http.{Client, *}
import zio.json.*
import zio.http.model.Method
import zio.{ExitCode, URIO, ZIO}
import sttp.capabilities.*
import sttp.client3.Request
import zio.*
import zio.http.model.Headers.Header
import zio.http.model.Version.Http_1_0
import zio.stream.*
import java.net.InetAddress
import sttp.model.sse.ServerSentEvent
import sttp.client3._

object fillFileWithLeagues:
  def fill = for {
    openDotaResponse <- Client.request("https://api.opendota.com/api/leagues")
    bodyOfResponse <- openDotaResponse.body.asString
    listOfLeagues <- ZIO.fromEither(bodyOfResponse.fromJson[List[League]].left.map(error => new Exception(error)))
    save = FileStorage.saveToFile(listOfLeagues.toJson) //Ok
  } yield ()
  println("Im here fillFileWithLeagues.fill ")
and when I try to use fillFileWithLeagues.fill, nothing happens.
I'm trying to fill the file with data from the target API using fillFileWithLeagues.fill:
def readFromFileV8(path: Path = Path("src", "main", "resources", "data.json")): ZIO[Any, Throwable, String] =
  val zioStr = (for bool <- Files.isReadable(path) yield bool).flatMap(bool =>
    if (bool) Files.readAllLines(path, Charset.Standard.utf8).map(_.head)
    else {
      fillFileWithLeagues.fill
      wait(10000)
      println("im here readFromFileV8")
      readFromFileV8()
    })
  zioStr
I'm expecting the data.json file to be created from
Client.request("https://api.opendota.com/api/leagues")
but nothing happens.
Maybe I should use some sttp, or some other tools?
If we fix the indentation of the code, we'll find this:
object fillFileWithLeagues {
  def fill = {
    for {
      openDotaResponse <- Client.request("https://api.opendota.com/api/leagues")
      bodyOfResponse <- openDotaResponse.body.asString
      listOfLeagues <- ZIO.fromEither(bodyOfResponse.fromJson[List[League]].left.map(error => new Exception(error)))
      save = FileStorage.saveToFile(listOfLeagues.toJson) //Ok
    } yield ()
  }
  println("Im here fillFileWithLeagues.fill ")
}
As you can see, the println is part of fillFileWithLeagues, not of fill.
Another potential problem is that an expression like fillFileWithLeagues.fill only returns a ZIO value; it is not evaluated yet. To evaluate it, it needs to be run, for example as follows:
import zio._

object MainApp extends ZIOAppDefault {
  def run = fillFileWithLeagues.fill
}
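One more thing worth checking, though it is an assumption since the signature of FileStorage.saveToFile isn't shown: if saveToFile returns a ZIO effect, the plain save = ... binding in the for comprehension only names that effect without sequencing it, so the file would never be written even once fill itself is run. Replacing = with <- wires it into the workflow:
// hypothetical fix, assuming FileStorage.saveToFile returns a ZIO effect
def fill = for {
  openDotaResponse <- Client.request("https://api.opendota.com/api/leagues")
  bodyOfResponse   <- openDotaResponse.body.asString
  listOfLeagues    <- ZIO.fromEither(bodyOfResponse.fromJson[List[League]].left.map(error => new Exception(error)))
  _                <- FileStorage.saveToFile(listOfLeagues.toJson) // now part of the effect chain
} yield ()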

Akka HTTP - max-open-requests and substreams?

I'm writing an app using Scala 2.13 with Akka HTTP 10.2.4 and Akka Stream 2.6.15. I'm trying to query a web service in a parallel manner, like so:
package com.example

import akka.actor.typed.scaladsl.ActorContext
import akka.http.scaladsl.Http
import akka.http.scaladsl.client.RequestBuilding.Get
import akka.http.scaladsl.model.HttpResponse
import akka.http.scaladsl.unmarshalling.Unmarshal
import akka.stream.scaladsl.{Flow, JsonFraming, Sink, Source}
import spray.json.DefaultJsonProtocol
import spray.json.DefaultJsonProtocol.jsonFormat2

import scala.util.Try

case class ClientStockPortfolio(id: Long, symbol: String)
case class StockTicker(symbol: String, price: Double)

trait SprayFormat extends DefaultJsonProtocol {
  implicit val stockTickerFormat = jsonFormat2(StockTicker)
}

class StockTrader(context: ActorContext[_]) extends SprayFormat {
  implicit val system = context.system.classicSystem

  val httpPool = Http().superPool[Seq[ClientStockPortfolio]]()

  def collectPrices() = {
    val src = Source(Seq(
      ClientStockPortfolio(1, "GOOG"),
      ClientStockPortfolio(2, "AMZN"),
      ClientStockPortfolio(3, "MSFT")
    ))

    val graph = src
      .groupBy(8, _.id % 8)
      .via(createPost)
      .via(httpPool)
      .via(decodeTicker)
      .mergeSubstreamsWithParallelism(8)
      .to(
        Sink.fold(0.0) { (totalPrice, ticker) =>
          insertIntoDatabase(ticker)
          totalPrice + ticker.price
        }
      )

    graph.run()
  }

  def createPost = Flow[ClientStockPortfolio]
    .grouped(10)
    .map { port =>
      (
        Get(uri = s"http://wherever/?symbols=${port.map(_.symbol).mkString(",")}"),
        port
      )
    }

  def decodeTicker = Flow[(Try[HttpResponse], Seq[ClientStockPortfolio])]
    .flatMapConcat { x =>
      x._1.get.entity.dataBytes
        .via(JsonFraming.objectScanner(Int.MaxValue))
        .mapAsync(4)(bytes => Unmarshal(bytes).to[StockTicker])
        .mapConcat { ticker =>
          lookupPreviousPrices(ticker)
        }
    }

  def lookupPreviousPrices(ticker: StockTicker): List[StockTicker] = ???
  def insertIntoDatabase(ticker: StockTicker) = ???
}
I have two questions. First, will the groupBy call that splits the stream into substreams run them in parallel like I want? And second, when I call this code, I run into the max-open-requests error, since I haven't increased the setting from the default. But even if I am running in parallel, I'm only running 8 threads - how is the Http().superPool() getting backed up with 32 requests?
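For reference on the second question: the limit being hit lives under akka.http.host-connection-pool.max-open-requests (default 32) and must be a power of two. Whether raising it is the right fix for this stream is a separate matter, but a sketch of setting it programmatically on the pool from the code above (64 is an arbitrary power of two; system is the implicit classic system from the class):
import akka.http.scaladsl.Http
import akka.http.scaladsl.settings.ConnectionPoolSettings

// raise the per-pool request limit instead of editing application.conf
val poolSettings = ConnectionPoolSettings(system).withMaxOpenRequests(64)
val httpPool = Http().superPool[Seq[ClientStockPortfolio]](settings = poolSettings)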

Monitoring Akka Streams Sources with ScalaFX

What I'm trying to solve is the following case:
Given an infinitely running Akka Stream, I want to be able to monitor certain points of the stream. The best way I could think of was to send the messages at these points to an actor which is also a Source. This makes it very flexible for me to then connect either individual Sources, or merge multiple Sources, to a websocket or whatever other client I want to connect. However, in this specific case I'm trying to connect ScalaFX with an Akka Source, and it is not working as expected.
When I run the code below, both counters start out fine, but after a short while one of them stops and never recovers. I know there are special considerations around threading when using ScalaFX, but I don't have enough knowledge to understand what is going on here or to debug it. Below is a minimal example to run; the issue should be visible after about 5 seconds.
My question is:
How could I change this code to work as expected?
import akka.NotUsed
import scalafx.Includes._
import akka.actor.{ActorRef, ActorSystem}
import akka.stream.{ActorMaterializer, OverflowStrategy, ThrottleMode}
import akka.stream.scaladsl.{Flow, Sink, Source}
import scalafx.application.JFXApp
import scalafx.beans.property.{IntegerProperty, StringProperty}
import scalafx.scene.Scene
import scalafx.scene.layout.BorderPane
import scalafx.scene.text.Text

import scala.concurrent.duration._

/**
  * Created by henke on 2017-06-10.
  */
object MonitorApp extends JFXApp {
  implicit val system = ActorSystem("monitor")
  implicit val mat = ActorMaterializer()

  val value1 = StringProperty("0")
  val value2 = StringProperty("0")

  stage = new JFXApp.PrimaryStage {
    title = "Akka Stream Monitor"
    scene = new Scene(600, 400) {
      root = new BorderPane() {
        left = new Text {
          text <== value1
        }
        right = new Text {
          text <== value2
        }
      }
    }
  }

  override def stopApp() = system.terminate()

  val monitor1 = createMonitor[Int]
  val monitor2 = createMonitor[Int]

  val marketChangeActor1 = monitor1
    .to(Sink.foreach { v =>
      value1() = v.toString
    }).run()

  val marketChangeActor2 = monitor2
    .to(Sink.foreach { v =>
      value2() = v.toString
    }).run()

  val monitorActor = Source[Int](1 to 100)
    .throttle(1, 1.second, 1, ThrottleMode.shaping)
    .via(logToMonitorAndContinue(marketChangeActor1))
    .map(_ * 10)
    .via(logToMonitorAndContinue(marketChangeActor2))
    .to(Sink.ignore).run()

  def createMonitor[T]: Source[T, ActorRef] = Source.actorRef[T](Int.MaxValue, OverflowStrategy.fail)

  def logToMonitorAndContinue[T](monitor: ActorRef): Flow[T, T, NotUsed] =
    Flow[T].map { e =>
      monitor ! e
      e
    }
}
It seems that you assign values to the properties (and therefore affect the UI) from the actor system's threads. However, all interaction with the UI should happen on the JavaFX GUI thread. Try wrapping value1() = v.toString (and its counterpart for value2) in Platform.runLater calls.
I wasn't able to find a definitive statement about using runLater to interact with JavaFX data except in the JavaFX-Swing integration document, but this is quite a common pattern in UI libraries; the same holds for Swing with its SwingUtilities.invokeLater method, for example.
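Concretely, for the first sink that would look something like the following sketch (Platform is scalafx.application.Platform, which delegates to javafx.application.Platform.runLater):
import scalafx.application.Platform

val marketChangeActor1 = monitor1
  .to(Sink.foreach { v =>
    // hop from the stream's thread onto the JavaFX application thread
    Platform.runLater {
      value1() = v.toString
    }
  }).run()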

Scala: Parallel execution with ListBuffer appends doesn't produce expected outcome

I know I'm doing something wrong with mutable.ListBuffer, but I can't figure out how to fix it (or find a proper explanation of the issue).
I simplified the code below to reproduce the behavior.
I'm basically trying to run functions in parallel that add elements to a list as my first list gets processed. I end up "losing" elements.
import java.util.Properties

import scala.collection.mutable.ListBuffer
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext
import ExecutionContext.Implicits.global

object MyTestObject {
  var listBufferOfInts = new ListBuffer[Int]() // files that are processed

  def runFunction(): Int = {
    listBufferOfInts = new ListBuffer[Int]()
    val inputListOfInts = 1 to 1000
    val fut = Future.traverse(inputListOfInts) { i =>
      Future {
        appendElem(i)
      }
    }
    Await.ready(fut, Duration.Inf)
    listBufferOfInts.length
  }

  def appendElem(elem: Int): Unit = {
    listBufferOfInts ++= List(elem)
  }
}

MyTestObject.runFunction()
MyTestObject.runFunction()
MyTestObject.runFunction()
which returns:
res0: Int = 937
res1: Int = 992
res2: Int = 997
Obviously I would expect 1000 to be returned all the time. How can I fix my code to keep the "architecture" but make my ListBuffer "synchronized" ?
I don't know what the exact problem is, since you said you simplified the code, but you do have an obvious race condition: multiple threads modify a single mutable collection, and that is very bad. As the other answers point out, you need some locking so that only one thread can modify the collection at a time. If your calculations are heavy, appending the results to a buffer in a synchronized way shouldn't notably affect performance, but when in doubt, always measure.
But synchronization is not needed; you can do something else instead, without vars and mutable state. Let each Future return its partial result and then merge them into a list. In fact, Future.traverse does just that.
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global

def runFunction: Int = {
  val inputListOfInts = 1 to 1000
  val fut: Future[List[Int]] = Future.traverse(inputListOfInts.toList) { i =>
    Future {
      // some heavy calculations on i
      i * 4
    }
  }
  val listOfInts = Await.result(fut, Duration.Inf)
  listOfInts.size
}
Future.traverse already gives you an immutable list with all your results combined, no need to append them to a mutable buffer.
Needless to say, you will always get 1000 back.
# List.fill(10000)(runFunction).exists(_ != 1000)
res18: Boolean = false
I'm not sure the above correctly shows what you are trying to do. Maybe the issue is that you are actually sharing a var ListBuffer, which you reinitialise within runFunction.
When I take this out, I collect all the events I'm expecting correctly:
import java.util.Properties

import scala.collection.mutable.ListBuffer
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext
import ExecutionContext.Implicits.global

object BrokenTestObject extends App {
  var listBufferOfInts = new ListBuffer[Int]()

  def runFunction(): Int = {
    val inputListOfInts = 1 to 1000
    val fut = Future.traverse(inputListOfInts) { i =>
      Future {
        appendElem(i)
      }
    }
    Await.ready(fut, Duration.Inf)
    listBufferOfInts.length
  }

  def appendElem(elem: Int): Unit = {
    listBufferOfInts.append(elem)
  }

  BrokenTestObject.runFunction()
  BrokenTestObject.runFunction()
  BrokenTestObject.runFunction()
  println(s"collected ${listBufferOfInts.length} elements")
}
If you really have a synchronisation issue you can use something like the following:
import java.util.Properties

import scala.collection.mutable.ListBuffer
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext
import ExecutionContext.Implicits.global

class WrappedListBuffer(val lb: ListBuffer[Int]) {
  def append(i: Int) {
    this.synchronized {
      lb.append(i)
    }
  }
}

object MyTestObject extends App {
  var listBufferOfInts = new WrappedListBuffer(new ListBuffer[Int]())

  def runFunction(): Int = {
    val inputListOfInts = 1 to 1000
    val fut = Future.traverse(inputListOfInts) { i =>
      Future {
        appendElem(i)
      }
    }
    Await.ready(fut, Duration.Inf)
    listBufferOfInts.lb.length
  }

  def appendElem(elem: Int): Unit = {
    listBufferOfInts.append(elem)
  }

  MyTestObject.runFunction()
  MyTestObject.runFunction()
  MyTestObject.runFunction()
  println(s"collected ${listBufferOfInts.lb.size} elements")
}
Changing
listBufferOfInts ++= List(elem)
to
synchronized {
  listBufferOfInts ++= List(elem)
}
makes it work. It can probably become a performance issue? I'm still interested in an explanation and maybe a better way of doing things!
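As for the explanation: ListBuffer makes no thread-safety guarantees, so concurrent ++= calls can interleave their internal pointer updates and silently drop elements, which is exactly the under-1000 counts observed above. If you want to keep the append-from-many-futures shape without a synchronized block, one option (a sketch, not from the answers above) is a java.util.concurrent collection designed for concurrent appends:
import java.util.concurrent.ConcurrentLinkedQueue

import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global

object ConcurrentTestObject {
  // lock-free queue, safe for appends from many threads at once
  val collectedInts = new ConcurrentLinkedQueue[Int]()

  def runFunction(): Int = {
    val fut = Future.traverse(1 to 1000) { i =>
      Future { collectedInts.add(i) }
    }
    Await.ready(fut, Duration.Inf)
    collectedInts.size // reliably 1000
  }
}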

Creating a time-based chunking Enumeratee

I want to create a Play 2 Enumeratee that takes in values and outputs them, chunked together, every x seconds/milliseconds. That way, in a multi-user websocket environment with lots of user input, one could limit the number of received frames per second.
I know that it's possible to group a set number of items together like this:
val chunker = Enumeratee.grouped(
  Traversable.take[Array[Double]](5000) &>> Iteratee.consume()
)
Is there a built-in way to do this based on time rather than based on the number of items?
I was thinking about doing this somehow with a scheduled Akka job, but at first sight this seems inefficient, and I'm not sure whether concurrency issues would arise.
How about something like this? I hope it's helpful for you.
package controllers

import play.api._
import play.api.Play.current
import play.api.mvc._
import play.api.libs.iteratee._
import play.api.libs.concurrent.Akka
import play.api.libs.concurrent.Promise

object Application extends Controller {
  def index = Action {
    val queue = new scala.collection.mutable.Queue[String]
    Akka.future {
      while (true) {
        Logger.info("hogehogehoge")
        queue += System.currentTimeMillis.toString
        Thread.sleep(100)
      }
    }
    val timeStream = Enumerator.fromCallback { () =>
      Promise.timeout(Some(queue), 200)
    }
    Ok.stream(timeStream.through(Enumeratee.map[scala.collection.mutable.Queue[String]] { queue =>
      var str = ""
      while (queue.nonEmpty) {
        str += queue.dequeue + ", "
      }
      str
    }))
  }
}
And this document may also be helpful:
http://www.playframework.com/documentation/2.0/Enumerators
UPDATE
This is for the Play 2.1 version.
package controllers

import play.api._
import play.api.Play.current
import play.api.mvc._
import play.api.libs.iteratee._
import play.api.libs.concurrent.Akka
import play.api.libs.concurrent.Promise

import scala.concurrent._
import ExecutionContext.Implicits.global

object Application extends Controller {
  def index = Action {
    val queue = new scala.collection.mutable.Queue[String]
    Akka.future {
      while (true) {
        Logger.info("hogehogehoge")
        queue += System.currentTimeMillis.toString
        Thread.sleep(100)
      }
    }
    val timeStream = Enumerator.repeatM {
      Promise.timeout(queue, 200)
    }
    Ok.stream(timeStream.through(Enumeratee.map[scala.collection.mutable.Queue[String]] { queue =>
      var str = ""
      while (queue.nonEmpty) {
        str += queue.dequeue + ", "
      }
      str
    }))
  }
}
Here I've quickly defined an iteratee that will take values from an input for a fixed length of time t, measured in milliseconds, and an enumeratee that will allow you to group and further process an input stream divided into segments constructed within such a length t. It relies on JodaTime to keep track of how much time has passed since the iteratee began.
import org.joda.time.{Instant, Interval}
import play.api.libs.iteratee._

def throttledTakeIteratee[E](timeInMillis: Long): Iteratee[E, List[E]] = {
  var startTime = new Instant()

  def step(state: List[E])(input: Input[E]): Iteratee[E, List[E]] = {
    val timePassed = new Interval(startTime, new Instant()).toDurationMillis
    input match {
      case Input.EOF => { startTime = new Instant(); Done(state, Input.EOF) }
      case Input.Empty => Cont[E, List[E]](i => step(state)(i))
      case Input.El(e) =>
        if (timePassed >= timeInMillis) { startTime = new Instant(); Done(e :: state, Input.Empty) }
        else Cont[E, List[E]](i => step(e :: state)(i))
    }
  }

  Cont(step(List[E]()))
}

def throttledTake[T](timeInMillis: Long) = Enumeratee.grouped(throttledTakeIteratee[T](timeInMillis))
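This composes the same way as the count-based chunker in the question; for example (a sketch, assuming the same Array[Double] element type):
// group whatever arrives within each 5000 ms window into a List
val chunker = throttledTake[Array[Double]](5000)
Note that each emitted group holds the elements that arrived within one window in reverse arrival order, since they are prepended with e :: state.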