Grouping events with fs2.Stream - scala

I have an event stream as follows:
sealed trait Event
val eventStream: fs2.Stream[IO, Event] = //...
I want to group the events received within a single minute (i.e. from 0 sec to 59 sec of every minute). This sounds pretty straightforward with fs2:
val groupedEventsStream = eventStream groupAdjacentBy { event =>
  TimeUnit.MILLISECONDS.toMinutes(System.currentTimeMillis())
}
The problem is that the grouping function is not pure: it uses currentTimeMillis. I can work around this as follows:
stream.evalMap(t => IO((System.currentTimeMillis(), t)))
  .groupAdjacentBy(t => TimeUnit.MILLISECONDS.toMinutes(t._1))
The thing is that this adds clumsy boilerplate with tuples, which I'd like to avoid. Are there any other solutions?
Or maybe using an impure function is not that bad in a case like this?

You could remove some of the boilerplate by using cats.effect.Clock:
def groupedEventsStream[A](stream: fs2.Stream[IO, A])
    (implicit clock: Clock[IO], eq: Eq[Long]): fs2.Stream[IO, (Long, Chunk[(Long, A)])] =
  stream.evalMap(t => clock.realTime(TimeUnit.MINUTES).map((_, t)))
    .groupAdjacentBy(_._1)
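If the timestamps aren't needed downstream, a follow-up map can strip them from each chunk. A minimal sketch, assuming the eventStream from the question and the usual cats instances in scope:
// keep the minute as the grouping key, drop the per-element timestamps
val groupedEvents: fs2.Stream[IO, (Long, Chunk[Event])] =
  groupedEventsStream(eventStream).map { case (minute, chunk) =>
    (minute, chunk.map(_._2))
  }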

Related

How to functionally handle a logging side effect

I want to log in the event that a record doesn't have an adjoining record. Is there a purely functional way to do this? One that separates the side effect from the data transformation?
Here's an example of what I need to do:
val records: Seq[Record] = Seq(record1, record2, ...)
val accountsMap: Map[Long, Account] = Map(record1.id -> account1, ...)
def withAccount(accountsMap: Map[Long, Account])(r: Record): (Record, Option[Account]) = {
  (r, accountsMap.get(r.id))
}
def handleNoAccounts(tuple: (Record, Option[Account])): (Record, Option[Account]) = {
  val (r, a) = tuple
  if (a.isEmpty) logger.error(s"no account for ${r.id}")
  tuple
}
def toRichAccount(tuple: (Record, Option[Account])): Option[RichAccount] = {
  val (r, a) = tuple
  a.map(acct => RichAccount(r, acct))
}
records
.map(withAccount(accountsMap))
.map(handleNoAccounts) // if no account is found, log
.flatMap(toRichAccount)
So there are multiple issues with this approach that I think make it less than optimal.
The tuple return type is clumsy. I have to destructure the tuple in both of the latter two functions.
The logging function has to handle the logging and then return the tuple with no changes. It feels weird that this is passed to .map even though no transformation is taking place -- maybe there is a better way to get this side effect.
Is there a functional way to clean this up?
I could be wrong (I often am) but I think this does everything that's required.
records
  .flatMap(r =>
    accountsMap.get(r.id).fold {
      logger.error(s"no account for ${r.id}")
      Option.empty[RichAccount]
    } { a => Some(RichAccount(r, a)) })
If you're using Scala 2.13 or newer you could use tapEach, which takes a function A => Unit, applies the side effect to every element of the collection, and then returns the collection unchanged:
// you no longer need to return the tuple from the side-effecting function
def handleNoAccounts(tuple: (Record, Option[Account])): Unit = {
  val (r, a) = tuple
  if (a.isEmpty) logger.error(s"no account for ${r.id}")
}
records
.map(withAccount(accountsMap))
.tapEach(handleNoAccounts) // if no account is found, log
.flatMap(toRichAccount)
In case you're using older Scala, you could provide an extension method (updated according to Levi Ramsey's suggestion):
implicit class SeqOps[A](s: Seq[A]) {
  def tapEach(f: A => Unit): Seq[A] = {
    s.foreach(f)
    s
  }
}
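With that extension in scope, the 2.13-style pipeline from above compiles unchanged on older versions:
records
  .map(withAccount(accountsMap))
  .tapEach(handleNoAccounts) // side effect only; the collection passes through
  .flatMap(toRichAccount)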

Akka combining Sinks without access to Flows

I am using an API that accepts a single Akka Sink and fills it with data:
def fillSink(sink: Sink[String, _])
Is there a way, without delving into the depths of Akka, to handle the output with two sinks instead of one?
For example
val mySink1:Sink = ...
val mySink2:Sink = ...
//something
fillSink( bothSinks )
If I had access to the Flow used by the fillSink method I could use flow.alsoTo(mySink1).to(mySink2) but the flow is not exposed.
The only workaround at the moment is to pass a single Sink which handles the strings and passes them on to two StringBuilders in place of mySink1/mySink2, but that feels like it defeats the point of Akka. Without spending a couple of days learning Akka, I can't tell whether there is a way to split the output across sinks.
Thanks!
The combine Sink operator, which combines two or more Sinks using a provided Int => Graph[UniformFanOutShape[T, U], NotUsed] function, might be what you're seeking:
def combine[T, U](first: Sink[U, _], second: Sink[U, _], rest: Sink[U, _]*)(strategy: Int => Graph[UniformFanOutShape[T, U], NotUsed]): Sink[T, NotUsed]
A trivialized example:
val doubleSink = Sink.foreach[Int](i => println(s"Doubler: ${i*2}"))
val tripleSink = Sink.foreach[Int](i => println(s"Tripler: ${i*3}"))
val combinedSink = Sink.combine(doubleSink, tripleSink)(Broadcast[Int](_))

Source(List(1, 2, 3)).runWith(combinedSink)
// Doubler: 2
// Tripler: 3
// Doubler: 4
// Tripler: 6
// Doubler: 6
// Tripler: 9
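Applied to your String scenario, a sketch of bothSinks might look like the following (assuming mySink1 and mySink2 are Sink[String, _], as in your example):
import akka.NotUsed
import akka.stream.scaladsl.{Broadcast, Sink}

// broadcast every incoming String to both sinks
val bothSinks: Sink[String, NotUsed] =
  Sink.combine(mySink1, mySink2)(Broadcast[String](_))

fillSink(bothSinks)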

Scala - evaluate function calls sequentially until one returns

I have a few 'legacy' endpoints that can return the Data I'm looking for.
def mainCall(id: UUID): Data {
  maybeMyDataInEndpoint1(id: UUID): DataA
  maybeMyDataInEndpoint2(id: UUID): DataB
  maybeMyDataInEndpoint3(id: UUID): DataC
}
null can be returned if no DataX is found
the return types differ for each method; there is a convert method that converts each DataX to a unified Data
the endpoints are not Scala-ish
What is the best Scala approach to evaluate those method calls sequentially until I have the value I need?
In pseudocode I would do something like:
val myData = maybeMyDataInEndpoint1 getOrElse maybeMyDataInEndpoint2 getOrElse maybeMyDataInEndpoint3
I'd use an easier approach, though the other answers use more elaborate language features.
Just use Option() to catch the null and chain with orElse. I'm assuming methods convertX(d: DataX): Data for explicit conversion. As the data might not be found at all, we return an Option:
def mainCall(id: UUID): Option[Data] = {
  Option(maybeMyDataInEndpoint1(id)).map(convertA)
    .orElse(Option(maybeMyDataInEndpoint2(id)).map(convertB))
    .orElse(Option(maybeMyDataInEndpoint3(id)).map(convertC))
}
Maybe you can lift these methods into a List of functions and use collectFirst, like:
val fs = List(maybeMyDataInEndpoint1 _, maybeMyDataInEndpoint2 _, maybeMyDataInEndpoint3 _)
val f = (a: UUID) => fs.collectFirst {
  case u if u(a) != null => u(a) // note: the matching endpoint is invoked twice
}
f(myUUID)
The best Scala approach IMHO is to do things in the most straightforward way.
To handle optional values (or nulls from Java land), use Option.
To sequentially evaluate a list of methods, fold over a Seq of functions.
To convert from one data type to another, use either (1.) implicit conversions or (2.) regular functions depending on the situation and your preference.
(Edit) Assuming implicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(implicit convert: A => Data) =
  (id: UUID) => Option(endpoint(id)).map(convert)

val legacyEndpoints = Seq(
  legacyEndpoint(maybeMyDataInEndpoint1),
  legacyEndpoint(maybeMyDataInEndpoint2),
  legacyEndpoint(maybeMyDataInEndpoint3)
)

def mainCall(id: UUID): Option[Data] =
  legacyEndpoints.foldLeft(Option.empty[Data])(_ orElse _(id))
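The implicit conversions themselves are assumed to be in scope. A minimal sketch, reusing the fromDataX names from the explicit variant below (bodies elided):
// hypothetical implicit conversions, one per legacy data type
implicit def fromDataA(a: DataA): Data = ???
implicit def fromDataB(b: DataB): Data = ???
implicit def fromDataC(c: DataC): Data = ???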
(Edit) Using explicit conversions:
def legacyEndpoint[A](endpoint: UUID => A)(convert: A => Data) =
  (id: UUID) => Option(endpoint(id)).map(convert)

val legacyEndpoints = Seq(
  legacyEndpoint(maybeMyDataInEndpoint1)(fromDataA),
  legacyEndpoint(maybeMyDataInEndpoint2)(fromDataB),
  legacyEndpoint(maybeMyDataInEndpoint3)(fromDataC)
)
... // same as before
Here is one way to do it.
(1) You can make your convert methods implicit (or wrap them into implicit wrappers) for convenience.
(2) Then use Stream to build a chain of method calls. You should give type inference a hint that you want your stream to contain Data elements (not DataX as returned by the legacy methods) so that the appropriate implicit convert is applied to each result of a legacy method call.
(3) Since Stream is lazy and evaluates its tail by name, only the first method gets called so far. At this point you can apply a lazy filter to skip null results.
(4) Now you can actually evaluate the chain, getting the first non-null result with headOption.
(HACK) Unfortunately, Scala's type inference (at the time of writing, v2.12.4) is not powerful enough to allow using the #:: stream methods, unless you guide it every step of the way. Using cons makes inference happy but is cumbersome. Building the stream with the vararg apply method of the companion object is not an option either, since Scala does not support by-name varargs yet. In my example below I use a combination of the stream and toLazyData methods: stream is a generic helper that builds streams from 0-arg functions; toLazyData is an implicit by-name conversion designed to interplay with the implicit convert functions that convert from DataX to Data.
Here is a demo that demonstrates the idea in more detail:
object Demo {
  case class Data(value: String)

  class DataA
  class DataB
  class DataC

  def maybeMyDataInEndpoint1(id: String): DataA = {
    println("maybeMyDataInEndpoint1")
    null
  }

  def maybeMyDataInEndpoint2(id: String): DataB = {
    println("maybeMyDataInEndpoint2")
    new DataB
  }

  def maybeMyDataInEndpoint3(id: String): DataC = {
    println("maybeMyDataInEndpoint3")
    new DataC
  }

  implicit def convert(data: DataA): Data = if (data == null) null else Data(data.toString)
  implicit def convert(data: DataB): Data = if (data == null) null else Data(data.toString)
  implicit def convert(data: DataC): Data = if (data == null) null else Data(data.toString)

  implicit def toLazyData[T](value: => T)(implicit convert: T => Data): (() => Data) = () => convert(value)

  // generic helper: builds a lazy Stream from 0-arg functions
  def stream[T](xs: (() => T)*): Stream[T] = {
    xs.toStream.map(_())
  }

  def main(args: Array[String]): Unit = {
    val chain = stream(
      maybeMyDataInEndpoint1("1"),
      maybeMyDataInEndpoint2("2"),
      maybeMyDataInEndpoint3("3")
    )
    val result = chain.filter(_ != null).headOption.getOrElse(Data("default"))
    println(result)
  }
}
This prints:
maybeMyDataInEndpoint1
maybeMyDataInEndpoint2
Data(Demo$DataB@16022d9d)
Here maybeMyDataInEndpoint1 returns null, so maybeMyDataInEndpoint2 needs to be invoked, delivering a DataB; maybeMyDataInEndpoint3 never gets invoked since we already have a result.
I think @g.krastev's answer is perfectly good for your use case and you should accept it. I'm just expanding on it a bit to show how you can make the last step slightly nicer with cats.
First, the boilerplate:
import java.util.UUID
final case class DataA(i: Int)
final case class DataB(i: Int)
final case class DataC(i: Int)
type Data = Int
def convertA(a: DataA): Data = a.i
def convertB(b: DataB): Data = b.i
def convertC(c: DataC): Data = c.i
def maybeMyDataInEndpoint1(id: UUID): DataA = DataA(1)
def maybeMyDataInEndpoint2(id: UUID): DataB = DataB(2)
def maybeMyDataInEndpoint3(id: UUID): DataC = DataC(3)
This is basically what you have, in a way that you can copy/paste in the REPL and have compile.
Now, let's first declare a way to turn each of your endpoints into something safe and unified:
def makeSafe[A, B](evaluate: UUID ⇒ A, f: A ⇒ B): UUID ⇒ Option[B] =
  id ⇒ Option(evaluate(id)).map(f)
With this in place, you can, for example, call the following to turn maybeMyDataInEndpoint1 into a UUID => Option[Data]:
makeSafe(maybeMyDataInEndpoint1, convertA)
The idea is now to turn your endpoints into a list of UUID => Option[Data] values and fold over that list. Here's your list:
val endpoints = List(
  makeSafe(maybeMyDataInEndpoint1, convertA),
  makeSafe(maybeMyDataInEndpoint2, convertB),
  makeSafe(maybeMyDataInEndpoint3, convertC)
)
You can now fold over it manually, which is what @g.krastev did:
def mainCall(id: UUID): Option[Data] =
  endpoints.foldLeft(None: Option[Data])(_ orElse _(id))
If you're fine with a cats dependency, the notion of folding over a list of options is just a concrete use case of a common pattern (the interaction of Foldable and Monoid):
import cats._
import cats.implicits._
def mainCall(id: UUID): Option[Data] = endpoints.foldMap(_(id))
There are other ways to make this nicer still, but they might be overkill in this context - I'd probably declare a type class to turn any type into a Data, say, to give makeSafe a cleaner type signature.
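For illustration, such a type class might be sketched as follows. This is only a sketch of the idea the answer alludes to, with hypothetical names, not code from the answer:
// hypothetical type class: evidence that A can be turned into a Data
trait ToData[A] {
  def toData(a: A): Data
}

object ToData {
  implicit val dataA: ToData[DataA] = a => a.i
  implicit val dataB: ToData[DataB] = b => b.i
  implicit val dataC: ToData[DataC] = c => c.i
}

// makeSafe now takes a single argument and has a cleaner signature
def makeSafe[A: ToData](evaluate: UUID => A): UUID => Option[Data] =
  id => Option(evaluate(id)).map(implicitly[ToData[A]].toData)

// the endpoints list is then built with single-argument calls:
// List(makeSafe(maybeMyDataInEndpoint1), makeSafe(maybeMyDataInEndpoint2), ...)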

Elegant way of reusing akka-stream flows

I am looking for a way to easily reuse akka-stream flows.
I treat the Flow I intend to reuse as a function, so I would like to keep its signature like:
Flow[Input, Output, NotUsed]
Now when I use this flow I would like to be able to 'call' this flow and keep the result aside for further processing.
So I want to start with a Flow emitting [Input], apply my flow, and proceed with a Flow emitting [(Input, Output)].
example:
val s: Source[Int, NotUsed] = Source(1 to 10)
val stringIfEven = Flow[Int].filter(_ % 2 == 0).map(_.toString)
val via: Source[(Int, String), NotUsed] = ???
Now this is not possible in a straightforward way, because combining the flows with .via() would give me a Flow emitting just [Output]:
val via: Source[String, NotUsed] = s.via(stringIfEven)
The alternative is to make my reusable flow emit [(Input, Output)], but this requires every flow to push its input through all the stages and makes my code look bad.
So I came up with a combiner like this:
def tupledFlow[In, Out](flow: Flow[In, Out, _]): Flow[In, (In, Out), NotUsed] = {
  Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val broadcast = b.add(Broadcast[In](2))
    val zip = b.add(Zip[In, Out])
    broadcast.out(0) ~> zip.in0
    broadcast.out(1) ~> flow ~> zip.in1
    FlowShape(broadcast.in, zip.out)
  })
}
It broadcasts the input both to the wrapped flow and, in parallel, directly to the Zip stage, where the values are joined into a tuple. It can then be elegantly applied:
val tupled: Source[(Int, String), NotUsed] = s.via(tupledFlow(stringIfEven))
Everything works great, but when the given flow performs a filter operation, this combiner gets stuck and stops processing further events.
I guess that is due to Zip's behaviour, which requires all of its inputs to emit: in my case one branch passes the given element through directly, so the other branch cannot drop that element with filter(). When it does, the flow stalls because Zip is waiting for a push.
Is there a better way to achieve flow composition?
Is there anything I can do in my tupledFlow to get the desired behaviour when 'flow' drops elements with 'filter'?
Two possible approaches - with debatable elegance - are:
1) avoid using filtering stages, turning your filter into a Flow[Int, Option[Int], NotUsed]. This way you can apply your zipping wrapper around your whole graph, as was your original plan. However, the code looks messier, and there is added overhead from passing Nones around.
val stringIfEvenOrNone = Flow[Int].map {
  case x if x % 2 == 0 => Some(x.toString)
  case _               => None
}

val tupled: Source[(Int, String), NotUsed] =
  s.via(tupledFlow(stringIfEvenOrNone)).collect {
    case (num, Some(str)) => (num, str)
  }
2) separate the filtering and transforming stages, and apply the filtering ones before your zipping wrapper. Probably a more lightweight and better compromise.
val filterEven = Flow[Int].filter(_ % 2 == 0)
val toString = Flow[Int].map(_.toString)
val tupled: Source[(Int, String), NotUsed] = s.via(filterEven).via(tupledFlow(toString))
EDIT
3) Posting another solution here for clarity, as per the discussions in the comments.
This flow wrapper lets you emit each element from a given flow, paired with the original input element that generated it. It works for any kind of inner flow (emitting 0, 1 or more elements for each input).
def tupledFlow[In, Out](flow: Flow[In, Out, _]): Flow[In, (In, Out), NotUsed] =
  Flow[In].flatMapConcat(in => Source.single(in).via(flow).map(out => in -> out))
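For instance, wrapping the stringIfEven flow from the question with this version no longer stalls on filtered elements. A quick sketch, using the s and stringIfEven defined above:
// odd numbers produce an empty inner source, so they are simply skipped
val tupled: Source[(Int, String), NotUsed] = s.via(tupledFlow(stringIfEven))
// emits: (2,"2"), (4,"4"), (6,"6"), (8,"8"), (10,"10")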
I came up with an implementation of tupledFlow that works when the wrapped Flow uses filter() or mapAsync(), and when the wrapped Flow emits 0, 1 or N elements for every input:
def tupledFlow[In, Out](flow: Flow[In, Out, _])
    (implicit materializer: Materializer, executionContext: ExecutionContext): Flow[In, (In, Out), NotUsed] = {
  val v: Flow[In, Seq[(In, Out)], NotUsed] = Flow[In].mapAsync(4) { in: In =>
    // run the wrapped flow to completion for this one element
    val outFuture: Future[Seq[Out]] = Source.single(in).via(flow).runWith(Sink.seq)
    val bothFuture: Future[Seq[(In, Out)]] = outFuture.map(seqOfOut => seqOfOut.map((in, _)))
    bothFuture
  }
  val onlyDefined: Flow[In, (In, Out), NotUsed] = v.mapConcat[(In, Out)](seq => seq.to[scala.collection.immutable.Iterable])
  onlyDefined
}
The only drawback I see here is that I am instantiating and materializing a flow for every single entity, just to get the notion of 'calling a flow as a function'.
I didn't do any performance tests on that; however, since the heavy lifting is done in the wrapped Flow, which is executed in a Future, I believe this will be OK.
This implementation passes all the tests from https://gist.github.com/kretes/8d5f2925de55b2a274148b69f79e55ac#file-tupledflowspec-scala

How to deal with source that emits Future[T]?

Let's say I have some iterator:
val nextElemIter: Iterator[Future[Int]] = Iterator.continually(...)
And I want to build a source from that iterator:
val source: Source[Future[Int], NotUsed] =
  Source.fromIterator(() => nextElemIter)
So now my source emits Futures. I have never seen Futures being passed between stages in the Akka docs or anywhere else, so instead, I could always do something like this:
val source: Source[Int, NotUsed] =
  Source.fromIterator(() => nextElemIter).mapAsync(1)(identity)
And now I have a regular source that emits T instead of Future[T]. But this feels hacky and wrong.
What's the proper way to deal with such situations?
Answering your question directly: I agree with Vladimir's comment that there is nothing "hacky" about using mapAsync for the purpose you described. I can't think of any more direct way to unwrap the Future from around your underlying Int values.
Answering your question indirectly...
Try to stick with Futures
Streams, as a concurrency mechanism, are incredibly useful when backpressure is required. However, pure Future operations have their place in applications as well.
If your Iterator[Future[Int]] is going to produce a known, limited number of Future values, then you may want to stick with using Futures for concurrency.
Imagine you want to filter, map, & reduce the Int values.
def isGoodInt(i: Int): Boolean = ???      // filter
def transformInt(i: Int): Int = ???       // map
def combineInts(i: Int, j: Int): Int = ??? // reduce
Futures provide a direct way of using these functions:
val finalVal: Future[Int] =
  Future
    .sequence(nextElemIter.toSeq) // launch all of the Futures
    .map(_.filter(isGoodInt).map(transformInt).reduce(combineInts))
Compared with a somewhat indirect way of using the Stream as you suggested:
val finalVal: Future[Int] =
  Source.fromIterator(() => nextElemIter)
    .via(Flow[Future[Int]].mapAsync(1)(identity))
    .via(Flow[Int].filter(isGoodInt))
    .via(Flow[Int].map(transformInt))
    .runWith(Sink.reduce(combineInts))