One of the steps in my akka streams pipeline is a transformation that throws an exception when it receives an invalid input. I would like to discard these problematic inputs. So, I came up with the following solution:
...
.map( input => Try( transformation( input ) ).toOption )
.filter( _.nonEmpty )
.map( _.get )
...
Which takes 3 steps for what is, in fact, just a flatMap.
Is there a more straightforward akka way of doing this?
You can use Supervision Strategies. Taken from the doc:
val decider: Supervision.Decider = {
case _: ArithmeticException => Supervision.Resume
case _ => Supervision.Stop
}
val flow = Flow[Int]
.filter(100 / _ < 50)
.map(elem => 100 / (5 - elem))
.withAttributes(ActorAttributes.supervisionStrategy(decider))
You can configure the Decider to do whatever you need. To skip the failing element for every exception, use
case _: Throwable => Supervision.Resume
Take a look at https://doc.akka.io/docs/akka/current/stream/stream-error.html
If you want to silently discard the exceptions as indicated in your sample code, here are a couple of ways to reduce the steps:
// A dummy transformation
def transformation(i: Int): Int = 100 / i
// #1: Use `collect`
Source(List(5, 2, 0, 1)).
map(input => Try(transformation(input)).toOption).
collect { case Some(x) => x }.
runForeach(println)
// Result: 20, 50, 100
// #2: Use `mapConcat`
Source(List(5, 2, 0, 1)).
mapConcat(input => List(Try(transformation(input)).toOption).flatten).
runForeach(println)
// Result: 20, 50, 100
Note that there is no flatMap for Akka Source/Flow, although mapConcat (and flatMapConcat) do function in a somewhat similar fashion.
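For comparison, on a plain Scala collection the three steps from the question really do collapse into a single flatMap, because an Option is treated as a collection of zero or one elements. Here is a minimal sketch that reuses the dummy transformation above; DiscardFailures and safeTransform are names made up for this illustration:

```scala
import scala.util.Try

object DiscardFailures {
  // Same dummy transformation as above: throws ArithmeticException for 0
  def transformation(i: Int): Int = 100 / i

  // Try(...).toOption turns a thrown exception into None,
  // and flatMap drops every None while unwrapping every Some
  def safeTransform(inputs: List[Int]): List[Int] =
    inputs.flatMap(i => Try(transformation(i)).toOption)
}
```

DiscardFailures.safeTransform(List(5, 2, 0, 1)) gives List(20, 50, 100), which is the same result as the two stream variants.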
I'd like to model a function which would return a set of values with weighted probability. Something like this:
25% => return "a"
25% => return "b"
50% => return "c"
Most of the documentation I've seen so far is rather heavy and delves quickly into scientific depths without examples, thus the question:
What's the easiest way to achieve this?
EDIT: I am using the Gatling DSL to write a load test with weighted actions. The built-in weighted distribution has a limitation (it won't work in loops), which I would like to avoid by writing my own implementation. The snippet looks like this:
override def scenarioBuilder = scenario(getClass.getSimpleName)
.exec(Account.create)
.exec(Account.login)
.exec(Account.activate)
.exec(Loop.create)
.forever(getAction)
def getAction = {
// Here is where I lost my wits
// 27.6% => Log.putEvents
// 18.2% => Log.putBinary
// 17.1% => Loop.list
// 14.8% => Key.listIncomingRequests
// rest => Account.get
}
Here's the shortest version of a generic function to choose based on probabilities:
import scala.util.Random
val probabilityMap = List(0.25 -> "a", 0.25 -> "b")
val otherwise = "c"
def getWithProbability[T](probs: List[(Double, T)], otherwise: T): T = {
// some input validations:
assert(probs.map(_._1).sum <= 1.0)
assert(probs.map(_._1).forall(_ > 0))
// get random in [0, 1) range
val rand = Random.nextDouble()
// choose by probability:
probs.foldLeft((rand, otherwise)) {
case ((r, _), (prob, value)) if prob > r => (1.0, value)
case ((r, result), (prob, _)) => (r-prob, result)
}._2
}
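foldLeft keeps advancing over the probabilities until it finds one that is greater than r. If it hasn't found one yet, it moves on to the next probability with r-prob, to "remove" the part of the range that has already been covered. Once a value is chosen, r is reset to 1.0 so that none of the remaining probabilities can match.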
EDIT: an equivalent but perhaps easier to read version can use scanLeft to create cumulative ranges before "searching" for the range within which rand has landed:
def getWithProbability[T](probs: List[(Double, T)], otherwise: T): T = {
// same validations..
// get random in [0, 1) range
val rand = Random.nextDouble()
// create cumulative values, in this case (0.25, a), (0.5, b)
val ranges = probs.tail.scanLeft(probs.head) {
case ((prob1, _), (prob2, value)) => (prob1+prob2, value)
}
ranges.dropWhile(_._1 < rand).map(_._2).headOption.getOrElse(otherwise)
}
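Both versions above draw the random number inside the function, which makes them hard to check. Below is a sketch of the same cumulative-range idea with the roll passed in as a parameter, so the selection step can be tested deterministically. WeightedChoice, pick and pickRandom are names invented for this example, and the small tolerance on the sum is my own assumption to absorb floating-point rounding:

```scala
import scala.util.Random

object WeightedChoice {
  // Pure selection step; `roll` is expected to lie in [0, 1)
  def pick[T](probs: List[(Double, T)], otherwise: T, roll: Double): T = {
    require(probs.forall(_._1 > 0), "probabilities must be positive")
    require(probs.map(_._1).sum <= 1.0 + 1e-9, "probabilities must not exceed 1.0")
    // Upper bounds of the cumulative ranges, e.g. 0.25 and 0.5 for (0.25, 0.25)
    val upperBounds = probs.scanLeft(0.0)(_ + _._1).tail
    upperBounds
      .zip(probs.map(_._2))
      .collectFirst { case (upper, value) if roll < upper => value }
      .getOrElse(otherwise)
  }

  // Convenience wrapper that draws the roll from a random generator
  def pickRandom[T](probs: List[(Double, T)], otherwise: T, rng: Random = new Random()): T =
    pick(probs, otherwise, rng.nextDouble())
}
```

With List(0.25 -> "a", 0.25 -> "b") and "c" as the fallback, a roll of 0.1 picks "a", 0.3 picks "b" and 0.75 picks "c". Unlike the scanLeft version above, this one also accepts an empty list and then always returns the fallback.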
There, clean and ready:
randomSwitchOrElse(
27.6 -> exec(Log.putEvents),
18.2 -> exec(Log.putBinary))
I have an Akka Stream application with a single flow/graph. I want to measure the flow rate at the source and log it every 5 seconds, like 'received 3 messages in the last 5 seconds'. I tried with,
someOtherFlow
.groupedWithin(Integer.MAX_VALUE, 5 seconds)
.runForeach(seq =>
log.debug(s"received ${seq.length} messages in the last 5 seconds")
)
but it only outputs when there are messages, no empty list when there are 0 messages. I want the 0's as well. Is this possible?
You could try something like
src
.conflateWithSeed(_ ⇒ 1){ case (acc, _) ⇒ acc + 1 }
.zip(Source.tick(5.seconds, 5.seconds, NotUsed))
.map(_._1)
which should batch your elements until the tick releases them. This is inspired by an example in the docs.
On a different note, if you need this for monitoring, you could use a third-party tool instead, e.g. Kamon.
A sample of Akka Streams logging:
implicit val system: ActorSystem = ActorSystem("StreamLoggingActorSystem")
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val adapter: LoggingAdapter = Logging(system, "customLogger")
implicit val ec: ExecutionContextExecutor = system.dispatcher
def randomInt = Random.nextInt()
val source = Source.repeat(NotUsed).map(_ ⇒ randomInt)
val logger = source
.groupedWithin(Integer.MAX_VALUE, 5.seconds)
.log(s"in the last 5 seconds number of messages received : ", _.size)
.withAttributes(
Attributes.logLevels(
onElement = Logging.WarningLevel,
onFinish = Logging.InfoLevel,
onFailure = Logging.DebugLevel
)
)
val sink = Sink.ignore
val result: Future[Done] = logger.runWith(sink)
result.onComplete{
case Success(_) =>
println("end of stream")
case Failure(_) =>
println("stream ended with failure")
}
source code is here.
Extending Stefano's answer a little, I created the following flows:
def flowRate[T](metric: T => Int = (_: T) => 1, outputDelay: FiniteDuration = 1 second): Flow[T, Double, NotUsed] =
Flow[T]
.conflateWithSeed(metric(_)){ case (acc, x) ⇒ acc + metric(x) }
.zip(Source.tick(outputDelay, outputDelay, NotUsed))
.map(_._1.toDouble / outputDelay.toUnit(SECONDS))
def printFlowRate[T](name: String, metric: T => Int = (_: T) => 1,
outputDelay: FiniteDuration = 1 second): Flow[T, T, NotUsed] =
Flow[T]
.alsoTo(flowRate[T](metric, outputDelay)
.to(Sink.foreach(r => log.info(s"Rate($name): $r"))))
The first converts the flow into a rate per second. You can supply a metric which gives a value to each object passing through. Say you want to measure the rate of characters in a flow of strings then you could pass _.length. The second parameter is the delay between flow rate reports (defaults to one second).
The second flow can be used inline to print the flow rate for debugging purposes without modifying the value passing through the stream, e.g.
stringFlow
.via(printFlowRate[String]("Char rate", _.length, 10 seconds))
.map(_.toLowerCase) // still a string
...
which will show, every 10 seconds, the average rate (per second) of characters.
N.B. The above flowRate would however be lagging one outputDelay period behind, because the zip will consume from the conflate and then wait for a tick (which can be easily verified by putting a log after the conflateWithSeed). To obtain a non lagging flow rate (metric), one could duplicate the tick, in order to force the zip to consume a second fresh element from the conflate, and then aggregate both ticks, i.e.:
Flow[T]
.conflateWithSeed(metric(_)){case (acc, x) => acc + metric(x) }
.zip(Source.tick(outputDelay, outputDelay, NotUsed)
.mapConcat(_ => Seq(NotUsed, NotUsed))
)
.grouped(2).map {
case Seq((a, _), (b, _)) => a + b
}
.map(_.toDouble / outputDelay.toUnit(SECONDS))
I am processing a stream of data from a file that is grouped by a key. I have created a class with an apply method that can be used to split the stream by key called KeyChanges[T,K]. Before the first item of a substream is processed, I need to retrieve some data from a DB. Once each substream is completed, I need to emit a message to a queue. In a standard scala sequence I would do something like this:
val groups: Map[Key, Seq[Value]] = stream.groupBy(v => v.k)
val groupSummaryF = Future.sequence(groups.map { case (k, group) =>
retrieveMyData(k).flatMap { data =>
Future.sequence(group.map(v => process(data, v))).map(
k -> _.foldLeft(0) { (a,t) =>
t match {
case Success(v) => a + 1
case Failure(ex) =>
println(s"failure: $ex")
a
}
}
).andThen {
case Success((key,count)) =>
sendMessage(count,key)
}
}
})
I would like to do something similar with Akka Streams. On the data retrieval, I could just cache the data and call the retrieval function for each element but for the queue message, I really do need to know when the substream is completed. So far I have not seen a way around this. Any ideas?
You can just run the stream and execute the action from the Sink.
val categories = Array("DEBUG", "INFO", "WARN", "ERROR")
// assume we have a stream from file which produces categoryId -> message
val lines = (1 to 100).map(x => (Random.nextInt(categories.length), s"message $x"))
def loadDataFromDatabase(categoryId: Int): Future[String] =
Future.successful(categories(categoryId))
// assume this emits message to the queue
def emitToQueue(x: (String, Int)): Unit =
println(s"${x._2} messages from category ${x._1}")
val flow =
Flow[(Int, String)].
groupBy(4, _._1).
fold((0, List.empty[String])) { case ((_, acc), (catId, elem)) =>
(catId, elem :: acc)
}.
mapAsync(1) { case (catId, messages) =>
// here you load your stuff from the database
loadDataFromDatabase(catId).map(cat => (cat, messages))
}. // here you may want to do some more processing
map(x => (x._1, x._2.size)).
mergeSubstreams
// assume the source is a file
Source.fromIterator(() => lines.iterator).
via(flow).
to(Sink.foreach(emitToQueue)).run()
If you want to run it for multiple files and, for example, report the sums once, you can do it like this:
val futures = (1 to 4).map { x =>
Source.fromIterator(() => lines.iterator).via(flow).toMat(Sink.seq[(String, Int)])(Keep.right).run()
}
Future.sequence(futures).map { results =>
results.flatten.groupBy(_._1).foreach { case (cat, xs) =>
val total = xs.map(_._2).sum
println(s"$total messages from category $cat")
}
}
As you see, when you run the flow, you get a future. It will contain a materialized value (result of the flow), when it's finished, and you can do with it whatever you want.
How can I transform a
Map[Int, Future[Seq[T]]]
to
Future[Map[Int, Seq[T]]]
in Scala without waiting for the futures?
Example:
Map(
1 -> Future.successful(Seq(100, 200, 300)),
2 -> Future.successful(Seq(500, 600, 700))
)
This ought to do it:
val m = Map(
1 -> Future.successful(Seq(100, 200, 300)),
2 -> Future.successful(Seq(500, 600, 700))
)
Future.sequence { m.map { case (i, f) => f.map((i, _))} }.map(_.toMap)
Working from the inside out, I mapped the key-values from (Int, Future[T]) to Future[(Int, T)], then was able to use Future.sequence on the resulting sequence of Futures. Then that collapsed Future can be mapped back to a Map.
This can be made slightly shorter using Future.traverse, as suggested by @IonutG.Stan:
Future.traverse(m){ case (i, f) => f.map((i, _))}.map(_.toMap)
This will build a new collection within a Future from m, using the same function provided earlier to map tuples with futures to future tuples.
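The same idea can be wrapped in a small generic helper. This is only a sketch: SequenceMap and sequenceValues are hypothetical names, and the conversion goes through a List so that Future.sequence behaves the same way across Scala versions. Nothing blocks here; the resulting future completes once every inner future has completed, and it fails if any of them fails.

```scala
import scala.concurrent.{ExecutionContext, Future}

object SequenceMap {
  // Turns Map[K, Future[V]] into Future[Map[K, V]] without blocking
  def sequenceValues[K, V](m: Map[K, Future[V]])(implicit ec: ExecutionContext): Future[Map[K, V]] =
    Future
      .sequence(m.toList.map { case (k, f) => f.map(k -> _) }) // Future[List[(K, V)]]
      .map(_.toMap)
}
```

For the example map above, SequenceMap.sequenceValues(m) (with an implicit ExecutionContext in scope) yields a Future[Map[Int, Seq[Int]]].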
The Option monad is a great, expressive way to deal with something-or-nothing values in Scala. But what if one needs to log a message when "nothing" occurs? According to the Scala API documentation,
The Either type is often used as an alternative to scala.Option where Left represents failure (by convention) and Right is akin to Some.
However, I had no luck finding best practices for using Either, or good real-world examples involving Either for processing failures. Finally I've come up with the following code for my own project:
def logs: Array[String] = {
def props: Option[Map[String, Any]] = configAdmin.map{ ca =>
val config = ca.getConfiguration(PID, null)
config.properties getOrElse immutable.Map.empty
}
def checkType(any: Any): Option[Array[String]] = any match {
case a: Array[String] => Some(a)
case _ => None
}
def lookup: Either[(Symbol, String), Array[String]] =
for {properties <- props.toRight('warning -> "ConfigurationAdmin service not bound").right
logsParam <- properties.get("logs").toRight('debug -> "'logs' not defined in the configuration").right
array <- checkType(logsParam).toRight('warning -> "unknown type of 'logs' configuration parameter").right}
yield array
lookup.fold(failure => { failure match {
case ('warning, msg) => log(LogService.WARNING, msg)
case ('debug, msg) => log(LogService.DEBUG, msg)
case _ =>
}; new Array[String](0) }, success => success)
}
(Please note this is a snippet from a real project, so it will not compile on its own)
I'd be grateful to know how you are using Either in your code and/or better ideas on refactoring the above code.
Either is used to return one of two possible meaningful results, unlike Option, which is used to return a single meaningful result or nothing.
An easy to understand example is given below (circulated on the Scala mailing list a while back):
def throwableToLeft[T](block: => T): Either[java.lang.Throwable, T] =
try {
Right(block)
} catch {
case ex: Throwable => Left(ex)
}
As the function name implies, if the execution of "block" is successful, it will return "Right(<result>)". Otherwise, if a Throwable is thrown, it will return "Left(<throwable>)". Use pattern matching to process the result:
var s = "hello"
throwableToLeft { s.toUpperCase } match {
case Right(s) => println(s)
case Left(e) => e.printStackTrace
}
// prints "HELLO"
s = null
throwableToLeft { s.toUpperCase } match {
case Right(s) => println(s)
case Left(e) => e.printStackTrace
}
// prints NullPointerException stack trace
Hope that helps.
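On newer Scala versions (2.12 and later, as far as I know) the standard library offers a similar conversion through Try#toEither. Here is a small sketch of it; ThrowableToEither, attempt and describe are names made up for this example. One difference from throwableToLeft is that Try only catches non-fatal exceptions, so fatal errors such as OutOfMemoryError still propagate.

```scala
import scala.util.Try

object ThrowableToEither {
  // Same shape as throwableToLeft, built on the standard library
  def attempt[T](block: => T): Either[Throwable, T] = Try(block).toEither

  // Pattern matching on the result, as in the example above
  def describe(result: Either[Throwable, String]): String = result match {
    case Right(value) => s"ok: $value"
    case Left(error)  => s"failed: ${error.getClass.getSimpleName}"
  }
}
```

Here describe(attempt("hello".toUpperCase)) returns "ok: HELLO", while calling toUpperCase on a null String inside attempt returns "failed: NullPointerException".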
The Scalaz library has something similar to Either, named Validation. It is more idiomatic than Either for the "get either a valid result or a failure" use case.
Validation also allows errors to be accumulated.
Edit: "similar to" Either is completely false, because Validation is an applicative functor, and the scalaz Either, named \/ (pronounced "disjunction" or "either"), is a monad.
Validation can accumulate errors because of that nature. On the other hand, \/ has a "stop early" nature, stopping at the first -\/ (read it as "left", or "error") it encounters. There is a perfect explanation here: http://typelevel.org/blog/2014/02/21/error-handling.html
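To make the difference concrete without pulling in Scalaz, here is a rough sketch using only the standard library Either (right-biased since Scala 2.12, which this example assumes). The accumulating variant is written by hand and only imitates what an applicative Validation does; all names (StopEarlyVsAccumulate, parsePositive, sumStopEarly, sumAccumulate) are invented for the illustration.

```scala
import scala.util.Try

object StopEarlyVsAccumulate {
  def parsePositive(s: String): Either[String, Int] =
    Try(s.toInt).toOption.filter(_ > 0).toRight(s"'$s' is not a positive integer")

  // Monadic style: the for-comprehension stops at the first Left
  def sumStopEarly(a: String, b: String): Either[String, Int] =
    for {
      x <- parsePositive(a)
      y <- parsePositive(b)
    } yield x + y

  // Applicative-like style, done manually: every error is kept
  def sumAccumulate(a: String, b: String): Either[List[String], Int] =
    (parsePositive(a), parsePositive(b)) match {
      case (Right(x), Right(y)) => Right(x + y)
      case (ea, eb)             => Left(List(ea, eb).collect { case Left(e) => e })
    }
}
```

For the inputs "x" and "y", sumStopEarly reports only the error for "x", whereas sumAccumulate reports both errors.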
See: http://scalaz.googlecode.com/svn/continuous/latest/browse.sxr/scalaz/example/ExampleValidation.scala.html
As requested by the comment, copy/paste of the above link (some lines removed):
// Extracting success or failure values
val s: Validation[String, Int] = 1.success
val f: Validation[String, Int] = "error".fail
// It is recommended to use fold rather than pattern matching:
val result: String = s.fold(e => "got error: " + e, s => "got success: " + s.toString)
s match {
case Success(a) => "success"
case Failure(e) => "fail"
}
// Validation is a Monad, and can be used in for comprehensions.
val k1 = for {
i <- s
j <- s
} yield i + j
k1.toOption assert_≟ Some(2)
// The first failing sub-computation fails the entire computation.
val k2 = for {
i <- f
j <- f
} yield i + j
k2.fail.toOption assert_≟ Some("error")
// Validation is also an Applicative Functor, if the type of the error side of the validation is a Semigroup.
// A number of computations are tried. If they all succeed, a function can combine them into a Success. If any
// of them fails, the individual errors are accumulated.
// Use the NonEmptyList semigroup to accumulate errors using the Validation Applicative Functor.
val k4 = (fNel <**> fNel){ _ + _ }
k4.fail.toOption assert_≟ some(nel1("error", "error"))
The snippet you posted seems very contrived. You use Either in a situation where:
It's not enough to just know the data isn't available.
You need to return one of two distinct types.
Turning an exception into a Left is, indeed, a common use case. Over try/catch, it has the advantage of keeping the code together, which makes sense if the exception is an expected result. The most common way of handling Either is pattern matching:
result match {
case Right(res) => ...
case Left(res) => ...
}
Another interesting way of handling Either is when it appears in a collection. When doing a map over a collection, throwing an exception might not be viable, and you may want to return some information other than "not possible". Using an Either enables you to do that without overburdening the algorithm:
val list = (
library
\\ "books"
map (book =>
if (book \ "author" isEmpty)
Left(book)
else
Right((book \ "author" toList) map (_ text))
)
)
Here we get a list of all authors in the library, plus a list of books without an author. So we can then further process it accordingly:
val authorCount = (
(Map[String,Int]() /: (list filter (_ isRight) map (_.right.get)))
((map, author) => map + (author -> (map.getOrElse(author, 0) + 1)))
toList
)
val problemBooks = list flatMap (_.left.toSeq) // thanks to Azarov for this variation
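The same post-processing can also be written without .right.get, by collecting each side with a pattern match. This is just a sketch on plain lists, with no XML involved; SplitEithers, split and countOccurrences are hypothetical names used for the example.

```scala
object SplitEithers {
  // Separates the Left values from the Right values, keeping their order
  def split[A, B](results: List[Either[A, B]]): (List[A], List[B]) = {
    val lefts  = results.collect { case Left(a) => a }
    val rights = results.collect { case Right(b) => b }
    (lefts, rights)
  }

  // Counts how often each name occurs, like authorCount above
  def countOccurrences(names: List[String]): Map[String, Int] =
    names.groupBy(identity).map { case (name, group) => name -> group.size }
}
```

Splitting a list such as List(Left("Untitled"), Right(List("Ann")), Right(List("Ann", "Bob"))) gives the problem entries on one side and the author lists on the other; flattening the author lists and counting them yields Map("Ann" -> 2, "Bob" -> 1).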
So, basic Either usage goes like that. It's not a particularly useful class, but if it were you'd have seen it before. On the other hand, it's not useless either.
Cats has a nice way to create an Either from exception-throwing code:
val either: Either[NumberFormatException, Int] =
Either.catchOnly[NumberFormatException]("abc".toInt)
// either: Either[NumberFormatException,Int] = Left(java.lang.NumberFormatException: For input string: "abc")
See https://typelevel.org/cats/datatypes/either.html#working-with-exception-y-code