I’m attempting to setup a simple graph structure that process data via invoking rest services, forwards the result of each service to an intermediary processing unit before forwarding the result. Here is a high level architecture :
Can this be defined using Akka graph streams ? Reading https://doc.akka.io/docs/akka/current/stream/stream-graphs.html I don't understand how to even implement this simple architecture.
I've tried to implement custom code to execute functions within a graph :
package com.graph
class RestG {
def flow (in : String) : String = {
return in + "extra"
}
}
object RestG {
case class Flow(in: String) {
def out : String = in+"out"
}
def main(args: Array[String]): Unit = {
List(new RestG().flow("test") , new RestG().flow("test2")).foreach(println)
}
}
I'm unsure how to send data between the functions. So I think I should be using Akka Graphs but how to implement the architecture above ?
Here's how I would approach the problem. First some types:
type Data = Int
type RestService1Response = String
type RestService2Response = String
type DisplayedResult = Boolean
Then stub functions to asynchronously call the external services:
def callRestService1(data: Data): Future[RestService1Response] = ???
def callRestService2(data: Data): Future[RestService2Response] = ???
def resultCombiner(resp1: RestService1Response, resp2: RestService2Response): DisplayedResult = ???
Now for the Akka Streams (I'm leaving out setting up an ActorSystem etc.)
import akka.Done
import akka.stream.scaladsl._
type SourceMatVal = Any
val dataSource: Source[Data, SourceMatVal] = ???
def restServiceFlow[Response](callF: Data => Future[Data, Response], maxInflight: Int) = Flow[Data].mapAsync(maxInflight)(callF)
// NB: since we're fanning out, there's no reason to have different maxInflights here...
val service1 = restServiceFlow(callRestService1, 4)
val service2 = restServiceFlow(callRestService2, 4)
val downstream = Flow[(RestService1Response, RestService2Response)]
.map((resultCombiner _).tupled)
val splitAndCombine = GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
val fanOut = b.add(Broadcast[Data](2))
val fanIn = b.add(Zip[RestService1Response, RestService2Response])
fanOut.out(0).via(service1) ~> fanIn.in0
fanOut.out(1).via(service2) ~> fanIn.in1
FlowShape(fanOut.in, fanIn.out)
}
// This future will complete with a `Done` if/when the stream completes
val future: Future[Done] = dataSource
.via(splitAndCombine)
.via(downstream)
.runForeach { displayableData =>
??? // Display the data
}
It's possible to do all the wiring within the Graph DSL, but I generally prefer to keep my graph stages as simple as possible and only use them to the extent that the standard methods on Source/Flow/Sink can't do what I want.
Related
I'm building a toy project to learn Scala 3 and i'm stuck in one problem, first of all i'm following the tagless-final approach using cats-effect, the approach is working as expected except for the entity serialization, when i try to create a route using akka-http i have the following problem:
def routes: Route = pathPrefix("security") {
(path("auth") & post) {
entity(as[LoginUserByCredentialsCommand]) {
(command: LoginUserByCredentialsCommand) =>
complete {
login(command)
}
}
}}
F[
Either[com.moralyzr.magickr.security.core.errors.AuthError,
com.moralyzr.magickr.security.core.types.TokenType.Token
]
]
Required: akka.http.scaladsl.marshalling.ToResponseMarshallable
where: F is a type in class SecurityApi with bounds <: [_] =>> Any
For what i understood, akka-http does not know how to serialize the F highly-kinded type, by searching a little bit i found the following solution, it consists of creating an implicit called marshallable to show the akka-http how to serialize the type, however when i implement it i get a StackOverflow error :(
import akka.http.scaladsl.marshalling.ToResponseMarshaller
import cats.effect.IO
trait Marshallable[F[_]]:
def marshaller[A: ToResponseMarshaller]: ToResponseMarshaller[F[A]]
object Marshallable:
implicit def marshaller[F[_], A : ToResponseMarshaller](implicit M: Marshallable[F]): ToResponseMarshaller[F[A]] =
M.marshaller
given ioMarshaller: Marshallable[IO] with
def marshaller[A: ToResponseMarshaller] = implicitly
I'm really stuck right now, does anyone have an idea on how can i fix this problem? The complete code can be found here
Edit: This is the login code
For clarity, here are the class that instantiate the security api and the security api itself
object Magickr extends IOApp:
override def run(args: List[String]): IO[ExitCode] =
val server = for {
// Actors
actorsSystem <- ActorsSystemResource[IO]()
streamMaterializer <- AkkaMaterializerResource[IO](actorsSystem)
// Configs
configs <- Resource.eval(MagickrConfigs.makeConfigs[IO]())
httpConfigs = AkkaHttpConfig[IO](configs)
databaseConfigs = DatabaseConfig[IO](configs)
flywayConfigs = FlywayConfig[IO](configs)
jwtConfig = JwtConfig[IO](configs)
// Interpreters
jwtManager = JwtBuilder[IO](jwtConfig)
authentication = InternalAuthentication[IO](
passwordValidationAlgebra = new SecurityValidationsInterpreter(),
jwtManager = jwtManager
)
// Database
_ <- Resource.eval(
DbMigrations.migrate[IO](flywayConfigs, databaseConfigs)
)
transactor <- DatabaseConnection.makeTransactor[IO](databaseConfigs)
userRepository = UserRepository[IO](transactor)
// Services
securityManagement = SecurityManagement[IO](
findUser = userRepository,
authentication = authentication
)
// Api
secApi = new SecurityApi[IO](securityManagement)
routes = pathPrefix("api") {
secApi.routes()
}
akkaHttp <- AkkaHttpResource.makeHttpServer[IO](
akkaHttpConfig = httpConfigs,
routes = routes,
actorSystem = actorsSystem,
materializer = streamMaterializer
)
} yield (actorsSystem)
return server.useForever
And
class SecurityApi[F[_]: Async](
private val securityManagement: SecurityManagement[F]
) extends LoginUserByCredentials[F]
with SecurityProtocols:
def routes()(using marshaller: Marshallable[F]): Route = pathPrefix("security") {
(path("auth") & post) {
entity(as[LoginUserByCredentialsCommand]) {
(command: LoginUserByCredentialsCommand) =>
complete {
login(command)
}
}
}
}
override def login(
command: LoginUserByCredentialsCommand
): F[Either[AuthError, Token]] =
securityManagement.loginWithCredentials(command = command).value
================= EDIT 2 =========================================
With the insight provided by Luis Miguel, it makes a clearer sense that i need to unwrap the IO into a Future at the Marshaller level, something like this:
def ioToResponseMarshaller[A: ToResponseMarshaller](
M: Marshallable[IO]
): ToResponseMarshaller[IO[A]] =
Marshaller.futureMarshaller.compose(M.entity.unsafeToFuture())
However, i have this problem:
Found: cats.effect.unsafe.IORuntime => scala.concurrent.Future[A]
Required: cats.effect.IO[A] => scala.concurrent.Future[A]
I think i'm close! Is there a way to unwrap the IO keeping the IO type?
I managed to make it work! Thanks to #luismiguel insight, the problem was that the Akka HTTP Marshaller was not able to deal with Cats-Effect IO monad, so the solution was an implementation who unwraps the IO monad using the unsafeToFuture inside the marshaller, that way i was able to keep the Tagless-Final style from point to point, here's the solution:
This implicit fetches the internal marshaller for the type
import akka.http.scaladsl.marshalling.ToResponseMarshaller
import cats.effect.IO
trait Marshallable[F[_]]:
def marshaller[A: ToResponseMarshaller]: ToResponseMarshaller[F[A]]
object Marshallable:
implicit def marshaller[F[_], A: ToResponseMarshaller](implicit
M: Marshallable[F]
): ToResponseMarshaller[F[A]] = M.marshaller
given ioMarshallable: Marshallable[IO] with
def marshaller[A: ToResponseMarshaller] = CatsEffectsMarshallers.ioMarshaller
This one unwraps the IO monad and flatMaps the marshaller using a future, which akka-http knows how to deal with.
import akka.http.scaladsl.marshalling.{
LowPriorityToResponseMarshallerImplicits,
Marshaller,
ToResponseMarshaller
}
import cats.effect.IO
import cats.effect.unsafe.implicits.global
trait CatsEffectsMarshallers extends LowPriorityToResponseMarshallerImplicits:
implicit def ioMarshaller[A](implicit
m: ToResponseMarshaller[A]
): ToResponseMarshaller[IO[A]] =
Marshaller(implicit ec => _.unsafeToFuture().flatMap(m(_)))
object CatsEffectsMarshallers extends CatsEffectsMarshallers
I have a method in which there are multiple calls to db. As I have not implemented any concurrent processing, a 2nd db call has to wait until the 1st db call gets completed, 3rd has to wait until the 2nd gets completed and so on.
All db calls are independent of each other. I want to make this in such a way that all DB calls run concurrently.
I am new to Akka framework.
Can someone please help me with small sample or references would help. Application is developed in Scala Lang.
There are three primary ways that you could achieve concurrency for the given example needs.
Futures
For the particular use case that is asked about in the question I would recommend Futures before any akka construct.
Suppose we are given the database calls as functions:
type Data = ???
val dbcall1 : () => Data = ???
val dbcall2 : () => Data = ???
val dbcall3 : () => Data = ???
Concurrency can be easily applied, and then the results can be collected, using Futures:
val f1 = Future { dbcall1() }
val f2 = Future { dbcall2() }
val f3 = Future { dbcall3() }
for {
v1 <- f1
v2 <- f2
v3 <- f3
} {
println(s"All data collected: ${v1}, ${v2}, ${v3}")
}
Akka Streams
There is a similar stack answer which demonstrates how to use the akka-stream library to do concurrent db querying.
Akka Actors
It is also possible to write an Actor to do the querying:
object MakeQuery
class DBActor(dbCall : () => Data) extends Actor {
override def receive = {
case _ : MakeQuery => sender ! dbCall()
}
}
val dbcall1ActorRef = system.actorOf(Props(classOf[DBActor], dbcall1))
However, in this use case Actors are less helpful because you still need to collect all of the data together.
You can either use the same technique as the "Futures" section:
val f1 : Future[Data] = (dbcall1ActorRef ? MakeQuery).mapTo[Data]
for {
v1 <- f1
...
Or, you would have to wire the Actors together by hand through the constructor and handle all of the callback logic for waiting on the other Actor:
class WaitingDBActor(dbCall : () => Data, previousActor : ActorRef) {
override def receive = {
case _ : MakeQuery => previousActor forward MakeQuery
case previousData : Data => sender ! (dbCall(), previousData)
}
}
If you want to querying database, you should use something like slick which is a modern database query and access library for Scala.
quick example of slick:
case class User(id: Option[Int], first: String, last: String)
class Users(tag: Tag) extends Table[User](tag, "users") {
def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
def first = column[String]("first")
def last = column[String]("last")
def * = (id.?, first, last) <> (User.tupled, User.unapply)
}
val users = TableQuery[Users]
then your need to create configuration for your db:
mydb = {
dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
properties = {
databaseName = "mydb"
user = "myuser"
password = "secret"
}
numThreads = 10
}
and in your code you load configuration:
val db = Database.forConfig("mydb")
then run your query with db.run method which gives you future as result, for example you can get all rows by calling method result
val allRows: Future[Seq[User]] = db.run(users.result)
this query run without blocking current thread.
If you have task which take long time to execute or calling to another service, you should use futures.
Example of that is simple HTTP call to external service. you can find example in here
If you have task which take long time to execute and for doing so, you have to keep mutable states, in this case the best option is using Akka Actors which encapsulate your state inside an actor which solve problem of concurrency and thread safety as simple as possible.Example of suck tasks are:
import akka.actor.Actor
import scala.concurrent.Future
case class RegisterEndpoint(endpoint: String)
case class NewUpdate(update: String)
class UpdateConsumer extends Actor {
val endpoints = scala.collection.mutable.Set.empty[String]
override def receive: Receive = {
case RegisterEndpoint(endpoint) =>
endpoints += endpoint
case NewUpdate(update) =>
endpoints.foreach { endpoint =>
deliverUpdate(endpoint, update)
}
}
def deliverUpdate(endpoint: String, update: String): Future[Unit] = {
Future.successful(Unit)
}
}
If you want to process huge amount of live data, or websocket connection, processing CSV file which is growing over time, ... or etc, the best option is Akka stream. For example reading data from kafka topic using Alpakka:Alpakka kafka connector
Within an akka stream stage FlowShape[A, B] , part of the processing I need to do on the A's is to save/query a datastore with a query built with A data. But that datastore driver query gives me a future, and I am not sure how best to deal with it (my main question here).
case class Obj(a: String, b: Int, c: String)
case class Foo(myobject: Obj, name: String)
case class Bar(st: String)
//
class SaveAndGetId extends GraphStage[FlowShape[Foo, Bar]] {
val dao = new DbDao // some dao with an async driver
override def createLogic(inheritedAttributes: Attributes) = new GraphStageLogic(shape) {
setHandlers(in, out, new InHandler with Outhandler {
override def onPush() = {
val foo = grab(in)
val add = foo.record.value()
val result: Future[String] = dao.saveAndGetRecord(add.myobject)//saves and returns id as string
//the naive approach
val record = Await(result, Duration.inf)
push(out, Bar(record))// ***tests pass every time
//mapping the future approach
result.map { x=>
push(out, Bar(x))
} //***tests fail every time
The next stage depends on the id of the db record returned from query, but I want to avoid Await. I am not sure why mapping approach fails:
"it should work" in {
val source = Source.single(Foo(Obj("hello", 1, "world")))
val probe = source
.via(new SaveAndGetId))
.runWith(TestSink.probe)
probe
.request(1)
.expectBarwithId("one")//say we know this will be
.expectComplete()
}
private implicit class RichTestProbe(probe: Probe[Bar]) {
def expectBarwithId(expected: String): Probe[Bar] =
probe.expectNextChainingPF{
case r # Bar(str) if str == expected => r
}
}
When run with mapping future, I get failure:
should work ***FAILED***
java.lang.AssertionError: assertion failed: expected: message matching partial function but got unexpected message OnComplete
at scala.Predef$.assert(Predef.scala:170)
at akka.testkit.TestKitBase$class.expectMsgPF(TestKit.scala:406)
at akka.testkit.TestKit.expectMsgPF(TestKit.scala:814)
at akka.stream.testkit.TestSubscriber$ManualProbe.expectEventPF(StreamTestKit.scala:570)
The async side channels example in the docs has the future in the constructor of the stage, as opposed to building the future within the stage, so doesn't seem to apply to my case.
I agree with Ramon. Constructing a new FlowShapeis not necessary in this case and it is too complicated. It is very much convenient to use mapAsync method here:
Here is a code snippet to utilize mapAsync:
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
object MapAsyncExample {
val numOfParallelism: Int = 10
def main(args: Array[String]): Unit = {
Source.repeat(5)
.mapAsync(parallelism)(x => asyncSquare(x))
.runWith(Sink.foreach(println)) previous stage
}
//This method returns a Future
//You can replace this part with your database operations
def asyncSquare(value: Int): Future[Int] = Future {
value * value
}
}
In the snippet above, Source.repeat(5) is a dummy source to emit 5 indefinitely. There is a sample function asyncSquare which takes an integer and calculates its square in a Future. .mapAsync(parallelism)(x => asyncSquare(x)) line uses that function and emits the output of Future to the next stage. In this snipet, the next stage is a sink which prints every item.
parallelism is the maximum number of asyncSquare calls that can run concurrently.
I think your GraphStage is unnecessarily overcomplicated. The below Flow performs the same actions without the need to write a custom stage:
val dao = new DbDao
val parallelism = 10 //number of parallel db queries
val SaveAndGetId : Flow[Foo, Bar, _] =
Flow[Foo]
.map(foo => foo.record.value().myobject)
.mapAsync(parallelism)(rec => dao.saveAndGetRecord(rec))
.map(Bar.apply)
I generally try to treat GraphStage as a last resort, there is almost always an idiomatic way of getting the same Flow by using the methods provided by the akka-stream library.
Trying out the newly minted Akka Streams. It seems to be working except for one small thing - there's no output.
I have the following table definition:
case class my_stream(id: Int, value: String)
class Streams(tag: Tag) extends Table[my_stream](tag, "my_stream") {
def id = column[Int]("id")
def value = column[String]("value")
def * = (id, value) <> (my_stream.tupled, my_stream.unapply)
}
And I'm trying to output the contents of the table to stdout like this:
def main(args: Array[String]) : Unit = {
implicit val system = ActorSystem("Subscriber")
implicit val materializer = ActorMaterializer()
val strm = TableQuery[Streams]
val db = Database.forConfig("pg-postgres")
try{
var src = Source.fromPublisher(db.stream(strm.result))
src.runForeach(r => println(s"${r.id},${r.value}"))(materializer)
} finally {
system.shutdown
db.close
}
}
I have verified that the query is being run by configuring debug logging. However, all I get is this:
08:59:24.099 [main] INFO com.zaxxer.hikari.HikariDataSource - pg-postgres - is starting.
08:59:24.428 [main] INFO com.zaxxer.hikari.pool.HikariPool - pg-postgres - is closing down.
The cause is that Akka Streams is asynchronous and runForeach returns a Future which will be completed once the stream completes, but that Future is not being handled and as such the system.shutdown and db.close executes immediately instead of after the stream completes.
Just in case it helps anyone searching this very same issue but in MySQL, take into account that you should enable the driver stream support "manually":
def enableStream(statement: java.sql.Statement): Unit = {
statement match {
case s: com.mysql.jdbc.StatementImpl => s.enableStreamingResults()
case _ =>
}
}
val publisher = sourceDb.stream(query.result.withStatementParameters(statementInit = enableStream))
Source: http://www.slideshare.net/kazukinegoro5/akka-streams-100-scalamatsuri
Ended up using #ViktorKlang answer and just wrapped the run with an Await.result. I also found an alternative answer in the docs which demonstrates using the reactive streams publisher and subscriber interfaces:
The stream method returns a DatabasePublisher[T] and Source.fromPublisher returns a Source[T, NotUsed]. This means you have to attach a subscriber instead of using runForEach - according to the release notes NotUsed is a replacement for Unit. Which means nothing gets passed to the Sink.
Since Slick implements the reactive streams interface and not the Akka Stream interfaces you need to use the fromPublisher and fromSubscriber integration point. That means you need to implement the org.reactivestreams.Subscriber[T] interface.
Here's a quick and dirty Subscriber[T] implementation which simply calls println:
class MyStreamWriter extends org.reactivestreams.Subscriber[my_stream] {
private var sub : Option[Subscription] = None;
override def onNext(t: my_stream): Unit = {
println(t.value)
if(sub.nonEmpty) sub.head.request(1)
}
override def onError(throwable: Throwable): Unit = {
println(throwable.getMessage)
}
override def onSubscribe(subscription: Subscription): Unit = {
sub = Some(subscription)
sub.head.request(1)
}
override def onComplete(): Unit = {
println("ALL DONE!")
}
}
You need to make sure you call the Subscription.request(Long) method in onSubscribe and then in onNext to ask for data or nothing will be sent or you won't get the full set of results.
And here's how you use it:
def main(args: Array[String]) : Unit = {
implicit val system = ActorSystem("Subscriber")
implicit val materializer = ActorMaterializer()
val strm = TableQuery[Streams]
val db = Database.forConfig("pg-postgres")
try{
val src = Source.fromPublisher(db.stream(strm.result))
val flow = src.to(Sink.fromSubscriber(new MyStreamWriter()))
flow.run()
} finally {
system.shutdown
db.close
}
}
I'm still trying to figure this out so I welcome any feedback. Thanks!
There is some data that I have pulled from a remote API, for which I use a Future-style interface. The data is structured as a linked-list. A relevant example data container is shown below.
case class Data(information: Int) {
def hasNext: Boolean = ??? // Implemented
def next: Future[Data] = ??? // Implemented
}
Now I'm interested in adding some functionality to the data class, such as map, foreach, reduce, etc. To do so I want to implement some form of IterableLike such that it inherets these methods.
Given below is the trait Data may extend, such that it gets this property.
trait AsyncIterable[+T]
extends IterableLike[Future[T], AsyncIterable[T]]
{
def hasNext : Boolean
def next : Future[T]
// How to implement?
override def iterator: Iterator[Future[T]] = ???
override protected[this] def newBuilder: mutable.Builder[Future[T], AsyncIterable[T]] = ???
override def seq: TraversableOnce[Future[T]] = ???
}
It should be a non-blocking implementation, which when acted on, starts requesting the next data from the remote data source.
It is then possible to do cool stuff such as
case class Data(information: Int) extends AsyncIterable[Data]
val data = Data(1) // And more, of course
// Asynchronously print all the information.
data.foreach(data => println(data.information))
It is also acceptable for the interface to be different. But the result should in some way represent asynchronous iteration over the collection. Preferably in a way that is familiar to developers, as it will be part of an (open source) library.
In production I would use one of following:
Akka Streams
Reactive Extensions
For private tests I would implement something similar to following.
(Explanations are below)
I have modified a little bit your Data:
abstract class AsyncIterator[T] extends Iterator[Future[T]] {
def hasNext: Boolean
def next(): Future[T]
}
For it we can implement this Iterable:
class AsyncIterable[T](sourceIterator: AsyncIterator[T])
extends IterableLike[Future[T], AsyncIterable[T]]
{
private def stream(): Stream[Future[T]] =
if(sourceIterator.hasNext) {sourceIterator.next #:: stream()} else {Stream.empty}
val asStream = stream()
override def iterator = asStream.iterator
override def seq = asStream.seq
override protected[this] def newBuilder = throw new UnsupportedOperationException()
}
And if see it in action using following code:
object Example extends App {
val source = "Hello World!";
val iterator1 = new DelayedIterator[Char](100L, source.toCharArray)
new AsyncIterable(iterator1).foreach(_.foreach(print)) //prints 1 char per 100 ms
pause(2000L)
val iterator2 = new DelayedIterator[String](100L, source.toCharArray.map(_.toString))
new AsyncIterable(iterator2).reduceLeft((fl: Future[String], fr) =>
for(l <- fl; r <- fr) yield {println(s"$l+$r"); l + r}) //prints 1 line per 100 ms
pause(2000L)
def pause(duration: Long) = {println("->"); Thread.sleep(duration); println("\n<-")}
}
class DelayedIterator[T](delay: Long, data: Seq[T]) extends AsyncIterator[T] {
private val dataIterator = data.iterator
private var nextTime = System.currentTimeMillis() + delay
override def hasNext = dataIterator.hasNext
override def next = {
val thisTime = math.max(System.currentTimeMillis(), nextTime)
val thisValue = dataIterator.next()
nextTime = thisTime + delay
Future {
val now = System.currentTimeMillis()
if(thisTime > now) Thread.sleep(thisTime - now) //Your implementation will be better
thisValue
}
}
}
Explanation
AsyncIterable uses Stream because it's calculated lazily and it's simple.
Pros:
simplicity
multiple calls to iterator and seq methods return same iterable with all items.
Cons:
could lead to memory overflow because stream keeps all prevously obtained values.
first value is eagerly gotten during creation of AsyncIterable
DelayedIterator is very simplistic implementation of AsyncIterator, don't blame me for quick and dirty code here.
It's still strange for me to see synchronous hasNext and asynchronous next()
Using Twitter Spool I've implemented a working example.
To implement spool I modified the example in the documentation.
import com.twitter.concurrent.Spool
import com.twitter.util.{Await, Return, Promise}
import scala.concurrent.{ExecutionContext, Future}
trait AsyncIterable[+T <: AsyncIterable[T]] { self : T =>
def hasNext : Boolean
def next : Future[T]
def spool(implicit ec: ExecutionContext) : Spool[T] = {
def fill(currentPage: Future[T], rest: Promise[Spool[T]]) {
currentPage foreach { cPage =>
if(hasNext) {
val nextSpool = new Promise[Spool[T]]
rest() = Return(cPage *:: nextSpool)
fill(next, nextSpool)
} else {
val emptySpool = new Promise[Spool[T]]
emptySpool() = Return(Spool.empty[T])
rest() = Return(cPage *:: emptySpool)
}
}
}
val rest = new Promise[Spool[T]]
if(hasNext) {
fill(next, rest)
} else {
rest() = Return(Spool.empty[T])
}
self *:: rest
}
}
Data is the same as before, and now we can use it.
// Cool stuff
implicit val ec = scala.concurrent.ExecutionContext.global
val data = Data(1) // And others
// Print all the information asynchronously
val fut = data.spool.foreach(data => println(data.information))
Await.ready(fut)
It will trow an exception on the second element, because the implementation of next was not provided.