Akka Streams for server streaming (gRPC, Scala)

I am new to Akka Streams and gRPC. I am trying to build an endpoint where the client sends a single request and the server sends multiple responses.
This is my protobuf:
syntax = "proto3";
option java_multiple_files = true;
option java_package = "customer.service.proto";
service CustomerService {
  rpc CreateCustomer(CustomerRequest) returns (stream CustomerResponse) {}
}
message CustomerRequest {
  string customerId = 1;
  string customerName = 2;
}
message CustomerResponse {
  enum Status {
    No_Customer = 0;
    Creating_Customer = 1;
    Customer_Created = 2;
  }
  string customerId = 1;
  Status status = 2;
}
What I want to achieve is this: the client sends a customer request; the server first checks and responds with No_Customer, then sends Creating_Customer, and finally sends Customer_Created.
I have no idea where to start with the implementation. I have looked for hours but am still clueless, and I would be very thankful if anyone could point me in the right direction.

The place to start is the Akka gRPC documentation and, in particular, the service WalkThrough. It is pretty straightforward to get the samples working in a clean project.
The relevant server sample method is this:
override def itKeepsReplying(in: HelloRequest): Source[HelloReply, NotUsed] = {
  println(s"sayHello to ${in.name} with stream of chars...")
  Source(s"Hello, ${in.name}".toList).map(character => HelloReply(character.toString))
}
The remaining problem is to create a Source that emits the right results, but how to do that depends on how you plan to implement the server, so it is difficult to answer in general. Check the Akka Streams documentation for the various options.
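As a rough illustration of what such a Source could look like for the three statuses in the question, here is a minimal sketch. It assumes akka-stream 2.6 or later on the classpath. The case classes and the Status objects are hypothetical stand-ins for the classes Akka gRPC would generate from the .proto file (the generated names will differ), and a real server would probably derive each status from actual work rather than a fixed list.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical stand-ins for the classes generated from the .proto file.
sealed trait Status
object Status {
  case object NoCustomer extends Status
  case object CreatingCustomer extends Status
  case object CustomerCreated extends Status
}
final case class CustomerRequest(customerId: String, customerName: String)
final case class CustomerResponse(customerId: String, status: Status)

object CustomerStreamSketch {
  // One request in, a stream of three status updates out.
  def createCustomer(in: CustomerRequest): Source[CustomerResponse, NotUsed] =
    Source(List[Status](Status.NoCustomer, Status.CreatingCustomer, Status.CustomerCreated))
      .map(status => CustomerResponse(in.customerId, status))

  def main(args: Array[String]): Unit = {
    // Since Akka 2.6 the implicit ActorSystem also provides the materializer.
    implicit val system: ActorSystem = ActorSystem("sketch")
    val replies =
      Await.result(createCustomer(CustomerRequest("42", "Alice")).runWith(Sink.seq), 5.seconds)
    replies.foreach(println)
    system.terminate()
  }
}
```

If each status depends on an asynchronous step, operators such as mapAsync or Source.future are the usual places to look in the Akka Streams documentation.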
The client code is simpler: just call runForeach on the Source returned by CreateCustomer, as in the sample:
def runStreamingReplyExample(): Unit = {
  val responseStream = client.itKeepsReplying(HelloRequest("Alice"))
  val done: Future[Done] =
    responseStream.runForeach(reply => println(s"got streaming reply: ${reply.message}"))
  done.onComplete {
    case Success(_) =>
      println("streamingReply done")
    case Failure(e) =>
      println(s"Error streamingReply: $e")
  }
}

Related

How to do some cleanup after the client closes the connection

I'm creating a proxy API using Akka that does some preparation before forwarding the request to the actual API. For one of the endpoints, the response is streaming JSON data, and the client may close the connection at any time. Akka seems to handle this automatically, but I need to do some cleanup after the client closes the connection.
path("query") {
  post {
    decodeRequest {
      entity(as[Query]) { query =>
        // proxy does some preparations
        val json: String = query.prepared.toJson.toString()
        // proxy sends request to actual server
        val request = HttpRequest(
          method = HttpMethods.POST,
          uri = serverUrl + "/query",
          entity = HttpEntity(ContentTypes.`application/json`, json)
        )
        val responseFuture = Http().singleRequest(request)
        val response: HttpResponse = Await.result(responseFuture, PROXY_TIMEOUT)
        // proxy forwards server's response to user
        complete(response)
      }
    }
  }
}
I've tried doing something like
responseFuture.onComplete(_ => doCleanup())
But that doesn't work because responseFuture completes immediately even though the server continues to send data until the client closes the connection. complete(response) also returns immediately.
So I'm wondering how I can make a call to doCleanup() only after the client has closed the connection.
Edit: The cleanup I need to do is because the proxy creates some data streams that are meant to be temporary and only persist until the last message is sent by the server. Once that happens these streams need to be deleted.
You can do it with minimal changes to your code, like this:
val responseFuture = Http().singleRequest(request)
val response: HttpResponse = try {
  Await.result(responseFuture, PROXY_TIMEOUT)
} finally {
  doCleanup()
}
complete(response)
Or you can do it without blocking:
val responseFuture = Http().singleRequest(request)
val cleaned = responseFuture.andThen { case _ => doCleanup() }
complete(cleaned) // it's possible to complete the route with a Future
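Since the question notes that responseFuture completes before the body has finished streaming, another option may be to hook the cleanup onto the termination of the body stream itself. The sketch below shows only the underlying Akka Streams technique, watchTermination, on a stand-in stream. It assumes akka-stream 2.6 or later; the endless "chunk" source and the promise-based doCleanup are hypothetical placeholders for the proxied response body and the real cleanup.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object CleanupOnTerminationSketch {
  // Hypothetical cleanup hook; it completes a promise so the demo can observe the call.
  val cleanedUp: Promise[Done] = Promise[Done]()
  def doCleanup(): Unit = cleanedUp.trySuccess(Done)

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("sketch")
    import system.dispatcher

    // Stand-in for the streamed response body: an endless stream of chunks.
    // The materialized Future completes when the stream completes, fails, or is cancelled.
    val body: Source[String, Future[Done]] =
      Source.repeat("chunk").watchTermination()(Keep.right)

    // take(3) cancels upstream after three elements, much like a client that disconnects early.
    val (terminated, received) = body.take(3).toMat(Sink.seq[String])(Keep.both).run()
    terminated.onComplete(_ => doCleanup())

    println(Await.result(received, 5.seconds))
    Await.result(cleanedUp.future, 5.seconds) // returns once the cleanup has run
    system.terminate()
  }
}
```

In the proxy route, the same kind of flow would presumably be attached to the response entity's data bytes (for example through the entity's transformDataBytes) before calling complete. That wiring is an assumption here and should be checked against the Akka HTTP version in use.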

NullPointerException in Flink custom SourceFunction

I wanted to create a SourceFunction which reads an HTTP stream.
I used ScalaJ, which does what I want (it splits the incoming text on \n).
Obviously the code works outside Flink, but I get a NullPointerException every time I start it as a Flink job (sometimes immediately, sometimes 1-2 seconds later, after it has transmitted 1-2 elements). It looks as if the Http object has some problem.
import org.apache.flink.streaming.api.functions.source.SourceFunction
import scala.io.Source.fromInputStream
import scalaj.http._
class HttpSource(url: String) extends SourceFunction[String] {
  @volatile var isRunning = true
  override def cancel(): Unit = isRunning = false
  override def run(ctx: SourceFunction.SourceContext[String]): Unit =
    httpStream(ctx.collect)
  private def httpStream(f: String => Unit) = {
    val request = Http(url)
    request
      .execute { inputStream =>
        fromInputStream(inputStream)
          .getLines()
          .takeWhile(_ => isRunning)
          .foreach(f)
      }
  }
}
Here's the exception I usually get:
(Sometimes it's a bit different, for example I tried to make the request value transient, then it's already null when it tries to refer to request)
Caused by: java.lang.NullPointerException
at java.io.Reader.<init>(Reader.java:78)
at java.io.InputStreamReader.<init>(InputStreamReader.java:129)
at scala.io.BufferedSource.reader(BufferedSource.scala:24)
at scala.io.BufferedSource.bufferedReader(BufferedSource.scala:25)
at scala.io.BufferedSource.scala$io$BufferedSource$$charReader$lzycompute(BufferedSource.scala:35)
at scala.io.BufferedSource.scala$io$BufferedSource$$charReader(BufferedSource.scala:33)
at scala.io.BufferedSource.scala$io$BufferedSource$$decachedReader(BufferedSource.scala:62)
at scala.io.BufferedSource$BufferedLineIterator.<init>(BufferedSource.scala:67)
at scala.io.BufferedSource.getLines(BufferedSource.scala:86)
at flinkextension.HttpSource$$anonfun$httpStream$1.apply(HttpSource.scala:21)
at flinkextension.HttpSource$$anonfun$httpStream$1.apply(HttpSource.scala:19)
at scalaj.http.HttpRequest$$anonfun$execute$1.apply(Http.scala:323)
at scalaj.http.HttpRequest$$anonfun$execute$1.apply(Http.scala:323)
at scalaj.http.HttpRequest$$anonfun$toResponse$3.apply(Http.scala:388)
at scalaj.http.HttpRequest$$anonfun$toResponse$3.apply(Http.scala:380)
at scala.Option.getOrElse(Option.scala:121)
at scalaj.http.HttpRequest.toResponse(Http.scala:380)
at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:360)
at scalaj.http.HttpRequest.exec(Http.scala:335)
at scalaj.http.HttpRequest.execute(Http.scala:323)
at flinkextension.HttpSource.httpStream(HttpSource.scala:19)
at flinkextension.HttpSource.run(HttpSource.scala:14)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)
Everything else seems to work fine when I don't use a streaming HTTP request: a file read with the same InputStream type, a plain while loop over strings, or even single HTTP requests that aren't streaming.
I feel like I'm missing some theoretical background; maybe Flink does something in the background which destroys the Http object or the InputStream, but I didn't find anything in the documentation.
UPDATE #1:
If I put a null check into the lambda, the job usually exits immediately, sometimes processes a few elements, and sometimes times out after hanging for a minute. Here's this version of the httpStream function:
private def httpStream(f: String => Unit) = {
  val request = Http(url)
  request
    .execute { inputStream =>
      if (inputStream == null) println("null inputstream")
      else {
        println("not null inputstream")
        fromInputStream(inputStream)
          .getLines()
          .takeWhile(_ => isRunning)
          .foreach(f)
      }
    }
}
UPDATE #2:
The code actually works in distributed mode and with StreamExecutionEnvironment.createLocalEnvironment()
I only experience the issue if I use start-local.sh and submit the jar to it.

scalaz-stream how to implement `ask-then-wait-reply` tcp client

I want to implement a client app that first sends a request to the server and then waits for its reply (similar to HTTP).
My client process may be:
val topic = async.topic[ByteVector]
val client = topic.subscribe
Here is the API:
trait Client {
  val incoming = tcp.connect(...)(client)
  val reqBus = topic.publish
  def ask(req: ByteVector): Task[Throwable \/ ByteVector] = {
    (tcp.writes(req).flatMap(_ => tcp.reads(1024))).to(reqBus)
    ???
  }
}
Then, how do I implement the remaining part of ask?
Usually this is implemented by publishing the message via a sink and then awaiting some sort of reply on a source, such as your topic.
We actually use this idiom a lot in our code:
def reqRply[I, O, O2](src: Process[Task, I], sink: Sink[Task, I], reply: Process[Task, O])(pf: PartialFunction[O, O2]): Process[Task, O2] = {
  merge.mergeN(Process(reply, (src to sink).drain)).collectFirst(pf)
}
Essentially, this first hooks into the reply stream to await any resulting O that confirms our request was sent. Then we publish the message I and consult pf for each incoming O; the first one it accepts is translated to O2, and the process then terminates.

concurrent requests limit of Twitter-Finagle

I create a Thrift server using Finagle like this:
val server = Thrift.serveIface(bindAddr(), new MyService[Future] {
  def myRPCFuction() {}
})
But I found that the maximum number of concurrent requests is five (why 5?). When there are more than five, the server just ignores the excess ones. I looked through the Finagle docs really hard (http://twitter.github.io/finagle/guide/Protocols.html#thrift-and-scrooge), but found no hint on how to configure a max-request limit.
How do I configure the maximum number of concurrent requests in Finagle? Thanks.
I've solved this problem myself and am sharing the solution here to help others who may run into the same case. I was a plain Thrift user before, and in Thrift, returning from the RPC function returns the value to the calling client. In Finagle, by contrast, the value goes back to the client only when you return it in a Future, for example with Future.value(). And with Finagle you should work in a fully asynchronous way; that is, you had better not sleep or call another RPC synchronously inside the RPC function.
/* THIS is BAD */
val server = Thrift.serveIface(bindAddr(), new MyService[Future] {
  def myRPCFuction() = {
    val rpcFuture = rpcClient.callOtherRpc() // call another RPC, which returns a Future
    // Await blocks the serving thread until the other RPC finishes
    val result = Await.result(rpcFuture, TwitterDuration(rpcTimeoutSec()*1000, MILLISECONDS))
    Future.value(result)
  }
})
/* This is GOOD */
val server = Thrift.serveIface(bindAddr(), new MyService[Future] {
  def myRPCFuction() = {
    val rpcFuture = rpcClient.callOtherRpc() // call another RPC, which returns a Future
    rpcFuture
      .onSuccess { result => /* do your job on success */ }
      .onFailure { e => /* do your job on failure */ }
    // the chained Future is the method's result, so the client is answered once it completes
  }
})
With that change I get satisfactory concurrency. I hope it helps others who have the same issue.
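The same idea, composing on the Future rather than blocking on it, can be sketched in isolation with com.twitter.util.Future. This is only an illustration under assumptions: it needs Twitter's util-core on the classpath, and callOtherRpc and myRpcFunction are hypothetical stand-ins for the downstream call and the Thrift handler above.

```scala
import com.twitter.util.{Await, Future}

object NonBlockingRpcSketch {
  // Hypothetical downstream call; stands in for rpcClient.callOtherRpc().
  def callOtherRpc(): Future[Int] = Future.value(21)

  // Transform the Future instead of blocking on it inside the handler.
  def myRpcFunction(): Future[String] =
    callOtherRpc()
      .map(n => s"result=${n * 2}")
      .handle { case e: Exception => s"failed: ${e.getMessage}" }

  def main(args: Array[String]): Unit =
    // Await is acceptable at the edge of a demo program, not inside a Finagle handler.
    println(Await.result(myRpcFunction()))
}
```

The handler returns immediately with a Future, so the thread that accepted the request is free to pick up the next one while the downstream call is still in flight.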

unicast in Play framework and SSE (scala): how do i know which stream to send to?

my app lists hosts, and the list is dynamic and changing. it is based on Akka actors and Server Sent Events.
when a new client connects, they need to get the current list to display. but, i don't want to push the list to all clients every time a new one connects. so, i followed the realtime elastic search example and emulated unicast by creating an (Enumerator, Channel) per Connect() and giving it a UUID. when i need to broadcast i will map over all and update them, with the intent of being able to do unicast to clients (and there should be very few of those).
my problem is - how do i get the new client its UUID so it can use it? the flow i am looking for is:
- client asks for EventStream
- server creates a new (Enumerator, channel) with a UUID, and returns Enumerator and UUID to client
- client asks for table using uuid
- server pushes table only on channel corresponding to the uuid
so, how would the client know about the UUID? had it been web socket, sending the request should have had the desired result, as it would have reached its own channel. but in SSE the client -> server is done on a different channel. any solutions to that?
code snippets:
case class Connected(uuid: UUID, enumerator: Enumerator[ JsValue ] )
trait MyActor extends Actor{
  var channelMap = new HashMap[UUID,(Enumerator[JsValue], Channel[JsValue])]
  def connect() = {
    val con = Concurrent.broadcast[JsValue]
    val uuid = UUID.randomUUID()
    channelMap += (uuid -> con)
    Connected(uuid, con._1)
  }
  ...
}
object HostsActor extends MyActor {
  ...
  override def receive = {
    case Connect => {
      sender ! connect
    }
    ...
}
object Actors {
  def hostsStream = {
    getStream(getActor("hosts", Props (HostsActor)))
  }
  def getActor(actorPath: String, actorProps : Props): Future[ActorRef] = {
    /* some regular code to create a new actor if the path does not exist, or return the existing one else */
  }
  def getStream(far: Future[ActorRef]) = {
    far flatMap {ar =>
      (ar ? Connect).mapTo[Connected].map { stream =>
        stream
      }
    }
  }
  ...
}
object AppController extends Controller {
  def getHostsStream = Action.async {
    Actors.hostsStream map { ac =>
      ************************************
      ** how do i use the UUID here?? **
      ************************************
      Ok.feed(ac.enumerator &> EventSource()).as("text/event-stream")
    }
  }
I managed to solve it by asynchronously pushing the uuid after returning the channel, with some time in between:
override def receive = {
  case Connect => {
    val con = connect()
    sender ! con
    import scala.concurrent.ExecutionContext.Implicits.global
    context.system.scheduler.scheduleOnce(0.1 seconds){
      unicast(
        con.uuid,
        JsObject (
          Seq (
            "uuid" -> JsString(con.uuid.toString)
          )
        )
      )
    }
  }
this achieved its goal - the client got the UUID and was able to cache and use it to push a getHostsList to the server:
@stream = new EventSource("/streams/hosts")
@stream.addEventListener "message", (event) =>
  data = JSON.parse(event.data)
  if data.uuid
    @uuid = data.uuid
    $.ajax
      type: 'POST',
      url: "/streams/hosts/" + @uuid + "/sendlist"
      success: (data) ->
        console.log("sent hosts request to server successfully")
      error: () ->
        console.log("failed sending hosts request to server")
  else
    ****************************
    * *
    * handle parsing hosts *
    * *
    * *
    ****************************
    @view.render()
while this works, i must say i don't like it. introducing an artificial delay so the client can get the channel and start listening (i tried with no delay, and the client didn't get the uuid) is dangerous, as it might still miss if the system gets busier, while making the delay too long hurts the reactivity aspect.
if anyone has a solution in which this can be done synchronously - having the uuid returned as part of the original eventSource request - i would be more than happy to demote my solution.
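one direction that might avoid the delay is to prepend the uuid event to the client's own enumerator, so it is always the first element of the response stream rather than something pushed through the broadcast channel before the client is listening. the sketch below shows only the composition, using the Play iteratee library's Enumerator.apply and >>> (andThen). it is untested against this app: the String payloads and the finite hosts enumerator are hypothetical stand-ins for the JsValue events and the broadcast enumerator.

```scala
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Await
import scala.concurrent.duration._

object UuidFirstSketch {
  // Emits the uuid event first, then whatever the per-client stream produces.
  def withUuidFirst(uuid: String, stream: Enumerator[String]): Enumerator[String] =
    Enumerator(s"""{"uuid":"$uuid"}""") >>> stream

  def main(args: Array[String]): Unit = {
    // Hypothetical finite stand-in for the per-client broadcast enumerator.
    val hosts = Enumerator("host-a", "host-b")
    // |>>> feeds the enumerator into the iteratee and runs it to a Future of the collected chunks.
    val events =
      Await.result(withUuidFirst("1234", hosts) |>>> Iteratee.getChunks[String], 1.second)
    events.foreach(println)
  }
}
```

in the controller this would presumably mean feeding the combined enumerator (uuid event followed by ac.enumerator) through EventSource() in place of ac.enumerator alone; that wiring is an assumption and not verified here.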