Akka-http process requests with Stream - scala

I am trying to write a simple akka-http and akka-streams based application that handles HTTP requests, always with one precompiled stream, because I plan to use long-running processing with back-pressure in my requestProcessor stream.
My application code:
import akka.actor.{ActorSystem, Props}
import akka.http.scaladsl._
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server._
import akka.stream.ActorFlowMaterializer
import akka.stream.actor.ActorPublisher
import akka.stream.scaladsl.{Sink, Source}
import scala.annotation.tailrec
import scala.concurrent.Future
object UserRegisterSource {
def props: Props = Props[UserRegisterSource]
final case class RegisterUser(username: String)
}
class UserRegisterSource extends ActorPublisher[UserRegisterSource.RegisterUser] {
import UserRegisterSource._
import akka.stream.actor.ActorPublisherMessage._
val MaxBufferSize = 100
var buf = Vector.empty[RegisterUser]
override def receive: Receive = {
case request: RegisterUser =>
if (buf.isEmpty && totalDemand > 0)
onNext(request)
else {
buf :+= request
deliverBuf()
}
case Request(_) =>
deliverBuf()
case Cancel =>
context.stop(self)
}
@tailrec final def deliverBuf(): Unit =
if (totalDemand > 0) {
if (totalDemand <= Int.MaxValue) {
val (use, keep) = buf.splitAt(totalDemand.toInt)
buf = keep
use foreach onNext
} else {
val (use, keep) = buf.splitAt(Int.MaxValue)
buf = keep
use foreach onNext
deliverBuf()
}
}
}
object Main extends App {
val host = "127.0.0.1"
val port = 8094
implicit val system = ActorSystem("my-testing-system")
implicit val fm = ActorFlowMaterializer()
implicit val executionContext = system.dispatcher
val serverSource: Source[Http.IncomingConnection, Future[Http.ServerBinding]] = Http(system).bind(interface = host, port = port)
val mySource = Source.actorPublisher[UserRegisterSource.RegisterUser](UserRegisterSource.props)
val requestProcessor = mySource
.mapAsync(1)(fakeSaveUserAndReturnCreatedUserId)
.to(Sink.head[Int])
.run()
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
requestProcessor ! UserRegisterSource.RegisterUser(t)
???
}
}
}
def fakeSaveUserAndReturnCreatedUserId(param: UserRegisterSource.RegisterUser): Future[Int] =
Future.successful {
1
}
serverSource.to(Sink.foreach {
connection =>
connection handleWith Route.handlerFlow(route)
}).run()
}
I found a solution for how to create a Source that can dynamically accept new items to process, but I cannot find any solution for how to then obtain the result of the stream execution in my route.

The direct answer to your question is to materialize a new Stream for each HttpRequest and use Sink.head to get the value you're looking for. Modifying your code:
val requestStream =
  mySource
    .mapAsync(1)(fakeSaveUserAndReturnCreatedUserId)
    .toMat(Sink.head[Int])(Keep.both) // Keep.both: import akka.stream.scaladsl.Keep
  //.run() - don't materialize here
val route: Route =
  get {
    path("test") {
      parameter('test) { case t: String =>
        //materialize a new Stream here: we get both the publisher ActorRef and the result Future
        val (requestProcessor, userIdFut) = requestStream.run()
        requestProcessor ! UserRegisterSource.RegisterUser(t)
        //get the result of the Stream
        userIdFut onSuccess { case userId : Int => ...}
      }
    }
  }
However, I think your question is ill-posed. In your code example, the only thing you're using an akka Stream for is to create a new UserId. Futures readily solve this problem without the need for a materialized Stream (and all the accompanying overhead):
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
val user = RegisterUser(t)
fakeSaveUserAndReturnCreatedUserId(user) onSuccess { case userId : Int =>
...
}
}
}
}
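To make the Future-based route concrete, here is a minimal sketch of how the result could be turned into an HTTP response with the onComplete directive; the response bodies are made up for illustration and are not part of the original answer:
import akka.http.scaladsl.model.StatusCodes
import scala.util.{Failure, Success}

// Sketch only: assumes the fakeSaveUserAndReturnCreatedUserId shown above.
val futureBasedRoute: Route =
  get {
    path("test") {
      parameter('test) { t =>
        onComplete(fakeSaveUserAndReturnCreatedUserId(UserRegisterSource.RegisterUser(t))) {
          case Success(userId) => complete(s"created user $userId")
          case Failure(ex)     => complete(StatusCodes.InternalServerError -> ex.getMessage)
        }
      }
    }
  }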
If you want to limit the number of concurrent calls to fakeSaveUserAndReturnCreatedUserId, then you can create an ExecutionContext with a defined ThreadPool size, as explained in the answer to this question, and use that ExecutionContext to create the Futures:
val ThreadCount = 10 //concurrent queries
val limitedExecutionContext =
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(ThreadCount))
def fakeSaveUserAndReturnCreatedUserId(param: UserRegisterSource.RegisterUser): Future[Int] =
Future { 1 }(limitedExecutionContext)
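Alternatively, if you want to keep back-pressure semantics rather than cap a thread pool, the concurrency limit can also be expressed inside a stream with mapAsync. A rough sketch; the parallelism value of 10 is arbitrary:
import akka.NotUsed
import akka.stream.scaladsl.Flow

// Sketch: mapAsync(10) bounds the number of saves in flight while still back-pressuring upstream.
val boundedSaveFlow: Flow[UserRegisterSource.RegisterUser, Int, NotUsed] =
  Flow[UserRegisterSource.RegisterUser]
    .mapAsync(parallelism = 10)(fakeSaveUserAndReturnCreatedUserId)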

Related

How to really test the akka stream?

I am using a WebSocket client and would like to test against the received message. I've chosen the ScalaTest framework, and I know that the test has to be carried out asynchronously.
The websocket client looks as the following:
import akka.Done
import akka.http.scaladsl.Http
import akka.stream.scaladsl._
import akka.http.scaladsl.model.ws._
import io.circe.syntax._
import scala.concurrent.Future
object WsClient {
import Trigger._
private val convertJson: PreMsg => String = msg =>
msg.asJson.noSpaces
val send: PreMsg => (String => Unit) => RunnableGraph[Future[Done]] = msg => fn =>
Source.single(convertJson(msg))
.map(TextMessage(_))
.via(Http().webSocketClientFlow(WebSocketRequest(s"ws://${Config.host}:${Config.port}/saprs")))
.map(_.asTextMessage.getStrictText)
.toMat(Sink.foreach(fn))(Keep.right)
}
and the test:
feature("Process incoming messages") {
info("As a user, I want that incoming messages is going to process appropriately.")
info("A message should contain the following properties: `sap_id`, `sap_event`, `payload`")
scenario("Message is not intended for the server") {
Given("A message with `sap_id:unknown`")
val msg = PreMsg("unknown", "unvalid", "{}")
When("the message gets validated")
val ws = WsClient.send(msg)
Then("it should has the `status: REJECT` in the response content")
ws { msg =>
//Would like test against the msg here
}.run()
.map(_ => assert(1 == 1))
}
I would like to test against the content of msg, but I do not know how to do it.
I followed the play-scala-websocket-example.
They use a WebSocketClient as a helper; see WebSocketClient.java.
Then a test looks like:
Helpers.running(TestServer(port, app)) {
val myPublicAddress = s"localhost:$port"
val serverURL = s"ws://$myPublicAddress/ws"
val asyncHttpClient: AsyncHttpClient = client.underlying[AsyncHttpClient]
val webSocketClient = new WebSocketClient(asyncHttpClient)
val queue = new ArrayBlockingQueue[String](10)
val origin = serverURL
val consumer: Consumer[String] = new Consumer[String] {
override def accept(message: String): Unit = queue.put(message)
}
val listener = new WebSocketClient.LoggingListener(consumer)
val completionStage = webSocketClient.call(serverURL, origin, listener)
val f = FutureConverters.toScala(completionStage)
// Test we can get good output from the websocket
whenReady(f, timeout = Timeout(1.second)) { webSocket =>
val condition: Callable[java.lang.Boolean] = new Callable[java.lang.Boolean] {
override def call(): java.lang.Boolean = webSocket.isOpen && queue.peek() != null
}
await().until(condition)
val input: String = queue.take()
val json:JsValue = Json.parse(input)
val symbol = (json \ "symbol").as[String]
List(symbol) must contain oneOf("AAPL", "GOOG", "ORCL")
}
}
}
See here: FunctionalSpec.scala
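Alternatively, if you want to assert directly on the frames received by the WsClient from the question, one option is to collect them with Sink.seq instead of a callback. This is only a sketch: it assumes the same PreMsg, Config and convertJson definitions as the question (e.g. placed alongside send inside WsClient), an implicit ActorSystem and materializer in scope, and ScalaTest's async test style:
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.ws.{TextMessage, WebSocketRequest}
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.Future

// Hypothetical variant of WsClient.send that collects the server's replies for assertions.
val sendAndCollect: PreMsg => RunnableGraph[Future[Seq[String]]] = msg =>
  Source.single(convertJson(msg))
    .map(TextMessage(_))
    .via(Http().webSocketClientFlow(WebSocketRequest(s"ws://${Config.host}:${Config.port}/saprs")))
    .map(_.asTextMessage.getStrictText)
    .toMat(Sink.seq)(Keep.right)

// In the scenario, the assertion can then run on the collected messages:
// sendAndCollect(msg).run().map { received =>
//   assert(received.exists(_.contains("REJECT")))
// }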

How to test a Scala Play Framework websocket?

If I have a websocket like the following:
def websocket: WebSocket = WebSocket.accept[String, String] { _ =>
ActorFlow.actorRef(out => LightWebSocketActor.props(out))
}
For reference, this is the LightWebSocketActor:
class LightWebSocketActor(out: ActorRef) extends Actor {
val topic: String = service.topic
override def receive: Receive = {
case message: String =>
play.Logger.debug(s"Message: $message")
PublishService.publish("true")
out ! message
}
}
object LightWebSocketActor {
var list: ListBuffer[ActorRef] = ListBuffer.empty[ActorRef]
def props(out: ActorRef): Props = {
list += out
Props(new LightWebSocketActor(out))
}
def sendMessage(message: String): Unit = {
list.foreach(_ ! message)
}
}
This is using the akka websocket approach.
How should a test for this kind of controller be created?
How should I send information and expect a response?
What kind of information should be sent in the fake request?
For example I have this test for a regular html-returning controller:
"Application" should {
"render the index page" in new WithApplication {
val home = route(app, FakeRequest(GET, "/")).get
status(home) must equalTo(OK)
contentType(home) must beSome.which(_ == "text/html")
contentAsString(home) must contain ("shouts out")
}
}
Play 2.6
I followed this Example: play-scala-websocket-example
Main steps:
Create or provide a WebSocketClient that you can use in your tests.
Create the client:
val asyncHttpClient: AsyncHttpClient = wsClient.underlying[AsyncHttpClient]
val webSocketClient = new WebSocketClient(asyncHttpClient)
Connect to the serverURL:
val listener = new WebSocketClient.LoggingListener(message => queue.put(message))
val completionStage = webSocketClient.call(serverURL, origin, listener)
val f = FutureConverters.toScala(completionStage)
Test the Messages sent by the Server:
whenReady(f, timeout = Timeout(1.second)) { webSocket =>
await().until(() => webSocket.isOpen && queue.peek() != null)
checkMsg1(queue.take())
checkMsg2(queue.take())
assert(queue.isEmpty)
}
For example, like:
private def checkMsg1(msg: String) {
val json: JsValue = Json.parse(msg)
json.validate[AdapterMsg] match {
case JsSuccess(AdapterNotRunning(None), _) => // ok
case other => fail(s"Unexpected result: $other")
}
}
The whole example can be found here: scala-adapters (JobCockpitControllerSpec)
Adapted to Playframework 2.7
import java.util.concurrent.ExecutionException
import java.util.function.Consumer
import com.typesafe.scalalogging.StrictLogging
import play.shaded.ahc.org.asynchttpclient.AsyncHttpClient
import play.shaded.ahc.org.asynchttpclient.netty.ws.NettyWebSocket
import play.shaded.ahc.org.asynchttpclient.ws.{WebSocket, WebSocketListener, WebSocketUpgradeHandler}
import scala.compat.java8.FutureConverters
import scala.concurrent.Future
class LoggingListener(onMessageCallback: Consumer[String]) extends WebSocketListener with StrictLogging {
override def onOpen(websocket: WebSocket): Unit = {
logger.info("onClose: ")
websocket.sendTextFrame("hello")
}
override def onClose(webSocket: WebSocket, i: Int, s: String): Unit =
logger.info("onClose: ")
override def onError(t: Throwable): Unit =
logger.error("onError: ", t);
override def onTextFrame(payload: String, finalFragment: Boolean, rsv: Int): Unit = {
logger.debug(s"$payload $finalFragment $rsv")
onMessageCallback.accept(payload)
}
}
class WebSocketClient(client: AsyncHttpClient) {
@throws[ExecutionException]
@throws[InterruptedException]
def call(url: String, origin: String, listener: WebSocketListener): Future[NettyWebSocket] = {
val requestBuilder = client.prepareGet(url).addHeader("Origin", origin)
val handler = new WebSocketUpgradeHandler.Builder().addWebSocketListener(listener).build
val listenableFuture = requestBuilder.execute(handler)
FutureConverters.toScala(listenableFuture.toCompletableFuture)
}
}
And in test:
val myPublicAddress = s"localhost:$port"
val serverURL = s"ws://$myPublicAddress/api/alarm/ws"
val asyncHttpClient = client.underlying[AsyncHttpClient]
val webSocketClient = new WebSocketClient(asyncHttpClient)
val origin = "ws://example.com/ws"
val consumer: Consumer[String] = (message: String) => logger.debug(message)
val listener = new LoggingListener(consumer)
val f = webSocketClient.call(serverURL, origin, listener)
Await.result(f, atMost = 1000.millis)
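If the test should assert on the received frames rather than just log them, a queue-backed consumer can be used instead. A sketch, reusing the LoggingListener and WebSocketClient defined above and the imports already present in the test:
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import java.util.function.Consumer

// Sketch: collect every incoming frame so the test can inspect it.
val received = new LinkedBlockingQueue[String]()
val collectingConsumer: Consumer[String] = (message: String) => received.put(message)
val collectingListener = new LoggingListener(collectingConsumer)

val connected = webSocketClient.call(serverURL, origin, collectingListener)
Await.result(connected, atMost = 1000.millis)

// Wait up to a second for the first message, then assert on its content.
val firstMessage = received.poll(1, TimeUnit.SECONDS)
assert(firstMessage != null && firstMessage.nonEmpty)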
This is a complete example which uses the Akka Websocket Client to test a Websocket controller. There is some custom code, but it shows multiple test scenarios. This works for Play 2.7.
package controllers
import java.util.concurrent.{ LinkedBlockingDeque, TimeUnit }
import actors.WSBridge
import akka.Done
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.headers.{ Origin, RawHeader }
import akka.http.scaladsl.model.ws.{ BinaryMessage, Message, TextMessage, WebSocketRequest }
import akka.http.scaladsl.model.{ HttpResponse, StatusCodes, Uri }
import akka.stream.scaladsl.{ Flow, Keep, Sink, Source, SourceQueueWithComplete }
import akka.stream.{ ActorMaterializer, OverflowStrategy }
import models.WSTopic
import org.specs2.matcher.JsonMatchers
import play.api.Logging
import play.api.inject.guice.GuiceApplicationBuilder
import play.api.test._
import scala.collection.immutable.Seq
import scala.concurrent.Future
/**
* Test case for the [[WSController]] actor.
*/
class WSControllerSpec extends ForServer with WSControllerSpecContext with JsonMatchers {
"The `socket` method" should {
"return a 403 status code if the origin doesn't match" >> { implicit rs: RunningServer =>
val maybeSocket = await(websocketClient.connect(WebSocketRequest(endpoint)))
maybeSocket must beLeft[HttpResponse].like { case response =>
response.status must be equalTo StatusCodes.Forbidden
}
}
"return a 400 status code if the topic cannot be found" >> { implicit rs: RunningServer =>
val headers = Seq(Origin("http://localhost:9443"))
val maybeSocket = await(websocketClient.connect(WebSocketRequest(endpoint, headers)))
maybeSocket must beLeft[HttpResponse].like { case response =>
response.status must be equalTo StatusCodes.BadRequest
}
}
"return a 400 status code if the topic syntax isn't valid in query param" >> { implicit rs: RunningServer =>
val headers = Seq(Origin("http://localhost:9443"))
val request = WebSocketRequest(endpoint.withRawQueryString("?topic=."), headers)
val maybeSocket = await(websocketClient.connect(request))
maybeSocket must beLeft[HttpResponse].like { case response =>
response.status must be equalTo StatusCodes.BadRequest
}
}
"return a 400 status code if the topic syntax isn't valid in header param" >> { implicit rs: RunningServer =>
val headers = Seq(Origin("http://localhost:9443"), RawHeader("X-TOPIC", "."))
val maybeSocket = await(websocketClient.connect(WebSocketRequest(endpoint, headers)))
maybeSocket must beLeft[HttpResponse].like { case response =>
response.status must be equalTo StatusCodes.BadRequest
}
}
"receive an acknowledge message when connecting to a topic via query param" >> { implicit rs: RunningServer =>
val headers = Seq(Origin("http://localhost:9443"))
val request = WebSocketRequest(endpoint.withRawQueryString("topic=%2Fflowers%2Frose"), headers)
val maybeSocket = await(websocketClient.connect(request))
maybeSocket must beRight[(SourceQueue, MessageQueue)].like { case (_, messages) =>
messages.poll(1000, TimeUnit.MILLISECONDS) must be equalTo
WSBridge.Ack(WSTopic("/flowers/rose")).message.toJson.toString()
}
}
"receive an acknowledge message when connecting to a topic via query param" >> { implicit rs: RunningServer =>
val headers = Seq(Origin("http://localhost:9443"), RawHeader("X-TOPIC", "/flowers/tulip"))
val maybeSocket = await(websocketClient.connect(WebSocketRequest(endpoint, headers)))
maybeSocket must beRight[(SourceQueue, MessageQueue)].like { case (_, messages) =>
messages.poll(1000, TimeUnit.MILLISECONDS) must be equalTo
WSBridge.Ack(WSTopic("/flowers/tulip")).message.toJson.toString()
}
}
"receive a pong message when sending a ping" >> { implicit rs: RunningServer =>
val headers = Seq(Origin("http://localhost:9443"), RawHeader("X-TOPIC", "/flowers/tulip"))
val maybeSocket = await(websocketClient.connect(WebSocketRequest(endpoint, headers)))
maybeSocket must beRight[(SourceQueue, MessageQueue)].like { case (queue, messages) =>
queue.offer(WSBridge.Ping.toJson.toString())
messages.poll(1000, TimeUnit.MILLISECONDS) must be equalTo
WSBridge.Ack(WSTopic("/flowers/tulip")).message.toJson.toString()
messages.poll(1000, TimeUnit.MILLISECONDS) must be equalTo
WSBridge.Pong.toJson.toString()
}
}
}
}
/**
* The context for the [[WSControllerSpec]].
*/
trait WSControllerSpecContext extends ForServer with PlaySpecification with ApplicationFactories {
type SourceQueue = SourceQueueWithComplete[String]
type MessageQueue = LinkedBlockingDeque[String]
/**
* Provides the application factory.
*/
protected def applicationFactory: ApplicationFactory = withGuiceApp(GuiceApplicationBuilder())
/**
* Gets the WebSocket endpoint.
*
* @param rs The running server.
* @return The WebSocket endpoint.
*/
protected def endpoint(implicit rs: RunningServer): Uri =
Uri(rs.endpoints.httpEndpoint.get.pathUrl("/ws").replace("http://", "ws://"))
/**
* Provides an instance of the WebSocket client.
*
* This should be a method to return a fresh client for every test.
*/
protected def websocketClient = new AkkaWebSocketClient
/**
* An Akka WebSocket client that is optimized for testing.
*/
class AkkaWebSocketClient extends Logging {
/**
* The queue of received messages.
*/
private val messageQueue = new LinkedBlockingDeque[String]()
/**
* Connect to the WebSocket.
*
* @param wsRequest The WebSocket request instance.
* @return Either an [[HttpResponse]] if the upgrade process wasn't successful or a source and a message queue
* to which new messages may be offered.
*/
def connect(wsRequest: WebSocketRequest): Future[Either[HttpResponse, (SourceQueue, MessageQueue)]] = {
implicit val system: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
import system.dispatcher
// Store each incoming message in the messages queue
val incoming: Sink[Message, Future[Done]] = Sink.foreach {
case TextMessage.Strict(s) => messageQueue.offer(s)
case TextMessage.Streamed(s) => s.runFold("")(_ + _).foreach(messageQueue.offer)
case BinaryMessage.Strict(s) => messageQueue.offer(s.utf8String)
case BinaryMessage.Streamed(s) => s.runFold("")(_ + _.utf8String).foreach(messageQueue.offer)
}
// Our source is a queue to which we can offer messages that will be sent to the WebSocket server.
// All offered messages will be transformed into WebSocket messages.
val sourceQueue = Source.queue[String](Int.MaxValue, OverflowStrategy.backpressure)
.map { msg => TextMessage.Strict(msg) }
val (sourceMat, source) = sourceQueue.preMaterialize()
// The outgoing flow sends all messages which are offered to the queue (our stream source) to the WebSocket
// server.
val flow: Flow[Message, Message, Future[Done]] = Flow.fromSinkAndSourceMat(incoming, source)(Keep.left)
// UpgradeResponse is a Future[WebSocketUpgradeResponse] that completes or fails when the connection succeeds
// or fails and closed is a Future[Done] representing the stream completion from above
val (upgradeResponse, closed) = Http().singleWebSocketRequest(wsRequest, flow)
closed.foreach(_ => logger.info("Channel closed"))
upgradeResponse.map { upgrade =>
if (upgrade.response.status == StatusCodes.SwitchingProtocols) {
Right((sourceMat, messageQueue))
} else {
Left(upgrade.response)
}
}
}
}
}

Spark Scala UDP receive on listening port

The example mentioned in
http://spark.apache.org/docs/latest/streaming-programming-guide.html
lets me receive data packets in a TCP stream, listening on port 9999:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3
// Create a local StreamingContext with two working thread and batch interval of 1 second.
// The master requires 2 cores to prevent from a starvation scenario.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
// Create a DStream that will connect to hostname:port, like localhost:9999
val lines = ssc.socketTextStream("localhost", 9999)
// Split each line into words
val words = lines.flatMap(_.split(" "))
import org.apache.spark.streaming.StreamingContext._ // not necessary since Spark 1.3
// Count each word in each batch
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.print()
ssc.start() // Start the computation
ssc.awaitTermination() // Wait for the computation to terminate
I am able to send data over TCP by creating a data server on my Linux system using
$ nc -lk 9999
Question
I need to receive a stream from an Android phone that streams over UDP, but the Scala/Spark call
val lines = ssc.socketTextStream("localhost", 9999)
receives ONLY TCP streams.
How can I receive UDP streams in a similarly simple manner using Scala+Spark and create a Spark DStream?
There isn't anything built in, but it's not too much work to get it done yourself. Here is a simple solution I made based on a custom UdpSocketInputDStream[T]:
import java.io._
import java.net.{ConnectException, DatagramPacket, DatagramSocket, InetAddress}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver
import scala.reflect.ClassTag
import scala.util.control.NonFatal
class UdpSocketInputDStream[T: ClassTag](
_ssc: StreamingContext,
host: String,
port: Int,
bytesToObjects: InputStream => Iterator[T],
storageLevel: StorageLevel
) extends ReceiverInputDStream[T](_ssc) {
def getReceiver(): Receiver[T] = {
new UdpSocketReceiver(host, port, bytesToObjects, storageLevel)
}
}
class UdpSocketReceiver[T: ClassTag](host: String,
port: Int,
bytesToObjects: InputStream => Iterator[T],
storageLevel: StorageLevel) extends Receiver[T](storageLevel) {
var udpSocket: DatagramSocket = _
override def onStart(): Unit = {
try {
udpSocket = new DatagramSocket(port, InetAddress.getByName(host))
} catch {
case e: ConnectException =>
restart(s"Error connecting to $port", e)
return
}
// Start the thread that receives data over a connection
new Thread("Udp Socket Receiver") {
setDaemon(true)
override def run() {
receive()
}
}.start()
}
/** Create a socket connection and receive data until receiver is stopped */
def receive() {
try {
val buffer = new Array[Byte](2048)
// Create a packet to receive data into the buffer
val packet = new DatagramPacket(buffer, buffer.length)
udpSocket.receive(packet)
val iterator = bytesToObjects(new ByteArrayInputStream(packet.getData, packet.getOffset, packet.getLength))
// Now loop forever, waiting to receive packets and printing them.
while (!isStopped() && iterator.hasNext) {
store(iterator.next())
}
if (!isStopped()) {
restart("Udp socket data stream had no more data")
}
} catch {
case NonFatal(e) =>
restart("Error receiving data", e)
} finally {
onStop()
}
}
override def onStop(): Unit = {
synchronized {
if (udpSocket != null) {
udpSocket.close()
udpSocket = null
}
}
}
}
In order to get StreamingContext to add a method on itself, we enrich it with an implicit class:
object Implicits {
implicit class StreamingContextOps(val ssc: StreamingContext) extends AnyVal {
def udpSocketStream[T: ClassTag](host: String,
port: Int,
converter: InputStream => Iterator[T],
storageLevel: StorageLevel): InputDStream[T] = {
new UdpSocketInputDStream(ssc, host, port, converter, storageLevel)
}
}
}
And here is how you call it all:
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.reflect.ClassTag
object TestRunner {
import Implicits._
def main(args: Array[String]): Unit = {
val sparkContext = new SparkContext("local[*]", "udpTest")
val ssc = new StreamingContext(sparkContext, Seconds(4))
val stream = ssc.udpSocketStream("localhost",
3003,
bytesToLines,
StorageLevel.MEMORY_AND_DISK_SER_2)
stream.print()
ssc.start()
ssc.awaitTermination()
}
def bytesToLines(inputStream: InputStream): Iterator[String] = {
val dataInputStream = new BufferedReader(
new InputStreamReader(inputStream, StandardCharsets.UTF_8))
new NextIterator[String] {
protected override def getNext(): String = {
val nextValue = dataInputStream.readLine()
if (nextValue == null) {
finished = true
}
nextValue
}
protected override def close() {
dataInputStream.close()
}
}
}
abstract class NextIterator[U] extends Iterator[U] {
protected var finished = false
private var gotNext = false
private var nextValue: U = _
private var closed = false
override def next(): U = {
if (!hasNext) {
throw new NoSuchElementException("End of stream")
}
gotNext = false
nextValue
}
override def hasNext: Boolean = {
if (!finished) {
if (!gotNext) {
nextValue = getNext()
if (finished) {
closeIfNeeded()
}
gotNext = true
}
}
!finished
}
def closeIfNeeded() {
if (!closed) {
closed = true
close()
}
}
protected def getNext(): U
protected def close()
}
}
Most of this code is taken from the SocketInputDStream[T] provided by Spark; I simply re-used it. I also took the code for the NextIterator used by bytesToLines; all it does is consume each line from the packet and transform it into a String. If you have more complex logic, you can provide your own implementation of converter: InputStream => Iterator[T].
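For example, a custom converter could parse each datagram into a small record type instead of raw lines. This is only a sketch; the Reading case class and the comma-separated format are made up for illustration:
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

// Hypothetical record type; adapt it to whatever your phone actually sends.
case class Reading(sensor: String, value: Double)

// A converter: InputStream => Iterator[Reading] expecting "sensor,value" lines per datagram.
def bytesToReadings(inputStream: InputStream): Iterator[Reading] = {
  val reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))
  Iterator.continually(reader.readLine())
    .takeWhile(_ != null)
    .map(_.split(","))
    .collect { case Array(sensor, value) => Reading(sensor.trim, value.trim.toDouble) }
}

// Usage (sketch): ssc.udpSocketStream("localhost", 3003, bytesToReadings, StorageLevel.MEMORY_AND_DISK_SER_2)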
Testing it with simple UDP packet:
echo -n "hello hello hello!" >/dev/udp/localhost/3003
Yields:
-------------------------------------------
Time: 1482676728000 ms
-------------------------------------------
hello hello hello!
Of course, this has to be tested further. It also has a hidden assumption that each buffer created for the DatagramPacket is 2048 bytes, which is perhaps something you'll want to change.
The problem with Yuval Itzchakov's solution is that the receiver receives one message and then restarts itself. Just replace the restart with a recursive call to receive, as shown below.
def receive() {
try {
val buffer = new Array[Byte](200000)
// Create a packet to receive data into the buffer
val packet = new DatagramPacket(buffer, buffer.length)
udpSocket.receive(packet)
val iterator = bytesToObjects(new ByteArrayInputStream(packet.getData, packet.getOffset, packet.getLength))
// Now loop forever, waiting to receive packets and printing them.
while (!isStopped() && iterator.hasNext) {
store(iterator)
}
if (!isStopped()) {
// restart("Udp socket data stream had no more data")
receive()
}
} catch {
case NonFatal(e) =>
restart("Error receiving data", e)
} finally {
onStop()
}
}
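Note that the recursive call sits inside a try block, so it is not a tail call and the stack grows with every received packet. The same logic can be written with an explicit loop; a sketch only, using the same members (udpSocket, bytesToObjects, store, isStopped, restart, onStop) as the receiver above:
// Sketch: loop instead of recursing, so long-running receivers don't grow the stack.
def receive() {
  try {
    val buffer = new Array[Byte](200000)
    while (!isStopped()) {
      // Receive one datagram and store its decoded contents.
      val packet = new DatagramPacket(buffer, buffer.length)
      udpSocket.receive(packet)
      val iterator = bytesToObjects(
        new ByteArrayInputStream(packet.getData, packet.getOffset, packet.getLength))
      while (!isStopped() && iterator.hasNext) {
        store(iterator.next())
      }
    }
  } catch {
    case NonFatal(e) =>
      restart("Error receiving data", e)
  } finally {
    onStop()
  }
}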

An akka streams function that creates a sink and a source that emits whatever the sink receives

The goal is to implement a function with this signature
def bindedSinkAndSource[A]:(Sink[A, Any], Source[A, Any]) = ???
where the returned source emits whatever the sink receives.
My primary goal is to implement a websocket forwarder by means of the handleWebSocketMessages directive.
The forwarder graph is:
leftReceiver ~> rightEmitter
leftEmitter <~ rightReceiver
where leftReceiver and leftEmitter are the in and out of the left endpoint handler flow, and rightReceiver and rightEmitter are the in and out of the right endpoint handler flow.
For example:
import akka.NotUsed
import akka.http.scaladsl.model.ws.Message
import akka.http.scaladsl.server.Directive.addByNameNullaryApply
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.scaladsl.Flow
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Source
def buildHandlers(): Route = {
val (leftReceiver, rightEmitter) = bindedSinkAndSource[Message];
val (rightReceiver, leftEmitter) = bindedSinkAndSource[Message];
val leftHandlerFlow = Flow.fromSinkAndSource(leftReceiver, leftEmitter)
val rightHandlerFlow = Flow.fromSinkAndSource(rightReceiver, rightEmitter)
pathPrefix("leftEndpointChannel") {
handleWebSocketMessages(leftHandlerFlow)
} ~
pathPrefix("rightEndpointChannel") {
handleWebSocketMessages(rightHandlerFlow)
}
}
All the ideas that came to me were frustrated by the fact that the handleWebSocketMessages(..) directive doesn't give access to the materialized value of the received flow.
I found a way to achieve the goal, but there could be shorter and easier ways. If you know one, please don't hesitate to add your knowledge.
import org.reactivestreams.Publisher
import org.reactivestreams.Subscriber
import org.reactivestreams.Subscription
import akka.NotUsed
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Source
def bindedSinkAndSource[A]: (Sink[A, NotUsed], Source[A, NotUsed]) = {
class Binder extends Subscriber[A] with Publisher[A] { binder =>
var oUpStreamSubscription: Option[Subscription] = None;
var oDownStreamSubscriber: Option[Subscriber[_ >: A]] = None;
var pendingRequestFromDownStream: Option[Long] = None;
var pendingCancelFromDownStream: Boolean = false;
def onSubscribe(upStreamSubscription: Subscription): Unit = {
this.oUpStreamSubscription match {
case Some(_) => upStreamSubscription.cancel // rule 2-5
case None =>
this.oUpStreamSubscription = Some(upStreamSubscription);
if (pendingRequestFromDownStream.isDefined) {
upStreamSubscription.request(pendingRequestFromDownStream.get)
pendingRequestFromDownStream = None
}
if (pendingCancelFromDownStream) {
upStreamSubscription.cancel()
pendingCancelFromDownStream = false
}
}
}
def onNext(a: A): Unit = {
oDownStreamSubscriber.get.onNext(a)
}
def onComplete(): Unit = {
oDownStreamSubscriber.foreach { _.onComplete() };
this.oUpStreamSubscription = None
}
def onError(error: Throwable): Unit = {
oDownStreamSubscriber.foreach { _.onError(error) };
this.oUpStreamSubscription = None
}
def subscribe(downStreamSubscriber: Subscriber[_ >: A]): Unit = {
assert(this.oDownStreamSubscriber.isEmpty);
this.oDownStreamSubscriber = Some(downStreamSubscriber);
downStreamSubscriber.onSubscribe(new Subscription() {
def request(n: Long): Unit = {
binder.oUpStreamSubscription match {
case Some(usSub) => usSub.request(n);
case None =>
assert(binder.pendingRequestFromDownStream.isEmpty);
binder.pendingRequestFromDownStream = Some(n);
}
};
def cancel(): Unit = {
binder.oUpStreamSubscription match {
case Some(usSub) => usSub.cancel();
case None =>
assert(binder.pendingCancelFromDownStream == false);
binder.pendingCancelFromDownStream = true;
}
binder.oDownStreamSubscriber = None
}
})
}
}
val binder = new Binder;
val receiver = Sink.fromSubscriber(binder);
val emitter = Source.fromPublisher(binder);
(receiver, emitter);
}
Note that the instance vars of the Binder class may suffer concurrency problems if the sink and source this method creates are not fused later by the user. If that is not the case, all the accesses to these variables should be enclosed inside synchronized zones. Another solution would be to ensure that the sink and the source are materialized in an execution context with a single thread.
Two days later I discovered MergeHub and BroadcastHub. Using them, the answer is much shorter:
import akka.stream.Materializer
def bindedSinkAndSource[T](implicit sm: Materializer): (Sink[T, NotUsed], Source[T, NotUsed]) = {
import akka.stream.scaladsl.BroadcastHub;
import akka.stream.scaladsl.MergeHub;
import akka.stream.scaladsl.Keep;
MergeHub.source[T](perProducerBufferSize = 8).toMat(BroadcastHub.sink[T](bufferSize = 256))(Keep.both).run()
}
with the advantage that the returned sink and source can be materialized multiple times.
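As a usage sketch, the websocket forwarder from the question can then be built on this hub-based version (assuming the same imports as buildHandlers above and an implicit Materializer in scope):
// Sketch: same forwarder as buildHandlers, now backed by MergeHub/BroadcastHub.
def buildForwarder()(implicit sm: Materializer): Route = {
  val (leftReceiver, rightEmitter) = bindedSinkAndSource[Message]
  val (rightReceiver, leftEmitter) = bindedSinkAndSource[Message]
  pathPrefix("leftEndpointChannel") {
    handleWebSocketMessages(Flow.fromSinkAndSource(leftReceiver, leftEmitter))
  } ~
  pathPrefix("rightEndpointChannel") {
    handleWebSocketMessages(Flow.fromSinkAndSource(rightReceiver, rightEmitter))
  }
}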

Akka Streams: State in a flow

I want to read multiple big files using Akka Streams and process each line. Imagine that each line consists of an (identifier -> value) pair. If a new identifier is found, I want to save it and its value in the database; otherwise, if the identifier has already been found while processing the stream of lines, I want to save only the value. For that, I think I need some kind of recursive stateful flow in order to keep the identifiers that have already been found in a Map. I think this flow would receive a pair of (newLine, contextWithIdentifiers).
I've just started to look into Akka Streams. I guess I can manage the stateless processing on my own, but I have no clue how to keep the contextWithIdentifiers around. I'd appreciate any pointers in the right direction.
Maybe something like statefulMapConcat can help you:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.util.Random._
import scala.math.abs
import scala.concurrent.ExecutionContext.Implicits.global
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
//encapsulating your input
case class IdentValue(id: Int, value: String)
//some random generated input
val identValues = List.fill(20)(IdentValue(abs(nextInt()) % 5, "valueHere"))
val stateFlow = Flow[IdentValue].statefulMapConcat{ () =>
//state with already processed ids
var ids = Set.empty[Int]
identValue => if (ids.contains(identValue.id)) {
//save value to DB
println(identValue.value)
List(identValue)
} else {
//save both to database
println(identValue)
ids = ids + identValue.id
List(identValue)
}
}
Source(identValues)
.via(stateFlow)
.runWith(Sink.seq)
.onSuccess { case identValue => println(identValue) }
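Building on the same statefulMapConcat example, a variant that emits whether each identifier is new (a sketch; the actual database write downstream is left out) could look like:
// Sketch: tag each element with whether its id was seen before, so a later stage
// can decide between "insert identifier + value" and "insert value only".
val taggedFlow = Flow[IdentValue].statefulMapConcat { () =>
  var ids = Set.empty[Int]
  identValue => {
    val isNew = !ids.contains(identValue.id)
    ids += identValue.id
    List(identValue -> isNew)
  }
}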
A few years later, here is an implementation I wrote if you only need a 1-to-1 mapping (not 1-to-N):
import akka.stream.stage.{GraphStage, GraphStageLogic}
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
object StatefulMap {
def apply[T, O](converter: => T => O) = new StatefulMap[T, O](converter)
}
class StatefulMap[T, O](converter: => T => O) extends GraphStage[FlowShape[T, O]] {
val in = Inlet[T]("StatefulMap.in")
val out = Outlet[O]("StatefulMap.out")
val shape = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {
val f = converter
setHandler(in, () => push(out, f(grab(in))))
setHandler(out, () => pull(in))
}
}
Test (and demo):
behavior of "StatefulMap"
class Counter extends (Any => Int) {
var count = 0
override def apply(x: Any): Int = {
count += 1
count
}
}
it should "not share state among substreams" in {
val result = await {
Source(0 until 10)
.groupBy(2, _ % 2)
.via(StatefulMap(new Counter()))
.fold(Seq.empty[Int])(_ :+ _)
.mergeSubstreams
.runWith(Sink.seq)
}
result.foreach(_ should be(1 to 5))
}