I've been reading about Akka Streams for the last couple of days, and I've been working with Rx libraries in Scala for the last couple of months. To me there seems to be some overlap in what these two libraries have to offer. RxScala was a bit easier to get started with, understand and use. For example, here is a simple use case where I'm using RxScala to connect to a Kafka topic and wrap the stream in an Observable so that subscribers can receive those messages.
val consumerStream = consumer.createMessageStreamsByFilter(topicFilter(topics), 1, keyDecoder, valueDecoder).head
val observableConsumer = Observable.fromIterator(consumerStream).map(_.message())
This is quite simple and neat. Any clues on how I should get started with Akka Streams? I want to use the same example above, where I want to emit events from the Source. I will later have a Flow and a Sink, and finally, in my main class, I will combine these three to run the application data flow.
Any suggestions?
So here is what I came up with:
val kafkaStreamItr = consumer.createMessageStreamsByFilter(topicFilter(topics), 1, keyDecoder, valueDecoder).head
Source.fromIterator(() => kafkaStreamItr).map(_.message)
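To sketch how I plan to combine the three pieces later (the Flow and Sink bodies below are just placeholders, and I'm assuming a String value decoder so that _.message yields a String):

import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("kafka-consumer")
implicit val materializer: ActorMaterializer = ActorMaterializer()

val source: Source[String, NotUsed] =
  Source.fromIterator(() => kafkaStreamItr).map(_.message)

val flow: Flow[String, String, NotUsed] = Flow[String].map(_.trim)    // placeholder transformation
val sink: Sink[String, Future[Done]] = Sink.foreach[String](println) // placeholder sink

val done: Future[Done] = source.via(flow).runWith(sink)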
I started to learn Akka and came across a challenge for which I can't find an easy solution, despite having waded through the documentation and related Stack Overflow questions:
Building on the Client-Side WebSocket Support example on the Akka website, I am using the following code snippet in Scala as the basis:
val flow: Flow[Message, Message, Future[Done]] =
  Flow.fromSinkAndSourceMat(printSink, Source.maybe)(Keep.left)

val (upgradeResponse, closed) =
  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), flow)
The use case I have is a client (printSink) consuming a continuous stream from the websocket server. The communication is uni-directional only, thus there is no need for a source.
My question is then as follows:
1. I need to regularly force a re-connection to the websocket server, and for that I need to disconnect first. But for the life of me, I can't find a way to do a simple disconnect.
2. In a somewhat opposite scenario, I need to keep the websocket connection alive and "swap out" the sink. Is this even possible, i.e. without creating another websocket connection?
For question 1 (forcing a disconnect from the client), this should work
val flow: Flow[Message, Message, (Future[Done], Promise[Option[Message]])] =
  Flow.fromSinkAndSourceMat(
    printSink,
    Source.maybe
  )(Keep.both)

val (upgradeResponse, (closed, disconnect)) =
  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), flow)
disconnect can then be completed with a None to disconnect:
disconnect.success(None)
For question 2, my intuition is that that sort of dynamic stream operation would seem to require a custom stream operator (i.e. one level below the Graph DSL and two levels below the "normal" scaladsl/javadsl). I don't have a huge amount of direct experience there, to be honest.
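That said, one option that might cover it without dropping down to a custom operator (a rough sketch only; the buffer size is arbitrary and someOtherSink is a stand-in) is to terminate the websocket flow in a BroadcastHub. Its materialized Source can be run with a new consumer at any time while the single connection stays open:

import akka.NotUsed
import akka.stream.scaladsl.{BroadcastHub, Flow, Keep, Sink, Source}

val hubSink: Sink[Message, Source[Message, NotUsed]] =
  BroadcastHub.sink[Message](bufferSize = 256)

val hubFlow: Flow[Message, Message, Source[Message, NotUsed]] =
  Flow.fromSinkAndSourceMat(hubSink, Source.maybe[Message])(Keep.left)

val (upgrade, hubSource) =
  Http().singleWebSocketRequest(WebSocketRequest("ws://localhost/ws"), hubFlow)

hubSource.runWith(printSink)   // first consumer
// later, attach a different consumer without reconnecting:
// hubSource.runWith(someOtherSink)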
I have some code that executes a pipeline using Akka Streams.
My question is: what is the best way to scale it out? Can that be done with Akka Streams as well, or does it need to be converted to actors or some other approach?
The code snippet is:
val future = SqsSource(sqsEndpoint)(awsSqsClient)
  .takeWhile(_ => true)
  .map { m: Message =>
    (m, Ack())
  }
  .runWith(SqsAckSink(sqsEndpoint)(awsSqsClient))
If you modify your code a bit then your stream will be materialized into multiple Actor values. These materialized Actors will get you the concurrency you are looking for:
val future =
  SqsSource(sqsEndpoint)(awsSqsClient)            //Actor 1
    .via(Flow[Message] map (m => (m, Ack())))     //Actor 2
    .to(SqsAckSink(sqsEndpoint)(awsSqsClient))    //Actor 3
    .run()
Note the use of via and to: splitting the stream into distinct stages like this is what makes it possible to run them on separate Actors. In your example code you are using map and runWith directly on the Source, which results in only one Actor being created because of operator fusion. Since fusion is enabled by default in recent versions, you may also need explicit .async boundaries between the stages to actually force them onto separate Actors; see the sketch below.
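A minimal sketch of that (the only change from the code above is where .async is placed):

val future =
  SqsSource(sqsEndpoint)(awsSqsClient).async          // runs on its own Actor
    .via(Flow[Message] map (m => (m, Ack()))).async   // boundary forces a separate Actor
    .runWith(SqsAckSink(sqsEndpoint)(awsSqsClient))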
Flows that Ask External Actors
If you're looking to extend to even more Actors then you can use Flow#mapAsync to query an external Actor to do more work, similar to this example.
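As a rough sketch of that pattern (the EnrichmentActor, its message handling, and the parallelism value of 4 are all made up for illustration; an ActorSystem named system is assumed to be in scope):

import akka.actor.{Actor, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import system.dispatcher

// Hypothetical worker actor; echoes the message back as a stand-in for real work.
class EnrichmentActor extends Actor {
  def receive = {
    case m: Message => sender() ! m
  }
}

implicit val timeout: Timeout = 3.seconds
val enricher = system.actorOf(Props[EnrichmentActor])

// Up to 4 asks in flight at once; element order is preserved.
val askFlow = Flow[Message].mapAsync(4) { m =>
  (enricher ? m).mapTo[Message].map(enriched => (enriched, Ack()))
}

SqsSource(sqsEndpoint)(awsSqsClient)
  .via(askFlow)
  .runWith(SqsAckSink(sqsEndpoint)(awsSqsClient))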
I need to build the following graph:
val graph = getFromTopic1 ~> doSomeWork ~> writeToTopic2 ~> commitOffsetForTopic1
but trying to implement it in Reactive Kafka has me down a rabbit hole. And that seems wrong because this strikes me as a relatively common use case: I want to move data between Kafka topics while guaranteeing At Least Once Delivery Semantics.
Now it's no problem at all to write in parallel
val fanOut = new Broadcast(2)
val graph = getFromTopic1 ~> doSomeWork ~> fanOut ~> writeToTopic2
                                           fanOut ~> commitOffsetForTopic1
This code works because writeToTopic2 can be implemented with ReactiveKafka#publish(..), which returns a Sink. But then I lose ALOS guarantees and thus data when my app crashes.
So what I really need is to write a Flow that writes to a Kafka topic. I have tried using Flow.fromSinkAndSource(..) with a custom GraphStage but ran up against type issues for the data flowing through; for example, what gets committed in commitOffsetForTopic1 should not be included in writeToTopic2, meaning that I have to keep a wrapper object containing both pieces of data all the way through. But this conflicts with the requirement that writeToTopic2 accept a ProducerMessage[K,V]. My latest attempt to resolve this ran up against private and final classes in the reactive kafka library (extending/wrapping/replacing the underlying SubscriptionActor).
I don't really want to maintain a fork to make this happen. What am I missing? Why is this so hard? Am I somehow trying to build a pathological graph node or is this use case an oversight ... or is there something completely obvious I have somehow missed in the docs and source code I've been digging through?
Current version is 0.10.1. I can add more detailed information about any of my many attempts upon request.
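For what it's worth, later releases of the library (akka-stream-kafka, now Alpakka Kafka) expose exactly this pattern through a pass-through element on ProducerMessage. The sketch below uses names from those newer releases (committableSource, Producer.flow, commitScaladsl) together with assumed consumerSettings/producerSettings values; none of this exists as such in 0.10.1:

import akka.kafka.{ProducerMessage, Subscriptions}
import akka.kafka.scaladsl.{Consumer, Producer}
import akka.stream.scaladsl.Sink
import org.apache.kafka.clients.producer.ProducerRecord

Consumer.committableSource(consumerSettings, Subscriptions.topics("topic1"))
  .map { msg =>
    ProducerMessage.Message(
      new ProducerRecord[String, String]("topic2", doSomeWork(msg.record.value)),
      msg.committableOffset               // carried through the producer untouched
    )
  }
  .via(Producer.flow(producerSettings))   // writes to topic2
  .mapAsync(1)(result => result.message.passThrough.commitScaladsl()) // commit only after the write
  .runWith(Sink.ignore)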
My akka-streams learn-o-thon continues. I'd like to integrate my akka-streams application with akka-cluster and DistributedPubSubMediator.
Adding support for Publish is fairly straightforward, but I'm having trouble with the Subscribe part.
For reference, a subscriber is given as follows in the Typesafe sample:
class ChatClient(name: String) extends Actor {
  val mediator = DistributedPubSub(context.system).mediator
  mediator ! Subscribe("some topic", self)

  def receive = {
    case ChatClient.Message(from, text) =>
      // ...process message...
  }
}
My question is: how should I integrate this actor with my flow, and how should I ensure that I'm getting published messages in the absence of stream backpressure?
I'm trying to accomplish a pubsub model where one stream may publish a message and another stream would consume it (if subscribed).
You probably want to make your Actor extend ActorPublisher. Then you can create a Source from it and integrate that into your stream.
See the docs on ActorPublisher here: http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.3/scala/stream-integrations.html
The other answers are outdated: they suggest using ActorPublisher, which has been deprecated since version 2.5.0.
For those interested in a current approach, Colin Breck wrote an excellent series in his blog about integrating Akka Streams and Akka actors. Over the course of the series, Breck fleshes out a system that begins with Akka Streams and plain actors, then incorporates Akka Cluster and Akka Persistence. The first post in the series is here (the distributed stream processing piece is in part 3).
I found that Slick 3.0 introduced a new feature called streaming:
http://slick.typesafe.com/doc/3.0.0-RC1/database.html#streaming
I'm not familiar with Akka. Streaming seems to be a lazy or asynchronous value, but it is not very clear to me why it is useful, or when it would be useful.
Does anyone have ideas about this?
So let's imagine the following use case:
A "slow" client wants to get a large dataset from the server. The client sends a request to the server, which loads all the data from the database, stores it in memory and then passes it down to the client.
And here we're faced with a problem: the client does not handle the data as fast as we would like => we can't release the memory => this may result in an out-of-memory error.
Reactive streams solve this problem by using backpressure. We can wrap Slick's publisher in an Akka Streams Source and then "feed" it to the client via Akka HTTP.
The thing is that this backpressure is propagated through TCP via Akka HTTP down to the publisher that represents the database query.
That means that we only read from the database as fast as the client can consume the data.
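A minimal sketch of that wiring (the Postgres driver, database config key, and the users table are made up for illustration; the overall pattern is Source.fromPublisher over db.stream fed into an Akka HTTP chunked response):

import akka.http.scaladsl.model.{ContentTypes, HttpEntity, HttpResponse}
import akka.stream.scaladsl.Source
import akka.util.ByteString
import slick.driver.PostgresDriver.api._

class Users(tag: Tag) extends Table[(Long, String)](tag, "users") {
  def id = column[Long]("id", O.PrimaryKey)
  def name = column[String]("name")
  def * = (id, name)
}
val users = TableQuery[Users]

val db = Database.forConfig("mydb")

// db.stream returns a Reactive Streams Publisher; demand flows back into the query.
val namesPublisher = db.stream(users.map(_.name).result)

val chunks: Source[ByteString, _] =
  Source.fromPublisher(namesPublisher).map(name => ByteString(name + "\n"))

// Akka HTTP streams the chunked entity; TCP backpressure from a slow client
// propagates back through the Source to the database.
val response = HttpResponse(entity = HttpEntity(ContentTypes.`text/plain(UTF-8)`, chunks))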
P.S. This is just one small aspect of where reactive streams can be applied.
You can find more information here:
http://www.reactive-streams.org/
https://youtu.be/yyz7Keg1w9E
https://youtu.be/9S-4jMM1gqE