Sample most recent element of Akka Stream with trigger signal, using zipWith? - scala

I have a Planning system that computes a kind of global Schedule from customer orders. This schedule changes over time as customers place or revoke orders, or when certain resources used by events within the schedule become unavailable.
Now another system needs to know the status of certain events in the Schedule. That system sends a StatusRequest(EventName) on a message queue, to which I must respond with a corresponding StatusSignal(EventStatus) on another queue.
The Planning system gives me an akka-streams Source[Schedule] which emits a Schedule whenever the schedule changes. I also have a Source[StatusRequest] from which I receive StatusRequests, and a Sink[StatusSignal] to which I can send StatusSignal responses.
Whenever I receive a StatusRequest I must inspect the current schedule, i.e., the most recent value emitted by Source[Schedule], and send a StatusSignal to the sink.
I came up with the following flow:
scheduleSource
  .zipWith(statusRequestSource) { (schedule, statusRequest) =>
    findEventStatus(schedule, statusRequest.eventName)
  }
  .map(eventStatus => makeStatusSignal(eventStatus))
  .runWith(statusSignalSink)
but I am not at all sure when this flow actually emits values, or whether it actually implements the requirement described above.
The zipWith reference says (emphasis mine):
emits when all of the inputs have an element available
What does this mean? When statusRequestSource emits a value, does the flow wait until scheduleSource emits too, or does it use the last value scheduleSource emitted? Likewise, what happens when scheduleSource emits a value? Does it trigger a status signal with the last element from statusRequestSource?
If the flow doesn't implement what I need, how could I achieve it instead?

To answer your first set of questions regarding the behavior of zipWith, here is a simple test:
val source1 = Source(1 to 5)
val source2 = Source(1 to 3)
source1
.zipWith(source2){ (s1Elem, s2Elem) => (s1Elem, s2Elem) }
.runForeach(println)
// prints:
// (1,1)
// (2,2)
// (3,3)
zipWith emits downstream only when both inputs have a fresh element available to be zipped together; it waits for a new element from each input rather than reusing the most recently seen one. So your flow would only emit when a new Schedule and a new StatusRequest both arrive, which is not what you need.
One idea to fulfill your requirement is to decouple scheduleSource and statusRequestSource. Feed scheduleSource to an actor, and have the actor track the most recent element it has received from the stream. Then have statusRequestSource query this actor, which will reply with the most recent element from scheduleSource. This actor could look something like the following:
import akka.actor.{Actor, ActorLogging}

class LatestElementTracker extends Actor with ActorLogging {
  var latestSchedule: Option[Schedule] = None

  def receive = {
    case schedule: Schedule =>
      latestSchedule = Some(schedule)
    case status: StatusRequest =>
      if (latestSchedule.isEmpty) {
        log.debug("No schedules have been received yet.")
      } else {
        val eventStatus = findEventStatus(latestSchedule.get, status.eventName)
        sender() ! eventStatus
      }
  }
}
To integrate with the above actor:
import akka.util.Timeout
import scala.concurrent.duration._

implicit val askTimeout: Timeout = 3.seconds // the ask stage below needs an implicit Timeout in scope
scheduleSource.runForeach(s => trackerActor ! s)

statusRequestSource
  .ask[EventStatus](parallelism = 1)(trackerActor) // adjust parallelism as needed
  .map(eventStatus => makeStatusSignal(eventStatus))
  .runWith(statusSignalSink)

Related

Wrapping Pub-Sub Java API in Akka Streams Custom Graph Stage

I am working with a Java API from a data vendor providing real time streams. I would like to process this stream using Akka streams.
The Java API has a pub sub design and roughly works like this:
Subscription sub = createSubscription();
sub.addListener(new Listener() {
    public void eventsReceived(List<Event> events) {
        for (Event e : events)
            buffer.enqueue(e);
    }
});
I have tried to embed the creation of this subscription and accompanying buffer in a custom graph stage without much success. Can anyone guide me on the best way to interface with this API using Akka? Is Akka Streams the best tool here?
To feed a Source, you don't necessarily need to use a custom graph stage. Source.queue will materialize as a buffered queue to which you can add elements which will then propagate through the stream.
There are a couple of tricky things to be aware of. The first is that there's some subtlety around materializing the Source.queue so you can set up the subscription. Something like this:
def bufferSize: Int = ???
Source.fromMaterializer { (mat, att) =>
  val (queue, source) = Source.queue[Event](bufferSize).preMaterialize()(mat)
  val subscription = createSubscription()
  subscription.addListener(
    new Listener() {
      def eventsReceived(events: java.util.List[Event]): Unit = {
        import scala.collection.JavaConverters.iterableAsScalaIterable
        import akka.stream.QueueOfferResult._
        iterableAsScalaIterable(events).foreach { event =>
          queue.offer(event) match {
            case Enqueued => () // do nothing
            case Dropped => ??? // handle a dropped pubsub element, might well do nothing
            case QueueClosed => ??? // presumably cancel the subscription...
          }
        }
      }
    }
  )
  source.withAttributes(att)
}
Source.fromMaterializer is used to get access at each materialization to the materializer (which is what compiles the stream definition into actors). When we materialize, we use the materializer to preMaterialize the queue source so we have access to the queue. Our subscription adds incoming elements to the queue.
The API for this pubsub doesn't seem to support backpressure if the consumer can't keep up. The queue will drop elements it's been handed if the buffer is full: you'll probably want to do nothing in that case, but I've called it out in the match that you should make an explicit decision here.
Dropping the newest element is the synchronous behavior for this queue (there are other queue implementations available, but those will communicate dropping asynchronously, which can be really bad for memory consumption in a burst). If you'd prefer something else, it may make sense to have a very small buffer in the queue and attach the "overall" Source (the one returned by Source.fromMaterializer) to a stage which signals perpetual demand. For example, a buffer(downstreamBufferSize, OverflowStrategy.dropHead) will drop the oldest event not yet processed. Alternatively, it may be possible to combine your Events in some meaningful way, in which case a conflate stage will automatically combine incoming Events if the downstream can't process them quickly enough.
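To make those two options concrete, here is a minimal, self-contained sketch; the Int elements, the buffer size of 16, and the "keep the newest" conflate function are just stand-ins for the real Event type and whatever combining makes sense in your domain:
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{Sink, Source}

object DropOrConflateSketch extends App {
  implicit val system: ActorSystem = ActorSystem("sketch")

  // stand-in for the pubsub-backed source built above
  val eventSource = Source(1 to 1000)

  // option 1: keep a small buffer and drop the oldest unprocessed element on overflow
  eventSource
    .buffer(16, OverflowStrategy.dropHead)
    .runWith(Sink.foreach(e => println(s"buffered: $e")))

  // option 2: merge elements the downstream has not consumed yet
  // (here the "merge" simply keeps the newest element)
  eventSource
    .conflate((_, newest) => newest)
    .runWith(Sink.foreach(e => println(s"conflated: $e")))
}
Which of the two fits depends on whether losing intermediate events is acceptable or whether they can be meaningfully merged.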
Great answer! I built something similar. It also has Kamon metrics to monitor queue size, etc.
class AsyncSubscriber(projectId: String, subscriptionId: String, metricsRegistry: CustomMetricsRegistry, pullParallelism: Int)(implicit val ec: Executor) {

  private val logger = LoggerFactory.getLogger(getClass)

  def bufferSize: Int = 1000

  def source(): Source[(PubsubMessage, AckReplyConsumer), Future[NotUsed]] = {
    Source.fromMaterializer { (mat, attr) =>
      val (queue, source) = Source.queue[(PubsubMessage, AckReplyConsumer)](bufferSize).preMaterialize()(mat)

      val receiver: MessageReceiver = {
        (message: PubsubMessage, consumer: AckReplyConsumer) => {
          metricsRegistry.inputEventQueueSize.update(queue.size())
          queue.offer((message, consumer)) match {
            case QueueOfferResult.Enqueued =>
              metricsRegistry.inputQueueAddEventCounter.increment()
            case QueueOfferResult.Dropped =>
              metricsRegistry.inputQueueDropEventCounter.increment()
              consumer.nack()
              logger.warn(s"Buffer is full, message nacked. Pubsub should retry don't panic. If this happens too often, we should also tweak the buffer size or the autoscaler.")
            case QueueOfferResult.Failure(ex) =>
              metricsRegistry.inputQueueDropEventCounter.increment()
              consumer.nack()
              logger.error(s"Failed to offer message with id=${message.getMessageId()}", ex)
            case QueueOfferResult.QueueClosed =>
              logger.error("Destination Queue closed. Something went terribly wrong. Shutting down the jvm.")
              consumer.nack()
              mat.shutdown()
              sys.exit(1)
          }
        }
      }

      val subscriptionName = ProjectSubscriptionName.of(projectId, subscriptionId)
      val subscriber = Subscriber.newBuilder(subscriptionName, receiver).setParallelPullCount(pullParallelism).build
      subscriber.startAsync().awaitRunning()
      source.withAttributes(attr)
    }
  }
}

Throttling akka-actor that keeps only the newest messages

Situation
I am using akka actors to update data on my web-client. One of those actors is solely responsible for sending updates concerning single Agents. These agents are updated very rapidly (every 10 ms). My goal now is to throttle this updating mechanism so that the newest version of every Agent is sent every 300 ms.
My code
This is what I came up with so far:
/**
 * Single agents are updated very rapidly. To limit the burden on the web-frontend, we throttle the messages here.
 */
class BroadcastSingleAgentActor extends Actor {

  private implicit val ec: ExecutionContextExecutor = context.dispatcher
  private var queue = Set[Agent]()

  context.system.scheduler.schedule(0 seconds, 300 milliseconds) {
    queue.foreach { a =>
      broadcastAgent(self)(a) // sends the message to all connected clients
    }
    queue = Set()
  }

  override def receive: Receive = {
    // this message is received every 10 ms for every agent present
    case BroadcastAgent(agent) =>
      // only keep the newest version of the agent
      queue = queue.filter(_.id != agent.id) + agent
  }
}
Question
This actor (BroadcastSingleAgentActor) works as expected, but I am not 100% sure if this is thread safe (updating the queue while potentially clearing it). Also, this does not feel like I am making the best of the tools akka provides. I found this article (Throttling Messages in Akka 2), but my problem is that I need to keep the newest Agent message while dropping any old version of it. Is there an example somewhere similar to what I need?
No, this isn't thread safe, because the block scheduled via the ActorSystem runs on a different thread than receive. One potential fix is to have the scheduler send a tick message to the actor and do the broadcasting within the receive method, because incoming messages to the BroadcastSingleAgentActor are handled sequentially.
case object Refresh // tick message sent to the actor by the scheduler

override def receive: Receive = {
  case Refresh =>
    // broadcast and clear the queue here, on the actor's own thread
    queue.foreach { a =>
      broadcastAgent(self)(a) // sends the message to all connected clients
    }
    queue = Set()
  // this message is received every 10 ms for every agent present
  case BroadcastAgent(agent) =>
    // only keep the newest version of the agent
    queue = queue.filter(_.id != agent.id) + agent
}
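For the tick itself, here is a minimal sketch of the wiring, placed inside the actor where the implicit dispatcher from the question is already in scope (the 300 ms interval mirrors the question):
import scala.concurrent.duration._

// periodically send Refresh to ourselves; the broadcast then runs inside receive
private val tick =
  context.system.scheduler.schedule(0.seconds, 300.milliseconds, self, Refresh)

override def postStop(): Unit = tick.cancel()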

Why does the Identify message find an actor in an actor selection after its termination?

Within a test-program, I want to check that an Actor (child) remains terminated and is not created again after some computation. My test looks like (it is part of a TestKit subclass):
val childSelection = system.actorSelection(parent.path / "*")
childSelection ! Identify(0)
val child = expectMsgPF() {
  case ActorIdentity(0, Some(ref)) => ref
}
watch(child)
// some computation that should end in stopping child
expectTerminated(child)
// some computation that should not create a new child
childSelection ! Identify(1)
expectMsg(ActorIdentity(1, None))
The last line sometimes unexpectedly fails, stating that the message ActorIdentity(1, Some(parent.path/child-name)) was received instead of the expected one. This means that, even after receiving the Terminated message (resulting from the expectTerminated(...) test), sending the Identify message to an actor selection does not necessarily result in the ActorIdentity(..., None) response.
Does anybody know what the akka framework actually does and how it works in this case? Thanks in advance for your help!
Meanwhile, I replaced the last line of my test with:
val identities = receiveWhile() {
  case ActorIdentity(1, Some(ref)) => ref == child
}
if (identities.isEmpty) {
  expectMsg(ActorIdentity(1, None))
} else {
  expectNoMsg()
}
which seems to work fine but is quite a bit more complex to read (and write)...
Since your selection is on parent.path / "*", your parent actor probably has another child that is responding to the Identify message. Check which identity you receive after stopping the child to figure out what other child is responding.
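If the goal is only to verify that this particular child stays stopped (rather than that the parent has no children at all), selecting it by its concrete name avoids matching siblings. A sketch, using a hypothetical child name "worker":
// select the specific child instead of the "*" wildcard ("worker" is a placeholder name)
val stoppedChildSelection = system.actorSelection(parent.path / "worker")
stoppedChildSelection ! Identify(1)
expectMsg(ActorIdentity(1, None)) // asserts only that this child no longer exists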

RxJS combineLatest: how to get emit after just one value changes?

I'm trying to learn the RxJS library. One of the cases I don't quite understand is described in this jsfiddle (code also below).
var A= new Rx.Subject();
var B= new Rx.Subject();
A.onNext(0);
// '.combineLatest' needs all the dependency Observables to get emitted, before its combined signal is emitted.
//
// How to have a combined signal emitted when any of the dependencies change (using earlier given values for the rest)?
//
A.combineLatest( B, function (a,b) { return a+b; } )
.subscribe( function (v) { console.log( "AB: "+ v ); } );
B.onNext("a");
A.onNext(1);
I'd like to get two emits to the "AB" logging. One from changing B to "a" (A already has the value 0). Another from changing A to 1.
However, only changes that occur after a subscribe seem to matter (even though A has a value and thus the combined result could be computed).
Should I use "hot observables" for this, or some other method than .combineLatest?
My problem in the actual code (bigger than this sample) is that I need to make separate initialisations after the subscribes, which cuts stuff in two separate places instead of having the initial values clearly up front.
Thanks
I think you have misunderstood how Subjects work. Subjects are hot Observables: they do not hold on to values, so if they receive an onNext with no subscribers, then that value will be lost to the world.
What you are looking for is either a BehaviorSubject or a ReplaySubject, both of which hold onto past values and re-emit them to new subscribers. In the former case, you always construct it with an initial value:
//All subscribers will receive 0
var subject = new Rx.BehaviorSubject(0);
//All subscribers will receive 1
//Including all future subscribers
subject.onNext(1);
In the latter, you set the number of values to be replayed for each subscription:
var subject = new Rx.ReplaySubject(1);
//All new subscribers will receive 0 until the subject receives its
//next onNext call
subject.onNext(0);
Rewriting your example it could be:
var A= new Rx.BehaviorSubject(0);
var B= new Rx.Subject();
// '.combineLatest' needs all the dependency Observables to get emitted, before its combined signal is emitted.
//
// How to have a combined signal emitted when any of the dependencies change (using earlier given values for the rest)?
//
A.combineLatest( B, function (a,b) { return a+b; } )
.subscribe( function (v) { console.log( "AB: "+ v ); } );
B.onNext("a");
A.onNext(1);
//AB: 0a
//AB: 1a
On another note, realizing of course that this is all new to you, in most cases you should not need to use a Subject directly as it generally means that you are trying to wrangle Rx into the safety of your known paradigms. You should ask yourself, where is your data coming from? How is it being created? If you ask those questions enough, following your chain of events back up to the source, 9 out of 10 times you will find that there is probably an Observable wrapper for it.

rx reactive extension: how to have each subscriber get a different value (the next one) from an observable?

Using reactive extension, it is easy to subscribe 2 times to the same observable.
When a new value is available in the observable, both subscribers are called with this same value.
Is there a way to have each subscriber get a different value (the next one) from this observable ?
Example of what I'm after:
source sequence: [1,2,3,4,5,...] (infinite)
The source is constantly adding new items at an unknown rate.
I'm trying to execute a lengthy async action for each item using N subscribers.
1st subscriber: 1,2,4,...
2nd subscriber: 3,5,...
...
or
1st subscriber: 1,3,...
2nd subscriber: 2,4,5,...
...
or
1st subscriber: 1,3,5,...
2nd subscriber: 2,4,6,...
I would agree with Asti.
You could use Rx to populate a Queue (Blocking Collection) and then have competing consumers read from the queue. This way if one process was for some reason faster it could pick up the next item potentially before the other consumer if it was still busy.
However, if you want to do it, against good advice :), then you could just use the Select operator overload that provides the index of each element. You can then pass that down to your subscribers and they can filter on a modulus. (Yuck! Leaky abstractions, magic numbers, potentially blocking, potential side effects on the source sequence, etc.)
var source = Observable.Interval(TimeSpan.FromSeconds(1))
    .Select((element, i) => new { Index = i, Element = element });
var subscription1 = source.Where(x => x.Index % 2 == 0).Subscribe(x => DoWithThing1(x.Element));
var subscription2 = source.Where(x => x.Index % 2 == 1).Subscribe(x => DoWithThing2(x.Element));
Also remember that if the work done in the OnNext handler is blocking, it will still block the scheduler it runs on. This could affect the speed of your source/producer. Another reason why Asti's answer is a better option.
Ask if that is not clear :-)
How about:
IObservable<TRet> SomeLengthyOperation(T input)
{
    return Observable.Defer(() => Observable.Start(() => {
        return someCalculatedValueThatTookALongTime;
    }, Scheduler.TaskPoolScheduler));
}
someObservableSource
    .SelectMany(x => SomeLengthyOperation(x))
    .Subscribe(x => Console.WriteLine("The result was {0}", x));
You can even limit the number of concurrent operations:
someObservableSource
    .Select(x => SomeLengthyOperation(x))
    .Merge(4 /* at a time */)
    .Subscribe(x => Console.WriteLine("The result was {0}", x));
For the Merge(4) to work, it's important that the Observable returned by SomeLengthyOperation is a cold Observable, which is what the Defer accomplishes here - it makes the Observable.Start not happen until someone subscribes.