Ensure observable execution even without subscribers - rx-java2

I have a cache of observables and reuse them. They normally all use some sort of caching (mostly replay(1).refCount()), and this way I make sure that the underlying calculation is done only once.
I now have cases where the underlying stream emits items while no one is subscribed to my cached observable. I still want these items to be processed. How can I do this?
Currently I can only do it like this:
val o = observable.replay(1)
o.connect() // make sure this hot observable is always connected and processes its input
return o // this one is cached
Is there some better way? I want that the hot observable always acts as if someone is subscribed and never unsubscribes from the upstream...
Background
I have redux-store-like observables, and those need to process EVERY input, no matter whether someone is subscribed or not, so that the cached values that are replayed are always the newest ones...

IMO the correct answer is the one by @prom85 in the question's comment section.
From the Learning RxJava book by Thomas Nield:
If you pass 0 to autoConnect() for the numberOfSubscribers argument, it will start firing immediately and not wait for any Observers. This can be handy to start firing emissions immediately without waiting for any Observers.
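In RxJava 2 this reads `observable.replay(1).autoConnect(0)`: `autoConnect` treats a non-positive subscriber count as "connect right away", and subscribers leaving never disconnect the upstream. RxJava is not part of the JDK, so the sketch below only models that behaviour in plain Java for the redux-like store described in the question. `EagerStore`, `dispatch` and `subscribe` are hypothetical names, not RxJava API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical redux-like store modelling replay(1).autoConnect(0): it is "connected"
// from construction, so every action is reduced even while nobody is subscribed,
// and each new subscriber first receives the latest state (the replay(1) part).
final class EagerStore<S, A> {
    private final BiFunction<S, A, S> reducer;
    private final List<Consumer<S>> subscribers = new ArrayList<>();
    private S state;

    EagerStore(S initialState, BiFunction<S, A, S> reducer) {
        this.state = initialState;
        this.reducer = reducer;
    }

    // Upstream input: always processed, regardless of how many subscribers exist.
    synchronized void dispatch(A action) {
        state = reducer.apply(state, action);
        for (Consumer<S> subscriber : List.copyOf(subscribers)) {
            subscriber.accept(state);
        }
    }

    // Replays the latest state, then forwards future states. The returned Runnable
    // removes this subscriber only; the store stays connected to its input.
    synchronized Runnable subscribe(Consumer<S> subscriber) {
        subscriber.accept(state);
        subscribers.add(subscriber);
        return () -> {
            synchronized (this) {
                subscribers.remove(subscriber);
            }
        };
    }
}
```

Unsubscribing only removes the listener; the reducer keeps running on every dispatched action, which is the "never unsubscribes from the upstream" behaviour the question asks for.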

Related

Akka-streams time based grouping

I have an application which listens to a stream of events. These events tend to come in chunks: 10 to 20 of them within the same second, with minutes or even hours of silence between them. These events are processed and result in an aggregate state, and this updated state is sent further downstream.
In pseudo code, it would look something like this:
kafkaSource()
  .mapAsync(1) { case (entityId, event) => entityProcessor(entityId).process(event) } // yields entityState
  .mapAsync(1)(entityState => submitStateToExternalService(entityState))
  .runWith(kafkaCommitterSink)
The thing is that the downstream submitStateToExternalService has no use for 10-20 updated states per second - it would be far more efficient to just emit the last one and only handle that one.
With that in mind, I started looking if it wouldn't be possible to not emit the state after processing immediately, and instead wait a little while to see if more events are coming in.
In a way, it's similar to conflate, but that emits elements as soon as the downstream stops backpressuring, and my processing is actually fast enough to keep up with the events coming in, so I can't rely on backpressure.
I came across groupedWithin, but this emits elements whenever the window ends (or the max number of elements is reached). What I would ideally want, is a time window where the waiting time before emitting downstream is reset by each new element in the group.
Before I implement something to do this myself, I wanted to make sure that I didn't just overlook a way of doing this that is already present in akka-streams, because this seems like a fairly common thing to do.
Honestly, I would make entityProcessor into a cluster-sharded persistent actor.
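import akka.cluster.sharding.ClusterSharding
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

case class ProcessEvent(entityId: String, evt: EntityEvent)
implicit val askTimeout: Timeout = Timeout(5.seconds) // assumed value, tune for your workload
val entityRegion = ClusterSharding(system).shardRegion("entity")
kafkaSource()
  .mapAsync(parallelism) { case (entityId, event) =>
    entityRegion ? ProcessEvent(entityId, event)
  }
  .runWith(kafkaCommitterSink)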
With this, you can safely increase the parallelism so that you can handle events for multiple entities simultaneously without fear of mis-ordering the events for any particular entity.
Your entity actors would then update their state in response to the process commands and persist the events using a suitable persistence plugin, sending a reply to complete the ask pattern. One way to get the compaction effect you're looking for is for them to schedule the update of the external service after some period of time (after cancelling any previously scheduled update).
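"Cancel any previously scheduled update, then schedule a new one" is a per-entity trailing debounce, which is also the window the question describes (each new element resets the wait). A minimal sketch in plain Java, assuming a `ScheduledExecutorService` in place of the actor scheduler; `EntityDebouncer`, `stateUpdated` and the `publish` callback are hypothetical names standing in for the entity actor and `submitStateToExternalService`:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Per-entity trailing debounce: each new state for an entity cancels that entity's
// pending publish and schedules a fresh one, so only the last state of a burst is
// published once the entity has been quiet for quietMillis.
final class EntityDebouncer<K, S> {
    private final ScheduledExecutorService scheduler;
    private final long quietMillis;
    private final BiConsumer<K, S> publish; // hypothetical stand-in for submitStateToExternalService
    private final Map<K, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();

    EntityDebouncer(ScheduledExecutorService scheduler, long quietMillis, BiConsumer<K, S> publish) {
        this.scheduler = scheduler;
        this.quietMillis = quietMillis;
        this.publish = publish;
    }

    void stateUpdated(K entityId, S newState) {
        pending.compute(entityId, (id, previous) -> {
            if (previous != null) {
                previous.cancel(false); // superseded: drop the earlier scheduled update
            }
            // For brevity, finished futures stay in the map until the entity's next update.
            return scheduler.schedule(() -> publish.accept(id, newState), quietMillis, TimeUnit.MILLISECONDS);
        });
    }
}
```

Because the debounce is keyed by entity, a burst for one entity never delays updates for another. It does not address the crash window discussed next: a pending update that has not fired yet is lost if the process dies.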
There is one potential pitfall with this scheme (it's also a potential issue with a homemade Akka Stream solution to allow n > 1 events to be processed before updating the state): what happens if the service fails between updating the local view of state and updating the external service?
One way you can deal with this is to encode whether the entity is dirty (has state which hasn't propagated to the external service) in the entity's state and at startup build a list of entities and run through them to have dirty entities update the external state.
If the entities are doing more than just tracking state for publishing to a single external datastore, it might be useful to use Akka Persistence Query to build a full-fledged read-side view to update the external service. In this case, though, since the read-side view's (State, Event) => State transition would be the same as the entity processor's, it might not make sense to go this way.
A midway alternative would be to offload the scheduling etc. to a different actor or set of actors which get told "this entity updated its state" and then schedule an ask of the entity for its current state, along with a timestamp of when the state was locally updated. When the response is received, the external service is updated if the timestamp is newer than that of the last update.

Why would publisher send new items even after cancel?

The documentation of Subscription#cancel says that
Data may still be sent to meet previously signalled demand after calling cancel.
In which scenario would people expect the publisher to continue to send till previous signalled demand is met?
Also, if I don't want any new items to be sent after cancellation, what should I do?
Unless you are creating low-level operators or Publishers, you don't have to worry about this.
In which scenario would people expect the publisher to continue to send till previous signalled demand is met?
None of the mainstream Reactive Streams libraries deliberately keep sending until the previously signalled demand is met; they all stop sending items eventually. RxJava 2 and Reactor 3 are pretty eager about this, so you'd most likely see one extra item when a low-level cancellation is issued asynchronously. Akka Stream may signal more than that (last time I checked, it mixes control and item signals, and there is a configuration setting for the maximum number of synchronous items per stream that can lead to multiple items being emitted before the cancellation takes effect).
Also, if I don't want any new items to be sent after cancellation, what should I do?
Depends on what you implement: a Publisher or a Subscriber.
In a Publisher the most eager method is to set a volatile boolean cancelled field and check that every time you are in some kind of emission loop.
In a Subscriber, you can have a boolean done field that is checked in each onXXX so that when you call Subscription.cancel() from onNext, any subsequent call will be ignored.
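Both techniques can be sketched with the JDK's `java.util.concurrent.Flow` interfaces (the JDK's copy of the Reactive Streams interfaces since Java 9). The classes are illustrative assumptions, not code from any library: `RangePublisher` checks a volatile `cancelled` flag before every `onNext`, and `FirstN` keeps a plain `done` flag so that items arriving after its own `cancel()` are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicLong;

// Publisher side: a synchronous range source that honours request(n) and checks a
// volatile `cancelled` flag before every onNext, so cancel() takes effect at once.
final class RangePublisher implements Flow.Publisher<Integer> {
    private final int start;
    private final int end; // exclusive

    RangePublisher(int start, int count) { this.start = start; this.end = start + count; }

    @Override
    public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
        subscriber.onSubscribe(new RangeSubscription(subscriber, start, end));
    }

    private static final class RangeSubscription implements Flow.Subscription {
        private final Flow.Subscriber<? super Integer> downstream;
        private final int end;
        private final AtomicLong requested = new AtomicLong();
        private volatile boolean cancelled;
        private int index; // only touched by the thread currently running the emission loop

        RangeSubscription(Flow.Subscriber<? super Integer> downstream, int start, int end) {
            this.downstream = downstream;
            this.index = start;
            this.end = end;
        }

        @Override
        public void request(long n) {
            if (n <= 0) {
                if (!cancelled) {
                    cancel();
                    downstream.onError(new IllegalArgumentException("non-positive request (rule 3.9)"));
                }
                return;
            }
            // Add demand (capped at Long.MAX_VALUE); only the caller that lifts it from 0 emits.
            long previous = requested.getAndAccumulate(n, (r, add) -> r + add < 0 ? Long.MAX_VALUE : r + add);
            if (previous != 0) return; // a running loop, maybe further up this stack, will see it
            long demand = n;
            long emitted = 0;
            for (;;) {
                while (emitted != demand && index != end) {
                    if (cancelled) return; // checked before every single item
                    downstream.onNext(index++);
                    emitted++;
                }
                if (cancelled) return;
                if (index == end) {
                    cancelled = true; // terminal: later requests emit nothing
                    downstream.onComplete();
                    return;
                }
                demand = requested.addAndGet(-emitted); // demand added while we were emitting
                if (demand == 0) return;
                emitted = 0;
            }
        }

        @Override
        public void cancel() { cancelled = true; }
    }
}

// Subscriber side: takes the first `limit` items, then cancels; the plain `done`
// flag makes it ignore anything a less eager publisher still sends afterwards.
final class FirstN implements Flow.Subscriber<Integer> {
    private final int limit;
    final List<Integer> received = new ArrayList<>();
    private Flow.Subscription subscription;
    private boolean done;

    FirstN(int limit) { this.limit = limit; }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(Long.MAX_VALUE);
    }

    @Override
    public void onNext(Integer item) {
        if (done) return;
        received.add(item);
        if (received.size() == limit) {
            done = true;
            subscription.cancel();
        }
    }

    @Override
    public void onError(Throwable t) { done = true; }

    @Override
    public void onComplete() { done = true; }
}
```

With this particular publisher the guard in `FirstN` never actually fires, because the cancellation is observed synchronously; it matters for sources that, as the Javadoc allows, still deliver items after `cancel()`.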

Concat operator semantics, but with immediate subscriptions to all underlying observables

I want to concatenate a cold and a hot observable. That is, the resulting observable should emit the results of the cold observable first, then the items from the hot one. At the same time, I want the subscription to the second (hot) observable to happen at the same moment as the subscription to the first one; otherwise I miss an important event from it.
That looks very similar to what merge would do, but I want to guarantee that the hot observable will not push anything before the cold one completes, which merge doesn't guarantee. What would be the right way around this?
Use the Replay or PublishLast operators, depending upon your needs. Each has an overload that accepts a selector function.
For example:
var coldThenHot = hot.PublishLast(cold.Concat);
Subscribing to coldThenHot causes PublishLast to invoke the selector first, creating the Concat query. Then it subscribes to it and your hot observable. The last value in the hot observable is buffered. When the cold observable completes, the sequence continues with the buffered value, or simply remains silent until the last value arrives.
However, I'm curious as to what exactly you meant by hot. If your hot observable doesn't generate a value until you subscribe, then technically it's cold. If your observable is truly hot, then you may have already missed the value by the time this query is created. Although, it's possible that it's implicitly buffered already (e.g., if it was created by Observable.FromAsyncPattern), in which case simply concatenate the sequences like normal.
var coldThenHot = cold.Concat(hot);
If you don't want to miss previous data from the hot observable, there is the ReplaySubject, which does exactly this: as soon as you subscribe to it, it pushes the previous elements to the subscriber, which looks like what you need here.
So what you have to do is subscribe to the cold observable, and when it completes (onCompleted), subscribe to your ReplaySubject (your hot observable). If you need to delay the important data from your hot observable, you have no choice but to do some buffering.
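The buffering approach can be sketched without any Rx library: both sources are wired up at the same moment, cold items are forwarded immediately, and hot items are held back until the cold source completes. This models the `Replay` variant (every early hot item is kept); `PublishLast` would keep only the most recent one. `ColdThenHot` and its `onColdNext`, `onColdComplete` and `onHotNext` callbacks are hypothetical names, and error handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Both sources are "subscribed" up front: cold items pass straight through,
// hot items are buffered until the cold source completes, then flushed in order.
final class ColdThenHot<T> {
    private final Consumer<T> downstream;
    private final List<T> hotBuffer = new ArrayList<>();
    private boolean coldCompleted;

    ColdThenHot(Consumer<T> downstream) { this.downstream = downstream; }

    synchronized void onColdNext(T item) { downstream.accept(item); }

    synchronized void onColdComplete() {
        coldCompleted = true;
        hotBuffer.forEach(downstream); // release everything the hot source sent early
        hotBuffer.clear();
    }

    synchronized void onHotNext(T item) {
        if (coldCompleted) {
            downstream.accept(item);
        } else {
            hotBuffer.add(item); // arrived before the cold part finished: hold it back
        }
    }
}
```

The buffer is unbounded, so this only makes sense when the cold part is short or the hot source is slow.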

Rx -several producers/one consumer

Have been trying to google this but getting a bit stuck.
Let's say we have a class that fires an event, and that event could be fired by several threads at the same time.
Using Observable.FromEventPattern, we create an Observable and subscribe to that event. How exactly does Rx handle multiple such events being fired at once? Let's say we have 3 events fired in quick succession on different threads. Does it queue them internally and then call the Subscribe delegate synchronously for each one? Let's say we were subscribing on a thread pool; can we still guarantee the subscriptions would be processed separately in time?
Following on from that, let's say for each event we want to perform an action, but it's a method that's potentially not thread-safe, so we only want one thread to be in this method at a time. Now I see we can use an EventLoopScheduler, and presumably we wouldn't need to implement any locking in the code?
Also, would observing on the current thread be an option? Is the current thread the thread that the event was fired from, or the thread the subscription was set up on? I.e., is that current thread guaranteed to always be the same, or could we have 2 threads ending up in the method at the same time?
Thx
PS: I put an example together, but I always seem to end up on the same thread in my subscribe method, even when I ObserveOn the thread pool, which is confusing :S
PSS: From doing a few more experiments, it seems that if no schedulers are specified, Rx just executes on whatever thread the event was fired on, meaning it processes several concurrently. As soon as I introduce a scheduler, it always runs things consecutively, no matter what type of scheduler it is. Strange :S
According to the Rx Design Guidelines, an observable should never call OnNext of an observer concurrently. It will always wait for the current call to complete before making the next call. All Rx methods honor this convention. And, more importantly, they assume you also honor this convention. When you violate this condition, you may encounter subtle bugs in the behavior of your Observable.
For those times when you have source data that does not honor this convention (i.e. it can produce data concurrently), they provide Synchronize.
Observable.FromEventPattern assumes you will not be firing concurrent events and so does nothing to prevent concurrent downstream notifications. If you plan on firing events from multiple threads, sometimes concurrently, then use Synchronize() as the first operation you do after FromEventPattern:
// this will get you in trouble if your event source might fire events concurrently.
var events = Observable.FromEventPattern(...).Select(...).GroupBy(...);
// this version will protect you in that case.
var events = Observable.FromEventPattern(...).Synchronize().Select(...).GroupBy(...);
Now all of the downstream operators (and eventually your observer) are protected from concurrent notifications, as promised by the Rx Design Guidelines. Synchronize works by using a simple mutex (aka the lock statement). There is no fancy queueing or anything. If one thread attempts to raise an event while another thread is already raising it, the 2nd thread will block until the first thread finishes.
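The lock-based behaviour described above can be mimicked in plain Java by wrapping the non-thread-safe handler in a `synchronized` block. This is a sketch of the idea, not Rx.NET's implementation; `SynchronizedConsumer` and `SynchronizeDemo` are hypothetical names. The demo drives the wrapper from several threads and reports how many threads were ever inside the handler at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Analogue of Synchronize(): one plain lock around the downstream call.
// There is no queue; a thread that arrives while another is inside simply blocks.
final class SynchronizedConsumer<T> implements Consumer<T> {
    private final Consumer<T> downstream;
    private final Object gate = new Object();

    SynchronizedConsumer(Consumer<T> downstream) { this.downstream = downstream; }

    @Override
    public void accept(T value) {
        synchronized (gate) {
            downstream.accept(value);
        }
    }
}

final class SynchronizeDemo {
    // Fires events from several producer threads through the wrapper and returns the
    // largest number of threads that were ever inside the handler at the same time.
    static int maxConcurrentCalls(int threads, int eventsPerThread) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        Consumer<Integer> notThreadSafe = value -> {
            maxInside.accumulateAndGet(inside.incrementAndGet(), Math::max);
            Thread.yield(); // give another producer a chance to collide
            inside.decrementAndGet();
        };
        Consumer<Integer> events = new SynchronizedConsumer<>(notThreadSafe);
        List<Thread> producers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread producer = new Thread(() -> {
                for (int i = 0; i < eventsPerThread; i++) events.accept(i);
            });
            producers.add(producer);
            producer.start();
        }
        try {
            for (Thread producer : producers) producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxInside.get();
    }
}
```

Because producers block rather than queue, a slow handler slows down every thread that raises the event.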
In addition to the recommendation to use Synchronize, it's probably worth reading the Intro to Rx section on scheduling and threading. It covers the different schedulers and their relationship to threads, as well as the differences between ObserveOn and SubscribeOn, etc.
If you have several producers, then there are Rx methods for combining them in a thread-safe way.
For combining streams of the same type of event into a single stream
Observable.Merge
For combining streams of different types of events into a single stream, using a selector to transform the latest value of each stream into a new value
Observable.CombineLatest
For example combining stock prices from different sources
IObservable<StockPrice> source0;
IObservable<StockPrice> source1;
IObservable<StockPrice> combinedSources = source0.Merge(source1);
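or create balloons at the current position every time there is a click
IObservable<ClickEvent> clicks;
IObservable<Position> positions;
IObservable<Balloon> balloons = clicks.CombineLatest(
    positions,
    (click, position) => new Balloon(position.X, position.Y));
Note that CombineLatest also emits when positions produces a new value after a click, not only on clicks; if you want exactly one balloon per click, WithLatestFrom (available in recent System.Reactive versions) is the closer fit.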
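The emission rule of `CombineLatest` can be sketched in a few lines of plain Java: remember the latest value from each side and, once both have produced something, call the selector on every new value from either side. `CombineLatest` here is a hypothetical class, not the Rx.NET operator. The usage in the comments shows that a position update after a click produces a result even without a new click.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Emission rule of combineLatest: keep the latest value of each input and, once both
// inputs have produced something, emit selector(latestA, latestB) on EVERY new value.
final class CombineLatest<A, B, R> {
    private final BiFunction<A, B, R> selector;
    private final Consumer<R> downstream;
    private A latestA;
    private B latestB;
    private boolean hasA;
    private boolean hasB;

    CombineLatest(BiFunction<A, B, R> selector, Consumer<R> downstream) {
        this.selector = selector;
        this.downstream = downstream;
    }

    synchronized void onA(A value) { latestA = value; hasA = true; emitIfReady(); }

    synchronized void onB(B value) { latestB = value; hasB = true; emitIfReady(); }

    private void emitIfReady() {
        if (hasA && hasB) downstream.accept(selector.apply(latestA, latestB));
    }
}

// Usage, with clicks on side A and positions on side B:
//   onA("click1")  -> nothing yet, no position seen
//   onB("p1")      -> "click1@p1"
//   onB("p2")      -> "click1@p2"  (no new click, but a new result)
//   onA("click2")  -> "click2@p2"
```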
To make this specifically relevant to your question: you say there is a class which combines events from different threads. I would then use Observable.Merge to combine the individual event sources and expose that as an Observable on your main class.
BTW, if your threads are actually tasks that fire events to say they have completed, here is an interesting pattern:
IObservable<Job> jobSource;
IObservable<IObservable<JobResult>> resultTasks = jobSource
    .Select(job => Observable.FromAsync(cancellationToken => DoJob(cancellationToken, job)));
IObservable<JobResult> results = resultTasks.Merge();
What is happening here is that you get a stream of jobs in. From the jobs you create a stream of asynchronous tasks (not running yet). Merge then runs the tasks and collects the results. It resembles a map/reduce algorithm. The cancellation token can be used to cancel running async tasks if the observable is unsubscribed from (i.e. cancelled).
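Rx.NET is not available in plain Java, so this is only a rough analogue of the same pattern built on the JDK's `ExecutorCompletionService`: every job is started on a pool, results are collected in completion order (like `Merge`), and the remaining work is cancelled if waiting is interrupted or a job fails. `MergeJobs.runAll` and the `doJob` function are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class MergeJobs {
    // "Map": start every job on the pool. "Merge": take results in completion order.
    // If waiting is interrupted or a job fails, cancel whatever is still running.
    static <J, R> List<R> runAll(List<J> jobs, Function<J, R> doJob, ExecutorService pool) {
        CompletionService<R> completed = new ExecutorCompletionService<>(pool);
        List<Future<R>> running = new ArrayList<>();
        for (J job : jobs) {
            running.add(completed.submit(() -> doJob.apply(job)));
        }
        List<R> results = new ArrayList<>();
        try {
            for (int i = 0; i < jobs.size(); i++) {
                results.add(completed.take().get()); // whichever job finished next
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            running.forEach(f -> f.cancel(true)); // rough analogue of unsubscribing
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            running.forEach(f -> f.cancel(true));
            throw new IllegalStateException("a job failed", e.getCause());
        }
        return results;
    }
}
```

Unlike the Rx version, this blocks the caller until all results are in; the ordering of `results` depends on which jobs finish first.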

What are the Hot and Cold observables?

I watched the video and I know the general principles - hot happens even when nobody is subscribed, cold happens "on demand".
Also, Publish() converts cold to hot and Defer() converts hot to cold.
But still, I feel I am missing the details. Here are some questions I'd like to have answered:
Can you give a comprehensive definition for these terms?
Does it ever make sense to call Publish on a hot observable or Defer on a cold?
What are the aspects of Hot/Cold conversions - do you lose messages, for example?
Are there differences between hot and cold definitions for IObservable and IEnumerable?
What are the general principles you should take into account when programming for cold or hot?
Any other tips on hot/cold observables?
From Anton Moiseev's book “Angular Development with TypeScript, Second Edition”:
Hot and cold observables
There are two types of observables: hot and cold. The main difference is that a cold observable creates a data producer for each subscriber, whereas a hot observable creates a data producer first, and each subscriber gets the data from one producer, starting from the moment of subscription.
Let’s compare watching a movie on Netflix to going into a movie theater. Think of yourself as an observer. Anyone who decides to watch Mission: Impossible on Netflix will get the entire movie, regardless of when they hit the play button. Netflix creates a new producer to stream a movie just for you. This is a cold observable.
If you go to a movie theater and the showtime is 4 p.m., the producer is created at 4 p.m., and the streaming begins. If some people (subscribers) are late to the show, they miss the beginning of the movie and can only watch it starting from the moment of arrival. This is a hot observable.
A cold observable starts producing data when some code invokes a subscribe() function on it. For example, your app may declare an observable providing a URL on the server to get certain products. The request will be made only when you subscribe to it. If another script makes the same request to the server, it’ll get the same set of data.
A hot observable produces data even if no subscribers are interested in the data. For example, an accelerometer in your smartphone produces data about the position of your device, even if no app subscribes to this data. A server can produce the latest stock prices even if no user is interested in this stock.
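The Netflix/theater analogy can be reduced to two tiny plain-Java classes. `ColdMovie` and `HotMovie` are illustrative names, and the "producer" is just a list of frames or an `emit` call rather than a real observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Cold: every subscribe() starts its own producer, so each viewer gets the whole movie.
final class ColdMovie {
    private final List<String> frames;

    ColdMovie(List<String> frames) { this.frames = frames; }

    void subscribe(Consumer<String> viewer) {
        for (String frame : frames) viewer.accept(frame); // a fresh "stream" per viewer
    }
}

// Hot: one producer runs on its own schedule; viewers only see frames emitted after they arrive.
final class HotMovie {
    private final List<Consumer<String>> viewers = new ArrayList<>();

    void subscribe(Consumer<String> viewer) { viewers.add(viewer); }

    // Called by the producer whether or not anyone is watching; frames nobody sees are lost.
    void emit(String frame) {
        for (Consumer<String> viewer : viewers) viewer.accept(frame);
    }
}
```

A late viewer of `ColdMovie` still gets every frame, while a viewer who subscribes to `HotMovie` after the first `emit` misses that frame.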
Hot observables are ones that push events even when you are not subscribed to the observable, like mouse moves, timer ticks or anything like that. Cold observables are ones that start pushing only when you subscribe, and they start over if you subscribe again.
I hope this helps.
Can you give a comprehensive definition for these terms?
See my blog post at: https://leecampbell.com/2010/08/19/rx-part-7-hot-and-cold-observables
Does it ever make sense to call Publish on a hot observable or Defer on a cold?
No, not that I can think of.
What are the aspects of Hot/Cold conversions - do you lose messages, for example?
It is possible to "lose" messages when the Observable is Hot, as "events" happen regardless of subscribers.
Are there differences between hot and cold definitions for IObservable and IEnumerable?
I don't really understand the question, but I hope this analogy helps. I would compare a hot observable to an eagerly evaluated IEnumerable, i.e. a List or an Array: both are eagerly evaluated and have been populated even if no one enumerates over them. An iterator that gets values from a file or a database can be lazily evaluated using the yield keyword. While lazy evaluation can be good, by default it will be re-evaluated if a second enumerator runs over it. Comparing these to observables, a hot observable might be an event (a button click) or a feed of temperatures; these events happen regardless of subscriptions and would also be shared if multiple subscriptions were made to the same observable. Observable.Interval is a good example of a cold observable: it only starts producing values when a subscription is made. If multiple subscriptions are made, the sequence is re-evaluated and the "events" occur at separate times (depending on the time between subscriptions).
What are the general principles you should take into account when programming for cold or hot?
Refer to the link in point one. I would also recommend you look into Publish being used in conjunction with RefCount. This gives you the lazy evaluation semantics of cold observables together with the sharing of events that hot observables get.
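A rough plain-Java sketch of what `Publish().RefCount()` does, assuming a hypothetical `Upstream` interface whose `connect` returns a disconnect action: the first subscriber connects, later subscribers share that single connection, and the last one to unsubscribe disconnects (a later subscriber then reconnects from scratch). `RefCounted` is an illustrative name; this is not Rx.NET's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical upstream: connect() starts producing into `sink` and returns a disconnect action.
interface Upstream<T> {
    Runnable connect(Consumer<T> sink);
}

// Rough analogue of Publish().RefCount(): lazy like a cold source (nothing runs until
// someone subscribes), shared like a hot one (all current subscribers see the same values).
final class RefCounted<T> {
    private final Upstream<T> upstream;
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    private Runnable disconnect; // non-null while connected
    int connectCount; // exposed only so the demo can observe reconnections

    RefCounted(Upstream<T> upstream) { this.upstream = upstream; }

    synchronized Runnable subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
        if (subscribers.size() == 1) { // first subscriber: connect the shared upstream
            connectCount++;
            disconnect = upstream.connect(this::broadcast);
        }
        return () -> unsubscribe(subscriber);
    }

    private synchronized void unsubscribe(Consumer<T> subscriber) {
        if (subscribers.remove(subscriber) && subscribers.isEmpty()) { // last one out disconnects
            disconnect.run();
            disconnect = null;
        }
    }

    private synchronized void broadcast(T value) {
        for (Consumer<T> s : List.copyOf(subscribers)) s.accept(value);
    }
}
```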
Any other tips on hot/cold observables?
Get your hands dirty and have a play with them. Once you have read about them for more than 30 minutes, time spent coding with them is far more productive for you than reading any more :)
Not pretending to give a comprehensive answer, I'd like to summarize, in the simplest form, what I have learned since this question was asked.
A hot observable is an exact match for an event. With events, values are usually produced even if no subscribers are listening, and all subscribers receive the same set of values. Because they follow the "event" pattern, hot observables are easier to understand than cold ones.
A cold observable is also like an event, but with a twist: a cold observable's event is not a property on a shared instance; it is a property on an object that a factory produces each time somebody subscribes. In addition, subscribing starts the production of the values. Because of this, multiple subscribers are isolated and each receives its own set of values.
The most common mistake Rx beginners make is creating a cold observable (or rather, thinking they are creating one) using some state variables within a function (e.g. an accumulated total) without wrapping it in a .Defer() call. As a result, multiple subscribers share these variables and cause side effects between them.
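The mistake and the fix can be sketched in plain Java with a hypothetical one-method `Source` interface standing in for a cold observable; `Source.defer` plays the role of `Observable.Defer` by building a new source, with its own state, for every subscriber. `RunningTotals` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical minimal "cold observable": subscribing runs the producer for that observer.
interface Source<T> {
    void subscribe(Consumer<? super T> observer);

    // Analogue of Observable.Defer: build a brand-new source (and its state) per subscription.
    static <T> Source<T> defer(Supplier<Source<T>> factory) {
        return observer -> factory.get().subscribe(observer);
    }
}

final class RunningTotals {
    // Bug: `total` is created once, when the source is built, so all subscribers share it.
    static Source<Integer> of(List<Integer> input) {
        int[] total = {0};
        return observer -> {
            for (int x : input) {
                total[0] += x;
                observer.accept(total[0]);
            }
        };
    }
}
```

For the input 1, 2, 3, two subscriptions to `RunningTotals.of(input)` see 1, 3, 6 and then 7, 9, 12, because the second subscriber continues from the first one's total. Subscribing twice to `Source.defer(() -> RunningTotals.of(input))` gives each subscriber its own 1, 3, 6.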