What are the Hot and Cold observables? - system.reactive

I watched the video and I know the general principles - hot happens even when nobody is subscribed, cold happens "on demand".
Also, Publish() converts cold to hot and Defer() converts hot to cold.
But still, I feel I am missing the details. Here are some questions I'd like to have answered:
Can you give a comprehensive definition for these terms?
Does it ever make sense to call Publish on a hot observable or Defer on a cold?
What are the aspects of Hot/Cold conversions - do you lose messages, for example?
Are there differences between hot and cold definitions for IObservable and IEnumerable?
What are the general principles you should take into account when programming for cold or hot?
Any other tips on hot/cold observables?

From: Anton Moiseev's Book “Angular Development with Typescript, Second Edition.” :
Hot and cold observables
There are two types of observables: hot and cold. The main
difference is that a cold observable creates a data
producer for each subscriber, whereas a hot observable
creates a data producer first, and each subscriber gets the
data from one producer, starting from the moment of subscription.
Let’s compare watching a movie on Netflix to going into a
movie theater. Think of yourself as an observer. Anyone who decides to watch Mission: Impossible on Netflix will get the entire
movie, regardless of when they hit the play button. Netflix creates a
new producer to stream a movie just for you. This is a cold
observable.
If you go to a movie theater and the showtime is 4 p.m., the producer
is created at 4 p.m., and the streaming begins. If some people
(subscribers) are late to the show, they miss the beginning of the
movie and can only watch it starting from the moment of arrival. This
is a hot observable.
A cold observable starts producing data when some code invokes a
subscribe() function on it. For example, your app may declare an observable providing a URL on the server to get certain products. The
request will be made only when you subscribe to it. If another script
makes the same request to the server, it’ll get the same set of data.
A hot observable produces data even if no subscribers are
interested in the data. For example, an accelerometer in your
smartphone produces data about the position of your device, even if no
app subscribes to this data. A server can produce the latest stock
prices even if no user is interested in this stock.
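
The Netflix versus movie theater comparison above can be sketched in a few lines. This is only an illustration in plain Python, not any Rx library: `cold_movie` and `HotMovie` are hypothetical names, and the "movie" is just the frames 1, 2, 3.

```python
from typing import Callable, List

Observer = Callable[[int], None]


def cold_movie(observer: Observer) -> None:
    """Cold: every subscription gets its own producer, starting from frame 1."""
    for frame in range(1, 4):  # a new producer run per subscriber
        observer(frame)


class HotMovie:
    """Hot: one shared producer; subscribers only see frames emitted after they join."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []

    def subscribe(self, observer: Observer) -> None:
        self._observers.append(observer)

    def emit(self, frame: int) -> None:
        # Frames are produced whether or not anyone is subscribed.
        for observer in list(self._observers):
            observer(frame)


# Netflix: both viewers get the whole movie, whenever they press play.
early_cold: List[int] = []
late_cold: List[int] = []
cold_movie(early_cold.append)
cold_movie(late_cold.append)

# Movie theater: the late viewer misses frame 1.
theater = HotMovie()
early_hot: List[int] = []
late_hot: List[int] = []
theater.subscribe(early_hot.append)
theater.emit(1)
theater.subscribe(late_hot.append)  # arrives after the show has started
theater.emit(2)
theater.emit(3)
```

Here `early_cold` and `late_cold` both end up holding the full sequence, while `late_hot` only holds the frames emitted after it subscribed.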

Hot observables are ones that push events even when you are not subscribed to the observable, like mouse moves, timer ticks or anything like that. Cold observables are ones that start pushing only when you subscribe, and they start over if you subscribe again.

I hope this helps.
Can you give a comprehensive
definition for these terms?
See my blog post at: https://leecampbell.com/2010/08/19/rx-part-7-hot-and-cold-observables
Does it ever make sense to call
Publish on a hot observable or Defer
on a cold?
No, not that I can think of.
What are the aspects of Hot/Cold
conversions - do you lose messages,
for example?
It is possible to "lose" messages when the Observable is Hot, as "events" happen regardless of subscribers.
Are there differences between hot and
cold definitions for IObservable and
IEnumerable?
I don't really understand the question. I hope this analogy helps though. I would compare a Hot Observable to an eagerly evaluated IEnumerable, i.e. a List or an Array are both eagerly evaluated and have been populated even if no one enumerates over them. An iterator that gets values from a file or a database could be lazily evaluated using the yield keyword. While lazy can be good, it will, by default, be re-evaluated if a second enumerator runs over it. Comparing these to Observables, a Hot Observable might be an Event (Button click) or a feed of temperatures; these events will happen regardless of a subscription and would also be shared if multiple subscriptions were made to the same observable. Observable.Interval is a good example of a Cold observable. It will only start producing values when a subscription is made. If multiple subscriptions are made then the sequence will be re-evaluated and the "events" will occur at separate times (depending on the time between subscriptions).
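
To make the eager versus lazy analogy concrete, here is a rough Python equivalent (the answer talks about C#, so treat this as an analogy of an analogy). `read_values` is a hypothetical stand-in for reading from a file or database. One difference to keep in mind: a Python generator object can only be consumed once, so the function is called again for the second enumeration, whereas a C# iterator-based IEnumerable re-runs on each enumeration of the same object.

```python
from typing import Iterator, List

evaluations = 0  # counts how many times the lazy source actually ran


def read_values() -> Iterator[int]:
    """Lazy source: the body runs only when someone enumerates it."""
    global evaluations
    evaluations += 1
    for value in (10, 20, 30):
        yield value


# Eager: populated right away, even if nobody ever enumerates it.
eager: List[int] = [10, 20, 30]

pending = read_values()           # nothing has run yet
evaluations_before = evaluations  # still 0 at this point

first = list(pending)         # first enumeration runs the source
second = list(read_values())  # a second enumeration runs it again
```

After this runs, `evaluations` is 2: the lazy source did its work once per enumeration, which mirrors a cold observable doing its work once per subscription.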
What are the general principles you should take into account when programming for cold or hot?
Refer to the link in point one. I would also recommend you look into Publish being used in conjunction with RefCount. This gives you the lazy evaluation semantics of Cold Observables together with the sharing of events that Hot Observables provide.
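
As a rough illustration of the Publish plus RefCount idea, here is a library-free Python sketch. It is not the Rx implementation; `ManualSource` and `RefCounted` are hypothetical names. The wrapper connects to the source on the first subscription, shares each value with all subscribers, and disconnects when the last one leaves.

```python
from typing import Callable, List, Optional

Observer = Callable[[int], None]


class ManualSource:
    """Hypothetical upstream: counts connections and lets the demo push values."""

    def __init__(self) -> None:
        self.connections = 0
        self._sink: Optional[Observer] = None

    def connect(self, sink: Observer) -> Callable[[], None]:
        self.connections += 1
        self._sink = sink

        def disconnect() -> None:
            self._sink = None

        return disconnect

    def push(self, value: int) -> None:
        if self._sink is not None:  # values pushed while disconnected are dropped
            self._sink(value)


class RefCounted:
    """Connects lazily on the first subscriber and shares one connection."""

    def __init__(self, source: ManualSource) -> None:
        self._source = source
        self._observers: List[Observer] = []
        self._disconnect: Optional[Callable[[], None]] = None

    def subscribe(self, observer: Observer) -> Callable[[], None]:
        self._observers.append(observer)
        if len(self._observers) == 1:  # first subscriber triggers the connection
            self._disconnect = self._source.connect(self._broadcast)

        def unsubscribe() -> None:
            self._observers.remove(observer)
            if not self._observers and self._disconnect is not None:
                self._disconnect()  # last subscriber gone: release the source
                self._disconnect = None

        return unsubscribe

    def _broadcast(self, value: int) -> None:
        for observer in list(self._observers):
            observer(value)


source = ManualSource()
shared = RefCounted(source)
connections_before = source.connections  # 0: nothing happens until someone subscribes

a: List[int] = []
b: List[int] = []
unsub_a = shared.subscribe(a.append)
source.push(1)
unsub_b = shared.subscribe(b.append)  # joins the existing connection
source.push(2)
unsub_a()
unsub_b()
source.push(3)  # nobody is connected any more
```

The source is connected exactly once even though there were two subscribers, and the second subscriber only sees values pushed after it joined.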
Any other tips on hot/cold
observables?
Get your hands dirty and have a play with them. Once you have read about them for more than 30 minutes, time spent coding with them is far more productive than reading any more :)

Not pretending to give a comprehensive answer, I'd like to summarize in a simplest form what I have learned since the time of this question.
A hot observable is an exact match for an event. With events, values are usually produced even if no subscribers are listening, and all subscribers receive the same set of values. Because they follow the "event" pattern, hot observables are easier to understand than cold ones.
A cold observable is also like an event, but with a twist: a cold observable's event is not a property on a shared instance, it is a property on an object that is produced from a factory each time somebody subscribes. In addition, subscription starts the production of the values. Because of the above, multiple subscribers are isolated and each receives its own set of values.
The most common mistake Rx beginners make is creating a cold observable (well, thinking they are creating a cold observable) using some state variables within a function (e.g. an accumulated total) and not wrapping it in a .Defer() call. As a result, multiple subscribers share these variables and cause side effects between them.
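
The shared-state mistake described above can be reproduced without any Rx library. In this sketch an "observable" is just a function you call with an observer; `running_total_shared` and `defer` are hypothetical helpers, and `defer` only imitates the idea behind Rx's Defer (run a factory per subscription).

```python
from typing import Callable, Iterable, List

Observer = Callable[[int], None]
Observable = Callable[[Observer], None]  # subscribing == calling the function


def running_total_shared(values: Iterable[int]) -> Observable:
    total = 0  # created once, so every subscriber shares it

    def subscribe(observer: Observer) -> None:
        nonlocal total
        for value in values:
            total += value
            observer(total)

    return subscribe


def defer(factory: Callable[[], Observable]) -> Observable:
    """Build a fresh observable (and fresh state) for each subscription."""

    def subscribe(observer: Observer) -> None:
        factory()(observer)

    return subscribe


# Without defer: the second subscriber continues from the first one's total.
buggy = running_total_shared([1, 2, 3])
buggy_first: List[int] = []
buggy_second: List[int] = []
buggy(buggy_first.append)
buggy(buggy_second.append)

# With defer: the state is recreated per subscription, so subscribers are isolated.
fixed = defer(lambda: running_total_shared([1, 2, 3]))
fixed_first: List[int] = []
fixed_second: List[int] = []
fixed(fixed_first.append)
fixed(fixed_second.append)
```

The second subscriber of `buggy` receives 7, 9, 12 instead of 1, 3, 6, which is exactly the kind of cross-subscriber side effect the deferred version avoids.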

Related

Ensure observable execution even without subscribers

I have a cache of observables and reuse them. They normally all use some sort of caching (mostly replay(1).refCount()), and with this I make sure that the underlying calculation is done only once.
I now have cases where the underlying stream emits items and no one is subscribed to my cached observable. I still want it to process this event. How can I do this?
Currently I only can do this like following:
val o = observable.replay(1)
o.connect() // make sure this hot observable is always connected and processes its input
return o // this one is cached
Is there some better way? I want that the hot observable always acts as if someone is subscribed and never unsubscribes from the upstream...
Background
I have redux-store-like observables and those need to process EVERY input, no matter if someone is subscribed or not, so that the cached values that are replayed are always the newest ones...
IMO the correct answer is by #prom85 in the question comment section.
From the Learning RxJava Book by Thomas Nield
If you pass 0 to autoConnect() for the numberOfSubscribers argument,
it will start firing immediately and not wait for any Observers. This
can be handy to start firing emissions immediately without waiting for
any Observers.
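
To show the behavior being asked for (process every input, replay only the newest value), here is a small library-free Python sketch. It is not RxJava; `Upstream` and `ReplayLatest` are hypothetical names, and `ReplayLatest` only imitates what a replay(1) that connects immediately would do.

```python
from typing import Callable, List

Listener = Callable[[int], None]


class Upstream:
    """Hypothetical hot source that the demo drives by hand."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def emit(self, value: int) -> None:
        for listener in list(self._listeners):
            listener(value)


class ReplayLatest:
    """Connects to the upstream at construction and caches the newest result."""

    def __init__(self, upstream: Upstream, process: Callable[[int], int]) -> None:
        self._process = process
        self._has_value = False
        self._latest = 0
        self._observers: List[Listener] = []
        upstream.subscribe(self._on_input)  # connect now, with zero subscribers

    def _on_input(self, value: int) -> None:
        self._latest = self._process(value)  # runs even if nobody is listening
        self._has_value = True
        for observer in list(self._observers):
            observer(self._latest)

    def subscribe(self, observer: Listener) -> None:
        if self._has_value:
            observer(self._latest)  # replay the newest cached value first
        self._observers.append(observer)


processed: List[int] = []


def process(value: int) -> int:
    processed.append(value)  # records that the input was handled
    return value * 10


upstream = Upstream()
cached = ReplayLatest(upstream, process)
upstream.emit(1)  # no subscribers yet, still processed
upstream.emit(2)

seen: List[int] = []
cached.subscribe(seen.append)  # immediately receives the newest value
upstream.emit(3)
```

Every input is processed, but a subscriber that arrives late only receives the newest cached result followed by live updates.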

Usage scenarios for cold observables in RxJS

Observables in RxJS are cold by default, and they can be converted to hot observables if required. I am thinking about the scenarios for using cold observables. Hot observables seem perfect for handling DOM events or system events. What about cold observables? Some people mention that database queries or HTTP requests should use cold observables, but to me it sounds better to use hot observables again to share the result.
Could any expert shed some light on usage scenarios for cold observables?
Cold observables are great for database queries etc. because the query is only executed when you subscribe to the stream. If you were to create a hot observable for a database query, it would be executed straight away, possibly before any subscribers are listening to it, so the result could be missed.
There are options where you could reemit the last event whenever a new subscriber attaches to it, but I wouldn't suggest doing that.

What is an "observable" in reactive programming?

Microsoft says, "developers represent asynchronous data streams with Observables." I'm trying to reason through the idea. If I were to tackle the concept implicitly, I would imagine that it's just, anything that could be observed in the data stream. Code should be more precise.
How would I know an "observable" if I saw it? Could you give me a better explanation of what an "observable" is?
Microsoft says, "developers represent asynchronous data streams with
Observables." I'm trying to reason through the idea. If I were to
tackle the concept implicitly, I would imagine that it's just,
anything that could be observed in the data stream. Code should be
more precise.
The code actually is more precise. An Observable is represented by the IObservable<T> interface. The main job of IObservable<T> is to handle IObserver<T>s. These two work in tandem: An IObservable<T> represents a stream of type T that can be subscribed to. An IObserver<T> represents a handler that subscribes to the observable to handle those events.
There are three types of events that an observable can implicitly emit:
OnNext: The next instance of T
OnCompleted: A non-error (empty-message) terminator.
OnError: An error terminator.
However, observables don't emit these messages directly, rather they emit them only onto subscribed observers.
How would I know an "observable" if I saw it? Could you give me a
better explanation of what an "observable" is?
Imagine a service that reports the latest Apple stock price. You can think of the service as an observable. To get this information, you would have to subscribe to the service. Once subscribed, the service could emit one of three messages:
The latest stock price
Market closed
Some sort of failure (connection failure would be most typical)
You would in turn write a handler to handle these three types of messages. That handler would be an observer to the observable stream of prices.
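
A minimal sketch of that handler, in plain Python rather than .NET: `PriceObserver` mirrors the three IObserver<T> callbacks, `stock_prices` is a hypothetical stand-in for the price service, and the prices are made-up numbers.

```python
from typing import Callable, List, Optional, Sequence


class PriceObserver:
    """Handles the three kinds of messages an observable can send."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def on_next(self, price: float) -> None:
        self.log.append(f"price {price}")

    def on_completed(self) -> None:
        self.log.append("market closed")

    def on_error(self, error: Exception) -> None:
        self.log.append(f"failed: {error}")


def stock_prices(
    prices: Sequence[float], fail_with: Optional[Exception] = None
) -> Callable[[PriceObserver], None]:
    """Hypothetical service: returns a subscribe function for a price stream."""

    def subscribe(observer: PriceObserver) -> None:
        for price in prices:
            observer.on_next(price)
        # A stream ends with exactly one terminator: an error or a completion.
        if fail_with is not None:
            observer.on_error(fail_with)
        else:
            observer.on_completed()

    return subscribe


good_day = PriceObserver()
stock_prices([189.5, 190.25])(good_day)

bad_day = PriceObserver()
stock_prices([189.5], fail_with=ConnectionError("connection lost"))(bad_day)
```

Note that the observable never emits into the void: messages only go to observers that have subscribed.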
From Wikipedia:
The observer pattern is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods.
This definition is clear when applied to the events used in user interfaces: you observe button clicks by providing a event handler which the button calls when it is clicked. In this case, the button is an observable, which notifies a number of observers in the form of event handlers.
Applied to reactive programming, an observable is just a stream of events that you can subscribe to, i.e. observe. Think of it as a pipe through which events traverse and that you can peek into. You do so by observing the stream and handling those events you are interested in. Furthermore, operations can be performed over streams, for instance merging a couple of streams into a new one.
Both the publishing of events to the stream and the handling of those events - your observer which processes them - can be done asynchronously which promotes scalability.
Similar concepts are those of messages, topics, and subscribers: some stakeholder can publish messages to a topic, to which many different stakeholders can subscribe. Respectively, these would correspond to the events, the observable stream, and the observers.
Microsoft uses the terms Observer and Observable while in some other reactive frameworks they may use other terms. The Getting started of Introduction to Rx can help you further clarify these concepts and the whole book is a free gem. Note that this book prefers to use the term sequence to refer to a stream of events.
I would imagine that it's just, anything that could be observed in the data stream.
That's right. Actually, in Microsoft's Rx, the core is just the two interfaces defining the contract between observers and observables; the rest is pretty much abstracted away.
I think the terminology varies, but if you search for functional reactive programming papers e.g. on Google Scholar, you will find definitions of the basic concepts behavior and event. I think the following two definitions from a paper from Functional Reactive Programming from First Principles are representative:
Behavior is a value of type a that changes over time
Event is a time-ordered sequence of event occurrences
Intuitively, a behavior is a stream transformer: a function that takes
an infinite stream of sample times, and yields an infinite stream of
values. Similarly, an event is a stream transformer, and can be
thought of as a behavior where, at each time t, the event either
occurs or does not occur.
It seems MS fuses both into the concept of an Observable.
I think it is good to read some background papers to get the terminology. The papers from Conal Elliott are a good start. Or you could enlist in the Principles of Reactive Programming at coursera if you want a more interactive introduction.

Concat operator semantics, but with immediate subscriptions to all underlying observables

I want to concatenate a cold and a hot observable. That is, the resulting observable should emit the result of the cold observable first, then the stuff from the hot one. At the same time, I want the subscription to the second observable, the hot one, to happen at the same time as the subscription to the first one; otherwise I miss an important event from it.
That looks very similar to what merge would do. But I want to guarantee that the hot observable will not push anything before the cold one completes, which merge doesn't guarantee. What would be the right way around this?
Use the Replay or PublishLast operators, depending upon your needs. Each has an overload that accepts a selector function.
For example:
var coldThenHot = hot.PublishLast(cold.Concat);
Subscribing to coldThenHot causes PublishLast to invoke the selector first, creating the Concat query. Then it subscribes to it and your hot observable. The last value in the hot observable is buffered. When the cold observable completes, the sequence continues with the buffered value, or simply remains silent until the last value arrives.
However, I'm curious as to what exactly you meant by hot. If your hot observable doesn't generate a value until you subscribe, then technically it's cold. If your observable is truly hot, then you may have already missed the value by the time this query is created. Although, it's possible that it's implicitly buffered already (e.g., if it was created by Observable.FromAsyncPattern), in which case simply concatenate the sequences like normal.
var coldThenHot = cold.Concat(hot);
If you don't want to miss previous data from the hot observable, there is the ReplaySubject that does exactly this : as soon as you subscribe to it, it will push to the subscriber previous elements, which really looks like what you need here.
So what you have to do is subscribe to the cold observable, and when it completes (onCompleted) just subscribe to your ReplaySubject (your hot observable). You have no choice but to use some buffering if you need to delay the important data of your hot observable.
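
The buffering idea can be sketched without Rx as follows. This is plain Python, not the Replay or ReplaySubject API; `Subject` and `concat_eager` are hypothetical names, and a hand-driven `Subject` stands in for both sequences so the demo can interleave them.

```python
from typing import Callable, List, Optional

OnNext = Callable[[str], None]
OnCompleted = Callable[[], None]


class Subject:
    """Hypothetical hand-driven source used for both sequences in this demo."""

    def __init__(self) -> None:
        self._on_next: List[OnNext] = []
        self._on_completed: List[OnCompleted] = []

    def subscribe(self, on_next: OnNext, on_completed: Optional[OnCompleted] = None) -> None:
        self._on_next.append(on_next)
        if on_completed is not None:
            self._on_completed.append(on_completed)

    def next(self, value: str) -> None:
        for callback in list(self._on_next):
            callback(value)

    def complete(self) -> None:
        for callback in list(self._on_completed):
            callback()


def concat_eager(first: Subject, second: Subject, observer: OnNext) -> None:
    """Emit all of `first`, then `second`, while subscribing to both up front."""
    buffer: List[str] = []
    first_done = False

    def on_second(value: str) -> None:
        if first_done:
            observer(value)       # first is finished: pass values straight through
        else:
            buffer.append(value)  # hold back until first completes

    def on_first_completed() -> None:
        nonlocal first_done
        first_done = True
        for value in buffer:      # flush what arrived in the meantime
            observer(value)
        buffer.clear()

    second.subscribe(on_second)   # subscribe immediately so nothing is missed
    first.subscribe(observer, on_first_completed)


first = Subject()
hot = Subject()
out: List[str] = []
concat_eager(first, hot, out.append)

first.next("c1")
hot.next("h1")    # arrives early, gets buffered
first.next("c2")
first.complete()  # buffered "h1" is released here
hot.next("h2")
```

The early value from the hot sequence is not lost, yet it is only delivered after the first sequence has completed.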

NoSQL as storage for publish-subscribe/multi-reader queue?

Looking for a storage solution for the following problem, preferably with some NoSQL-like speed and scalability:
Events. Lots of them, little data per event. This is what we need to store.
Not necessary to exactly keep the order in which the events arrive.
It would be nice not to store multiple copies of each event (as in separate storage for each observer).
Observers. A few of them (< 50) They need to read the events
At their own pace (pull model)
Preferably with a "get me the next chunk of unread events" API
Each observer needs to read every event (eventually)
No guarantees on how often they will pull the changes. It might be necessary to store lots of events before they are read.
In an RDBMS you'd probably just number the events sequentially and remember the "last read no" for every observer. Is it possible to implement something similar while trading some of the ACID for speed & scalability?
So far Redis with its lists looks good - anything better I should look at?
I think Redis lists are a good choice. I'd go with a list for each observer though - that way you have O(1) read and write with RPUSH/LPOP, and events automatically disappear from the system when all observers have received them.
You can reduce the storage required for each observer by just storing an event id in each list, though then you will need to keep a counter for each event to determine when it can be removed from the system.
To implement with a single list, set up a counter that is incremented every time an event is added to the head of list. Also set up a counter for each client indicating how many events they have received. The difference between those is the number of items you need to get from the list.
The disadvantage of this approach is that new items can be added to the list after you check the counters. You can get around this by counting from the tail of the list, but that is O(N) rather than O(1). You can reduce N by trimming received events from the list and maintaining a counter for tail position also - how well that works will depend on how many events can accumulate when an observer is offline.
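
The single-list approach with counters and trimming can be sketched in memory like this. It is only a model of the bookkeeping, not Redis code: `EventLog` is a hypothetical class, events are appended at the tail of a Python list, and nothing here addresses the concurrency issue mentioned above.

```python
from typing import Dict, List


class EventLog:
    """One shared list of events plus a read counter per observer."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._trimmed = 0              # how many events were dropped from the front
        self._read: Dict[str, int] = {}  # observer id -> number of events received

    def register(self, observer_id: str) -> None:
        # A new observer starts at the oldest event still stored.
        self._read[observer_id] = self._trimmed

    def publish(self, event: str) -> None:
        self._events.append(event)

    def next_chunk(self, observer_id: str, limit: int = 100) -> List[str]:
        """Return the next unread events for this observer, at most `limit`."""
        start = self._read[observer_id] - self._trimmed
        chunk = self._events[start:start + limit]
        self._read[observer_id] += len(chunk)
        self._trim()
        return chunk

    def _trim(self) -> None:
        # Events that every observer has received can be removed.
        slowest = min(self._read.values())
        drop = slowest - self._trimmed
        if drop > 0:
            del self._events[:drop]
            self._trimmed = slowest

    def stored(self) -> int:
        return len(self._events)


log = EventLog()
log.register("a")
log.register("b")
for name in ("e1", "e2", "e3"):
    log.publish(name)

a_chunk = log.next_chunk("a", limit=2)
stored_after_a = log.stored()  # nothing trimmed yet: "b" has read nothing
b_chunk = log.next_chunk("b")
a_rest = log.next_chunk("a")
stored_at_end = log.stored()
```

Each event is stored once, every observer reads at its own pace, and an event disappears only after the slowest observer has received it.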
You could take a look at how it's done in Tarantool, with a Lua procedure to keep a ring buffer for events:
https://github.com/mailru/tntlua/blob/master/notifications.lua