What is an "observable" in reactive programming? - system.reactive

Microsoft says, "developers represent asynchronous data streams with Observables." I'm trying to reason through the idea. If I were to tackle the concept implicitly, I would imagine that it's just, anything that could be observed in the data stream. Code should be more precise.
How would I know an "observable" if I saw it? Could you give me a better explanation of what an "observable" is?

Microsoft says, "developers represent asynchronous data streams with
Observables." I'm trying to reason through the idea. If I were to
tackle the concept implicitly, I would imagine that it's just,
anything that could be observed in the data stream. Code should be
more precise.
The code actually is more precise. An Observable is represented by the IObservable<T> interface. The main job of IObservable<T> is to handle IObserver<T>s. These two work in tandem: An IObservable<T> represents a stream of type T that can be be subscribed to. An IObserver<T> represents a handler that subscribes to the observable to handle those events.
There are three types of events that an observable can implicitly emit:
OnNext: The next instance of T
OnCompleted: A non-error (empty-message) terminator.
OnError: An error terminator.
However, observables don't emit these messages directly, rather they emit them only onto subscribed observers.
How would I know an "observable" if I saw it? Could you give me a
better explanation of what an "observable" is?
Imagine a service that reports the latest Apple stock price. You can think of the service as an observable. To get this information, you would have to subscribe to the service. Once subscribed, the service could emit one of three messages:
Next most-latest stock price
Market closed
Some sort of failure (connection failure would be most typical)
You would in turn write a handler to handle these three types of messages. That handler would be an observer to the observable stream of prices.

From Wikipedia:
The observer pattern is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods.
This definition is clear when applied to the events used in user interfaces: you observe button clicks by providing a event handler which the button calls when it is clicked. In this case, the button is an observable, which notifies a number of observers in the form of event handlers.
Applied to reactive programming, an observable is just a stream of events that you can subscribe - i.e. observe. Think of it as a pipe through which events traverse and that you can peek into. You do so by observing the stream and handling those events you are interested to. Furthermore, operations can be performed over streams - for instance merging a couple streams into a new one.
Both the publishing of events to the stream and the handling of those events - your observer which processes them - can be done asynchronously which promotes scalability.
Similar concepts are those of messages, topics, and subscribers: some stakeholder can publish messages to a topic, to which many different stakeholders can subscribe. Respectively, these would correspond to the events, the observable stream event, and the observers.
Microsoft uses the terms Observer and Observable while in some other reactive frameworks they may use other terms. The Getting started of Introduction to Rx can help you further clarify these concepts and the whole book is a free gem. Note that this book prefers to use the term sequence to refer to a stream of events.
I would imagine that it's just, anything that could be observed in the data stream.
That's right. Actually, in Microsoft's Rx, the main core are just the two interfaces interfaces defining the contract between observers and observables, the rest is pretty much abstracted away.

I think the terminology varies, but if you search for functional reactive programming papers e.g. on Google Scholar, you will find definitions of the basic concepts behavior and event. I think the following two definitions from a paper from Functional Reactive Programming from First Principles are representative:
Behavior is a value of type a that changes over time
Event is a
time-ordered sequence of event occurences
Intuitively, a behavior is a stream transformer: a function that takes
an infinite stream of sample times, and yields an infinite stream of
values. Similarly, an event is a stream transformer, and can be
thought of as a behavior where, at each time t, the event either
occurs or does not occur.
It seems MS fuses both into the concept of an Observable.
I think it is good to read some background papers to get the terminology. The papers from Conal Elliott are a good start. Or you could enlist in the Principles of Reactive Programming at coursera if you want a more interactive introduction.

Related

How to replay Event Sourcing events reliably?

One of great promises of Event Sourcing is the ability to replay events. When there's no relationship between entities (e.g. blob storage, user profiles) it works great, but how to do replay quckly when there are important relationships to check?
For example: Product(id, name, quantity) and Order(id, list of productIds). If we have CreateProduct and then CreateOrder events, then it will succeed (product is available in warehouse), it's easy to implement e.g. with Kafka (one topic with n1 partitions for products, another with n2 partitions for orders).
During replay everything happens more quickly, and Kafka may reorder the events (e.g. CreateOrder and then CreateProduct), which will give us different behavior than originally (CreateOrder will now fail because product doesn't exist yet). It's because Kafka guarantees ordering only within one topic within one partition. The easy solution would be putting everything into one huge topic with one partition, but this would be completely unscalable, as single-threaded replay of bigger databases could take days at least.
Is there any existing, better solution for quick replaying of related entities? Or should we forget about event sourcing and replaying of events when we need to check relationships in our databases, and replaying is good only for unrelated data?
As a practical necessity when event sourcing, you need the ability to conjure up a stream of events for a particular entity so that you can apply your event handler to build up the state. For Kafka, outside of the case where you have so few entities that you can assign an entire topic partition to just the events for a single entity, this entails a linear scan and filter through a partition. So for this reason, while Kafka is very likely to be a critical part of any event-driven/event-based system in relaying events published by a service for consumption by other services (at which point, if we consider the event vs. command dichotomy, we're talking about commands from the perspective of the consuming service), it's not well suited to the role of an event store, which are defined by their ability to quickly give you an ordered stream of the events for a particular entity.
The most popular purpose-built event store is, probably, the imaginatively named Event Store (at least partly due to the involvement of a few prominent advocates of event sourcing in its design and implementation). Alternatively, there are libraries/frameworks like Akka Persistence (JVM with a .Net port) which use existing DBs (e.g. relational SQL DBs, Cassandra, Mongo, Azure Cosmos, etc.) in a way which facilitates their use as an event store.
Event sourcing also as a practical necessity tends to lead to CQRS (they go together very well: event sourcing is arguably the simplest possible persistence model capable of being a write model, while its nearly useless as a read model). The typical pattern seen is that the command processing component of the system enforces constraints like "product exists before being added to the cart" (how those constraints are enforced is generally a question of whatever concurrency model is in use: the actor model has a high level of mechanical sympathy with this approach, but other models are possible) before writing events to the event store and then the events read back from the event store can be assumed to have been valid as of the time they were written (it's possible to later decide a compensating event needs to be recorded). The events from within the event store can be projected to a Kafka topic for communication to another service (the command processing component is the single source of truth for events).
From the perspective of that other service, as noted, the projected events in the topic are commands (the implicit command for an event is "update your model to account for this event"). Semantically, their provenance as events means that they've been validated and are undeniable (they can be ignored, however). If there's some model validation that needs to occur, that generally entails either a conscious decision to ignore that command or to wait until another command is received which allows that command to be accepted.
Ok, you are still thinking how did we developed applications in last 20 years instead of how we should develop applications in the future. There are frameworks that actually fits the paradigms of future perfectly, one of those, which mentioned above, is Akka but more importantly a sub component of it Akka FSM Finite State Machine, which is some concept we ignored in software development for years, but future seems to be more and more event based and we can't ignore anymore.
So how these will help you, Akka is a framework based on Actor concept, every Actor is an unique entity with a message box, so lets say you have Order Actor with id: 123456789, every Event for Order Id: 123456789 will be processed with this Actor and its messages will be ordered in its message box with first in first out principle, so you don't need a synchronisation logic anymore. But you could have millions of Order Actors in your system, so they can work in parallel, when Order Actor: 123456789 processing its events, an Order Actor: 987654321 can process its own, so there is the parallelism and scalability. While your Kafka guaranteeing the order of every message for Key 123456789 and 987654321, everything is green.
Now you can ask, where Finite State Machine comes into play, as you mentioned the problem arise, when addProduct Event arrives before createOrder Event arrives (while being on different Kafka Topics), at that point, State Machine will behave differently when Order Actor is in CREATED state or INITIALISING state, in CREATED state, it will just add the Product, in INITIALISING state probably it will just stash it, until createOrder Event arrives.
These concepts are explained really good in this video and if you want to see a practical example I have a blog for it and this one for a more direct dive.
I think I found the solution for scalable (multi-partition) event sourcing:
create in Kafka (or in a similar system) topic named messages
assign users to partitions (e.g by murmurHash(login) % partitionCount)
if a piece of data is mutable (e.g. Product, Order), every partition should contain own copy of the data
if we have e.g. 256 pieces of a product in our warehouse and 64 partitions, we can initially 'give' every partition 8 pieces, so most CreateOrder events will be processed quickly without leaving user's partition
if a user (a partition) sometimes needs to mutate data in other partition, it should send a message there:
for example for Product / Order domain, partitions could work similarly to Walmart/Tesco stores around a country, and the messages sent between partitions ('stores') could be like CreateProduct, UpdateProduct, CreateOrder, SendProductToMyPartition, ProductSentToYourPartition
the message will become an 'event' as if it was generated by an user
the message shouldn't be sent during replay (already sent, no need to do it twice)
This way even when Kafka (or any other event sourcing system) chooses to reorder messages between partitions, we'll still be ok, because we don't ever read any data outside our single-threaded 'island'.
EDIT: As #LeviRamsey noted, this 'single-threaded island' is basically actor model, and frameworks like Akka can make it a bit easier.

How do you ensure that events are applied in order to read model?

This is easy for projections that subscribe to all events from the stream, you just keep version of the last event applied on your read model. But what do you do when projection is composite of multiple streams? Do you keep version of each stream that is partaking in the projection. But then what about the gaps, if you are not subscribing to all events? At most you can assert that version is greater than the last one. How do others deal with this? Do you respond to every event and bump up version(s)?
For the EventStore, I would suggest using the $all stream as the default stream for any read-model subscription.
I have used the category stream that essentially produces the snapshot of a given entity type but I stopped doing so since read-models serve a different purpose.
It might be not desirable to use the $all stream as it might also get events, which aren't domain events. Integration events could be an example. In this case, adding some attributes either to event contracts or to the metadata might help to create an internal (JS) projection that will create a special all stream for domain events, or any event category in that regard, where you can subscribe to. You can also use a negative condition, for example, filter out all system events and those that have the original stream name starting with Integration.
As well as processing messages in the correct order, you also have the problem of resuming a projection after it is restarted - how do you ensure you start from the right place when you restart?
The simplest option is to use an event store or message broker that both guarantees order and provides some kind of global stream position field (such as a global event number or an ordered timestamp with a disambiguating component such as MongoDB's Timestamp type). Event stores where you pull the events directly from the store (such as eventstore.org or homegrown ones built on a database) tend to guarantee this. Also, some message brokers like Apache Kafka guarantee ordering (again, this is pull-based). You want at-least-once ordered delivery, ideally.
This approach limits write scalability (reads scale fine, using read replicas) - you can shard your streams across multiple event store instances in various ways, then you have to track the position on a per-shard basis, which adds some complexity.
If you don't have these ordering, delivery and position guarantees, your life is much harder, and it may be hard to make the system completely reliable. You can:
Hold onto messages for a while after receiving them, before processing them, to allow other ones to arrive
Have code to detect missing or out-of-order messages. As you mention, this only works if you receive all events with a global sequence number or if you track all stream version numbers, and even then it isn't reliable in all cases.
For each individual stream, you keep things in order by fetching them from a data store that knows the correct order. A way of thinking of this is that your query the data store, and you get a Document Message back.
It may help to review Greg Young's Polyglot Data talk.
As for synchronization of events in multiple streams; a thing that you need to recognize is that events in different streams are inherently concurrent.
You can get some loose coordination between different streams if you have happens-before data encoded into your messages. "Event B happened in response to Event A, therefore A happened-before B". That gets you a partial ordering.
If you really do need a total ordering of everything everywhere, then you'll need to be looking into patterns like Lamport Clocks.

Is Scala's actors similar to Go's coroutines?

If I wanted to port a Go library that uses Goroutines, would Scala be a good choice because its inbox/akka framework is similar in nature to coroutines?
Nope, they're not. Goroutines are based on the theory of Communicating Sequential Processes, as specified by Tony Hoare in 1978. The idea is that there can be two processes or threads that act independently of one another but share a "channel," which one process/thread puts data into and the other process/thread consumes. The most prominent implementations you'll find are Go's channels and Clojure's core.async, but at this time they are limited to the current runtime and cannot be distributed, even between two runtimes on the same physical box.
CSP evolved to include a static, formal process algebra for proving the existence of deadlocks in code. This is a really nice feature, but neither Goroutines nor core.async currently support it. If and when they do, it will be extremely nice to know before running your code whether or not a deadlock is possible. However, CSP does not support fault tolerance in a meaningful way, so you as the developer have to figure out how to handle failure that can occur on both sides of channels, and such logic ends up getting strewn about all over the application.
Actors, as specified by Carl Hewitt in 1973, involve entities that have their own mailbox. They are asynchronous by nature, and have location transparency that spans runtimes and machines - if you have a reference (Akka) or PID (Erlang) of an actor, you can message it. This is also where some people find fault in Actor-based implementations, in that you have to have a reference to the other actor in order to send it a message, thus coupling the sender and receiver directly. In the CSP model, the channel is shared, and can be shared by multiple producers and consumers. In my experience, this has not been much of an issue. I like the idea of proxy references that mean my code is not littered with implementation details of how to send the message - I just send one, and wherever the actor is located, it receives it. If that node goes down and the actor is reincarnated elsewhere, it's theoretically transparent to me.
Actors have another very nice feature - fault tolerance. By organizing actors into a supervision hierarchy per the OTP specification devised in Erlang, you can build a domain of failure into your application. Just like value classes/DTOs/whatever you want to call them, you can model failure, how it should be handled and at what level of the hierarchy. This is very powerful, as you have very little failure handling capabilities inside of CSP.
Actors are also a concurrency paradigm, where the actor can have mutable state inside of it and a guarantee of no multithreaded access to the state, unless the developer building an actor-based system accidentally introduces it, for example by registering the Actor as a listener for a callback, or going asynchronous inside the actor via Futures.
Shameless plug - I'm writing a new book with the head of the Akka team, Roland Kuhn, called Reactive Design Patterns where we discuss all of this and more. Green threads, CSP, event loops, Iteratees, Reactive Extensions, Actors, Futures/Promises, etc. Expect to see a MEAP on Manning by early next month.
There are two questions here:
Is Scala a good choice to port goroutines?
This is an easy question, since Scala is a general purpose language, which is no worse or better than many others you can choose to "port goroutines".
There are of course many opinions on why Scala is better or worse as a language (e.g. here is mine), but these are just opinions, and don't let them stop you.
Since Scala is general purpose, it "pretty much" comes down to: everything you can do in language X, you can do in Scala. If it sounds too broad.. how about continuations in Java :)
Are Scala actors similar to goroutines?
The only similarity (aside the nitpicking) is they both have to do with concurrency and message passing. But that is where the similarity ends.
Since Jamie's answer gave a good overview of Scala actors, I'll focus more on Goroutines/core.async, but with some actor model intro.
Actors help things to be "worry free distributed"
Where a "worry free" piece is usually associated with terms such as: fault tolerance, resiliency, availability, etc..
Without going into grave details how actors work, in two simple terms actors have to do with:
Locality: each actor has an address/reference that other actors can use to send messages to
Behavior: a function that gets applied/called when the message arrives to an actor
Think "talking processes" where each process has a reference and a function that gets called when a message arrives.
There is much more to it of course (e.g. check out Erlang OTP, or akka docs), but the above two is a good start.
Where it gets interesting with actors is.. implementation. Two big ones, at the moment, are Erlang OTP and Scala AKKA. While they both aim to solve the same thing, there are some differences. Let's look at a couple:
I intentionally do not use lingo such as "referential transparency", "idempotence", etc.. they do no good besides causing confusion, so let's just talk about immutability [a can't change that concept]. Erlang as a language is opinionated, and it leans towards strong immutability, while in Scala it is too easy to make actors that change/mutate their state when a message is received. It is not recommended, but mutability in Scala is right there in front of you, and people do use it.
Another interesting point that Joe Armstrong talks about is the fact that Scala/AKKA is limited by the JVM which just wasn't really designed with "being distributed" in mind, while Erlang VM was. It has to do with many things such as: process isolation, per process vs. the whole VM garbage collection, class loading, process scheduling and others.
The point of the above is not to say that one is better than the other, but it's to show that purity of the actor model as a concept depends on its implementation.
Now to goroutines..
Goroutines help to reason about concurrency sequentially
As other answers already mentioned, goroutines take roots in Communicating Sequential Processes, which is a "formal language for describing patterns of interaction in concurrent systems", which by definition can mean pretty much anything :)
I am going to give examples based on core.async, since I know internals of it better than Goroutines. But core.async was built after the Goroutines/CSP model, so there should not be too many differences conceptually.
The main concurrency primitive in core.async/Goroutine is a channel. Think about a channel as a "queue on rocks". This channel is used to "pass" messages. Any process that would like to "participate in a game" creates or gets a reference to a channel and puts/takes (e.g. sends/receives) messages to/from it.
Free 24 hour Parking
Most of work that is done on channels usually happens inside a "Goroutine" or "go block", which "takes its body and examines it for any channel operations. It will turn the body into a state machine. Upon reaching any blocking operation, the state machine will be 'parked' and the actual thread of control will be released. This approach is similar to that used in C# async. When the blocking operation completes, the code will be resumed (on a thread-pool thread, or the sole thread in a JS VM)" (source).
It is a lot easier to convey with a visual. Here is what a blocking IO execution looks like:
You can see that threads mostly spend time waiting for work. Here is the same work but done via "Goroutine"/"go block" approach:
Here 2 threads did all the work, that 4 threads did in a blocking approach, while taking the same amount of time.
The kicker in above description is: "threads are parked" when they have no work, which means, their state gets "offloaded" to a state machine, and the actual live JVM thread is free to do other work (source for a great visual)
note: in core.async, channel can be used outside of "go block"s, which will be backed by a JVM thread without parking ability: e.g. if it blocks, it blocks the real thread.
Power of a Go Channel
Another huge thing in "Goroutines"/"go blocks" is operations that can be performed on a channel. For example, a timeout channel can be created, which will close in X milliseconds. Or select/alt! function that, when used in conjunction with many channels, works like a "are you ready" polling mechanism across different channels. Think about it as a socket selector in non blocking IO. Here is an example of using timeout channel and alt! together:
(defn race [q]
(searching [:.yahoo :.google :.bing])
(let [t (timeout timeout-ms)
start (now)]
(go
(alt!
(GET (str "/yahoo?q=" q)) ([v] (winner :.yahoo v (took start)))
(GET (str "/bing?q=" q)) ([v] (winner :.bing v (took start)))
(GET (str "/google?q=" q)) ([v] (winner :.google v (took start)))
t ([v] (show-timeout timeout-ms))))))
This code snippet is taken from wracer, where it sends the same request to all three: Yahoo, Bing and Google, and returns a result from the fastest one, or times out (returns a timeout message) if none returned within a given time. Clojure may not be your first language, but you can't disagree on how sequential this implementation of concurrency looks and feels.
You can also merge/fan-in/fan-out data from/to many channels, map/reduce/filter/... channels data and more. Channels are also first class citizens: you can pass a channel to a channel..
Go UI Go!
Since core.async "go blocks" has this ability to "park" execution state, and have a very sequential "look and feel" when dealing with concurrency, how about JavaScript? There is no concurrency in JavaScript, since there is only one thread, right? And the way concurrency is mimicked is via 1024 callbacks.
But it does not have to be this way. The above example from wracer is in fact written in ClojureScript that compiles down to JavaScript. Yes, it will work on the server with many threads and/or in a browser: the code can stay the same.
Goroutines vs. core.async
Again, a couple of implementation differences [there are more] to underline the fact that theoretical concept is not exactly one to one in practice:
In Go, a channel is typed, in core.async it is not: e.g. in core.async you can put messages of any type on the same channel.
In Go, you can put mutable things on a channel. It is not recommended, but you can. In core.async, by Clojure design, all data structures are immutable, hence data inside channels feels a lot safer for its wellbeing.
So what's the verdict?
I hope the above shed some light on differences between the actor model and CSP.
Not to cause a flame war, but to give you yet another perspective of let's say Rich Hickey:
"I remain unenthusiastic about actors. They still couple the producer with the consumer. Yes, one can emulate or implement certain kinds of queues with actors (and, notably, people often do), but since any actor mechanism already incorporates a queue, it seems evident that queues are more primitive. It should be noted that Clojure's mechanisms for concurrent use of state remain viable, and channels are oriented towards the flow aspects of a system."(source)
However, in practice, Whatsapp is based on Erlang OTP, and it seemed to sell pretty well.
Another interesting quote is from Rob Pike:
"Buffered sends are not confirmed to the sender and can take arbitrarily long. Buffered channels and goroutines are very close to the actor model.
The real difference between the actor model and Go is that channels are first-class citizens. Also important: they are indirect, like file descriptors rather than file names, permitting styles of concurrency that are not as easily expressed in the actor model. There are also cases in which the reverse is true; I am not making a value judgement. In theory the models are equivalent."(source)
Moving some of my comments to an answer. It was getting too long :D (Not to take away from jamie and tolitius's posts; they're both very useful answers.)
It isn't quite true that you could do the exact same things that you do with goroutines in Akka. Go channels are often used as synchronization points. You cannot reproduce that directly in Akka. In Akka, post-sync processing has to be moved into a separate handler ("strewn" in jamie's words :D). I'd say the design patterns are different. You can kick off a goroutine with a chan, do some stuff, and then <- to wait for it to finish before moving on. Akka has a less-powerful form of this with ask, but ask isn't really the Akka way IMO.
Chans are also typed, while mailboxes are not. That's a big deal IMO, and it's pretty shocking for a Scala-based system. I understand that become is hard to implement with typed messages, but maybe that indicates that become isn't very Scala-like. I could say that about Akka generally. It often feels like its own thing that happens to run on Scala. Goroutines are a key reason Go exists.
Don't get me wrong; I like the actor model a lot, and I generally like Akka and find it pleasant to work in. I also generally like Go (I find Scala beautiful, while I find Go merely useful; but it is quite useful).
But fault tolerance is really the point of Akka IMO. You happen to get concurrency with that. Concurrency is the heart of goroutines. Fault-tolerance is a separate thing in Go, delegated to defer and recover, which can be used to implement quite a bit of fault tolerance. Akka's fault tolerance is more formal and feature-rich, but it can also be a bit more complicated.
All said, despite having some passing similarities, Akka is not a superset of Go, and they have significant divergence in features. Akka and Go are quite different in how they encourage you to approach problems, and things that are easy in one, are awkward, impractical, or at least non-idiomatic in the other. And that's the key differentiators in any system.
So bringing it back to your actual question: I would strongly recommend rethinking the Go interface before bringing it to Scala or Akka (which are also quite different things IMO). Make sure you're doing it the way your target environment means to do things. A straight port of a complicated Go library is likely to not fit in well with either environment.
These are all great and thorough answers. But for a simple way to look at it, here is my view. Goroutines are a simple abstraction of Actors. Actors are just a more specific use-case of Goroutines.
You could implement Actors using Goroutines by creating the Goroutine aside a Channel. By deciding that the channel is 'owned' by that Goroutine you're saying that only that Goroutine will consume from it. Your Goroutine simply runs an inbox-message-matching loop on that Channel. You can then simply pass the Channel around as the 'address' of your "Actor" (Goroutine).
But as Goroutines are an abstraction, a more general design than actors, Goroutines can be used for far more tasks and designs than Actors.
A trade-off though, is that since Actors are a more specific case, implementations of actors like Erlang can optimize them better (rail recursion on the inbox loop) and can provide other built-in features more easily (multi process and machine actors).
can we say that in Actor Model, the addressable entity is the Actor, the recipient of message. whereas in Go channels, the addressable entity is the channel, the pipe in which message flows.
in Go channel, you send message to the channel, and any number of recipients can be listening, and one of them will receive the message.
in Actor only one actor to whose actor-ref you send the message, will receive the message.

What is Event Driven Concurrency?

I am starting to learn Scala and functional programming. I was reading the book !Programming scala: Tackle Multi-Core Complexity on the Java Virtual Machine". Upon the first chapter I've seen the word Event-Driven concurrency and Actor model. Before I continue reading this book I want to have an idea about Event-Driven concurrency or Actor Model.
What is Event-Driven concurrency, and how is it related to Actor Model?
An Event Driven programming model involves registering code to be run when a given event fires. An example is, instead of calling a method that returns some data from a database:
val user = db.getUser(1)
println(user.name)
You could instead register a callback to be run when the data is ready:
db.getUser(1, u => println(u.name))
In the first example, no concurrency was happening; The current thread would block until db.getUser(1) returned data from the database. In the second example db.getUser would return immediately and carry on executing the next code in the program. In parallel to this, the callback u => println(u.name) will be executed at some point in the future.
Some people prefer the second approach as it doesn't mean memory hungry Threads are needlessly sat around waiting for slow I/O to return.
The Actor Model is an example of how Event-Driven concepts can be used to help the programmer easily write concurrent programs.
From a super high level, Actors are objects that define a series of Event Driven message handlers that get fired when the Actor receives messages. In Akka, each instance of an Actor is single Threaded, however when many of these Actors are put together they create a system with concurrency.
For example, Actor A could send messages to Actor B and C in parallel. Actor B and C could fire messages back to Actor A. Actor A would have message handlers to receive these messages and behave as desired.
To learn more about the Actor model I would recommend reading the Akka documentation. It is really well written: http://doc.akka.io/docs/akka/2.1.4/
There is also lot's of good documentation around the web about Event Driven Concurrency that us much more detailed than what I've written here. http://berb.github.io/diploma-thesis/original/055_events.html
Theon's answer provides a good modern overview. I'd like to add some historical perspective.
Tony Hoare and Robert Milner both developed mathematical algebra for analysing concurrent systems (Communicating Sequential Processes, CSP, and Communicating Concurrent Systems, CCS). Both of these look like heavy mathematics to most of us but the practical application is relatively straightforward. CSP led directly to the Occam programming language amongst others, with Go being the newest example. CCS led to Pi calculus and the mobility of communicating channel ends, a feature that is part of Go and was added to Occam in the last decade or so.
CSP models concurrency purely by considering automomous entities ('processes', v.lightweight things like green threads) interacting simply by event exchange. The medium for passing events is along channels. Processes may have to deal with several inputs or outputs and they do this by selecting the event that is ready first. The events usually carry data from the sender to the receiver.
A principle feature of the CSP model is that a pair of processes engage in communication only when both are ready - in practical terms this leads to what is usually called 'synchronous' communication. However, the actual implementations (Go, Occam, Akka) allow channels to be buffered (the normal state in Akka) so that the lock-step exchange of events is often actually decoupled instead.
So in summary, an event-driven CSP-based system is really a data-flow network of processes connected by channels.
Besides the CSP interpretation of event-driven, there have been others. An important example is the 'event-wheel' approach, once popular for modelling concurrent systems whilst actually having a single processing thread. Such systems handle events by putting them into a processing queue and dealing with them due course, usually via a callback. Java Swing's event processing engine is a good example. There were others, e.g. for time-based simulation engines. One might think of the Javascript / NodeJS model as fitting into this category as well.
So in summary, an event-wheel was a way to express concurrency but without parallelism.
The irony of this is that the two approaches I've described above are both described as event driven but what they mean by event driven is different in each case. In one case, hardware-like entities are wired together; in the other, almost all actions are executed by callbacks. The CSP approach claims to be scalable because it's fully composable; it's naturally adept at parallel execution also. If there are any reasons to favour one over the other, these are probably it.
To understand the answer to this you have to look at event concurrency from the OS layer up. First you start with threads which are the smallest section of code that can be run by the OS and eventually deal with I/O, timing and other kinds of events.
The OS groups threads into a process in which they share the same memory, protection and security permissions. Above that layer you have user programs which typically make I/O requests that are handled by user libraries.
The I/O libraries handle these requests in one of two ways. Unix-like systems use a "reactor" model in which the library registers I/O handlers for all the different types of I/O and events in the system. These handlers are activated when I/O is ready on a specific device. Windows-like systems use an I/O completion model in which I/O requests are made and a callback is triggered when the request is complete.
Both of these models require a significant amount of overhead to manage overall program state if you were to use them directly. However some programming tasks (web apps / services) lend themselves to a seemingly more direct implementation if you use an event model directly, but you still need to manage all of that program state. In order to track program logic across dispatches of several related events you have to manually track state and pass it around to the callbacks. This tracking structure is usually called a state context or baton. As you might imagine passing batons around all over the place to numerous seemingly unrelated handlers makes for some extremely hard to read and spaghetti-like code. It's also a pain to write and debug -- especially when you're trying to handle the synchronization of various concurrent paths of execution. You start getting into Futures and then the code becomes really difficult to read.
One well-known event processing library is call libuv. It's a portable event loop that integrates Unix's reactor model with Windows' completion model into a single model usually called a "proactor". Its the event handler that drives NodeJS.
Which brings us to communicating sequential processes.
https://en.wikipedia.org/wiki/Communicating_sequential_processes
Rather than writing asynchronous I/O dispatch and synchronization code using one or more concurrency models (and their often competing conventions), we flip the problem on its head. We use a "coroutine" which looks like normal sequential code.
A simple example is a coroutine that receives a single byte over an event channel from another coroutine that sends a single byte. This effectively synchronizes I/O producer and consumer because the writer/sender has to wait for a reader/receiver and vice-versa. While either process is waiting they explicitly yield execution to other processes. When a coroutine yields, its scoped program state is saved on a stack frame thus saving you from the confusion of managing multi-layered baton state in an event loop.
Using applications built on these event channels we can construct arbitrary, reusable, concurrent logic and the algorithms no longer look like spaghetti code. In pure CSP systems if you write to a channel and there is no reader, you will be blocked. The channel endpoints are known via handles internally to the program.
Actor systems are different in a couple of ways. First, the endpoints are the actor threads and they are named and known external to the mainline program. The second difference is that sends and receives on these channels are buffered. In other words if you send a message to an actor and there isn't one listening or its busy you aren't blocked until one reads from their input channel. Other differences exist like one actor can publish to two different actors concurrently.
As you might guess Actor systems can easily be built from CSP systems. There are other details like waiting for specific event patterns and selecting from them, but that's the basics.
I hope that clarifies things a bit.
Other constructs can be built from these ideas. Various programming systems (Go, Erlang, etc) include CSP implementations within them. Operating systems like Inferno and Node9 use CSPs and Channels as the basis of their distributed computing model.
Go: https://en.wikipedia.org/wiki/Go_(programming_language)
Erlang: https://en.wikipedia.org/wiki/Erlang_(programming_language)
Inferno: https://en.wikipedia.org/wiki/Inferno_(operating_system)
Node9: https://github.com/jvburnes/node9

What are the Hot and Cold observables?

I watched the video and I know the general principles - hot happens even when nobody is subscribed, cold happens "on demand".
Also, Publish() converts cold to hot and Defer() converts hot to cold.
But still, I feel I am missing the details. Here are some questions I'd like to have answered:
Can you give a comprehensive definition for these terms?
Does it ever make sense to call Publish on a hot observable or Defer on a cold?
What are the aspects of Hot/Cold conversions - do you lose messages, for example?
Are there differences between hot and cold definitions for IObservable and IEnumerable?
What are the general principles you should take into account when programming for cold or hot?
Any other tips on hot/cold observables?
From: Anton Moiseev's Book “Angular Development with Typescript, Second Edition.” :
Hot and cold observables
There are two types of observables: hot and cold. The main
difference is that a cold observable creates a data
producer for each subscriber, whereas a hot observable
creates a data producer first, and each subscriber gets the
data from one producer, starting from the moment of subscription.
Let’s compare watching a movie on Netflix to going into a
movie theater. Think of yourself as an observer. Anyone who decides to watch Mission: Impossible on Netflix will get the entire
movie, regardless of when they hit the play button. Netflix creates a
new producer to stream a movie just for you. This is a cold
observable.
If you go to a movie theater and the showtime is 4 p.m., the producer
is created at 4 p.m., and the streaming begins. If some people
(subscribers) are late to the show, they miss the beginning of the
movie and can only watch it starting from the moment of arrival. This
is a hot observable.
A cold observable starts producing data when some code invokes a
subscribe() function on it. For example, your app may declare an observable providing a URL on the server to get certain products. The
request will be made only when you subscribe to it. If another script
makes the same request to the server, it’ll get the same set of data.
A hot observable produces data even if no subscribers are
interested in the data. For example, an accelerometer in your
smartphone produces data about the position of your device, even if no
app subscribes to this data. A server can produce the latest stock
prices even if no user is interested in this stock.
Hot observables are ones that are pushing event when you are not subscribed to the observable. Like mouse moves, or Timer ticks or anything like that. Cold observables are ones that start pushing only when you subscribe, and they start over if you subscribe again.
I hope this helps.
Can you give a comprehensive
definition for these terms?
See my blog post at: https://leecampbell.com/2010/08/19/rx-part-7-hot-and-cold-observables
Does it ever make sense to call
Publish on a hot observable or Defer
on a cold?
No, not that I can think of.
What are the aspects of Hot/Cold
conversions - do you lose messages,
for example?
It is possible to "lose" messages when the Observable is Hot, as "events" happen regardless of subscribers.
Are there differences between hot and
cold definitions for IObservable and
IEnumerable?
I dont really understand the question. I hope this analogy helps though. I would compare a Hot Observable to an Eagerly evaluated IEnumerable. ie a List or an Array are both Eagerly evaluated and have been populated even if no-one enuemerates over them. A yield statement that gets values from a file or a database could be lazily evaluated with the Yield keyword. While lazy can be good, it will by default, be reevaluated if a second enumerator runs over it. Comparing these to Observables, a Hot Observable might be an Event (Button click) or a feed of temperatures; these events will happen regardless of a subscription and would also be shared if multiple subscriptions were made to the same observale. Observable.Interval is a good example of a Cold observable. It will only start producing values when a subscription is made. If multiple subscriptions as made then the sequence will be re-evaluated and the "events" will occur at seperate times (depending on the time between subscriptions).
What are the general principles you should take into account when programming for cold or hot?
Refer to the link in point one. I would also recommend you look into Publsh being used in conjunction with RefCount. This allows you to have the ability to have Lazy evaluation semantics of Cold Observables but the sharing of events that Hot Observables get.
Any other tips on hot/cold
observables?
Get your hands dirty and have a play with them. Once you have read about them for more than 30minutes, then time spent coding with them is far more productive to you than reading any more :)
Not pretending to give a comprehensive answer, I'd like to summarize in a simplest form what I have learned since the time of this question.
Hot observable is an exact match for event. In events, values usually are fed into the handler even if no subscribers are listening. All subscribers are receiving the same set of values. Because of following the "event" pattern, hot observables are easier to understand than the cold ones.
Cold observable is also like an an event, but with a twist - Cold observable's event is not a property on a shared instance, it is a property on an object that is produced from a factory each time when somebody subscribes. In addition, subscription starts the production of the values. Because of the above, multiple subscribers are isolated and each receives its own set of values.
The most common mistake RX beginners make is creating a cold observable (well, thinking they are creating a cold observable) using some state variables within a function (f.e. accumulated total) and not wrapping it into a .Defer() statement. As a result, multiple subscribers share these variables and cause side effects between them.