Converting Rx-Observables to Twitter Futures in Scala

I want to implement the following functions in the most reactive way. I need them to implement the bijections for automatic conversion between these types.
def convertScalaRXObservableToTwitterFuture[A](a: Observable[A]): TwitterFuture[A] = ???
def convertScalaRXObservableToTwitterFutureList[A](a: Observable[A]): TwitterFuture[List[A]] = ???
I came across this article on a related subject but I can't get it working.

Unfortunately, the claim in that article is not correct, and there can't be a true bijection between Observable and anything like Future. The thing is that Observable is a more powerful abstraction that can represent things that can't be represented by a Future. For example, an Observable might actually represent an infinite sequence (see Observable.interval). Obviously there is no way to represent something like this with a Future. The documentation of the Observable.toList call used in that article explicitly mentions that it:
Returns a Single that emits a single item, a list composed of all the items emitted by the finite source ObservableSource.
and later it says:
Sources that are infinite and never complete will never emit anything through this operator and an infinite source may lead to a fatal OutOfMemoryError.
Even if you limit yourself to finite Observables, a Future still can't fully express the semantics of an Observable. Consider Observable.intervalRange, which generates a limited range one item at a time over some time period. With an Observable, the first event comes after initialDelay and then you get an event every period. With a Future you get only one event, and it can arrive only when the sequence is fully generated, that is, when the Observable has completed. This means that by transforming Observable[A] into Future[List[A]] you immediately break the main benefit of Observable, which is reactivity: you can't process events one by one; you have to process them all in a single batch.
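A minimal sketch of the Future[List[A]] direction, under stated assumptions: Source and ManualSource are hypothetical stand-ins for an Rx Observable (not the RxJava or RxScala API), and scala.concurrent.Future replaces the Twitter Future so the example needs no dependencies. It shows the loss described above: a per-item subscriber sees values as they arrive, while the Future stays pending until the source completes, and would stay pending forever for a source that never completes.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Future, Promise}

// Hypothetical push-based source standing in for an Rx Observable.
// Like the Rx contract, it assumes onNext calls are never concurrent.
trait Source[A] {
  def subscribe(onNext: A => Unit, onError: Throwable => Unit, onComplete: () => Unit): Unit
}

// Lossy, one-way conversion: buffer every item and complete the Future only
// when the source completes. A source that never completes leaves it pending.
def toFutureList[A](source: Source[A]): Future[List[A]] = {
  val promise = Promise[List[A]]()
  val buffer = ListBuffer.empty[A]
  source.subscribe(
    a => { buffer += a; () },
    e => { promise.tryFailure(e); () },
    () => { promise.trySuccess(buffer.toList); () }
  )
  promise.future
}

// Hypothetical manually driven source, so the example controls when items arrive.
final class ManualSource[A] extends Source[A] {
  private val subscribers = ListBuffer.empty[(A => Unit, Throwable => Unit, () => Unit)]
  def subscribe(onNext: A => Unit, onError: Throwable => Unit, onComplete: () => Unit): Unit =
    subscribers += ((onNext, onError, onComplete))
  def emit(a: A): Unit = subscribers.foreach(_._1(a))
  def complete(): Unit = subscribers.foreach(_._3())
}

val source = new ManualSource[Int]
val seenByCallback = ListBuffer.empty[Int]
source.subscribe(a => { seenByCallback += a; () }, _ => (), () => ())
val asFuture = toFutureList(source)

source.emit(1)
source.emit(2)
// The callback has already seen 1 and 2, but asFuture is still not completed.
```

Only after `source.complete()` does `asFuture` hold `List(1, 2)`; the timing of the individual events is gone at that point.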
To sum up, the claim in the first paragraph of the article:
convert between the two, without loosing asynchronous and event-driven nature of them.
is false, because the conversion Observable[A] -> Future[List[A]] loses exactly the "event-driven nature" of Observable, and there is no way to work around this.
P.S. Actually, the fact that Future is less powerful than Observable should not be a big surprise. If it were not, why would anybody have created Observable in the first place?

Related

Repeat last element in Flux if no elements are available upstream

I am searching for a way to repeat the last element when the subscriber of a Flux signals onNext but the publisher did not supply a new element.
Of course this approach would logically introduce eager streaming, but in my case that's exactly what I want, similarly to onBackpressureDrop and others, where an infinite demand is requested upstream.
I kind of need the exact opposite - with my subscriber being faster than the publisher.
I struggle to think of a case where it wouldn't be better for the subscriber to simply cache the last emitted value within itself and do what it needs to do there (whether that's looping, firing on a scheduled executor or something else entirely) rather than deliberately having an infinite demand on the last value emitted by the Flux.
Something akin to the following might work, but is incredibly hacky (that being said, I couldn't think of a better way):
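flux.subscribe(str -> {
    // Repeat the latest value until flux emits its next element.
    // Note: flux.next() subscribes to flux again, so this only behaves as
    // intended for a hot (shared) Flux; a cold Flux would restart from the start.
    Mono.just(str).repeat().takeUntilOther(flux.next())
        .subscribe(s -> {
            // Actual subscriber
        });
});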
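The alternative suggested above, caching the last value inside the subscriber, can be sketched as follows. LastValueCache is a hypothetical helper, not part of Reactor: the publisher's onNext callback stores the latest value, and the faster consumer reads it as often as it likes (from a loop, a scheduled executor or anything else) without requesting more elements upstream.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.collection.mutable.ListBuffer

// Hypothetical helper: holds the most recent value seen from the publisher.
final class LastValueCache[A] {
  private val latest = new AtomicReference[Option[A]](None)

  // Call this from the publisher's onNext callback.
  def onNext(value: A): Unit = latest.set(Some(value))

  // Called by the fast consumer; repeats the last value if nothing new arrived.
  def current: Option[A] = latest.get()
}

val cache = new LastValueCache[String]
val reads = ListBuffer.empty[Option[String]]
reads += cache.current // nothing emitted yet
cache.onNext("a")
reads += cache.current
reads += cache.current // no new element, so "a" is repeated
cache.onNext("b")
reads += cache.current
```

The AtomicReference lets the publisher thread and the consumer thread share the value safely without locking.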

Parallel design of program working with Flink and scala

This is the context:
There is an input event stream.
There are some methods to apply to the stream; each applies different logic to evaluate each event, saying whether it is a "good" or "bad" event.
An event is really "good" only if it passes all the methods; otherwise it is a "bad" event.
There is an output event stream that contains the result for each event along with its eventID.
To solve this problem, I have two ideas:
We can apply each method sequentially to each event. But this is a kind of batch processing and doesn't take advantage of stream processing. At the same time, it takes Time(M(ethod)1) + Time(M2) + Time(M3) + ..., which may not be suitable for real-time processing.
We can pass the input stream to each method and run the methods in parallel; each method saves the bad events into permanent storage, and the Main method then queries the permanent storage to get the result for each event. But this has some problems to solve:
how to execute methods in parallel in the programming language (e.g. Scala), and what about performance (network, CPUs, memory)?
how to solve the synchronization problem? Those methods need some time to compute and save the flag into permanent storage, but Main needs less time to query the flag, so a delay issue occurs.
etc.
This is not a kind of tech and design question; I would like to ask for your ideas. Do you have any new ideas, or ideas for solving the problem? Looking forward to your opinions.
Parallel streams, each doing the full set of evaluations sequentially, is the more straightforward solution. But if that introduces too much latency, then you can fan out the evaluations to be done in parallel, and then bring the results back together again to make a decision.
To do the fan-out, look at the split operation on DataStream, or use side outputs (split was deprecated and has been removed in newer Flink versions, where side outputs are the option). But before doing this n-way fan-out, make sure that each event has a unique ID. If necessary, add a field containing a random number to each event to use as the unique ID. Later we will use this unique ID as a key to gather all of the partial results for each event back together.
Once the event stream is split, each copy of the stream can use a MapFunction to compute one of the evaluation methods.
Gathering all of these separate evaluations of a given event back together is a bit more complex. One reasonable approach here is to union all of the result streams together, and then key the unioned stream by the unique ID described above. This will bring together all of the individual results for each event. Then you can use a RichFlatMapFunction (using Flink's keyed, managed state) to gather the results for the separate evaluations in one place. Once the full set of evaluations for a given event has arrived at this stateful flatmap operator, it can compute and emit the final result.
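To make the fan-out and gather steps concrete without a Flink cluster, here is a plain-Scala simulation under stated assumptions: it is not Flink code, and Event, PartialResult, the three checks and the gather loop are hypothetical. Each branch applies one check, the branches are concatenated (the union), and a per-key map plays the role of the keyed state in the RichFlatMapFunction, emitting a decision once all partial results for an event have arrived.

```scala
import java.util.UUID
import scala.collection.mutable

final case class Event(id: String, amount: Int)
final case class PartialResult(eventId: String, check: Int, passed: Boolean)

// Hypothetical evaluation methods; an event is "good" only if all of them pass.
val checks: Vector[Event => Boolean] = Vector(
  e => e.amount > 0,
  e => e.amount < 1000,
  e => e.amount % 2 == 0
)

// Sequential approach: every check runs on the event, one after another.
def sequential(e: Event): Boolean = checks.forall(check => check(e))

// Fan-out: each branch applies one check and tags the result with the event ID.
// Concatenating the branches stands in for the union of the result streams.
def fanOut(events: Seq[Event]): Seq[PartialResult] =
  checks.zipWithIndex.flatMap { case (check, i) =>
    events.map(e => PartialResult(e.id, i, check(e)))
  }

// Gather: keyed by event ID, like keyBy followed by a stateful flatMap.
def gather(partials: Seq[PartialResult]): Map[String, Boolean] = {
  val state = mutable.Map.empty[String, List[Boolean]] // per-key state
  val out = mutable.Map.empty[String, Boolean]
  for (p <- partials) {
    val seen = p.passed :: state.getOrElse(p.eventId, Nil)
    if (seen.size == checks.size) {
      out(p.eventId) = seen.forall(b => b) // all evaluations arrived: decide
      state.remove(p.eventId)              // clear state for this key
    } else state(p.eventId) = seen
  }
  out.toMap
}

// Random UUIDs serve as the unique IDs added before the fan-out.
val events = Seq(10, -4, 2000, 7).map(n => Event(UUID.randomUUID().toString, n))
val gathered = gather(fanOut(events))
```

The gathered decisions match the sequential ones; the difference in a real job is that the branches can run on different parallel operators.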

Mono vs Flux in Reactive Stream

As per the documentation:
Flux is a stream which can emit 0..N elements:
Flux<String> fl = Flux.just("a", "b", "c");
Mono is a stream of 0..1 elements:
Mono<String> mn = Mono.just("hello");
Both are implementations of the Publisher interface from Reactive Streams.
Can't we use only Flux in most cases, since it can also emit 0..1 elements, thus satisfying the conditions of a Mono?
Or are there some specific conditions where only Mono should be used and Flux cannot handle the operations?
Please suggest.
In many cases, you are doing some computation or calling a service and you expect exactly one result (or maybe zero or one result), and not a collection that contains possibly multiple results. In such cases, it's more convenient to have a Mono.
Compare it to "regular" Java: you would not use List as the return type of any method that can return zero or one result. You would use Optional instead, which makes it immediately clear that you do not expect more than one result.
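The same argument can be sketched in Scala, where Option plays the role of Optional (and of Mono's zero-or-one cardinality) and List the role of a zero-to-many result like Flux. The User type and both lookup functions are hypothetical.

```scala
final case class User(id: Int, name: String)

val users = List(User(1, "ann"), User(2, "bob"), User(3, "bea"))

// A lookup by unique ID yields at most one result, and the type says so.
def findById(id: Int): Option[User] = users.find(_.id == id)

// A search can yield any number of results.
def findByPrefix(prefix: String): List[User] = users.filter(_.name.startsWith(prefix))
```

A caller of findById never has to ask "what if there are several?", which is the same information a Mono return type carries.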
Flux is equivalent to RxJava's Observable and is capable of emitting
- zero or more items (streams of many elements)
- and then, OPTIONALLY, completing OR failing
Mono can emit at most one item (streams of one element)
Relations:
If you concatenate two Monos you will get a Flux
You can call single() on Flux to return a Mono
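A rough Scala analogue of these two relations, with Option standing in for Mono and List for Flux. Neither helper is part of Reactor; single only mirrors the behavior Reactor documents for Flux.single(): exactly one element succeeds, an empty source fails with NoSuchElementException, and a source with more than one element fails with IndexOutOfBoundsException.

```scala
// Two zero-or-one values concatenated give a zero-to-many sequence.
def concat[A](first: Option[A], second: Option[A]): List[A] =
  first.toList ++ second.toList

// Narrow a zero-to-many sequence back to exactly one value, or fail.
def single[A](xs: List[A]): A = xs match {
  case x :: Nil => x
  case Nil      => throw new NoSuchElementException("Source was empty")
  case _        => throw new IndexOutOfBoundsException("Source emitted more than one item")
}
```

The narrowing direction can fail, which is why single() is an explicit operator rather than an automatic conversion.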
From the docs here
This distinction carries a bit of semantic information into the type, indicating the rough cardinality of the asynchronous processing. For instance, an HTTP request produces only one response, so there is not much sense in doing a count operation. Expressing the result of such an HTTP call as a Mono thus makes more sense than expressing it as a Flux, as it offers only operators that are relevant to a context of zero items or one item.
Simply put, Mono is used for handling zero or one result, while Flux is used to handle zero to many results, possibly even infinite results.
Both behave in a purely asynchronous and fully non-blocking way.
I think it is good practice to use Mono in cases where we know we can only get one result. In this way, we make it known to other developers working on the same thing that the result can be 0 or 1.
We are following that approach on all our projects.
Here is one good tutorial on Reactive Streams and the uses of Mono and Flux -> Reactive programming in Java.

How to use Scala Futures the right way?

I'm wondering if Futures are better to be used in conjunction with Actors only, rather than in a program that does not use Actor. Said differently, is performing asynchronous computation with future something that should better be done within an Actors system?
Here is why I'm saying that:
1 -
You perform a computation whose result would trigger some action that you may do in another thread.
For instance, I have a long operation to determine the price of something. From my main thread, I decide to launch an asynchronous process for it. In the meantime I could be doing other things, and when the response is ready/available or communicated back to me, I continue down that path.
I can see that with actors this is handy, because you can pipe a result to an actor. But with a typical threading model, you can either block or ... ?
2 -
Another issue: let's say I need to update the age of a list of participants by getting some information online. Let's assume I just have one future for that task. Isn't closing over the participant list the wrong thing to do? Multiple threads may be accessing that participant list at the same time. So making the update within the future would simply be wrong, and in that case we would need Java concurrent collections, wouldn't we?
Maybe I see it the wrong way, and futures are not meant to do side effects at all.
But in that case, fair enough, no side effects, but we still have the problem of getting a value back to the calling thread, which can only be done by blocking. Let's imagine that the result would help the calling thread update some data structure. How can that update be done asynchronously without closing over that data structure somehow?
I believe callbacks such as onComplete can be used for side effects (am I right here?).
Still, the callback would have to close over the data structure anyway. Hence I don't see how to avoid using actors.
PS: I like actors; I'm just trying to better understand the use of futures without actors. I read everywhere that one should use actors only when necessary, that is, when state needs to be managed. It seems to me that overall, using futures without actors always involves blocking somewhere down the line, if the result needs to be communicated back at some point to the thread that initiated the asynchronous task.
Actors are good when you are dealing with mutable state because they encapsulate the mutable state and allow only message-based interaction.
You can use a Future to execute code in a different thread. You don't have to block on a Future, because Scala's Futures compose. So if you have multiple Futures in your code, you don't have to wait/block for all of them to complete. For example, if your pipeline is completely non-blocking or asynchronous (e.g., Play and Spray), you can return a Future back to the client.
Futures are lightweight compared to actors because you don't need a complete actor system.
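A small sketch of composition without blocking, using the price scenario from the question. basePrice and taxRate are hypothetical stand-ins for slow lookups, and the global ExecutionContext is used only to keep the example self-contained.

```scala
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Hypothetical slow computations standing in for the price lookup.
def basePrice(item: String): Future[BigDecimal] = Future { BigDecimal(item.length * 10) }
def taxRate(country: String): Future[BigDecimal] =
  Future { if (country == "DE") BigDecimal("0.19") else BigDecimal("0.10") }

// Both futures are started before the for-comprehension, so they run concurrently.
// No thread blocks here; the caller gets another Future it can compose further.
def totalPrice(item: String, country: String): Future[BigDecimal] = {
  val priceF = basePrice(item)
  val taxF = taxRate(country)
  for {
    price <- priceF
    tax <- taxF
  } yield price + price * tax
}

// React to the result with a callback instead of blocking the calling thread.
val total = totalPrice("book", "DE")
total.foreach(p => println(s"total: $p"))
```

Starting both futures outside the for-comprehension is the design choice that matters: creating them inside it would run the lookups one after the other.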
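For the participant-age scenario from the question, one way to avoid closing over a shared mutable list is to have each Future return a new value and combine the results with Future.traverse. This is a sketch under assumptions: Participant is an immutable case class and fetchAge is a hypothetical stand-in for the online lookup.

```scala
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

final case class Participant(name: String, age: Int)

// Hypothetical online lookup.
def fetchAge(name: String): Future[Int] = Future { name.length + 20 }

// Each lookup runs in its own Future, but none of them touches shared state:
// every Future produces a new Participant instead of mutating a list.
def refreshAges(participants: List[Participant]): Future[List[Participant]] =
  Future.traverse(participants)(p => fetchAge(p.name).map(age => p.copy(age = age)))

val original = List(Participant("ann", 0), Participant("bob", 0))
val refreshed = refreshAges(original) // original is never modified
```

The calling code receives a Future of the updated list and can keep composing it or hand it to a callback, so no thread has to block and no concurrent collection is needed.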
Here is a quote from Martin Odersky that I really like.
There is no silver bullet for all concurrency issues; the right solution depends on what one needs to achieve. Do you want to define asynchronous computations that react to events or streams of values? Or have autonomous, isolated entities communicating via messages? Or define transactions over a mutable store? Or, maybe the primary purpose of parallel execution is to increase the performance? For each of these tasks, there is an abstraction that does the job: futures, reactive streams, actors, transactional memory, or parallel collections.
So choose your abstraction based on your use case and needs.

Requesting a clear, picturesque explanation of Reactive Extensions (RX)?

For a long time now I have been trying to wrap my head around RX. And, to be honest, I am never sure whether I got it or not.
Today, I found an explanation on http://reactive-extensions.github.com/RxJS/ which - in my opinion - is horrible. It says:
RxJS is to events as promises are to async.
Great. This sentence is so full of complexity that if you do not have the slightest idea what RX is about, after reading it you are just as dumb as before.
And this is basically my problem: All the explanations in the usual places you find about RX make (at least me) feel dumb. They explain RX as a highly sophisticated concept with lots of highly complicated words and terms and whatsoever, and I am never quite sure what it is about.
So my question is: How would you explain RX to someone who is five years old? I'd like a clear, picturesque explanation of what it is, what it is good for, and what its main concepts are?
So, LINQ (in JavaScript, these are high-level array methods like map, filter, reduce, etc - if you're not a C# dev, just replace that whenever I mention 'LINQ') gives you a bunch of tools that you can apply to Sequences ("Lists" in a crude sense), in order to filter and transform an input into an output (aka "A list that's actually interesting to me"). But what is a list?
What is a List?
A List, is some elements, in a particular order. I can take any list and transform it into a better list with LINQ.
(Not necessarily sorted order, but an order).
An Event is a List
But what about an Event? Let's subscribe to an event:
OnKeyUp += (o,e) => Console.WriteLine(e.Key)
>>> 'H'
>>> 'e'
>>> 'l'
>>> 'l'
>>> 'o'
Hm. That looks like some things, in a particular order. It now suddenly dawns upon you, a list and an event are the same thing!
If Lists and Events are the Same....
...then why can't I transform and filter input events into more interesting events? That's what Rx is. It takes everything you know about dealing with sequences, including all of the LINQ operators like Select, Where and Aggregate, and applies them to events.
Easy peasy.
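The idea can be sketched in a few lines of Scala. EventStream is a hypothetical, minimal push-based stream, not the Rx API: values are pushed to subscribers as they happen, and map and filter return new streams, the way Rx applies Select and Where to events. The usage mirrors the key-up example.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical minimal event stream with sequence-style operators.
final class EventStream[A] {
  private val handlers = ListBuffer.empty[A => Unit]
  def subscribe(handler: A => Unit): Unit = handlers += handler
  def push(value: A): Unit = handlers.foreach(h => h(value))

  // Like Select: transform every event into a new event.
  def map[B](f: A => B): EventStream[B] = {
    val out = new EventStream[B]
    subscribe(a => out.push(f(a)))
    out
  }

  // Like Where: forward only the events that match.
  def filter(p: A => Boolean): EventStream[A] = {
    val out = new EventStream[A]
    subscribe(a => if (p(a)) out.push(a))
    out
  }
}

// Key-up events: keep only lower-case letters and upper-case them.
val keyUp = new EventStream[Char]
val received = ListBuffer.empty[Char]
keyUp.filter(_.isLower).map(_.toUpper).subscribe(c => { received += c; () })
"Hello".foreach(keyUp.push)
```

Typing "Hello" delivers E, L, L, O to the subscriber: the same filter and map you would apply to a list, applied to events as they arrive.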
A Callback is a Sequence Too
Isn't a Callback just basically an Event that only happens once? Isn't it basically just like a List with one item? Turns out it is, and one of the interesting things about Rx is that it lets us treat Events and Callbacks (and things like Geolocation requests) with the same language (i.e. we can combine the two, or wait for either one or the other, etc.).
Along with Paul's excellent answer I'd like to add the concept of pulling vs pushing data.
Pipeline
Let's take the example of some code that generates a series of numbers and outputs the result. If you think of this as a stream, on one end you have a producer that is creating new numbers for you, and on the other end you have a consumer that is doing something with those numbers.
Pull - Primes List
Let's say the producer is generating a list of prime numbers. Normally you would have some function that yields a sequence of numbers, and every time the consumer asks for a value it returns the next one it has calculated, passing it through the pipe to the consumer, which outputs that number to the screen.
Prime Generator ---> Console.WriteLine
In this scenario it is easy to see that the producer is doing most of the work, and the consumer would be sitting around waiting for the producer to send the next value. The consumer is pulling on the pipeline, waiting for the producer to return the next value.
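In Scala the pull model maps naturally onto an Iterator: nothing is computed until the consumer calls next(). The trial-division isPrime below is an illustrative choice, not part of the original answer.

```scala
// Trial division: n is prime if no number from 2 to sqrt(n) divides it.
def isPrime(n: Int): Boolean =
  n >= 2 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)

// The producer: a lazy, potentially infinite sequence of primes.
val primes: Iterator[Int] = Iterator.from(2).filter(isPrime)

// The consumer drives the pipeline; each next() pulls exactly one prime.
val firstFive = List.fill(5)(primes.next())
firstFive.foreach(println)
```

This prints 2, 3, 5, 7 and 11; the producer never runs ahead of what the consumer asked for.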
Push - Progress percent events from a fast process (Reactive)
Ok, let's say you have a function that is processing 1,000,000 items. Each item takes milliseconds to process, and then the function yields out a percentage value of how far it has gotten. So lots of progress values, very fast.
At the other end of the pipeline you have a progress bar. Now if the progress bar were to handle every update, the UI would block trying to keep up with the stream of values.
1-Million-Items-Processor ---> Progress Bar
In this scenario the data is being pushed through the pipeline by the producer and then the consumer is blocking because too much data is being pushed for it to handle.
Reactive allows you to put in delays, windows, or to sample the pipeline depending on how you wish to consume the data. In this case I would sample the data every second before updating the progress bar.
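A simplified sketch of the sampling idea, assuming each progress event carries an explicit timestamp so the example is deterministic. It keeps only the latest value in each one-second window; it is a stand-in for what an Rx sample operator achieves, not its implementation, and Progress and sampleEverySecond are hypothetical names.

```scala
final case class Progress(timestampMillis: Long, percent: Int)

// Keep only the most recent event in each one-second window.
def sampleEverySecond(events: Seq[Progress]): Seq[Progress] =
  events
    .groupBy(_.timestampMillis / 1000) // bucket by whole second
    .toSeq
    .sortBy(_._1)                       // restore time order of buckets
    .map { case (_, bucket) => bucket.maxBy(_.timestampMillis) }

// 300 progress events in 3 seconds: far more than a progress bar needs.
val burst = (0 until 3000 by 10).map(t => Progress(t.toLong, t / 30))
val forwarded = sampleEverySecond(burst)
```

Of the 300 events, only three reach the progress bar (33%, 66% and 99%), one per second.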
Lists vs Events
So lists and events are kinda the same. The difference is whether the data is pulled or pushed through the system. With lists the data is pulled. With events the data is pushed.