Given a method, I can write reactive code in two ways:
public String getString() { return "Hello, World"; }
or
public Mono<String> getMonoString() { return Mono.just("Hello, World"); }
And then use it with either
someOtherPublisher.map( value-> getString() + value ).subscribe(System.out::println);
or
someOtherPublisher.flatMap( value-> getMonoString().map(str-> str + value ) ).subscribe(System.out::println);
My question is whether "more reactive" is better. I argued that getMonoString was worse because of its extra overhead and performance cost, while the other person argued that having methods be publishers and using .flatMap was better because it was "more reactive".
I am interested in finding some authority on why one is better or worse than the other or even whether it matters.
Clearly I could do some simple tests but sometimes simple tests can fail to be convincing.
First of all:
public String getString() { return "Hello, World"; }
this is not reactive. It is a standard imperative function with O(1) time complexity, so over n runs it will perform roughly the same.
This is reactive:
public Mono<String> getMonoString() { return Mono.just("Hello, World"); }
But this also has O(1) time complexity. There is a possibility that it will switch threads partway through, but that is very unlikely. Over n runs this will also perform roughly the same.
Neither of your examples matters much here, because neither of them actually takes any time.
Reactive shines when dealing with side effects: things that take time, like database calls, HTTP calls, file I/O, and so on.
Reactive is good when threads need to wait.
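To make this concrete, here is a minimal sketch of the case where returning a Mono actually pays off. It assumes Project Reactor 3, and lookupGreeting is a hypothetical stand-in for a blocking database or HTTP call; the point is that the blocking work can be shifted onto a scheduler meant for blocking calls instead of tying up the calling thread.

import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BlockingVsReactive {

    // Hypothetical side effect: pretend this is a JDBC or HTTP call that blocks for a while.
    static String lookupGreeting() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Hello, World";
    }

    // Wrap the blocking call so it runs on a scheduler intended for blocking work;
    // the subscribing thread is free to do something else while the lookup waits.
    static Mono<String> lookupGreetingMono() {
        return Mono.fromCallable(BlockingVsReactive::lookupGreeting)
                   .subscribeOn(Schedulers.boundedElastic());
    }

    public static void main(String[] args) {
        Mono<String> someOtherPublisher = Mono.just("!");

        someOtherPublisher
                .flatMap(value -> lookupGreetingMono().map(str -> str + value))
                .doOnNext(System.out::println)
                .block(Duration.ofSeconds(5)); // block only here, at the edge, for the demo
    }
}

For a constant like "Hello, World" the map and flatMap versions cost about the same; the payoff of the Mono-returning style only appears once the wrapped call genuinely waits.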
Related references:
http://www.eff-lang.org/handlers-tutorial.pdf
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/algeff-tr-2016-v2.pdf
https://github.com/matijapretnar/eff
I have searched a lot of links, but it seems no one explains it concretely. Could anyone give some code (using JavaScript) to explain it?
What is an Algebraic Effect?
TL;DR: Algebraic Effects are an exception mechanism which lets the throwing function continue its operation.
Try to think of Algebraic Effects as some sort of try / catch mechanism, where the catch handler does not just "handle the exception", but is able to provide some input to the function which threw the exception. The input from the catch handler is then used in the throwing function, which continues as if there was no exception.
Some sample pseudo code:
Let's consider a function which needs some data to perform its logic:
function throwingFunction() {
    // we need some data, let's check if the data is here
    if (data == null) {
        data = throw "we need the data"
    }
    // do something with the data
}
Then we have the code that invokes this function:
function handlingFunction() {
    try {
        throwingFunction();
    } catch ("we need the data") {
        provide getData();
    }
}
As you can see, the throw statement is an expression evaluating to the data provided by the catch handler (I used the keyword provide here, which as far as I know does not exist in any programming language today).
Why is this important?
Algebraic Effects are a very general and basic concept. This can be seen from the fact that many existing concepts can be expressed in terms of Algebraic Effects.
try/catch
If we had Algebraic Effects but no Exceptions in our favorite programming language, we could just omit the provide keyword in the catch handler, and voilà, we would have an exception mechanism.
In other words, we would not need any Exceptions if we had Algebraic Effects.
async/await
Look again at the pseudo code above. Let's assume the data which we need has to be loaded over the network. If the data is not yet there, we would normally return a Promise and use async/await to handle it. This means that our function becomes an asynchronous function, which can only be called from asynchronous functions. However, Algebraic Effects are capable of that behavior too:
function handlingFunction() {
    try {
        throwingFunction();
    } catch ("we need the data") {
        fetch('data.source')
            .then(data => provide data);
    }
}
Who said that the provide keyword has to be used immediately?
In other words, had we had Algebraic Effects before async/await, there would be no need to clutter up our languages with them. Furthermore, Algebraic Effects would not give our functions a "color" - our function does not become asynchronous from the language's point of view.
Aspect-Oriented Programming
Let's say we want to have some log statements in our code, but we don't yet know which logging library we will use. We just want some general log statements (I replaced the keyword throw with the keyword effect here to make it a bit more readable - note that effect is not a keyword in any language I know of):
function myFunctionDeepDownTheCallstack() {
    effect "info" "myFunctionDeepDownTheCallstack begins"
    // do some stuff
    if (warningCondition) {
        effect "warn" "myFunctionDeepDownTheCallstack has a warningCondition"
    }
    // do some more stuff
    effect "info" "myFunctionDeepDownTheCallstack exits"
}
And then we can connect whichever log framework we like in a few lines:
try {
    doAllTheStuff();
}
catch ("info" with message) {
    log.Info(message);
}
catch ("warn" with message) {
    log.Warn(message);
}
This way, the log statement and the code that actually does the logging are separated.
As you can see, the throw keyword is not really suited in the context of the very general Algebraic Effects. More suitable keywords would be effect (as used here) or perform.
More examples
There are other existing language or library constructs that could be easily realized using Algebraic Effects:
Iterators with yield. A language with Algebraic Effects does not need the yield statement.
React Hooks (this is an example of a construct at the library level - the other examples here are language constructs).
Today's support
AFAIK there are not many languages with support for Algebraic Effects out of the box (please comment if you know of examples). However, there are languages which allow the creation of algebraic effect libraries, one example being JavaScript with its function* and yield keywords (i.e. generators). The library redux-saga uses JavaScript generators to create some Algebraic Effects:
function* myRoutineWithEffects() {
    // prepare data load
    let data = yield put({ /* ... description of data to load */ });
    // use the data
}
The put is an instruction that tells the calling function to execute the data load call described in the argument. put itself does not load anything, it just creates a description of what data to load. This description is passed by the yield keyword to the calling function, which initiates the data load.
While waiting for the results, the generator routine is paused. Then, the results are passed back to the routine and can then be used there after being assigned to the data variable. The routine then continues with the local stack plus the loaded data.
Note that in this scenario, only the calling function (or the code that has a reference to the generator) can "serve" the algebraic effect, e.g. do data loads and other things. So it is not an algebraic effect as described above, because it is not an exception mechanism that can jump up and down the call stack.
It's hard to gain a solid theoretical understanding of algebraic effects without a basis in category theory, so I'll try to explain their usage in layman's terms, possibly sacrificing some accuracy.
A computational effect is any computation that includes an alteration of its environment. For example, things like total disk capacity and network connectivity are external effects that play a role in operations like reading/writing files or accessing a database. Anything that a function produces, besides the value it computes, is a computational effect. From the perspective of that function, even another function that accesses the same memory can be considered an effect.
That's the theoretical definition. Practically, it's useful to think of an effect as any interaction between a subexpression and a central control which handles global resources in a program. Sometimes a local expression needs to send messages to the central control during execution, along with enough information so that once the central control is done, it can resume the suspended execution.
Why do we do this? Because sometimes large libraries have very long chains of abstractions, which can get messy. Using "algebraic effects" gives us a sort of shortcut to pass things between abstractions without going through the whole chain.
As a practical JavaScript example, let's take a UI library like ReactJS. The idea is that UI can be written as a simple projection of data.
This for instance, would be the representation of a button.
function Button(name) {
    return { buttonLabel: name, textColor: 'black' };
}
'John Smith' -> { buttonLabel: 'John Smith', textColor: 'black' }
Using this format, we can create a long chain of composable abstractions, like so:
function Button(name) {
    return { buttonLabel: name, textColor: 'black' };
}

function UsernameButton(user) {
    return {
        backgroundColor: 'blue',
        childContent: [
            Button(user.name)
        ]
    };
}

function UserList(users) {
    return users.map(eachUser => ({
        button: UsernameButton(eachUser),
        listStyle: 'ordered'
    }));
}

function App(appUsers) {
    return {
        pageTheme: redTheme,
        userList: UserList(appUsers)
    };
}
This example has four layers of abstraction composed together.
App -> UserList -> UsernameButton -> Button
Now, let's assume that for any of these buttons, I need to inherit the color theme of whatever machine it runs on. Say, mobile phones have red text, while laptops have blue text.
The theme data is in the first abstraction (App). It needs to be implemented in the last abstraction (Button).
The annoying way would be to pass the theme data from App down to Button, modifying each and every abstraction along the way.
App passes theme data to UserList
UserList passes it to UsernameButton
UsernameButton passes it to Button
It becomes obvious that in large libraries with hundreds of layers of abstraction, this is a huge pain.
A possible solution is to pass on the effect, through a specific effect handler and let it continue when it needs to.
function PageThemeRequest() {
    return THEME_EFFECT;
}

function App(appUsers) {
    const themeHandler = raise new PageThemeRequest(continuation);
    return {
        pageTheme: themeHandler,
        userList: UserList(appUsers)
    };
}

// ...Abstractions in between...

function Button(name) {
    try {
        return { buttonLabel: name, textColor: 'black' };
    } catch PageThemeRequest -> [, continuation] {
        continuation();
    }
}
This type of effect handling, where one abstraction in a chain can suspend what it's doing (theme implementation), send the necessary data to the central control (App, which has access to external theming), and pass along the data needed to continue, is an extremely simplistic example of handling effects algebraically.
Well, as far as I understand the topic, algebraic effects are currently an academic/experimental concept that lets you alter certain computational elements (like function calls, print statements, etc.), called "effects", by using a mechanism that resembles throw/catch.
The simplest example I can think of in a language like JavaScript is modifying the output message of, say, console.log. Suppose you want to add "Debug Message: " in front of all your console.log statements for whatever reason. This would be trouble in JavaScript. Basically you would need to call a function on every console.log like so:
function logTransform(msg) { return "Debug Message: " + msg; }
console.log(logTransform("Hello world"));
Now if you have many console.log statements, every single one of them needs to be changed if you want to introduce the change in logging. The concept of algebraic effects would allow you to handle the "effect" of console.log on the system. Think of it like console.log throwing an exception before invocation; this exception (the effect) bubbles up and can be handled. The only difference is: if unhandled, execution just continues as if nothing happened. What this lets you do is manipulate the behaviour of console.log in an arbitrary scope (global or just local) without touching the actual call to console.log. It could look something like this:
try
{
    console.log("Hello world");
}
catch effect console.log(continuation, msg)
{
    msg = "Debug message: " + msg;
    continuation();
}
Note that this is not JavaScript; I'm just making the syntax up. As algebraic effects are an experimental construct, they are not natively supported in any major programming language I know of (there are however several experimental languages like eff, https://www.eff-lang.org/learn/). I hope you get a rough understanding of how my made-up code is intended to work. In the try/catch block, the effect that might be thrown by console.log can be handled. The continuation is a token-like construct that is needed to control when the normal workflow should continue. It's not necessary to have such a thing, but it allows you to make manipulations before and after console.log (for example, you could add an extra log message after each console.log).
All in all, algebraic effects are an interesting concept that helps with many real-world problems in coding, but they can also introduce certain pitfalls if methods suddenly behave differently than expected. If you want to use algebraic effects right now in JavaScript, you would have to write a framework for it yourself, and you probably won't be able to apply algebraic effects to core functions such as console.log anyway. Basically, all you can do right now is explore the concept on an abstract scale, think about it, or learn one of the experimental languages. I think that's also the reason why many of the introductory papers are so abstract.
You can check out algebraic-effects. It is a library that implements a lot of the concepts of algebraic effects in javascript using generator functions, including multiple continuations. It's a lot easier to understand algebraic-effects in terms of try-catch (Exception effect) and generator functions.
The following is a skeleton of break in Scala using util.control.Breaks._:
import util.control.Breaks._

breakable {
  for (oct <- 1 to 4) {
    if (...) {
    } else {
      break
    }
  }
}
This structure requires remembering several non-intuitive names. I do not use break statements even every second or third day, but part of that is the difficulty of remembering how to do them. In a number of other popular languages it is dead simple: break out of a loop and you're good.
Is there any mechanism closer to that simple keyword/structure in Scala?
No, there isn't. for is not a loop but syntactic sugar for a chain of function calls with closures passed into them. Since the local stack changes, it cannot be translated into some goto instruction the way a "normal" break is. To break out of arbitrarily nested closures you have to throw an exception, which is what both break and return do to exit early.
For particular situations you can often use something like .filter, .takeWhile, .dropWhile (sometimes with .sliding to look ahead), etc. For situations where you have no better idea, a @tailrec function is always an option.
In your current example I would probably do:
(1 to 4).takeWhile(...).foreach { oct => ... }
Basically, I would consider every single case individually to write it in a declarative way.
i.e., by passing the error condition and not halting the entire Observable?
My Observable starts with a user-supplied list of package tracking numbers from common delivery services (FedEx, UPS, DHL, etc), looks up the expected delivery date online, then returns those dates in terms of number of days from today (i.e. "in 3 days" rather than "Jan 22"). The problem is that if any individual lookup results in an exception, the entire stream halts, and the rest of the codes won't be looked up. There's no ability to gracefully handle, say, UnknownTrackingCode Exception, and so the Observable can't guarantee that it will look up all the codes the user submitted.
public void getDaysTillDelivery(List<String> tracking_code_list) {
    Observable o = Observable.from(tracking_code_list)
        // LookupDeliveryDate performs network calls to UPS, FedEx, USPS web sites or APIs
        // it might throw: UnknownTrackingCode Exception, NoResponse Exception, LostPackage Exception
        .map(tracking_code -> LookupDeliveryDate(tracking_code))
        .map(delivery_date -> CalculateDaysFromToday(delivery_date));

    o.subscribe(mySubscriber); // will handle onNext, onError, onComplete
}
Halting the Observable stream as a result of one error is by design:
http://reactivex.io/documentation/operators/catch.html
Handling Exceptions in Reactive Extensions without stopping sequence
https://groups.google.com/forum/#!topic/rxjava/trm2n6S4FSc
The default behavior can be overcome, but only by eliminating many of the benefits of Rx in the first place:
I can wrap LookupDeliveryDate so it returns Dates in place of Exceptions (such as 1899-12-31 for UnknownTrackingCode Exception) but this prevents "loosely coupled code", because CalculateDaysFromToday would need to handle these special cases
I can surround each anonymous function with try/catch blocks, but this essentially prevents me from using lambdas
I can use if/thens to direct the code path, but this will likely require maintaining some state and eliminating deterministic evaluation
Error handling of each step, obviously, prevents consolidating all error handling in the Subscriber
Writing my own error-handling operator is possible, but thinly documented
Is there a better way to handle this?
What exactly do you want to happen if there is an error? Do you just want to throw that entry away or do you want something downstream to do something with it?
If you want something downstream to take some action, then you are really turning the error into data (sort of like your example of returning a sentinel value of 1899-12-31 to represent the error). This strategy by definition means that everything downstream needs to understand that the data stream may contain errors instead of data and they must be modified to deal with it.
But rather than yielding a magic value, you can turn your Observable stream into a stream of Either values. Either the value is a date, or it is an error. Everything downstream receives this Either object and can ask it if it has a value or an error. If it has a value, they can produce a new Either object with the result of their calculation. If it has an error and they cannot do anything with it, they can yield an error Either themselves.
I don't know Java syntax, but this is what it might look like in c#:
Observable.From(tracking_code_list)
    .Select(t =>
    {
        try { return Either.From(LookupDeliveryDate(t)); }
        catch (Exception e)
        {
            return Either.FromError<Date>(e);
        }
    })
    .Select(dateEither =>
    {
        return dateEither.HasValue ?
            Either.From(CalculateDaysFromToday(dateEither.Value)) :
            Either.FromError<int>(dateEither.Error);
    })
    .Subscribe(value =>
    {
        if (value.HasValue) mySubscriber.OnValue(value.Value);
        else mySubscriber.OnError(value.Error);
    });
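Since the question itself is RxJava, here is a rough Java sketch of the same idea. The Either class below is a hypothetical minimal holder written just for this example (it is not part of RxJava), and the pipeline assumes RxJava 1.x, matching the Observable.from used in the question:

// Hypothetical minimal "value or error" holder, just for this sketch.
final class Either<T> {
    final T value;          // set when the step succeeded
    final Throwable error;  // set when the step failed

    private Either(T value, Throwable error) { this.value = value; this.error = error; }
    static <T> Either<T> of(T value)        { return new Either<>(value, null); }
    static <T> Either<T> error(Throwable e) { return new Either<>(null, e); }
    boolean hasValue() { return error == null; }
}

// The same pipeline as the C# version: errors are carried as data, so the stream keeps going.
Observable.from(tracking_code_list)
    .map(trackingCode -> {
        try {
            return Either.of(LookupDeliveryDate(trackingCode));
        } catch (Exception e) {
            return Either.<Date>error(e);
        }
    })
    .map(dateEither -> dateEither.hasValue()
            ? Either.of(CalculateDaysFromToday(dateEither.value))
            : Either.<Integer>error(dateEither.error))
    .subscribe(daysEither -> {
        if (daysEither.hasValue()) {
            mySubscriber.onNext(daysEither.value);
        } else {
            mySubscriber.onError(daysEither.error); // or just log it and move on
        }
    });

As in the C# version, one bad tracking code no longer terminates the stream, but everything downstream has to understand the Either shape.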
Your other option is to "handle"/suppress the error when it occurs. This may be sufficient depending on your needs. In that case, just have LookupDeliveryDate return magic dates instead of exceptions and then add a .filter to filter out the magic dates before they get to CalculateDaysFromToday.
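And a sketch of that second option. Here LookupDeliveryDateOrSentinel and SENTINEL_DATE are hypothetical names for a wrapper that catches the lookup exceptions and returns a sentinel date instead of throwing:

// Failures are swallowed inside the lookup and replaced by a sentinel date,
// which is then dropped before the "days from today" calculation.
Observable.from(tracking_code_list)
    .map(trackingCode -> LookupDeliveryDateOrSentinel(trackingCode)) // never throws
    .filter(deliveryDate -> !SENTINEL_DATE.equals(deliveryDate))
    .map(deliveryDate -> CalculateDaysFromToday(deliveryDate))
    .subscribe(mySubscriber);

This keeps the subscriber simple, at the cost of hiding the failures entirely.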
Reading Scala docs written by the experts, one can get the impression that tail recursion is better than a while loop, even when the latter is more concise and clearer. This is one example:
import scala.annotation.tailrec

object Helpers {
  implicit class IntWithTimes(val pip: Int) {
    // Recursive
    def times(f: => Unit): Unit = {
      @tailrec
      def loop(counter: Int): Unit = {
        if (counter > 0) { f; loop(counter - 1) }
      }
      loop(pip)
    }

    // Explicit loop
    def :#(f: => Unit) = {
      var lc = pip
      while (lc > 0) { f; lc -= 1 }
    }
  }
}
(To be clear, the expert was not addressing looping at all, but in the example they chose to write a loop in this fashion as if by instinct, which is what raised the question for me: should I develop a similar instinct?)
The only aspect of the while loop that could be better is that the iteration variable should be local to the body of the loop, and the mutation of the variable should happen in a fixed place, but Scala chooses not to provide that syntax.
Clarity is subjective, but the question is: does the (tail-)recursive style offer improved performance?
I'm pretty sure that, due to the limitations of the JVM, not every potentially tail-recursive function will be optimised by the Scala compiler, so the short (and sometimes wrong) answer to your question on performance is no.
The long answer to your more general question (whether there is an advantage) is a little more involved. Note that, by using while, you are in fact:
creating a new variable that holds a counter.
mutating that variable.
Off-by-one errors and the perils of mutability will ensure that, in the long run, you'll introduce bugs with a while pattern. In fact, your times function could easily be implemented as:
def times(f: => Unit) = (1 to pip) foreach (_ => f)
Which is not only simpler and smaller, but also avoids any creation of transient variables and mutability. In fact, if the function you were calling returned a result that mattered, the while construction would become even more difficult to read. Please attempt to implement the following using nothing but whiles:
def replicate(l: List[Int])(times: Int) = l.flatMap(x => List.fill(times)(x))
Then proceed to define a tail-recursive function that does the same.
UPDATE:
I hear you saying: "Hey! That's cheating! foreach is neither a while nor a tail-recursive call." Oh really? Take a look at Scala's definition of foreach for Lists:
def foreach[B](f: A => B) {
  var these = this
  while (!these.isEmpty) {
    f(these.head)
    these = these.tail
  }
}
If you want to learn more about recursion in Scala, take a look at this blog post. Once you are into functional programming, go crazy and read Rúnar's blog post. Even more info here and here.
In general, a directly tail recursive function (i.e., one that always calls itself directly and cannot be overridden) will always be optimized into a while loop by the compiler. You can use the @tailrec annotation to verify that the compiler is able to do this for a particular function.
As a general rule, any tail recursive function can be rewritten (usually automatically by the compiler) as a while loop and vice versa.
The purpose of writing functions in a (tail) recursive style is not to maximize performance or even conciseness, but to make the intent of the code as clear as possible, while simultaneously minimizing the chance of introducing bugs (by eliminating mutable variables, which generally make it harder to keep track of what the "inputs" and "outputs" of the function are). A properly written recursive function consists of a series of checks for terminating conditions (using either cascading if-else or a pattern match) with the recursive call(s) (plural only if not tail recursive) made if none of the terminating conditions are met.
The benefit of using recursion is most dramatic when there are several different possible terminating conditions. A series of if conditionals or patterns is generally much easier to comprehend than a single while condition with a whole bunch of (potentially complex and inter-related) boolean expressions &&'d together, especially if the return value needs to be different depending on which terminating condition is met.
Did these experts say that performance was the reason? I'm betting their reasons are more to do with expressive code and functional programming. Could you cite examples of their arguments?
One interesting reason why recursive solutions can be more efficient than more imperative alternatives is that they very often operate on lists and in a way that uses only head and tail operations. These operations are actually faster than random-access operations on more complex collections.
Another reason that while-based solutions may be less efficient is that they can become very ugly as the complexity of the problem increases...
(I have to say, at this point, that your example is not a good one, since neither of your loops does anything useful. Your recursive loop is particularly atypical since it returns nothing, which implies that you are missing a major point about recursive functions: the functional bit. A recursive function is much more than another way of repeating the same operation n times.)
A while loop does not return a value and requires side effects to achieve anything. It is a control structure which only works for very simple tasks. This is because each iteration of the loop has to examine all of the state to decide what to do next. The loop's boolean expression may also have to become very complex if there are multiple potential exit paths (or that complexity has to be distributed throughout the code in the loop, which can be ugly and obfuscatory).
Recursive functions offer the possibility of a much cleaner implementation. A good recursive solution breaks a complex problem down in to simpler parts, then delegates each part on to another function which can deal with it - the trick being that that other function is itself (or possibly a mutually recursive function, though that is rarely seen in Scala - unlike the various Lisp dialects, where it is common - because of the poor tail recursion support). The recursively called function receives in its parameters only the simpler subset of data and only the relevant state; it returns only the solution to the simpler problem. So, in contrast to the while loop,
Each iteration of the function only has to deal with a simple subset of the problem
Each iteration only cares about its inputs, not the overall state
Success in each subtask is clearly defined by the return value of the call that handled it.
State from different subtasks cannot become entangled (since it is hidden within each recursive function call).
Multiple exit points, if they exist, are much easier to represent clearly.
Given these advantages, recursion can make it easier to achieve an efficient solution. Especially if you count maintainability as an important factor in long-term efficiency.
I'm going to go find some good examples of code to add. Meanwhile, at this point I always recommend The Little Schemer. I would go on about why but this is the second Scala recursion question on this site in two days, so look at my previous answer instead.
Here are two definitions both achieving the same result:
def sendTrigger(teamId: Long, data: String) {
  EngineSync.browserSockets.collect { case ((i, (u, t)), s) => if (t == teamId) { s.send(data) } }
}

def sendTrigger(teamId: Long, data: String) {
  EngineSync.browserSockets.foreach { case ((i, (u, t)), s) => if (t == teamId) { s.send(data) } }
}
What's happening is that I am looping through a list of sockets and filtering them to send data. Being a newbie to Scala, I am concerned about performance when this begins to scale. From what I understand, foreach performance is poor compared to other methods; does anyone know if collect would fare better, or if this is the wrong approach entirely?
Looping through a fair-sized collection vs. performing network IO (at least when blocking) are operations of entirely different scale, so I would not worry about performance issues at this stage.
If you really care about performance when scaling massively:
Use NIO for asynchronous socket IO
Wrap up the socket access logic inside an Actor (and maybe use Futures in clients to hide the Actors)
Or even spare yourself the time and use Akka's IO module