Process-level parallelism in Scala

I'd like to use reflection in combination with parallel processing in Scala, but I'm getting bitten by reflection's lack of thread safety.
So, I'm considering just running each task in its own process (not thread).
Is there any easy way to do this?
For example, is there a way to configure .par so it spawns processes, not threads? Or is there some function fork that takes a closure and runs it in a new process?
EDIT: Futures are apparently a good way to go.
However, I still need to figure out how to run them in separate processes.
EDIT 2: I'm still having concurrency issues, even when using Akka's "fork-join-executor" dispatcher, which sure sounds like it should be forking processes. However, when I run ManagementFactory.getRuntimeMXBean().getName() inside the Futures, it seems everything still lives in the same process.
Is this the right way to check for actual process-level parallelism?
Am I using the correct Akka dispatcher?
EDIT 3: I realize reflection sucks. Unfortunately it is used in a library I need.
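The check described in EDIT 2 can be reproduced with the standard library alone. The sketch below runs several Futures on the default global execution context and collects the JVM name each one sees; the object and method names are made up for illustration. Futures, like Akka's "fork-join-executor" dispatcher, are backed by a thread pool inside the current JVM, so every Future should report the same process.

```scala
import java.lang.management.ManagementFactory
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper, not part of any library.
object SameProcessCheck {
  // Name of the running JVM (often "pid@hostname", but the format is implementation-specific).
  def jvmName(): String = ManagementFactory.getRuntimeMXBean.getName

  // Runs n Futures and returns the distinct JVM names they observed.
  def namesSeenByFutures(n: Int): Set[String] = {
    val futures = (1 to n).map(_ => Future(jvmName()))
    Await.result(Future.sequence(futures), 10.seconds).toSet
  }
}
```

A result set with a single element confirms that the Futures ran on different threads of one process, not in separate processes.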

Have you looked into Scala Actors or Akka? There may be no more compelling reason to use Scala than for parallel and asynchronous programming. It's baked into the language. Check out these facilities. I'm pretty sure you'll find what you need.

There's little information about the problem you're trying to solve here. The previous answers are pretty much on the ball: look at Actors, Akka, etc., and you may find that you don't necessarily need to do anything too complicated. Introspection/reflection in a multi-threaded environment usually points to a messy, poorly thought-out strategy for decomposing the problem at hand.
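For actual process isolation, one option the answers do not spell out is to start a second JVM yourself. A rough sketch using scala.sys.process from the standard library follows. The ForkJvm object and its method names are hypothetical, and the sketch assumes that the java launcher lives under java.home and that the child can reuse the parent's classpath. Results would have to come back through the exit code, stdout, or files, since a closure cannot simply be handed to another process.

```scala
import java.nio.file.Paths
import scala.sys.process._

// Hypothetical helper, not part of any library.
object ForkJvm {
  // Launcher of the JVM we are running on (assumed layout: <java.home>/bin/java).
  def javaBin: String =
    Paths.get(System.getProperty("java.home"), "bin", "java").toString

  // Command line that starts mainClass in a fresh JVM with the parent's classpath.
  def command(mainClass: String, args: Seq[String]): Seq[String] =
    Seq(javaBin, "-cp", System.getProperty("java.class.path"), mainClass) ++ args

  // Starts the child JVM, blocks until it exits, and returns its exit code.
  def run(mainClass: String, args: Seq[String]): Int =
    Process(command(mainClass, args)).!
}
```

Each call to ForkJvm.run pays the JVM startup cost, so this only makes sense for coarse-grained tasks.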

Related

Is the actor model not an anti-pattern, as the fire-and-forget style forces actors to remember a state?

When learning Scala, one of the first things I learned was that every function returns something. There is no "void" function/method as there is, for instance, in Java. Thus many Scala functions are true functions in the mathematical sense, and objects can remain largely stateless.
Now I learned that the actor model is a very popular model among functional languages like Scala. However, actors promote a fire-and-forget style of programming, and callers usually don't expect callees to directly reply to messages (except when using the "ask"/"?"-method). Therefore, actors need to remember some sort of state.
Am I right assuming that the actor model is more like a trade-off between scalability and maintainability (due to its statefulness), and could sometimes even be considered an anti-pattern?

Yes you're essentially right (I'm not quite sure what you have in mind when you say scalability vs maintainability).
Actors are popular in Scala because of Akka (which presumably is in turn popular because of the support it gets from Lightbend). It is not, however, the case that actors are overwhelmingly popular in general in the functional programming world (although implementations exist for all the languages I'm thinking of). Below are my vastly simplified impressions (so take them with the requisite amount of salt) of two other FP language communities, both of which use actors (far?) less frequently than Scala does.
The Haskell community tends to use STM or channels (often in an STM context). Straight-up MVars also get used surprisingly often.
The Clojure community sometimes touts its own built-in version of STM, but its flagship concurrency model is really core.async, which is at its heart again channels.
As an aside, STM, channels, and actors can all be layered upon one another; it's sort of weird to compare them as if they were mutually exclusive approaches. In practice, though, it's rare to see them all used in tandem.
Actors do indeed involve state (and in the case of Akka skirt type safety) and as a result are very expressive and can pretty much do anything concurrency-wise. In this way they're similar to side-effectful functions, which are more expressive than pure functions. Indeed actors in a way are the pure essence of OO, with all its pros and cons.
As such there is a sizable chunk of the Scala community that would say yes, if most of the time when you face concurrency issues, you're using actors, that's probably an anti-pattern.
If you can, try to get away with just using Futures or scalaz.concurrent.Tasks. In return for less expressiveness you get more composability.
If your problem naturally lends itself to a single, global state (e.g. in the form of global invariants that you want to enforce), think about STM. In the Scala community, although an STM library exists, my impression is that STM is usually emulated by using actors.
If your concurrency problems mainly revolve around streaming multiple sources of data, think about using one of Scala's streaming libraries.
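To make the composability point concrete, here is a minimal sketch with plain scala.concurrent Futures. The price and tax functions are invented for illustration; the point is only that two asynchronous steps combine into a third with a for-comprehension, with no mutable state to manage.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical example, not taken from any library.
object ComposeFutures {
  // Two independent asynchronous steps with made-up business logic.
  def price(item: String): Future[Int] = Future(item.length * 10)
  def tax(amount: Int): Future[Int]    = Future(amount / 5)

  // The composed step is itself a Future and can be composed further.
  def total(item: String): Future[Int] =
    for {
      p <- price(item)
      t <- tax(p)
    } yield p + t
}
```

Calling Await.result(ComposeFutures.total("book"), 5.seconds) yields 48 (a price of 40 plus a tax of 8).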

Actors are specifically a tool in the toolbox for handling and distributing state. So yes, they should have state; if they don't, you could just use Futures.
Please note, however, that Actors (at least Akka Actors) handle distribution (running location-transparently on multiple nodes), which neither functions nor Futures are able to do. The concurrency aspects of Actors are a result of them handling the more complex case: networking. In that sense, Actors unify the remote case with the local case by making the remote case first-class. And as it turns out, on networks messaging is exactly what you can both count on and build on if you want reliable, resilient, and also fast systems.
Hope this answers the "big picture" part of your question.

What happened to Scala.React?

I read the paper co-written by Odersky, "Deprecating the Observer Pattern with Scala.React".
The GitHub repository looks abandoned:
https://github.com/ingoem/scala-react
Also, the recent Reactive Programming Coursera class used the RxJava Observable library (with Scala support, of course).
Is there a story behind this? I presume scala.react just didn't make it very far. Is the RxJava library based on Observable advisable? Or can we expect something similar or better from Typesafe?

Citing Li Haoyi, who has used Scala.React, his observations are:
"it is extremely difficult to set up and get started."
"It requires a fair amount of global configuration"
"It took several days to get a basic dataflow graph (...) working."
He had a lot of questions but did not manage to contact the author of the publication.
Li also implemented Scala.Rx, which addresses these and other issues.
The code is in good shape, but I cannot see any effort to push it into the standard Scala library. Also, Li is the driver behind the ongoing Scala & JavaScript effort, so he is mostly occupied with that project.
Answering your questions:
Is the RxJava library based on Observable advisable?
RxJava is based on the Observer pattern Martin Odersky tried to deprecate:
https://github.com/Netflix/RxJava/blob/master/rxjava-core/src/main/java/rx/Observer.java
https://github.com/Netflix/RxJava/blob/master/rxjava-core/src/main/java/rx/Observable.java
While every issue Martin pointed out in the paper is true and valid, Netflix has exploited a major property of Observables: Futures and Observables share an isomorphism and are therefore composable.
In RxJava, an Observable returns a stream of events, whereas a Future can be seen as a specialized Observable that returns only a single value. Because of this, Futures and Observables can be composed asynchronously whenever it makes sense.
Is there a story behind this?
No idea but maybe Netflix did some sponsoring. You may have noticed the Netflix logo appearing in the RX diamonds examples....
Or can we expect something similar or better from Typesafe?
I honestly doubt that. Why should they? Typesafe is busy pushing their stack into industry and advancing Akka further. Scala.React is a neat idea but does not produce any cash, whereas Akka brings them paying customers.
Instead, I would ask what exactly Scala.React is trying to solve.
IMHO, RxJava already does a good job and is in production, and the improvements Scala.React could possibly add are most likely not enough to justify a major change.

RxJava (Reactive Extensions) has very little in common with scala.react. RxJava deals with observers and concurrency but helps very little with the correctness of evaluation order. Basically it is just streams of events, and if an event is split into several effects, those will never be coherent again. It's a mess and can only be used for GUIs, where precision in computation is not so critical. You never know when you will get an extra update or an extra refresh.
scala.react is a single threaded computation model and deals with order of computation with a strict evaluation order that is defined by the functional dependencies between computations.
Akka, or actors, is again a third model and a completely different thing. It is just threads with some fancy syntax and scheduling, really.
No wonder everyone is confused. Sadly scala.react has not moved anywhere, which is bad as it's the only innovative model of these three.

Typed messages in akka

The Akka framework recommends using typed actors only for interacting with external code. However, standard Akka actors are untyped. Is there a better way to create type-safe actors? Are there other actor frameworks or type-safe wrappers around Akka?

If you really want actors with static typing, then you might as well go ahead and use typed actors throughout your code. This is strongly discouraged for a couple of reasons.
1.) You run the risk of your system degenerating into a bunch of RPCs. An actor's receive method makes it pretty obvious that the whole thing is about message passing, much less so if you're just calling methods on a typed actor.
2.) An actor just really doesn't have a type. While it's running, the messages an actor is able to process may change depending on what state it is in, as may what it does with those messages. This is an excellent way of modeling a lot of protocols, and Akka actors have first-class support for it with FSMs.
So if you really want to do it, you're free to use typed actors everywhere and it'll work, but you should really think hard about the problem you're trying to solve before doing so.

For compile-time checking, see the SynapseGrid framework. It defines a SystemBuilder that constructs the DataFlow topology. The types of the data passing through are checked while the topology is being constructed. The resulting system is then converted to a RuntimeSystem with nested and properly interconnected actors.

Why is this a problem for you? akka.actor.Actor has the receive method of type PartialFunction that will only be called for messages that it can handle. Why do you need compile time checks? But to answer your question: one way would be - for an external api - to build a wrapper around your ActorRef that then sends the messages to the actor.
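The behaviour of a PartialFunction-based receive can be seen without an Akka dependency. The sketch below uses a plain Scala PartialFunction over Any; the ReceiveSketch object and the String return type are assumptions made so the result can be inspected (Akka's own receive returns Unit). It shows why the compiler cannot help here: any value may be sent, and an unsupported message is only detected at runtime.

```scala
// Plain-Scala stand-in for an untyped receive block; not Akka code.
object ReceiveSketch {
  type Receive = PartialFunction[Any, String]

  // Only Int and String messages are handled.
  val receive: Receive = {
    case n: Int    => s"int:$n"
    case s: String => s"string:$s"
  }

  // Accepts any message; falls back to a marker for unsupported ones.
  def deliver(msg: Any): String =
    receive.applyOrElse(msg, (m: Any) => s"unhandled:$m")
}
```

Here ReceiveSketch.deliver(2.5) compiles without complaint and returns "unhandled:2.5", which is exactly the kind of mistake a typed protocol would reject at compile time.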

Things are moving quite fast, so I thought I'd give an update:
1. Typed actors are deprecated
2. Instead, a new concept called Akka Typed is being developed at the moment
As I understand it, this should be the definitive solution for a typed actor system. But since this is at least the third try and it is planned for Akka 2.4 at the earliest, this claim remains to be proven.
I personally look forward to having both systems available: the existing one for more dynamic use cases, the new one for more robust ones.

Is the actor model limited to specific languages?

I was reading an interesting blog post about Erlang/OTP and the actor model. I also hear that Scala supports the actor model. From the little I gathered so far, the actor model breaks down processing into components that communicate with each other by passing messages. Typically, those processes are immutable.
Are those features language-specific, though, or more at the architecture level? More specifically, can't you just implement the same actor model in almost any language and use some form of message queue to pass messages between worker processes (for example, something like celery)? Or is it that languages like Erlang and Scala simply do this transparently and much faster?

Certainly you can define an "Actor Library" in virtually any language, but in Erlang the model is baked-in to the language, and is really the only concurrency model available.
While Scala's actor system is well implemented, at the end of the day it is still vulnerable to some hazards that Erlang is immune to. I'll draw your attention to this paper.
This would be the case for any Actor library implemented in any imperative language that supports shared mutable state.
An interesting exception to this is Node.js. Some work is being done with actors between Nodes that probably exhibit the same isolation properties as Erlang, simply because there is no shared mutable state.

Actor model is not limited to any specific platform or programming language, it's just a model after all.
Erlang and Scala have really good and useful implementations of this model, which fits nicely in typical technology stack of these platforms and helps to effectively solve certain kinds of tasks.
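As a rough illustration of how little is needed for the core idea, here is a toy actor built from a JDK blocking queue and a single thread. MiniActor is a made-up name, and this is only a sketch: it has no supervision, no distribution, and none of the isolation guarantees Erlang provides, since the handler can still touch shared mutable state.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Toy actor: one mailbox, one worker thread, messages handled one at a time.
final class MiniActor[A](handler: A => Unit) {
  // None is used as a stop signal; Some(msg) carries a real message.
  private val mailbox = new LinkedBlockingQueue[Option[A]]()

  private val worker = new Thread(() => {
    var running = true
    while (running) {
      mailbox.take() match {
        case Some(msg) => handler(msg)
        case None      => running = false
      }
    }
  })
  worker.setDaemon(true)
  worker.start()

  // Fire-and-forget send: enqueue the message and return immediately.
  def !(msg: A): Unit = mailbox.put(Some(msg))

  // Processes everything already queued, then stops the worker.
  def stop(): Unit = {
    mailbox.put(None)
    worker.join()
  }
}
```

Because only the worker thread runs the handler, state owned by the actor needs no locking, which is the property the model is built around.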

To add to the points mentioned above, the fact that in Erlang the actor model is the only way you can program makes your code scalable from the get-go. Erlang processes are lightweight, and you can spawn 10-100K of them on one machine (I don't think you can do that with Python); this changes the way you approach problems. For example, in our product we parse web server logs with Erlang and spawn an Erlang process to handle each line. That way, if one log line is corrupted, or the process that handles it crashes, nothing happens to the other ones.
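The per-line isolation described above can be approximated in Scala with one Future per line, although Futures are not Erlang processes: they share a heap and have no supervisors. In this sketch the log format ("<path> <status>") and all names are invented for illustration. A failure while parsing one line is captured as a Failure value and does not affect the other lines.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Success, Try}

// Hypothetical example, not taken from any library.
object PerLineIsolation {
  // Assumed line format: "<path> <status>"; throws on malformed input.
  def parseStatus(line: String): Int = line.split(" ")(1).toInt

  // One Future per line; each failure is kept as a value instead of propagating.
  def parseAll(lines: Seq[String]): Seq[Try[Int]] = {
    val futures = lines.map { line =>
      Future(parseStatus(line)).transform[Try[Int]](result => Success(result))
    }
    Await.result(Future.sequence(futures), 10.seconds)
  }
}
```

For the input Seq("/a 200", "garbage", "/b 404"), the first and last entries come back as Success(200) and Success(404), and the middle one as a Failure.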
Another difference is that when you start using OTP you get process supervisors, and you can link processes so that if one terminates, all the others do too.
Other than that, Erlang has some other nice features (which can be found in other languages through libraries, but again, here they're baked in) like pattern matching and hot deploy.

No, there is nothing language-specific about the Actor Model. In fact, you already mention Scala in your question, where actors are not part of the language but are instead implemented as a library. (Three competing libraries, actually.)
However, just like Functional Programming or Object-Oriented Programming, having direct support for Actor Programming, or at least support for some abstractions that make it easier to implement, in the language will lead to a very different programming experience. Anyone who has ever done Functional Programming or Object-Oriented Programming in C will probably understand this.

Scala actor memory leaks: are they as bad as they were, or improving?

I am currently studying Scala 2.8 using Programming in Scala, 2nd edition.
But I am beginning to get really concerned about posts like this: Clojure vs Scala
Is Scala really this bad about memory leaks? This is not the first report I have heard about issues with actors and memory leaks.
Is it that bad? Are new versions fixing it in reasonable time? Is Akka going to solve all of it if or when it's merged?
Because seeing big issues with one of scala biggest strong points (at least for me Erlang like actors are one of the major candies of the lang) is really a major drawback if they are not able to fix them and improve on top of it.

I know of people using huge numbers of actors, so I'm pretty sure memory leaks are not widespread.
Did Scala Actors have memory leaks back in 2009 (Scala 2.7.x)? Yes, they did. For example, SI-1801 and SI-1948.
Right now, there are three tickets open on memory leaks that I could find: SI-3467, SI-3920 and SI-3921.
I do take issue with one comment you made, however:
one of scala biggest strong points (at least for me Erlang like actors
are one of the major candies of the lang)
Actors are NOT part of the language! They are a library! That is the whole point of Scala, it is the very meaning of "scalable" from which the name Scala came: that you can add stuff like this through libraries.
There are, right now, four different actor implementations in Scala: main library, Scalaz, Lift and Akka. There's absolutely no reason for you to tie yourself to the standard library one. In fact, one of the problems with the actors in the main library is that they were written more to prove that one could do it than to solve real problems.
If you want to use actors, use Akka. You can use it right now. Hell, you can even use it with Java, if you're into syntactic masochism. Akka is a superb library, which goes far beyond simply providing actors and into providing all the supporting tools to make them useful (like supervisors and load balancers), plus other tools to fully support concurrency, like Agents (Clojure-style), STM (Multiverse-based), integration with Spring, Camel, AMQP, etc.
Scala's strength is making it possible to seemingly extend it through libraries. If you limit yourself to what's on the standard library, you are throwing that away.

You should try Akka. It's really robust, lightweight and tunable. For instance, you can bound mailbox sizes (and choose what to do when mailboxes are full).

I don't know Scala internals much, but I would guess that Scala's actors implementation is queuing every message without limit.
If the actors don't pull from the queue fast enough, the queue grows and consumes memory.
I guess a less memory-hungry implementation would limit the number of messages queued at once, and thus consume less memory (but it would also block message senders while the queue is full).
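The bounded-queue idea can be sketched with a JDK ArrayBlockingQueue standing in for an actor mailbox. This is not how Scala Actors or Akka implement their mailboxes; the object and method names are hypothetical. The queue accepts messages up to a fixed capacity and then either rejects them (offer) or blocks the sender (put).

```scala
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical stand-in for a bounded actor mailbox.
object BoundedMailboxSketch {
  // A mailbox that holds at most `capacity` pending messages.
  def newMailbox(capacity: Int): ArrayBlockingQueue[String] =
    new ArrayBlockingQueue[String](capacity)

  // Non-blocking send: returns false when the mailbox is full.
  def trySend(mailbox: ArrayBlockingQueue[String], msg: String): Boolean =
    mailbox.offer(msg)
}
```

With a capacity of 2, the third trySend returns false while nothing is consuming; calling mailbox.put instead would block the sender until space frees up, which is the trade-off mentioned above.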