Shared state in functional application - scala

Okay, I'm trying to understand how to program without variables.
I've managed a simple tic-tac-toe game, but it depended on passing the state through a tail-recursive loop and, as far as I can tell, would only work in a single thread. An example can be seen at https://codereview.stackexchange.com/questions/187339/scala-tic-tac-toe-no-mutable-state
Now if I were to convert the same code into either a webapp or even a GUI program, I can't see how the state could be shared across threads without having a mutable reference at the top level.
Even ScalaFX uses a variable for its PrimaryStage, which I suspect is to provide such a top-level reference.
How would one go about using shared state between two or more consumers without using variables? For instance tic tac toe as above or a web application that accepted posts of text and had an endpoint stating how many times an individual word was posted.
related:
Java: state sharing between threads in functional programming
Functional way to implement a thread safe shared counter
However, the solutions to these all seem to require some mutable variable: an atomic integer in one and STM in the other.
Is there a standard pattern for this type of work?
After reading How can you do anything useful without mutable state? and How Functional Programming addresses concurrent increment/decrement operations invoked by different users, I am pretty sure that this is not possible. There does need to be a reference to mutable state.
If this is the case, how can the mutable state be minimised?
What patterns do people use in Scala web applications with internal state?
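One common way to keep the mutable part small is a single atomic reference that holds an immutable value, with every change expressed as a pure function. The sketch below applies that idea to the word-count example from the question. The names WordCounts, record, post and count are made up for illustration, and the HTTP layer is left out entirely.

```scala
import java.util.concurrent.atomic.AtomicReference

object WordCounts {
  // Pure function: returns a new map and never mutates its input.
  def record(counts: Map[String, Int], text: String): Map[String, Int] =
    text.split("\\s+").filter(_.nonEmpty).foldLeft(counts) { (acc, word) =>
      acc.updated(word, acc.getOrElse(word, 0) + 1)
    }

  // The only mutable cell in the program; it holds an immutable Map.
  private val state = new AtomicReference(Map.empty[String, Int])

  // Would be called by a (hypothetical) POST handler.
  // updateAndGet retries the pure function if another thread wins the race.
  def post(text: String): Unit = {
    state.updateAndGet(counts => record(counts, text))
    ()
  }

  // Would be called by a (hypothetical) GET handler.
  def count(word: String): Int = state.get().getOrElse(word, 0)
}
```

All the logic lives in record, which can be tested without any threads; the AtomicReference is the one place where sharing happens.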

Related

Pattern(s) sought for avoiding "action at a distance"

I'm working on a complex and very large web application. Some of the classes within said application require execution of various methods in far-away objects, and I keep stumbling into bugs related to the "action at a distance" antipattern.
Example 1: Some of the classes require execution of daily "cleanup" methods, such as reaper(), cleanup(), send_daily_status_messages(), etc.
Example 2: Some classes mutate the app's state, and require a far-away object to perform a refresh() of its own state.
Example 3: Going back to Example 1, some objects spread throughout the app provide various bits of content to send_daily_status_messages().
To address this, our team has created an Events class that centralizes all of these calls. However we're finding the Events class itself to be a bit too "distant", in the sense that sometimes we make changes in the distributed objects, forget to make changes to the calls within the Events class, and then see bugs.
I'm wondering if there are any better patterns out there?
One thought: For objects to "register" themselves to some sort of dynamic Events class upon initialization. That would keep the invocation code near each object. Objects could maybe even also create different Events?
Lastly, this is for a Perl-based web application that uses Moose. So any Perl-specific Moose-friendly recommendations, including CPAN recommendations, would be most appreciated!
The common pattern that sounds like what you're talking about is event dispatch. You can find many different takes on this pattern on CPAN, such as: Event::Distributor, Beam::Emitter, Mixin::Event::Dispatch, Mojo::EventEmitter (which I've extracted to a role). You have some object which is either an event dispatcher or has an event dispatching role applied to it, and everything can subscribe to events, and when something emits an event all of the subscribers get their callback called.
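To make the subscribe/emit shape concrete, here is a minimal sketch of the pattern. It is written in Scala only to match the other examples on this page, not in Perl, and it does not reproduce the API of any of the CPAN modules named above. Events, SessionStore and the "daily" event name are all hypothetical.

```scala
import scala.collection.mutable

// Hypothetical minimal dispatcher; real event libraries offer much more.
final class Events {
  private val handlers = mutable.Map.empty[String, Vector[() => Unit]]

  // Register a callback for a named event.
  def subscribe(event: String)(handler: () => Unit): Unit =
    handlers(event) = handlers.getOrElse(event, Vector.empty) :+ handler

  // Call every callback registered for the event, in subscription order.
  def emit(event: String): Unit =
    handlers.getOrElse(event, Vector.empty).foreach(h => h())
}

// Each object registers its own callback when it is constructed,
// so the wiring lives next to the code it invokes.
final class SessionStore(events: Events) {
  var reaped = 0
  events.subscribe("daily")(() => reaped += 1)
}
```

A scheduler would then only need to call events.emit("daily"), without knowing which objects care about it.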

change state in functional programming

I want to write a piece of code (in Functional Programming Style) which should keep track of whether a user is logged in or not. I suppose I have to do things in immutable way.
The following pseudocode seems functional. It takes a state and returns its negation. It has no side effects.
changeState(Boolean state){
return !state
}
Somewhere in my logic, once the user logs in (or logs out), I'll call the above function, passing it the current value of the logged-in status. I am unable to think of how to store the logged-in status in a functional way. This is wrong because currentLoggedState is a val:
val currentLoggedState = false;
//user entered login details correctly, change state
currentLoggedState = changeState(currentLoggedState)
How can I write such logic in a functional way?
State cannot be avoided. The point of functional programming is to enable better reasoning, not to turn your program into a purely mathematical model.
For example, a database is literally one giant state store. When you're creating, updating, deleting etc., you are manipulating some state.
There are environments (such as the Akka actor model) where state is unavoidable as well. Try implementing a non-trivial system with actors and I guarantee that you'll have your actors full of lists and hashmaps. At some point it becomes inevitable. For example, even the Coursera course offered by EPFL, called "Reactive programming" (it's been renamed to FP Design in Scala or something like that), had a section taught by Roland Kuhn himself that involved working with actors, and throughout the course assignments there was a shitload of state. I'm saying this to let you know that there are authoritative people in the Scala community saying that sometimes state cannot be avoided.
In your situation, it's best if you can push it to Redis or a similar store, so that state is not present in the code itself (the only mutability would be in the persistence layer / storage).
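Wherever the state ends up being stored, the transition logic itself can stay pure. As a rough sketch (the Event type and the names next and replay are made up here and do not come from the question), the current login status can be derived by folding over the events seen so far, so no val is ever reassigned:

```scala
object LoginState {
  sealed trait Event
  case object LoggedIn extends Event
  case object LoggedOut extends Event

  // Pure transition: the event alone decides the new status;
  // the previous status is only there to fit the fold signature.
  def next(loggedIn: Boolean, event: Event): Boolean = event match {
    case LoggedIn  => true
    case LoggedOut => false
  }

  // Replaying the events from "logged out" yields the current status.
  def replay(events: List[Event]): Boolean =
    events.foldLeft(false)(next)
}
```

The list of events still has to come from somewhere stateful (a request loop, a session store, Redis), which is the point the answer above makes.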

Akka and singleton actors

I've recently started messing around with Akka's actors and HTTP modules. However, I've stumbled upon a rather annoying little quirk, namely, creating singleton actors.
Here are two examples:
1)
I have an in-memory cache. My service is quite small (it's an app, rather), so I really like this in-memory model. I can hold most information relevant to the user in a Map (well, a map of lists, but still, quite an easy structure to reason about) and I don't get the overhead and complexity of a Redis, Geode or Aerospike.
The only problem is that this in-memory cache can be modified by multiple sources, and said modifications must be synchronous. Instead of synchronizing all 3 access methods for this structure (e.g. by building a message queue or implementing locks), I thought I'd just wrap the structure and its access methods into an actor: built-in message queue, easy receive->send logic, and if things scale up it will be very easy to replace with a DA actors over a dedicated in-memory db.
2) I have a "Service" layer that should be used to dispatch actors for various jobs (access the database, access the in-memory cache, do this computation with data and deliver the result to the user... etc).
It makes sense for this Service layer to be a "singleton" of sorts, a closure over some functions, since it does nothing that's blocking or CPU/memory intensive in any way; it simply assigns tasks further down the line (e.g. decides how many actors/threads/w.e. should be created and where a request should go).
However, this thing would require either:
a) Making both objects singleton actors or
b) Making both objects actual "objects" (as in the Scala object notation that designates a single named singleton with functions that have closures over its scope)
There are plenty of problems with b), namely that the service layer will have to get an actor system "passed" to it (and I'm not sure that's a best practice) in order to create actors. Rather than creating its own children, it will create children using the global actor system, and the messaging and monitoring logic will be a lot more awkward and unintuitive. Also, the in-memory cache will not have the advantage of the built-in message queue (I'm not saying it's hard to implement one, but this seems like one of those situations where one goes "Oh, jolly, it's good that I have actors and I don't have to spend time implementing and testing this code").
a) seems to have the problem of being generally speaking poorly documented and unadvised in the akka documentation. I mean:
http://doc.akka.io/docs/akka/2.4/scala/cluster-singleton.html
Look at this shit: half of the docs are warning against using it, it was its own dependency, and quite frankly it's very hard to read for a poor sod like me who hasn't set foot in the functional & concurrent programming ivory tower.
So, ahm. Could any of you guys explain to me why it's bad to use singleton actors? How do you design singletons if they can't be actors? Is there any way to design singleton actors that won't cause a lot of damage down the line? Is the whole "service" model of having "global" services that are called rather than instantiated "un-Akka-like"?
Just to clarify the documentation, they're not warning against using it. They're warning that there are circumstances in which using a singleton will cause problems, which are expected given the circumstances. They mention the following situations:
If the singleton is a performance bottleneck. This makes sense. If everything relies on a single object that does work slowly, everything will be slow.
If the actor needs to be non-stop available, you'll run into problems if the singleton ever goes down, because those messages can't just be handled by another instance. It will take some amount of time to re-start the singleton before its work can be resumed.
The biggest problem happens if you have auto-downing turned on. Auto-downing is a policy by which an unreachable node is assumed to be down, and removed from the network. If you do this, but the node is not actually down but just unreachable due to a network partition, both sides of the partition will decide that they're the surviving nodes and create their own singletons. So now you have two singletons. Which is, of course, not what you want from a singleton. But you should never use auto-downing outside of testing anyway. It's a terrible recovery strategy that was included for completeness and convenience in testing.
So I don't read that as recommending against using it. Just being clear about the expected pitfalls if you do use it, based on the nature of the structure.
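For the in-memory cache from the question, the property being relied on is that an actor processes one message at a time. The sketch below imitates just that property with a single-threaded executor from the standard library. It is not Akka code and has none of Akka's supervision or clustering; CacheWorker, add and get are hypothetical names.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Stand-in for an actor: every access to the map runs on one thread, in arrival order.
final class CacheWorker {
  private val executor = Executors.newSingleThreadExecutor()
  private implicit val mailbox: ExecutionContext =
    ExecutionContext.fromExecutor(executor)

  // Only ever touched from the mailbox thread, so no locks are needed.
  private var cache = Map.empty[String, List[String]]

  // Prepends an item to the user's list.
  def add(user: String, item: String): Future[Unit] = Future {
    cache = cache.updated(user, item :: cache.getOrElse(user, Nil))
  }

  // Reads the user's list, newest item first.
  def get(user: String): Future[List[String]] = Future {
    cache.getOrElse(user, Nil)
  }

  def shutdown(): Unit = executor.shutdown()
}
```

Creating exactly one CacheWorker at startup and passing it to whoever needs it gives the "singleton" behaviour within a single JVM; the cluster singleton discussed above addresses the harder problem of having one instance across several nodes.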

Is the actor model not an anti-pattern, as the fire-and-forget style forces actors to remember a state?

When learning Scala, one of the first things I learned was that every function returns something. There is no "void" function/method as there is, for instance, in Java. Thus many Scala functions are true functions in the mathematical sense, and objects can remain largely stateless.
Now I learned that the actor model is a very popular model among functional languages like Scala. However, actors promote a fire-and-forget style of programming, and callers usually don't expect callees to directly reply to messages (except when using the "ask"/"?"-method). Therefore, actors need to remember some sort of state.
Am I right assuming that the actor model is more like a trade-off between scalability and maintainability (due to its statefulness), and could sometimes even be considered an anti-pattern?
Yes you're essentially right (I'm not quite sure what you have in mind when you say scalability vs maintainability).
Actors are popular in Scala because of Akka (which presumably is in turn popular because of the support it gets from Lightbend). It is not, however, the case that actors are overwhelmingly popular in general in the functional programming world (although implementations exist for all the languages I'm thinking of). Below are my vastly simplified impressions (so take them with the requisite amount of salt) of two other FP language communities, both of which use actors (far?) less frequently than Scala does.
The Haskell community tends to use either STM or channels (often in an STM context). Straight-up MVars also get used surprisingly often.
The Clojure community sometimes touts its own built-in version of STM, but its flagship concurrency model is really core.async, which is at its heart again channels.
As an aside, STM, channels, and actors can all be layered upon one another; it's sort of weird to compare them as if they were mutually exclusive approaches. In practice, though, it's rare to see them all used in tandem.
Actors do indeed involve state (and in the case of Akka skirt type safety) and as a result are very expressive and can pretty much do anything concurrency-wise. In this way they're similar to side-effectful functions, which are more expressive than pure functions. Indeed actors in a way are the pure essence of OO, with all its pros and cons.
As such there is a sizable chunk of the Scala community that would say yes, if most of the time when you face concurrency issues, you're using actors, that's probably an anti-pattern.
If you can, try to get away with just using Futures or scalaz.concurrent.Tasks. In return for less expressiveness you get more composability.
If your problem naturally lends itself to a single, global state (e.g. in the form of global invariants that you want to enforce), think about STM. In the Scala community, although an STM library exists, my impression is that STM is usually emulated by using actors.
If your concurrency problems mainly revolve around streaming multiple sources of data, think about using one of Scala's streaming libraries.
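As a small illustration of the first suggestion, the sketch below combines two independent computations with plain scala.concurrent Futures. Each one returns a value rather than updating shared state, which is what makes them easy to compose. wordCount, charCount and summary are invented for this example.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical independent computations; neither touches shared state.
  def wordCount(text: String): Future[Int] =
    Future(text.split("\\s+").count(_.nonEmpty))

  def charCount(text: String): Future[Int] =
    Future(text.length)

  // Start both Futures first so they can run concurrently, then combine the results.
  def summary(text: String): Future[String] = {
    val words = wordCount(text)
    val chars = charCount(text)
    for {
      w <- words
      c <- chars
    } yield s"$w words, $c chars"
  }
}
```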
Actors are specifically a tool in the toolbox for handling and distributing state. So yes, they should have state; if they don't, then you could just use Futures.
Please note, however, that Actors (at least Akka Actors) handle distribution (running location-transparently on multiple nodes), which neither functions nor Futures are able to do. The concurrency aspects of Actors are a result of them handling the more complex case: networking. In that sense, Actors unify the remote case with the local case by making the remote case first-class. And as it turns out, on networks messaging is exactly what you can both count and build on if you want reliable, resilient and also fast systems.
Hope this answers the "big picture" part of your question.

Scala immutable collections cannot be shared without synchronization?

From the «Learning concurrent programming in Scala» book:
In current versions of Scala (2.11.1), however, certain collections that are
deemed immutable, such as List and Vector, cannot be shared without
synchronization. Although their external API does not allow you to
modify them, they contain non-final fields.
Could anyone demonstrate this with a small example? And does this still apply to 2.11.7?
The behavior of changes made in one thread when viewed from another is governed by the Java Memory Model. In particular, these rules are extremely weak when it comes to something like building a collection and then passing the built-and-now-immutable collection to another thread. The JMM does not guarantee that the other thread won't see an earlier view where the collection was not fully built!
Since synchronized blocks enforce an ordering, they can be used to get a consistent view if they're used on every single operation.
In practice, though, this is rarely actually necessary. On the CPU side, there is typically a memory barrier operation that can be used to enforce memory consistency (i.e. if you write the tail of your list and then pass a memory barrier, no other thread can see the tail un-set). And in practice, JVMs usually have to implement synchronized by using memory barriers. So one could hope that you could just pass the created list within a synchronized block, trusting that a memory barrier would be issued, and everything thereafter would be fine.
Unfortunately, the JMM doesn't require that it be implemented in this way (and you can't assume that the memory-barrier-like behavior of object creation will actually be a full memory barrier that applies to everything in that thread as opposed to simply the final fields of that object), which is both why the recommendation is what it is, and why it's not fixed (yet, anyway) in the library.
For what it's worth, on x86 architectures, I've never observed a problem if you hand off the immutable object within a synchronized block. I have observed problems if you try to do it with CAS (e.g. by using the java.util.concurrent.atomic classes).
As an addition to the excellent answer from Rex Kerr:
it should be noted that most common use cases of immutable collections in a multithreading context are not affected by this problem. The only situation where this might affect you is when you do something that you probably should not do in the first place.
E.g. you have a variable var x: Vector[Int], which you write from one thread A and read from another thread B.
If you mark x with @volatile, there will be no problem, since the volatile write introduces a memory barrier. So you will never be able to observe the Vector in an inconsistent state. The same is true when using a synchronized { } block when writing and reading, or when using java.util.concurrent.atomic.AtomicReference.
If you don't mark x with @volatile, you might observe the vector in an inconsistent state (not just wrong elements, but internally inconsistent!). But in that case your code is arguably broken to begin with. It is completely undefined when you will see the changes from A in B.
You might see them
immediately
after there is a memory barrier somewhere else in your program
not at all
depending on the architecture you're running on, the phase of the moon, whatever. So as Viktor Klang put it: "Unsafe publication is unsafe..."
Note that if you use a higher level concurrency framework such as akka actors, it is also guaranteed that receivers of messages can not see immutable collections in an inconsistent state.
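The two safe options mentioned above, a volatile field and an AtomicReference, look roughly like this. It is only a sketch: Publication, publish and publishFromNewThread are hypothetical names, and the unsafe variant (a plain var) is deliberately left out because its failure cannot be reproduced reliably.

```scala
import java.util.concurrent.atomic.AtomicReference

object Publication {
  // Safe: a volatile write happens-before any later volatile read of the same field.
  @volatile var shared: Vector[Int] = Vector.empty

  // Equally safe alternative: get/set on AtomicReference have volatile semantics.
  val sharedRef = new AtomicReference(Vector.empty[Int])

  def publish(xs: Vector[Int]): Unit = {
    shared = xs
    sharedRef.set(xs)
  }

  // Writes from a separate thread, playing the role of thread A in the answer.
  def publishFromNewThread(xs: Vector[Int]): Thread = {
    val writer = new Thread(() => publish(xs))
    writer.start()
    writer
  }
}
```

A reader on another thread that sees a non-empty value through either reference is guaranteed to see a fully built Vector.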