Concurrency Primitives in Scala

What would be good concurrency primitives for accessing an object that is CPU-bound (no I/O or networking)?
For example, there is a FooCounter that has methods get(), set() and inc() around a var counter: Int, and that is shared among thousands or millions of threads.
object FooCounter {
  var counter: Int = 0

  def get() = counter
  def set(value: Int) = { counter = counter + value }
  def inc() = counter += 1
}
I found that most literature on Scala concurrency is oriented around Akka. To me it seems that the actor model is not really suitable for this task.
There are also Futures/Promises, but they are geared toward one-off asynchronous results rather than shared mutable state.
In Java there are good primitives for this, the atomic classes (built on compare-and-swap), which are pretty robust and decent for this task.
Update:
I can use the Java primitives for this simple task. However, my objective is to use and learn the Scala concurrency model on this simple example.

You should be aware that your implementation of FooCounter is not thread-safe. If multiple threads simultaneously invoke the get, set and inc methods, there is no guarantee that the count will be accurate. You should use one of the following utilities to implement a correct concurrent counter:
the synchronized statement (Scala's synchronized statement is similar to the one in Java)
atomic variables
STMs (for example, ScalaSTM)
potentially, you could use an Akka actor that is a counter, but note that this is not the most straightforward application of actors
Other Scala concurrency utilities, such as futures and promises, parallel collections or reactive extensions, are not the best fit for implementing a concurrent counter, and have different usages.
You should know that Scala in some cases reuses the Java concurrency infrastructure. For example, Scala does not provide atomic variables of its own, since the Java atomic classes already do the job. Instead, Scala aims to provide higher-level concurrency abstractions, such as actors, STMs and asynchronous event streams.
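For instance, here is a minimal sketch of a thread-safe counter written with the synchronized statement (the method names mirror the ones from your question):

object SafeFooCounter {
  private var counter: Int = 0

  def get(): Int = this.synchronized { counter }
  def set(value: Int): Unit = this.synchronized { counter = value }
  def inc(): Unit = this.synchronized { counter += 1 }
}

Every access goes through the same monitor, so reads and updates cannot interleave.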
Benchmarking
To assess the running time and efficiency of your implementation, a good choice is ScalaMeter. The online documentation at the website contains detailed examples on how to do benchmarking.
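For a quick inline measurement you could start from something like the following sketch; the builder methods are from memory of the ScalaMeter inline benchmarking API, so double-check them against the documentation:

import org.scalameter._

// measures a million increments of the counter sketched above
val time = withWarmer(new Warmer.Default) measure {
  (0 until 1000000).foreach(_ => SafeFooCounter.inc())
}
println(s"Increments took: $time")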
Documentation and concurrency libraries
While Akka is the most popular and best documented Scala concurrency utility, there are many other high quality implementations, some of which are more appropriate for different tasks:
futures and promises
parallel collections and ScalaBlitz
software transactional memory
reactive extensions
I think Learning Concurrent Programming in Scala might be a good book for you. It contains detailed documentation of different concurrency styles in Scala, as well as directions on when to use which. Chapters 5 and 9 also deal with benchmarking.
Specifically, a concurrent counter utility is described, optimized, and made scalable in Chapter 9 of the book.
Disclaimer: I'm the author.

You can use all the Java synchronization primitives, for example AtomicInteger for counting.
For more complicated tasks I personally like the scala-stm library: http://nbronson.github.io/scala-stm/
With STM your example will look like this
import scala.concurrent.stm._

object FooCounter {
  private val counter = Ref(0)

  def get() = atomic { implicit txn => counter() }
  def set(value: Int) = atomic { implicit txn => counter() = counter() + value }
  def inc() = atomic { implicit txn => counter() = counter() + 1 }
}
However, for this simple example I would stick with the Java primitives.

You are right about the "Akka orientation" in Scala: all in all, I think there is a considerable overlap between the Scala language developer community and the Akka one. Recent versions of Scala rely on Akka actors for concurrency, but frankly I cannot see anything bad in that.
Regarding your question, to cite the "Akka in Action" book:
"Actors are great for processing many messages, capturing state and reacting with different behaviors based on the messages they receive"
and
"Futures are the tool to use when you would rather use functions and don't really need objects to do the job"
A Future is a placeholder for a value that is not yet available, so even though it is used for non-blocking tasks, it is useful for more than that, and I think it could be your choice here.
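As a minimal sketch of that idea, a Future is just a handle to a value that some computation will eventually produce:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// the computation runs on another thread; `sum` is a placeholder for its result
val sum: Future[Int] = Future {
  (1 to 1000000).sum
}
sum.foreach(result => println(s"Computed: $result"))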

If you're not willing to use java.util.concurrent directly, then your most elegant solution may be using Akka Agents.
import scala.concurrent.ExecutionContext.Implicits.global
import akka.agent.Agent
class FooCounter {
  val counter = Agent(0)

  def get() = counter()
  def set(v: Int) = counter.send(v)
  def inc() = counter.send(_ + 1)
  def modify(f: Int => Int) = counter.send(f)
}
This is asynchronous, and guarantees sequentiality of operations. The semantics are nice, and if you need more performance, you can always change it to the good old java.util.concurrent analogue:
import java.util.concurrent.atomic.AtomicInteger
class FooCounter {
  val counter = new AtomicInteger(0)

  def get() = counter.get()
  def set(v: Int) = counter.set(v)
  def inc() = counter.incrementAndGet()

  // Classic CAS retry loop: re-read the value and retry until compareAndSet succeeds.
  def modify(f: Int => Int): Unit = {
    var done = false
    while (!done) {
      val oldVal = counter.get()
      done = counter.compareAndSet(oldVal, f(oldVal))
    }
  }
}

One possible solution to your task is actors. It is not the fastest solution, but it is safe and simple:
import akka.actor.Actor

sealed trait Command
case object Get extends Command
case class Inc(value: Int) extends Command

class Counter extends Actor {
  var counter: Int = 0

  def receive = {
    case Get =>
      sender ! counter
    case Inc(value) =>
      counter += value
      sender ! counter
  }
}
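A rough sketch of how it could be wired up and queried (the system and actor names are just for illustration):

import akka.actor.{ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

val system  = ActorSystem("counter-system")
val counter = system.actorOf(Props[Counter], "counter")
implicit val timeout: Timeout = Timeout(1.second)

counter ! Inc(1)                          // fire-and-forget increment
val current = (counter ? Get).mapTo[Int]  // ask pattern: a Future with the current value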

java.util.concurrent.atomic.AtomicInteger works very nicely with Scala, makes counting thread-safe, and you don't have to add any additional dependencies to your project.

Related

Using scala-cats IO type to encapsulate a mutable Java library

I understand that, generally speaking, there is a lot to say about deciding what one wants to model as an effect. This discussion is introduced in Functional Programming in Scala in the chapter on IO.
Nonetheless, I have not finished the chapter; I was just browsing it end to end before tackling it together with Cats IO.
In the meantime, I have a bit of a situation with some code I need to deliver soon at work.
It relies on a Java library that is all about mutation. That library was started a long time ago and, for legacy reasons, I don't see it changing.
Anyway, long story short: is modeling every mutating function as IO actually a viable way to encapsulate a mutating Java library?
Edit 1 (at request, I add a snippet)
Reading data into a model mutates the model rather than creating a new one. I would contrast Jena with Gremlin, for instance, which is a functional library over graph data.
def loadModel(paths: String*): Model =
  paths.foldLeft(ModelFactory.createOntologyModel(new OntModelSpec(OntModelSpec.OWL_MEM)).asInstanceOf[Model]) {
    case (model, path) =>
      val input = getClass.getClassLoader.getResourceAsStream(path)
      val lang  = RDFLanguages.filenameToLang(path).getName
      model.read(input, "", lang)
  }
That was my Scala code, but the Java API, as documented on the website, looks like this:
// create the resource
Resource r = model.createResource();

// add the property
r.addProperty(RDFS.label, model.createLiteral("chat", "en"))
 .addProperty(RDFS.label, model.createLiteral("chat", "fr"))
 .addProperty(RDFS.label, model.createLiteral("<em>chat</em>", true));

// write out the Model
model.write(System.out);

// create a bag
Bag smiths = model.createBag();

// select all the resources with a VCARD.FN property
// whose value ends with "Smith"
StmtIterator iter = model.listStatements(
    new SimpleSelector(null, VCARD.FN, (RDFNode) null) {
        public boolean selects(Statement s) {
            return s.getString().endsWith("Smith");
        }
    });

// add the Smiths to the bag
while (iter.hasNext()) {
    smiths.add(iter.nextStatement().getSubject());
}
So, there are three solutions to this problem.
1. Simple and dirty
If all the usage of the impure API is contained in single / small part of the code base, you may just "cheat" and do something like:
def useBadJavaAPI(args): IO[Foo] = IO {
// Everything inside this block can be imperative and mutable.
}
I said "cheat" because the idea of IO is composition, and a big IO chunk is not really composition. But, sometimes you only want to encapsulate that legacy part and do not care about it.
2. Towards composition.
Basically, the same as above but dropping some flatMaps in the middle:
// Instead of:
def useBadJavaAPI(args): IO[Foo] = IO {
  val a = createMutableThing()
  a.add(args)
  val b = a.bar()
  b.computeFoo()
}

// You do something like this:
def useBadJavaAPI(args): IO[Foo] =
  for {
    a      <- IO(createMutableThing())
    _      <- IO(a.add(args))
    b      <- IO(a.bar())
    result <- IO(b.computeFoo())
  } yield result
There are a couple of reasons for doing this:
Because the imperative / mutable API is not contained in a single method / class but spread across a couple of them, and encapsulating the small steps in IO helps you reason about it.
Because you want to slowly migrate the code to something better.
Because you want to feel better about yourself :p
3. Wrap it in a pure interface
This is basically what many third-party libraries (e.g. Doobie, fs2-blobstore, neotypes) do: wrap a Java library in a pure interface.
Note that the amount of work to be done is much greater than in the previous two solutions. As such, this is worth it if the mutable API is "infecting" many places in your codebase, or worse, multiple projects; if so, it makes sense to do this and publish it as an independent module.
(It may also be worth publishing that module as an open-source library; you may end up helping other people and receiving help from others as well.)
Since this is a bigger task, it is not easy to just provide a complete answer covering everything you would have to do; it may help to see how those libraries are implemented and to ask more questions, either here or in the Gitter channels.
But, I can give you a quick snippet of how it would look like:
// First define a pure interface of the operations you want to provide.
trait PureModel[F[_]] { // You may forget about the abstract F and just use IO instead.
  def op1: F[Int]
  def op2(data: List[String]): F[Unit]
}

// Then in the companion object you define factories.
object PureModel {
  // If the underlying Java object has a close or release action,
  // use a Resource[F, PureModel[F]] instead.
  def apply[F[_]](args)(implicit F: Sync[F]): F[PureModel[F]] = ???
}
Now, how to create the implementation is the tricky part.
Maybe you can use something like Sync to initialize the mutable state.
import cats.syntax.functor._ // for .map on a generic F[_]

def apply[F[_]](args)(implicit F: Sync[F]): F[PureModel[F]] =
  F.delay(createMutableState()).map { mutableThing =>
    new PureModel[F] {
      override def op1: F[Int] = F.delay(mutableThing.foo())
      override def op2(data: List[String]): F[Unit] = F.delay(mutableThing.bar(data))
    }
  }
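Hypothetical usage with F fixed to IO (`args`, `createMutableState`, `foo` and `bar` are the same placeholders as in the snippets above):

import cats.effect.IO

val program: IO[Int] =
  PureModel[IO](args).flatMap { model =>
    model.op2(List("a", "b")).flatMap(_ => model.op1)
  }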

What is the advantage of Free monads over plain old traits with IO monads?

So I've been getting deeper into FP concepts, and I liked the concept of purity enclosed in the IO monad. Then I read this and thought that the IO monad is indeed not as decoupled as using free monads.
So I started doing my stuff using those concepts, and then I realized that traits achieve the same purpose of separating structure from execution. Even worse, using free monads has a lot of limitations, like error handling and passing context bounds and implicit parameters into the interpreter/implementation.
So my question is: what is the advantage of using them? How do I solve the problems I just mentioned (implicit params and error handling)? Is the use of free monads limited to the academic realm, or can it be used in industry?
Edit: An example to explain my doubts
import cats.free.Free
import cats.free.Free._
import cats.{Id, ~>}
import scala.concurrent.Future

sealed trait AppOpF[+A]
case class Put[T](key: String, value: T) extends AppOpF[Unit]
case class Delete(key: String) extends AppOpF[Unit]
// I purposely had this extend AppOpF[T] and not AppOpF[Option[T]]
case class Get[T](key: String) extends AppOpF[T]

object AppOpF {
  type AppOp[T] = Free[AppOpF, T]

  def put[T](key: String, value: T): AppOp[Unit] = liftF[AppOpF, Unit](Put(key, value))
  def delete(key: String): AppOp[Unit] = liftF[AppOpF, Unit](Delete(key))
  def get[T](key: String): AppOp[T] = liftF[AppOpF, T](Get(key))

  def update[T](key: String, func: T => T): Free[AppOpF, Unit] = for {
    // How do I manage the error here, if there's nothing saved under that key?
    t <- get[T](key)
    _ <- put[T](key, func(t))
  } yield ()
}

object AppOpInterpreter1 extends (AppOpF ~> Id) {
  override def apply[A](fa: AppOpF[A]) = fa match {
    case Put(key, value) =>
      ???
    case Delete(key) =>
      ???
    case Get(key) =>
      ???
  }
}

// Another implementation, with a different context monad, ok, that's good
object AppOpInterpreter2 extends (AppOpF ~> Future) {
  override def apply[A](fa: AppOpF[A]) = fa match {
    case a @ Put(key, value) =>
      // What if I need a Json Writes or a ClassTag here??
      ???
    case a @ Delete(key) =>
      ???
    case a @ Get(key) =>
      ???
  }
}
A Free algebra and the IO monad serve the same purpose: to build a program as a pure data structure. If you compare Free with some concrete implementation of IO, IO would probably win. It will have more features and specialized traits that help you move fast and develop your program quickly. But it also means you have a major vendor lock-in on one implementation of IO. Whichever IO you choose, it will be a concrete IO library that may have performance issues, bugs or support problems; who knows. And changing your program from one vendor to another will cost you a lot because of this tight coupling between your program and the implementation.
A Free algebra, on the other hand, allows you to express your program without talking about your program's implementation. It separates your requirements from the implementation in a way that lets you test both easily and change them independently. As another benefit, Free allows you not to use IO at all: you can wrap standard Futures, Java's CompletableFuture or any other third-party concurrency primitive in it and your program will still be pure. In exchange, Free requires additional boilerplate (just as you showed in your example) and offers less flexibility. So the choice is yours.
There's also another way: tagless final. It's an approach that tries to balance the pros of both sides, providing less vendor lock-in while not being as verbose as a Free algebra. It is worth checking out.
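For illustration, here is a sketch of what a tagless-final version of your key-value algebra could look like; the Future-backed interpreter and its names are made up for this example, and I made get return an Option, which also addresses the missing-key error case from your update function:

import scala.concurrent.{ExecutionContext, Future}

// The operations are abstract in the effect F; there is no Free structure to lift into,
// and context bounds / implicit parameters can live directly on each interpreter.
trait KeyValueStore[F[_]] {
  def put[T](key: String, value: T): F[Unit]
  def delete(key: String): F[Unit]
  def get[T](key: String): F[Option[T]]
}

// One possible interpreter; a test interpreter could implement the same trait with Id.
class InMemoryStore(implicit ec: ExecutionContext) extends KeyValueStore[Future] {
  private val data = scala.collection.concurrent.TrieMap.empty[String, Any]

  def put[T](key: String, value: T): Future[Unit] = Future { data.update(key, value) }
  def delete(key: String): Future[Unit] = Future { data.remove(key); () }
  def get[T](key: String): Future[Option[T]] = Future { data.get(key).map(_.asInstanceOf[T]) }
}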

Scala Actors: Returning a Future of a type other than Any.

I am working my way through a book on Scala Actors, and I am running into a bit of a syntactical hangup. In practice, I tend to assign my variables and function definitions as such:
val v: String = "blahblahblah"
def f(n: Int): Int = n+1
including the (return) type of the item after its name. While I know this is not necessary, I have grown comfortable with this convention and find that it makes the code easier for me to understand.
That being said, observe the below example:
class Server extends Actor {
  def act() = {
    while (true) {
      receive {
        case Message(string) => reply("Good, very good.")
      }
    }
  }
}

def sendMsg(m: Message, s: Server): Future[String] = {
  s !! m
}
The above code produces an error at compile time, complaining that the server returned a Future[Any], as opposed to a Future[String]. I understand that this problem can be circumvented by removing the return type from sendMsg:
def sendMsg(m: Message,s: Server) = s !! m
However, this is not consistent with my style. Is there a way that I can specify the type of Future that the server generates (as opposed to Future[Any])?
Your problem is a lot deeper than just style: you get a Future[Any] because the compiler cannot statically know better—with the current Akka actors as well as with the now deprecated scala.actors. In the absence of compile-time checks you need to resort to runtime checks instead, as idonnie already commented:
(actorRef ? m).mapTo[String]
This will chain another Future to the original one, which is completed either with a String result, a ClassCastException if the actor was naughty, or a TimeoutException if the actor did not reply; see the Akka docs.
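For example, a helper in the spirit of your sendMsg, written against Akka actors (assuming the Message type from your question and an actor that replies with a String):

import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

def sendMsg(m: Message, server: ActorRef)(implicit timeout: Timeout): Future[String] =
  (server ? m).mapTo[String] // runtime-checked narrowing from Future[Any]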
There might be a way out soon, I’m working on an Akka extension to include statically typed channels, but that will lead to you having to write your code a little differently, with more type annotations.

Scala concurrency on iterators as queues

I'm not really sure of the correct terminology for my problem, so feel free to provide me with the right terms.
Say I have a process A, which outputs an iterator (lazy evaluation)
This produces Iterator[A]
I then have another process B, which maps the events returning
Iterator[B]
This continues for several more processes
Iterator[A] -> Iterator[B] -> Iterator[C] -> ---
Now eventually I evaluate this stream into a List[Z].
This saves me the memory hit of having a List[A] -> List[B] -> List[C] etc
Now I want to improve performance by introducing parallelisation, but I don't want to parallelise the evaluation of each element across the iterators, but rather each iterator stack. So in this case a thread for process A fills a Queue[A] for Iterator[A], a thread for process B takes from Queue[A], applies whatever mapping, and then adds to Queue[B] for Iterator[B] to read from.
Now, I have done this before in other languages by designing my own async queues; I was wondering what Scala has to solve this.
Here's a first-stab solution I made using an actor.
It's fully blocking, so maybe an implementation using futures could be developed:
import scala.actors.Actor
import scala.actors.Actor._

case class AsyncIterator[T](iterator: Iterator[T]) extends Iterator[T] {
  private val queue = new scala.collection.mutable.SynchronizedQueue[T]()
  private var end = !iterator.hasNext

  def hasNext() = {
    if (end) false
    else if (!queue.isEmpty) true
    else hasNext
  }

  def next() = {
    while (queue.isEmpty) {
      if (end) throw new Exception("blah")
    }
    queue.dequeue()
  }

  private val producer: Actor = actor {
    loop {
      if (!iterator.hasNext) {
        end = true
        exit
      } else {
        queue.enqueue(iterator.next)
      }
    }
  }
  producer.start()
}
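For comparison, here is a sketch of the same idea that blocks on a LinkedBlockingQueue instead of spinning; the class name and capacity parameter are made up, and it is only lightly thought through:

import java.util.concurrent.LinkedBlockingQueue

case class QueuedIterator[T](source: Iterator[T], capacity: Int = 1024) extends Iterator[T] {
  // None acts as an end-of-stream sentinel.
  private val queue = new LinkedBlockingQueue[Option[T]](capacity)
  private var current: Option[T] = None
  private var finished = false

  // Producer thread: drains the source iterator into the queue, then signals the end.
  new Thread(new Runnable {
    def run(): Unit = {
      source.foreach(e => queue.put(Some(e)))
      queue.put(None)
    }
  }).start()

  def hasNext: Boolean = {
    if (current.isEmpty && !finished) {
      current = queue.take() // blocks until an element or the sentinel arrives
      if (current.isEmpty) finished = true
    }
    current.nonEmpty
  }

  def next(): T = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    val result = current.get
    current = None
    result
  }
}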
Since you're open to alternative languages, how about Go?
There was a discussion recently about how to construct an event-driven pipeline, which would achieve the same thing as you describe but in a completely different way.
It's arguably easier to think about and design an event pipeline than it is to reason about lazy iterators because it becomes a data flow system in which the key question at each stage is 'what does this stage do with a single entity?' rather than 'how can I iterate efficiently over many entities?'
Once an event-driven pipeline has been implemented, the question of how to make it concurrent or parallel is moot - you've already done it.

Is there a FIFO stream in Scala?

I'm looking for a FIFO stream in Scala, i.e., something that provides the functionality of
immutable.Stream (a stream that can be finite and memorizes the elements that have already been read)
mutable.Queue (which allows adding elements to the FIFO)
The stream should be closable and should block access to the next element until the element has been added or the stream has been closed.
Actually I'm a bit surprised that the collection library does not (seem to) include such a data structure, since it is IMO a quite classical one.
My questions:
1) Did I overlook something? Is there already a class providing this functionality?
2) OK, if it's not included in the collection library, then it might be just a trivial combination of existing collection classes. However, I tried to write that trivial code, and my implementation still looks quite complex for such a simple problem. Is there a simpler solution for such a FifoStream?
import java.io.Closeable
import scala.collection.mutable.Queue

class FifoStream[T] extends Closeable {
  val queue = new Queue[Option[T]]

  lazy val stream = nextStreamElem

  private def nextStreamElem: Stream[T] = next() match {
    case Some(elem) => Stream.cons(elem, nextStreamElem)
    case None       => Stream.empty
  }

  /** Returns next element in the queue (may wait for it to be inserted). */
  private def next() = {
    queue.synchronized {
      if (queue.isEmpty) queue.wait()
      queue.dequeue()
    }
  }

  /** Adds new elements to this stream. */
  def enqueue(elems: T*) {
    queue.synchronized {
      queue.enqueue(elems.map(Some(_)): _*)
      queue.notify()
    }
  }

  /** Closes this stream. */
  def close() {
    queue.synchronized {
      queue.enqueue(None)
      queue.notify()
    }
  }
}
Paradigmatic's solution (slightly modified)
Thanks for your suggestions. I slightly modified paradigmatic's solution so that toStream returns an immutable stream (allowing repeatable reads), which fits my needs. Just for completeness, here is the code:
import collection.JavaConversions._
import java.util.concurrent.{LinkedBlockingQueue, BlockingQueue}

class FIFOStream[A](private val queue: BlockingQueue[Option[A]] = new LinkedBlockingQueue[Option[A]]()) {
  lazy val toStream: Stream[A] = queue2stream

  private def queue2stream: Stream[A] = queue take match {
    case Some(a) => Stream cons (a, queue2stream)
    case None    => Stream empty
  }

  def close() = queue add None
  def enqueue(as: A*) = queue addAll as.map(Some(_))
}
In Scala, streams are "functional iterators". People expect them to be pure (no side effects) and immutable. In your case, every time you iterate over the stream you modify the queue (so it's not pure). This can create a lot of misunderstandings, because iterating the same stream twice will give two different results.
That being said, you should rather use Java BlockingQueues than roll your own implementation. They are considered well implemented in terms of safety and performance. Here is the cleanest code I can think of (using your approach):
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue}
import scala.collection.JavaConversions._

class FIFOStream[A](private val queue: BlockingQueue[Option[A]]) {
  def toStream: Stream[A] = queue take match {
    case Some(a) => Stream cons (a, toStream)
    case None    => Stream empty
  }

  def close() = queue add None
  def enqueue(as: A*) = queue addAll as.map(Some(_))
}

object FIFOStream {
  def apply[A]() = new FIFOStream[A](new LinkedBlockingQueue[Option[A]]())
}
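Hypothetical usage, with a producer filling the stream from one thread while it is consumed on another:

val s = FIFOStream[Int]()

new Thread(new Runnable {
  def run(): Unit = {
    (1 to 5).foreach(i => s.enqueue(i))
    s.close()
  }
}).start()

s.toStream.foreach(println) // blocks as needed and terminates after close()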
I'm assuming you're looking for something like java.util.concurrent.BlockingQueue?
Akka has a BoundedBlockingQueue implementation of this interface. There are of course the implementations available in java.util.concurrent.
You might also consider using Akka's actors for whatever it is you are doing. Use Actors to be notified or pushed a new event or message instead of pulling.
1) It seems you're looking for a dataflow stream seen in languages like Oz, which supports the producer-consumer pattern. Such a collection is not available in the collections API, but you could always create one yourself.
2) The data flow stream relies on the concept of single-assignment variables (such that they don't have to be initialized upon declaration point and reading them prior to initialization causes blocking):
val x: Int
startThread {
  println(x)
}
println("The other thread waits for the x to be assigned")
x = 1
It would be straightforward to implement such a stream if single-assignment (or dataflow) variables were supported in the language (see the link). Since they are not a part of Scala, you have to use the wait-synchronized-notify pattern just like you did.
Concurrent queues from Java can be used to achieve that as well, as the other user suggested.