There is a TPriorityQueue in ZIO, which is unbounded. Is there a way to adapt it to become bounded?
If you don't care about performance too much you can do something quick and dirty like this:
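import zio.stm._
case class TPW(tpq: TPriorityQueue[Int]) {
  // Offer only while fewer than 50 elements are queued; the check and the offer run in one transaction.
  def offer(e: Int) = ZSTM.whenM(tpq.size.map(_ < 50))(tpq.offer(e))
  // .. other methods
}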
This would be a dropping queue: once it holds 50 elements, further offered elements are silently discarded.
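What makes the TPW wrapper correct is that the size check and the offer happen in the same STM transaction, so two concurrent offers cannot both see 49 elements and push the queue past the bound. As a rough analogue outside ZIO (this is not ZIO's API), here is a minimal sketch of the same drop-when-full policy over java.util.PriorityQueue, with a single lock standing in for the transaction; the class name BoundedDroppingPQ is hypothetical.

```scala
import java.util.PriorityQueue

// Hypothetical analogue of TPW: the size check and the insert happen under one lock,
// so concurrent offers cannot overshoot the bound.
final class BoundedDroppingPQ(capacity: Int) {
  private val pq = new PriorityQueue[Integer]()

  // Returns false when the element is dropped because the queue is full.
  def offer(e: Int): Boolean = synchronized {
    if (pq.size < capacity) { pq.offer(e); true } else false
  }

  // Removes and returns the smallest element, if any.
  def poll(): Option[Int] = synchronized { Option(pq.poll()).map(_.intValue) }

  def size: Int = synchronized { pq.size }
}

val q = new BoundedDroppingPQ(2)
val accepted = List(5, 1, 3).map(q.offer)
```

Here the third offer is dropped, and polling returns elements in priority order (1, then 5).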
What would be good concurrency primitives for accessing an object that is CPU-bound (without I/O and networking)?
For example, there's a FooCounter with methods get(), set() and inc() for a var counter: Int that is shared among thousands or millions of threads.
object FooCounter {
  var counter: Int = 0
  def get() = counter
  def set(value: Int) = { counter = value }
  def inc() = { counter += 1 }
}
I found that most literature on Scala concurrency is oriented around Akka. It seems to me that the actor model is not really suitable for this task.
There are also Futures/Promises, but they are geared towards blocking tasks.
In Java there's a good primitive, the atomic classes, which are lock-free (built on compare-and-set), pretty robust and decent for this task.
Update:
I can use Java primitives for this simple task. However, my objective is to use and learn the Scala concurrency model on this simple example.
You should be aware that your implementation of FooCounter is not thread-safe. If multiple threads simultaneously invoke the get, set and inc methods, there is no guarantee that the count will be accurate. You should use one of the following utilities to implement a correct concurrent counter:
synchronized statement (Scala synchronized statement is similar to the one in Java)
atomic variables
STMs (for example, ScalaSTM)
potentially, you could use an Akka actor that is a counter, but note that this is not the most straightforward application of actors
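A minimal sketch of the first option, the synchronized statement, applied to FooCounter. It assumes set is meant to assign the value and inc to increment it; every method takes the same monitor (the object's own lock), so increments coming from many threads are not lost.

```scala
// Sketch: FooCounter guarded by the object's own monitor.
// Assumes set should assign and inc should increment.
object FooCounter {
  private var counter: Int = 0
  def get(): Int = synchronized { counter }
  def set(value: Int): Unit = synchronized { counter = value }
  def inc(): Int = synchronized { counter += 1; counter }
}

// 8 threads x 10,000 increments; with synchronized, no update is lost.
val threads = (1 to 8).map(_ => new Thread(() => (1 to 10000).foreach(_ => FooCounter.inc())))
threads.foreach(_.start())
threads.foreach(_.join())
```

The trade-off is that every access, including reads, contends for one lock; atomic variables avoid that for a single counter.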
Other Scala concurrency utilities, such as futures and promises, parallel collections or reactive extensions, are not the best fit for implementing a concurrent counter, and have different usages.
You should know that Scala in some cases reuses the Java concurrency infrastructure. For example, Scala does not provide atomic variables of its own, since Java atomic classes already do the job. Instead, Scala aims to provide higher-level concurrency abstractions, such as actors, STMs and asynchronous event streams.
Benchmarking
To assess the running time and efficiency of your implementation, a good choice is ScalaMeter. The online documentation at the website contains detailed examples on how to do benchmarking.
Documentation and concurrency libraries
While Akka is the most popular and best documented Scala concurrency utility, there are many other high quality implementations, some of which are more appropriate for different tasks:
futures and promises
parallel collections and ScalaBlitz
software transactional memory
reactive extensions
I think Learning Concurrent Programming in Scala might be a good book for you. It contains detailed documentation of different concurrency styles in Scala, as well as directions on when to use which. Chapters 5 and 9 also deal with benchmarking.
Specifically, a concurrent counter utility is described, optimized, and made scalable in Chapter 9 of the book.
Disclaimer: I'm the author.
You can use all the Java synchronization primitives, like AtomicInteger for counting.
For more complicated tasks I personally like scala-stm library: http://nbronson.github.io/scala-stm/
With STM, your example would look like this:
import scala.concurrent.stm._

object FooCounter {
  private val counter = Ref(0)
  def get() = atomic { implicit txn => counter() }
  def set(value: Int) = atomic { implicit txn => counter() = value }
  def inc() = atomic { implicit txn => counter() = counter() + 1 }
}
However, for this simple example I would stick with the Java primitives.
You are right about the "Akka orientation" in Scala: all in all, I think there is a large overlap between the Scala language developer community and the Akka one. Since Scala 2.10, the standard library's own actors (scala.actors) have been deprecated in favour of Akka actors, but frankly I cannot see anything bad in this.
Regarding your question, to quote the "Akka in Action" book:
"Actors are great for processing many messages, capturing state and reacting with different behaviors based on the messages they receive"
and
"Futures are the tool to use when you would rather use functions and don't really need objects to do the job"
A Future is a placeholder for a value that is not yet available, so it is not limited to blocking tasks; it is useful for more than that, and I think it could be your choice here.
If you're not willing to use java.util.concurrent directly, then your most elegant solution may be using Akka Agents.
import scala.concurrent.ExecutionContext.Implicits.global
import akka.agent.Agent
class FooCounter {
val counter = Agent(0)
def get() = counter()
def set(v: Int) = counter.send(v)
def inc() = counter.send(_ + 1)
def modify(f: Int => Int) = counter.send(f(_))
}
This is asynchronous and guarantees that operations are applied one after another. The semantics are nice, and if you need more performance, you can always switch to the good old java.util.concurrent analogue:
import java.util.concurrent.atomic.AtomicInteger
class FooCounter {
val counter = new AtomicInteger(0)
def get() = counter.get()
def set(v: Int) = counter.set(v)
def inc() = counter.incrementAndGet()
def modify(f: Int => Int) = {
var done = false
var oldVal: Int = 0
while (!done) {
oldVal = counter.get()
done = counter.compareAndSet(oldVal, f(oldVal))
}
}
}
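On Java 8 and later, the hand-written compare-and-set loop in modify can be delegated to AtomicInteger.updateAndGet, which performs the same retry loop internally. A minimal sketch of the same FooCounter with that change, exercised from a few threads; as the AtomicInteger Javadoc notes, the update function may be re-applied under contention, so it should be side-effect free.

```scala
import java.util.concurrent.atomic.AtomicInteger

class FooCounter {
  private val counter = new AtomicInteger(0)
  def get(): Int = counter.get()
  def set(v: Int): Unit = counter.set(v)
  def inc(): Int = counter.incrementAndGet()
  // Java 8+: updateAndGet retries compare-and-set until it succeeds and returns the new value.
  // `f` may run more than once under contention, so keep it pure.
  def modify(f: Int => Int): Int = counter.updateAndGet(x => f(x))
}

val c = new FooCounter
val threads = (1 to 4).map(_ => new Thread(() => (1 to 10000).foreach(_ => c.modify(_ + 1))))
threads.foreach(_.start())
threads.foreach(_.join())
```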
One possible solution to your task is actors. It is not the fastest solution, but it is safe and simple:
import akka.actor.Actor

sealed trait Command
case object Get extends Command
case class Inc(value: Int) extends Command

class Counter extends Actor {
  // Only touched inside receive, which handles one message at a time.
  var counter: Int = 0
  def receive = {
    case Get =>
      sender() ! counter
    case Inc(value) =>
      counter += value
      sender() ! counter
  }
}
java.util.concurrent.atomic.AtomicInteger works very nicely with Scala, makes counting threadsafe, and you don't have to put additional dependencies into your project.
Akka's documentation warns:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback
It seems to me that if I could get the Future which wants to access the mutable state to run on the same dispatcher that arranges for mutual exclusion of threads handling actor messages then this issue could be avoided. Is that possible? (Why not?)
The ExecutionContext provided by context.dispatcher is not tied to the actor messages dispatcher, but what if it were? i.e.
class MyActorWithSafeFutures extends Actor {
  // Hypothetical: Akka has no `dispatcherOnMessageThread`; this is the API being proposed.
  implicit def safeDispatcher: ExecutionContext = context.dispatcherOnMessageThread
  var successCount = 0
  var failureCount = 0
  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount += 1
        case Failure(_) => failureCount += 1
      }
      response pipeTo sender()
  }
}
Is there any way to do that in Akka?
(I know that I could convert the above example to do something like self ! IncrementSuccess, but this question is about mutating actor state from Futures, rather than via messages.)
It looks like I might be able to implement this myself, using code like the following:
class MyActorWithSafeFutures extends Actor {
  // Every task scheduled on this context is sent to self as a message,
  // so it runs inside receive, one message at a time.
  implicit val executionContext: ExecutionContextExecutor = new ExecutionContextExecutor {
    override def execute(runnable: Runnable): Unit = {
      self ! runnable
    }
    override def reportFailure(cause: Throwable): Unit = {
      throw new Error("Unhandled throwable", cause)
    }
  }
  override def receive: Receive = {
    case runnable: Runnable => runnable.run()
    // ... other cases here
  }
}
Would that work? Why doesn't Akka offer that? Is there some huge drawback I'm not seeing?
(See https://github.com/jducoeur/Requester for a library which does just this in a limited way -- for Asks only, not for all Future callbacks.)
Your actor is executing its receive under one of the dispatcher's threads, and you want to spin off a Future that's firmly attached to this particular thread? In that case the system can't reuse this thread to run a different actor, because that would mean the thread was unavailable when you wanted to execute the Future. If it happened to use that same thread to execute someClient, you might deadlock with yourself. So this thread can no longer be used freely to run other actors; it has to belong to MySafeActor.
And no other threads can be allowed to freely run MySafeActor: if they were, two different threads might try to update successCount at the same time and you'd lose data (e.g. if the value is 0 and two threads both try to do successCount += 1, the value can end up as 1 rather than 2). So to do this safely, MySafeActor has to have a single Thread that's used for itself and its Future. So you end up with MySafeActor and that Future being tightly, but invisibly, coupled. The two can't run at the same time and could deadlock against each other. (It's still possible for a badly-written actor to deadlock against itself, but the fact that all the code using that actor's "imaginary mutex" is in a single place makes it easier to see potential problems.)
You could use traditional multithreading techniques (mutexes and the like) to allow the Future and MySafeActor to run concurrently. But what you really want is to encapsulate successCount in something that can be used concurrently but safely: some kind of... Actor?
TL;DR: either the Future and the Actor: 1) may not run concurrently, in which case you may deadlock 2) may run concurrently, in which case you will corrupt data 3) access state in a concurrency-safe way, in which case you're reimplementing Actors.
You could use a PinnedDispatcher for your MyActorWithSafeFutures actor class which would create a thread pool with exactly one thread for each instance of the given class, and use context.dispatcher as execution context for your Future.
To do this you have to put something like this in your application.conf:
akka {
...
}
my-pinned-dispatcher {
executor = "thread-pool-executor"
type = PinnedDispatcher
}
and to create your actor:
actorSystem.actorOf(
Props(
classOf[MyActorWithSafeFutures]
).withDispatcher("my-pinned-dispatcher"),
"myActorWithSafeFutures"
)
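The effect of using a one-thread dispatcher as the Future's execution context (callbacks never run at the same time, so they can touch shared counters without extra locking) can be approximated outside Akka with a single-threaded ExecutionContext. A minimal sketch, not Akka's implementation: the counters and the simulated requests are hypothetical, and the deadlock risk described earlier still applies if code on that one thread blocks waiting for work scheduled on the same thread.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

// One worker thread: every task submitted to `serial` runs after the previous one finishes.
val executor = Executors.newSingleThreadExecutor()
val serial: ExecutionContext = ExecutionContext.fromExecutor(executor)

// Hypothetical counters, mutated only by callbacks running on `serial`.
var successCount = 0
var failureCount = 0

// Stand-in for someClient.makeRequest: these futures complete on the global pool.
val responses: Seq[Future[Int]] = (1 to 100).map { i =>
  Future { if (i % 10 == 0) throw new RuntimeException("boom") else i }(ExecutionContext.global)
}

// Callbacks are registered with `serial`, so they never run concurrently with each other.
val callbacksDone: Seq[Future[Unit]] = responses.map { f =>
  val p = Promise[Unit]()
  f.onComplete { result =>
    result match {
      case Success(_) => successCount += 1
      case Failure(_) => failureCount += 1
    }
    p.success(())
  }(serial)
  p.future
}

callbacksDone.foreach(d => Await.ready(d, 10.seconds))
executor.shutdown()
```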
That said, what you are trying to achieve completely breaks the purpose of the actor model. The actor state should be encapsulated, and actor state changes should be driven by incoming messages.
This does not answer your question directly, but rather offers an alternative solution using Akka Agents:
import akka.actor.Actor
import akka.agent.Agent
import akka.pattern.pipe
import scala.concurrent.Future
import scala.util.{Failure, Success}

class MyActorWithSafeFutures extends Actor {
  import context.dispatcher // implicit ExecutionContext for the Agents, onComplete and pipeTo

  val successCount = Agent(0)
  val failureCount = Agent(0)

  def doSomethingWithPossiblyStaleCounts() = {
    val (s, f) = (successCount.get(), failureCount.get())
    // Failure ratio f / (s + f); toDouble avoids integer division.
    statisticsCollector ! Ratio(f.toDouble / (s + f))
  }

  def doSomethingWithCurrentCounts() = {
    val (successF, failureF) = (successCount.future(), failureCount.future())
    val ratio: Future[Ratio] = for {
      s <- successF
      f <- failureF
    } yield Ratio(f.toDouble / (s + f))
    ratio pipeTo statisticsCollector
  }

  override def receive: Receive = {
    case MakeExternalRequest(req) =>
      val response: Future[Response] = someClient.makeRequest(req)
      response.onComplete {
        case Success(_) => successCount.send(_ + 1)
        case Failure(_) => failureCount.send(_ + 1)
      }
      response pipeTo sender()
  }
}
The catch is that if you want to operate on the counts that would result if you were using @volatile, then you need to operate inside a Future; see doSomethingWithCurrentCounts().
If you are fine with having values which are eventually consistent (there might be pending updates scheduled for the Agents), then something like doSomethingWithPossiblyStaleCounts() is fine.
@rkuhn explains why this would be a bad idea on the akka-user list:
My main consideration here is that such a dispatcher would make it very convenient to have multiple concurrent entry points into the Actor’s behavior, where with the current recommendation there is only one—the active behavior. While classical data races are excluded by the synchronization afforded by the proposed ExecutionContext, it would still allow higher-level races by suspending a logical thread and not controlling the intermediate execution of other messages. In a nutshell, I don’t think this would make the Actor easier to reason about, quite the opposite.
For an immutable flavour, Iterator does the job.
val x = Iterator.fill(100000)(someFn)
Now I want to implement a mutable version of Iterator, with three guarantees:
thread-safe on all transformations (fold, foldLeft, ...) and append
lazy evaluated
traversable only once! Once used, an object from this Iterator should be destroyed.
Is there an existing implementation to give me these guarantees? Any library or framework example would be great.
Update
To illustrate the desired behaviour.
class SomeThing {}
class Test(val list: Iterator[SomeThing]) {
  def add(thing: SomeThing): Test = {
    new Test(list ++ Iterator(thing))
  }
}
new Test(Iterator.empty).add(new SomeThing).add(new SomeThing)
In this example, SomeThing is an expensive construct, so it needs to be lazy.
Re-iterating over the list is never required, so Iterator is a good fit.
This is supposed to asynchronously and lazily sequence 10 million SomeThing instances without depleting the executor (a cached thread pool executor) or running out of memory.
You don't need a mutable Iterator for this, just daisy-chain the immutable form:
class SomeThing {}
case class Test(list: Iterator[SomeThing]) {
  // `thing` is by-name, so a SomeThing is only built when the iterator reaches it
  def add(thing: => SomeThing) = Test(list ++ Iterator(thing))
}
Test(Iterator.empty).add(new SomeThing).add(new SomeThing)
Although you don't really need the extra boilerplate of Test here:
Iterator(new SomeThing) ++ Iterator(new SomeThing)
Note that Iterator.++ takes a by-name param, so the ++ operation is already lazy.
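To see that nothing is built until the iterator is traversed, here is a small check that counts constructions. The counter `constructed` is hypothetical instrumentation added only for this demonstration, and Test mirrors the case class above.

```scala
var constructed = 0 // hypothetical instrumentation: counts SomeThing constructions
class SomeThing { constructed += 1 }

case class Test(list: Iterator[SomeThing]) {
  // Both `thing` and the argument of ++ are by-name, so nothing is constructed here.
  def add(thing: => SomeThing): Test = Test(list ++ Iterator(thing))
}

val t = Test(Iterator.empty).add(new SomeThing).add(new SomeThing)
val builtBeforeTraversal = constructed // still 0
val total = t.list.size                // traversal forces both constructions
```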
You might also want to try this, to avoid building intermediate Iterators:
Iterator.continually(new SomeThing) take 2
UPDATE
If you don't know the size in advance, then I'll often use a tactic like this:
def mkSomething = if (cond) Some(new SomeThing) else None
Iterator.continually(mkSomething) takeWhile (_.isDefined) map { _.get }
The trick is to have your generator function wrap its output in an Option, which then gives you a way to flag that the iteration is finished by returning None.
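A runnable version of the Option-terminated generator, where a hypothetical countdown (`remaining`) stands in for `cond` and Int stands in for SomeThing:

```scala
// Hypothetical source: produces values until `remaining` runs out (stands in for `cond`).
var remaining = 3
def mkSomething: Option[Int] =
  if (remaining > 0) { remaining -= 1; Some(remaining) } else None

// continually is lazy; takeWhile stops at the first None.
val items = Iterator.continually(mkSomething).takeWhile(_.isDefined).map(_.get).toList
```

The generator is called once more after the last value (returning None), which is what ends the iteration.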
Of course, if you're really pushing the boat out, you can even use the dreaded null:
def mkSomething = if (cond) { new SomeThing } else null
Iterator.continually(mkSomething) takeWhile (_ != null)
Seems like you need to hide the fact that the iterator is mutable but at the same time allow it to grow mutably. What I'm going to propose is the same sort of trick I've used to speed up ::: in the past:
class AppendableIterator[A](initial: Iterator[A]) extends Iterator[A] {
  private var inner: Iterator[A] = initial
  // Reads and appends take the same lock, so an append never races with hasNext/next.
  def hasNext: Boolean = synchronized { inner.hasNext }
  def next(): A = synchronized { inner.next() }
  def append(that: Iterator[A]): Unit = synchronized {
    inner = new JoinedIterator(inner, that)
  }
}
//You might need to add some more things, this is a skeleton
class JoinedIterator[A](first: Iterator[A], second: Iterator[A]) extends Iterator[A] {
  def hasNext: Boolean = first.hasNext || second.hasNext
  def next(): A =
    if (first.hasNext) first.next()
    else if (second.hasNext) second.next()
    else Iterator.empty.next() // throws NoSuchElementException, as Iterator.next should
}
So what you're really doing is leaving the Iterator at whatever place in its iteration you might have it while still preserving the thread safety of the append by "joining" another Iterator in non-destructively. You avoid the need to recompute the two together because you never actually force them through a CanBuildFrom.
This is also a generalization of just adding one item. You can always wrap some A in an Iterator[A] of one element if you so choose.
Have you looked at the mutable.ParIterable in the collection.parallel package?
To access an iterator over elements you can do something like
val x = ParIterable.fill(100000)(someFn).iterator
From the docs:
Parallel operations are implemented with divide and conquer style algorithms that parallelize well. The basic idea is to split the collection into smaller parts until they are small enough to be operated on sequentially.
...
The higher-order functions passed to certain operations may contain side-effects. Since implementations of bulk operations may not be sequential, this means that side-effects may not be predictable and may produce data-races, deadlocks or invalidation of state if care is not taken. It is up to the programmer to either avoid using side-effects or to use some form of synchronization when accessing mutable data.
I'm looking for a FIFO stream in Scala, i.e., something that provides the functionality of
immutable.Stream (a stream that can be finite and memorizes the elements that have already been read)
mutable.Queue (which allows for added elements to the FIFO)
The stream should be closable and should block access to the next element until the element has been added or the stream has been closed.
Actually, I'm a bit surprised that the collection library does not (seem to) include such a data structure, since it is IMO quite a classical one.
My questions:
1) Did I overlook something? Is there already a class providing this functionality?
2) OK, if it's not included in the collection library, then it might be just a trivial combination of existing collection classes. However, I tried to find this trivial code, but my implementation still looks quite complex for such a simple problem. Is there a simpler solution for such a FifoStream?
import java.io.Closeable
import scala.collection.mutable.Queue

class FifoStream[T] extends Closeable {
  val queue = new Queue[Option[T]]
  lazy val stream = nextStreamElem
  private def nextStreamElem: Stream[T] = next() match {
    case Some(elem) => Stream.cons(elem, nextStreamElem)
    case None => Stream.empty
  }
  /** Returns next element in the queue (may wait for it to be inserted). */
  private def next() = {
    queue.synchronized {
      // Loop rather than `if`: wait() may return spuriously.
      while (queue.isEmpty) queue.wait()
      queue.dequeue()
    }
  }
  /** Adds new elements to this stream. */
  def enqueue(elems: T*): Unit = {
    queue.synchronized {
      queue ++= elems.map(Some(_))
      queue.notifyAll()
    }
  }
  /** Closes this stream. */
  def close(): Unit = {
    queue.synchronized {
      queue.enqueue(None)
      queue.notifyAll()
    }
  }
}
Paradigmatic's solution (slightly modified)
Thanks for your suggestions. I slightly modified paradigmatic's solution so that toStream returns an immutable stream (allowing repeatable reads), which fits my needs. Just for completeness, here is the code:
import scala.collection.JavaConverters._
import java.util.concurrent.{LinkedBlockingQueue, BlockingQueue}

class FIFOStream[A](private val queue: BlockingQueue[Option[A]] = new LinkedBlockingQueue[Option[A]]()) {
  // Memoized: each element is taken from the queue once; later reads of toStream replay it.
  lazy val toStream: Stream[A] = queue2stream
  private def queue2stream: Stream[A] = queue.take() match {
    case Some(a) => Stream.cons(a, queue2stream)
    case None => Stream.empty
  }
  def close() = queue.add(None)
  def enqueue(as: A*) = queue.addAll(as.map(Some(_)).asJava)
}
In Scala, streams are "functional iterators". People expect them to be pure (no side effects) and immutable. In your case, every time you iterate on the stream you modify the queue (so it's not pure). This can create a lot of misunderstandings, because iterating twice over the same stream will give two different results.
That being said, you should use Java BlockingQueues rather than rolling your own implementation. They are considered well implemented in terms of safety and performance. Here is the cleanest code I can think of (using your approach):
import java.util.concurrent.{BlockingQueue, LinkedBlockingQueue}
import scala.collection.JavaConverters._

class FIFOStream[A](private val queue: BlockingQueue[Option[A]]) {
  // Each call builds a new Stream that takes fresh elements from the queue.
  def toStream: Stream[A] = queue.take() match {
    case Some(a) => Stream.cons(a, toStream)
    case None => Stream.empty
  }
  def close() = queue.add(None)
  def enqueue(as: A*) = queue.addAll(as.map(Some(_)).asJava)
}

object FIFOStream {
  def apply[A](): FIFOStream[A] = new FIFOStream[A](new LinkedBlockingQueue[Option[A]]())
}
I'm assuming you're looking for something like java.util.concurrent.BlockingQueue?
Akka has a BoundedBlockingQueue implementation of this interface. There are of course the implementations available in java.util.concurrent.
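A minimal producer/consumer sketch with LinkedBlockingQueue, one of the java.util.concurrent implementations. As in the FIFOStream answers, None is used as the end-of-stream marker (a convention, not part of the BlockingQueue API); take() blocks the consumer until the producer has added the next element or closed the stream.

```scala
import java.util.concurrent.LinkedBlockingQueue

// None marks "stream closed".
val queue = new LinkedBlockingQueue[Option[Int]]()

// Consumer: blocks on take() until an element (or the close marker) arrives.
def drain(): List[Int] = {
  val out = List.newBuilder[Int]
  var done = false
  while (!done) {
    queue.take() match {
      case Some(x) => out += x
      case None    => done = true
    }
  }
  out.result()
}

val producer = new Thread(() => {
  (1 to 5).foreach(i => queue.put(Some(i)))
  queue.put(None) // close the stream
})
producer.start()
val received = drain()
producer.join()
```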
You might also consider using Akka's actors for whatever it is you are doing: use actors to be notified of (or pushed) new events or messages instead of pulling them.
1) It seems you're looking for a dataflow stream seen in languages like Oz, which supports the producer-consumer pattern. Such a collection is not available in the collections API, but you could always create one yourself.
2) The dataflow stream relies on the concept of single-assignment variables (they don't have to be initialized at the point of declaration, and reading them before initialization blocks):
val x: Int
startThread {
println(x)
}
println("The other thread waits for the x to be assigned")
x = 1
It would be straightforward to implement such a stream if single-assignment (or dataflow) variables were supported in the language. Since they are not a part of Scala, you have to use the wait-synchronized-notify pattern just like you did.
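Scala has no dataflow variables in the language itself, but scala.concurrent.Promise comes close: it can be completed only once, and a reader can block on its future (here with Await.result, which takes a timeout) until a value is written. A minimal sketch of the x example using a Promise, offered as an approximation rather than true language-level dataflow variables:

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// A Promise plays the role of the single-assignment variable `x`.
val x = Promise[Int]()
var seen = 0

val reader = new Thread(() => {
  // Blocks until x is assigned, like reading an unbound dataflow variable.
  seen = Await.result(x.future, 5.seconds)
  println(seen)
})
reader.start()
println("The other thread waits for the x to be assigned")
x.success(1) // the one and only assignment
reader.join()

// A second assignment is rejected: trySuccess returns false.
val secondAssignmentAccepted = x.trySuccess(2)
```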
Concurrent queues from Java can be used to achieve that as well, as the other user suggested.
For example suppose I want a list that contains 0 up to a max of 1000 elements. Above this, the oldest insertions should be dropped first. Do collections support this functionality natively? If not how would I go about the implementation? I understand that certain operations are very slow on Lists so maybe I need a different data type?
Looking at an element should not affect the list. I would like insert and size operations only.
It sounds like you want a size-bounded queue. Here's a similar question: Maximum Length for scala queue
There are three solutions presented in that question. You can,
Write a queue from scratch (paradigmatic gave code for this),
Extend Scala's Queue implementation by subclassing, or
Use the typeclass extension pattern (aka, "pimp my library") to extend Scala's Queue.
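A minimal sketch of the third option, the typeclass extension ("pimp my library") pattern, over Scala's immutable Queue. The names enqueueFinite and BoundedQueueSyntax are hypothetical, not part of the standard library. Because the queue is immutable, each call returns a new bounded queue.

```scala
import scala.collection.immutable.Queue

object BoundedQueueSyntax {
  // Enrich-my-library: adds a hypothetical enqueueFinite method to immutable Queue.
  implicit class BoundedQueueOps[A](q: Queue[A]) {
    // Enqueues elem, then drops from the front (oldest first) until at most `max` remain.
    def enqueueFinite(elem: A, max: Int): Queue[A] = {
      var result = q.enqueue(elem)
      while (result.size > max) result = result.dequeue._2
      result
    }
  }
}

import BoundedQueueSyntax._

val bounded = (1 to 5).foldLeft(Queue.empty[Int])((acc, i) => acc.enqueueFinite(i, 3))
```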
Here is my first-pass implementation in case someone else finds it useful:
import scala.collection.mutable.ListBuffer

class FixedList[A](max: Int) extends Iterable[A] {
  val list: ListBuffer[A] = ListBuffer()
  // Appends elem, first dropping the oldest element once `max` is reached.
  def append(elem: A): Unit = {
    if (list.size == max) {
      list.remove(0)
    }
    list.append(elem)
  }
  def iterator: Iterator[A] = list.iterator
}
A circular array is the fastest implementation. It's basically an array with a read and write index which are wrapped when reaching the end of the array. Size is defined as:
def size = writeIndex - readIndex + (if (readIndex > writeIndex) array.size else 0)
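A minimal sketch of such a circular buffer with overwrite-oldest semantics, using the size formula above. The class and method names are hypothetical. The array has one spare slot so that readIndex == writeIndex always means "empty", which keeps the formula unambiguous; insert and size are O(1), and reading the contents does not modify the buffer.

```scala
import scala.reflect.ClassTag

// Fixed-capacity ring buffer; when full, inserting overwrites the oldest element.
class RingBuffer[A: ClassTag](capacity: Int) {
  require(capacity > 0)
  private val array = new Array[A](capacity + 1) // one slot always stays unused
  private var readIndex = 0
  private var writeIndex = 0

  def size: Int = writeIndex - readIndex + (if (readIndex > writeIndex) array.length else 0)

  def insert(elem: A): Unit = {
    array(writeIndex) = elem
    writeIndex = (writeIndex + 1) % array.length
    // Buffer was full: advance readIndex to drop the oldest element.
    if (writeIndex == readIndex) readIndex = (readIndex + 1) % array.length
  }

  // Oldest to newest, without modifying the buffer.
  def toList: List[A] = List.tabulate(size)(i => array((readIndex + i) % array.length))
}

val rb = new RingBuffer[Int](3)
(1 to 5).foreach(rb.insert)
```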
While not an answer to the question's details (though it does somewhat answer the question's title), List.fill(1000){0} creates a List of length 1000 with every element initialized to 0; this comes from the question "Scala - creating a type parametrized array of specified length".