Scala Queue and NoSuchElementException - scala

I am getting an infrequent NoSuchElementException error when operating over my Scala 2.9.2 Queue. I don't understand the exception becase the Queue has elements in it. I've tried switching over to a SynchronizedQueue, thinking it was a concurrency issue (my queue is written and read to from different threads) but that didn't solve it.
The reduced code looks like this:
val window = new scala.collection.mutable.Queue[Packet]
...
(thread 1)
window += packet
...
(thread 2)
window.dequeueAll(someFunction)
println(window.size)
window.foreach(println(_))
Which results in
32
java.util.NoSuchElementException
at scala.collection.mutable.LinkedListLike$class.head(LinkedListLike.scala:76)
at scala.collection.mutable.LinkedList.head(LinkedList.scala:78)
at scala.collection.mutable.MutableList.head(MutableList.scala:53)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.mutable.MutableList.foreach(MutableList.scala:30)
The docs for LinkedListLike.head() say
Exceptions thrown
`NoSuchElementException`
if the linked list is empty.
but how can this exception be thrown if the Queue is not empty?

You should have window (mutable data structure) accessed from only a single thread. Other threads should send messages to that one.
There is Akka that allows relatively easy concurrent programming.
class MySource(windowHolderRef:ActorRef) {
def receive = {
case MyEvent(packet:Packet) =>
windowHolderRef ! packet
}
}
case object CheckMessages
class WindowHolder {
private val window = new scala.collection.mutable.Queue[Packet]
def receive = {
case packet:Packet =>
window += packet
case CheckMessages =>
window.dequeueAll(someFunction)
println(window.size)
window.foreach(println(_))
}
}
To check messages periodically you can schedule periodic messages.
// context.schedule(1 second, 1 second, CheckMessages)

Related

Wrapping Pub-Sub Java API in Akka Streams Custom Graph Stage

I am working with a Java API from a data vendor providing real time streams. I would like to process this stream using Akka streams.
The Java API has a pub sub design and roughly works like this:
Subscription sub = createSubscription();
sub.addListener(new Listener() {
public void eventsReceived(List events) {
for (Event e : events)
buffer.enqueue(e)
}
});
I have tried to embed the creation of this subscription and accompanying buffer in a custom graph stage without much success. Can anyone guide me on the best way to interface with this API using Akka? Is Akka Streams the best tool here?
To feed a Source, you don't necessarily need to use a custom graph stage. Source.queue will materialize as a buffered queue to which you can add elements which will then propagate through the stream.
There are a couple of tricky things to be aware of. The first is that there's some subtlety around materializing the Source.queue so you can set up the subscription. Something like this:
def bufferSize: Int = ???
Source.fromMaterializer { (mat, att) =>
val (queue, source) = Source.queue[Event](bufferSize).preMaterialize()(mat)
val subscription = createSubscription()
subscription.addListener(
new Listener() {
def eventsReceived(events: java.util.List[Event]): Unit = {
import scala.collection.JavaConverters.iterableAsScalaIterable
import akka.stream.QueueOfferResult._
iterableAsScalaIterable(events).foreach { event =>
queue.offer(event) match {
case Enqueued => () // do nothing
case Dropped => ??? // handle a dropped pubsub element, might well do nothing
case QueueClosed => ??? // presumably cancel the subscription...
}
}
}
}
)
source.withAttributes(att)
}
Source.fromMaterializer is used to get access at each materialization to the materializer (which is what compiles the stream definition into actors). When we materialize, we use the materializer to preMaterialize the queue source so we have access to the queue. Our subscription adds incoming elements to the queue.
The API for this pubsub doesn't seem to support backpressure if the consumer can't keep up. The queue will drop elements it's been handed if the buffer is full: you'll probably want to do nothing in that case, but I've called it out in the match that you should make an explicit decision here.
Dropping the newest element is the synchronous behavior for this queue (there are other queue implementations available, but those will communicate dropping asynchronously which can be really bad for memory consumption in a burst). If you'd prefer something else, it may make sense to have a very small buffer in the queue and attach the "overall" Source (the one returned by Source.fromMaterializer) to a stage which signals perpetual demand. For example, a buffer(downstreamBufferSize, OverflowStrategy.dropHead) will drop the oldest event not yet processed. Alternatively, it may be possible to combine your Events in some meaningful way, in which case a conflate stage will automatically combine incoming Events if the downstream can't process them quickly.
Great answer! I did build something similar. There are also kamon metrics to monitor queue size exc.
class AsyncSubscriber(projectId: String, subscriptionId: String, metricsRegistry: CustomMetricsRegistry, pullParallelism: Int)(implicit val ec: Executor) {
private val logger = LoggerFactory.getLogger(getClass)
def bufferSize: Int = 1000
def source(): Source[(PubsubMessage, AckReplyConsumer), Future[NotUsed]] = {
Source.fromMaterializer { (mat, attr) =>
val (queue, source) = Source.queue[(PubsubMessage, AckReplyConsumer)](bufferSize).preMaterialize()(mat)
val receiver: MessageReceiver = {
(message: PubsubMessage, consumer: AckReplyConsumer) => {
metricsRegistry.inputEventQueueSize.update(queue.size())
queue.offer((message, consumer)) match {
case QueueOfferResult.Enqueued =>
metricsRegistry.inputQueueAddEventCounter.increment()
case QueueOfferResult.Dropped =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.warn(s"Buffer is full, message nacked. Pubsub should retry don't panic. If this happens too often, we should also tweak the buffer size or the autoscaler.")
case QueueOfferResult.Failure(ex) =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.error(s"Failed to offer message with id=${message.getMessageId()}", ex)
case QueueOfferResult.QueueClosed =>
logger.error("Destination Queue closed. Something went terribly wrong. Shutting down the jvm.")
consumer.nack()
mat.shutdown()
sys.exit(1)
}
}
}
val subscriptionName = ProjectSubscriptionName.of(projectId, subscriptionId)
val subscriber = Subscriber.newBuilder(subscriptionName, receiver).setParallelPullCount(pullParallelism).build
subscriber.startAsync().awaitRunning()
source.withAttributes(attr)
}
}
}

Throttling akka-actor that keeps only the newest messages

Situation
I am using akka actors to update data on my web-client. One of those actors is solely repsonsible for sending updates concerning single Agents. These agents are updated very rapidly (every 10ms). My goal now is to throttle this updating mechanism so that the newest version of every Agent is sent every 300ms.
My code
This is what I came up with so far:
/**
* Single agents are updated very rapidly. To limit the burden on the web-frontend, we throttle the messages here.
*/
class BroadcastSingleAgentActor extends Actor {
private implicit val ec: ExecutionContextExecutor = context.dispatcher
private var queue = Set[Agent]()
context.system.scheduler.schedule(0 seconds, 300 milliseconds) {
queue.foreach { a =>
broadcastAgent(self)(a) // sends the message to all connected clients
}
queue = Set()
}
override def receive: Receive = {
// this message is received every 10 ms for every agent present
case BroadcastAgent(agent) =>
// only keep the newest version of the agent
queue = queue.filter(_.id != agent.id) + agent
}
}
Question
This actor (BroadcastSingleAgentActor) works as expected, but I am not 100% sure if this is thread safe (updating the queue while potentionally clearing it). Also, this does not feel like I am making the best out of the tools akka provides me with. I found this article (Throttling Messages in Akka 2), but my problem is that I need to keep the newest Agent message while dropping any old version of it. Is there an example somewhere similar to what I need?
No, this isn't thread safe because the scheduling via the ActorSystem will happen on another thread than the receive. One potential idea is to do the scheduling within the receive method because incoming messages to the BroadcastSingleAgentActor will be handled sequentially.
override def receive: Receive = {
case Refresh =>
context.system.scheduler.schedule(0 seconds, 300 milliseconds) {
queue.foreach { a =>
broadcastAgent(self)(a) // sends the message to all connected clients
}
}
queue = Set()
// this message is received every 10 ms for every agent present
case BroadcastAgent(agent) =>
// only keep the newest version of the agent
queue = queue.filter(_.id != agent.id) + agent
}

How to save data from actors on system shutdown

For example I have following actors: Player and GameRoom.
GameRoom holds players with scores. When user left(terminates), we save player score in database:
class Player extends Actor {
...
}
object GameRoom {
case object Join
}
class GameRoom(database:ActorRef) extends Actor {
type Score = Int
var players: Map[ActorRef, Score] = Map.empty
def receive: Receive = {
case GameRoom.Join =>
context.watch(sender())
players = players + (sender() -> 100)
case Terminated(player) =>
players = players - player
database ! SavePlayerScore(...)
}
}
But what if I want to kill jvm process (SIGTERM)? In that case i have no way to save all users score to database on shutdown.
Any hints how to implement needed behaviour?
You can install a shutdown hook that will terminate your ActorSystem (using ActorSystem#terminate()) on application shutdown.
That will trigger an ordered termination of the Actors hierarchy, and in turn result in your GameRoom to receive the Terminated for the players.
Here is a small code snippet to install that shutdown hook:
Runtime.getRuntime.addShutdownHook(
new Thread("shutdown-hook") {
override def run() {
// This obviously needs to
try{
Await.ready(actorSystem.terminate(), Duration(2, TimeUnit.MINUTES))
}catch{
case _ : InterruptedException => // Termination was interrupted
case _ : Throwable => // Exception thrown by actor system termination
}}
})}
One important thing to notice here is: once the code in the Shutdown Hook is completed, the JVM will shutdown (killing all threads even if they are not done), so if you have any other cleanup to do, add it to the shutdown hook.
EDIT 1: The JVM will terminate even if the Await.ready threw an Exception. This means that, some of your state might have not been saved or something. You might want to handle those exception then and there because, again, once the run() method is complete, the JVM will die.

Can I safely create a Thread in an Akka Actor?

I have an Akka Actor that I want to send "control" messages to.
This Actor's core mission is to listen on a Kafka queue, which is a polling process inside a loop.
I've found that the following simply locks up the Actor and it won't receive the "stop" (or any other) message:
class Worker() extends Actor {
private var done = false
def receive = {
case "stop" =>
done = true
kafkaConsumer.close()
// other messages here
}
// Start digesting messages!
while (!done) {
kafkaConsumer.poll(100).iterator.map { cr: ConsumerRecord[Array[Byte], String] =>
// process the record
), null)
}
}
}
I could wrap the loop in a Thread started by the Actor, but is it ok/safe to start a Thread from inside an Actor? Is there a better way?
Basically you can but keep in mind that this actor will be blocking and a thumb of rule is to never block inside actors. If you still want to do this, make sure that this actor runs in a separate thread pool than the native one so you don't affect Actor System performances. One another way to do it would be to send messages to itself to poll new messages.
1) receive a order to poll a message from kafka
2) Hand over the
message to the relevant actor
3) Send a message to itself to order
to pull a new message
4) Hand it over...
Code wise :
case object PollMessage
class Worker() extends Actor {
private var done = false
def receive = {
case PollMessage ⇒ {
poll()
self ! PollMessage
}
case "stop" =>
done = true
kafkaConsumer.close()
// other messages here
}
// Start digesting messages!
def poll() = {
kafkaConsumer.poll(100).iterator.map { cr: ConsumerRecord[Array[Byte], String] =>
// process the record
), null)
}
}
}
I am not sure though that you will ever receive the stop message if you continuously block on the actor.
Adding #Louis F. answer; depending on the configuration of your actors they will either drop all messages that they receive if at the given moment they are busy or put them in a mailbox aka queue and the messages will be processed later (usually in FIFO manner). However, in this particular case you are flooding the actor with PollMessage and you have no guarantee that your message will not be dropped - which appears to happen in your case.

Scala program exiting before the execution and completion of all Scala Actor messages being sent. How to stop this?

I am sending my Scala Actor its messages from a for loop. The scala actor is receiving the
messages and getting to the job of processing them. The actors are processing cpu and disk intensive tasks such as unzipping and storing files. I deduced that the Actor part is working fine by putting in a delay Thread.sleep(200) in my message passing code in the for loop.
for ( val e <- entries ) {
MyActor ! new MyJob(e)
Thread.sleep(100)
}
Now, my problem is that the program exits with a code 0 as soon as the for loop finishes execution. Thus preventing my Actors to finish there jobs. How do I get over this? This may be really a n00b question. Any help is highly appreciated!
Edit 1:
This solved my problem for now:
while(MyActor.getState != Actor.State.Terminated)
Thread.sleep(3000)
Is this the best I can do?
Assume you have one actor you're want to finish its work. To avoid sleep you can create a SyncVar and wait for it to be initialized in the main thread:
val sv = new SyncVar[Boolean]
// start the actor
actor {
// do something
sv.set(true)
}
sv.take
The main thread will wait until some value is assigned to sv, and then be woken up.
If there are multiple actors, then you can either have multiple SyncVars, or do something like this:
class Ref(var count: Int)
val numactors = 50
val cond = new Ref(numactors)
// start your actors
for (i <- 0 until 50) actor {
// do something
cond.synchronized {
cond.count -= 1
cond.notify()
}
}
cond.synchronized {
while (cond.count != 0) cond.wait
}