Problem:
query1, query2, and query3 are added to a queue. The consumer starts consuming the added queries right away, and everything works fine.
After the queue is empty, the same queries are added to the queue again. For some reason the consumer now hangs for some time (5-10 seconds) before it starts consuming them.
Pseudo:
from queue import Queue
from threading import Thread

fifo = Queue()

class Producer:
    def produce_query(self, query):
        fifo.put(query)
        # `consumer` is a module-level Consumer instance; not_started() is pseudocode
        if consumer.not_started():
            consumer.start()

class Consumer(Thread):
    def run(self):
        while True:
            query = fifo.get(block=True)  # blocks until an item is available
            self.execute(query)
I'm not sure whether I have misunderstood how the queue works. Do I need to notify the queue or the consumer somehow so that the consumer starts working right away?
I'd appreciate the help!
I have a python asyncio application, with several coroutines being run within a single thread. Some data is being passed using queues.
The queue consumer looks like this:
async def queue_consumer(q):
    """Consume from an asyncio.Queue, making it an async iterable"""
    while True:
        try:
            e = await q.get()
            yield e
        except Exception:
            # A bare `except:` would also swallow cancellation and generator shutdown
            continue
The consumer is pulled from with async for.
In this particular case the coroutine which consumes from a specific queue sequentially calls some code which puts data into its queue with put_nowait.
EDIT: In this particular case coroutine A, which is listening for inbound network traffic, puts a message into the queue of coroutine B.
I have noticed that there is a consistent ~50ms delay between the call to put_nowait in coroutine A and the data being processed after it is pulled from the queue async iterable in coroutine B.
I suspect it might have something to do with some internal asyncio polling resolution, but I am not sure, nor do I know where such a setting could be changed.
I would be very interested in increasing the event polling frequency of the asyncio loop, and hence decreasing the observed delay between put_nowait and get on a queue shared between coroutines. Maybe there is also a way to hint to asyncio that items from the queue should be processed earlier?
NB: the application I am working with is not doing any computationally demanding work.
It turns out the problem was caused by my app doing some UI updates with prompt_toolkit. I tracked this down by placing some measurements within _run_once. Anyway, the queue was not being processed because the event loop was busy executing some UI code that I did not expect to take so much time.
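The effect described above can be reproduced with a small sketch. It is only an illustration under stated assumptions: `measure_latency` and `hog_seconds` are hypothetical names, and `time.sleep` stands in for the slow synchronous UI code that kept the loop busy.

```python
import asyncio
import time


async def consumer(q, latencies):
    # Record how long each item waited between put_nowait and get.
    while True:
        sent_at = await q.get()
        latencies.append(time.monotonic() - sent_at)
        q.task_done()


async def measure_latency(hog_seconds):
    q = asyncio.Queue()
    latencies = []
    task = asyncio.create_task(consumer(q, latencies))
    await asyncio.sleep(0)           # let the consumer reach q.get()
    q.put_nowait(time.monotonic())   # the item is ready immediately...
    time.sleep(hog_seconds)          # ...but synchronous work blocks the loop
    await q.join()                   # the consumer only runs once we yield
    task.cancel()
    return latencies[0]


slow = asyncio.run(measure_latency(0.05))  # loop hogged for 50 ms after the put
fast = asyncio.run(measure_latency(0.0))   # loop left free
print(f"hogged: {slow * 1000:.1f} ms, free: {fast * 1000:.1f} ms")
```

The queue itself adds no polling delay; the consumer is simply not scheduled until the running callback returns control to the loop. Running the loop in debug mode (`asyncio.run(main(), debug=True)`) logs callbacks that run longer than `loop.slow_callback_duration`, which may be a quicker way to find such code than instrumenting `_run_once`.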
I'm just starting to learn Akka Actors in Scala. My understanding is that messages received by an Actor are queued in an Actor's mailbox, and processed one at a time. By processing messages one at a time, concurrency issues (race conditions, deadlocks) are mitigated.
But what happens if the Actor creates a future to do the work associated with a message? Since the future is async, the Actor could begin processing the next several messages while the future associated with the prior message is still running. Wouldn't this potentially create race conditions? How can one safely use futures within an Actor's receive() method to do long running tasks?
The safest way to use futures within an actor is to only ever use pipeTo on the future and send its result as a message to an actor (possibly the same actor).
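import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.{ExecutionContext, Future}

// DoIt (the request) and DoItResult (the reply type) are assumed to be defined elsewhere.
object MyActor {
  def doItAsynchronously(implicit ec: ExecutionContext): Future[DoItResult] = {
    /* ... */
  }
}

class MyActor extends Actor {
  import MyActor._
  import context.dispatcher

  def receive = {
    case DoIt =>
      // The future's result is delivered back to this actor as an ordinary message
      doItAsynchronously.pipeTo(self)
    case result: DoItResult =>
      // Got a result from doing it
  }
}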
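This ensures that the future itself never mutates actor state: any state change happens when the piped result arrives as a message and is handled in receive like any other message.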
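The same idea can be sketched outside Akka. The following is a loose Python asyncio analogue, not Akka code: `ToyActor`, `slow_square`, and the message tuples are hypothetical names chosen for illustration. The long-running work never touches actor state; its result is posted back to the actor's own mailbox.

```python
import asyncio


class ToyActor:
    """Toy actor: one mailbox, one message handled at a time."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.results = []  # actor state, modified only inside run()

    async def slow_square(self, n):
        # Hypothetical long-running work; it never touches actor state.
        await asyncio.sleep(0.01)
        return n * n

    async def pipe_to_self(self, n):
        # Analogue of future.pipeTo(self): deliver the result as a message.
        result = await self.slow_square(n)
        await self.mailbox.put(("result", result))

    async def run(self, expected):
        pending = set()
        while len(self.results) < expected:
            kind, value = await self.mailbox.get()
            if kind == "do_it":
                task = asyncio.create_task(self.pipe_to_self(value))
                pending.add(task)  # keep a reference until the task finishes
                task.add_done_callback(pending.discard)
            elif kind == "result":
                self.results.append(value)  # state changes happen only here


async def main():
    actor = ToyActor()
    for n in (1, 2, 3):
        actor.mailbox.put_nowait(("do_it", n))
    await actor.run(expected=3)
    return sorted(actor.results)


results = asyncio.run(main())
```

The actor keeps accepting new "do_it" messages while earlier work is still in flight, yet every change to `results` goes through the single message loop.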
Remember two things:
The notion behind the term Future (a special kind of actor) is that we can create an actor for a result while that result is still being computed, whether the computation has just started or has already finished, but we cannot have an address for that special actor, or future.
Suppose I want to buy something: the result is the purchase, and the process is the set of steps that starts the buying procedure. We can create an actor for the result (the purchase), but if something goes wrong and we cannot buy the thing, we get an exception instead of the result.
Here is how these two points fit together.
Say we need to compute the factorial of 1 billion. We might expect that to take a long time, but we get the Future immediately; producing the Future itself takes no time ("Since the future is async, the Actor could begin processing the next several messages while the future associated with the prior message is still running."). We can then pass this future around, store it, and assign it.
I hope this makes clear what I'm trying to say.
Src : https://www.brianstorti.com/the-actor-model/
If you need to mutate state in futures without blocking incoming messages, you might want to consider redesigning your actor model. I would introduce a separate actor for each task you would otherwise run in those futures. After all, an actor's main job is to maintain its state without letting it escape, thus providing safe concurrency. Define an actor for each long-running task whose only responsibility is to take care of that task.
Instead of managing the state manually, you might want to consider using Akka's FSM, which gives you a much cleaner picture of what changes when. I personally prefer this approach to ugly variables when I'm dealing with more complex systems.
I have a worker pool with load balancing, which is defined as follows:
class Worker(workerNr: Int) extends Actor with Stash
...
val workers = (1 to poolSize).map(c => context.actorOf(Props(() => new Worker(c)).withDispatcher("stash-dispatcher"), "worker" + c))
val pool = context.actorOf(Props[Worker].withRouter(SmallestMailboxRouter(routees = workers)))
...
pool ! Request("do something")
This worker actor isn't stateless. After it forwards a request to another actor (which does the actual work), it uses become and stashes away all following requests until it gets a response for the current request (which can take a while). Then it sends the response to the requesting actor, unstashes all stashed messages, and handles the next request after switching back with unbecome.
case request @ Request(_) => {
  val requestor = sender
  requestHandler ! request
  become {
    case response @ Response(_) => {
      requestor ! response
      unstashAll
      unbecome
    }
    case msg => stash
  }
}
My problem is the SmallestMailboxRouter I am using. It routes messages to the worker with the smallest mailbox. But since the workers aren't blocking, and instead stash away the messages they can't handle at the moment, their mailboxes are always nearly empty (in contrast to their stashes).
I would like to have a router which routes messages to the worker with the smallest stash. I thought of implementing such a Router myself, but looking at the implementation of Stash, it seems that I can't even access the stash size, because the stash itself is private to the Stash trait.
private var theStash = Vector.empty[Envelope]
Is there a way to do this, or is this the wrong approach to implement a worker pool with load balancing?
Answering the question you asked at the end: "Is there a way to do this, or is this the wrong approach to implement a worker pool with load balancing?".
Here is the way I implemented worker pool with load balancing:
There is a single WorkerManager actor that receives job requests. It puts them in its own queue right away. This can be any kind of queue that holds Job requests, for example Queue[Job]. WorkerManager also has a list of workers with the jobs assigned to them, something like List[(ActorRef, Option[Job])].
Whenever WorkerManager receives a Job request, right after putting it in the queue it checks whether there is an idle worker in the list of assigned jobs, i.e. an entry of the form (ActorRef, None). If so, it records the job as assigned to that worker and sends a Job message to that worker. If there are no idle workers, WorkerManager simply does nothing and waits for one of the workers to reply with a job completion message.
On the other hand whenever Worker finishes processing the Job it replies back to WorkerManager with that Job ID and WorkerManager removes that job from the list of assigned/running jobs. If Worker fails it can be restarted with the same job if desired.
You can choose who replies to the Client: it can be either the Worker or the WorkerManager. For this purpose you might want to send the client's ActorRef to the Worker together with the Job message.
There is no problem in concurrent queue modification or any race conditions related to maintaining the queue because Actors process messages one by one, thus WorkerManager will always process the queue sequentially.
Additionally Worker can be a state machine with state transition timeouts to avoid waiting for it forever.
Workers can be either created by WorkerManager or they can be created separately and register with WorkerManager by sending a registration message. There could be multiple WorkerManager actors getting their tasks using one of the routing algorithms (round-robin, etc).
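The bookkeeping described above can be sketched in plain Python. This is only an illustration, not Akka code: the class and method names are hypothetical, a `sent` log stands in for sending a Job message to a worker, and handing the next queued job to a worker as soon as it reports completion is an assumption that the description implies but does not spell out.

```python
from collections import deque


class WorkerManager:
    def __init__(self, worker_names):
        self.pending = deque()  # queued jobs not yet assigned
        # worker -> currently assigned job, or None when idle
        self.assigned = {name: None for name in worker_names}
        self.sent = []          # (worker, job) pairs, standing in for message sends

    def _dispatch(self):
        # Hand queued jobs to idle workers, in queue order.
        for worker in list(self.assigned):
            if not self.pending:
                return
            if self.assigned[worker] is None:
                job = self.pending.popleft()
                self.assigned[worker] = job
                self.sent.append((worker, job))

    def on_job(self, job):
        # A new job request arrived: queue it, then try to assign it.
        self.pending.append(job)
        self._dispatch()

    def on_done(self, worker):
        # A worker reported completion: mark it idle and reuse it.
        self.assigned[worker] = None
        self._dispatch()


manager = WorkerManager(["w1", "w2"])
for job in ["a", "b", "c"]:
    manager.on_job(job)
first_round = list(manager.sent)   # two workers busy, "c" still queued
manager.on_done("w1")
```

Because both handlers run to completion before the next one starts (as they would inside a single actor), the queue and the assignment table never need locking.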
EDIT
Apparently there is a pattern for it :) - it's called work pulling pattern or something.
I'm currently sending a message to an Akka actor every second for it to perform a task.
If that task (function) is still running when a new message is received by the actor, I would like the actor to do nothing. Basically, I want the actor function to be run only if it's not already running.
What's a good way to do this? Should I use Akka actors or do it another way?
Cheers
Actors process one message at a time. The situation you describe cannot happen.
Akka actors process their messages asynchronously, one by one, so the most you can do is drop or ignore "expired" messages to avoid extra processing and OutOfMemoryErrors caused by the actor's mailbox overflowing.
You can ignore expired messages (more than 1 second old in your case) inside your actor:
case class DoWork(createdTime: Long = System.currentTimeMillis)

final val messageTimeout = 1000L // one second

def receive = {
  case DoWork(createdTime) =>
    if ((System.currentTimeMillis - createdTime) < messageTimeout) { doWork() }
}
Or you can create a custom Mailbox which can drop expired messages internally.
Of course, as Robin Green already mentioned, actors in general are not supposed to run long-running operations internally, so this approach is suitable only if your actor doesn't need to process other kinds of messages (they would not be processed in time). And in case of high CPU demand, you may consider moving your actor to a separate dispatcher.
In Java, to write a library that makes requests to a server, I usually implement some sort of dispatcher (not unlike the one found here in the Twitter4J library: http://github.com/yusuke/twitter4j/blob/master/twitter4j-core/src/main/java/twitter4j/internal/async/DispatcherImpl.java) to limit the number of connections, to perform asynchronous tasks, etc.
The idea is that N number of threads are created. A "Task" is queued and all threads are notified, and one of the threads, when it's ready, will pop an item from the queue, do the work, and then return to a waiting state. If all the threads are busy working on a Task, then the Task is just queued, and the next available thread will take it.
This keeps the max number of connections to N, and allows at most N Tasks to be operating at the same time.
I'm wondering what kind of system I can create with Actors that will accomplish the same thing? Is there a way to have N number of Actors, and when a new message is ready, pass it off to an Actor to handle it - and if all Actors are busy, just queue the message?
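For reference, the thread-based dispatcher described in this question can be sketched in Python with a fixed-size thread pool. This is an illustration only: `task`, `N`, and the counters are hypothetical, and `time.sleep` stands in for a request to the server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

N = 3                      # maximum number of simultaneous tasks (connections)
lock = threading.Lock()
running = 0                # tasks currently executing
peak = 0                   # highest concurrency observed


def task(i):
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.02)       # stand-in for a request to the server
    with lock:
        running -= 1
    return i * 2


# Ten tasks are submitted at once; the pool queues whatever the N threads
# cannot pick up yet, and each thread takes the next task when it is free.
with ThreadPoolExecutor(max_workers=N) as pool:
    results = list(pool.map(task, range(10)))
```

`pool.map` returns results in submission order, and `peak` never exceeds `N`, which is the property the dispatcher is meant to guarantee.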
The Akka framework is designed to solve this kind of problem, and is exactly what you're looking for.
Look through this documentation: there are lots of highly configurable dispatchers (event-based, thread-based, load-balanced, work-stealing, etc.) that manage actors' mailboxes and allow them to work in conjunction. You may also find this blog post interesting.
For example, this code instantiates a new work-stealing dispatcher based on a fixed thread pool, which balances load among the actors it supervises:
val workStealingDispatcher = Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher("pooled-dispatcher")
workStealingDispatcher
  .withNewThreadPoolWithLinkedBlockingQueueWithUnboundedCapacity
  .setCorePoolSize(16)
  .buildThreadPool
Actor that uses the dispatcher:
class MyActor extends Actor {
  messageDispatcher = workStealingDispatcher

  def receive = {
    case _ =>
  }
}
Now, if you start two or more instances of the actor, the dispatcher will balance the load between the actors' mailboxes (queues): an actor that has too many messages in its mailbox will "donate" some to actors that have nothing to do.
Well, you have to see about the actors scheduler, as actors are not usually 1-to-1 with threads. The idea behind actors is that you may have many of them, but the actual number of threads will be limited to something reasonable. They are not supposed to be long running either, but rather quickly answering to messages they receive. In short, the architecture of that code seems to be wholly at odds with how one would design an actor system.
Still, each working actor may send a message to a Queue actor asking for the next task, and then loop back to react. This Queue actor would receive either queueing messages, or dequeuing messages. It could be designed like this:
val q: Queue[AnyRef] = new Queue[AnyRef]
loop {
  react {
    case Enqueue(d) => q enqueue d
    case Dequeue(a) if q.nonEmpty => a ! (q dequeue)
  }
}
}