I am trying to use celery to parallelise the evaluation of a function with different parameters.
Here is a pseudo-code of why I am trying to achieve, which assumes that there is a function called evaluate decorated with #app.task
# 0. Setup cluster, celery or whatever parallelisation backend
pass
# 1. Prepare each node to simulate, this means sending some files
for node in mycluster:
#send files to node
pass
# 2. Evaluation phase
gen = Generator() # A Generator object creates parameter vectors that need to be evaluated
while not gen.finished():
par_list = gen.generate()
asyncs = []
for p in par_list:
asyncs.append(evaluate.delay(p))
results = [-1 for _ in par_list]
for i, pending in enumerate(asyncs):
if not pending.ready():
pending.wait()
if pending.successful():
results[i] = pending.get()
else:
pass # manage error
# send results to generator so that it generates a new set of parameters later
gen.tell(results)
# 3. Teardown phase
for node in mycluster:
#tell node to delete files
pass
The problems with this approach is that if my main application is running, and it has already passed the setup phase, then when new node connects, it certainly will not pass the setup phase. Similarly, the teardown phase will not be executed if a node disconnects.
A couple of solutions come to mind:
Instead of using a setup phase, chain two functions so that each node does setup | evaluate | teardown for each iteration of the "2. evaluation phase" loop. The problem here is that sending files through the message queue is something that I would like to avoid as much as possible.
Configure the workers to have a setup and teardown task so that they are automatically ready when they connect. I tried using bootsteps.StartStopStep , but I am not sure if this is the right way to go.
Setup a distributed file system so that there is no need to prepare and delete files before and after the evaluations
The concrete question here is, what's the recommended approach for these kind of tasks? I am sure that this is not a convoluted use-case and maybe one of you can provide some guidance on how should I approach this.
I'm not sure this is a worker issue - remember you may have a host of workers on a node. This sounds more like a node initialization issue. Why not have a job (a systemd task, an init script, whatever) that runs before the celery workers and which copies the files over. Similarly, in reverse, for tear down.
Related
I am exploring different ways to end a UVM test. One method that has come often from studying different blogs from Verification Academy and other sites is to use the Phase Ready to End. I have some questions regarding the implementation of this method.
I am using this method in scoreboard class, where my understanding is after my usual run phase is finished, it will call the phase ready to end method and implement it. The reason I am using it my scoreboard's run_phase finishes early, and there are some data into queues that need to be processed. So I am trying to prolong this scoreboard run_phase using this method. Here are is some pseudo-code that I have used.
function void phase_ready_to_end(uvm_phase phase);
if (phase.get_name() != "run") return;
if (queue.size() != 0) begin
phase.raise_objection(.obj(this));
fork
begin
delay_phase(phase);
end
join_none
end
endfunction
task delay_phase(uvm_phase phase);
wait(queue.size() == 0);
phase.drop_objection(.obj(this));
endtask
I have taken inspiration for this implementation from this link UVM-End of Test Mechanism for your reference. Here are some of the ungated thoughts in my mind on which I need guidance and help.
to the best of my understanding the phase_ready_to_end is called at the end of run_phase and when it runs it raises the objection for that scoreboard run_phase and runs delay_phase task.
That Delay Phase task is just waiting for the queue to end, but I am not seeing any method or task which will pop the items from the queue. Does I have to call some method to pop from the queue or as according to the 1st point above the raised objection will start the run phase so there is no need for that and we have to wait for a considerable amount of time?
Let me give you some pre-context to this question. I have a scoreboard where there are two queues whose write methods are implemented and they are being fed correctly by their source.
task run_phase (uvm_phase phase);
forever begin
compare_queues(); // this method takes data from two queues and compares them, both queues implementation are fine and they take data from their respective sources. Let me give you a scenario, let's suppose there are a total of 10 transactions being generated but the scoreboard was able to process only 6 of them and there are 4 transactions left when all objections are dropped. So to tackle that I implement this phase_to_ready_end method in my scoreboard.
end
endtask
The problem with this method that I am having is that, when I raise the objection in this phase_ready_to_end and call delay_phase method, nothing happens. And I am curious is there more to this implementation or not?
Sorry for the delay. I have shared more context to the existing question. Please see to that, let me know if it is confusing.
We have a pair of monitors that calls write method implemented inside the scoreboard. The monitors typically capture the transaction from BUS and call these WR methods to push the transactions. Thus two source and destination monitors WR into two - source and destination - queues as and when they find the transactions.
We have a checker task with RD-n-check running in forever loop in the run-phase of scoreboard. It's in a while loop and watches if the destination queue has non-zero entry. Once it finds so, it pops the head entry from destination queue and then pops the head entry from source queue as well and compares the two entries to declare if the check was a PASS or FAIL.
There are more than 2 queues and more than a pair of source/destination of course, but broadly this is the architecture around here.
Now in the current scenario, it seems that the checker tasks stop prints after certain point of time in some of the test cases. Upon adding debug prints thoroughly, it seems that checker tasks that does the job #2/#3 above and gets called inside the forever loop of the run-phase, exits gracefully one last time. However they are entered again - which is to say that the forever loop that should be calling them didn't call. As if the forever loop of run-phase stopped completely.
We also added another forever loop in run-phase that observes whether the queues are empty. From prints inside that parallel loop and from the monitor prints, we know that the queues aren't empty and monitors did push WRs into the queues for a long time.
It seems that the forever loop stopped working suddenly ( going by prints spewed out) all of a sudden but another set of threads that we added in runphase in another forever loop just to monitor those queues - keep printing that the queues have contents. So run-phase shouldn't be over but the checker tasks running in forever has stopped.
We are using Vivado 2020.2 for the simulation. This is a baffling/weird problem for us and we did go through prints multiple times to make sure nothing has been missed out. It seems we are missing very very basic or has hit a bug/broken some basics of UVM coding to land into here.
If you have any help, thoughts here, will appreciate that greatly.
The function phase_ready_to_end() gets called at the end of every task-based phase when all objections have been dropped (or never raised at all).
Typically a scoreboard has a queue or some kind of array of transactions waiting to be checked sent from a monitor via an analysis_port write() method. If your scoreboard is an in-order comparison checker, the queue size is zero when there are no more transactions waiting to be received.
If you look at the code in the link you shared, there is the following in the write_south method doing exactly that:
if (!item.compare(item_stream.pop_front()))
It's a question more about the architecture of a program that runs karma in a CI pipeline.
I have a set of web components. They are using karma to run tests (following open-wc.org recommendations). Then I have my custom CI pipeline that allows to schedule a test of selected group of components.
When the test is scheduled it execute tests for each component one by one. However in my logs I am getting messages like
MaxListenersExceededWarning: Possible EventEmitter memory leak
detected. 12 exit listeners added to [process]. Use
emitter.setMaxListeners() to increase limit
or sometimes
listen EADDRINUSE: address already in use 127.0.0.1:9877
which breaks the test (exists the process).
I can't really pinpoint the problem so I am guessing that I am not running the test in a correct way.
On the server I am using Server class to initialize the server, then I am calling start on the server. When the callback function passed to Server constructor is called I am assuming the server is stopped and I can start over with another component. But clearly it is not the case per errors I am getting.
So the question is what would be the right way of running Karma test in a loop, one by one, using node API instead of CLI.
Update
To be specific of how I am running the tests.
I am:
Creating configuration by calling config.parseConfig where the argument is component's karma config file
Calling new Server(opts, (code) => {}) where opts are the one generated in step 1
Adding listeners for browser_complete and browser_error to generate a report and to store it into the data store
Cleaning up (removing reference for the server) when constructor callback is called
Getting next component from the queue and going back to #1
To answer my on question,
I have moved the whole logic of executing a single test to a child process and after the test finishes, but before the next test is run, I am making sure the child process is killed. No more error messages are showing up.
I am working on a Perl script which does some periodic processing based on file-system contents.
The overall structure is like this:
# ... initialization...
while(1) {
# ... scan filesystem, perform actions depending on changes detected ...
sleep 5;
}
I would like to add the ability to input some data into this process by means of exposing an interface through HTTP. E.g. I would like to add an endpoint to skip the sleep, but also some means to input data that is processed in the next iteration. Additionally, I would like to be able to query some of the program's status through HTTP (i.e. a simple fork() to run the webserver-part in a separate process is insufficient?)
So far I have already used the Dancer2 framework once but it has a start; call that blocks and thus does not allow any other tasks (like my loop) to run. Additionally, I could of course move the code which is currently inside the loop to an endpoint exposed through Dancer2 but then I would need to call that periodically (though an external program?) which seems to be quite an obscure indirection compared to just having the webserver-part running in background.
Is it possible to unobtrusively (i.e. without blocking the program) add a REST-server capability to a Perl script? If yes: Which modules would be used for the purpose? If no: Should I really implement an external process to periodically invoke a certain endpoint or pursue a different solution altogether?
(I have tried to add a dancer2 tag, but could not do so due to insufficient reputation. Do not be mislead by this: I have so far only tried with Dancer2 not the Dancer (v.1))
You could try to launch your processing loop in a background thread, before you run start;.
See man perlthrtut
You probably want use threads::shared; to declare some variables shared between the REST part and the background thread. Or use dedicated queues/event mechanisms.
Is there any way to pause/resume a running workflow created using chains from celery 3.0?
Basically, we have two different types of tasks in our system: interactive and non-interactive ones. The non-interactive ones we have all the parameters for, but the interactive ones need user input. Note that for the interactive tasks, we can only ask for user input once all the previous taks in the chain have been completed, as their results will affect the interactive tasks (i.e. we cannot ask for user input before creating the actual chain).
Any suggestion on how to approach this? Really at a loss here..
Current ideas:
Create two subclasses of Task (from celery import Task). Add an extra instance (class member) variable to the Interactive task subclass that is set to false by default and represents that some user input is still needed. Somehow have access to the instance of the Task, and set it to true from outside the celery worker (Though I have looked this up quite a bit and it doesn't seem possible to have access to Task objects directly from another module)
Partition the chain into multiple chains delimited by Interactive jobs. Have some sort of mechanism outside the celery worker detect once a chain has reached it's end and trigger the interactive task's interactive client side component. Once the user has entered all this data, get the data, and start the new chain where the interactive task is at the head of the new chain.
We have implemented something like your second idea in our project & it works fine. Here is the gist of the implementation.
Add new field status to your model & override save method.
models.py:
class My_Model(models.Model):
# some fields
status = models.IntegerField(default=0)
def save(self, *args, **kwargs):
super(My_Model, self).save(*args, **kwargs)
from .functions import custom_func
custom_func(self.status)
tasks.py
#celery.task()
def non_interactive_task():
#do something.
#celery.task()
def interactive_task():
#do something.
functions.py
def custom_func(status):
#Change status after non interactive task is completed.
#Based on status, start interactive task.
Pass status variable to template which is useful for displaying UI element for user to enter information. When user enter required info, change the status. This calls custom_func which triggers your interactive_task.
I'm running in circles. I have webpage that creates a huge file. This file takes forever to be created and is in a subroutine.
What is the best way for my page to run this subroutine but not wait for it to be created/processed? Are there any issues with apache processes since I'm doing this from a webpage?
The simplest way to perform this task is to simply use fork() and have the long-running subroutine run in the child process. Meanwhile, have the parent return to Apache. You indicate that you've tried this already, but absent more information on exactly what the code looks like and what is failing it's hard to help you move forward on this path.
Another option is to have run a separate process that is responsible for managing the long-running task. Have the webpage send a unit of work to the long-running process using a local socket (or by creating a file with the necessary input data), and then your web script can return immediately while the separate process takes care of completing the long running task.
This method of decoupling the execution is fairly common and is often called a "task queue" (if there is some mechanism in place for queuing requests as they come in). There are a number of tools out there that will help you design this sort of solution (but for simple cases with filesystem-based communication you may be fine without them).
I think you want to create a worker grandchild of Apache -- that is:
Apache -> child -> grandchild
where the child dies right after forking the grandchild, and the grandchild closes STDIN, STDOUT, and STDERR. (The grandchild then creates the file.) These are the basic steps in creating a zombie daemon (a parent-less worker process unconnected with the webserver).