Celery: How to allow queueing "duplicate" tasks, but disallow working on "duplicate" tasks?

We're already using celery-singleton (similar to celery-once) to disallow queueing duplicate tasks, but now we have a case where we would like to allow queuing duplicate tasks but disallow working on duplicate tasks at the same time.
Example case:
We have the job message {area_id: 42, item_id: 1} which is currently being worked on. For domain-specific reasons we would like to only have one task being worked on at the same time for any given area_id. The next message in the task queue is {area_id: 42, item_id: 2}. We would like this second task to wait until the first one is completed successfully. A task for a different area_id, e.g. {area_id: 23, item_id: 1} should be allowed to be executed at the same time by a different worker.
Are there any plugins to Celery similar to celery-singleton / celery-once that will easily allow us to accomplish this? I ask before trying to implement this myself in case a library already exists (which escaped my googling).
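To make the desired behaviour concrete, here is a minimal sketch of "one running task per area_id" using a plain in-process lock per area. It is not a Celery plugin and does not use any Celery API; the names try_start and finish are hypothetical, and with workers on several machines the lock store would have to be a shared, distributed one.

```python
import threading
from collections import defaultdict

# Hypothetical in-process stand-in for a shared lock store; real Celery
# workers on several machines would need a distributed lock instead.
_area_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def try_start(area_id):
    """Return True if no other task for this area_id is currently running."""
    with _registry_guard:
        lock = _area_locks[area_id]
    return lock.acquire(blocking=False)


def finish(area_id):
    """Release the area so the next queued task for it may start."""
    _area_locks[area_id].release()


first = try_start(42)    # {area_id: 42, item_id: 1} starts
second = try_start(42)   # {area_id: 42, item_id: 2} is refused and must wait
other = try_start(23)    # {area_id: 23, item_id: 1} may run in parallel
finish(42)               # first task completes
retry = try_start(42)    # item_id 2 can start now
```

Queueing is never blocked here; only starting work is, which is the distinction the question is after.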

Related

How do I allocate a fixed number of users per user class in Locust?

Suppose I have 3 separate user classes. I want to allocate a fixed number of users to each class. My code is as below.
from locust import HttpUser, TaskSet

class User_1(TaskSet):
    # I need 3 users to execute the tasks within this user class
    pass

class User_2(TaskSet):
    # I need only 1 user to execute the tasks within this user class
    pass

class User_3(TaskSet):
    # I need only 1 user to execute the tasks within this user class
    pass

class API_User_Test(HttpUser):
    # I already tried weighting the classes as below.
    tasks = {User_1: 3, User_2: 1, User_3: 1}
I've already tried weighting the classes as shown in the code above, but it doesn't work. Sometimes it allocates more than one user to class User_2 or class User_3. Can someone tell me how to fix this issue?
A weight in Locust is just a statistical weight, not a guarantee. The weights determine how many times a task/user is put into a list to be selected from. When a new task/user is spawned, Locust randomly selects one from the list. Given your weights:
tasks = {User_1: 3, User_2: 1, User_3: 1}
Statistically speaking, spawning 5 users with weights 3/1/1 would get you 3/1/1 but it may not be that precise every time. While less likely, it's possible you could get 4/0/1 or 3/2/0 or 5/0/0.
From the Locust docs:
If the tasks attribute is specified as a list, each time a task is to be performed, it will be randomly chosen from the tasks attribute. If however, tasks is a dict - with callables as keys and ints as values - the task that is to be executed will be chosen at random but with the int as ratio. So with a task that looks like this:
{my_task: 3, another_task: 1}
my_task would be 3 times more likely to be executed than another_task.
Internally the above dict will actually be expanded into a list (and the tasks attribute is updated) that looks like this:
[my_task, my_task, my_task, another_task]
and then Python’s random.choice() is used to pick tasks from the list.
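The mechanism the docs describe can be sketched in a few lines of plain Python. This is only an illustration of "expand the dict into a list, then pick at random", not Locust's actual code; expand_tasks and spawn are hypothetical helper names, and the task names are plain strings standing in for the classes.

```python
import random
from collections import Counter


def expand_tasks(weights):
    """Expand {task: weight} into a flat list, one entry per unit of weight."""
    return [task for task, weight in weights.items() for _ in range(weight)]


def spawn(weights, count, rng):
    """Pick `count` entries independently and tally how often each came up."""
    pool = expand_tasks(weights)
    return Counter(rng.choice(pool) for _ in range(count))


weights = {"User_1": 3, "User_2": 1, "User_3": 1}
rng = random.Random()
for _ in range(5):
    # Each line is one simulated run of 5 spawns; the split varies run to run.
    print(dict(spawn(weights, 5, rng)))
```

Because every pick is independent, nothing forces a run of five picks to come out as exactly 3/1/1.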
If you absolutely have to have full control over exactly what users are running, I'd probably recommend having a single Locust user with a single task that contains your own logic on what to run. Create your own list of functions to call and step through it each time a new user is created. The list might have to live outside the user class, e.g. as a module-level global. But the idea is that you manage the logic yourself rather than leaving it to Locust.
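A minimal sketch of that idea, assuming a single process: a module-level plan lists the flows in the exact proportions wanted, and each new user takes the next slot. The names PLAN, assign_flow and flow_1..flow_3 are hypothetical; in Locust the call to assign_flow could go, for instance, in the user's on_start method.

```python
import itertools


def flow_1():
    return "flow_1"


def flow_2():
    return "flow_2"


def flow_3():
    return "flow_3"


# Hypothetical plan: three users run flow_1, one runs flow_2, one runs flow_3.
PLAN = [flow_1, flow_1, flow_1, flow_2, flow_3]
_next_slot = itertools.count()


def assign_flow():
    """Hand out flows in PLAN order, wrapping around after the last slot."""
    slot = next(_next_slot)
    return PLAN[slot % len(PLAN)]


# Simulate five users being created one after another.
assigned = [assign_flow().__name__ for _ in range(5)]
```

The counter is per process, so each worker would hand out its own copy of the plan.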
Edit:
Using the single user method to control what's running won't work well if you run on multiple workers as the workers don't communicate with each other. You may consider doing some more advanced things like sending messages between master and workers to coordinate, or use an external source like a database or other service the workers talk to to know what they should run.

Typo3 Extension: Tasks - scheduler or Task Center?

I'm still new to Typo3 but I need to create an automatic daily task. When searching for tutorials two different things have come up:
Task Center: https://docs.typo3.org/typo3cms/extensions/taskcenter/DevelopersGuide/CreatingANewTask/Index.html
Scheduler: https://docs.typo3.org/typo3cms/extensions/scheduler/DevelopersGuide/CreatingTasks/Index.html
...which should I be focusing on? I'm assuming task center creates a list of tasks but I would need something like the Scheduler extension to actually run them, whereas Scheduler lets me create and schedule tasks? Or have I got it wrong :S
The task will involve truncating a table, converting a csv file to mysql data and processing the SQL.
Task Center is used for tasks performed by backend editors, e.g. "create a new user in the backend for this specific domain".
Scheduler is used for "low level" tasks, especially things not bound to a user or the backend, e.g. running a batch job, cleaning a cache database etc.
"Automatic and Daily" probably points to scheduler.
(Actually, the very first sentence in the introduction of the docs you referenced states this.)

How do I re-run a pipeline with only failed activities/datasets in Azure Data Factory V2?

I am running a pipeline where I loop through all the tables in INFORMATION_SCHEMA.TABLES and copy them to Azure Data Lake Store. My question is: if any table fails to copy, how do I re-run the pipeline for the failed tables only?
Best approach I’ve found is to code your process to:
0. Yes, root cause the failure and identify if it is something wrong with the pipeline or if it is a “feature” of your dependency you have to code around.
1. Be idempotent. If your process ensures a clean state as the very first step, similar to Command Design pattern’s undo (but more naive), then your process can re-execute.
* with #1, you can safely use “retry” in your pipeline activities, along with sufficient time between retries.
* this is an ADFv1 or v2 compatible approach
2. If ADFv2, then you have more options and can have more complex logic to handle errors:
* for the activity that is failing, wrap this in an until-success loop, and be sure to include a bound on execution.
* you can add more activities in the loop to handle failure and log, notify, or resolve known failure conditions due to externalities out of your control.
3. You can also communicate asynchronously with future executions by saving each success to a central store. Later executions then check “was I already successful?” and, if so, stop before the activity.
* this is powerful for more generalized pipelines, since you can choose where to begin
4. The last resort I know of (and I would love to learn other ways to handle this) is manual re-execution of failed activities.
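Points 1 to 3 can be sketched together in plain Python (not ADF pipeline definitions): an idempotent copy step, a bounded retry around it, and a success store that lets later runs skip tables that already copied. The names copy_with_retry, flaky_copy and done are hypothetical, and the set stands in for whatever central store you use.

```python
import time


def copy_with_retry(table, copy_table, done, max_attempts=3, delay_s=0.0):
    """Copy one table with a bounded retry, skipping tables recorded in `done`."""
    if table in done:              # point 3: an earlier run already succeeded
        return "skipped"
    for attempt in range(1, max_attempts + 1):
        try:
            copy_table(table)      # assumed idempotent (point 1)
            done.add(table)        # record success for later executions
            return "copied"
        except Exception:
            if attempt == max_attempts:
                return "failed"    # bound reached (point 2); left for a re-run
            time.sleep(delay_s)    # wait between retries


# Hypothetical copy step that fails once for "orders" and then succeeds.
_failures = {"orders": 1}


def flaky_copy(table):
    if _failures.get(table, 0) > 0:
        _failures[table] -= 1
        raise RuntimeError(f"transient failure copying {table}")


done = {"customers"}               # stands in for the central success store
results = {
    table: copy_with_retry(table, flaky_copy, done)
    for table in ["customers", "orders", "items"]
}
```

Re-running the same loop later would only touch tables missing from the store, which is the "failed tables only" behaviour the question asks for.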
Hope this helps,
J

Dynamics CRM workflow failing with infinite loop detection - but why?

I want to run a plug-in every 30 minutes, to poll an external system for changes. I am in CRM Online, so I don't have ready access to a scheduling engine.
To run the plug-in, I have a 'trigger' entity with a timezone independent date-
Updating the field also triggers a workflow, which in pseudocode has this logic:
If (Trigger_WaitUntil >= [Process-Execution Time])
{
    Timeout until Trigger:WaitUntil
    {
        Set Trigger_WaitUntil to [Process-Execution Time] + 30 minutes
        Stop Workflow with status of: Succeeded
    }
}
If (Trigger_WaitUntil < [Process-Execution Time])
{
    Send email // Tell an admin that the recurring task has self-terminated
    Stop Workflow with status of: Canceled
}
So, the behaviour I expect is that every 30 minutes, the 'WaitUntil' field gets updated (and the Plug-in and workflow get triggered again); unless the WaitUntil date is before the Execution time, in which case stop the workflow.
However, 4 hours or so later (probably 8 executions, although I haven't verified that yet) I get an infinite loop warning "This workflow job was canceled because the workflow that started it included an infinite loop. Correct the workflow logic and try again. For information about workflow".
My question is why? Do workflows have a correlation id like plug-ins, which is being carried through to the child workflow? If so, is there any way I can prevent this while keeping the current basic mechanism of using a single trigger record to manage the schedule? (I've seen other solutions in which workflows create new records, but then you have to go round tidying up the old trigger records as well.)
Yes, this behavior is well-known. The only way to implement recurring workflows in Dynamics CRM without running into infinite loop detection, using only OOB features, is to use the Bulk Deletion functionality. This article describes how to implement it: http://www.crmsoftwareblog.com/2012/08/using-the-bulk-deletion-process-to-schedule-recurring-workflows/
UPD: If you want to run your code every 30 minutes, you will have to create 48 bulk delete jobs with corresponding start times such as 12:00, 12:30, 1:00 ...
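The figure of 48 follows from dividing one day by the 30-minute interval, assuming each job recurs once a day. A small sketch that computes the count and lists the start times (the date used is arbitrary; only the time of day matters):

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=30)
first_run = datetime(2000, 1, 1, 0, 0)   # arbitrary day; only the time matters

# One job per interval slot in a 24-hour day.
jobs_per_day = timedelta(days=1) // INTERVAL
start_times = [
    (first_run + i * INTERVAL).strftime("%H:%M") for i in range(jobs_per_day)
]
```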
The current supported method for CRM is to use the Azure Scheduler.
Excerpt:
create a Web API application to communicate with CRM and our external
provider running on a shared (free) Azure web site and also utilize
the Azure Scheduler to manage the recurrence pattern.
The free version of the Azure Scheduler limits us to execution no more
than once an hour and a maximum of 5 jobs. If you have a lot going on
$20 a month will get you executions every minute and up to 50 jobs -
which sounds like a pretty good deal.
So if you wanted a run every 30 minutes, you could create two jobs, one on the half hour and one on the hour.
The Bulk Deletion is an interesting work around and something we've used before. It creates extra work and maintenance though so I try to avoid it if possible.
I would generally recommend building a Windows application and using the Windows scheduling feature (I know you said you don't have a scheduler available, but this option is often forgotten). This approach works really well and is very easy to troubleshoot. Writing to logs and sending error email alerts are easy ways to make it robust. The server doesn't need to be accessible externally; it only needs to reach CRM. If you had CRM on-prem, you could just use the same server.
Azure Scheduler is a great suggestion. This keeps you in the cloud which is nice.
SSIS is another option if you already have KingswaySoft or Cozy Roc in place.
You could build a workflow that creates another record and cleans up after itself; however, this is really using the wrong tool for the job. Also, it's very easy for it to fail and then not initiate the next record.
There is a solution called "Scheduled Workflow Runner". You create a FetchXML query to create a record set to run against, and point it at an on-demand workflow that you want it to run on each record.
http://alexanderdevelopment.net/post/2013/05/18/scheduling-recurring-dynamics-crm-workflows-with-fetchxml/

How can I use a real-time workflow in CRM 2015?

I have a real-time workflow for creating unique numbers. This workflow gets a numeric field from my custom entity, increases it by 1, and updates it for the next use.
I want to run this workflow on multiple records.
In on-demand mode it works fine and I get correct, unique numbers, but in "Record is Created" mode it does not work properly and I get repeated numbers.
What do I have to do?
This approach won't work. When the workflow is triggered by record creation it runs multi-threaded: e.g. two users create two records, and two instances of the workflow start. As there is no locking mechanism, you end up with duplicated numbers.
I'm guessing this isn't happening when running on demand because you are running as a single user.
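The missing piece is that "read the number, add 1, write it back" must happen as one indivisible step. A minimal sketch of that idea in plain Python, using a lock around the read-increment-write; this is not CRM code, and the class name AutoNumber is hypothetical. In CRM the equivalent would be some form of record-level lock or transaction.

```python
import threading


class AutoNumber:
    """Hands out consecutive numbers; the lock makes read-increment-write atomic."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            current = self._value      # read
            self._value = current + 1  # increment and write back
            return self._value


counter = AutoNumber()
issued = []


def create_record():
    # Each thread plays the role of one workflow instance fired on record creation.
    issued.append(counter.next())


threads = [threading.Thread(target=create_record) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two instances can both read the same value before either writes, which is how the repeated numbers arise.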
You will need to implement a custom auto number approach, such as Auto Number for DynamicsCRM.
Disclaimer: I work for Gap Consulting who produce the tool linked above.