MS CRM recursive workflow and performance - workflow

I’m about to write a workflow in CRM that calls itself every day, i.e. a recursive workflow.
It will run on half a million entities each day and deactivate a record if it has not been updated in the past 3 days.
I’m worried about performance. Has anyone else done this?

I haven't personally implemented anything like this, but that's 500,000 records that are floating around in the DB that the async service has to keep track of, which is going to tax your hardware. In addition, CRM keeps track of recursive workflow instances. I don't have the exact specs in front of me, but if a workflow calls itself a set number of times within a certain timeframe, CRM will kill the workflow.
Could you just write a console app that asks the Crm Service for records that haven't been updated in three days, and then deactivate them? Run it as a scheduled task once a day, and then your CRM system doesn't have the burden of keeping track of all those running workflow instances.
EDIT: Ah, I see now you might have been thinking of one workflow that runs on all the records as opposed to workflows running on each record. benjynito's advice makes sense if you go this route, although I still think a scheduled task would be more appropriate than using workflow.
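As a rough illustration of the scheduled-task idea, here is a minimal Python sketch of the core loop. It is not CRM SDK code: `fetch_stale_ids` and `deactivate` are hypothetical callables standing in for whatever query and update calls your client library provides, and the batch size is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


def deactivate_stale(
    fetch_stale_ids: Callable[[datetime], Iterable[str]],  # hypothetical query hook
    deactivate: Callable[[List[str]], None],               # hypothetical update hook
    now: datetime,
    max_age: timedelta = timedelta(days=3),
    batch_size: int = 1000,
) -> int:
    """Deactivate every record not modified since now - max_age, in batches."""
    cutoff = now - max_age
    batch: List[str] = []
    total = 0
    for record_id in fetch_stale_ids(cutoff):
        batch.append(record_id)
        if len(batch) == batch_size:
            # Send a full batch and start a fresh list.
            deactivate(batch)
            total += len(batch)
            batch = []
    if batch:
        # Send whatever is left over after the last full batch.
        deactivate(batch)
        total += len(batch)
    return total
```

Letting the query do the filtering (only records older than the cutoff come back) keeps the daily run proportional to the number of stale records rather than to all 500,000.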

You'll want to make sure your workflow is running in non-peak hours. Assuming you have an on-premise installation you should be able to get away with that. If you're using a hosted instance, you might be worried about one organization running the workflow while another organization is using the system. Use the timeout and maybe a custom workflow activity, if necessary, to force the start time to a certain period.
I'm assuming you'll be as efficient as possible in figuring out which records to deactivate (i.e. your Query Expression brings back only the records you'll be deactivating).
The built-in infinite loop-protection offered by CRM shouldn't kill your workflow instances. It stops after a call depth of 8, but it resets to 1 if no calls are made for an hour. So the fact that you're doing this once a day should make you OK on the recursive workflow front.
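To make that rule concrete, here is a toy Python model of a depth limit that resets after an idle hour. It only mirrors the behaviour as described in this answer (limit of 8, reset after one hour); it is not CRM's actual implementation, and the class and method names are made up for the illustration.

```python
from datetime import datetime, timedelta

MAX_DEPTH = 8                      # limit quoted in the answer above
RESET_AFTER = timedelta(hours=1)   # idle period quoted in the answer above


class DepthGuard:
    """Toy model of a call-depth limit that resets after an idle period."""

    def __init__(self) -> None:
        self.depth = 0
        self.last_call = None

    def allow(self, now: datetime) -> bool:
        # A long enough quiet period starts the count over.
        if self.last_call is not None and now - self.last_call >= RESET_AFTER:
            self.depth = 0
        self.last_call = now
        self.depth += 1
        return self.depth <= MAX_DEPTH
```

In this model a workflow that re-triggers itself once a day is always back at depth 1, while one that re-triggers itself every minute is refused on the ninth call.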


Workflow platform for managing the processing of incoming files

In general, I have a single workflow that I want to be able to monitor. The workflow should start whenever new files arrive or alternatively at certain scheduled times, i.e. I want to be able to insert new "jobs" to the workflow as they come, and process the files by going through multiple different tasks and steps. I want to be able to monitor each file going through the tasks.
The queues and distributing the load for each task might be managed by Celery, but it's not decided yet either.
I've looked at Apache Airflow, and as far as I understand at the moment, it is geared more towards monitoring many different workflows, where each workflow mostly runs from start to end, rather than adding new files to the beginning of the flow before the previous run has ended.
Cadence workflow seems like it can do what I need, but it also seems to be a bit of an overkill.
I'm not expecting a specific final solution here, but I would appreciate suggestions of other such solutions that I can look into and that fit the above.
Luigi -
Extremely light-weight and fast compared to Airflow.

Is there a way to automate the monitoring and termination of AWS ECS tasks that are silently progressing?

I've been using AWS Fargate for quite a while and have been a big fan of the service.
This week, I created a monitoring dashboard that details the latest runtimes of my containers, and the timestamp watermark of each of my tables (the MAX date updated value). I have SNS topics set up to email me whenever a container exits with code 1.
However, I encountered a tricky issue today that slipped past these safeguards because of what I suspect was a deadlock situation related to a Postgres RDS instance.
I have several tasks running at different points in the day on a scheduler (usually every X or Y hours). Most of these tasks will perform some business logic calculations and insert / update an RDS instance.
One of my tasks (as I found when checking the CloudWatch logs later) was stuck making an update to a table, and basically just hung there waiting. My guess is that a user (perhaps me) was manually running a small update statement against the same table, triggering some sort of lock.
Because I have my tasks set on a recurring basis, the same task had another container provisioned a few hours later, attempted to update the same table, and also hung.
I only noticed this issue because my monitoring dashboard showed that the date updated watermark was still a few days in the past, even though I hadn't gotten any alerts or notifications for errors during my container run time. By this time, I had 3 containers all running, each stuck on the same update to the same table.
After I logged into the ECS console, I saw that my cluster had 3 task instances running - all the same task, all stuck making the same insert.
So my questions are:
is there a way to specify a runtime maximum for these tasks (ie. if the task doesn't finish within 2 hours, terminate with an exit code of 1)?
I'm trying to figure out the best way to prevent this type of "silent failure" in the future? I've added in application logic to execute a query checking for blocked process IDs with queries within my RDS instance, and if it notices any blocked PIDS, it skips the update. But are there any more graceful ways of detecting and handling this issue?
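For reference, the blocked-PID check mentioned above could look roughly like the following Python sketch. It assumes a DB-API cursor (for example from psycopg2) connected to a PostgreSQL version that provides `pg_blocking_pids` (9.6 or later); the function names are illustrative only, not my actual code.

```python
BLOCKED_SQL = """
SELECT pid, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
"""


def find_blocked(cursor):
    """Return {blocked_pid: [blocking_pids]} using a DB-API cursor."""
    cursor.execute(BLOCKED_SQL)
    return {pid: list(blocked_by) for pid, blocked_by in cursor.fetchall()}


def safe_to_update(cursor) -> bool:
    """Skip the update when any session is currently waiting on a lock."""
    return not find_blocked(cursor)
```

This only narrows the window: a lock can still be taken between the check and the update, which is why I'm asking about something more robust.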

How to handle large amounts of scheduled tasks on a web server?

I'm developing a website (using a LAMP stack) which must handle many user-made scheduled tasks. It works as follows: a user creates an event and sets a date, and other users (as many as 63) may join. A few hours before the set date, the system must email each user subscribed to that event. And that's it.
However, I have never handled scheduling, and the only tools I know (poorly) are cron and at. My plan is to create an at job for each event, which will call a script that gets all subscribers emails and mails them.
My question is: is my plan/design good? Is it scalable? Are there better options that I should be aware of?
Why a separate cron job for each event? I've done something similar for a newsletter, with a cron job that runs just once per hour and, if there are any newsletters to be sent, handles them. In your case you'd have a script that runs once every hour and gets a list of users for events that happen in the desired time interval.
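A minimal sketch of the selection step for such an hourly script, written in Python for brevity (the same logic ports directly to a php-cli script). The `starts_at` key, the three-hour lead time and the one-hour interval are assumptions for the example, and it presumes the job really does run once per interval.

```python
from datetime import datetime, timedelta


def events_to_notify(events, now, lead=timedelta(hours=3), interval=timedelta(hours=1)):
    """Pick events whose start time falls in the slice this run is responsible for.

    Each run covers [now + lead, now + lead + interval), so runs spaced exactly
    one interval apart cover every start time once and only once.
    """
    window_start = now + lead
    window_end = window_start + interval
    return [e for e in events if window_start <= e["starts_at"] < window_end]
```

If runs can be skipped or delayed, it is safer to record which events have already been mailed and select on that flag instead of relying on the window alone.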
It will work. As far as scalability goes, at a minimum make sure that the script runs in its own process so it doesn't bog down the server unnecessarily.
Create a php-cli script perhaps?
I'm doing most of my work in Rails nowadays, and there's a wealth of background processing libraries. One of them is Resque, which uses a Redis server to keep track of the jobs.
I found a PHP clone
Might be overkill for your use case, but give it a shot perhaps
If you would consider a proper framework that uses an application server (and not a simple webserver), Spring has a task scheduling layer that's simple to use. Scheduling jobs on the server really requires more than what a simple LAMP install can do, but I haven't used PHP in a while so maybe there's an equivalent.
Here's an article that compares some of your options.

CRM workflow run another workflow

Not a duplicate of this
I have a pretty simple CRM workflow, which basically just adds values to some fields that don't get filled in when a user creates a new object. My challenge is that a lot of objects have already been created in CRM with a lot of null values. We are talking thousands. So instead of asking the client to open every single object and run the workflow, I was thinking I could create a second workflow which initiates the first workflow on all current objects. Is this possible, and how should I do it?
The problem is not the workflow execution. It's the selection of the records. Dynamics CRM doesn't offer a way to execute a workflow against a massive number of records.
You have to script a little program which selects the records for which you would like to run the workflow and start the workflow for each of them.
See How to run ondemand workflow over all pages
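The shape of such a program is a paging loop. Below is a language-neutral sketch in Python; `fetch_page` and `start_workflow` are hypothetical hooks for the paged query and the workflow-start request of whatever CRM client you use, and the cookie-style paging is an assumption about how that client pages results.

```python
from typing import Callable, List, Optional, Tuple

# (record ids on this page, cookie for the next page or None when done)
Page = Tuple[List[str], Optional[str]]


def run_workflow_on_all(
    fetch_page: Callable[[Optional[str]], Page],  # hypothetical paged query
    start_workflow: Callable[[str], None],        # hypothetical workflow trigger
) -> int:
    """Start the workflow for every record, one page at a time; return the count."""
    cookie = None
    started = 0
    while True:
        record_ids, cookie = fetch_page(cookie)
        for record_id in record_ids:
            start_workflow(record_id)
            started += 1
        if cookie is None:
            return started
```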

Is it possible to have an "internal" cron in mysql5?

The other day a friend suggested I play a web browser game called OGame. If you don't know it, I'll tell you what it is: an RTS game where you have to build things like mining factories, barracks and so on. The interesting thing is that every building has a build time, and you can log off while it's building because it will keep going.
I believe something like this is managed via the DBMS. I have my records, which hold the end time of each construction. How do I check when to update a building? Do I need an external application that checks every second which records need to be updated? Is it possible with mysql5 to have an internal scheduler that launches a procedure on this table? And if so, is it a best practice?
I have built a similar game and I stored the construction end times (and other events to be fired) in an events table. I wrote a PHP daemon which regularly checks the events table for expired records and acts on them accordingly.
I couldn't find a way to do it in the database itself (and if I later wanted to migrate to another DB it would need rewriting). A cron'd script may overlap. A daemon can keep track of everything all the time, and output debug information if events are queuing faster than they're being processed. I also added a cron to check periodically that my daemon is still running, otherwise start it.
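The core of such a daemon is a small polling step. Here is a sketch in Python using `sqlite3` so it runs standalone; my own version was PHP against MySQL, and the `events` table layout (`id`, `building_id`, `ends_at` as a Unix timestamp) and the `handle` callback are assumptions for the example.

```python
import sqlite3


def process_expired(conn: sqlite3.Connection, now: int, handle) -> int:
    """Handle and delete every event whose end time has passed; return the count."""
    rows = conn.execute(
        "SELECT id, building_id FROM events WHERE ends_at <= ? ORDER BY ends_at",
        (now,),
    ).fetchall()
    for event_id, building_id in rows:
        handle(building_id)  # e.g. mark the building as finished
        conn.execute("DELETE FROM events WHERE id = ?", (event_id,))
    conn.commit()
    return len(rows)
```

The daemon then just calls this in a loop with the current time and a short sleep in between; if the returned count keeps growing from one pass to the next, events are queuing faster than they are being processed.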
Creating a daemon in PHP (if you're using PHP)
Hope that helps.