Rollback support in Celery - celery

I am planning to use celery as the task management component of my project. It has almost all the features that I need for my project. I will have a set of tasks which can execute independently or in the specified sequence. In the sequential tasks, I want the ability to perform cleanup/rollbacks if one of the intermediate tasks fails. I was wondering if there is a feature available in celery out of the box to do the same or is any workaround available.

Celery doesn't support anything similar to rollback. Your tasks should be small atomic steps and intermediate steps should not corrupt your database.
If you need to revert changes, you can create task which is doing opposite operation and call it when celery is retrying to perform one of the tasks.

Related

Terminate all ECS tasks before new deployment

If an ECS task runs in multiple instances, the rolling deployment methods usually terminate tasks one by one and replace it with a newer version (newer task definition), in order to maximize uptime.
But we have some containers that may make schema upgrades on first-run (as an example), and here we never want old versions of a container to run at the same time as newer versions.
Another example is coupled data formats, that a new front-end talks to a new API. We'd rather have a moment of downtime than have old front-ends talk to new APIs, or new front-ends talk to old APIs.
ECS doesn't seem to provide a deployment strategy that terminates all instances of a task before starting to roll out new ones.
Before I roll some bash script that reduces task counts to 0 prior to an update and returns it to its original value, does anyone know if AWS provides a way to cleanly drain a service of tasks before rolling out a new task definition?
In a similar scenario I would do it as follows.
I would add my pipeline (if using a stack code) a CodeDeploy step and set the traffic to "All-at-once", this doc can start to direct you something.
In case I was unable to work with the stack code, I would create a lambda, which would be triggered by the pipeline, a step before the deploy, this lambda could interact with the ECS, resetting the tasks to receive the new version of the application, but in this situation, a downtime will be felt, of at least 2 ~ 5 minutes, depending on how your application is optimized. You can see a little more about the lib that will allow you to do this here.

What is the current recommended approach to manage/stop a spring-batch job?

We have some spring-batch jobs are triggered by autosys with shell scripts as short lived processes.
Right now there's no way to view what is going on in the spring-batch process so I was exploring ways to view the status & manage(stop) the jobs.
Spring Cloud Data Flow is one of the options that I was exploring - but it seems that may not work when jobs are scheduled with Autosys.
What are the other options that I can explore in this regard and what is the recommended approach to manage spring-batch jobs now?
To stop a job, you first need to get the ID of the job execution to stop. This can be done using the JobExplorer API that allows you to explore meta-data that Spring Batch is aware of in the job repository. Once you get the job execution ID, you can stop it by calling the JobOperator#stop method, please refer to the Stopping a job section of the reference documentation.
This is independent of any method you used to launch the job (either manually, or via a scheduler or a graphical tool) and allows you to gracefully stop a job and leave the repository in a consistent state (ready for a restart if needed).

Out of box distributed job queue solution

Are there any existing out of the box job queue framework? basic idea is
someone to enqueue a job with job status New
(multiple) workers get a job and work on it, mark the job as Taken. One job can only be running on at most one worker
something will monitor the worker status, if the running jobs exceed predefined timeout, will be re-queued with status New, could be worker health issue
Once a worker completes a task, it marks the task as Completed in the queue.
something keeps cleaning up completed tasks. Or at step #4 when worker completes a task, the worker simply dequeues the task.
From my investigation, things like Kafka (pub/sub) or MQ (push/pull & pub/sub) or cache (Redis, Memcached) are mostly sufficient for this work. However, they all require some sort of development around its core functionality to become a fully functional job queue.
Also looked into relational DB, the ones supports "SELECT * FOR UPDATE SKIP LOCKED" syntax is also a good candidate, this again requires a daemon between the DB and worker, which means extra effort.
Also looked into the cloud solutions, Azure Queue storage, etc. similar assessment.
So my question is, is there any out of the box solution for job queue, that are tailored and dedicated for one thing, job queuing, without much effort to set up?
Thanks
Take a look at Python Celery. https://docs.celeryproject.org/en/stable/getting-started/introduction.html
The default mode uses RabbitMQ as the message broker, but other options are available. Results can be stored in a DB if needed.

Use Airflow to run parametrized jobs on-demand and with a schedule

I have a reporting application that uses Celery to process thousands of jobs per day. There is a python module per each report type that encapsulates all job steps. Jobs take customer-specific parameters and typically complete within a few minutes. Currently, jobs are triggered by customers on-demand when they create a new report or request a refresh of an existing one.
Now, I would like to add scheduling, so the jobs run daily, and reports get refreshed automatically. I understand that Airflow shines at task orchestration and scheduling. I also like the idea of expressing my jobs as DAGs and getting the benefit of task retries. I can see how I can use Airflow to run scheduled batch-processing jobs, but I am unsure about my use case.
If I express my jobs as Airflow DAGs, I will still need to run them parametrized for each customer. It means, if the customer creates a new report, I will need to have a way to trigger a DAG with the customer-specific configuration. And with a scheduled execution, I will need to enumerate all customers and create a parametrized (sub-)DAG for each of them. My understanding this should be possible since Airflow supports DAGs created dynamically, however, I am not sure if this is an efficient and correct way to use Airflow.
I wonder if anyway considered using Airflow for a scenario similar to mine.
Celery workflows do literally the same, and you can create and run them at any point of time. Also, Celery has a pretty good scheduler (I have never seen it failing in 5 years of using Celery) - Celery Beat.
Sure, Airflow can be used to do what you need without any problems.
You can use Airflow to create DAGs dynamically, I am not sure if this will work with a scale of 1000 of DAGs though. There are some good examples on astronomer.io on Dynamically Generating DAGs in Airflow.
I have some DAGs and task that are dynamically generated by a yaml configuration with different schedules and configurations. It all works without any issue.
Only thing that might be challenging is the "jobs are triggered by customers on-demand" - I guess you could trigger any DAG with Airflow's REST API, but it's still in a experimental state.

Is there a way to make jobs in Jenkins mutually exclusive?

I have a few jobs in Jenkins that use Selenium to modify a database through a website's front end. If some of these jobs run at the same time, errors due to dirty reads can result. Is there a way to force certain jobs in Jenkins to be unable to run at the same time? I would prefer not to have to place or pick up a lock on the database, which could be read or modified by any number of users who are also testing.
You want the Throttle Concurrent Builds plugin which lets you define global and per-node semaphores.
Locks and latches is being deprecated in favor of Throttle Concurrent builds.
I've tried both the locks & latches plugin and the port allocator plugin as ways to achieve what you're trying to do. Neither worked reliably for me. Locks & latches worked some of the time, but I'd occasionally get hung jobs. Using port allocator as a hack will work unless you have multiple jenkins nodes, but the config overhead is kind of high. What I've ultimately settled upon is another hack, but it works reliably and uses core Jenkins stuff (no plugins):
create a slave node running on the same box as the master (or not, if you have lots of boxes)
give this slave a single executor (key)
tie the 2 (or n) jobs that must not run together to this new slave node
optionally set the slave's usage to 'tied jobs only' if it'll screw up your other jobs if they happen to run on the new slave
Since the slave has only one executor, the jobs tied to it can never run together.
If you regard the database as a shared resource that can only be used exclusively then this fits the usecase of the Lockable resources plugin.
It is being actively developed and improved and is very versatile.