Cadence: What is the best practice to change the workflow cron schedule? - cadence-workflow

We have a workflow that uses cron-based scheduling. We need to support a use case to change the cron expression.
What is the best practice to do so?

TL;DR
Start the same cron workflow again with the same workflowID, using IDReusePolicy = TerminateIfRunning.
Example
As the documentation says, a cron workflow stops only when it is cancelled or terminated. So you could also terminate/cancel the workflow and then start a new one yourself, but there is no consistency guarantee if you use two separate requests to do it.
Using IDReusePolicy = TerminateIfRunning makes sure terminate+start is an atomic operation in Cadence.
Here is an example of using it:
1. Start a helloworld worker
./bin/helloworld -m worker &
[1] 24808
2021-03-22T20:08:09.404-0700 INFO common/sample_helper.go:97 Logger created.
2021-03-22T20:08:09.405-0700 DEBUG common/factory.go:131 Creating RPC dispatcher outbound {"ServiceName": "cadence-frontend", "HostPort": "127.0.0.1:7933"}
...
...
2. Start a cron workflow
$./cadence --do samples-domain wf start --tl helloWorldGroup -w "test-cron" --execution_timeout 10 --cron "* * * * *" --wt "main.helloWorldWorkflow" -i '"Hello"'
Started Workflow Id: test-cron, run Id: 2d9f06f9-7e79-4c9d-942a-e2c6a20c9f85
3. Update the cron workflow
$./cadence --do samples-domain wf start --tl helloWorldGroup -w "test-cron" --execution_timeout 10 --cron "* * * * *" --wt "main.helloWorldWorkflow" -i '"Cadence"' --workflowidreusepolicy 3
Started Workflow Id: test-cron, run Id: 4344448d-5a95-4a91-a56e-ebc0b93b4d29
NOTE that in the CLI, --workflowidreusepolicy 3 will set IDReusePolicy = TerminateIfRunning.
The CLI usage will be updated after this PR.
Then you should be able to see the helloworld workflow print the new value:
$2021-03-22T20:24:00.307-0700 INFO helloworld/helloworld_workflow.go:29 helloworld workflow started {"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "24808#IT-USA-25920#helloWorldGroup", "WorkflowType": "main.helloWorldWorkflow", "WorkflowID": "test-cron", "RunID": "1e2e6d2f-dcc7-410f-8d06-81c94622bbb7"}
2021-03-22T20:24:00.307-0700 DEBUG internal/internal_event_handlers.go:470 ExecuteActivity {"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "24808#IT-USA-25920#helloWorldGroup", "WorkflowType": "main.helloWorldWorkflow", "WorkflowID": "test-cron", "RunID": "1e2e6d2f-dcc7-410f-8d06-81c94622bbb7", "ActivityID": "0", "ActivityType": "main.helloWorldActivity"}
...

Related

Luigi does not send error codes to Concourse CI

I have a test pipeline on Concourse with one job that runs a set of luigi tasks. My problem is that failures in the luigi tasks do not propagate up to the Concourse job. In other words, if a luigi task fails, Concourse does not register that failure and reports that the job completed successfully. I will first post the code I am running, then the solutions I have tried.
luigi-tasks.py
import luigi

from tasks import Task1, Task2, Task3

class Pipeline1(luigi.WrapperTask):
    def requires(self):
        yield Task1()
        yield Task2()
        yield Task3()
tasks.py
import luigi
import pandas as pd

class Task1(luigi.Task):
    def requires(self):
        return None

    def output(self):
        return luigi.LocalTarget('stuff/task1.csv')

    def run(self):
        # uncomment line below to generate task failure
        # assert(True==False)
        print('task 1 complete...')
        t = pd.DataFrame()
        with self.output().open('w') as outtie:
            outtie.write('complete')
# Tasks 2 and 3 are duplicates of this, but with 1s replaced with 2s or 3s.
config file
[retcode]
# codes are in increasing level of severity (for most applications)
already_running=10
missing_data=20
not_run=25
task_failed=30
scheduling_error=35
unhandled_exception=40
begin.sh
#!/bin/sh
set -e
export PYTHONPATH='.'
luigi --module luigi-tasks Pipeline1 --local-scheduler
echo $?
pipeline.yml
# <resources, resource types, and docker image build job defined here>
#job of interest
- name: run-docker-image
plan:
- get: timer
trigger: true
- get: docker-image-ecr
passed: [build-docker-image]
- get: run-git
- task: run-script
image: docker-image-ecr
config:
inputs:
- name: run-git
platform: linux
run:
dir: ./run-git
path: /bin/bash
args: ["begin.sh"]
I've introduced errors in a few ways: assertions/raising an exception (ValueError) within an individual task's run() method and within the wrapper, and sys.exit(luigi.retcodes.retcode().unhandled_exception). I also tried failing all tasks. I did this in case the error needed to be generated in a specific manner/location. Though they all produced a failed task, none of them produced an error in the concourse server.
At first, I thought concourse just gives a success if it can run the file or command tasked to it. I'm not sure it's that simple, though. Interestingly, when I run the pipeline on my local computer (luigi --modules luigi-tasks Pipeline1 --local-scheduler) I get an appropriate return code (e.g. 30), but when I run the pipeline within the concourse server, I get a return code of 0 after the luigi tasks complete (from echo $? in the bash script).
Would appreciate any insight into this problem.
My suspicion is that luigi doesn't see your config file with return codes. Its default behavior is to return 0, whether tasks fail or succeed.
This experiment should help to debug that:
Force a failed job: add an exit 1 at the end of begin.sh
Hijack the job: fly -t <target> i -j <pipeline>/<job> -> select run-script
cd ./run-git; /bin/bash begin.sh
Ensure the luigi config is present and named appropriately, e.g. luigi.cfg
Re-run the command: LUIGI_CONFIG_PATH=luigi.cfg bash ./begin.sh
Check output: echo $?
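If the config file turns out to be the issue and you would rather not depend on it, another option (not from the original answer; a sketch that assumes luigi-tasks.py is renamed to an importable luigi_tasks.py and that your luigi version supports detailed_summary) is to drive luigi from Python and set the exit status yourself, so the Concourse task fails whenever a luigi task fails:
# run_pipeline.py -- hypothetical runner, not part of the original post
import sys

import luigi
from luigi.execution_summary import LuigiStatusCode

from luigi_tasks import Pipeline1  # assumes luigi-tasks.py was renamed luigi_tasks.py

if __name__ == '__main__':
    # Run the wrapper task in-process and inspect the detailed result
    # instead of relying on the luigi CLI's exit code.
    result = luigi.build([Pipeline1()], local_scheduler=True, detailed_summary=True)
    # Exit non-zero unless the run succeeded, so Concourse marks the task as failed.
    ok = result.status in (LuigiStatusCode.SUCCESS, LuigiStatusCode.SUCCESS_WITH_RETRY)
    sys.exit(0 if ok else 1)
begin.sh would then call python run_pipeline.py instead of the luigi CLI, and set -e would propagate the non-zero exit code up to Concourse.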

How do I run a function in only one thread in a multi-threaded Unicorn Sinatra server?

I put my cron task in a module and then include it in my Sinatra server:
require 'rufus-scheduler'

module Cron
  scheduler = Rufus::Scheduler.new
  scheduler.every "30m", :first => :now do
    run_cmd('git pull')
    puts "pulled the repo!!!"
  end
end

class MyServer < Sinatra::Base
  include Cron
end
The entry point for the app is unicorn (unicorn config/config.ru -p 9393 -c config/unicorn.rb), and in unicorn.rb, there's this line
worker_processes 7
Because of this, git pull is running seven times every 30 minutes, and pulled the repo!!! is printed seven times.
Is there a way I can run this task in only one worker? I tried putting it in unicorn.rb above the worker_processes 7 line, but I'm not sure that's the best place for this code to live.
Unicorn is a multi-process (not multi-threaded) Rack server. There's no native support for executing a specific code path only in one of the worker processes.
However, you can work around that by saving the worker number after fork into an environment variable and then checking its value in your application code.
In config/unicorn.rb use
after_worker_ready do |server, worker|
  ENV["WORKER_NR"] = worker.nr.to_s
end
In your Sinatra app do:
def unicorn_worker_nr
  ENV["WORKER_NR"]
end

# Only worker 0 runs the scheduler, so the job fires once instead of once per worker.
if unicorn_worker_nr == "0"
  scheduler.every "30m", :first => :now do
    ...
  end
end

Unable to trigger job in Concourse

I am new to Concourse and set up the environment on my CentOS 7.6 machine as below.
$ wget https://concourse-ci.org/docker-compose.yml
$ docker-compose up -d
Then I logged in with `fly --target example login --team-name main --concourse-url http://192.168.77.140:8080/ -u test -p test`.
I can see the following:
[root@centostest ~]# fly targets
name     url                          team  expiry
example  http://192.168.77.140:8080   main  Sun, 16 Jun 2019 02:23:48 UTC
I used the below pipeline YAML, saved as 2.yaml:
---
resources:
  - name: my-git-repo
    type: git
    source:
      uri: https://github.com/ruanbekker/concourse-test
      branch: basic-helloworld

jobs:
  - name: hello-world-job
    public: true
    plan:
      - get: my-git-repo
      - task: task_print-hello-world
        file: my-git-repo/ci/task-hello-world.yml
Then I ran the below commands step by step.
fly -t example sp -c 2.yaml -p pipeline-01
fly -t example up -p pipeline-01
fly -t example tj -j pipeline-01/hello-world-job --watch
But it just hangs there with no useful response, as shown below.
[root@centostest ~]# fly -t example tj -j pipeline-01/hello-world-job --watch
started pipeline-01/hello-world-job #3
Theoretically, it should print something like below.
Cloning into '/tmp/build/get'...
Fetching HEAD
292c84b change task name
initializing
running echo hello world
hello world
succeeded
Where did I go wrong? Thanks.
Welcome to Concourse!
One thing that can be confusing when starting with Concourse is understanding when Concourse detects that the pipeline has changed and what happens if the pipeline is one file or multiple files.
Your pipeline (like the majority of real-world pipelines) is "nested": the main pipeline file 2.yaml refers to a task file named my-git-repo/ci/task-hello-world.yml.
What sets Concourse apart from other CI systems is that:
1. The main pipeline file (2.yaml) can reside anywhere, even in a different repository.
2. Because of 1, Concourse cannot detect a change to the main pipeline file; you have to tell Concourse that the file has changed, either with fly set-pipeline or by automatic means such as the concourse-pipeline-resource.
So the following errors happen often:
Changing the main pipeline file, committing and pushing, and expecting Concourse to pick up the change. Missing step: fly set-pipeline.
Once fly set-pipeline becomes second nature, you can stumble into the opposite error: changing both the main pipeline file and the nested task file, running set-pipeline, but not pushing. In this case, the only changes Concourse picks up are the ones to the main pipeline file, not the ones to the task file. Missing step: commit and push.
From the description of your problem, I have the feeling that it is a mixture of the gotchas I mentioned.

Is it possible to use custom routes for celery's canvas primitives?

I have distinct Rabbit queues each dedicated to a special kind of order processing:
# tasks.py
@celery.task
def process_order_for_product_x(order_id):
    pass  # elided ...

@celery.task
def process_order_for_product_y(order_id):
    pass  # elided ...

# settings.py
CELERY_QUEUES = {
    "black_hole": {
        "binding_key": "black_hole",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_x": {
        "binding_key": "product_x",
        "queue_arguments": {"x-ha-policy": "all"}
    },
    "product_y": {
        "binding_key": "product_y",
        "queue_arguments": {"x-ha-policy": "all"}
    },
}
We have a policy of enforcing explicit routing by setting CELERY_DEFAULT_QUEUE = 'black_hole' and then never consuming from black_hole.
Each of these tasks may use celery's canvas primitives, like so:
# tasks.py
@celery.task
def process_order_for_product_x(order_id):
    # These can run in parallel
    stage_1_group = group(do_something.si(order_id),
                          do_something_else.si(order_id))
    # These can run in parallel
    another_group = group(do_something_at_end.si(order_id),
                          do_something_else_at_end.si(order_id))
    # These run in a linear sequence
    process_task = chain(
        stage_1_group,
        do_something_dependent_on_stage_1.si(order_id),
        another_group)
    process_task.apply_async()
Supposing I want specific uses of celery.group, celery.chord, celery.chord_unlock, and other canvas tasks to flow through the queue for its corresponding product, rather than getting trapped in a black_hole, is there a way to invoke each particular canvas task with either a custom task name or custom routing_key?
For reasons I won't go into I would prefer to not send all celery.* tasks to a catch-all celery_canvas queue, which is what I am doing in the meantime.
This method allows you to route Celery canvas tasks to the queue of a callback task.
It is possible to specify a custom class-based task router for Celery as described here.
Let's focus on the celery.chord_unlock task. Its signature is defined here.
def unlock_chord(self, group_id, callback, ...):
The second positional argument is the signature of the chord callback task.
Task signatures in Celery are basically dicts, so that gives us an opportunity to access task options, including the task queue name.
Here is an example:
class CeleryRouter(object):
    def route_for_task(self, task, args=None, kwargs=None):
        if task == 'celery.chord_unlock':
            callback_signature = args[1]
            options = callback_signature.get('options')
            if options:
                queue = options.get('queue')
                if queue:
                    return {'queue': queue}
Add it to the Celery config:
CELERY_ROUTES = (CeleryRouter(),)
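For illustration, here is a minimal sketch (assuming the tasks from the question are importable from tasks.py; the order_id value is a placeholder) of starting a chord whose callback carries an explicit queue in its signature options. With the router above, the generated celery.chord_unlock task then follows the callback onto product_x instead of falling into black_hole:
# Sketch only, not from the original answer.
from celery import chord

from tasks import (do_something, do_something_else,
                   do_something_dependent_on_stage_1)  # tasks from the question

order_id = 42  # placeholder

header = [do_something.si(order_id), do_something_else.si(order_id)]
# .set() stores options (including 'queue') in the signature dict,
# which is exactly what CeleryRouter reads for celery.chord_unlock.
callback = do_something_dependent_on_stage_1.si(order_id).set(queue='product_x')
chord(header, callback).apply_async()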
I'm currently using Celery in my project. For some scenarios I need tasks to chain through different queues:
chain(get_staff.s(url), save_staff.s(dt, partner_id, url))()
Those two functions are declared like so:
@task(queue='celery_gevent')
def get_staff(source_url):
    ...

@task  # send to default queue
def save_staff(suggests, dt, partner, url):
    ...
By the way, celery_gevent is handled by a worker with a gevent pool, to make HTTP requests.
This example shows how you can specify the queue implicitly. You can also explicitly put a task on a different queue by specifying additional params, like so:
In [1]: add.apply_async([4,5])
Out[1]: <AsyncResult: bda3dedd-c2c4-44db-be8e-6a97e718f8b0>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
...done.
In [2]: add.apply_async([4,5], queue='your_product')
Out[2]: <AsyncResult: 934f6161-298b-468b-9716-3da6fae58fa5>
$ sudo rabbitmqctl list_queues
Listing queues ...
celery 1
your_product 1
...done.
You can run the whole canvas in a custom queue:
process_task.apply_async(queue='your_queue')
Try specifying the queue inside the @task decorator. This should help.
Links:
http://docs.celeryproject.org/en/latest/reference/celery.app.task.html
http://docs.celeryproject.org/en/latest/_modules/celery/app/task.html#Task.apply_async

Stop Oozie workflow execution

Yesterday I kicked off an oozie workflow. It started two jobs that stalled all day. I killed them this morning, having made a change that I now want to test. After killing the two jobs it's like the workflow became unstuck and is now proceeding. I would like to kill the workflow so it doesn't keep starting new jobs to replace the ones I kill. How can I do that in the oozie command line?
Oozie commands
--------------
Note: Replace the Oozie server and port with your cluster-specific values.
 
1) Submit job:
$ oozie job -oozie http://localhost:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -submit
job: 0000001-130712212133144-oozie-oozi-W
 
2) Run job:
$ oozie job -oozie http://localhost:11000/oozie -start 0000001-130712212133144-oozie-oozi-W
 
3) Check the status:
$ oozie job -oozie http://localhost:11000/oozie -info 0000001-130712212133144-oozie-oozi-W
 
4) Suspend workflow:
$ oozie job -oozie http://localhost:11000/oozie -suspend 0000001-130712212133144-oozie-oozi-W
 
5) Resume workflow:
$ oozie job -oozie http://localhost:11000/oozie -resume 0000001-130712212133144-oozie-oozi-W
 
6) Re-run workflow:
$ oozie job -oozie http://localhost:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -rerun 0000001-130712212133144-oozie-oozi-W
 
7) Should you need to kill the job:
$ oozie job -oozie http://localhost:11000/oozie -kill 0000001-130712212133144-oozie-oozi-W
 
8) View server logs:
$ oozie job -oozie http://localhost:11000/oozie -logs 0000001-130712212133144-oozie-oozi-W
 
Logs are available at:
/var/log/oozie on the Oozie server.
You can view your running jobs with:
oozie jobs
or if it's a coordinator, not a workflow:
oozie jobs -jobtype coordinator
And get the Job ID from there, then do:
oozie job -kill [id]
Here's the command line tool reference page: http://incubator.apache.org/oozie/docs/3.1.3/docs/DG_CommandLineTool.html
In addition to the post about Oozie commands: sometimes we don't have access to the respective workflow id to suspend/kill it, and we get the below error:
Error: E0508 : E0508: User [?] not authorized for WF job [0001304-190209190348229-oozie-mapr-W]
To perform any operation like kill/suspend in that case, we need to generate an authentication token for our user id. First, clear the existing cached token with the command below, then perform the suspend/kill action on the given workflow id:
rm .oozie-auth-token
From Apache Oozie docs:
Once authentication is performed successfully the received
authentication token is cached in the user home directory in the
.oozie-auth-token file with owner-only permissions. Subsequent
requests reuse the cached token while valid.
For more details, see the Apache Oozie docs (refer to the Authentication section):
Official Documentation
You may also find it helpful to know how to kill, rerun, etc. multiple jobs (for example, 200) at the same time using bash.
In one single line:
$ for jobid in `oozie jobs -filter status=SUSPENDED | cut -d" " -f1`; do echo "Killed job ${jobid}"; oozie job -kill ${jobid}; done