I have the scheduler enabled on my salt master. I have a job that is configured to execute a runner function on a list of machines every month, like this example:
schedule:
  updater:
    args:
      - L@machine1,machine2,machine3
    cron: 0 * * * *
    enabled: true
    function: util.patch_selected
    jid_include: true
    maxrunning: 1
    name: updater
I've confirmed that the runner function (including the arguments) works fine on its own; however, the scheduler never executes it. I've run salt-run saltutil.sync_runner on the master and salt '*' saltutil.refresh_pillar on all the minions. What am I missing to get this running?
Running salt-run jobs.last_run target=<minion name> against the minion helped decipher the issue. It turns out that the "function" portion of a job schedule refers to a runner on the master, which doesn't work when targeting a minion. A workable solution was:
schedule:
  patch_updates:
    args:
      - salt-call util.patch_selected
    cron: 0 * * * *
    enabled: true
    function: cmd.run
    jid_include: true
    maxrunning: 1
    name: patch_updates
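If the job still never fires, it helps to confirm what the minions actually loaded and what the last run looked like. A couple of commands that may be useful (assuming the schedule is delivered to the minions, e.g. via pillar; machine1 is just one of the hosts from the example above):
# List the schedule as each minion sees it
salt '*' schedule.list
# Inspect the most recent job recorded for a given minion
salt-run jobs.last_run target=machine1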
I have a test pipeline on Concourse with one job that runs a set of luigi tasks. My problem: failures in the luigi tasks do not propagate up to the Concourse job. In other words, if a luigi task fails, Concourse does not register that failure and reports that the job completed successfully. I will first post the code I am running, then the solutions I have tried.
luigi-tasks.py
import luigi

from tasks import Task1, Task2, Task3


class Pipeline1(luigi.WrapperTask):
    def requires(self):
        yield Task1()
        yield Task2()
        yield Task3()
tasks.py
import luigi
import pandas as pd


class Task1(luigi.Task):
    def requires(self):
        return None

    def output(self):
        return luigi.LocalTarget('stuff/task1.csv')

    def run(self):
        # uncomment the line below to generate a task failure
        # assert(True==False)
        print('task 1 complete...')
        t = pd.DataFrame()
        with self.output().open('w') as outtie:
            outtie.write('complete')
# Tasks 2 and 3 are duplicates of this, but with 1s replaced with 2s or 3s.
config file
[retcode]
# codes are in increasing level of severity (for most applications)
already_running=10
missing_data=20
not_run=25
task_failed=30
scheduling_error=35
unhandled_exception=40
begin.sh
#!/bin/sh
set -e
export PYTHONPATH='.'
luigi --module luigi-tasks Pipeline1 --local-scheduler
echo $?
pipeline.yml
# <resources, resource types, and docker image build job defined here>
# job of interest
- name: run-docker-image
  plan:
    - get: timer
      trigger: true
    - get: docker-image-ecr
      passed: [build-docker-image]
    - get: run-git
    - task: run-script
      image: docker-image-ecr
      config:
        inputs:
          - name: run-git
        platform: linux
        run:
          dir: ./run-git
          path: /bin/bash
          args: ["begin.sh"]
I've introduced errors in a few ways: assertions and raised exceptions (ValueError) within an individual task's run() method and within the wrapper, and sys.exit(luigi.retcodes.retcode().unhandled_exception). I also tried failing all of the tasks, in case the error needed to be generated in a specific manner or location. Though they all produced a failed task, none of them produced a failure on the Concourse side.
At first, I thought Concourse just reports success if it can run the file or command given to it. I'm not sure it's that simple, though. Interestingly, when I run the pipeline on my local computer (luigi --module luigi-tasks Pipeline1 --local-scheduler) I get an appropriate return code (e.g. 30), but when I run the pipeline within the Concourse server, I get a return code of 0 after the luigi tasks complete (from echo $? in the bash script).
Would appreciate any insight into this problem.
My suspicion is that luigi doesn't see your config file with return codes. Its default behavior is to return 0, whether tasks fail or succeed.
This experiment should help to debug that:
Force a failed job: add an exit 1 at the end of begin.sh
Hijack the job: fly -t <target> i -j <pipeline>/<job> -> select run-script
cd ./run-git; /bin/bash begin.sh
Ensure the luigi config is present and named appropriately, e.g. luigi.cfg
Re-run the command: LUIGI_CONFIG_PATH=luigi.cfg bash ./begin.sh
Check output: echo $?
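For comparison, here is a minimal sketch of what begin.sh could look like once the config is found: it pins the config location explicitly and propagates luigi's exit code to Concourse. The LUIGI_CONFIG_PATH value and the assumption that luigi.cfg sits next to the script are mine, not from the original setup.
#!/bin/sh
# Sketch: point luigi at the retcode config explicitly and fail the task on a non-zero retcode
export PYTHONPATH='.'
export LUIGI_CONFIG_PATH="$(pwd)/luigi.cfg"
luigi --module luigi-tasks Pipeline1 --local-scheduler
status=$?
echo "luigi exited with $status"
exit $status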
We have a workflow that uses cron-based scheduling, and we need to support changing the cron expression.
What is the best practice for doing so?
TL;DR
Start the same cron workflow again with the same workflowID, using IDReusePolicy = TerminateIfRunning.
Example
As described in the documentation, a cron workflow stops only when it is cancelled or terminated. So you can also terminate/cancel it and then start a new one yourself, but there is no consistency guarantee if you do that with two separate requests.
Using IDReusePolicy = TerminateIfRunning makes terminate+start an atomic operation in Cadence.
Here is an example of using it:
1. Start a helloworld worker
./bin/helloworld -m worker &
[1] 24808
2021-03-22T20:08:09.404-0700 INFO common/sample_helper.go:97 Logger created.
2021-03-22T20:08:09.405-0700 DEBUG common/factory.go:131 Creating RPC dispatcher outbound {"ServiceName": "cadence-frontend", "HostPort": "127.0.0.1:7933"}
...
...
2. Start a cron workflow
$ ./cadence --do samples-domain wf start --tl helloWorldGroup -w "test-cron" --execution_timeout 10 --cron "* * * * *" --wt "main.helloWorldWorkflow" -i '"Hello"'
Started Workflow Id: test-cron, run Id: 2d9f06f9-7e79-4c9d-942a-e2c6a20c9f85
3. Update the cron workflow
$ ./cadence --do samples-domain wf start --tl helloWorldGroup -w "test-cron" --execution_timeout 10 --cron "* * * * *" --wt "main.helloWorldWorkflow" -i '"Cadence"' --workflowidreusepolicy 3
Started Workflow Id: test-cron, run Id: 4344448d-5a95-4a91-a56e-ebc0b93b4d29
NOTE that in the CLI, --workflowidreusepolicy 3 sets IDReusePolicy = TerminateIfRunning.
The CLI usage will be updated after this PR.
Then you should be able to see the helloworld workflow print the new value:
2021-03-22T20:24:00.307-0700 INFO helloworld/helloworld_workflow.go:29 helloworld workflow started {"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "24808@IT-USA-25920@helloWorldGroup", "WorkflowType": "main.helloWorldWorkflow", "WorkflowID": "test-cron", "RunID": "1e2e6d2f-dcc7-410f-8d06-81c94622bbb7"}
2021-03-22T20:24:00.307-0700 DEBUG internal/internal_event_handlers.go:470 ExecuteActivity {"Domain": "samples-domain", "TaskList": "helloWorldGroup", "WorkerID": "24808@IT-USA-25920@helloWorldGroup", "WorkflowType": "main.helloWorldWorkflow", "WorkflowID": "test-cron", "RunID": "1e2e6d2f-dcc7-410f-8d06-81c94622bbb7", "ActivityID": "0", "ActivityType": "main.helloWorldActivity"}
...
This is the content of /etc/cloud/cloud.cfg in the Ubuntu 16.04 cloud image:
# The top level settings are used as module
# and system configuration.
# A set of users which may be applied and/or used by various modules
# when a 'default' entry is found it will reference the 'default_user'
# from the distro configuration specified below
users:
- default
# If this is set, 'root' will not be able to ssh in and they
# will get a message to login instead as the above $user (ubuntu)
disable_root: true
# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false
# Example datasource config
# datasource:
# Ec2:
# metadata_urls: [ 'blah.com' ]
# timeout: 5 # (defaults to 50 seconds)
# max_wait: 10 # (defaults to 120 seconds)
# The modules that run in the 'init' stage
cloud_init_modules:
- migrator
- ubuntu-init-switch
- seed_random
- bootcmd
- write-files
- growpart
- resizefs
- disk_setup
- mounts
- set_hostname
- update_hostname
- update_etc_hosts
- ca-certs
- rsyslog
- users-groups
- ssh
# The modules that run in the 'config' stage
cloud_config_modules:
# Emit the cloud config ready event
# this can be used by upstart jobs for 'start on cloud-config'.
- emit_upstart
- snap_config
- ssh-import-id
- locale
- set-passwords
- grub-dpkg
- apt-pipelining
- apt-configure
- ntp
- timezone
- disable-ec2-metadata
- runcmd
- byobu
# The modules that run in the 'final' stage
cloud_final_modules:
- snappy
- package-update-upgrade-install
- fan
- landscape
- lxd
- puppet
- chef
- salt-minion
- mcollective
- rightscale_userdata
- scripts-vendor
- scripts-per-once
- scripts-per-boot
- scripts-per-instance
- scripts-user
- ssh-authkey-fingerprints
- keys-to-console
- phone-home
- final-message
- power-state-change
# System and/or distro specific settings
# (not accessible to handlers/transforms)
system_info:
  # This will affect which distro class gets used
  distro: ubuntu
  # Default user name + that default users groups (if added/used)
  default_user:
    name: ubuntu
    lock_passwd: True
    gecos: Ubuntu
    groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  # Other config here will be given to the distro class and/or path classes
  paths:
    cloud_dir: /var/lib/cloud/
    templates_dir: /etc/cloud/templates/
    upstart_dir: /etc/init/
  package_mirrors:
    - arches: [i386, amd64]
      failsafe:
        primary: http://archive.ubuntu.com/ubuntu
        security: http://security.ubuntu.com/ubuntu
      search:
        primary:
          - http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/
          - http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/
          - http://%(region)s.clouds.archive.ubuntu.com/ubuntu/
        security: []
    - arches: [armhf, armel, default]
      failsafe:
        primary: http://ports.ubuntu.com/ubuntu-ports
        security: http://ports.ubuntu.com/ubuntu-ports
  ssh_svcname: ssh
As you can see, package-update-upgrade-install is in the final stage, while runcmd is in the config stage. According to the cloud-init documentation, modules in the config stage are executed before those in the final stage. As I understand it, runcmd should therefore run before packages are installed.
However, the following code runs without any error:
packages:
- shorewall
runcmd:
- echo "printing shorewall version"
- shorewall version
That means runcmd was executed after the package install.
Is there any reason why cloud-init disregards the execution order defined in /etc/cloud/cloud.cfg?
While investigating how to get cloud-init to run things earlier in the boot process, I saw this too. In my testing, runcmd did run in the config stage as you would expect, but all it did was turn the runcmd data into a shell script at /var/lib/cloud/instance/scripts/runcmd. Cloud-init then ran that script during the scripts-user module in the final stage. Below are bits from /var/log/cloud-init.log showing this:
"Mar 15 17:12:24 cloud-init[2796]: stages.py[DEBUG]: Running module runcmd (<module 'cloudinit.config.cc_runcmd' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_runcmd.pyc'>) with frequency once-per-instance",
"Mar 15 17:12:24 cloud-init[2796]: util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-xxx/sem/config_runcmd - wb: [644] 20 bytes",
"Mar 15 17:12:24 cloud-init[2796]: helpers.py[DEBUG]: Running config-runcmd using lock (<FileLock using file '/var/lib/cloud/instances/i-xxx/sem/config_runcmd'>)",
"Mar 15 17:12:24 cloud-init[2796]: util.py[DEBUG]: Shellified 1 commands.",
"Mar 15 17:12:24 cloud-init[2796]: util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-xxx/scripts/runcmd - wb: [700] 50 bytes",
...
"Mar 15 17:12:40 cloud-init[2945]: stages.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) with frequency once-per-instance",
"Mar 15 17:12:40 cloud-init[2945]: util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-xxx/sem/config_scripts_user - wb: [644] 20 bytes",
"Mar 15 17:12:40 cloud-init[2945]: helpers.py[DEBUG]: Running config-scripts-user using lock (<FileLock using file '/var/lib/cloud/instances/i-xxx/sem/config_scripts_user'>)",
"Mar 15 17:12:40 cloud-init[2945]: util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=True, capture=False)",
Hope this helps...
Is it possible to do the same thing in SaltStack with built-in functionality (without the PowerShell workaround)?
installation:
  cmd.run:
    - name: ./installation_script

wait for installation:
  cmd.run:
    - name: powershell -command "Start-Sleep 10"
    - unless: powershell -command "Test-Path @('/path/to/file/to/appear')"
Unfortunately there is not a better way to do this in the current version of Salt, but retry logic was added to states in the next release, Nitrogen.
The way I would do this in that release is:
installation:
  cmd.run:
    - name: ./installation_script

wait for installation:
  cmd.run:
    - name: Test-Path @('/path/to/file/to/appear')
    - retry:
      - attempts: 15
      - interval: 10
      - until: True
    - shell: powershell
And this will continue to run the Test-Path command until it exits with exit code 0 (or whatever the PowerShell equivalent is).
https://docs.saltstack.com/en/develop/ref/states/requisites.html#retrying-states
Daniel
NB: When using retry, pay attention to the indentation: the options have to be indented 4 spaces relative to the retry key so they form a dictionary for Salt. Otherwise it defaults to 2 attempts with a 30s interval. (2017.7.0)
wait_for_file:
  file.exists:
    - name: /path/to/file
    - retry:
        attempts: 15
        interval: 30
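As a usage note, assuming the example above is saved as wait_for_file.sls (the file name is hypothetical), it can be applied ad hoc to watch the retry behaviour:
# Apply just this state and let it retry until the file exists
salt '*' state.apply wait_for_file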
I have a cron file that contains this:
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h dom mon dow command
MAILTO="myemail@gmail.com"
* * * * * php /home/forge/biossantibodies/artisan products:exportdiff --env=development
* * * * * php /home/forge/biossantibodies/artisan images:exportdiff --env=development
* * * * * php /home/forge/biossantibodies/artisan publications:exportdiff --env=development
* * * * * mailx -s "CronJob is run successfully" ben@gmail.com
I want it to send me an email after the cron job has run successfully.
I've tried:
MAILTO="myemail@gmail.com"
and:
* * * * * mailx -s "CronJob is run successfully" myemail@gmail.com
I never received any emails (I checked the spam folder as well), but I did notice that the tasks between them do run.
How would one go about configuring this?