Is it possible to do the same thing in SaltStack, but with built-in functionality (without the PowerShell workaround)?
installation:
  cmd.run:
    - name: ./installation_script

wait for installation:
  cmd.run:
    - name: powershell -command "Start-Sleep 10"
    - unless: powershell -command "Test-Path ('/path/to/file/to/appear')"
Unfortunately there is no better way to do this in the current version of Salt, but retry logic was added to states in the next release, Nitrogen.
The way I would do this in that release is:
installation:
  cmd.run:
    - name: ./installation_script

wait for installation:
  cmd.run:
    - name: Test-Path ('/path/to/file/to/appear')
    - retry:
        attempts: 15
        interval: 10
        until: True
    - shell: powershell
This will continue to run the Test-Path command until it exits with a 0 exit code (or whatever the equivalent is in PowerShell).
https://docs.saltstack.com/en/develop/ref/states/requisites.html#retrying-states
Daniel
NB: When using retry, pay attention to the indentation: the keys under retry must be indented four spaces so they form a dictionary for Salt. Otherwise it will default to 2 attempts with a 30s interval. (2017.7.0)
wait_for_file:
  file.exists:
    - name: /path/to/file
    - retry:
        attempts: 15
        interval: 30
I am new to SaltStack and I am trying to run a job using SaltStack Config.
I have a master and a minion (a Windows machine).
The init.sls file is the following:
{% set machineName = salt['pillar.get']('machineName', '') %}

C:\\Windows\\temp\\salt\\scripts:
  file.recurse:
    - user: Administrator
    - group: Domain Admins
    - file_mode: 755
    - source: salt://PROJECTNAME/DNS/scripts
    - makedirs: true

run-multiple-files:
  cmd.run:
    - cwd: C:\\Windows\\temp\\salt\\scripts
    - names:
      - ./dns.bat {{ machineName }}
and the dns.bat file:
Get-DnsServerResourceRecordA -ZoneName corp.local
The main idea is to create a DNS record, but for now I am only trying to run this command to check things out. I get the following info message when I run the job:
"The minion has not yet returned. Please try again later."
I went to the master, ran the command salt-run manage.status, and got the following:
salt-run manage.status
/usr/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.23) or chardet (4.0.0) doesn't match a supported version!
RequestsDependencyWarning)
down:
- machine1
- machine2
- machine3
up:
- saltmaster
I tried some commands to restart the machines, but still no success.
Any help would be appreciated! Thanks in advance!
I have a test pipeline on concourse with one job that runs a set of luigi tasks. My problem is that failures in the luigi tasks do not propagate up to the concourse job. In other words, if a luigi task fails, concourse does not register that failure and states that the concourse job completed successfully. I will first post the code I am running, then the solutions I have tried.
luigi-tasks.py
import luigi

from tasks import Task1, Task2, Task3


class Pipeline1(luigi.WrapperTask):
    def requires(self):
        yield Task1()
        yield Task2()
        yield Task3()
tasks.py
import luigi
import pandas as pd


class Task1(luigi.Task):
    def requires(self):
        return None

    def output(self):
        return luigi.LocalTarget('stuff/task1.csv')

    def run(self):
        # uncomment the line below to generate a task failure
        # assert(True == False)
        print('task 1 complete...')
        t = pd.DataFrame()
        with self.output().open('w') as outtie:
            outtie.write('complete')

# Tasks 2 and 3 are duplicates of this, but with 1s replaced with 2s or 3s.
config file
[retcode]
# codes are in increasing level of severity (for most applications)
already_running=10
missing_data=20
not_run=25
task_failed=30
scheduling_error=35
unhandled_exception=40
begin.sh
#!/bin/sh
set -e
export PYTHONPATH='.'
luigi --module luigi-tasks Pipeline1 --local-scheduler
echo $?
pipeline.yml
# <resources, resource types, and docker image build job defined here>

# job of interest
- name: run-docker-image
  plan:
    - get: timer
      trigger: true
    - get: docker-image-ecr
      passed: [build-docker-image]
    - get: run-git
    - task: run-script
      image: docker-image-ecr
      config:
        inputs:
          - name: run-git
        platform: linux
        run:
          dir: ./run-git
          path: /bin/bash
          args: ["begin.sh"]
I've introduced errors in a few ways: assertions/raising an exception (ValueError) within an individual task's run() method and within the wrapper, and sys.exit(luigi.retcodes.retcode().unhandled_exception). I also tried failing all tasks. I did this in case the error needed to be generated in a specific manner/location. Though they all produced a failed task, none of them produced an error in the concourse server.
At first, I thought concourse just gives a success if it can run the file or command tasked to it. I'm not sure it's that simple, though. Interestingly, when I run the pipeline on my local computer (luigi --modules luigi-tasks Pipeline1 --local-scheduler) I get an appropriate return code (e.g. 30), but when I run the pipeline within the concourse server, I get a return code of 0 after the luigi tasks complete (from echo $? in the bash script).
Would appreciate any insight into this problem.
My suspicion is that luigi doesn't see your config file with return codes. Its default behavior is to return 0, whether tasks fail or succeed.
This experiment should help to debug that:
1. Force a failed job: add an exit 1 at the end of begin.sh.
2. Hijack the job: fly -t <target> i -j <pipeline>/<job> -> select run-script, then run:
   cd ./run-git; /bin/bash begin.sh
3. Ensure the luigi config is present and named appropriately, e.g. luigi.cfg.
4. Re-run the command: LUIGI_CONFIG_PATH=luigi.cfg bash ./begin.sh
5. Check the output: echo $?
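If that experiment shows luigi is not picking up the [retcode] settings, one fix to try is pointing luigi at the config from begin.sh itself. This is only a sketch, assuming a file named luigi.cfg containing the [retcode] section is committed to run-git next to begin.sh:

#!/bin/sh
set -e
export PYTHONPATH='.'
# Assumption: luigi.cfg (with the [retcode] section) sits next to begin.sh in run-git.
export LUIGI_CONFIG_PATH="$(pwd)/luigi.cfg"
# With set -e, a non-zero luigi return code aborts the script,
# so the Concourse task fails as well.
luigi --module luigi-tasks Pipeline1 --local-scheduler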
I supply the below cloud-init script through the Azure portal when creating a VM, and the script never runs. I would appreciate it if anyone can suggest what's wrong with my #cloud-config upload.
Observations:
- ubuntuVMexscript.sh is written
- test.sh is NOT written in the home directory
- /etc/cloud/cloud.cfg doesn't show the [scripts-user, always] change in the final modules
#cloud-config
package_upgrade: true

write_files:
  - owner: afshan:afshan
    path: /var/lib/cloud/scripts/per-boot/ubuntuVMexscript.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      cat > testCat < /var/lib/cloud/scripts/per-boot/ubuntuVMexscript.sh
  - owner: afshan:afshan
    path: /home/afshan/test.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      echo "test"

cloud_final_modules:
  - rightscale_userdata
  - scripts-vendor
  - scripts-per-once
  - scripts-per-boot
  - scripts-per-instance
  - [scripts-user, always]
  - ssh-authkey-fingerprints
  - keys-to-console
  - phone-home
  - final-message
  - power-state-change
write_files runs before any user/group creation. Does the afshan user exist when write_files is being run? If not, attempting to set the owner on the first file will throw an exception, and the write_files module will exit before attempting to create the second file. You can see if this is happening by checking /var/log/cloud-init.log on your instance.
/etc/cloud/cloud.cfg won't get updated by user data. It will stay as-is on disk, but your user data changes will get merged on top of it.
scripts-user refers to scripts written to /var/lib/cloud/instance/scripts. You haven't written anything there, so I'm not sure what the purpose of your [scripts-user, always] change is. If you're just looking to run a script every boot, the scripts-per-boot module (without any changes) should be fine. Every boot, it will run whatever is written to /var/lib/cloud/scripts/per-boot.
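To illustrate, here is a minimal sketch of the user data along those lines. It assumes root ownership is acceptable for the per-boot script and uses a placeholder script body; the cloud_final_modules override is dropped entirely:

#cloud-config
package_upgrade: true

write_files:
  # Assumption: root:root ownership, so the file can be written even if the
  # afshan user does not exist yet when write_files runs.
  - owner: root:root
    path: /var/lib/cloud/scripts/per-boot/ubuntuVMexscript.sh
    permissions: '0755'
    content: |
      #!/bin/sh
      # Placeholder body; scripts-per-boot runs this file on every boot.
      echo "per-boot script ran" >> /var/tmp/per-boot.log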
Imagine I have a workflow with 5 steps.
Step 2 may or may not create a file as its output (which is then used as input to subsequent steps).
If the file is created, I want to run the subsequent steps.
If no file gets created in step 2, I want to mark the workflow as completed and not execute steps 3 through to 5.
I'm sure there must be a simple way to do this yet I'm failing to figure out how.
I tried making step 2 return a non-zero exit code when no file is created and then using
when: "{{steps.step2.outputs.exitCode}} == 0" on step 3, but that still executes steps 4 and 5 (not to mention it marks step 2 as "failed").
So I'm out of ideas, any suggestions are greatly appreciated.
By default, a step that exits with a non-zero exit code fails the workflow.
I would suggest writing an output parameter to determine whether the workflow should continue.
- name: yourstep
  container:
    command: [sh, -c]
    args: ["yourcommand; if [ -f /tmp/yourfile ]; then echo continue > /tmp/continue; fi"]
  outputs:
    parameters:
      - name: continue
        valueFrom:
          default: "stop"
          path: /tmp/continue
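A later step can then gate on that output with a when expression. This is only a sketch: it assumes the template above is invoked by a step named yourstep earlier in the same steps template, and next-step / your-next-template are placeholders:

- - name: next-step
    template: your-next-template
    # Runs only if the producer wrote "continue" to /tmp/continue; apply the
    # same condition to any later steps that should also be skipped.
    when: "{{steps.yourstep.outputs.parameters.continue}} == continue"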
Alternatively, you can override the fail-on-nonzero-exitcode behavior with continueOn.
continueOn:
  failed: true
I'd caution against continueOn.failed: true. If your command returns a non-zero exit code for an unexpected reason, the workflow won't fail like it should, and the bug might go unnoticed.
This is the content of /etc/cloud/cloud.cfg of Ubuntu cloud 16.04 image:
# The top level settings are used as module
# and system configuration.

# A set of users which may be applied and/or used by various modules
# when a 'default' entry is found it will reference the 'default_user'
# from the distro configuration specified below
users:
  - default

# If this is set, 'root' will not be able to ssh in and they
# will get a message to login instead as the above $user (ubuntu)
disable_root: true

# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: false

# Example datasource config
# datasource:
#   Ec2:
#     metadata_urls: [ 'blah.com' ]
#     timeout: 5 # (defaults to 50 seconds)
#     max_wait: 10 # (defaults to 120 seconds)

# The modules that run in the 'init' stage
cloud_init_modules:
  - migrator
  - ubuntu-init-switch
  - seed_random
  - bootcmd
  - write-files
  - growpart
  - resizefs
  - disk_setup
  - mounts
  - set_hostname
  - update_hostname
  - update_etc_hosts
  - ca-certs
  - rsyslog
  - users-groups
  - ssh

# The modules that run in the 'config' stage
cloud_config_modules:
  # Emit the cloud config ready event
  # this can be used by upstart jobs for 'start on cloud-config'.
  - emit_upstart
  - snap_config
  - ssh-import-id
  - locale
  - set-passwords
  - grub-dpkg
  - apt-pipelining
  - apt-configure
  - ntp
  - timezone
  - disable-ec2-metadata
  - runcmd
  - byobu

# The modules that run in the 'final' stage
cloud_final_modules:
  - snappy
  - package-update-upgrade-install
  - fan
  - landscape
  - lxd
  - puppet
  - chef
  - salt-minion
  - mcollective
  - rightscale_userdata
  - scripts-vendor
  - scripts-per-once
  - scripts-per-boot
  - scripts-per-instance
  - scripts-user
  - ssh-authkey-fingerprints
  - keys-to-console
  - phone-home
  - final-message
  - power-state-change

# System and/or distro specific settings
# (not accessible to handlers/transforms)
system_info:
  # This will affect which distro class gets used
  distro: ubuntu
  # Default user name + that default users groups (if added/used)
  default_user:
    name: ubuntu
    lock_passwd: True
    gecos: Ubuntu
    groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  # Other config here will be given to the distro class and/or path classes
  paths:
    cloud_dir: /var/lib/cloud/
    templates_dir: /etc/cloud/templates/
    upstart_dir: /etc/init/
  package_mirrors:
    - arches: [i386, amd64]
      failsafe:
        primary: http://archive.ubuntu.com/ubuntu
        security: http://security.ubuntu.com/ubuntu
      search:
        primary:
          - http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/
          - http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/
          - http://%(region)s.clouds.archive.ubuntu.com/ubuntu/
        security: []
    - arches: [armhf, armel, default]
      failsafe:
        primary: http://ports.ubuntu.com/ubuntu-ports
        security: http://ports.ubuntu.com/ubuntu-ports
  ssh_svcname: ssh
As you can see, package-update-upgrade-install is placed in the final stage, whereas runcmd is placed in the config stage. According to the cloud-init documentation, modules in the config stage are executed before those in the final stage. As I understand it, runcmd should therefore be executed before package install.
However, the following code runs without any error:
packages:
  - shorewall

runcmd:
  - echo "printing shorewall version"
  - shorewall version
That means runcmd can be executed after package install.
Is there any reason that makes cloud-init disregard the execution order defined in /etc/cloud/cloud.cfg?
While investigating how to get cloud-init to run things earlier in the boot process, I saw this too. In my testing, it appeared that runcmd was running in the config stage as you would expect, but all it was doing was creating a shell script from the runcmd data, which it put in /var/lib/cloud/instance/scripts/runcmd. Cloud-init then ran that shell script during the scripts-user module in the final stage. Below are bits from /var/log/cloud-init.log showing this:
"Mar 15 17:12:24 cloud-init[2796]: stages.py[DEBUG]: Running module runcmd (<module 'cloudinit.config.cc_runcmd' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_runcmd.pyc'>) with frequency once-per-instance",
"Mar 15 17:12:24 cloud-init[2796]: util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-xxx/sem/config_runcmd - wb: [644] 20 bytes",
"Mar 15 17:12:24 cloud-init[2796]: helpers.py[DEBUG]: Running config-runcmd using lock (<FileLock using file '/var/lib/cloud/instances/i-xxx/sem/config_runcmd'>)",
"Mar 15 17:12:24 cloud-init[2796]: util.py[DEBUG]: Shellified 1 commands.",
"Mar 15 17:12:24 cloud-init[2796]: util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-xxx/scripts/runcmd - wb: [700] 50 bytes",
...
"Mar 15 17:12:40 cloud-init[2945]: stages.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_scripts_user.pyc'>) with frequency once-per-instance",
"Mar 15 17:12:40 cloud-init[2945]: util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-xxx/sem/config_scripts_user - wb: [644] 20 bytes",
"Mar 15 17:12:40 cloud-init[2945]: helpers.py[DEBUG]: Running config-scripts-user using lock (<FileLock using file '/var/lib/cloud/instances/i-xxx/sem/config_scripts_user'>)",
"Mar 15 17:12:40 cloud-init[2945]: util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=True, capture=False)",
Hope this helps...
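For completeness, if the goal is to run a command before the config and final stages (not just before package install), bootcmd is one option, since it runs in the cloud_init_modules stage shown above; unlike runcmd, it runs on every boot. A minimal sketch with a placeholder command:

#cloud-config
bootcmd:
  # Runs in the 'init' stage, before the 'config' and 'final' stages,
  # and on every boot rather than once per instance.
  - echo "bootcmd ran before package install" >> /var/tmp/bootcmd.log
packages:
  - shorewall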