I installed Rundeck v3.3.5 (on CentOS 7 via RPM) to replace an old Rundeck instance that was decommissioned. I did the export/import of projects (which worked brilliantly) while connected to the new server as the default admin user. The imported jobs run properly on the correct schedule. I subsequently configured the new server to use LDAP authentication and configured ACLs for users/roles. That also works properly.
However, I see an error like this in the service.log:
ERROR services.NotificationService - Error sending notification email to foo#bar.com for Execution 9358 Error executing tag <g:render>: could not initialize proxy [rundeck.Workflow#9468] - no Session
My thought is to switch job owners from admin to a user that exists in LDAP. I mean, I would like to switch job owners regardless, but I'm also hoping it addresses the error.
Is there a way in the web interface or using rd that I can bulk-modify jobs to switch the owner?
It turns out that the error in the log was caused by notification settings in an included job. I didn't realize that notifications were configured on the parameterized shared job definition, but there were; removing the notification settings caused the error to stop being added to /var/log/rundeck/service.log.
To illustrate the problem, here are chunks of YAML I've edited to show just the important parts. Here's the common job:
- description: Do the actual work with arguments passed
group: jobs/common
id: a618ceb6-f966-49cf-96c5-03a0c2efb9d8
name: do_the_work
notification:
onstart:
email:
attachType: file
recipients: ops#company.com
subject: Actual work being started
notifyAvgDurationThreshold: null
options:
- enforced: true
name: do_the_job
required: true
values:
- yes
- no
valuesListDelimiter: ','
- enforced: true
name: fail_a_lot
required: true
values:
- yes
- no
valuesListDelimiter: ','
scheduleEnabled: false
sequence:
commands:
- description: The actual work
script: |-
#!/bin/bash
echo ${RD_OPTION_DO_THE_JOB} ${RD_OPTION_FAIL_A_LOT}
keepgoing: false
strategy: node-first
timeout: '60'
uuid: a618ceb6-f966-49cf-96c5-03a0c2efb9d8
And here's the job that calls it (the one that is scheduled and causes an error to show up in the log when it runs):
- description: Do the job
group: jobs/individual
name: do_the_job
...
notification:
onfailure:
email:
recipients: ops#company.com
subject: '[Rundeck] Failure of ${job.name}'
notifyAvgDurationThreshold: null
...
sequence:
commands:
- description: Call the job that does the work
jobref:
args: -do_the_job yes -fail_a_lot no
group: jobs/common
name: do_the_work
If I remove the notification settings from the common job, the error in the log goes away. I'm not sure if sending notifications from an included job is not supported. It would be useful to me if it was, so I could place notification settings in a single location. However, I can understand why it presents a problem for the scheduler/executor.
Related
I have a Cloudformation stack that conditionally invokes a nested stack to create a RDS instance, only if an existing database URL is not passed in as a parameter.
If I pass a value to the DBExistingEndpoint parameter in the stack, the condition CreateDB is set to false, and it will not invoke the nested RDS stack at all.
The issue is that in the AutoScaling launch config resource, there is a conditional dependency. I need to reference either the URL output from the nested stack, or the URL passed in as a parameter to place in a file in the newly launched instance.
Parameters:
DBExistingEndpoint:
Type: String
Description: Set to a URL of a RDS instance to use an existing DB, otherwise create one
Default: ''
...
Conditions:
CreateDB:
!Equals [!Ref DBExistingEndpoint, '']
...
Resources:
# Database created only if existing URL not passed in
DB:
Type: AWS::CloudFormation::Stack
Condition: CreateDB
Properties:
TemplateURL: ...
...
ClusterInstanceLaunchConfig:
Type: AWS::AutoScaling::LaunchConfiguration
Metadata:
AWS::CloudFormation::Init:
config:
files:
/etc/dbenv:
mode: "000640"
owner: root
group: root
content:
!Join
- "\n"
-
- !Sub ["DB_HOST=${DBEndpointAddress}", DBEndpointAddress: !If [CreateDB, !GetAtt DB.Outputs.RDSEndPointAddress, !Ref DBExistingEndpoint]]
...
The issue is that if I pass in an existing endpoint URL, the DB resource is skipped (correctly), but the stack creation fails with Template format error: Unresolved resource dependencies [DB] in the Resources block of the template
Ideally the DB.output.RDSEndpointAddress reference in the ClusterInstanceLauchConfig resource should be ignored because the CreateDB condition in the !If is false
Does anybody know how to code around this limitation?
You should try to set the conditional statement on a different level than it is now.
What will work for sure, is having the conditional statement on the level of the LaunchConfiguration itself, which would also mean quite a lot of duplication of the code. But maybe you could try to see the conditional on the level of content or files etc, to see if there's a middle ground somewhere, to keep duplication low, but avoid the error you're getting right now.
Say I want to run a task only when a specific tag is NOT in the list of tags supplied on the command line, even if other tags are specified. Of these, only the last one will work as I expect in all situations:
- hosts: all
tasks:
- debug:
msg: 'not TAG (won't work if other tags specified)'
tags: not TAG
- debug:
msg: 'always, but not if TAG specified (doesn't work; always runs)'
tags: always,not TAG
- debug:
msg: 'ALWAYS, but not if TAG in ansible_run_tags'
when: "'TAG' not in ansible_run_tags"
tags: always
Try it with different CLI options and you'll hopefully see why I find this a bit perplexing:
ansible-playbook tags-test.yml -l HOST
ansible-playbook tags-test.yml -l HOST -t TAG
ansible-playbook tags-test.yml -l HOST -t OTHERTAG
Questions: (a) is that expected behavior? and (b) is there a better way or some logic I'm missing?
I'm surprised I had to dig into the (undocumented, AFAICT) variable ansible_run_tags.
Amendment: It was suggested that I post my actual use case. I'm using ansible to drive system updates on Debian family systems. I'm trying to notify at the end if a reboot is required unless the tag reboot was supplied, in which case cause a reboot (and wait for system to come back up). Here is the relevant snippet:
- name: check and perhaps reboot
block:
- name: Check if a reboot is required
stat:
path: /var/run/reboot-required
get_md5: no
register: reboot
tags: always,reboot
- name: Alert if a reboot is required
fail:
msg: "NOTE: a reboot required to finish uppdates."
when:
- ('reboot' not in ansible_run_tags)
- reboot.stat.exists
tags: always
- name: Reboot the server
reboot:
msg: rebooting after Ansible applied system updates
when: reboot.stat.exists or ('force-reboot' in ansible_run_tags)
tags: never,reboot,force-reboot
I think my original question(s) still have merit, but I'm also willing to accept alternative methods of accomplishing this same functionality.
For completeness, and since only #paul-sweeney has offered any alternative solution, I'll answer my own question with my current best solution and let people pick / up-vote their favorite:
---
- name: run only if 'TAG' not specified
debug:
msg: 'ALWAYS, but not if TAG in ansible_run_tags'
when: "'TAG' not in ansible_run_tags"
tags: always
I know it's an old(ish) question, but I had a similar requirement.
It's probably something best implemented another way ... but ... sometimes it can be useful.
I'd achieve it by setting a fact if the tag IS specified, then outputting the message only if the fact is not set, something like:
---
- name: "test task runs only if tag missing"
hosts: all
tasks:
- name: "suppress message if tag given"
set_fact: suppress_message=yes
tags: reboot,never
- name: "message"
debug:
msg: "You didn't say 'reboot'"
when: suppress_message is not defined
I think that we have states for controlling (example: started, restarted, stopped), states for installing (present,absent) and components (webserver, db,...).
Ansible is lacking a good separation of those 3 dimensions and mixing those 3 dimensions in a single tag system is leading to confusion.
For example, if you have a 'webserver' and a 'DB' tag, you want to 'restart' the DB and not the webserver using a 'restart' tag.
But it won't work if the 'restart' tasks of the DB and the webserver are in the same tasks file with the same 'restart' tag as the 'restart' tag will start both the DB and the webserver...
So you will have probably to separate webserver and DB tasks in 2 separate files and use the tag at the level of the include.
Using tags means that you have a tree of options, not a matrix of options.
I like the tag concept but the fact that it is not possible to use it in conditional expressions is making it less appealing.
What I recommend is to declare tags in a role but map them into variables as a first task. So the 'restart' and 'db' tags would become boolean variables in my role and use when: instead of tags:
ansible-playbook has a skip-tags option. The example from the docs is
ansible-playbook example.yml --skip-tags "packages"
https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_tags.html
I've got a task to collect over 500 events from DC with winlogbeat. But windows got a limit 22 events to query. I'm using version 6.1.2. I've tried with processors like this:
winlogbeat.event_logs:
- name: Security
processors:
- drop_event.when.not.or:
- equals.event_id: 4618
...
but with these settings client doesn't work, nothing in logs. If I run it from exe file it just starts and stops with no error.
If I try to do like it was written in the official manual:
winlogbeat.event_logs:
- name: Security
event_id: ...
processors:
- drop_event.when.not.or:
- equals.event_id: 4618
...
client just crashes with "invalid event log key processors found". Also I've tried to create new custom view and take event from there, but apparently it also has query limit to 22 events.
I'm using salt for my deployment issues and have the following question.
Is there any mechanism to retry a command?
For instance I have some thing like this:
platform_deps_git:
git.latest:
- name: ...
- rev: master
- target: ...
- user: ...
- identity: ...
But sometimes the network may fail. Is there any way to retry platform_deps_git instruction?
The next version of Salt (2014.7.0) will have an "onfail" requisite. This will allow you to take another action if something fails.
The docs are here:
http://docs.saltstack.com/en/latest/ref/states/requisites.html#onfail
What I do is grep through salt output whenever I run a highstate and if it sees any failures I rerun the highstate.
There's a first-class retry mechanism for states that was added in 2017:
platform_deps_git:
git.latest:
- name: ...
- rev: master
- target: ...
- user: ...
- identity: ...
- retry:
attempts: 5
until: True
interval: 60
splay: 10
The retry option supports a few different options for controlling its behavior.
I have read the cookbook regarding deploying my symfony2 app to production environment. I find that it works great in dev mode, but the prod mode first wouldn't allow signing in (said bad credentials though I signed in with those very credentials in dev mode), and later after an extra run of clearing and warming up the prod cache, I just get http500 from my prod route.
I had a look in the config files and wonder if this has anything to do with it:
config_dev.php:
imports:
- { resource: config.yml }
framework:
router: { resource: "%kernel.root_dir%/config/routing_dev.yml" }
profiler: { only_exceptions: false }
web_profiler:
toolbar: true
intercept_redirects: false
monolog:
handlers:
main:
type: stream
path: %kernel.logs_dir%/%kernel.environment%.log
level: debug
firephp:
type: firephp
level: info
assetic:
use_controller: true
config_prod:
imports:
- { resource: config.yml }
#doctrine:
# orm:
# metadata_cache_driver: apc
# result_cache_driver: apc
# query_cache_driver: apc
monolog:
handlers:
main:
type: fingers_crossed
action_level: error
handler: nested
nested:
type: stream
path: %kernel.logs_dir%/%kernel.environment%.log
level: debug
I also noticed that there is a routing_dev.php but no routing_prod, the prod encironment works great however on my localhost so... ?
In your production environment when you run the app/console cache:warmup command you need to make sure you run it like this: app/console cache:warmup --env=prod --no-debug Also, remember that the command will warmup the cache as the current user, so all files will be owned by the current user and not the web server user (eg: www-data). That is probably why you get a 500 server error. After you warmup the cache run this: chown -R www-data.www-data app/cache/prod (be sure to replace www-data with your web server user.
Make sure your parameters.ini file has any proper configs in place since its common for this file to not be checked in to whatever code repository you might be using. Or (and I've even done this) its possible to simply forget to put parameters from dev into the prod parmeters.ini file.
You'll also need to look in your app/logs/prod.log to see what happens when you attempt to login.